Troubleshoot Control Systems Effectively
EP Editorial Staff | July 13, 2018
Refine your approach and get to solutions faster with three proven methods.
By Jay Steinman, Huffman Engineering
Troubleshooting skills are of great value when working with and debugging control systems, but success can be elusive. When an issue occurs in a system, troubleshooting could be as straightforward as following a simple trail of clues to a quick solution. Or it could mean struggling through a jungle of obscure, intermittent issues (those that occur only at certain times) that can leave even the best problem solver scratching his or her head. In either case, the following methodical approaches can ease the process.
FIRST THINGS FIRST
Remember that understanding is key to successful troubleshooting. Before digging into a control-system issue, you must first assess and categorize it. Try to reproduce it and note what steps are required to do so. While this task, in itself, can be difficult and time-consuming, an effective solution cannot be developed without a full understanding of the problem.
When determining how to reproduce a problem, you will also want to understand how repeatable it is. For example, when the issue is reproduced, does it occur 100% of the time or only sometimes? If it repeats less than 100% of the time, perform some trials and note the frequency of the recurrence.
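One way to keep track of those trials is a short script. The sketch below is only an illustration: the Trial record, the recurrence_rate function, and the sample notes are hypothetical names and made-up data, not part of any control-system tool.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One attempt to reproduce the issue (hypothetical record layout)."""
    number: int
    reproduced: bool
    notes: str = ""


def recurrence_rate(trials):
    """Return the fraction of trials (0.0 to 1.0) in which the issue reproduced."""
    if not trials:
        raise ValueError("no trials recorded")
    return sum(t.reproduced for t in trials) / len(trials)


# Made-up example data: five attempts, two of which reproduced the issue.
trials = [
    Trial(1, True, "fault appeared after cold start"),
    Trial(2, False),
    Trial(3, True),
    Trial(4, False),
    Trial(5, False),
]

rate = recurrence_rate(trials)
print(f"Reproduced in {rate:.0%} of {len(trials)} trials")
```

Keeping a note with each trial makes it easier to spot which conditions the successful reproductions have in common.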
Intermittent control-system issues can be some of the most difficult problems to troubleshoot and correct. While they are still repeatable, the various factors responsible for that repeatability are either outside a troubleshooter’s control or extremely tough to identify. Regardless of the issues that a control system might present, the following three methods can help lead to solutions.
1. DIVIDE AND CONQUER
In a typical control system, multiple components are working together. When issues arise, they could be caused by any one of those components or the communications between them. If you are unsure of an issue’s root cause, devise a few tests to help divide the possibilities. Such tests might include replacing a sensor with a spare or borrowed part from another machine. If the issue persists, you would know that the sensor is likely functioning properly and, thus, proceed to look at the rest of the system. Another test might involve replacing a managed Ethernet switch with an unmanaged switch. If that resolves the issue, you could assume the managed switch is misconfigured and, therefore, rule out the rest of the system. The tests you perform will be unique to every system and issue, but results of a few good ones that divide the possible sources of problems can help to quickly isolate the root cause.
In some circumstances, it may be possible to simplify the system or follow the data. This is particularly helpful when debugging code or communications. To do this, simplify the system as much as possible by disconnecting devices and/or disabling code. Is the issue resolved? If so, add devices or code back one at a time, checking for the issue after each addition. When the issue reappears, you will have isolated the cause. Alternatively, you may be able to follow the data or signals at each connection point to see whether they are getting lost somewhere along the line.
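The simplify-then-add-back procedure can be sketched in a few lines. This is a sketch under stated assumptions: find_culprit, the component names, and the issue_present check are all hypothetical. On a real system, the check is usually a person running the machine and watching for the fault.

```python
def find_culprit(components, issue_present):
    """Re-enable components one at a time and return the first one
    whose addition brings the issue back.

    components: ordered list of names, all assumed disabled at the start.
    issue_present: callable that takes the list of currently enabled
        components and returns True if the issue is observed
        (a stand-in for a manual check on the real system).
    Returns None if the issue never reappears.
    """
    enabled = []
    if issue_present(enabled):
        # The stripped-down system still fails, so the cause is not
        # among the components that were removed.
        raise RuntimeError("issue persists with all components disabled")
    for name in components:
        enabled.append(name)
        if issue_present(enabled):
            return name
    return None


# Hypothetical example: the fault appears only when the barcode reader is connected.
components = ["vfd_comms", "barcode_reader", "recipe_logic"]
culprit = find_culprit(components, lambda enabled: "barcode_reader" in enabled)
print("Issue reappeared after enabling:", culprit)
```

Adding one component per step keeps each result unambiguous, at the cost of more test cycles.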
2. TRIAL AND ERROR
Trial and error can be very helpful in solving a control-system problem. The best approach is to change only one variable at a time and observe the effect it has on the issue. When using this method, be sure to document the change and result with each trial. Otherwise, after performing, say, 15 trials, you could find yourself trying in vain to remember the result of the third one.
The trial-and-error method works best for issues that can be repeated rapidly. For slower issues, such as those that are time dependent, it’s possible to attempt several changes with each trial and record the result. Minimize the number of changes in each trial and make no more than two or three changes at a time. If a trial with multiple changes resolves the issue, you can then test the two or three changes independently to determine which one addresses the root cause. Keep in mind that when multiple variables are changed at once, they could depend on each other and further complicate troubleshooting attempts.
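The record-keeping described in this section can be as simple as a spreadsheet. For those who prefer a script, here is a minimal sketch. The TrialLog class, its fields, and the sample entries are hypothetical. The three-change limit reflects the guideline above.

```python
import csv
import io

MAX_CHANGES_PER_TRIAL = 3  # guideline from the text: no more than two or three


class TrialLog:
    """Keeps one row per trial: what was changed and what happened."""

    FIELDS = ["trial", "changes", "resolved", "notes"]

    def __init__(self):
        self.rows = []

    def record(self, changes, resolved, notes=""):
        if not changes:
            raise ValueError("record at least one change per trial")
        if len(changes) > MAX_CHANGES_PER_TRIAL:
            raise ValueError("too many changes in one trial to interpret the result")
        self.rows.append({
            "trial": len(self.rows) + 1,
            "changes": "; ".join(changes),
            "resolved": resolved,
            "notes": notes,
        })

    def to_csv(self):
        """Return the log as CSV text, ready to save or paste into a report."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=self.FIELDS)
        writer.writeheader()
        writer.writerows(self.rows)
        return buf.getvalue()


# Made-up example entries.
log = TrialLog()
log.record(["swapped proximity sensor with spare"], resolved=False)
log.record(["raised debounce timer", "replaced sensor cable"], resolved=True,
           notes="retest each change on its own")
print(log.to_csv())
```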
3. LOG DATA
Computer log files are a good place to begin troubleshooting if a control-system issue involves any servers, Windows-based HMIs, or other devices that keep log files. Every Windows computer has a tool called Event Viewer that can be found in the Administrative Tools of the control panel. Because the Event Viewer contains a substantial amount of information that is likely irrelevant to the issue, it’s necessary to sort and filter. Often the “System” and “Application” logs provide the most value. Certain applications may also maintain their own log files, which can provide invaluable troubleshooting insight.
Many PLCs also have the ability to log data points and create trend charts on the fly. This method is effective for any timing or PLC-code-related issues. Set up a trend to capture relevant I/O (input/output) points and internal PLC tags that are used in the logic, then reproduce the issue and see what the trend looks like. When using this method, pay attention to the rate at which data is being collected. A rate that is too slow can miss short-lived events, which could alter results or provide a false view of the situation.
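The effect of the collection rate is easy to demonstrate with a simulated signal. In this sketch, the signal function, the pulse timing, and the two sample periods are made-up values chosen for illustration. They do not represent any particular PLC or trending tool.

```python
def signal(t_ms):
    """Hypothetical digital input that is on only from 130 ms to 180 ms."""
    return 130 <= t_ms < 180


def sample(period_ms, duration_ms=1000):
    """Read the signal every period_ms milliseconds for duration_ms."""
    return [signal(t) for t in range(0, duration_ms, period_ms)]


fast = sample(10)    # samples at 0, 10, 20, ... ms
slow = sample(200)   # samples at 0, 200, 400, ... ms

print("10 ms trend saw the pulse: ", any(fast))
print("200 ms trend saw the pulse:", any(slow))
```

The 50-ms pulse falls entirely between two of the 200-ms samples, so the slower trend shows a flat line even though the input changed.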
For intermittent issues, setting up some type of a data logger may also help to capture the issue when you’re not physically present. If the operators are able to observe the issue, have them note the exact time when the issue occurs so you can go back and examine the log files.
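Once an operator has reported a time, narrowing the log to a window around it is straightforward. The sketch below assumes log entries are already available as (timestamp, message) pairs. The entries_near function, the two-minute window, and the sample entries are all hypothetical.

```python
from datetime import datetime, timedelta


def entries_near(entries, reported, window=timedelta(minutes=2)):
    """Return the (timestamp, message) pairs within `window` of the reported time."""
    return [(ts, msg) for ts, msg in entries if abs(ts - reported) <= window]


# Made-up log entries for illustration.
entries = [
    (datetime(2018, 7, 13, 9, 58, 10), "Conveyor 2 started"),
    (datetime(2018, 7, 13, 10, 14, 42), "Comm timeout: drive 3"),
    (datetime(2018, 7, 13, 10, 15, 5), "Conveyor 2 faulted"),
    (datetime(2018, 7, 13, 11, 2, 0), "Shift change"),
]

# Time the operator wrote down when the issue was observed.
reported = datetime(2018, 7, 13, 10, 15)

for ts, msg in entries_near(entries, reported):
    print(ts.isoformat(sep=" "), msg)
```

A window of a few minutes allows for differences between the operator's clock and the logging device's clock. Widen it if the clocks are not synchronized.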
KEEP AN OPEN MIND
A methodical approach to troubleshooting can help reduce the time and effort it takes to determine the root cause of an issue and find a solution. Remember to invest time in understanding why the issue occurred and how the solution resolved it. That investment will help make you a better troubleshooter and prepare you to deal with similar issues in the future.
One final note: Keep an open mind. During troubleshooting, there may be an aspect of the issue that you initially misunderstood and that, in turn, is preventing you from finding a solution. When you feel that you are at the end of the road with no other options, take a step back and re-evaluate the issue from the beginning. Correcting a simple misunderstanding can make all the difference.
Jay Steinman is a mechanical engineer with control-systems integrator Huffman Engineering Inc., Lincoln, NE. He holds a Bachelor’s degree in Mechanical Engineering and a Master’s in Engineering Management from the Univ. of Nebraska (nebraska.edu). For information on a range of control-system issues, including troubleshooting, visit huffmaneng.com or email firstname.lastname@example.org.