Professional Practice Skills
PPS-22: Troubleshooting
(Adapted from Course Notes 4N4,
Pre-class assignment
What is It?
Troubleshooting is a specialized form of the Six Step problem-solving approach, designed to help diagnose problems. It is particularly well suited to problems that arise in the process industries and in failure analysis.
Troubleshooting by an engineer is analogous to differential diagnosis by a doctor. In both cases, the expert is called in to deal with a problem or breakdown, and in both cases, finding the root cause is preferable to making the symptoms go away. If the doc gives you aspirin for shoulder pain and you die of a heart attack, you are going to be upset. Well, you are going to be dead, but somebody else might be upset.
New Concepts
Why Do It?
A number of mechanical engineers are responsible for maintenance or production. For these engineers, deviations from normal conditions, outright breakdowns, and failures are the important issues. Diagnosing the problem, or troubleshooting, is the critical step before the problem can be solved.
Troubleshooting skills are also important to homeowners, shade-tree mechanics, and anyone who regularly uses that crankiest of 21st-century conveniences, the personal computer.
How to Do It
Good
troubleshooters combine technique and knowledge. We are going to focus on technique in this
unit, since that is applicable to any troubleshooting problem. The technique is a fine-tuned version of our
old friend, the Six Step Method (Engage, Define, Explore, Plan, Do It, Look
Back).
This unit lays out the Six Steps as they apply to troubleshooting. In-class exercises will be used to practice specific skills. A future unit provides a workshop opportunity to put it all together.
Engage
In this
step we will
When presented with a problem on the production line (especially an expensive production shutdown), you may feel a rapid increase in pulse and blood pressure, accompanied by the onset of perspiration. This fight-or-flight response may be useful, or it may be harmful.
That adrenaline is useful if you have a significant safety problem. If you have an overpressure boiler or a hazardous chemical discharge, you may need to evacuate the building, call the authorities and run like hell. If it is a safety-critical issue, apply your emergency training and start whatever shutdown procedures you have learned.
If the problem is not safety-critical (perhaps a drilling operation is leaving the holes slightly undersized), then the adrenaline can be harmful. You need to resist the urge to do the first thing you think of. Instead, apply your stress reduction training, take a deep breath and remember this Six Step Approach. Remind yourself that you can get to the root cause if you just stay organized and on track.
Define
In this
step we will
The
definition of the
Like the doctor, you will need to take a thorough history and do a physical exam. Talk to the people involved, collect data on what happened (especially the sequence of events), and locate and save data histories from computers or data loggers. Examine the hardware and software involved.
The
You may
want to add cost-effective to the
list, especially for production situations.
For the example of the undersized holes, our goal may be to get the machine making holes within specification by the next shift. For larger or more chronic problems, the time frame could be much longer and the goal statement more complex.
In some
production problems, we may be able to shift to a temporary Safe State in which we know the process
will work, but the economics may not be ideal.
This could involve a reduction in speeds and feeds for a machining operation, 100% inspection to ensure no bad parts reach customers, or substitution of a more expensive but more reliable material or tool.
This Safe State is a bridge between the Current State (not working right) and
the Ideal State (problem fixed), and
may keep the bosses off your back long enough to find the root cause.
Explore
In this
stage you
This stage
is a little harder to describe, since it relies heavily on your knowledge and
experience. It assumes you know the
process, have access to information sources, and have a good grasp of
engineering fundamentals.
To review the fundamentals
To check information
To review trends and relevant changes
Plan
Now it is
time to
A spreadsheet-type form (shown below) is particularly valuable in the Plan stage.
Marks: + = supports, X = eliminates, - = neutral

| Working Hypotheses | Initial Evidence a | Initial Evidence b | Initial Evidence c | Diagnostic Action A | Diagnostic Action B | Diagnostic Action C |
|---|---|---|---|---|---|---|
| Hypothesis 1 | | | | | | |
| Hypothesis 2 | | | | | | |
We start by brainstorming everything that could cause the problem and listing those causes as our possible hypotheses. The big danger at this stage is falling in love with the first or best hypothesis you have. Try to stay open-minded and let the process lead you to the best hypothesis.
Each significant piece of initial evidence (such as a temperature reading or visual markings on a fracture surface) is given a column (lowercase a, b, c) in the Initial Evidence category. For each hypothesis, you consider whether that piece of evidence supports the hypothesis, eliminates it, or is neutral. You then enter the appropriate mark in the cell at the intersection of the hypothesis row and the evidence column.
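The marking rules above are mechanical enough to sketch in code. The Python below is a minimal illustration, not part of the original method. It assumes the form is stored as a dictionary of rows, treats a single X in a row as eliminating that hypothesis, and ranks the survivors by counting + marks. The hypotheses H1 to H3 and their marks are hypothetical. A raw count of + marks is only a rough tiebreaker; it does not replace engineering judgment about how strong each piece of evidence is.

```python
# Marks used on the hypothesis/evidence form.
SUPPORT, ELIMINATE, NEUTRAL = "+", "X", "-"


def surviving_hypotheses(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Return the hypotheses that no piece of evidence has eliminated.

    `matrix` maps each hypothesis (a row of the form) to a dict of
    {evidence column: mark}, where mark is "+", "X", or "-".
    """
    return [h for h, marks in matrix.items() if ELIMINATE not in marks.values()]


def support_count(marks: dict[str, str]) -> int:
    """Count the "+" marks in one hypothesis row."""
    return sum(1 for mark in marks.values() if mark == SUPPORT)


# Hypothetical filled-in form: three hypotheses, initial evidence columns a, b, c.
form = {
    "H1": {"a": SUPPORT, "b": NEUTRAL, "c": SUPPORT},
    "H2": {"a": SUPPORT, "b": ELIMINATE, "c": NEUTRAL},
    "H3": {"a": NEUTRAL, "b": SUPPORT, "c": NEUTRAL},
}

viable = surviving_hypotheses(form)
# Rank the viable hypotheses by how many pieces of evidence support them.
ranked = sorted(viable, key=lambda h: support_count(form[h]), reverse=True)
print(ranked)  # ['H1', 'H3']
```

H2 drops out because evidence b eliminates it. Both H1 and H3 remain viable, which is exactly the situation where diagnostic actions are needed.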
Once you
have considered the initial evidence, you probably will have more than one
viable hypothesis. This is the point at
which the doctor orders more tests and the patient may need to remove their
clothes and bend over. Fortunately, you
are the doctor in this case, so it is the workers in the plant who have to
worry.
Diagnostic actions often involve changes to production lines, off-line tests, or extra work by fellow employees. Like medical tests, they may be expensive, time-consuming, or annoying. And like medical tests, they should only be done if they are likely to tell us something worth the time and money.
The
following spreadsheet table helps us decide what tests to run and in what
sequence.
| Diagnostic Action | Cost ($) | Time (hr) | Sequence | What will the test tell us? |
|---|---|---|---|---|
| A | | | | |
| B | | | | |
| C | | | | |
The
Diagnostic Actions A, B, C in this table are identical to the Diagnostic
Actions in the previous table. Here we
list each action and estimate the cost in dollars and the time in hours to accomplish it. We also answer the question “What will the test tell us?” The answer should take the form “If the test results are __, then hypothesis 2 is eliminated and hypotheses 3 and 5 are supported.” If the test doesn’t serve to eliminate or strongly support a hypothesis, you may want to reconsider running it.
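One way to make the “What will the test tell us?” check concrete is to record, for each diagnostic action, which hypotheses each possible result would eliminate. You can then flag actions that cannot eliminate anything still standing. This Python sketch is illustrative only. The `DiagnosticAction` class, its fields, and the example data are hypothetical. The sketch models only the elimination half of the rule; strong support is left to judgment.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosticAction:
    label: str                 # A, B, C ... as in the planning table
    cost_dollars: float
    time_hours: float
    # Hypothetical encoding of "What will the test tell us?":
    # each possible result maps to the hypotheses that result would eliminate.
    eliminates: dict[str, set[str]] = field(default_factory=dict)


def worth_running(action: DiagnosticAction, surviving: set[str]) -> bool:
    """True if at least one possible result would eliminate a surviving hypothesis."""
    return any(hyps & surviving for hyps in action.eliminates.values())


surviving = {"H1", "H3", "H5"}
action_a = DiagnosticAction("A", 200.0, 4.0,
                            {"result 1": {"H1"}, "result 2": {"H3", "H5"}})
action_b = DiagnosticAction("B", 50.0, 1.0,
                            {"result 1": {"H2"}, "result 2": set()})

print(worth_running(action_a, surviving))  # True
print(worth_running(action_b, surviving))  # False: B can only rule out H2
```

Here action B would be reconsidered: whatever its result, it only speaks to a hypothesis that is already eliminated.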
So, what
about the Sequence column? Once you have
a list of diagnostic tests, you need to consider which to run first. You may pick the one with the most diagnostic
power, or you may want to go with cheap and easy. The sequence really depends upon the
situation. Remember that administrators
hate to see production lines shut down or modified, so there may be costs that
can’t be quantified in time and money.
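The two sequencing preferences mentioned above, most diagnostic power or cheap and easy, can be expressed as sort orders. In this hypothetical Python sketch, `power` is an assumed score, for example how many surviving hypotheses the test could eliminate. The costs, times, and scores are made up for illustration.

```python
from typing import NamedTuple


class Action(NamedTuple):
    label: str            # A, B, C ... as in the planning table
    cost_dollars: float
    time_hours: float
    power: int            # hypothetical score: hypotheses the test could eliminate


def cheap_and_easy_first(actions: list[Action]) -> list[Action]:
    """Lowest cost first; ties broken by shortest time."""
    return sorted(actions, key=lambda a: (a.cost_dollars, a.time_hours))


def most_power_first(actions: list[Action]) -> list[Action]:
    """Highest diagnostic power first; ties broken by lowest cost."""
    return sorted(actions, key=lambda a: (-a.power, a.cost_dollars))


plan = [
    Action("A", 500.0, 8.0, 3),
    Action("B", 50.0, 1.0, 1),
    Action("C", 0.0, 2.0, 2),
]
print([a.label for a in cheap_and_easy_first(plan)])  # ['C', 'B', 'A']
print([a.label for a in most_power_first(plan)])      # ['A', 'C', 'B']
```

The two orderings disagree on the same three actions, and that is the point. The troubleshooter still has to choose between them and weigh costs that can’t be quantified, such as shutting down a line.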
You
continue to perform diagnostic actions until 1) there is only one hypothesis
standing, or 2) you have good enough supporting evidence to select a best
hypothesis and move on to implement a solution.
As in science, you cannot prove a hypothesis to be true; you can only find a best hypothesis.
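The stopping rule above can be sketched as a loop: perform the planned actions in sequence, strike out whatever each result eliminates, and stop once only one hypothesis is left. This is a minimal Python illustration under stated assumptions. `observe` is a hypothetical stand-in for actually running a test, and the results are invented. Only the first stopping condition is modeled, since deciding that supporting evidence is “good enough” is a judgment call.

```python
from typing import Callable, Iterable


def run_diagnostics(
    hypotheses: Iterable[str],
    planned: list[str],
    observe: Callable[[str], set[str]],
) -> tuple[set[str], list[str]]:
    """Perform planned diagnostic actions in order, striking out eliminated hypotheses.

    `observe(label)` stands in for actually running the test; it returns the
    hypotheses that the observed result eliminates. Stops as soon as at most
    one hypothesis is left standing.
    """
    surviving = set(hypotheses)
    performed: list[str] = []
    for label in planned:
        if len(surviving) <= 1:
            break
        surviving -= observe(label)
        performed.append(label)
    return surviving, performed


# Hypothetical results for three planned actions.
results = {"A": {"H2"}, "B": {"H3"}, "C": {"H1"}}
left, done = run_diagnostics(["H1", "H2", "H3"], ["A", "B", "C"],
                             lambda label: results.get(label, set()))
print(left, done)  # {'H1'} ['A', 'B']
```

Action C is never performed, because H1 is already the only hypothesis standing. H1 is the best hypothesis, not a proven one. If the loop ever returns an empty set, every listed hypothesis has been eliminated, which suggests the brainstormed list missed the real cause.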
If you have a couple of strong contenders and no way to choose between them, you may use a shotgun solution. In a shotgun solution, you address both possible causes, and the problem is solved. You may never know which of the two was the true cause, but at least the problem is solved. If your car is not getting sufficient fuel to the cylinders and it is the day before a long trip, you may go ahead and replace both the fuel filter and the fuel pump without trying each in turn.
Do It
In this
step, you use your diagnosis or best hypothesis to implement a solution
(achieve the
You should
Look Back
In the
evaluation stage we ask ourselves
If we did
not fix the Root Cause, our solution may be a temporary fix while we address
other issues. Note that some Root Causes
may be problems for which you do not have ownership. Investigations of the Challenger and
Taking
stock of what you learned can help anchor the knowledge and convert the
incident into valuable experience. Look
at what went well and what you would do differently in the future.
Another way
to use what you learned is to try to prevent similar problems. This may be through
Learning Objectives
In-Class Exercise
Exercise 1
Form into groups of 2-4
For the situation (with defined
problem) presented by the instructor in class,
Exercise 2
Form into groups of 2-4
Initial Evidence columns a to f; marks: + = supports, X = eliminates, - = neutral

| Working Hypotheses | a | b | c | d | e | f |
|---|---|---|---|---|---|---|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
Exercise 3
For the surviving hypotheses from Exercise 2
| Diagnostic Action | Cost ($) | Time (hr) | Sequence | What will the test tell us? |
|---|---|---|---|---|
| A | | | | |
| B | | | | |
| C | | | | |
In-class exercise, Triads
In this exercise, you will play one of three roles: Troubleshooter, Observer, or Information Expert. There will be three sessions, so each person will play each role once.
Troubleshooter’s Role
The troubleshooter will
To do this the troubleshooter will
Request for diagnostic action will
Information Expert’s Role
Before the session, the information
expert
During the session, the information
expert
Observer’s Role
On the ____ Form, the observer will
At the end of the session
After all three sessions, turn in
Feedback Form (long version)
Listener _______________________
1. At the outset of this unit, place a “B” in each category to indicate your self-assessment of your initial, or baseline, skill level.
2. At the end of the unit, place an “A” in each category to indicate your self-assessment of your skill level after practicing the skill. Be prepared to provide documentation for your assessment.
| Novice (less successful) | Beginner (shows few expert behaviors) (1-2) | Good Start (some expert behavior) (3-4) | Getting There (many expert behaviors) (5-6) | Almost There (mostly expert behavior) (7-8) | Expert (shows all expert behavior) (9-10) | Expert (more successful) |
|---|---|---|---|---|---|---|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
Reflection of the Listener
What did I learn from this?
Which of the skills do I do pretty well? (List evidence)
Which skills could use some work? (List evidence)