FMEA and Functional FMEA as a generic analysis method

FMEA and Functional FMEA
A generic approach
Tor Stålhane, NTNU, IDI
FMEA problems
Trammell and Davis:
• Have to go through the FMEA process with a lot of
unimportant failure modes.
• Component-by-component review takes a considerable
amount of time => teams get frustrated since most
components assessed had minimal or no impact on the
system
Kmenta and Ishii:
• Performed too late, FMEA does not affect key product /
process decisions
• FMEA does not capture key failures
• FMEA is often an afterthought “checklist exercise”
• The process is tedious
• The Risk Priority Number gives a distorted measure of risk.
Our solution – a generic approach
Generic failure modes instead of component specific
failure modes to obtain:
• A consistent, sufficient and uniform set of failure
modes.
• Simplified FMEA, using only a sufficient level of detail.
• Not having to invent failure modes => free analyst to
focus on failure causes, consequences and prevention.
Generic failure modes => reuse of
• Results from components and component assemblies.
• Actions which are used to mitigate the failure modes.
Using generic failure modes
Beware: Generic failure modes
• Are not a replacement for using your head
• Are most useful in the early stages where we
still have a lot of choices when it comes to
– architecture
– barrier solutions
• Should be used as guide words in the analysis
Generic failure modes - CESAR
Component type: software systems - control system, e.g., a PLC
Failure modes:
• Omission – something is not done, no action
• Commission – something more is done
• Wrong action
• Too late – right action but too late
Component type: hardware component, e.g. a pump or a sensor
Failure modes:
• No action
• Wrong action
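Using the generic failure modes as guide words can be sketched in a few lines of Python. This is only an illustration: it assumes the first four failure modes on the slide belong to software systems and the last two to hardware components, and the names `GENERIC_FAILURE_MODES`, `WorksheetRow` and `blank_rows` are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

# Generic failure modes per component type (assumed reading of the slide).
GENERIC_FAILURE_MODES = {
    "software": ["Omission", "Commission", "Wrong action", "Too late"],
    "hardware": ["No action", "Wrong action"],
}

@dataclass
class WorksheetRow:
    unit: str
    failure_mode: str
    cause: str = ""            # left empty: filled in during the analysis
    effect: str = ""
    recommendation: str = ""

def blank_rows(unit: str, component_type: str) -> list[WorksheetRow]:
    """Create one empty worksheet row per generic failure mode."""
    return [WorksheetRow(unit, mode)
            for mode in GENERIC_FAILURE_MODES[component_type]]

# Example units taken from the slide: a PLC (software) and a pump (hardware).
rows = blank_rows("PLC", "software") + blank_rows("Pump", "hardware")
for row in rows:
    print(f"{row.unit}: {row.failure_mode}")
```

The analyst does not have to invent failure modes; the remaining work is to fill in causes, effects and recommendations for each row.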
Generic failure modes - NRC
ID | Failure mode | Elaboration | Remarks
A1 | Fail to perform the function at the required time | Deviation from requirement in time domain | Omission, No action, No output, Reacts too late
A2 | Fail to perform the function with correct value | Deviation from requirement in value domain | Wrong output
A3 | Performance of an unwanted function | Deviation from expected performance | Commission, Wrong action
A4 | Interference or unexpected coupling with another module | Deviation from expected system performance due to module interaction | Commission
Harrison’s human failure modes
No | Failure mode | Comments
1 | Errors of omission | Something that should have been done is not done
2 | Errors of commission | Something more has been done
3 | Errors of sequence | Actions have been done in the wrong sequence
4 | Errors of repetition | An action has been done too many / few times
5 | Too much / little | Too much or too little has been done
6 | Too early / late / long | An action was done too early / too late / for too long
Enables the inclusion of operators in the safety analysis
Simple FMEA – early phases
Unit description | Failure mode | Failure cause | Failure effect on the next level | Recommendations
(Failure mode and Failure cause together form the Failure description.)
FMEA in the early phases of product development has
much in common with Preliminary Hazard Analysis – PHA
“Recommendations” should be used for
• Possible barriers
• Design changes
• Requirements changes
Functional FMEA
Function description
Function | Functional failure mode | Effects | Cause | Current detection method | Comments
Generic functional failure modes – should be used as
guidewords:
• Over
• Under
• No
• Intermittent
• Unintended
FMEA vs. Functional FMEA – 1
We can use
• FMEA when focusing on a component’s
inner workings
• Functional FMEA when focusing on the
functionality that a component provides to its
environment - usually parameter values
FMEA vs. Functional FMEA – 2
FMEA for each control unit. Functional FMEA for the
boiler, which provides the real value / service to the users
Extended FMEA – Trammell and Davis
Header fields: Project title, FMEA type (Design / System), Group, Prepared by, Core team
Requirement | Failure mode | Effect of failure | SEV | Cause or mechanism | OCC | Current design and control | DET | RPN | Recommendations | SEV | OCC | DET | RPN
The last four columns repeat the scores after the recommended design changes.
• SEV: severity of the failure mode if it occurs
• OCC: rate of occurrence of the failure mode for
– the current design
– the design after changes
• DET: rate of detection – how often we detect this failure mode if it occurs
• RPN: SEV * P(event | not detected) * P(not detected)
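As a small worked example, the RPN formula on this slide, SEV * P(event | not detected) * P(not detected), can be written out directly. The function name `rpn`, the severity value and the probabilities below are invented for illustration and do not come from Trammell and Davis.

```python
def rpn(sev: float, p_event_given_not_detected: float,
        p_not_detected: float) -> float:
    """RPN as defined on the slide: SEV * P(event | not detected) * P(not detected)."""
    for p in (p_event_given_not_detected, p_not_detected):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return sev * p_event_given_not_detected * p_not_detected

# Hypothetical numbers: the recommended change halves P(not detected).
current_design = rpn(8, 0.5, 0.25)
after_changes = rpn(8, 0.5, 0.125)
print(current_design, after_changes)  # prints: 1.0 0.5
```

Computing the value for the current design and again after the design changes mirrors the two sets of score columns in the worksheet.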
Why generic fault trees
Several organizations have developed generic fault
trees for systems in their domains.
A fault tree
• does not identify how a failure propagates through
a system to cause an accident
• can be used to see which failures can cause the
top event in the fault tree.
An FMEA can show whether the root cause events
can occur for the system under consideration.
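How a fault tree shows which failures can cause the top event can be sketched as follows. The tree, the gate structure and the event names are hypothetical and are not taken from any of the generic fault trees in these slides; `Gate`, `occurs` and `basic_events` are illustrative names.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Gate:
    kind: str                                      # "AND" or "OR"
    children: list = field(default_factory=list)   # Gate objects or event names

def occurs(node, failed: set[str]) -> bool:
    """Return True if the node occurs given the set of failed basic events."""
    if isinstance(node, str):
        return node in failed
    results = [occurs(child, failed) for child in node.children]
    return all(results) if node.kind == "AND" else any(results)

def basic_events(node) -> set[str]:
    """Collect all basic event names below a node."""
    if isinstance(node, str):
        return {node}
    events: set[str] = set()
    for child in node.children:
        events |= basic_events(child)
    return events

# Hypothetical top event: pump fails OR (main valve stuck AND bypass valve stuck).
top = Gate("OR", ["pump fails",
                  Gate("AND", ["main valve stuck", "bypass valve stuck"])])

# Basic events that cause the top event on their own.
single_points = sorted(e for e in basic_events(top) if occurs(top, {e}))
print(single_points)  # prints: ['pump fails']
```

An FMEA of the actual system would then check whether each basic event found this way can really occur.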
Generic fault tree – water supply failure
Generic fault tree – blowout
Combining FMEA and FTA
Generic hazards
Murphy’s law:
“If something can go wrong, it will”.
However: What can go wrong, depends on the
system’s operating environment.
A generic hazard list is a list of hazards that are
possible in a certain domain or environment
Simple hazard lists
Hazard list item fields:
• Task description / location
• Reference no.
• Hazard description
• Date last reviewed
• Risk Assessment Matrix score – initial and current
• Controls / Barriers
• Reference
• Owner
• Status
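One way to keep such a hazard list in machine-readable form is sketched below. The field names follow the slide, but the class `HazardListItem`, the helper `without_barriers`, the score format and the two example items are assumptions made for illustration only.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date

@dataclass
class HazardListItem:
    reference_no: str
    task_or_location: str
    hazard_description: str
    date_last_reviewed: date
    initial_score: str          # risk assessment matrix score, format assumed
    current_score: str
    controls_barriers: list[str] = field(default_factory=list)
    reference: str = ""
    owner: str = ""
    status: str = "open"

def without_barriers(items: list[HazardListItem]) -> list[str]:
    """Reference numbers of items that have no control or barrier recorded."""
    return [item.reference_no for item in items if not item.controls_barriers]

# Hypothetical entries; scores and dates are invented.
hazards = [
    HazardListItem("H-001", "X-ray examination", "X-ray linked to wrong person",
                   date(2024, 1, 15), "4C", "4B",
                   controls_barriers=["Bar-code reader for patient ID"]),
    HazardListItem("H-002", "Network", "No traffic",
                   date(2024, 1, 15), "3C", "3C"),
]
print(without_barriers(hazards))  # prints: ['H-002']
```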
Grow your own hazard list
You should start with a list of the domain-specific
generic hazards.
Add items identified from brainstorming and
experience from:
• Developers
• Maintenance personnel
• Users
Remember to include all barriers – also those that
are just planned for.
From hazard to catastrophe
To analyse a potentially dangerous event we
need to consider
• Where will it hurt us – e.g. people, assets,
environment or reputation
• How likely is it that it will hurt us – the
accident likelihood
• How can we prevent the accident – barriers
and controllability
Consequences by severity:
Severity | People | Assets | Environment | Reputation
0 | No injury or health effect | No damage | No effect | No impact
1 | Slight injury or health effect | Slight damage | Slight effect | Slight impact
2 | Minor injury or health effect | Minor damage | Minor effect | Minor impact
3 | Major injury or health effect | Moderate damage | Moderate effect | Moderate impact
4 | PTD or up to 3 fatalities | Major damage | Major effect | Major impact
5 | More than 3 fatalities | Massive damage | Massive effect | Massive impact
Likelihood:
A | Never heard of in the industry
B | Heard of in the industry
C | Has happened in the organization or more than once per year in the industry
D | Has happened at the location or more than once per year in the organization
E | Has happened more than once per year at the location
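The two scales of the matrix can be encoded as simple lookup tables. The sketch below uses only the Assets column and the likelihood classes A to E; the function `describe` and the combined "3C" style label are assumptions, and no risk level is computed because the slide does not give one.

```python
ASSET_SEVERITY = {
    0: "No damage",
    1: "Slight damage",
    2: "Minor damage",
    3: "Moderate damage",
    4: "Major damage",
    5: "Massive damage",
}

LIKELIHOOD = {
    "A": "Never heard of in the industry",
    "B": "Heard of in the industry",
    "C": "Has happened in the organization or more than once per year in the industry",
    "D": "Has happened at the location or more than once per year in the organization",
    "E": "Has happened more than once per year at the location",
}

def describe(severity: int, likelihood: str) -> str:
    """Spell out a matrix cell; raises KeyError for values outside the scales."""
    return f"{severity}{likelihood}: {ASSET_SEVERITY[severity]}; {LIKELIHOOD[likelihood]}"

print(describe(3, "C"))
```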
Controls / barriers
Controls and barriers are used actively in some
risk assessment models – e.g. ISO 26262.
All hazard list items should include all barriers
that are
• Tried – been used before
• Possible – known from literature, other
industries etc.
Controls, barriers and the FMEA results
• Prevention. Change the design or
implementation to remove or reduce the
probability of the failure’s occurrence =>
preventing a risk (potential problem) from
becoming a real problem
• Handling. Prevent the failure’s consequences
• Reduction. Reduce or control the failure’s
consequences.
Barriers
Failure propagation
From failure to failure mode
Cause – consequence diagram
[Figure: cause – consequence diagram. Events: sensor error, wrong temperature, wrong command, increase heat, too high pressure, too hot vessel. Decision points (Y / N): safety valve OK, sensor s.a. low, combustibles. Outcomes: explosion, fire. Other labels: CU, EQ, EN.]
Input Focused FMEA – IF-FMEA
Component ID:
List of input sources:
Output failure mode | Description | Component input deviation | Component malfunction
FM1 | Description of FM1 | Input deviations that can cause FM1 | Component failures that can cause FM1
FM2 | … | … | …
[Figure: Input source 1, Input source 2 and Input source 3 feed the Component, which produces the Output. Example: Temp. sensor and Pressure sensor feed the Temp. controller.]
Component: Temp. controller
Output: On / Off signal to heating unit
Input sources: Temp. sensor, Pressure sensor
Output failure mode | Description | Input deviation | Component malfunction
Turn on when should have kept off | The controller should have kept inactive (no change needed) or should have turned the heat on | Under – too low from temp and/or pressure | Omission: fail to update based on sensor signals; Commission: something not related to temp or pressure input is done instead; Wrong: controller sends wrong signal
… | … | … | …
Generic functional failure modes – should be used as guidewords:
• Over
• Under
• No
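The IF-FMEA worksheet above can be sketched as a small data structure that separates input deviations (to be traced upstream) from malfunctions of the component itself. The class `OutputFailureMode`, the function `upstream_checks`, the identifier FM1 and the wording of the description are hypothetical; the component, its input sources and the deviation and malfunction texts are taken from the temperature controller example.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class OutputFailureMode:
    name: str
    description: str
    # input source -> deviations of that input that can cause this failure mode
    input_deviations: dict[str, list[str]] = field(default_factory=dict)
    # failures inside the component itself that can cause this failure mode
    component_malfunctions: list[str] = field(default_factory=list)

def upstream_checks(fm: OutputFailureMode) -> list[tuple[str, str]]:
    """(input source, deviation) pairs to follow up in the upstream components."""
    return [(source, deviation)
            for source, deviations in fm.input_deviations.items()
            for deviation in deviations]

fm1 = OutputFailureMode(
    name="FM1",
    description="Heat turned on when it should have stayed off",  # paraphrase
    input_deviations={
        "Temp. sensor": ["Under: too low"],
        "Pressure sensor": ["Under: too low"],
    },
    component_malfunctions=[
        "Omission: fail to update based on sensor signals",
        "Wrong: controller sends wrong signal",
    ],
)

for source, deviation in upstream_checks(fm1):
    print(f"{source}: {deviation}")
```

Each pair returned by `upstream_checks` points at an output failure mode of the upstream component, which is how the analysis moves from one component to the next.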
Example – initial hospital system
[Figure: initial hospital system with X-ray machine, PC, Network and X-ray database]
FMEA with generic failure modes
Unit: PC and medic
• Wrong action – effect on the next level: Get wrong X-ray
• No action – effect on the next level: Get no info
Unit: X-ray machine and operator
• Wrong action – effect on the next level: X-ray linked to wrong person
• No action – effect on the next level: Does not work at all
Recommendations for the two units above: Bar-code reader for patient ID; Reliability requirements
Unit: Network
• Wrong action – effect on the next level: Sends info to wrong address; recommendation: Reliability requirements
• No action – effect on the next level: No traffic; recommendations: Reliability requirements; Local, temporary database
Unit: X-ray database
• Wrong action – effect on the next level: Returns wrong info; recommendation: Reliability requirements
• No action – effect on the next level: No info returned; recommendation: Mirror database
Example – final hospital system
[Figure: final hospital system with X-ray machine, PC, Network, Temp database, Controller, X-ray database, Mirror database and Bar code reader]