The FMEA in a GAMP 5 Context

The primary risk assessment tool for GAMP 5 computerised system validation is the FMEA — Failure Mode and Effects Analysis. It is a bottom-up, systematic method: you enumerate potential failure modes, assess each one across three dimensions, and calculate a Risk Priority Number (RPN) that drives your validation testing strategy.

The governing standard for the risk methodology is ICH Q9(R1) — not ISO 14971, which applies to medical devices. The QLean Risk Management Plan (RMP-SYS-001) and Risk Assessment (RA-SYS-001) are both built around the ICH Q9(R1) framework, with FMEA as the primary analysis tool.

Understanding how to score an FMEA correctly is one of the most practically valuable skills an automation engineer can develop on a pharma project. A well-executed risk assessment is the justification for every testing decision you will make. A poorly executed one produces either over-testing (expensive, no added value) or under-testing (non-compliant, potentially dangerous).

RPN Formula

RPN = Severity (S) × Occurrence (O) × Detection (D)
Each dimension is scored 1–5. Minimum RPN = 1. Maximum RPN = 125. The scoring scales must be defined and agreed by the full project team before the FMEA workshop begins.
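
As a minimal sketch, the formula and its range check can be written in a few lines of Python. The function name `rpn` is illustrative only and is not taken from any QLean template.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return the Risk Priority Number for one failure mode."""
    for name, score in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        # Each dimension must be a whole number on the agreed 1-5 scale.
        if not isinstance(score, int) or not 1 <= score <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {score!r}")
    return severity * occurrence * detection


print(rpn(1, 1, 1))  # 1, the minimum
print(rpn(5, 5, 5))  # 125, the maximum
```

Rejecting out-of-range scores at the point of calculation keeps a typing error in the workbook from silently producing a plausible-looking RPN.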

The Three Scoring Dimensions

Each failure mode is scored across three independent dimensions. Each score runs from 1 (best) to 5 (worst). The critical discipline is that all three dimensions are scored against current controls only — controls that already exist in the design at the time of scoring, not controls that are planned or aspirational.

Dimension 01: Severity

5 (Catastrophic): Direct impact on patient safety or product quality. Release of non-conforming product to patient.
4 (Critical): Regulatory non-compliance, product rejection, or significant GMP violation. No direct patient harm.
3 (Major): Significant operational or quality impact. Production interrupted. Formal investigation required.
2 (Minor): Limited impact. Workaround available. No product quality or data integrity impact.
1 (Negligible): No operational, quality, or regulatory impact. Nuisance-level issue.

Dimension 02: Occurrence

5 (Very High): ≥1 in 10 process runs. Failure has occurred frequently in comparable systems.
4 (High): ∼1 in 100 process runs. Failure observed occasionally in similar systems.
3 (Moderate): ∼1 in 1,000 process runs. Rare; possible based on system complexity.
2 (Low): ∼1 in 10,000 process runs. Very unlikely; no known occurrences in comparable systems.
1 (Very Low): ≤1 in 100,000 process runs. Remote possibility; theoretical failure mode only.

Dimension 03: Detection

5 (None): No detection mechanism exists. Failure discovered only at or after product impact.
4 (Reactive): Detection occurs AFTER failure has impacted the process. Downstream testing or audit review.
3 (Active): Detection concurrent with failure. SCADA alarm triggers when parameter goes out of range.
2 (Proactive): Detection BEFORE impact. Pre-alarm, trend monitoring, or redundant measurement.
1 (Preventive): Automatic prevention: system cannot proceed in failed state. Interlock or validated check.

Critical Rule on Detection Scoring

Detection is scored inversely to intuition — D=5 means worst detection (none), D=1 means best detection (automatic prevention). You may only credit a detection control that is explicitly designed into the system or defined in an approved procedure. A SCADA alarm that is planned but not yet designed is not a detection control at scoring time.
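
One way to make this rule hard to break is to derive the Detection score from a list of controls and ignore anything not yet in an approved design or procedure. The sketch below assumes a hypothetical `Control` record and `score_detection` helper; the control names in the usage example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    name: str
    detection_score: int       # 1 (preventive) to 4 (reactive), per the Detection scale
    in_approved_design: bool   # True only if specified in an approved design or procedure


def score_detection(controls: list[Control]) -> int:
    """Return the best (lowest) Detection score among creditable controls."""
    credited = [c.detection_score for c in controls if c.in_approved_design]
    # With no creditable control, the failure mode scores D=5 (no detection).
    return min(credited, default=5)


controls = [
    Control("Planned SCADA high-conductivity alarm", 3, in_approved_design=False),
    Control("Downstream QC sampling per approved SOP", 4, in_approved_design=True),
]
print(score_detection(controls))  # 4: the planned alarm earns no credit
```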

RPN Thresholds and What They Drive

The RPN alone means nothing without pre-defined acceptance thresholds. These thresholds must be agreed and locked in the Risk Management Plan before the FMEA workshop begins. Adjusting thresholds after scoring to reclassify a risk as acceptable is a GMP data integrity violation — and experienced auditors have seen it before.

RPN 36–125 (or S=5): HIGH
Action required: Unacceptable. Must be mitigated. Residual risk must be Medium or lower before system release. QA Manager approval required for any risk acceptance.
Testing priority: 100% coverage mandatory, with multiple test cases covering worst-case and boundary conditions. Cannot be risk-accepted without QA Manager approval.

RPN 16–35: MEDIUM
Action required: Evaluate for mitigation. Implement controls where feasible. Document rationale if accepted without mitigation. QA sign-off required on any risk-accepted items.
Testing priority: High. Test case required in OQ. Primary function + key boundary conditions.

RPN 5–15: LOW
Action required: Accept with documented justification. Monitor during operation. Design review may substitute for test execution.
Testing priority: Standard. Included in test scope; sampling approach acceptable. Vendor documentation leveraged where appropriate.

RPN 1–4: VERY LOW
Action required: Accept without further action. Record in Risk Register for completeness. No test case required.
Testing priority: Documentation only. Risk Register entry with QA countersignature sufficient. No protocol test case required.

S=5 Override Rule

Any failure mode with Severity = 5 (Catastrophic) is automatically treated as HIGH risk and requires mitigation, regardless of the calculated RPN. An S=5 failure mode with excellent detection (D=1) and very low occurrence (O=1) produces RPN=5, which would normally be LOW. The override rule blocks that classification. This rule cannot be waived. It reflects the principle that some patient safety risks cannot be accepted on a probability argument alone.
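
The thresholds and the override can be combined into one classification step. The following is a sketch under the thresholds stated in this article (36, 16, 5); the function name `classify` is hypothetical, and a real project would read the thresholds from its own approved Risk Management Plan.

```python
def classify(severity: int, occurrence: int, detection: int) -> tuple[int, str]:
    """Return (RPN, risk level) using the article's thresholds and the S=5 override."""
    rpn = severity * occurrence * detection
    # The override is checked first so that no RPN value can mask a Severity of 5.
    if severity == 5 or rpn >= 36:
        return rpn, "HIGH"
    if rpn >= 16:
        return rpn, "MEDIUM"
    if rpn >= 5:
        return rpn, "LOW"
    return rpn, "VERY LOW"


print(classify(5, 1, 1))  # (5, 'HIGH'): LOW by raw RPN, HIGH by override
print(classify(3, 5, 1))  # (15, 'LOW')
```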

How Risk Drives OQ Test Coverage

This is the mechanism that makes risk-based validation work. A HIGH function gets 100% mandatory test coverage — multiple test cases covering all boundary conditions and failure modes. A VERY LOW function requires only a Risk Register entry. The risk score is your formal justification for why one function has ten test cases and another has none. Without this documented justification, your testing strategy is arbitrary — and that is exactly what an inspector will say.
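
A simple cross-check between the risk register and the OQ protocol can flag functions whose planned testing falls short of their risk level. This sketch makes two assumptions that are not in the source: "multiple test cases" for HIGH is read as at least two, and the function names in the example are invented.

```python
# Minimum planned OQ test cases per risk level (assumed reading of the thresholds).
MIN_TEST_CASES = {"HIGH": 2, "MEDIUM": 1, "LOW": 0, "VERY LOW": 0}


def coverage_gaps(functions: dict[str, tuple[str, int]]) -> list[str]:
    """Return the names of functions with fewer planned test cases than required.

    `functions` maps a function name to (risk level, planned OQ test cases).
    """
    return [name for name, (level, planned) in functions.items()
            if planned < MIN_TEST_CASES[level]]


plan = {
    "Conductivity alarm": ("HIGH", 1),
    "Audit trail export": ("MEDIUM", 1),
    "Screen colour scheme": ("VERY LOW", 0),
}
print(coverage_gaps(plan))  # ['Conductivity alarm']
```

A count is only a floor. Whether the HIGH test cases actually cover worst-case and boundary conditions still needs review by a person.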

RPN heatmap: Severity × Occurrence, with D=3 (active detection) shown

       S=1   S=2   S=3   S=4   S=5
O=5     15    30    45    60    75
O=4     12    24    36    48    60
O=3      9    18    27    36    45
O=2      6    12    18    24    30
O=1      3     6     9    12    15

RPN = S × O × D. Change D to 1 or 5 to see the full range of impact. The same S and O scores produce very different RPNs depending on detection capability, which is why D is the most manipulated dimension in weak risk assessments.
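
To see the effect of Detection on the whole grid, the heatmap values can be regenerated for any D. This is a minimal sketch; `rpn_grid` is an illustrative name.

```python
def rpn_grid(detection: int) -> list[list[int]]:
    """Return RPN values with rows O=5 down to O=1 and columns S=1 to S=5."""
    if not 1 <= detection <= 5:
        raise ValueError("detection must be from 1 to 5")
    return [[s * o * detection for s in range(1, 6)] for o in range(5, 0, -1)]


for d in (1, 3, 5):
    print(f"D={d}")
    print("     " + "".join(f"S={s:<3}" for s in range(1, 6)))
    for o, row in zip(range(5, 0, -1), rpn_grid(d)):
        print(f"O={o}  " + "".join(f"{value:<5}" for value in row))
```

At S=4 and O=3, for example, the RPN is 12 with D=1, 36 with D=3, and 60 with D=5, so the same failure mode moves from LOW to HIGH on the Detection score alone.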

A Worked Example

Take a real failure mode from a purified water SCADA system: “Conductivity alarm CT-101 fails to trigger when conductivity exceeds 1.3 µS/cm during production.”

The effect: out-of-specification water may be distributed to manufacturing points of use without operator intervention. Scoring this gives S=5 (Catastrophic), O=3 (Moderate), and D=2 (Proactive).

Initial RPN = 5 × 3 × 2 = 30 — MEDIUM by raw RPN. However, the S=5 Override Rule applies: any Severity=5 failure mode is automatically treated as HIGH regardless of RPN. This failure mode is therefore HIGH, and mandatory OQ test coverage applies.

After mitigation (redundant sensor with cross-validation alarm, enhanced calibration frequency), the residual scores are: S=5, unchanged, because the consequence of distributing contaminated water cannot be mitigated away; O reduced to 2; and D improved to 1 (automatic divert interlock). Residual RPN = 5 × 2 × 1 = 10, which is LOW by raw RPN. The S=5 override still applies, so the item remains HIGH and stays monitored, with the residual risk formally accepted and documented by QA.

Note on Severity

Severity reflects the worst credible impact if the failure mode occurs and is undetected. You cannot reduce Severity by implementing a detection control — Severity is an inherent property of the failure mode. A team that reduces Severity scores after adding alarms is producing an invalid risk assessment.
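
The CT-101 example and the Severity rule can be checked together in code. The sketch below uses a hypothetical `Scores` record and `check_residual` helper. It rejects any residual scoring in which Severity has been lowered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def check_residual(initial: Scores, residual: Scores) -> None:
    """Raise if the residual scoring lowers Severity below its initial value."""
    if residual.severity < initial.severity:
        raise ValueError(
            f"Severity lowered from {initial.severity} to {residual.severity}; "
            "mitigation may change Occurrence and Detection only"
        )


initial = Scores(severity=5, occurrence=3, detection=2)
residual = Scores(severity=5, occurrence=2, detection=1)
check_residual(initial, residual)   # passes: Severity is unchanged
print(initial.rpn, residual.rpn)    # 30 10
```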

Five Mistakes That Invalidate a Risk Assessment

1. Scoring Detection on planned controls, not existing ones
Scores reflect what is in the design now. A planned alarm is not yet a control. Fix: only credit controls explicitly specified in the FDS or SDS at time of scoring.
2. Adjusting thresholds after scoring to reclassify risks
Moving the HIGH threshold from RPN 36 to RPN 50 after scoring to reclassify uncomfortable results is a GMP documentation integrity violation. Fix: lock thresholds in the RMP before the workshop. They require Change Control to modify.
3. Not rescoring after mitigation
Implementing a mitigation without rescoring leaves the residual risk undefined. The register shows only initial risk, not current risk. Fix: for every mitigation, restate S, O, D individually with written justification and recalculate residual RPN.
4. Scoring without multi-disciplinary consensus
Two engineers scoring the same failure mode independently will produce different RPNs. The risk assessment loses credibility. Fix: hold a structured FMEA workshop with the full project team — System Owner, Process Engineer, Automation Engineer, QA, IT/OT — and document the agreed reasoning for each score.
5. Not updating the risk assessment during the lifecycle
Completing the RA once and never revisiting it means the risk register does not reflect the current validated state. Fix: review and update the RA at every periodic review (PRR-SYS-001) and after any significant change (CCR-SYS-001).
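
Mistake 3 in particular lends itself to an automated check on the risk register. The sketch below assumes each register entry is a plain dictionary with hypothetical keys (`mitigation`, `residual`, `score`, `justification`); a real workbook will have its own layout.

```python
def rescoring_gaps(entry: dict) -> list[str]:
    """List what is missing from the residual scoring of one register entry."""
    problems: list[str] = []
    if not entry.get("mitigation"):
        return problems  # no mitigation recorded, so nothing to rescore
    residual = entry.get("residual") or {}
    for dim in ("S", "O", "D"):
        item = residual.get(dim) or {}
        if item.get("score") is None:
            problems.append(f"residual {dim} not scored")
        elif not (item.get("justification") or "").strip():
            problems.append(f"residual {dim} has no written justification")
    return problems


entry = {
    "mitigation": "Redundant sensor with cross-validation alarm",
    "residual": {
        "S": {"score": 5, "justification": "Consequence unchanged"},
        "O": {"score": 2, "justification": ""},
    },
}
print(rescoring_gaps(entry))
# ['residual O has no written justification', 'residual D not scored']
```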

Eight GMP Risk Categories for PLC/SCADA Systems

A thorough risk assessment covers all relevant failure mode categories. The RMP-SYS-001 defines eight categories that must be addressed for any GMP-relevant PLC/SCADA system.

In the QLean Framework

The QLean Risk Assessment (RA-SYS-001) is a pre-structured FMEA workbook with all eight risk categories, example failure modes for each, and scoring formulas built in. The Risk Management Plan (RMP-SYS-001) defines the scoring scales and thresholds shown in this article, ready to be agreed by the project team before the FMEA workshop. You are not starting from a blank spreadsheet.