The FMEA in a GAMP 5 Context
The primary risk assessment tool for GAMP 5 computerised system validation is the FMEA — Failure Mode and Effects Analysis. It is a bottom-up, systematic method: you enumerate potential failure modes, assess each one across three dimensions, and calculate a Risk Priority Number (RPN) that drives your validation testing strategy.
The governing standard for the risk methodology is ICH Q9(R1) — not ISO 14971, which applies to medical devices. The QLean Risk Management Plan (RMP-SYS-001) and Risk Assessment (RA-SYS-001) are both built around the ICH Q9(R1) framework, with FMEA as the primary analysis tool.
Understanding how to score an FMEA correctly is one of the most practically valuable skills an automation engineer can develop on a pharma project. A well-executed risk assessment is the justification for every testing decision you will make. A poorly executed one produces either over-testing (expensive, no added value) or under-testing (non-compliant, potentially dangerous).
RPN = Severity (S) × Occurrence (O) × Detection (D)
Each dimension is scored 1–5. Minimum RPN = 1. Maximum RPN = 125. The scoring scales must be defined and agreed by the full project team before the FMEA workshop begins.
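As a minimal sketch of the arithmetic, the formula and its 1–5 bounds can be written as a small function. The function name `rpn` and the validation behaviour are illustrative assumptions, not part of RMP-SYS-001.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return Severity x Occurrence x Detection, each scored 1-5."""
    scores = (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    )
    for name, score in scores:
        # Reject anything outside the agreed 1-5 scale (bools are not valid scores).
        if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {score!r}")
    return severity * occurrence * detection


print(rpn(1, 1, 1))  # 1   (minimum)
print(rpn(5, 5, 5))  # 125 (maximum)
```

Rejecting out-of-range scores at the point of calculation keeps a mistyped cell from silently producing an RPN that falls outside 1–125.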
The Three Scoring Dimensions
Each failure mode is scored across three independent dimensions. Each score runs from 1 (best) to 5 (worst). The critical discipline is that all three dimensions are scored against current controls only — controls that already exist in the design at the time of scoring, not controls that are planned or aspirational.
Detection is scored inversely to intuition — D=5 means worst detection (none), D=1 means best detection (automatic prevention). You may only credit a detection control that is explicitly designed into the system or defined in an approved procedure. A SCADA alarm that is planned but not yet designed is not a detection control at scoring time.
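The "current controls only" rule can be illustrated with a short filter. This is a sketch under assumptions: the `Control` class and the status labels `designed`, `approved_procedure`, and `planned` are hypothetical names chosen for the example, and the control descriptions are illustrative.

```python
from dataclasses import dataclass

# Hypothetical status labels. Only controls that already exist in the design
# or in an approved procedure may be credited when scoring Detection.
CREDITABLE_STATUSES = {"designed", "approved_procedure"}


@dataclass(frozen=True)
class Control:
    description: str
    status: str  # "designed", "approved_procedure", or "planned"


def creditable_controls(controls):
    """Return only the controls that may be credited at scoring time."""
    return [c for c in controls if c.status in CREDITABLE_STATUSES]


controls = [
    Control("Pre-alarm at 80% of the conductivity limit", "designed"),
    Control("SCADA alarm on sensor fault", "planned"),
]

for control in creditable_controls(controls):
    print(control.description)  # only the pre-alarm is printed
```

The planned SCADA alarm is dropped, so the Detection score has to be justified by the pre-alarm alone until the new alarm is actually designed in.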
RPN Thresholds and What They Drive
The RPN alone means nothing without pre-defined acceptance thresholds. These thresholds must be agreed and locked in the Risk Management Plan before the FMEA workshop begins. Adjusting thresholds after scoring to reclassify a risk as acceptable is a GMP data integrity violation — and experienced auditors have seen it before.
| RPN Range | Risk Level | Action Required | Testing Priority |
|---|---|---|---|
| 36–125 (or S=5) | HIGH | Unacceptable. Must be mitigated. Residual risk must be Medium or lower before system release. QA Manager approval required for any risk acceptance. | 100% coverage mandatory — multiple test cases, worst-case and boundary conditions. Cannot be risk-accepted without QA Manager approval. |
| 16–35 | MEDIUM | Evaluate for mitigation. Implement controls where feasible. Document rationale if accepted without mitigation. QA sign-off required on any risk-accepted items. | High — test case required in OQ. Primary function + key boundary conditions. |
| 5–15 | LOW | Accept with documented justification. Monitor during operation. Design review may substitute for test execution. | Standard — included in test scope; sampling approach acceptable. Vendor documentation leveraged where appropriate. |
| 1–4 | VERY LOW | Accept without further action. Record in Risk Register for completeness. No test case required. | Documentation only — Risk Register entry with QA countersignature sufficient. No protocol test case required. |
Any failure mode with Severity = 5 (Catastrophic) is automatically treated as HIGH risk and requires mitigation, regardless of the calculated RPN. An S=5 failure mode with excellent detection (D=1) and very low occurrence (O=1) produces RPN=5, which would normally be LOW. The override rule blocks that classification. This rule cannot be waived. It reflects the principle that some patient safety risks cannot be accepted on a probability argument alone.
This is the mechanism that makes risk-based validation work. A HIGH function gets 100% mandatory test coverage — multiple test cases covering all boundary conditions and failure modes. A VERY LOW function requires only a Risk Register entry. The risk score is your formal justification for why one function has ten test cases and another has none. Without this documented justification, your testing strategy is arbitrary — and that is exactly what an inspector will say.
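The threshold table and the S=5 override can be sketched as a single classification function. The names `classify` and `TESTING_PRIORITY` are illustrative assumptions, and the threshold values are hard-coded from the table above; on a real project they would be taken from the approved Risk Management Plan.

```python
# Testing priority per risk level, abbreviated from the threshold table.
TESTING_PRIORITY = {
    "HIGH": "100% coverage mandatory",
    "MEDIUM": "High: test case required in OQ",
    "LOW": "Standard: sampling approach acceptable",
    "VERY LOW": "Documentation only: Risk Register entry",
}


def classify(severity: int, occurrence: int, detection: int) -> str:
    """Map S, O, D scores (1-5 each) to a risk level, applying the S=5 override."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 5:
            raise ValueError("each score must be between 1 and 5")
    rpn = severity * occurrence * detection
    # Override: Severity 5 is HIGH regardless of the calculated RPN.
    if severity == 5 or rpn >= 36:
        return "HIGH"
    if rpn >= 16:
        return "MEDIUM"
    if rpn >= 5:
        return "LOW"
    return "VERY LOW"


print(classify(5, 1, 1))  # HIGH (RPN 5, but the S=5 override applies)
print(classify(4, 2, 2))  # MEDIUM (RPN 16)
print(TESTING_PRIORITY[classify(2, 2, 1)])  # Documentation only: Risk Register entry
```

Checking the override before the RPN bands means no combination of low Occurrence and good Detection can move a Severity 5 failure mode out of HIGH.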
A Worked Example
Take a real failure mode from a purified water SCADA system: “Conductivity alarm CT-101 fails to trigger when conductivity exceeds 1.3 µS/cm during production.”
The effect: out-of-specification water may be distributed to manufacturing points of use without operator intervention. Scoring this:
- Severity = 5 (Catastrophic) — non-conforming water reaching manufacturing directly impacts product quality and patient safety
- Occurrence = 3 (Moderate) — rare but possible; alarm setpoint misconfiguration or instrument calibration drift are both plausible failure causes
- Detection = 2 (Good) — a pre-alarm at 80% of the critical conductivity limit exists, flagging the issue before the full limit is breached
Initial RPN = 5 × 3 × 2 = 30 — MEDIUM by raw RPN. However, the S=5 Override Rule applies: any Severity=5 failure mode is automatically treated as HIGH regardless of RPN. This failure mode is therefore HIGH, and mandatory OQ test coverage applies.
After mitigation (redundant sensor with cross-validation alarm, enhanced calibration frequency, automatic divert interlock): residual S=5 (unchanged, because the consequence of distributing out-of-specification water cannot be mitigated away), O reduces to 2, and D improves to 1 (the automatic divert interlock). Residual RPN = 5 × 2 × 1 = 10, which is LOW by raw RPN. The S=5 override still applies, so the item continues to be treated as HIGH and monitored, with the residual risk acceptance formally documented by QA.
Severity reflects the worst credible impact if the failure mode occurs and is undetected. You cannot reduce Severity by implementing a detection control — Severity is an inherent property of the failure mode. A team that reduces Severity scores after adding alarms is producing an invalid risk assessment.
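The worked example, together with the rule that Severity must not drop after mitigation, can be checked in a few lines. This is a sketch: the `Score` class and `residual_rpn` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def residual_rpn(initial: Score, residual: Score) -> int:
    """Return the residual RPN, rejecting any reduction in Severity."""
    if residual.severity < initial.severity:
        # Mitigation may lower Occurrence or improve Detection, never Severity.
        raise ValueError("Severity must not be reduced by mitigation")
    return residual.rpn


# CT-101 conductivity alarm failure mode from the example above.
initial = Score(severity=5, occurrence=3, detection=2)
residual = Score(severity=5, occurrence=2, detection=1)

print(initial.rpn)                      # 30
print(residual_rpn(initial, residual))  # 10
```

A check like this catches the most common scoring error early: a residual row where someone has lowered Severity because an alarm was added.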
Five Mistakes That Invalidate a Risk Assessment
Eight GMP Risk Categories for PLC/SCADA Systems
A thorough risk assessment covers all relevant failure mode categories. The RMP-SYS-001 defines eight categories that must be addressed for any GMP-relevant PLC/SCADA system:
- Process Control Risks — PID loop failure, setpoint drift, incorrect control sequences, actuator failures
- Data Integrity Risks — audit trail failures, timestamp errors, data gaps, records modifiable without trail (ALCOA+)
- Access Control Risks — shared logins, privilege escalation, session timeout failures
- Alarm Management Risks — critical alarm not triggering, alarm not persisting until acknowledged, history not retained
- Communication / Interface Risks — PLC-SCADA communication loss, historian not recording, LIMS interface corruption
- Cybersecurity Risks — unsecured remote access, default credentials, OT/IT boundary failures
- Backup and Recovery Risks — backup not running, restore fails, data loss exceeds RPO
- Infrastructure Risks — server failure, OS corruption, UPS failure, network switch failure
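A simple completeness check for the eight categories listed above might look like the following sketch. The `RiskCategory` enum and `missing_categories` function are illustrative assumptions, not the structure of the RA-SYS-001 workbook.

```python
from enum import Enum


class RiskCategory(Enum):
    PROCESS_CONTROL = "Process Control"
    DATA_INTEGRITY = "Data Integrity"
    ACCESS_CONTROL = "Access Control"
    ALARM_MANAGEMENT = "Alarm Management"
    COMMUNICATION_INTERFACE = "Communication / Interface"
    CYBERSECURITY = "Cybersecurity"
    BACKUP_RECOVERY = "Backup and Recovery"
    INFRASTRUCTURE = "Infrastructure"


def missing_categories(failure_modes):
    """Return the categories that have no failure mode recorded against them.

    failure_modes is an iterable of (RiskCategory, description) pairs.
    """
    covered = {category for category, _ in failure_modes}
    return [category for category in RiskCategory if category not in covered]


register = [
    (RiskCategory.ALARM_MANAGEMENT, "Critical alarm not triggering"),
    (RiskCategory.DATA_INTEGRITY, "Audit trail failure"),
]

for category in missing_categories(register):
    print(category.value)  # the six categories not yet addressed
```

Running a check like this before the FMEA is approved gives a quick signal that a whole category has been overlooked rather than consciously assessed.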
The QLean Risk Assessment (RA-SYS-001) is a pre-structured FMEA workbook with all eight risk categories, example failure modes for each, and scoring formulas built in. The Risk Management Plan (RMP-SYS-001) defines the scoring scales and thresholds shown in this article, ready to be agreed by the project team before the FMEA workshop. You are not starting from a blank spreadsheet.