How Complex Systems Fail
Category: Library
#Systems #complexity #failure
How Complex Systems Fail, by Richard Cook, University of Chicago.
An articulation of how the nature of complex systems contributes to failure. Focused on systems such as transport, health care, and power generation, with a corresponding implicit view of how hazardous those systems are.
When reading with software engineering in mind, I substituted "bugs" for "accidents" and "quality" for "safety".
Quotes & Takeaways
- By their nature, complex systems are broken systems.
- Catastrophic failures occur because of combinations of failures. Individually, those failures are seen as normal, and the complex system has developed various mechanisms for mitigating them in normal operation.
- ∴ there is no root cause
- Hindsight bias is a significant hindrance → "hindsight bias remains the primary obstacle to accident investigation".
- Safety (quality) "is an emergent property of systems" (not components in systems).
- "More robust system performance is likely to arise in systems where operators can discern the “edge of the envelope”" — i.e. you need experience of what happens on the wrong side of the line to operate effectively on the right side of the line. Given that complex systems are ever changing (including the people and experience within them) this implies that any software engineering organizations effectiveness will be cyclical and that we should ∴ plan for that.
For software development, I found Cook's point #11 fascinating:
"Organizations are ambiguous, often intentionally, about the relationship between production targets, efficient use of resources, economy and costs of operations, and acceptable risks of low and high consequence accidents. All ambiguity is resolved by actions of practitioners at the sharp end of the system. After an accident, practitioner actions may be regarded as 'errors' or 'violations' but these evaluations are heavily biased by hindsight and ignore the other driving forces, especially production pressure."