System accident


A system accident is an "unanticipated interaction of multiple failures" in a complex system. This complexity can either be of technology or of human organizations, and is frequently both. A system accident can be easy to see in hindsight, but extremely difficult in foresight because there are simply too many action pathways to seriously consider all of them. Charles Perrow first developed these ideas in the mid-1980s. William Langewiesche in the late 1990s wrote, "the control and operation of some of the riskiest technologies require organizations so complex that serious failures are virtually guaranteed to occur."
Safety systems themselves are sometimes the added complexity which leads to this type of accident.
Once an enterprise passes a certain point in size, with many employees, specialization, backup systems, double-checking, detailed manuals, and formal communication, employees can all too easily resort to protocol, habit, and "being right." Rather like attempting to watch a complicated movie in an unfamiliar language, the narrative thread of what is going on can be lost. And since real-world accidents almost always have multiple causes, other phenomena such as groupthink can also be occurring at the same time. In particular, it is a mark of a dysfunctional organization to simply blame the last person who touched something.
In 2012 Charles Perrow wrote, "A normal accident is where everyone tries very hard to play safe, but unexpected interaction of two or more failures, causes a cascade of failures."
Charles Perrow uses the term normal accident to emphasize that, given the current level of technology, such accidents are highly likely over a number of years or decades.
James T. Reason extended this approach with human reliability and the Swiss cheese model, now widely accepted in aviation safety and healthcare.
There is an aspect of an animal devouring its own tail, in that more formality and effort to get it exactly right can actually make the situation worse. For example, the more organizational rigmarole involved in adjusting to changing conditions, the more employees will likely delay reporting such changes, "problems," and unexpected conditions.
These accidents often resemble Rube Goldberg devices in the way that small errors of judgment, flaws in technology, and insignificant damages combine to form an emergent disaster.
William Langewiesche writes about "an entire pretend reality that includes unworkable chains of command, unlearnable training programs, unreadable manuals, and the fiction of regulations, checks, and controls."
An opposing idea is that of the high reliability organization.

Scott Sagan has multiple publications discussing the reliability of complex systems, especially regarding nuclear weapons. The Limits of Safety provided an extensive review of close calls during the Cold War that could have resulted in a nuclear war by accident.

Possible system accidents

Apollo 13 space flight, 1970

Apollo 13 Review Board:

Three Mile Island accident, 1979

Charles Perrow:
"It resembled other accidents in nuclear plants and in other high risk, complex and highly interdependent operator-machine systems; none of the accidents were caused by management or operator ineptness or by poor government regulation, though these characteristics existed and should have been expected. I maintained that the accident was normal, because in complex systems there are bound to be multiple faults that cannot be avoided by planning and that operators cannot immediately comprehend."

ValuJet (AirTran) 592, Everglades, 1996

William Langewiesche:
He points out that in "the huge MD-80 maintenance manual . . . By diligently pursuing his options, the mechanic could have found his way to a different part of the manual and learned that . . . must be disposed of in accordance with local regulatory compliances and using authorized procedures."
Brian Stimpson:
Step 2. The unmarked cardboard boxes, stored for weeks on a parts rack, were taken over to SabreTech's shipping and receiving department and left on the floor in an area assigned to ValuJet property.
Step 3. Continental Airlines, a potential SabreTech customer, was planning an inspection of the facility, so a SabreTech shipping clerk was instructed to clean up the work place. He decided to send the oxygen generators to ValuJet's headquarters in Atlanta and labelled the boxes "aircraft parts". He had shipped ValuJet material to Atlanta before without formal approval. Furthermore, he misunderstood the green tags to indicate "unserviceable" or "out of service" and jumped to the conclusion that the generators were empty.
Step 4. The shipping clerk made up a load for the forward cargo hold of the five boxes plus two large main tires and a smaller nose tire. He instructed a co-worker to prepare a shipping ticket stating "oxygen canisters - empty". The co-worker wrote, "Oxy Canisters" followed by "Empty" in quotation marks. The tires were also listed.
Step 5. A day or two later the boxes were delivered to the ValuJet ramp agent for acceptance on Flight 592. The shipping ticket listing tires and oxygen canisters should have caught his attention but didn't. The canisters were then loaded against federal regulations, as ValuJet was not registered to transport hazardous materials. It is possible that, in the ramp agent's mind, the possibility of SabreTech workers sending him hazardous cargo was inconceivable.

2008 financial institution near-meltdown

In a 2014 monograph, economist Alan Blinder stated that complicated financial instruments made it hard for potential investors to judge whether the price was reasonable. In a section entitled "Lesson # 6: Excessive complexity is not just anti-competitive, it’s dangerous," he further stated, "But the greater hazard may come from opacity. When investors don’t understand the risks that inhere in the securities they buy, big mistakes can be made--especially if rating agencies tell you they are triple-A, to wit, safe enough for grandma. When the crash comes, losses may therefore be much larger than investors dreamed imaginable. Markets may dry up as no one knows what these securities are really worth. Panic may set in. Thus complexity per se is a source of risk."

Sinking of MV Sewol

Possible future applications of concept

Fivefold increase in airplane safety since the 1980s, but flight systems sometimes switch to unexpected "modes" on their own

In an article entitled "The Human Factor", William Langewiesche discusses the 2009 crash of Air France Flight 447 over the mid-Atlantic. He points out that, since the 1980s when the transition to automated cockpit systems began, safety has improved fivefold. Langewiesche writes, "In the privacy of the cockpit and beyond public view, pilots have been relegated to mundane roles as system managers." He quotes engineer Earl Wiener, who takes the humorous statement attributed to the Duchess of Windsor that one can never be too rich or too thin, and adds "or too careful about what you put into a digital flight-guidance system." Wiener says that the effect of automation is typically to reduce the workload when it is light, but to increase it when it is heavy.
Boeing engineer Delmar Fadden said that once capacities are added to flight management systems, they become impossibly expensive to remove because of certification requirements. But if unused, they may in a sense lurk in the depths unseen.
Langewiesche cites industrial engineer Nadine Sarter who writes about "automation surprises," often related to system modes the pilot does not fully understand or that the system switches to on its own. In fact, one of the more common questions asked in cockpits today is, "What’s it doing now?" In response to this, Langewiesche again points to the fivefold increase in safety and writes, "No one can rationally advocate a return to the glamour of the past."

Healthier interplay between theory and practice in which safety rules are sometimes changed?


From the article "A New Accident Model for Engineering Safer Systems," by Nancy Leveson, in Safety Science, April 2004:
"However, instructions and written procedures are almost never followed exactly as operators strive to become more efficient and productive and to deal with time pressures. . . . . even in such highly constrained and high-risk environments as nuclear power plants, modification of instructions is repeatedly found and the violation of rules appears to be quite rational, given the actual workload and timing constraints under which the operators must do their job. In these situations, a basic conflict exists between error as seen as a deviation from the normative procedure and error as seen as a deviation from the rational and normally used effective procedure."