Safety III


Safety III: A Systems Approach to Safety and Resilience




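  1. Leveson comes from an aerospace engineering background. The Safety III she describes could be called the rigorous ideal of the aerospace and defense industry: it emphasizes building models and running simulations on a statistical, quantitative basis, and is relatively well suited to PSM and product safety.
  2. Hollnagel, and Reason as well, come from a social psychology background. The "system models (mental models)" or Safety II they describe capture how ordinary organizations and senior executives perceive safety and accidents, blind spots included. It can be called qualitative research and is relatively well suited to workplace safety management.
  3. Industry's safety level and awareness arguably remain mostly at Safety I ("believing" that all accidents are preventable, that human error accounts for 98%, behavior-based safety and pointing and calling, Heinrich...). This reminds me of something Professor 吳聰智 shared on Facebook: "If you walk too far ahead, your soldiers may mistake you for the enemy." - W. Welsby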

Does Safety-I Exist?

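Erik Hollnagel proposed the concept of "Safety-I", but his description of Safety-I is misleading: it does not match the reality of actual safety engineering practice across industries. It is a straw man fallacy. He misrepresents engineers' beliefs and practices regarding safety (claiming that engineers believe they can foresee and prevent all accidents, or that they regard humans as mere system components) to make his own position look more reasonable.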




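Pointing and calling (指差呼喚) looks idiotic and seems to reduce people to tools, but it simply works for preventing lapses of attention and for mistake-proofing.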


Differences between Workplace Safety and Product/System Safety



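Hollnagel's discussion of workplace safety topics, such as Heinrich's Triangle, Taylorism, and behaviorism, is considered irrelevant to product/system safety. The text stresses that over the past 100 years there has been little overlap between workplace safety and product/system safety, in either practice or practitioners, although the two can connect when workers are injured by engineered products used in the workplace.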

Workplace and Product/System Safety History







Without workers’ compensation laws, employees had to sue and collect damages for injuries under common law. In the United States, the employer almost could not lose because common law precedents established that an employer did not have to pay injured employees if:

1. The employee contributed at all to the cause of the accident: Contributory negligence held that if the employee was responsible or even partly responsible for an accident, the employer was not liable,

2. Another employee contributed to the accident: The fellow-servant doctrine held that an employer was not responsible if an employee’s injury resulted from the negligence of a co-worker, and

3. The employee knew of the hazards involved in the accident before the injury and still agreed to work in the conditions for pay: The assumption-of-risk doctrine held that an injured employee presumably knew of and accepted the risks associated with the job before accepting the position.


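This part discusses the historical evolution of safety engineering, emphasizing the transition from a focus on workplace safety to building safety into product design. Engineers first began to recognize the need for safe design in the industrial age, when the machines being designed were dangerous and killed people. Early safety measures included the Davy safety lamp and George Westinghouse's compressed-air brake developed for railroads, which highlight the shift toward considering safety alongside functionality in design [Roberts, 1984].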

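As engineered equipment came into wider use in the workplace, safety engineering and workplace safety were closely related. Organizations such as factory accident prevention associations played a key role in studying accidents and promoting safety improvements. By the end of the 19th century, engineers had begun to recognize the importance of building safety into the design from the start, rather than merely adding guards afterward [Roberts, 1984].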

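The divergence between workplace safety and product/system safety occurred around the 1930s. Heinrich's 1929 study of industrial accidents highlighted the prevalence of unsafe acts and unsafe conditions and drew attention to human factors in accidents. This shift sparked debate about the respective roles of workers and unsafe machinery in accidents, with some attributing many accidents to human error [Heinrich, 1931].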

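Despite the emphasis on human error, the importance of designing safety into machinery and equipment to mitigate human shortcomings was recognized early on. This proactive approach to safe design aimed to ensure that a person has to perform the safety-critical operation before carrying out the task, which reflects a fundamental principle of industrial machine design [Hansen, 1915].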




















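The primary goal is to eliminate hazards from the original design. A hazard can be eliminated either by removing the hazardous state from system operation (creating an inherently safe design) or by removing the negative consequences (losses) associated with that state: if a state cannot lead to any potential loss, it is not a hazard. Of course, philosophically speaking, almost nothing is impossible. From a practical engineering standpoint, some physical conditions or events are so remote that it is unreasonable to consider them. Hazard elimination may involve substituting a non-hazardous or less hazardous material for another, for example replacing a flammable material with a nonflammable one, or a toxic substance with a nontoxic one.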




Finally, the last resort is to design to reduce potential damage in the case of an accident, such as alarms and warning systems, contingency planning, providing escape routes (e.g., lifeboats, fire escapes, and community evacuation), or designs that limit damage (e.g., blowout panels, collapsible steering columns on cars, or shear pins on motor-driven equipment).

Human factors engineering and human-centered design: In human factors engineering, psychological concepts are applied to engineering designs to prevent human errors and provide human operators with the ability to safely control the system. More focused human-centered design concepts started to be developed in the 1980s and were first applied in the aviation community. In this approach to engineering design, the role of humans in the control of systems is the focus from the beginning of engineered system concept development.

Operations: Systems must not only be designed to be safe, but they must also be operated safely. Operational safety involves considerations for operability in the original design and for managing operations to ensure proper training of operators, identifying and handling leading and lagging indicators of risk, management of change procedures, maintenance procedures, etc. Data collection and analysis during operations has played an important role in improving design and operational safety.

Management and Policy: The design of Safety Management Systems is a relatively recent emphasis in system safety engineering, dating back to the middle of the last century.

Accident investigation and analysis: Every industry investigates accidents, but it is usually a small part of the safety engineering effort.

Regulation and Licensing: Regulation may involve rules enforced by an oversight agency, voluntary standards, or certification/licensing of new systems. Regulation usually involves some type of approval of new systems before they are allowed to be used. It also almost always includes oversight into the operation of the systems to ensure that assumptions about operation (such as maintenance assumptions) made during analysis, design, and certification of the system hold in the operational environment and that changes over time are not leading to increasing levels of risk. If such dangerous conditions are caught in time, accidents can be prevented. Examples of ways that oversight agencies collect information during operations include licensee event reports in nuclear power plants, aviation safety reporting systems, and auditing of airline and airport operations.











To understand the unique aspects of the System Safety approach and differentiate it from the other approaches to safety developed in parallel but independently for such industries as civil aviation and nuclear power, a few basic concepts can be identified:

• System Safety emphasizes building in safety, not adding it on to a completed design or trying to assure it after the design is complete.

From 70 to 90% of the design decisions that affect safety will be made in the concept development project phase. The degree to which it is economically feasible to eliminate a hazard rather than to control it depends on the stage in system development at which the hazard is identified and considered. Early integration of safety considerations into the system development process allows maximum safety with minimal negative impacts. The alternative is to design the system or product, identify the hazards, and then add on protective equipment to control the hazards when they occur, which usually is more expensive and less effective. Waiting until operations and then expecting human operators to deal with hazards (perhaps by assuming they can be flexible and adaptable, as in Safety-II) is the most dangerous approach.

• System safety deals with systems as a whole rather than with subsystems or components. Safety is a system property, not a component property.

• System safety takes a larger view of hazards than just failures. Serious accidents can occur while system components are all functioning exactly as specified—that is, without failure. In addition, the engineering approaches to preventing failures (increasing reliability) and preventing hazards (increasing safety) are different and sometimes conflict.

• System safety emphasizes analysis rather than past experience and standards.

Standards and codes of practice incorporate experience and knowledge about how to reduce hazards, usually accumulated over long periods of time and resulting from previous mistakes. While such standards and learning from experience are essential in all aspects of engineering, including safety, the pace of change today does not allow for such experience to accumulate and for proven designs to be used. System safety analysis attempts to anticipate and prevent accidents and near-misses before they occur. 

• System safety emphasizes qualitative rather than quantitative approaches. System safety places major emphasis on identifying hazards as early as possible in the design stage and then designing to eliminate or control those hazards. In these early stages, quantitative information usually does not exist. And our technology and innovations are proceeding so fast that historical information may not exist nor be useful. The accuracy of quantitative analyses is also questionable. The majority of factors in accidents cannot be evaluated in numerical terms, and those that can will often receive undue weighting in decisions based on absolute measures.

In addition, quantitative evaluations usually are based on unrealistic assumptions that are often unstated, such as that accidents are caused by failures, failures are random, testing is perfect, failures and errors are statistically independent, and the system is designed, constructed, operated, maintained, and managed according to good engineering standards. Some components of high technology systems may be new or may not have been produced and used in sufficient quantity to provide an accurate probabilistic history of failure. Surprisingly few scientific experiments (given the length of the time they have been used) have been performed to determine the accuracy of probabilistic risk assessment, but the results of the few that have been done have not been encouraging.

• System safety recognizes the importance of tradeoffs and conflicts in system design. Nothing is absolutely safe, and safety is not the only or usually even the primary goal in building systems. Most of the time, safety acts as a constraint on how the system goals (mission) may be achieved and on the possible system designs. Safety may conflict with other goals such as operational effectiveness, performance, ease of use, time, and cost. System safety techniques focus on providing information for decision making about risk management tradeoffs.

• System safety is more than just system engineering. System safety concerns extend beyond the traditional boundaries of engineering to include such things as political and social processes, the interests and attitudes of management, attitudes and motivations of designers and operators, human factors and cognitive psychology, the effects of the legal system on accident investigations and free exchange of information, certification and licensing of critical employees and systems, and public sentiment.

A Comparison of Safety-I, Safety-II and Safety-III

Definition of Safety
In MIL-STD-882 (in all its versions since 1969), safety is defined as “freedom from conditions that can cause death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.” The standard Oxford English Dictionary defines safety as “being protected from danger or harm.”

Complications arise with the introduction of the term “risk” for measuring safety. Most often, risk is defined as a combination of the severity and likelihood of an unwanted outcome. One problem is that defining something, such as risk, in terms of only one way of measuring it rules out every alternative, including a non-probabilistic measurement. It would be better to define risk as an assessment of safety and then allow different approaches to performing that assessment.
The traditional view measures hazards and risk and wants both to be “as low as reasonably practicable” (ALARP), but nobody knows how low is reasonable. That is philosophy and politics, not science.


Worse, Hollnagel's Safety II holds that the same behavior may “go wrong” or “go right”; in fact nobody knows which is which (good or bad depends on the context).

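According to Hollnagel, Safety I defines safety as “as few things as possible go wrong”, while his Safety II defines safety as “as many things as possible go right”. Safety engineering does not use these terms. To start with, I do not know what a “thing” is.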



Safety is usually associated with constraints rather than goals or requirements. However, that is not always true. If a goal of the system is to ensure safety, the requirements will also involve safety. As an example, one goal of an air traffic control system is to ensure that aircraft maintain a safe distance from other aircraft, obstacles, and dangerous weather conditions. 


Now that making “things go right” has been shown to have little to do with safety, let’s consider the things that can “go wrong.” Surely, that must have something to do with safety. However, as with the things that can go right, there are an extremely large number of things that can go wrong that we don’t care about, i.e., they are unrelated to the system requirements or constraints. Within this large set of things that can go “wrong,” there are two subsets that can be called failures and accidents. Because anything that goes wrong could be declared to be a “failure,” even if it has nothing to do with the requirements for the system, engineers define failure in terms of the system specification, i.e., a failure is the nonperformance or inability of the system or component to perform its specified function for a specified time under specified environmental conditions.

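  • What is right for an employee or supervisor is wrong in the eyes of the safety manager, e.g., getting the job done by cutting corners without an incident (or at least without one that became public)
  • What is wrong for an employee or supervisor is right in the eyes of the safety manager, e.g., a colleague working without protective equipment suffers a chemical splash injury (entirely to be expected)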


What is a System?
System: A set of things (referred to as system components) that act together as a whole to achieve some common goal, objective, or end.


Sociotechnical Systems (boundless)

All useful systems are sociotechnical. Underlying every technology is at least one basic science, although the technology may be well developed long before the science emerges. Overlying every technical or civil system is a social system that provides purpose, goals, and decision criteria.
No technical systems exist in a non-societal vacuum.

Safety-II is not a sociotechnical approach, because technology is almost completely ignored.

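Discussing safety from the perspective of sociotechnical systems is indeed more comprehensive, but it also makes it harder to focus, to persuade others (senior executives are busy and have limited attention), and to converge on where the problems actually lie.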

Decomposition and Emergence
Characteristics of Sociotechnical Systems


For this assumption to be true, the following sub-assumptions must be true:
• Each component or subsystem operates independently.
• Components act the same when examined singly as when playing their part in the whole.
• Components/events are not subject to feedback loops and nonlinear interactions.
• Interactions can be examined pairwise.
These assumptions are not usually true for complex systems today.
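
A minimal numerical sketch of why these sub-assumptions can fail, assuming two hypothetical components modeled as simple linear update rules (the function `simulate` and the gains 0.5 and 0.8 are illustrative choices, not taken from the source). Each component settles toward zero when examined on its own, yet the same components diverge once a feedback loop couples them.

```python
def simulate(self_gain, coupling, steps=20, x0=1.0, y0=1.0):
    """Iterate two hypothetical components that feed each other's input."""
    x, y = x0, y0
    for _ in range(steps):
        # Simultaneous update: each component sees its own state
        # plus the other component's previous output.
        x, y = self_gain * x + coupling * y, coupling * x + self_gain * y
    return x, y

# Examined singly (no coupling), each component decays toward zero.
alone = simulate(self_gain=0.5, coupling=0.0)

# Playing their part in the whole (feedback coupling), they diverge.
coupled = simulate(self_gain=0.5, coupling=0.8)

print("alone:  ", alone)
print("coupled:", coupled)
```

The divergence is a property of the interaction, not of either component, so it cannot be found by inspecting the components one at a time.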

Emergence (emergent behavior) is a characteristic of sociotechnical systems.


Safety Management Principle

Investigation/Reporting Databases
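My impression is that Safety II and Hollnagel are really qualitative research: they distill management principles, mindsets, and perceptions.
Safety III and Leveson, by contrast, are about modeling, which requires accumulating a large number of concrete cases and defining the expected variability. In complex systems such as aviation in particular, unforeseeable situations and "unknown unknowns" play an important role in accidents.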

Learning from Failure in Engineering

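The view that there is no root cause is actually quite shocking and frightening. The implication: do not fix the person who made the error; fix the system that led the person to err.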




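1. In addition to drawing on past experience, we can perform hazard analysis to identify many accident causes.
2. In many cases, but not all, these causes can be eliminated or controlled.
3. If they cannot be eliminated or controlled, a difficult (and usually trans-scientific) decision must be made about whether the system should be built at all, or used only in limited circumstances where the risk is acceptable to all stakeholders.
4. Usually most accidents are preventable, but not all.
Engineers are responsible for laying out the hazards (risks), but only the stakeholders can decide the fate of the system. Is the risk worth taking? Who makes that decision? (p. 65)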

The Linear Chain-of-Failure Events Model
Antecedent causes lead to consequences, combined along the way through AND and OR relationships.
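
As a hedged illustration of the AND/OR structure noted above, the following sketch evaluates a tiny fault tree. The tree, the event names, and the probabilities are hypothetical, and the arithmetic assumes every basic event is statistically independent, which is exactly the kind of assumption these notes go on to question.

```python
from math import prod

def and_gate(*probs):
    """Output event occurs only if all inputs occur (independence assumed)."""
    return prod(probs)

def or_gate(*probs):
    """Output event occurs if at least one input occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical basic-event probabilities, for illustration only.
p_valve_sticks = 1e-3
p_sensor_fails = 1e-2
p_operator_misses_alarm = 1e-1

# Loss = valve sticks AND (sensor fails OR operator misses the alarm).
p_no_detection = or_gate(p_sensor_fails, p_operator_misses_alarm)
p_loss = and_gate(p_valve_sticks, p_no_detection)

print(f"P(no detection) = {p_no_detection:.4f}")
print(f"P(loss)         = {p_loss:.6f}")
```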

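Leveson, with her science and engineering background and her insistence on precise definitions and system modeling, takes strong issue with the models (mental models) proposed by Reason and Hollnagel (in reality, the two sides are talking past each other).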

Limitations of the Linear Chain of Events Model in General
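  1. Every failure is assumed to be an independent event (no systemic factors or common-cause failures, CCF)
  2. Organizational factors (pressure from performance targets, schedule, and cost; bullying by supervisors and tyrannical management) are usually the biggest CCF
  3. Each failure often has butterfly and domino effects, where one small change ripples through the whole system (many senior executives are shameless and ignorant about this: they cannot see it, pretend not to know, and turn a blind eye)
  4. Oversimplified models that look only at cross-sectional probabilities lead people to false assumptions about reliability and safety
  5. Simple linear descriptions of cause and effect make it hard to see complex interactions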
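
A rough numerical sketch of the first two points, assuming a simplified beta-factor style model in which a fraction `beta` of each component's failure probability comes from a shared cause. The function names and the values `p = 1e-3` and `beta = 0.1` are assumptions for illustration, not figures from the source.

```python
def p_both_fail_independent(p):
    """Two redundant components, failures assumed statistically independent."""
    return p * p

def p_both_fail_common_cause(p, beta):
    """Simplified beta-factor model: a fraction beta of p is a shared cause
    that takes out both components at once; the remainder is independent."""
    independent_part = ((1.0 - beta) * p) ** 2
    common_part = beta * p
    return independent_part + common_part

p, beta = 1e-3, 0.1  # hypothetical values
naive = p_both_fail_independent(p)
with_ccf = p_both_fail_common_cause(p, beta)

print(f"independent: {naive:.2e}")
print(f"with CCF:    {with_ccf:.2e}")
print(f"ratio:       {with_ccf / naive:.0f}x")
```

Under these assumed numbers, the redundant pair is roughly two orders of magnitude more likely to fail than the independence calculation suggests, which is why a shared organizational pressure can dominate the result.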


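a. Single linear chain: one input, one output
b. Many causes, one effect: multiple inputs, one output
c. Moderation and mediation
d. Interaction: reinforcing or canceling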


Safety Management as Leveson Envisions It

Ha, it turns out that many engineers and managers are fed bananas and act the monkey, with none of the three: Accountability, Responsibilities, or Authority.

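Scaled up to the sociotechnical system of an entire nation and society, one gets the impression that Leveson also lives in her own ivory tower: her approach fits only the defense and aerospace domains she knows well.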

For example, the (impossible) expectations on page 102:
The important principle when taking a systems view of safety is that the system must be designed to allow successful resilience by human operators. There are three requirements to accomplish this goal:
  1. The humans in the system, including operators, managers, government overseers, etc. must be aware that a hazardous state has occurred. The hazardous state must be observable within the time period necessary to prevent a loss. The same is true for software or any type of automated system controller that we expect to respond to hazards. Without knowing that a hazard exists, then it is not possible to control it or to respond to minimize damage.
  2. Accurate information about the current state of the system must be available in a timely manner. The operator must have the information necessary to solve problems.
  3. The system design must allow the flexibility required to be resilient. If there are no actions that the controller (human operator or manager, software, social structure) has available to recover from a hazard, then the human can be resilient—in terms of knowing what to do—but not be able to respond in any effective way. A simple example is the “undo” button in many software applications. More specifically, if a hazard does occur, perhaps because of errors on the part of the operators themselves, either they must have a means to reverse the errors or move to a non-hazardous state. Or other parts of the system must have this capability.
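
The third requirement can be sketched in code, assuming a hypothetical `Controller` class that records each prior state so an operator can reverse an erroneous action, in the spirit of the “undo” example above. The class, the tank-level state, and the hazard threshold are all invented for illustration.

```python
class Controller:
    """Hypothetical tank-level controller with an observable hazard state and undo."""

    def __init__(self, level=50.0, hazard_level=90.0):
        self.level = level
        self.hazard_level = hazard_level
        self._history = []  # earlier levels, so actions can be reversed

    def is_hazardous(self):
        # Requirements 1 and 2: the hazardous state is observable
        # from current, accurate state information.
        return self.level >= self.hazard_level

    def add(self, amount):
        self._history.append(self.level)
        self.level += amount

    def undo(self):
        # Requirement 3: a means to reverse the last action, if there is one.
        if self._history:
            self.level = self._history.pop()

ctrl = Controller()
ctrl.add(45.0)              # operator error pushes the level past the threshold
print(ctrl.is_hazardous())
ctrl.undo()                 # the design allows a return to the previous state
print(ctrl.is_hazardous())
```

Without the `undo` method the operator could still see the hazard and know what to do, but would have no action available to recover, which is the situation the third requirement warns against.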
