The Limits to Safety? Culture, Politics, Learning and Man-Made Disasters
Nick Pidgeon
Journal of Contingencies and Crisis Management
Volume 5 Number 1 March 1997
Man-Made Disasters: Organizations as Vulnerable Socio-Technical Systems
Turner defines disaster as a significant disruption or collapse of the
existing cultural beliefs and norms about hazards, and for dealing with them
and their impacts. All organizations operate with such cultural beliefs and
norms, which might be formally laid down in rules and procedures, or more
tacitly taken for granted and embedded within working practices.
In Turner's terms, disaster is then
differentiated from an accident by the recognition (often accompanied by
considerable surprise) that there has been some critical divergence between
those assumptions and the `true' state of affairs.
MMD also highlights how system vulnerability
often arises from unintended and complex interactions between contributory preconditions,
each of which would be unlikely, singly, to defeat the established safety
systems.
This point was explored later by Perrow
(1984) in his more deterministic account of the causes of normal accidents in
technological systems.
Why do organizations turn a blind eye to latent and accumulating risks?
Four classes of information difficulties
are central to this cultural process of defective reality testing and denial.
They stem from the attempts of both individuals and organizations to deal with
problems that are, in foresight at least, highly uncertain and ill-structured.
1. Critical errors and events may initially
remain latent or be misunderstood, because of wrong assumptions about their
significance. This leads to a selective problem representation at the level of
the organization as a whole, a situation which, in turn, structures the
interpretations and decisions of the organization's individual members. Such a
representation may arise through organizational rigidity of beliefs about what is
and is not to be counted a `hazard'.
A related syndrome described in MMD is the
`decoy phenomenon'. Here personnel who are dealing directly with risk and
hazard management, or others who suspect there is something amiss, may be
distracted or misled into thinking the situation has been resolved by attention
to related (that is, decoy) events.
2. Dangerous preconditions may also go unnoticed
because of the inherent difficulties of handling information in ill-structured
and constantly changing situations, leading to a condition described by Turner
as variable disjunction of information. Here the problem may become so
complex, vague or dynamic, and the information that is available at any one time
so dispersed across many locations and parties, that different individuals and organizations
can only ever hold a partial (and often very different and changing) interpretation
of the situation.
3. Uncertainty may also arise about how to
deal with formal violations of safety regulations. Violations might occur
because regulations are ambiguous, in conflict with other goals such as the
needs of production, or thought to be outdated because of technological
advance. Alternatively, safety waivers may be in operation, allowing relaxation
of regulations under certain circumstances as occurred in the case of the Space
Shuttle Challenger O-ring seals.
4. Finally, when things do start to go
obviously wrong, the outcomes are often worse than they might have been because
those involved will tend to minimize danger as it emerges, or to deny that
danger threatens them.
'The radius of foresight is much shorter than the radius of action.'
Culture and politics shape blindness to certain forms of hazard.
Political Design for Political Problems?
Can institutional resilience be a realistic
design goal, through changes to an organization's safety culture?
A first issue to resolve
is the problem of warnings.
Few would probably disagree that foresight
is indeed limited and, as such, the identification of `signals' in advance of a
major failure is problematic. But just how
limited? For if the identification of
system vulnerability sets an impossible task, then high reliability cannot be
achieved irrespective of politics.
On a more pragmatic level, one needs to know
whether differences in safety performance observed across contexts and in foresight
are more than mere error variance. Most of the time, as Sagan's (1993) account
only too readily
illustrates, it is a matter of judgement as
to whether the current safety glass is half-empty or half-full. Certainly,
careful observation and measurement of theoretically relevant events (unsafe
acts, known barriers to communication, diffusion and fragmentation of
responsibilities, financial constraints) is one route to follow, and with some
success (Wagenaar et al., 1994), although it remains to be seen precisely which empirical
questions will differentiate vulnerable from resilient systems.
High reliability organizations (HROs)
Kathleen M. Sutcliffe
Best Practice & Research Clinical
Anaesthesiology 25 (2011) 133–144
How to define what counts as an HRO
One can identify this subset by answering
the question, ‘how many times could this organisation have failed resulting in catastrophic
consequences that it did not?’ If the answer is on the order of tens of
thousands of times, the organisation is ‘high reliability’.
Heh. From a practical standpoint, accidents are no surprise; the real puzzle is why so many shoddy operations never have one.
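To get a feel for the arithmetic behind that definition, here is a minimal sketch (my own illustration, not from Sutcliffe's paper), assuming each hazardous operation fails independently with probability p:

```python
# Illustrative only: probability that an organisation survives N hazardous
# operations without catastrophe, assuming an independent per-operation
# failure probability p (a strong simplification of real operations).

def survival_probability(p: float, n: int) -> float:
    """P(zero catastrophes in n independent operations)."""
    return (1.0 - p) ** n

N = 50_000  # "tens of thousands" of opportunities to fail
for p in (1e-3, 1e-4, 1e-5, 1e-6):
    print(f"p = {p:.0e}: P(no catastrophe in {N} ops) = {survival_probability(p, N):.3f}")

# With p = 1e-3 survival over this horizon is essentially impossible;
# 'high reliability' requires per-operation failure probabilities on the
# order of 1e-5 or better.
```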
there are no safe organisations because
past performance cannot determine the future safety of any organisation.
They deserve a more accurate name:
Reliability-seeking organisations are not distinguished
by their absolute errors or accident rate, but rather by their “effective
management of innately risky technologies through organisational control of
both hazard and probability”.
Competing approaches to achieving reliability
Prevention
Prevention or anticipation requires that organisational
members try to anticipate and identify the events and occurrences that must not
happen, identify all possible causal precursor events or conditions that may
lead to them and then create a set of procedures for avoiding them.
Studies show how HROs are obsessed with
detailed operating procedures, contingency plans, rules, protocols and
guidelines as well as using the tools of science and technology to better
control the behaviour of organisational members to avoid errors and mistakes.
Nevertheless, research also shows that
adherence to rules and procedures alone will not prevent incidents. There are limits to the logic of prevention.
One limitation is that unvarying procedures
cannot handle what they do not anticipate. Moreover, even if procedures could
be written for every situation, there are costs of added complexity that come
with too many rules.
Resilience
HROs are unique in that they understand
that reliability is not the outcome of organizational invariance, but rather,
results from a continuous management of fluctuations in job performance and
human interactions. To be able to become alert and aware of these inevitable
fluctuations, to cope with, circumscribe or contain untoward events, such as
mistakes or errors, ‘as they occur’ and before their effects escalate and
ramify, HROs also build capabilities for resilience.
Resilience involves three abilities:
(1) the ability to absorb strain and preserve
functioning in spite of the presence of adversity (e.g., rapid change,
ineffective leadership, performance and production pressures, increasing
demands from stakeholders);
(2) an ability to recover or bounce back
from untoward events, as the team, unit or system becomes better able to absorb a
surprise and stretch rather than collapse; and (3) an ability to learn and grow
from previous episodes of resilient action.
Characteristics of HROs
Mindful organising (situation awareness) forms
a basis for individuals to interact continuously as they develop, refine and
update a shared understanding of the situation they face and their capabilities
to act on that understanding. Mindful organizing proactively triggers actions
that forestall and limit errors and crises.
First, HROs build a group and organisational
culture where it is the norm for people to interact respectfully. Second, they
foster a culture where people interrelate heedfully, so that they become more
consciously aware of how their work fits with the work of others and the goals
of the system. Third, HROs establish a set of practices that enable them to
track small failures, resist oversimplification of what they face, remain
sensitive to current operations, maintain capabilities for resilience and take
advantage of shifting locations of expertise.
Applying HRO and resilience engineering to construction: Barriers and opportunities
Eleanor J. Harvey, Patrick Waterson, Andrew R.J. Dainty
Safety Science 2016
The evolution of safety thinking
Normal Accident Theory (NAT) originated from the Three Mile Island nuclear accident.
HRO theory arose in opposition to NAT, built on five years of field observation and interviews on aircraft carrier operations.
High reliability organizations are
characterized by their capacity to respond, learn and feed back quickly through
accurate communications, and by their flexibility to improvise by recombining
resources, skills and experience.
Resilience engineering (RE), by contrast, blends many earlier ideas: human factors, organizational culture and system safety.
The characteristics of a resilient organisation
are less well defined than those of high reliability organisations, but the RE community
believes any organisation can become resilient, with different industries managing
stability and flexibility in different ways.
All of these theories are conceptual; it is hard to define which organizations are HROs or RE organizations and which are not, let alone to compare them empirically and establish an effect size.
Earlier, traditional views
Before the age of systems safety, accidents
were believed to have a root cause - a technical malfunction or individual failure
on which events could be blamed.
Blaming the person who erred and the underperforming equipment has its advantages: this simplistic model is emotionally satisfying and has legal and
financial benefits.
The prominence of the ‘Zero Accidents’
discourse also confirms this model.
In HRO theory, accidents are described in causal
terms, as the result of an unfortunate combination of a number of errors;
hence, detecting failures as they develop through sensitivity to weak signals
is advocated. (But note: the weaker the signal, the higher the risk of both Type I and Type II errors in detecting it.)
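That parenthetical can be made concrete with a toy equal-variance signal-detection model (my own sketch, not from the paper): noise readings come from N(0, 1), genuine precursors shift the mean by d', and a fixed alarm threshold trades Type I against Type II errors.

```python
# Toy equal-variance signal-detection model (illustration only): noise is
# N(0, 1); a genuine precursor shifts the mean to d' (signal strength).
# A fixed alarm threshold c then trades misses against false alarms.
from statistics import NormalDist

noise = NormalDist(0.0, 1.0)

def error_rates(d_prime: float, c: float) -> tuple[float, float]:
    """Return (false_alarm_rate, miss_rate) for threshold c."""
    signal = NormalDist(d_prime, 1.0)
    false_alarm = 1.0 - noise.cdf(c)   # Type I: alarm on pure noise
    miss = signal.cdf(c)               # Type II: genuine precursor ignored
    return false_alarm, miss

c = 1.0  # a fixed, seemingly reasonable alarm threshold
for d_prime in (3.0, 2.0, 1.0, 0.5):   # progressively weaker signals
    fa, miss = error_rates(d_prime, c)
    print(f"d' = {d_prime}: false alarms = {fa:.2%}, misses = {miss:.2%}")

# As d' shrinks (weaker signals), misses explode while the false-alarm rate
# stays constant; lowering c to catch more precursors raises false alarms
# instead. With weak signals, the two error types cannot both be kept low.
```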
Based on this interpretation, risk analysis
depends upon the systematic identification of causal chains and implies safety
is a static commodity that can be quantified, not a dynamic process.
For RE, safety is a dynamic process, human
behaviour cannot be categorised in a bimodal way, and the causes of accidents
are far more subtle and complex: often nothing worth reporting happens before an accident. Instead,
accidents are caused by an undetectable "drift into failure", which is a
natural part of operations in resource-constrained environments.
The efficiency-thoroughness trade-off
(ETTO) principle (the tendency to sacrifice thoroughness for efficiency) is key
to understanding the 'drift' that means failure can develop out of normal
behaviour. Humans have a natural tendency towards efficiency (Hollnagel, 2009).
Rational decision-making is also limited by context, subject to social and
cultural factors (Perrow, 1984), and constrained by finite cognitive resources,
so people "muddle through", making what they perceive to be "sensible
adjustments" to cope with current and future situational demand.
Heh: RE's view turns out to be remarkably similar to NAT's.
Drift, adaptation, resilience and reliability: Toward an empirical clarification
Kenneth A. Pettersen, Paul R. Schulman
Safety Science 2016
The key question of this paper:
How can we differentiate
adaptation and resilience from an organizational drift which undermines
reliability and safety?
Resilience, too, comes in several varieties.
Resilience has alternately been conceived as:
"rebound" from failures;
"robustness" (absorbing shocks without major failures);
"graceful extensibility" (extending boundaries or "stretching" organizational capacity to reduce brittleness and cope with surprises); and
sustained adaptability.
Anticipation and preparation before the event:
Precursor resilience, which is about monitoring and keeping operations within a bandwidth
of conditions, and acting quickly to restore these conditions as a way
of managing risk.
Recovery capacity and speed after an incident:
Restoration resilience, which consists of rapid actions to resume operations after
temporary disruption.
The system's longer-term adaptive capacity:
Recovery resilience, which is about putting damaged systems back together to establish
a "new normal" at least as reliable and robust as before, if not improved.
Heh: in many organizations decaying steadily under the second law of thermodynamics, the "adaptations" they make actually mean all three kinds of resilience above keep deteriorating (just as with ageing, cognition and memory, bone density, muscular endurance and reaction speed only decline).
“adaptations” actually become a negative drift
in relation to the pursuit of larger reliability and safety goals in these
organizations.
What are these so-called "adaptations"?
Positively framed, they are MacGyver-style improvisation: getting the job done under limited resources and assorted constraints with whatever is at hand.
Negatively framed, they amount to a collective ostrich mentality, corner-cutting and a structure of complicity (the work gets done only as far as the resources, and everyone's attitude, will carry it).
A sharply worded passage:
No organization is exempt from drifting
into failure.
The reason is that routes to failure trace
through the structures, processes and tasks that are necessary to make an
organization successful.
Failure does not come from the occasional,
abnormal dysfunction or breakdown of these structures, processes and tasks, but
is an inevitable by-product of their normal functioning.
The same characteristics that guarantee the
fulfillment of the organization’s mandate will turn out to be responsible for
undermining that mandate.[Dekker, 2011, p. xiii]
The latent conflict between the HRO and RE theories:
High Reliability Organization (raising reliability = reducing variance): education and training to improve people's awareness and competence; building SOPs and checklists; audit and error-proofing procedures.
Resilience Engineering (enlarging the organization's and the system's capacity to tolerate faults and absorb variance): for example, engineering measures such as layers of protection, redundancy and backup; on the management side, imagining extreme scenarios in advance in order to draft contingency plans and BCPs, and to prepare alternatives.
These approaches force trade-offs; they cannot both be fully satisfied:
Raising reliability = standardization and strict control (= losing the flexibility to improvise and exercise one's own judgement).
The flexibility that raises resilience, and the investment in spares and backups, can look like inefficient and unprofitable waste from some angles, especially to penny-pinching management.
High reliability organizations are
characterized by well understood and relatively stable technologies. They
feature elaborate analysis, anticipatory planning and modeling of their
technical systems (in American nuclear plants, for example, it is a violation of
federal regulations to operate them ‘‘outside of analysis”). This analysis and
anticipation is reflected in detailed procedures which govern most of the work.
Innovation is subordinated to careful system-wide analysis
(Schulman, 1993).
The techniques by which HROs achieve safety
HROs in fact significantly reduce
uncertainty by means of (at least) the following features:
Reliability and safety goals are clear and well monitored. There is a strong shared
recognition of the stakes of system failure. Further, no production or output
goal is allowed to come before safety and reliability. HROs will shut down
operations rather than operate in unsafe conditions, including uncertainty, and
there is public and political support for this priority. (If conditions are unsafe, production stops.)
There is careful management of bandwidths in organization and operation. Tasks and the operation of the organization
and its technical systems are kept well within the limits of its known reliability
envelope. There is also effective "precursor resilience" (as we will describe) to
monitor and restore operations within specified bandwidths (Roe et al., 2002). (Do not challenge or test the limits; see the sketch after this list.)
Protection of social structures (e.g. social networks are carefully managed and new
members are trained and socialized over long periods of time). (The interpersonal relationships among team and organization members.)
Skepticism concerning change. Improvement is
seen as a necessity, but change to accomplish this improvement is approached with
caution. If practical change is introduced, this will be done under a systemic
and not simply a localized perspective (Binci and Cerruti, 2012). The dominant
attitude concerning change in HROs we have observed is to always ensure that
every step toward improvement keeps the organization at least as reliable
as it currently is. HRO managers do not take risks to reduce risks. (Conservative toward change and innovation.)
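As promised above, a minimal sketch of bandwidth management and precursor resilience (my own illustration with a hypothetical coolant-temperature variable, not from the paper): keep an operating variable inside a known reliability envelope, and act to restore it as soon as it approaches the edge.

```python
# Minimal sketch of 'precursor resilience' (illustration only): monitor a
# variable against a known reliability envelope and restore early, acting
# before the hard limit rather than at it.
from dataclasses import dataclass

@dataclass
class Bandwidth:
    low: float
    high: float
    margin: float  # act this far before the hard limit

    def status(self, value: float) -> str:
        if not (self.low <= value <= self.high):
            return "OUTSIDE_ENVELOPE"   # HRO response: shut down operations
        if (value - self.low < self.margin) or (self.high - value < self.margin):
            return "RESTORE"            # precursor resilience: restore early
        return "NORMAL"

# Hypothetical coolant-temperature band, for illustration only.
band = Bandwidth(low=60.0, high=90.0, margin=5.0)
for reading in (75.0, 86.5, 91.2):
    print(f"{reading:5.1f} -> {band.status(reading)}")
# 75.0 -> NORMAL, 86.5 -> RESTORE, 91.2 -> OUTSIDE_ENVELOPE
```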
Organizational drift and risk of catastrophic failure
In apparently safe states it is difficult
to maintain a commitment to system safety over time as safety goals will be
compromised, particularly under conditions of scarce resources.
When a technology is viewed as reliable and
many successes and no failures are observed, there is increased confidence
coupled with political incentives to revise the likelihood of failure downwards
in order to justify shifting resources to other activities. With an improved safety
record and long periods of safe performance, resources gradually shift away
from safety toward support of efficiency goals. This leads to reduced safety
margins and a drive to drift away from safety concerns that may eventually lead
an organization toward increases in vulnerabilities and allow another
catastrophe.
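This feedback loop can be caricatured in a few lines (my own toy model, not from the paper): each quiet year shifts resources away from safety, the margin thins, and failure probability climbs until an accident triggers reinvestment.

```python
# Toy model of organizational drift (illustration only): success breeds
# confidence, resources migrate from safety to efficiency, the safety
# margin erodes, and the probability of failure climbs back up.
import random

random.seed(1)
margin = 1.0  # abstract safety margin in [0, 1]
for year in range(1, 41):
    p_fail = 0.2 * (1.0 - margin)        # thinner margin -> likelier failure
    if random.random() < p_fail:
        print(f"year {year:2d}: FAILURE (margin had drifted to {margin:.2f})")
        margin = 1.0                     # post-accident reinvestment in safety
    else:
        margin = max(0.0, margin - 0.05) # quiet success: budget drifts away
```

The intended pattern is long quiet stretches punctuated by an accident, reinvestment, then renewed drift.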
This is the "Cycle of Failure".
Questions to ask when assessing drift
1. Drift with respect to what?
Is drift connected to a shift in goals,
values, psychology, protocols or practices? (E.g. goal displacement from safety
to efficiency, or a cognitive change from formal to schema-based decision-making.)
2. Drift with respect to whom?
Who is making behavioral changes? Who is
going to be making representational errors because of them: operators,
directors, regulators, stakeholders, and/or the public?
Can Safety Change Management counter organizational drift and second-law-of-thermodynamics decay?
Safety change management – A new method for integrated management of organizational and technical changes
Marko Gerbec
Safety Science xxx (2016)
Which changes count as critical?
a. The technical/technological and
organizational changes are interconnected in an organization, so changes should
be managed in an integrated way.
b. The complexity and propagation of the
impacts likely spans over more than one organizational level, so the changes shall
be managed considering implications on all relevant levels.
c. The ‘‘pure” technical/technological
impact(s), as well as organizational issues impacted at various management levels,
shall be clearly identified, categorized and subject to careful safety
evaluation, planning and documentation.
Purely technical or material changes are not the issue; what matters are changes that touch multiple levels and different units of responsibility.
The overall purpose is to prevent risk information gaps
among the stakeholders in a change, thus the proposed approach will build
on the concept of situational awareness / common operational picture.
What each organizational level needs to see:
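To make points (a)-(c) tangible, here is a hypothetical sketch of a change record that forces every affected organizational level to assess and sign off on a change; all names and fields are my own invention, not Gerbec's actual method.

```python
# Hypothetical change record (illustration only): safety impacts are
# identified per organizational level, and no change is considered fully
# evaluated until every affected level has signed off, preventing
# risk-information gaps between stakeholders.
from dataclasses import dataclass, field
from enum import Enum

class Level(Enum):
    PLANT_FLOOR = "plant floor"
    MIDDLE_MANAGEMENT = "middle management"
    TOP_MANAGEMENT = "top management"

@dataclass
class ChangeRequest:
    title: str
    technical_impacts: list[str]
    organizational_impacts: dict[Level, str]
    approvals: dict[Level, bool] = field(default_factory=dict)

    def safety_evaluation_complete(self) -> bool:
        # Every level with an identified impact must explicitly approve.
        return all(self.approvals.get(lvl, False)
                   for lvl in self.organizational_impacts)

change = ChangeRequest(
    title="Replace solvent X with cheaper solvent Y",
    technical_impacts=["new flammability class", "different storage needs"],
    organizational_impacts={
        Level.PLANT_FLOOR: "revised handling procedures and training",
        Level.MIDDLE_MANAGEMENT: "updated permits and maintenance plans",
        Level.TOP_MANAGEMENT: "insurance and regulatory notification",
    },
)
change.approvals[Level.PLANT_FLOOR] = True
print(change.safety_evaluation_complete())  # False: two levels not signed off
```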
Heh: a well-intentioned idea, but executing it is a huge and complex undertaking.
Is the safety staff really supposed to second-guess the company's product portfolio?!