Has a resilient performance approach to systems development

ESSENTIALS OF SAFETY BLOG 12/14

This is the twelfth of a dozen or so blogs covering the Essentials of Safety that I talked about in the first blog of this series. So far we have covered:

  • An introduction, which we called Essentials of Safety.
  • Understands their ‘Why’.
  • Chooses and displays their attitude.
  • Adopts a growth mindset, including a learning mindset.
  • Has a high level of understanding and curiosity about how work is actually done.
  • Understands their own and others’ expectations.
  • Understands the limitations and use of Situational Awareness.
  • Listens generously.
  • Plans work using risk intelligence.
  • Controls risk.
  • Applies a non-directive coaching style to interactions.

The other blogs in the series are:

  • Adopts an authentic leadership approach when leading others.
  • Bonus – The oscillations of safety in modern, complex workplaces.

Has a Resilient Performance Approach to Systems Development

Before I talk about how we can use the ideas of resilience to help create and monitor the systems that help drive safe work, I will recap what resilience engineering is all about.

Resilience engineering, at least in terms of how it fits in with our current conversation, has four potentials of interest to us, and these need to be encouraged, measured, and talked about. These are the potentials to Respond, Monitor, Learn, and Anticipate:

Respond: Knowing what to do when things go wrong or are about to go wrong.

Monitor: Knowing what to look for or being able to monitor things that could go wrong.

Learn: Knowing what has happened and being able to learn from the experience.

Anticipate: Knowing what to expect or being able to anticipate developments into the future.

We should think about resilience as we have our field leadership conversations and interactions. This means checking the resilience of our procedures and of the teams doing the work. We should assess whether the teams have thought about what needs to go right and what could go wrong, whether they are keeping an eye on what is going on as issues develop, and whether they have plans to bounce back from adversity into safe work before things go south. We need to identify resilient performance, celebrate it, understand it, and learn from it. In other words, we need to establish how much of Work-As-Normal represents resilient performance on a day-to-day basis.

Resilience is, in effect, our ability to cope with complexity. It is all about successfully coping with the unexpected, including, and maybe especially, in the period before the unexpected materialises as an unintended outcome or incident.

Historically, we have tended to use the ideas of resilience to do three things: prevent something from going wrong; prevent things from getting worse; and recover from stuff that has gone wrong. I suggest we focus more on using resilience to help make sure stuff goes right in the first place, rather than just on preventing things from going wrong. This aligns with my view of what safety is. As I mentioned in the introduction, ‘Safety’ is about maximising things going right rather than an absence of things going wrong. It is about people in a system, not a system driving people. It is about maintaining the balance between thinking and doing.

To this end, I have reframed the descriptions of the four potentials slightly. The intent is not to change them but to frame them more positively:

Respond: Knowing what to do when things start moving away from going right.

Monitor: Knowing what to look for or being able to monitor things that need to be in place to ensure things go right.

Learn: Knowing what has happened to make things go right and being able to learn from the experience.

Anticipate: Knowing what to expect or being able to anticipate developments into the future.

The only real way of determining whether resilience is present in a system or a workplace is to get out there and witness resilient performance on a day-to-day basis. Remember that resilient performance is always about balancing trade-offs between goals. It can be found every day, if we look for it and encourage it.

Specifically, we should look at our systems (by which I mean safety systems, procedures, leadership behaviours, individual and team thinking, and how they all interact) and explore whether we have set up a system that drives Responding, Monitoring, Learning, and Anticipating as work is being planned and undertaken. This is mainly done through field leadership conversations and whenever we review, audit, verify, or in any other way check the effectiveness, usefulness, and accuracy of our systems. Here I do not just mean the high-level corporate systems but also the procedures and work instructions that are used each and every day in the workplace.

How we create and modify our safety systems over time always needs to be viewed through a resilience engineering lens. You could say that we need to attain a culture of resilience, if there is such a thing.

System Safety as an idea or concept is not new. Neither is Process Safety, nor is Safety in Design. I will not spend time here digging into what each of these looks like. I will say, however, that within these frameworks we need to build in resilience early, along with approaches that emphasise handling complexity and eliminating or controlling hazards, so that humans can fail safely. Rather than waiting until a failure has occurred to work out what tweaks the system needs, we should build the system to allow for that eventuality.

It is arrogant for us to believe that we can set up a system that will be completely resilient. Resilience lies in resilient performance. So we should try to set up a system that encourages people to notice weak signals and gives them the skills to adapt to them, so that work continues to go right. This is how we can build resilience into our systems.

We want to include stuff that helps us understand when things are going well, or becoming brittle – before they break.

It is also worth mentioning here that systems are not things that are ‘applied’ to people; systems include people. This tends to be forgotten when we talk about system development.

In this section, I want to describe a possible system approach and how it came to look the way it does.

When developing systems of work, or tweaking existing elements of our systems, we should consider how a number of different things and processes interact and/or act together when exposed to a number of different influences at the same time. In other words, we need to apply both systems thinking and complexity thinking, often simultaneously. Often, the elements of a system only make sense when their interrelationships are considered together.

It is useful to summarise the drivers and intents of the different parts of a safety system and how they interact. This requires a considerable amount of work but is essential, because the reasons behind each part are easily forgotten as people move on and times change.

Historical, corporate, or collective memory such as ‘I seem to recall we had an incident a number of years ago related to this stuff …’ is not sufficient to retain learning. As we see in the paper by Fanta et al., relying on humans to pass learning on to others is a faulty mechanism for retaining fear of catastrophe. Individuals tend to remember the details of incidents in their recent past, but people move on, and only a second-hand memory may remain. Instead of relying on this, we can build the information into the story behind the elements of the system. We can try to make these stories emotive, powerful, and visual to maximise learning for the current system users.

The systems need to be designed, or at least intended to be designed, to allow people to fail safely. We know that it is not possible for 100% of our brilliant people to be 100% focussed and perform 100% accurately 100% of the time. There is one guarantee in life, and especially in the workplace, and that is that people will fail. Give someone a spanner and ask them to climb onto a scaffold, and they will drop it at some point. Give someone a syringe, and there will be needlestick injuries every now and again. Look at babies learning to walk: a clear example that failure is a critical part of learning.

As we create the systems that we intend to use, we need to fully recognise that Work-As-Done does not always equal Work-As-Written and that procedures, by their very nature, cannot handle the unanticipated things that pop up in the real world. We can also intentionally and actively integrate both HF/E (Human Factors/Ergonomics) and a detailed study of Work-As-Normal (how the real world works) into the systems.

A useful lens through which to view the world as we build systems is to think about what could or should happen: to explore the links and relationships between functions within a system and how variability in one function can affect others. The Safety Oscillation Model, described later in this series, was built after such a review.

In a nutshell, I broke a system down into a set of related functions and then attempted to map how they interacted. I then explored the sources of variability within each function and how that variability might play out in other functions within the system. In many ways, the resulting model describes how we think the system works as a completely interrelated whole rather than as a whole lot of separate bits. This played a key role in developing the detail of the Safety Oscillation Model.

Key Takeaway: Spend time on knowing what to do when things start moving away from going right; knowing what to look for, or being able to monitor the things that need to be in place to ensure things go right; knowing what has happened and being able to learn from the experience; and knowing what to expect, or being able to anticipate developments into the future. Building that knowledge into your systems, leadership routines and conversations will go a long way towards ensuring we get it right, each and every time.