This article provides a brief overview of how leading organizations use risk management business processes and assurance processes, such as audits and assessments, to identify their biggest risks and associated critical controls and to confirm the efficacy of those controls.
This article focuses on what works well in helping an organization become operationally excellent in terms of:
- management system frameworks
- rules of risk management
- identification and auditing of critical controls
Let’s start with a few facts
It remains shocking that:
- 60% of all operational losses result from preventable causes
- 80% of incidents are repeat incidents
- up to 30% of an organization’s costs are wasted fixing the same issues
Clearly, operationally excellent companies have a significant advantage over those that experience these types of issues. Think how much more competitive organizations could be if they could redirect these resources to things like new equipment, R&D and marketing.
So what makes an organization great? Great organizations:
- learn from their mistakes and the mistakes of their peers
- conduct quality audits and incident investigations
- share their incident data to help prevent repeat incidents
- maintain risk management programs and data analytics
- seek to understand their business processes, risks and controls
- don’t hide bad news or risks from their management, shareholders or the public
- proactively look for risk
- invest in the controls to manage those risks
- maximize their opportunities
Organizations with robust ISO 9001, ISO 14001, OHSAS 18001 and lean manufacturing processes are the most successful. Why is that? Because they:
- go beyond simple regulatory compliance
- empower their people and recognize the importance of safety in a customer-first culture
- understand their risks and opportunities
- understand their business and operations processes
- have the right metrics to drive and reward the right behaviours
- have simple, up-to-date procedures that they actually follow
- have competent staff, contractors and suppliers
- have a good management system framework, and
- audit a lot
And what constitutes a good management system framework?
Most organizations today are moving towards an integrated management system rather than separate systems for health, safety and environment, which certainly makes more sense. Most management systems are built on the “plan, do, check, act” (PDCA) continual improvement cycle popularized by W. Edwards Deming from the 1950s onward. Different organizations divide the PDCA cycle into different numbers of elements, ranging from 10 to 100 or sometimes even more. Below is an example of one such PDCA model with 18 elements.
This article focuses on the elements highlighted in the picture:
- Legal requirements and commitments
- Risk identification, assessment, and management
Let’s talk about legal requirements and commitments first
The element dealing with legal and other requirements calls for a business process to identify legal requirements and other commitments. Other commitments include promises an organization makes in permit and license applications, and in agreements with a regulator, a First Nations group or a host of other third parties. The element also requires the organization to have evidence that it keeps this information up to date and that it is operating in compliance. These commitments are usually captured in a legal register, now often called a compliance obligations register.
There are lots of good examples of effectively designed legal registers, but at a minimum, they should document:
- What is the requirement?
- How, why and where is the requirement applicable to the organization’s operations?
- Who is responsible for demonstrating compliance?
- What is the evidence that the organization is operating in compliance?
Evidence is usually in the form of records of waste generation and shipment, air or water monitoring, training, inspections, etc. and, best of all, audits.
To learn what a good legal register/compliance obligation looks like, read this article.
(Nimonik can show examples of, and help you develop, an efficient legal register/compliance obligation. Nimonik can even provide audit protocols for situations where you do not have good existing evidence of compliance or want the added level of assurance that an audit provides.)
Unfortunately, many companies do not have a legal register in place. They claim that because they have never been charged, they must be in compliance. That’s not the way it works in court. To protect itself, an organization needs evidence of control, such as:
- informing workers of the hazards
- inspection records
- maintenance records
A good legal register/compliance obligation provides a framework to capture that important data and helps an organization know exactly what it must comply with, who is responsible for doing so, what controls are in place and what evidence supports those claims.
Not maintaining a legal register can be very costly. Think of BP and the billions of dollars it has spent on non-compliance after spilling millions of barrels of oil into the Gulf of Mexico. Or Suncor, which lost hundreds of millions of dollars a few years ago when it shut down production because it had failed to install a piece of pollution prevention equipment that it had promised in a permit application.
One should consider the potential consequences of non-compliance as an input to the legal register and the audit planning process. Activities with the highest potential consequences of non-compliance should be audited first.
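As an illustration only, the minimum legal register fields and the "audit the highest consequences first" rule can be sketched in a few lines of Python. The `Obligation` class, its field names and the 1 to 5 consequence scale are hypothetical assumptions, not a prescribed register format:

```python
from dataclasses import dataclass, field


@dataclass
class Obligation:
    """One legal register entry with the minimum fields listed above."""
    requirement: str    # what is the requirement?
    applicability: str  # how, why and where it applies to operations
    responsible: str    # who is responsible for demonstrating compliance
    evidence: list[str] = field(default_factory=list)  # records, inspections, audits
    consequence: int = 1  # assumed scale: 1 (minor) to 5 (severe) if non-compliant


def audit_priority(register: list[Obligation]) -> list[Obligation]:
    """Highest-consequence obligations first; ties put thinner evidence first."""
    return sorted(register, key=lambda o: (-o.consequence, len(o.evidence)))


def missing_evidence(register: list[Obligation]) -> list[str]:
    """Requirements for which no evidence of compliance has been recorded."""
    return [o.requirement for o in register if not o.evidence]


# Hypothetical entries for illustration only
register = [
    Obligation("Hazardous waste shipment manifests", "All sites shipping waste",
               "EHS coordinator", ["manifest records"], consequence=3),
    Obligation("Install vapour recovery unit (permit commitment)", "Terminal A",
               "Operations manager", consequence=5),
    Obligation("Stack emission monitoring", "Plant B", "Environmental engineer",
               ["monitoring logs", "last audit report"], consequence=4),
]

for o in audit_priority(register):
    print(o.consequence, o.requirement)
print("No evidence:", missing_evidence(register))
```

Breaking ties on the amount of recorded evidence is a design choice made here, on the reasoning that an obligation with little evidence is the one most in need of an audit.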
Risk identification, assessment, and management
The most critical element in the management system framework addresses the risk identification and management business process. This element is one of the foundations of an operationally excellent management system. If an organization has good risk and control data, it will have better information to help allocate its scarce resources against the right opportunities and highest risks.
The image below illustrates the framework developed by ISO, which calls for an organization to:
- establish a risk policy, standard and supporting procedures
- define appropriate roles and responsibilities
- provide a training and monitoring program to support the business process
- apply methodologies to analyze and assess the risks, and then treat them, usually by putting in place controls of a nature and depth sufficient to reduce each risk to a level the company can live with
An organization should use ISO standards wherever possible. They contain the best thinking from around the world, and an organization can always go beyond their requirements where it makes business sense.
The concept of risk includes the following five components:
1) a hazard inherent in an activity otherwise deemed beneficial
2) an undesirable event that brings out the hazard
3) the adverse consequences of the undesirable event
4) uncertainty about whether the undesirable event will happen
5) the perception of the combination of the above
One of the important concepts here is inherent risk: the risk with no controls present. After control treatments are applied, what remains is residual risk. A more visual way to look at risk is to treat the risk controls, or treatment measures, as layers of protection. Let’s follow the image below using the example of driving a car.
What is the hazard inherent in driving? Momentum. What is the potential undesirable event? A collision, of varying severity. What are the protective layers? Driver training, obeying the speed limit, rules on distracted driving, a drug and alcohol policy, a valid license, etc.: a whole series of preventive layers. What are the mitigating controls? Seatbelts, airbags, safety framing and an inherently safer vehicle design. What is the recovery control? Insurance.
To help determine inherent risk, and residual risk after control treatments, most organizations use a risk matrix. The image below is an example.
In this particular risk matrix, the risk receptors are considered in four groups that are highlighted in green:
- Health & Safety
- Environment
- Financial
- Reputation
One should then pick the risk receptor with the highest risk rating.
To illustrate, let’s pick an activity whose undesirable events are:
- likely to happen, and
- capable of the nastiest consequences:
operating a pipeline!
One of the worst potential risks is a large spill into a sensitive ecosystem, such as a lake, river or ocean. The main receptors in this case would be Environmental, Financial and Reputation. The inherent risk without controls is high in terms of both likelihood and consequence (billions of dollars in cleanup costs, fines, and loss of value and reputation), which places it in the upper right (red) portion of the matrix as an unacceptable level one risk. The intent is then to examine the controls in place and determine whether their nature and depth are sufficient to reduce the likelihood or severity of the event enough to move it out of the unacceptable upper right portion of the matrix, down to a more acceptable level two or three risk.
Controls to reduce the risk can include the design of the pipeline, the route selected, the quality and thickness of the steel, the quality of the welding, pipeline integrity programs, smart pigging, regular inspections, leak detection programs, emergency response programs, etc. With such controls in place, the residual risk may be acceptable.
One should score all four receptors in most cases and select the one with the highest rating.
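To make the scoring rule concrete, here is a minimal sketch of a 5×5 risk matrix in Python that scores every receptor and keeps the worst one. The score bands, the level numbering (1 = unacceptable) and the pipeline ratings are all hypothetical assumptions; a real matrix is calibrated to each company's own consequence and likelihood definitions:

```python
def risk_level(likelihood: int, consequence: int) -> int:
    """Map 1-5 likelihood and 1-5 consequence ratings to a level (1 = unacceptable).

    The score bands below are hypothetical; real matrices are calibrated per company.
    """
    if not (1 <= likelihood <= 5 and 1 <= consequence <= 5):
        raise ValueError("ratings must be between 1 and 5")
    score = likelihood * consequence
    if score >= 15:
        return 1
    if score >= 8:
        return 2
    if score >= 4:
        return 3
    return 4


def overall_level(ratings: dict[str, tuple[int, int]]) -> tuple[str, int]:
    """Score every receptor and return the one with the highest risk (lowest level)."""
    levels = {receptor: risk_level(*lc) for receptor, lc in ratings.items()}
    worst = min(levels, key=levels.get)  # ties resolve to the first receptor listed
    return worst, levels[worst]


# Hypothetical (likelihood, consequence) ratings for a pipeline spill to a river
inherent = {"Health & Safety": (4, 3), "Environment": (4, 5),
            "Financial": (4, 5), "Reputation": (4, 4)}
# Controls such as integrity programs and leak detection assumed to cut likelihood to 2
residual = {receptor: (2, c) for receptor, (_, c) in inherent.items()}

print("Inherent:", overall_level(inherent))
print("Residual:", overall_level(residual))
```

Scoring the same receptors before and after controls gives the inherent and residual ratings side by side, which is exactly the pairing the next paragraph recommends showing to management.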
This colour-coded risk ranking map is ideal for presentation to management, as it provides an easy way to prioritize and justify investment in additional controls. It is important to show both inherent and residual risk, especially for level one risks, because if all one ever presents to management is the remediated level two or three risks, one is in fact masking the company’s biggest risks and overlooking whether its critical controls are effective.
There are lots of methods available to help identify hazards and undesirable events. These include:
- field level risk assessments
- job safety analysis
- failure mode and effects analysis (FMEA)
- process hazard analysis (PHA)
- layers of protection analysis (LOPA)
Both simple and sophisticated tools are available, and the choice should depend on the nature of the risk.
The methodology I prefer as a key piece of the audit planning process is bow-tie analysis. In the middle are the business activity and the risk event being considered. On the left-hand side are the threats (causes) and the preventive controls, and on the right-hand side are the recovery and preparedness controls and the potential consequences.
This methodology helps one visualize what controls are critical to either prevent the event, reduce its impact or recover quickly from it. And it’s a great way for risk and control owners to communicate that information as training or to their management teams.
Let’s look at an example of a completed bow-tie for a fictitious east coast marine operator, considering the potential release of a product to a waterway. In the image below you can see the list of threats and the current-state preventive and recovery controls. You can also see the consequence ratings for multiple receptors on the right.
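For readers who keep bow-ties in a database or spreadsheet rather than a drawing tool, the structure can be sketched as plain Python data classes. The class names, the `gaps()` helper and the marine transfer threats and controls below are hypothetical; they only loosely echo the fictitious example above and are not read from the image:

```python
from dataclasses import dataclass, field


@dataclass
class Threat:
    cause: str
    preventive_controls: list[str] = field(default_factory=list)  # left side of the bow-tie


@dataclass
class Consequence:
    outcome: str
    recovery_controls: list[str] = field(default_factory=list)  # right side of the bow-tie


@dataclass
class BowTie:
    activity: str
    top_event: str  # the risk event in the knot of the bow-tie
    threats: list[Threat] = field(default_factory=list)
    consequences: list[Consequence] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Threats with no preventive control and consequences with no recovery control."""
        missing = [f"threat '{t.cause}' has no preventive control"
                   for t in self.threats if not t.preventive_controls]
        missing += [f"consequence '{c.outcome}' has no recovery control"
                    for c in self.consequences if not c.recovery_controls]
        return missing


# Hypothetical bow-tie for a marine product transfer
bt = BowTie(
    activity="Vessel-to-terminal product transfer",
    top_event="Release of product to waterway",
    threats=[
        Threat("Hose failure", ["Hose inspection program", "Emergency shutdown valve"]),
        Threat("Tank overfill", ["High-level alarm", "Emergency shutdown valve"]),
        Threat("Vessel collision at berth"),
    ],
    consequences=[
        Consequence("Shoreline contamination", ["Spill response plan", "Boom deployment"]),
        Consequence("Regulatory penalties"),
    ],
)
for gap in bt.gaps():
    print(gap)
```

An empty side of the bow-tie is an easy gap to surface automatically; judging whether the listed controls are adequate in nature and depth still needs the risk owner.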
At the end of any risk assessment, what remains is a list of undesirable risks, with their consequences and likelihoods estimated through team judgment, expert opinion or detailed quantitative analysis. This list is called a risk inventory or risk registry.
There are many templates available to capture data, but at a minimum, one should capture:
- risk being addressed
- risk location
- risk owner
- inherent risk rating
- existing control treatments
- description of the control type
- residual risk rating
- determination of acceptance, or a plan of action to add more controls
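The minimum registry fields map naturally onto a record type. The Python sketch below is one possible shape, with hypothetical field names and entries; the `registry_issues` checks (missing owner, level one risk with no controls, residual rated worse than inherent, no decision) are assumptions about what a reviewer might want flagged, not a standard:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RiskEntry:
    """Minimum risk registry fields from the list above (field names are illustrative)."""
    risk: str
    location: str
    owner: Optional[str]          # None means nobody has been assigned: an "orphan"
    inherent_level: int           # 1 = highest risk, as in the risk matrix discussion
    residual_level: int
    controls: dict[str, str] = field(default_factory=dict)  # control -> control type
    decision: str = ""            # acceptance, or a plan of action to add controls


def registry_issues(registry: list[RiskEntry]) -> list[str]:
    """Flag entries that need attention before the registry goes to management."""
    issues = []
    for e in registry:
        if not e.owner:
            issues.append(f"{e.risk}: no risk owner assigned")
        if e.inherent_level == 1 and not e.controls:
            issues.append(f"{e.risk}: level one risk with no control treatments")
        if e.residual_level < e.inherent_level:
            issues.append(f"{e.risk}: residual risk rated worse than inherent risk")
        if not e.decision:
            issues.append(f"{e.risk}: no acceptance decision or action plan")
    return issues


# Hypothetical entries for illustration only
registry = [
    RiskEntry("Large spill to river", "River crossing, segment 12",
              "Pipeline integrity manager", inherent_level=1, residual_level=2,
              controls={"Smart pigging": "engineering", "Leak detection": "engineering"},
              decision="Accept; audit controls annually"),
    RiskEntry("Tank overfill", "Terminal A", None, inherent_level=2, residual_level=3,
              controls={"High-level alarm": "engineering"},
              decision="Plan: add independent shutdown"),
]
for issue in registry_issues(registry):
    print(issue)
```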
Hierarchy of hazard control
Executives should always be most interested in the adequacy and efficacy of control treatments for level one and two risks.
The most effective controls, of course, are those that eliminate the risk. The next most effective are engineering controls: for instance, adding machine guards, interlocks, barriers or safety instrumented systems. Or one might change the process, as the electronics industry did when it moved from hazardous solvents to water-based systems for cleaning circuit boards. Next in the hierarchy are administrative controls, such as operating procedures and training. The last resort, and the least effective, is personal protective equipment.
Control adequacy for level one risk
The nature and depth of the controls should be commensurate with the financial consequences. One should expect to see a lot of engineering controls and stringent administrative controls. Having only administrative controls and PPE indicates that the organization is probably already in trouble. More mature companies assign a numerical value to each category of control so that risk owners can properly justify downgrading a level one risk to a level two risk, particularly when relying on controls lower in the hierarchy.
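One way such a numerical scheme could work is sketched below in Python. The weights, the threshold of 15 and the rule that a level one risk needs at least one control from the top of the commonly cited hierarchy (elimination, substitution, engineering, administrative, PPE) are all hypothetical assumptions chosen for illustration; each company sets its own values:

```python
# Hypothetical weights: higher means more effective in the hierarchy of controls
HIERARCHY_WEIGHT = {
    "elimination": 10,
    "substitution": 8,
    "engineering": 6,
    "administrative": 3,
    "ppe": 1,
}
STRONG_TYPES = {"elimination", "substitution", "engineering"}


def control_score(control_types: list[str]) -> int:
    """Sum the hierarchy weights of the controls applied to a risk."""
    return sum(HIERARCHY_WEIGHT[t] for t in control_types)


def can_downgrade_level_one(control_types: list[str], threshold: int = 15) -> bool:
    """Allow a level one -> level two downgrade only if the controls score at least
    `threshold` AND include at least one control from the top of the hierarchy.
    Both the threshold and this rule are assumptions for illustration."""
    has_strong = any(t in STRONG_TYPES for t in control_types)
    return has_strong and control_score(control_types) >= threshold


print(can_downgrade_level_one(["administrative"] * 6 + ["ppe"]))
print(can_downgrade_level_one(["engineering", "engineering", "administrative"]))
```

The first call is refused even though its total score is high, because a pile of procedures and PPE alone should not justify downgrading a level one risk.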
Methodologies such as HAZOPs, PHAs and LOPAs give immediate value and risk reduction by identifying gaps in control adequacy. Risk registries are great for capturing “orphans”: risks or controls that nobody has been assigned to manage.
Management is paid to determine the company’s risk tolerance. Engineering a risk out completely can be quite costly, but it is important to at least identify the risk, apply reasonable controls to manage it on an ongoing basis, and know that those controls are working.
Identifying critical controls
To identify critical controls, the International Council on Mining and Metals (ICMM) suggests answering the following questions:
- is the control performance observable, manageable, and auditable?
- is the control crucial to preventing the event or minimizing the consequences of the event?
- is it the only control or is it backed by another control in the event it fails?
- would the control’s absence or failure significantly increase the risk despite the existence of other controls?
- does the control address multiple causes or mitigate multiple consequences of the hazard?
In other words, if a control appears in multiple places on a bow-tie, or in a number of bow-ties, it may be a critical control. If the answer to most of these questions is yes, the control is likely critical. In many ways, though, it’s common sense.
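The screening logic can be sketched in Python as two small checks: a "yes to most" count over the questions, and a count of how often each control appears across bow-ties. Reading "most" as a strict majority, treating "yes" on the third question as "it is the only control", and the sample control names are all assumptions for illustration:

```python
from collections import Counter

# Yes/no screening questions paraphrased from the ICMM list above.
# For question 3, "yes" is taken to mean "it is the only control (no backup)".
QUESTIONS = (
    "Is performance observable, manageable and auditable?",
    "Is it crucial to preventing the event or minimizing its consequences?",
    "Is it the only control, with no backup if it fails?",
    "Would its absence or failure significantly increase the risk?",
    "Does it address multiple causes or mitigate multiple consequences?",
)


def is_likely_critical(answers: list[bool]) -> bool:
    """'Yes to most' is read here as a strict majority of the questions."""
    if len(answers) != len(QUESTIONS):
        raise ValueError("answer every question")
    return sum(answers) > len(QUESTIONS) / 2


def repeated_controls(bow_ties: list[list[str]]) -> list[str]:
    """Controls appearing more than once, within or across bow-ties: review candidates."""
    counts = Counter(control for bow_tie in bow_ties for control in bow_tie)
    return sorted(c for c, n in counts.items() if n > 1)


# Hypothetical controls listed on three bow-ties
bow_ties = [
    ["Emergency shutdown valve", "High-level alarm", "Spill response plan"],
    ["Emergency shutdown valve", "Hose inspection program"],
    ["Spill response plan", "Boom deployment"],
]
print(repeated_controls(bow_ties))
print(is_likely_critical([True, True, False, True, False]))
```

Neither check replaces judgment; they simply produce a short list of candidate critical controls for the risk and control owners to confirm.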
Risk and control owners should know their critical controls, and it’s their responsibility to have data showing that those controls are working effectively.
It’s the auditor’s job to help control owners through this process and audit the most critical controls to ascertain that the control owners have processes in place that ensure efficacy on an ongoing basis.
The Swiss Cheese model
This model suggests that really bad incidents happen only when multiple critical controls, across multiple layers of protection, fail. The failures have to line up, usually under unusual circumstances or operating conditions, for a serious incident to occur.
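A simple way to put numbers on the model, in the spirit of layers of protection analysis, is to multiply the probability that each layer fails when called upon. The Python sketch below assumes the layers fail independently and uses hypothetical probabilities; common-cause failures, where one underlying problem opens holes in several layers at once, make the real figure higher:

```python
from math import prod


def incident_probability(layer_failure_probs: list[float]) -> float:
    """Chance that every protective layer fails on the same demand.

    Assumes the layers fail independently; common-cause failures (for example,
    one maintenance error disabling two layers) make the real figure higher.
    """
    if any(not 0.0 <= p <= 1.0 for p in layer_failure_probs):
        raise ValueError("probabilities must lie between 0 and 1")
    return prod(layer_failure_probs)


# Hypothetical probabilities that each of three layers fails when called upon
layers = [0.1, 0.1, 0.01]
print(incident_probability(layers))    # roughly 1e-4
degraded = [0.1, 0.1, 1.0]             # third layer missing or never maintained
print(incident_probability(degraded))  # roughly 1e-2
```

Losing a single layer raises the chance of the holes lining up a hundredfold in this example, which is why auditing critical controls matters even when the remaining layers look healthy.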
The BP spill provides a great example of the Swiss Cheese Model of critical control failures. In this case, eight different critical control areas failed and lined up for the incident to occur.
BP actually had VPs on board the rig that day celebrating personal safety performance. I don’t think those VPs had a risk registry, or process safety risks and controls identified in each of these categories, that might have driven them to ask the right questions. If they had, perhaps those controls would have been audited and the disaster averted.
Let’s now look at the most important inputs to an audit program:
- incident analysis
- key performance indicator analysis
- incidents, both within the company and across the industry
- failed critical controls
- major operational risks and control reviews
- OEMS (operational excellence management system) self-assessments and audits
- principal risks
- consultation with business units
- value drivers
- prior audit insights
- external risks
- risk registries that contain all the level one risks and associated controls
- legal registers/compliance obligations
Frequency of audits
Units with hazardous processes should be audited at least every three years, and lower-risk operations at least every five years.
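The two maximum intervals translate directly into a due-date check. This Python sketch uses only the standard `datetime` module; the `"hazardous"`/`"lower"` labels and the sample units are hypothetical, and the 29 February fallback is one reasonable convention, not a rule from the text:

```python
from datetime import date

# Maximum audit intervals from the text: 3 years for hazardous processes, 5 otherwise
AUDIT_INTERVAL_YEARS = {"hazardous": 3, "lower": 5}


def next_audit_due(last_audit: date, risk_class: str) -> date:
    """Latest date by which the next audit should happen."""
    target_year = last_audit.year + AUDIT_INTERVAL_YEARS[risk_class]
    try:
        return last_audit.replace(year=target_year)
    except ValueError:  # last audit on 29 February and target year is not a leap year
        return last_audit.replace(year=target_year, day=28)


def overdue(units: list[tuple[str, date, str]], today: date) -> list[str]:
    """Names of units whose next audit date has already passed."""
    return [name for name, last, cls in units if next_audit_due(last, cls) < today]


units = [  # hypothetical units
    ("Refinery crude unit", date(2020, 5, 1), "hazardous"),
    ("Distribution warehouse", date(2021, 5, 1), "lower"),
]
print(overdue(units, today=date(2024, 1, 1)))
```

These intervals are ceilings; risk-based inputs such as failed critical controls or recent incidents can justify auditing a unit sooner.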
Types of annual audits
Some organizations, for example, conduct three categories of audits annually:
- OEMS audits, where internal auditors dive deep into the weaknesses identified in the self-assessments against the operational excellence management system
- Process safety audits, covering process hazard analysis, mechanical integrity, quality assurance, inspection reporting and all the other requirements that fall under process safety
- Risk-based audits, covering environmental, safety and emerging risks, etc.
All programs in an organization have limited budgets, but it is critical to confirm that the critical controls are in place and operating effectively.
Management should ensure that their front-line operational leaders clearly identify their hazards, risks and associated controls and have assigned an owner to each risk.
The first line of defence is the front-line staff. It is their responsibility to maintain evidence that the controls are in place, are adequate for the nature and depth of the risk and are working properly.
The second line of defence is the management above the risk and control owners. It is their responsibility to validate that the front-line leaders have accepted that responsibility, and that they, as leaders, have provided the front-line leaders with the resources needed to close out incidents and audit findings and to fund any additional controls that are needed.
The third line of defence is the corporate audit teams, such as the internal health, safety, environment and quality audit teams. It is their responsibility to start with the controls for level one inherent risks and work their way down.
In an ideal world, front-line leaders would assess the risks and the efficacy of the controls under their purview, and auditors would only confirm that all is well. That’s really where every organization should be within the next five years, if not sooner!