Bang! UK Passport Office hits the kerb

The UK’s Passport Office is in difficulties. It has a backlog that is resulting in customers’ passport applications being delayed. This is not a mere internal procedural inconvenience. The public has noticed the problem and started complaining. Emergency measures are being put in place to deal with the backlog. Politicians have become involved and are looking over their shoulders at their careers.

It is a typical organisational mess. There is a problem. Resources are thrown at it. Personalities wager their reputations. Any hero able to solve the problem will be feted and rewarded. There will be blame and punishment. Solutions will involve huge cost. The costs will be passed on to the customer because, in the end, there is no one else to pay.

A suggestion for investigation

From the outside, it is impossible to know the realities of what has caused the problem at HM Passport Office. However, I think I can respectfully and tentatively suggest some questions to ask in any inquiry as to how the mess occurred.

  • Had any surprising variation in passport processing occurred before the crisis hit?
  • If so, what action, if any, was taken?
  • Why was the action ineffective?
  • If no surprising variation was observed, were the managers measuring “upstream” indicators of process performance in addition to mere volumes?
  • Was historic data routinely interrogated to find signals among the noise?
  • If signals were only observed once it was too late to protect the customer, was the issuing process only marginally capable?

“Managing the passport issuing process on historical data is like …”

… trying to drive a car by watching the line in the rear-view mirror.

Myron Tribus

And, of course, that is what HM Passport Office and every manager has to do. There is only historical data. There is no data on the future. You cannot see out of the windscreen of the organisational SUV. Management is about subjecting the historic experience base to continual, rigorous statistical criticism to separate signal from noise. It is about having a good rear view mirror.

A properly managed, capable process will operate reliably, well within customer expectations. In process management terms, the Voice of the Process will be reliably aligned with the Voice of the Customer.
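That alignment can be put in numerical terms. The sketch below uses the conventional capability index Cpk to compare the Voice of the Process against the Voice of the Customer; the specification limits and process figures are invented purely for illustration, not taken from the Passport Office.

```python
# Minimal sketch: process capability as alignment between the Voice
# of the Process (its natural variation) and the Voice of the
# Customer (specification limits). All figures are illustrative.

def cpk(mean, sigma, lower_spec, upper_spec):
    """Capability index: values above about 1.33 are commonly
    read as comfortably capable."""
    return min(upper_spec - mean, mean - lower_spec) / (3 * sigma)

# Suppose the customer expects a passport within 3 to 21 days
# (assumed specification) and the process averages 10 days with a
# standard deviation of 2 days.
print(cpk(10, 2, 3, 21))  # about 1.17: capable, but not by much
```

A Cpk only a little above 1 means the process just fits inside customer expectations, with little of the "rattle space" discussed below.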

Forever improving the capability of the process gives it the elbow room or “rattle space” within which signals can occur that the customer never perceives. Those signals could represent changes in customer behaviour, problems within the organisation, or external events that have an impact. But the fact that they are unnoticed by the customer does not mean those signals are unimportant or can be neglected. It is by taking action to investigate those signals when they are detected, and by making necessary adjustments to work processes, that a future crisis can be averted.

While the customer is unaffected, the problem can be thoroughly investigated, solutions considered calmly and alternative remedies tested. Because the problem is invisible to the outside world there will be no sense of panic, political pressure, cash-flow deficit, reputational damage or destruction of employee engagement. The matter can be addressed soundly and privately.

Continual statistical analysis is the “rear view mirror”. It gives an historical picture as to how well the Voice of the Process emulates the Voice of the Customer. Coupled with a “roadmap” of the business, some supportive data from the “speedometer” and a little basic numeracy, the “rear view mirror” enables sensible predictions to be made about the near future.

Without that historical data, properly presented on live process behaviour charts to provide running statistical insight, there is no rear view mirror. That is when the only business guidance is the Bang! when the organisation hits the kerb.
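A process behaviour chart of the simplest kind, the XmR (individuals) chart, can be sketched in a few lines. The weekly figures below are invented for illustration; the limits and the 2.66 scaling factor are the standard Shewhart construction.

```python
# Sketch of an XmR (individuals) process behaviour chart: natural
# process limits computed from historical data, used to separate
# signal from routine noise. Data are illustrative only.

def xmr_limits(data):
    """Return (lower, centre, upper) natural process limits."""
    centre = sum(data) / len(data)
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is approximately 3 / 1.128, the standard scaling for
    # two-point moving ranges
    spread = 2.66 * mr_bar
    return centre - spread, centre, centre + spread

def signals(data):
    """Indices of points falling outside the natural process limits."""
    lo, _, hi = xmr_limits(data)
    return [i for i, x in enumerate(data) if x < lo or x > hi]

# Hypothetical weekly throughput figures
weekly = [102, 98, 105, 101, 97, 103, 99, 100, 104, 70]
print(signals(weekly))  # → [9]: the final week is a signal to investigate
```

A signal flagged while the customer is still unaffected is exactly the early warning the rear view mirror is supposed to give.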

It looks like that is what happened at HM Passport Office. Everything was fine until the customers started complaining to the press. Bang! That’s how it looks to the customer and that is the only reality that counts.


The dark side of discipline

W Edwards Deming was very impressed with Japanese railways. In Out of the Crisis (1986) he wrote this.

The economy of a single plan that will work is obvious. As an example, may I cite a proposed itinerary in Japan:

          1725 h Leave Taku City.
          1923 h Arrive Hakata.
Change trains.
          1924 h Leave Hakata [for Osaka, at 210 km/hr]

Only one minute to change trains? You don’t need a whole minute. You will have 30 seconds left over. No alternate plan was necessary.

My friend Bob King … while in Japan in November 1983 received these instructions to reach by train a company that he was to visit.

          0903 h Board the train. Pay no attention to trains at 0858, 0901.
          0957 h Off.

No further instruction was needed.

Deming seemed to assume that these outcomes were delivered by a capable and, moreover, stable system. That may well have been the case in 1983. However, by 2005 matters had drifted.

The other night I watched, recorded from the BBC, the documentary Brakeless: Why Trains Crash about the Amagasaki rail crash on 25 April 2005. I fear that it is no longer available on BBC iPlayer. However, most of the documentaries in this BBC Storyville strand are independently produced and usually have some limited theatrical release or are available elsewhere. I now see that the documentary is available here on Dailymotion.

The documentary painted a system of “discipline” on the railway where drivers were held directly responsible for outcomes, above all punctuality. This was not a documentary aimed at engineers, but the first thing I found missing was any risk assessment of the way the railway was run. Perhaps it was there, but it is difficult to see what thought process would lead to a failure to mitigate the risks of production pressures.

However, beyond that, for me the documentary raised some important issues of process discipline. We must be very careful when we make anyone working within a process responsible for its outputs. That sounds a strange thing to say, but Paul Jennings at Rolls-Royce always used to remind me, “You can’t work on outcomes.”

The difficulty that the Amagasaki train drivers had was that the railway was inherently subject to sources of variation over which the drivers had no control. In the face of those sources of variation, they were pressured to maintain the discipline of a punctual timetable. The way they did that was to transgress other dimensions of process discipline, in the Amagasaki case, speed limits.

Anybody at work must diligently follow the process given to them. But if that process does not deliver the intended outcome then that is the responsibility of the manager who owns the process, not the worker. When a worker, with the best of intentions, seeks independently to modify the process, they are in a poor position, constrained as they are by their own bounded rationality. They will inevitably be trapped by System 1 thinking.

Of course, it is great when workers can get involved with the manager’s efforts to align the voice of the process with the voice of the customer. However, the experimentation stops when they start operating the process live.

Fundamentally, it is a moral certainty that purblind pursuit of a target will lead to over-adjustment by the worker, what Deming called “tampering”. That in turn leads to increased costs, aggravated risk and vitiated consumer satisfaction.

A personal brush with existential risk

I visited my GP (family physician) last week on a minor matter which I am glad to say is now cleared up totally. However, the receptionist was very eager to point out that I had not taken up my earlier invitation to a cardiovascular assessment. I suspect there was some financial incentive for the practice. I responded that I was uninterested. I knew the general lifestyle advice being handed out and how I felt about it. However, she insisted and it seemed she would never book me in for my substantive complaint unless I agreed. So I agreed.

I had my blood pressure measured (OK), and my good and bad cholesterol (both OK, which was a surprise). Finally, the nurse gave me a percentage risk of cardiovascular disease. The number wasn’t explained and I had to ask whether the figure quoted was the annual risk of contracting cardiovascular disease (that’s what I had assumed) or something else. It turned out to be the total risk over the next decade. The quoted risk was much lower than I would have guessed, so I feel emboldened in my lifestyle. The campaign’s efforts to get me to mend my ways backfired.
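The distinction matters arithmetically. Assuming a constant year-on-year hazard, a 10-year risk converts to a much smaller annual risk; the 10% figure below is purely illustrative, not the number I was actually quoted.

```python
# Sketch: converting a quoted 10-year risk into an approximate annual
# risk, assuming a constant year-on-year hazard. The 10% ten-year
# figure is illustrative only.

ten_year_risk = 0.10
annual_risk = 1 - (1 - ten_year_risk) ** (1 / 10)
print(f"{annual_risk:.3%}")  # prints "1.048%"
```

Quote 10% and a patient who assumed an annual figure hears a risk nearly ten times larger than the annualised one.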

Of course, I should not take this sort of thing at face value. The nurse was unable to provide me with any pseudo-R2 for the logistic regression or even the Hosmer–Lemeshow statistic for that matter.

I make light of the matter but logistic regression is very much in vogue at the moment. It provides some of the trickiest issues in analysing model quality and any user would be naïve to rely on it as a basis for action without understanding whether it really was explaining any variation in outcome. Issues of stability and predictability (see Rear View tab at the head of this page) get even less attention because of their difficulty. However, issues of model quality and exchangeability do not go away because they are alien to the analysis.
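For readers unfamiliar with it, the Hosmer–Lemeshow statistic works by grouping cases by predicted probability and comparing observed against expected event counts in each group. A rough sketch follows; the data, the group count, and the function name are my own illustrative choices, and a real analysis would compare the statistic to a chi-square distribution with groups − 2 degrees of freedom.

```python
# Rough sketch of the Hosmer-Lemeshow goodness-of-fit statistic for a
# logistic regression: order cases by predicted risk, split into
# groups, and compare observed with expected event counts.

def hosmer_lemeshow(probs, outcomes, groups=4):
    """Chi-square style statistic; compare to chi2 with groups-2 df."""
    pairs = sorted(zip(probs, outcomes))              # order by predicted risk
    size = len(pairs) // groups
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * size:(g + 1) * size] if g < groups - 1 else pairs[g * size:]
        observed = sum(y for _, y in chunk)           # events actually seen
        expected = sum(p for p, _ in chunk)           # events the model predicts
        n = len(chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat

# Illustrative predictions and outcomes for eight cases
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
print(round(hosmer_lemeshow(probs, outcomes), 3))  # → 1.101, a small value
```

A small statistic gives no evidence of poor calibration, but, as with any such test, it says nothing about stability or exchangeability of the underlying data.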

When governments offer statistics such as this, we risk cynicism and disengagement if we ask the public to accept them more glibly than we would ourselves.


Power cables and ejector seats – two tales of failed risk management

The last week has seen findings in two inquests in England that point, I think, to failures in engineering risk management. The first concerns the tragic death of Flight Lieutenant Sean Cunningham. Flight Lieutenant Cunningham was killed by the spontaneous and faulty operation of an ejector seat on his Hawk T1 (this report from the BBC has some useful illustrations).

One particular cause of Flight Lieutenant Cunningham’s death was the failure of the ejector seat parachute to deploy. This was because a single nut and bolt had been over-tightened. It appears from the news report that this risk of over-tightening had been known to the manufacturer for some 20 years.

Single-point failure modes such as this, where one thing going wrong can cause disaster, present particular hazards. Usual practice is to take particular care to ensure that they are designed conservatively, that integrity is robust against special causes, and that manufacture and installation are controlled and predictable. It surprises me that a manufacturer of safety equipment would permit such a hazard, where danger of death could arise from human error in over-tightening the nut or from simple mechanical problems in the nut and bolt themselves. It is again surprising that the failure mode could not have been designed out. I suspect that we have insufficient information from the BBC. It does seem that the mechanical risk was compounded by the manufacturer’s failure even to warn the RAF of the danger.

Single point failure modes need to be addressed with care, even where institutional and economic considerations obstruct redesign. It is important to realise that human error is never the root cause of any failure. Humans make errors. Systems need to be designed so that they are robust against human frailty and bounded rationality.

The second case, equally tragic, was that of Dr James Kew. Dr Kew was out running in a field when he was electrocuted by a “low hanging” 11 kV power line. When I originally read this I had thought that it was an example of a high impedance fault. Such faults happen where, for example, a power line drops into a tree. Because of the comparatively high electrical impedance of the tree there is insufficient current to activate the circuit breaker and the cable remains dangerously live. Again, there is not quite enough information to work out exactly what happened in Dr Kew’s case. However, it appears that the power cable was hanging down in some way rather than having fallen onto some other structure.
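A back-of-envelope calculation shows why a high impedance fault evades the protection. All the figures below (tree impedance, trip threshold) are assumptions for illustration, not data from the Kew inquest.

```python
# Sketch of a high impedance fault: Ohm's law shows the fault current
# can sit far below the overcurrent trip setting, leaving the
# conductor live. Figures are illustrative assumptions.

line_voltage = 11_000      # volts: nominal 11 kV distribution line
tree_impedance = 20_000    # ohms: damp timber path to earth (assumed)
trip_threshold = 400       # amps: overcurrent protection setting (assumed)

fault_current = line_voltage / tree_impedance
print(fault_current)       # 0.55 A, far below the 400 A trip threshold,
                           # so the breaker holds and the cable stays live
```

Detecting such faults needs more sensitive protection than simple overcurrent, which is part of why they are not particularly easy to eliminate.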

Again, mechanical failure of a power line that does not activate the circuit breaker is a well-anticipated failure mode. It is one that can present a serious hazard to the public but is not particularly easy to eliminate. It certainly seems here that the power company changed its procedures after Dr Kew’s death. There was more they could have done beforehand.

Both tragic deaths illustrate the importance of keeping risk assessments under review and critically re-evaluating them, even in the absence of actual failures. Engineers usually know where their arguments and rationales are thinnest. Just because we decided something was OK in the past does not mean it was sound; we may simply have been lucky. There is a particular opportunity when new people join the team. That is a great moment to challenge orthodoxy and drive risk further out of the system. I wonder whether there should not be an additional column on every FMEA headed “confidence in reasoning”.
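That suggestion could be sketched as a data structure. The field names, rating scales and the two example entries below are my own illustrative inventions, loosely echoing the two cases above; real FMEA practice varies between organisations.

```python
# Sketch of an FMEA record carrying a "confidence in reasoning"
# score alongside the usual severity, occurrence and detection
# ratings. Scales and entries are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class FmeaRow:
    failure_mode: str
    severity: int     # 1 (negligible) .. 10 (catastrophic)
    occurrence: int   # 1 (rare) .. 10 (frequent)
    detection: int    # 1 (certain to detect) .. 10 (undetectable)
    confidence: int   # 1 (guesswork) .. 5 (well-evidenced reasoning)

    @property
    def rpn(self):
        """Conventional risk priority number."""
        return self.severity * self.occurrence * self.detection

rows = [
    FmeaRow("parachute shackle over-tightened", 10, 3, 8, 2),
    FmeaRow("conductor drops without tripping breaker", 9, 2, 6, 3),
]

# Review high-RPN items first, but a low confidence score flags the
# reasoning itself for re-evaluation, even where the RPN looks tolerable.
for row in sorted(rows, key=lambda r: r.rpn, reverse=True):
    print(f"{row.failure_mode}: RPN {row.rpn}, confidence {row.confidence}")
```

The extra column changes nothing in the arithmetic; its value is in prompting the team to revisit the entries where the argument is thinnest.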