Richard Dawkins champions intelligent design (for business processes)

Richard Dawkins has recently had a couple of bad customer experiences. In each he was confronted with a system that seemed to him indifferent to his customer feedback. I sympathise with him on one matter but not the other. The two incidents do, in my mind, elucidate some important features of process discipline.

In the first, Dawkins spent a frustrating spell ordering a statement from his bank over the internet. He wanted to tell the bank about his experience and offer some suggestions for improvement, but he couldn’t find any means of channelling and communicating his feedback.

Embedding a business process in software will impose a rigid discipline on its operation. However, process discipline is not the same thing as process petrification. The design assumptions of any process include, or should include, the predicted range and variety of situations that the process is anticipated to encounter. We know that the bounded rationality of the designers will blind them to some of the situations that the process will subsequently confront in real-world operation. There is no shame in that, but the necessary adjunct is that, while the process is operated diligently as designed, data is accumulated on its performance and, in particular, on the customer’s experience. Once an economically opportune moment arrives (I have glossed over quite a bit there) the data can be reviewed, design assumptions challenged and redesign evaluated. Following redesign the process then embarks on another period of boring operation. The “boring” bit is essential to success. Perhaps I should say “mindful” rather than “boring”, though I fear that does not really work with software.

Dawkins’ bank have missed an opportunity to listen to the voice of the customer. That weakens their competitive position. Ignorance cannot promote competitiveness. Any organisation that is not continually improving every process for planning, production and service (pace W Edwards Deming) faces the inevitable fact that its competitors will ultimately make such products and services obsolete. As Dawkins himself would appreciate, survival is not compulsory.

Dawkins’ second complaint was that security guards at a UK airport would not allow him to take a small jar of honey onto his flight because of a prohibition on liquids in the passenger cabin. Dawkins felt that the security guard should have displayed “common sense” and allowed it on board contrary to the black letter of the regulations. He protests against “rule-happy officials” and “bureaucratically imposed vexation”. Here Dawkins displays another failure of trust in bureaucracy. He simply would not believe that other people had studied the matter and come to a settled conclusion to protect his safety. It can hardly have been for the airport’s convenience. Dawkins was more persuaded by something he had read on the internet. He fell into the trap of thinking that “what you see is all there is”. I fear that Dawkins betrays his affinities with the cyclist on the railway crossing.

When we give somebody a process to operate we legitimately expect them to do so diligently and with self discipline. The risk of an operator departing from, adjusting or amending a process on the basis of novel local information is that, within the scope of the resources they have for taking that decision, there is no way of reliably incorporating the totality of assumptions and data on which the process design was predicated. Even were all the data available, when Dawkins talks of “common sense” he was demanding what Daniel Kahneman called System 2 thinking. Whenever we demand System 2 thinking ex tempore we are more likely to get System 1 and it is unlikely to perform effectively. The rationality of an individual operator in that moment is almost certainly more tightly bounded than that of the process designers.

In this particular case, any susceptibility of a security guard to depart from process would be exactly the behaviour that a terrorist might seek to exploit once aware of it.

Further, departures from process will have effects on the organisational system, upstream, downstream and collateral. Those related processes themselves rely on the operator’s predictable compliance. The consequence of ill discipline can be far reaching and unanticipated.

That is not to say that the security process was beyond improvement. In an effective process-oriented organisation, operating the process would be only one part of the security guard’s job. Part of the bargain for agreeing to the boring/mindful diligent operation of the process is that part of work time is spent improving the process. That is something done offline, with colleagues, with the input of other parts of the organisation and with recognition of all the data including the voice of the customer.

Had he exercised the “common sense” Dawkins demanded, the security guard would have risked disciplinary action by his employers for serious misconduct. To some people, threats of sanctions appear at odds with engendering trust in an organisation’s process design and decision making. However, when we tell operators that something is important then fail to sanction others who ignore the process, we undermine the basis of the bond of trust with those that accepted our word and complied. Trust in the bureaucracy and sanctions for non-compliance are complementary elements of fostering process discipline. Both are essential.

Managing a railway on historical data is like …

I was recently looking on the web for any news on the Galicia rail crash. I didn’t find anything current but came across this old item from The Guardian (London). It mentioned in passing that consortia tendering for a new high speed railway in Brazil were excluded if they had been involved in the operation of a high speed line that had had an accident in the previous five years.

Well, I don’t think that there is necessarily anything wrong with that in itself. But it is important to remember that a rail accident is not necessarily a Signal (sic). Rail accidents worldwide are often a manifestation of what W Edwards Deming called “a stable system of trouble”. In other words, a system that features only Noise but which cannot deliver the desired performance. An accident-free record of five years is a fine thing but there is nothing about a stable system of trouble that says it cannot have long incident-free periods.
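To put a number on that: under a wholly illustrative assumption that accidents arrive as a Poisson process at a constant low rate, a stable system of trouble will often produce five clear years by chance alone. The rate below is made up for the sketch, not taken from any real operator.

```python
import math

# Illustrative only: suppose a "stable system of trouble" generates accidents
# as a Poisson process at a constant rate of 0.2 per year (a made-up figure).
rate_per_year = 0.2

# Probability that such an operator records zero accidents in five years.
p_five_clear_years = math.exp(-rate_per_year * 5)
print(f"{p_five_clear_years:.2f}")  # 0.37
```

So over a third of such operators would pass a five-year accident-free screen in any given window, even though the underlying system is unchanged.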

To turn that incident-free five years into evidence about likely future safety performance we also need hard evidence, statistical and qualitative, about the stability and predictability of the rail operator’s processes. Procurement managers are generally far less diligent in looking for, and at, this sort of data. In highly sophisticated industries such as automotive it is routine to demand capability data and evidence of process surveillance from a potential supplier. Without that, past performance is of no value whatever in predicting future results.

Rearview

Late-night drinking laws saved lives

That was the headline in The Times (London) on 19 August 2013. The copy went on:

“Hundreds of young people have escaped death on Britain’s roads after laws were relaxed to allow pubs to open late into the night, a study has found.”

It was accompanied by a chart.

How death toll fell

This conclusion was apparently based on a report detailing work led by Dr Colin Green at Lancaster University Management School. The report is not on the web but Lancaster were very kind in sending me a copy and I extend my thanks to them for the courtesy.

This is very difficult data to analyse. Any search for a signal has to be interpreted against a sustained fall in recorded accidents involving personal injury that goes back to the 1970s and is well illustrated in the lower part of the graphic (see here for data). The base accident data is therefore manifestly not stable and predictable. To draw inferences we need to be able to model the long-term trend in a persuasive manner so that we can eliminate its influence and work with a residual data sequence amenable to statistical analysis.

It is important to note, however, that the authors had good reason to believe that relaxation of licensing laws may have an effect so this was a proper exercise in Confirmatory Data Analysis.

Reading the Lancaster report I learned that The Times graphic is composed of five-month moving averages. I am not much attracted by that as a graphic. Shewhart’s Second Rule of Data Presentation is:

Whenever an average, range or histogram is used to summarise observations, the summary must not mislead the user into taking any action that the user would not take if the data were presented in context.
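To make the rule concrete, here is a small numpy sketch, on made-up data, of how a centred five-month moving average misleads by stripping observations of their context: a single surprising month is smeared into five unremarkable-looking ones.

```python
import numpy as np

# Made-up monthly series: a steady level of 100 with one surprising month.
monthly = np.full(24, 100.0)
monthly[12] = 150.0  # a single-month spike of +50

# Centred five-month moving average, as in the newspaper graphic.
smoothed = np.convolve(monthly, np.ones(5) / 5, mode="valid")

# The +50 spike in one month becomes a +10 bump spread across five months:
# the summary no longer shows the data in context.
print(monthly.max())   # 150.0
print(smoothed.max())  # 110.0
```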

I fear that moving averages will always obscure the message in the data. I preferred this chart from the Lancaster report. The upper series are for England, the lower for Scotland.

Drink Drive scatter

Now we can see the monthly observations. Subjectively there looks to be, at least in some years, some structure of variation throughout the year. That is unsurprising but it does ruin all hope of justifying an assumption of “independent identically distributed” residuals. Because of that alone, I feel that the use of p-values here is inappropriate, the usual criticisms of p-values in general notwithstanding (see the advocacy of Stephen Ziliak and Deirdre McCloskey).

As I said, this is very tricky data from which to separate signal and noise. Because of the patterned variation within any year I think there is little point in analysing anything other than annual aggregates. The analysis I would like to have seen is a straight-line regression through the whole of the annual data for England. There may be an initial disappointment that this gives us “less data to play with”. However, considering the correlation within the intra-year monthly figures, a little reflection confirms that there is very little sacrifice of real information. I’ve had a quick look at the annual aggregates for the period under investigation and I can’t see a signal. The analysis could be taken further by calculating an R2. That could then be compared with an R2 calculated for the Lancaster bi-linear “change point” model. Is the extra variation explained worth having for the extra parameters?
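The comparison I have in mind can be sketched as follows. The annual figures below are wholly invented for illustration, and the 2005 change point is hypothetical; the point is the shape of the analysis, not the numbers.

```python
import numpy as np

# Invented annual casualty figures, for illustration only.
years = np.arange(2000, 2012)
deaths = np.array([560, 545, 530, 500, 485, 470, 450, 435, 420, 400, 390, 370],
                  dtype=float)

def r_squared(y, fitted):
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Straight-line regression through all the annual data.
line_fit = np.polyval(np.polyfit(years, deaths, 1), years)
r2_line = r_squared(deaths, line_fit)

# Bi-linear "change point" model: separate straight lines either side of a
# hypothetical change in the law at the end of 2005.
before = years <= 2005
fit_before = np.polyval(np.polyfit(years[before], deaths[before], 1), years[before])
fit_after = np.polyval(np.polyfit(years[~before], deaths[~before], 1), years[~before])
r2_bilinear = r_squared(deaths, np.concatenate([fit_before, fit_after]))

# The bi-linear model always explains at least as much variation; the
# question is whether the increment is worth the extra parameters.
print(f"R2 straight line: {r2_line:.3f}")
print(f"R2 bi-linear:     {r2_bilinear:.3f}")
```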

I see that the authors calculated an R2 of 42%. However, that includes accounting for the difference between English and Scottish data which is the dominant variation in the data set. I’m not sure what the Scottish data adds here other than to inflate R2.

There might also be an analysis approach by taking out the steady long term decline in injuries using a LOWESS curve then looking for a signal in the residuals.
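That approach can be sketched with a minimal hand-rolled LOWESS-style smoother (tricube-weighted local straight-line fits, the classic construction). The annual series below is invented purely to illustrate detrending; in practice one would use a library implementation and the real data.

```python
import numpy as np

def lowess_fit(x, y, frac=0.5):
    """Minimal LOWESS-style smoother: tricube-weighted local straight-line fits."""
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))            # points in each local window
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]                   # the k nearest neighbours
        w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3   # tricube weights
        slope, intercept = np.polyfit(x[idx], y[idx], 1, w=np.sqrt(w))
        fitted[i] = intercept + slope * x[i]
    return fitted

rng = np.random.default_rng(1)

# Invented annual series: a smooth long-term decline plus noise.
years = np.arange(1990.0, 2012.0)
deaths = 800.0 * np.exp(-0.03 * (years - 1990.0)) + rng.normal(0.0, 15.0, years.size)

trend = lowess_fit(years, deaths)   # estimate of the nuisance long-term trend
residuals = deaths - trend          # what is left in which to look for a signal
print(f"residual sd: {residuals.std():.1f}")
```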

What that really identifies are three ways of trying to cope with the long term declining trend, which is a nuisance in this analysis: straight line regression, straight line regression with “change point”, and LOWESS. If they don’t yield the same conclusions then any inference has to be treated with great caution. Inevitably, any signal is confounded with lack of stability and predictability in the long term trend.

I comment on this really to highlight the way the press use graphics without explaining what they mean. I intend no criticism of the Lancaster team as this is very difficult data to analyse. Of course, the most important conclusion is that there is no signal that the relaxation in licensing resulted in an increase in accidents. I trust that my alternative world view will be taken constructively.

Risks of Paediatric heart surgery in the NHS

I thought, before posting, I would let the controversy die down around this topic and in particular the anxieties and policy changes around Leeds General Infirmary. However, I had a look at this report and found there were some interesting things in it worth blogging about.

Readers will remember that there was anxiety in the UK about mortality rates from paediatric surgery and whether differential mortality rates among the limited number of hospitals were evidence of relative competence and, moreover, patient safety. For a time Leeds General Infirmary suspended all such surgery. The report I’ve been looking at was a re-analysis of the data after some early data quality problems had been resolved. Leeds was exonerated and recommenced surgery.

The data analysed is from 2009 to 2012. The headline graphic in the report is this. The three letter codes indicate individual hospitals.

Heart Summary

I like this chart as it makes an important point. There is nothing, in itself, significant about having the highest mortality rate. There will always be exactly two hospitals at the extremes of any league table. The task of data analysis is to tell us whether that is simply a manifestation of the noise in the system or whether it is a signal of an underlying special cause. Nate Silver makes these points very well in his book The Signal and the Noise. Leeds General Infirmary had the greatest number of deaths, relative to expectations, but then somebody had to. It may feel emotionally uncomfortable being at the top but it is no guide to executive action.

Statisticians like the word “significant” though I detest it. It is a “word worn smooth by a million tongues”. The important idea is that of a sign or signal that stands out in unambiguous contrast to the noise. As Don Wheeler observed, all data has noise, some data also has signals. Even the authors of the report seem to have lost confidence in the word as they enclose it in quotes in their introduction. However, what this report is all about is trying to separate signal from noise. Against all the variation in outcomes in paediatric heart surgery, is there a signal? If so, what does the signal tell us and what ought we to do?

The authors go about their analysis using p-values. I agree with Stephen Ziliak and Deirdre McCloskey in their criticism of p-values. They are deeply unreliable as a guide to action. I do not think they do much harm the way they are used in this report but I would have preferred to see the argument made in a different way.

The methodology of the report starts out by recognising that the procedural risks will not be constant for all hospitals. Factors such as differential distributions of age, procedural complexity and the patient’s comorbidities will affect the risk. The report’s analysis is predicated on a model (PRAiS) that predicts the number of deaths to be expected in a given year as a function of these sorts of variables. The model is based on historical data, I presume from before 2009. I shall call this the “training” data. The PRAiS model endeavours to create a “level playing field”. If the PRAiS adjusted mortality figures are stable and predictable then we are simply observing noise. The noise is the variation that the PRAiS model cannot explain. It is caused by factors as yet unknown and possibly unknowable. What we are really interested in is whether any individual hospital in an individual year shows a signal, a mortality rate that is surprising given the PRAiS prediction.

The authors break down the PRAiS adjusted data by year and hospital. They then take a rather odd approach to the analysis. In each year, they make a further adjustment to the observed deaths based on the overall mortality rate for all hospitals in that year. I fear that there is no clear explanation as to why this was adopted.

I suppose that this enables them to make an annual comparison between hospitals. However, it does have some drawbacks. Any year-on-year variation not explained by the PRAiS model is part of the common cause variation, the noise, in the system. It ought to have been stable and predictable over the data with which the model was “trained”. It seems odd to adjust data on the basis of noise. If there were a deterioration common to all hospitals, it would not be picked up in the authors’ p-values. Further, a potential signal of deterioration in one hospital might be masked by a moderately bad, but unsurprising, year in general.

What the analysis does mask is that there is a likely signal here suggesting a general improvement in mortality rates common across the hospitals. Look at 2009-10 for example. Most hospitals reported fewer deaths than the PRAiS model predicted. The few that didn’t, barely exceeded the prediction.

Hear0910

In total, over the three years and 9930 procedures studied, the PRAiS model predicted 291 deaths. There were 243. For what it’s worth, I get a p-value of 0.002. Taking that at face value, there is a signal that mortality has dropped. Not a fact that I would want to disguise.
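Taking the Poisson model at face value, that p-value is straightforward to reproduce: it is the probability of 243 or fewer deaths when 291 are expected. A sketch with scipy, using the figures quoted above from the report:

```python
from scipy.stats import poisson

predicted = 291  # deaths predicted by the PRAiS model over 2009-2012
observed = 243   # deaths actually recorded over the 9,930 procedures

# One-sided probability of 243 or fewer deaths if the true mean were 291.
p_value = poisson.cdf(observed, predicted)
print(f"p = {p_value:.3f}")
```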

The plot that I would like to have seen, as an NHS user, would be a chart of PRAiS adjusted annual deaths against time for the “training” data. That chart should then have natural process limits (“NPLs”) added, calculated from the PRAiS adjusted deaths. This must show stable and predictable PRAiS adjusted deaths. Otherwise, the usefulness of the model and the whole approach is compromised. The NPLs could then be extended forwards in time and subsequent PRAiS adjusted mortalities charted on an annual basis. There would be individual hospital charts and a global chart. New points would be added annually.
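The natural process limits I have in mind are the usual individuals-chart (XmR) limits in Wheeler’s formulation: mean ± 2.66 × the average moving range. A sketch on invented PRAiS-adjusted annual counts:

```python
import numpy as np

# Invented PRAiS-adjusted annual deaths for a "training" period.
adjusted = np.array([92.0, 88.0, 95.0, 90.0, 85.0, 93.0, 89.0, 91.0])

centre = adjusted.mean()
mr_bar = np.abs(np.diff(adjusted)).mean()   # average moving range

# Natural process limits for an individuals (XmR) chart.
upper_npl = centre + 2.66 * mr_bar
lower_npl = centre - 2.66 * mr_bar
print(f"centre {centre:.1f}, NPLs [{lower_npl:.1f}, {upper_npl:.1f}]")

# A subsequent year's figure is a signal only if it falls outside the limits.
new_value = 105.0
print("signal" if not (lower_npl <= new_value <= upper_npl) else "noise")  # signal
```

Extending the limits forward and plotting each new year against them gives exactly the transparent, continually updated chart described above.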

I know that there is a complexity with the varying number of patients each year but, plotted in aggregate and by hospital, I do not think there is enough variation to cause a problem.

The chart I suggest has some advantages. It would show performance over time in a manner transparent to NHS users. Every time the data comes in issue we could look and see that we have the same chart as last time we looked with new data added. We could see the new data in the context of the experience base. That helps build trust in data. There would be no need for an ad hoc analysis every time a question was raised. Further, the “training” data would give us the residual process variation empirically. We would not have to rely on simplifying assumptions such as the Poisson distribution when we are looking for a surprise.

There is a further point. The authors of the report recognise a signal against two criteria, an “Alert area” and an “Alarm area”. I’m not sure how clinicians and managers respond to a signal in these respective areas. It is suggestive of the old-fashioned “warning limits” that used to be found on some control charts. However, the authors of the report compound matters by then stating that hospitals “approaching the alert threshold may deserve additional scrutiny and monitoring of current performance”. The simple truth is that, as Terry Weight used to tell me, a signal is a signal is a signal. As soon as we see a signal we protect the customer and investigate its cause. That’s all there is to it. There is enough to do in applying that tactic diligently. Over complicating the urgency of response does not, I think, help people to act effectively on data. If we act when there is no signal then we have a strategy that will make outcomes worse.

Of course, I may have misunderstood the report and I’m happy for the authors to post here and correct me.

If we wish to make data the basis for action then we have to move from reactive ad hoc analysis to continual and transparent measurement along with a disciplined pattern of response. Medical safety strikes me as exactly the sort of system that demands such an approach.