A personal brush with existential risk

I visited my GP (family physician) last week on a minor matter which I am glad to say is now cleared up totally. However, the receptionist was very eager to point out that I had not taken up my earlier invitation to a cardiovascular assessment. I suspect there was some financial incentive for the practice. I responded that I was uninterested. I knew the general lifestyle advice being handed out and how I felt about it. However, she insisted and it seemed she would never book me in for my substantive complaint unless I agreed. So I agreed.

I had my blood pressure measured (ok), and good and bad cholesterol (both ok, which was a surprise). Finally, the nurse gave me a percentage risk of cardiovascular disease. The figure wasn’t explained and I had to ask whether it was the annual risk of contracting cardiovascular disease (that’s what I had assumed) or something else. It turned out to be the total risk over the next decade. The quoted risk was much lower than I would have guessed, so I feel emboldened in my lifestyle. The campaign’s efforts to get me to mend my ways backfired.

Of course, I should not take this sort of thing at face value. The nurse was unable to provide me with any pseudo-R² for the logistic regression or even the Hosmer–Lemeshow statistic for that matter.

I make light of the matter but logistic regression is very much in vogue at the moment. It raises some of the trickiest issues in analysing model quality and any user would be naïve to rely on it as a basis for action without understanding whether it really was explaining any variation in outcome. Issues of stability and predictability (see the Rear View tab at the head of this page) get even less attention because of their difficulty. However, issues of model quality and exchangeability do not go away simply because they are alien to the analysis.
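Not that I expect the practice to hand over its model, but for anyone curious what those diagnostics look like in practice, here is a minimal sketch on invented data. It assumes Python with numpy, scipy and statsmodels available, and computes McFadden’s pseudo-R² and a hand-rolled Hosmer–Lemeshow statistic for a fitted logistic regression.

```python
# Minimal sketch: model-quality diagnostics for a logistic regression.
# The data are invented; nothing here resembles any real risk calculator.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(40, 75, n)                          # hypothetical risk factor
true_logit = -9.0 + 0.1 * age                         # assumed "true" relationship
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))    # event indicator

X = sm.add_constant(age)
fit = sm.Logit(y, X).fit(disp=0)

# McFadden's pseudo-R2: 1 - loglik(fitted model) / loglik(intercept-only model)
pseudo_r2 = 1 - fit.llf / fit.llnull
print(f"McFadden pseudo-R2: {pseudo_r2:.3f}")

# Hosmer-Lemeshow: split by deciles of predicted risk and compare observed
# event counts with expected counts in each group.
p_hat = fit.predict(X)
cuts = np.quantile(p_hat, np.linspace(0, 1, 11))[1:-1]
group = np.digitize(p_hat, cuts)                      # groups 0..9
hl = 0.0
for g in range(10):
    idx = group == g
    n_g, obs, exp = idx.sum(), y[idx].sum(), p_hat[idx].sum()
    hl += (obs - exp) ** 2 / (exp * (1 - exp / n_g))
print(f"Hosmer-Lemeshow chi-square: {hl:.2f}, p = {stats.chi2.sf(hl, df=8):.3f}")
```

Neither number, of course, says anything about whether next year’s patients will behave like last year’s; that is the stability question raised above.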

When governments offer statistics such as this, we risk cynicism and disengagement if we ask the public to take them more glibly than we would ourselves.

 

The future of p-values

Another attack on p-values, this time by Regina Nuzzo in the prestigious science journal Nature. Nuzzo advances the usual criticisms clearly and trenchantly. I hope that this will start to make people think hard about using probability to make decisions.

However, for me, the analysis still does not go deep enough. US baseball player and manager Yogi Berra is reputed to have once observed that:

It’s tough to make predictions, especially about the future.

Scientists work with confidence intervals, whereas society, to the extent that it is interested in such things at all, is interested in prediction intervals. That mismatch betrays the lack of proper recognition of the future in much scientific writing.
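To make the distinction concrete, here is a small simulated example, assuming nothing more than numpy and scipy: an ordinary least squares fit on invented data, with the 95% confidence interval for the fitted mean and the 95% prediction interval for a single future observation at the same point.

```python
# Confidence interval for the mean vs prediction interval for a new observation,
# for a simple linear regression fitted to invented data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)     # the "past" data

# Ordinary least squares by hand
xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / sxx
b0 = ybar - b1 * xbar
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # residual std error

x0 = 5.0                                       # the point we care about
y0 = b0 + b1 * x0
t = stats.t.ppf(0.975, df=n - 2)
se_mean = s * np.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)     # uncertainty in the mean
se_new = s * np.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)  # uncertainty in one new value

print(f"95% confidence interval for the mean:   {y0 - t * se_mean:.2f} to {y0 + t * se_mean:.2f}")
print(f"95% prediction interval, one new value: {y0 - t * se_new:.2f} to {y0 + t * se_new:.2f}")
```

The prediction interval is always the wider of the two, and even it only covers the future observation if the process that generated the past data keeps behaving in the same way. That is exactly the question that follows.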

The principal reason for doing statistics is to improve the reliability of predictions and forecasts. But the foundational question is whether the past is even representative of the future. Unless the past is representative of the future, it is of no assistance in forecasting. Many statisticians have emphasised the important property that any process must display to allow even tentative predictions about its future behaviour. Johnson and de Finetti called the property exchangeability; Shewhart and Deming called it statistical control; Don Wheeler coined the more suggestive term stable and predictable.

Shewhart once observed:

Both pure and applied science have gradually pushed further and further the requirements for accuracy and precision. However, applied science, particularly in the mass production of interchangeable parts, is even more exacting than pure science in certain matters of accuracy and precision.

Perhaps it’s unsurprising, then, that the concept is more widely relied upon in business than in scientific writing. All the same, statistical analysis begins and ends with considerations of stability, an analysis in which p-values do not assist.

At the head of this page is a tab labelled “Rearview” where I have surveyed the matter more widely. I would like to think of this as supplementary to Nuzzo’s views.

Deconstructing Deming III – Cease reliance on inspection

3. Cease dependence on inspection to achieve quality. Eliminate the need for massive inspection by building quality into the product in the first place.

W Edwards Deming

Point 3 of Deming’s 14 Points. This at least cannot be controversial. For me it goes to the heart of Deming’s thinking.

The point is that every defective item produced (or defective service delivered) has taken cash from the pockets of customers or shareholders. They should be more angry. One day they will be. Inputs have been purchased with their cash, their resources have been deployed to transform the inputs and they will get nothing back in return. They will even face the costs of disposing of the scrap, especially if it is environmentally noxious.

That you have an efficient system for segregating non-conforming from conforming is unimpressive. That you spend even more of other people’s money reworking the product ought to be a matter of shame. Lean Six Sigma practitioners often talk of the hidden factory where the rework takes place. A factory hidden out of embarrassment. The costs remain whether you recognise them or not. Segregation is still more problematic in service industries.

The insight is not unique to Deming. This is a common theme in Lean, Six Sigma, Theory of Constraints and other approaches to operational excellence. However, Deming elucidated the profound statistical truths that belie the superficial effectiveness of inspection.

Inspection is inefficient

When I used to work in the railway industry I was once asked to look at what percentage of signalling scheme designs needed to be rechecked to defend against the danger of a logical error creeping through. The problem requires a simple application of Bayes’ theorem. I was rather taken aback at the result. There were only two strategies that made sense: recheck everything or recheck nothing. I didn’t at that point realise that this is a standard statistical result in inspection theory. For a wide class of real world situations, where the objective is to segregate non-conforming from conforming, the only sensible sampling schemes are 100% or 0%.
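For anyone who wants to see why, here is a sketch of the cost argument, essentially Deming’s “all or none” rule, under its simplest assumptions: a stable defect fraction p, inspection costing k1 per item, and a downstream cost k2 for every defective that escapes. The figures are invented.

```python
# Deming's "all or none" rule, sketched under its usual simplifying assumptions:
# stable defect fraction p, perfect inspection at cost k1 per item, and a cost
# k2 for every defective item that escapes to the next stage or the customer.
def expected_cost_per_item(f, p, k1, k2):
    """Expected cost per item when a fraction f of the items is inspected."""
    return f * k1 + (1 - f) * p * k2

p, k1, k2 = 0.02, 1.0, 200.0          # invented figures; here p > k1 / k2 = 0.005
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    cost = expected_cost_per_item(f, p, k1, k2)
    print(f"inspect {f:4.0%} of items -> expected cost {cost:.2f} per item")

# The cost is linear in f, so the minimum always sits at an extreme:
# inspect nothing when p < k1/k2, inspect everything when p > k1/k2.
```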

Where the inspection technique is destructive, such as a weld strength test, there really is only one option.

Inspection is ineffective

All inspection methods are imperfect. There will be false positives and false negatives. You will spend some money scrapping product you could have sold for cash. Some defective product will escape onto the market. Can you think of any examples in your own experience? Further, some of the conforming product will be only marginally conforming. It won’t delight the customer.
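A back-of-envelope sketch, with invented figures for the defect rate and for the test’s sensitivity and specificity, shows how quickly the escapes and the false scrap mount up even when the inspection looks respectable on paper.

```python
# Invented figures: what an imperfect inspection does to a batch.
defect_rate = 0.01       # assumed true fraction of non-conforming items
sensitivity = 0.95       # chance the test catches a genuine defective
specificity = 0.98       # chance the test passes a genuinely good item
batch = 100_000

defective = batch * defect_rate
good = batch - defective

escapes = defective * (1 - sensitivity)     # false negatives: defectives reaching the market
false_scrap = good * (1 - specificity)      # false positives: saleable product binned
caught = defective * sensitivity

print(f"Defectives escaping to market: {escapes:.0f}")
print(f"Good items scrapped:           {false_scrap:.0f}")
print(f"Share of rejects that were actually defective: {caught / (caught + false_scrap):.1%}")
```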

So build quality into the product

… and the process for producing the product (or delivering the service). Deming was a champion of the engineering philosophy of Genichi Taguchi who put forward a three-stage approach for achieving what he called off-line quality control.

  1. System design – in developing a product (or process) concept think about how variation in inputs and environment will affect performance. Choose concepts that are robust against sources of variation that are difficult or costly to control.
  2. Parameter design – choose product dimensions and process settings that minimise the sensitivity of performance to variation (a toy sketch follows this list).
  3. Tolerance design – work out the residual sources of variation to which performance remains sensitive. Develop control plans for measuring, managing and continually reducing such variation.
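To illustrate parameter design, the Monte Carlo sketch below compares two candidate nominal settings on a wholly invented response curve. The setting that sits on the flat part of the curve transmits far less of the uncontrolled input variation to the output; any resulting shift in the mean can then be brought back to target with a separate adjustment factor, as the method assumes.

```python
# Toy Monte Carlo illustration of parameter design on an invented response curve.
import numpy as np

rng = np.random.default_rng(42)

def response(setting, noise):
    """Hypothetical response: steep near setting A, flat (a plateau) near setting B."""
    return 10.0 + 5.0 * np.sin(setting + noise)

noise = rng.normal(0.0, 0.15, 100_000)        # uncontrolled variation in an input

for name, setting in (("A (steep region)", 0.0), ("B (flat region)", np.pi / 2)):
    out = response(setting, noise)
    print(f"setting {name}: mean {out.mean():.2f}, standard deviation {out.std():.3f}")

# Parameter design prefers the setting whose output varies least; tolerance
# design then deals with whatever sensitivity remains.
```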

Is there now no need to measure?

Conventional inspection aimed at approving or condemning a completed batch of output. The only thing of interest was the product and whether it conformed. Action would be taken on the batch. Deming called the application of statistics to such problems an enumerative study.

But the thing managers really need to know about is future outcomes and how they will be influenced by present decisions. There is no way of sampling the future. So sampling of the past has to go beyond mere characterisation and quantification of the outcomes. You are stuck with those and will have to take the consequences one way or another. Sampling (of the past) has to aim principally at understanding the causes of those historic outcomes. Only that enables managers to take a view on whether those causes will persist in the future, in what way they might change and how they might be adjusted. This is what Deming called an analytic study.

Essential to the ability to project data into the future is the recognition of common and special causes of variation. Only when managers are confident in thinking and speaking in those terms will their organisations have a sound basis for action. Then it becomes apparent that the results of inspection represent the occult interaction of inherent variation with threshold effects. Inspection obscures the distinction between common and special causes. It seduces the unwary into misguided action that exacerbates quality problems and reputational damage. It obscures the sad truth that, as Terry Weight put it, a disappointment is not necessarily a surprise.

The programme

  1. Drive out sensitivity to variation at the design stage.
  2. Routinely measure the inputs whose variation threatens product performance.
  3. Measure product performance too. Your bounded rationality may have led you to get (2) wrong.
  4. No need to measure every unit. We are trying to understand the cause system not segregate items.
  5. Plot data on a process behaviour chart (a minimal sketch follows this list).
  6. Stabilise the system.
  7. Establish capability.
  8. Keep on measuring to maintain stability and improve capability.

Some people think they have absorbed Deming’s thinking, mastered it even. Yet the test is the extent to which they are able to analyse problems in terms of common and special causes of variation. Is that the language that their organisation uses to communicate exceptions and business performance, and to share analytics, plans, successes and failures?

There has always been some distaste for Deming’s thinking among those who consider it cold, statistically driven and paralysed by data. But the data are only a means to getting beyond the emotional reaction to those two impostors: triumph and disaster. The language of common and special causes is a profound tool for building engagement, fostering communication and sharing understanding. Above that, it is the only sound approach to business measurement.

Power cables and ejector seats – two tales of failed risk management

The last week has seen findings in two inquests in England that point, I think, to failures in engineering risk management. The first concerns the tragic death of Flight Lieutenant Sean Cunningham. Flight Lieutenant Cunningham was killed by the spontaneous and faulty operation of an ejector seat on his Hawk T1 (this report from the BBC has some useful illustrations).

One particular cause of Flight Lieutenant Cunningham’s death was the failure of the ejector seat parachute to deploy. This was because a single nut and bolt had been over-tightened. It appears from the news report that this risk of over-tightening had been known to the manufacturer for some 20 years.

Single-point failure modes such as this, where one thing going wrong can cause disaster, present particular hazards. Usual practice is to take particular care to ensure that they are designed conservatively, that integrity is robust against special causes, and that manufacture and installation are controlled and predictable. It does surprise me that a manufacturer of safety equipment would permit such a hazard where danger of death could arise from human error in over-tightening the nut or simple mechanical problems in the nut and bolt themselves. It is again surprising that the failure mode could not have been designed out. I suspect that we have insufficient information from the BBC. It does seem that the mechanical risk was compounded by the manufacturer’s failure even to warn the RAF of the danger.

Single-point failure modes need to be addressed with care, even where institutional and economic considerations obstruct redesign. It is important to realise that human error is never the root cause of any failure. Humans make errors. Systems need to be designed so that they are robust against human frailty and bounded rationality.

The second case, equally tragic, was that of Dr James Kew. Dr Kew was out running in a field when he was electrocuted by a “low hanging” 11kV power line. When I originally read this I had thought that it was an example of a high impedance fault. Such faults happen where, for example, a power line drops into a tree. Because of the comparatively high electrical impedance of the tree there is insufficient current to activate the circuit breaker and the cable remains dangerously live. Again there is not quite enough information to work out exactly what happened in Dr Kew’s case. However, it appears that the power cable was hanging down in some way rather than having fallen into some other structure.

Again, mechanical failure of a power line that does not activate the circuit breaker is a well-anticipated failure mode. It is one that can present a serious hazard to the public but is not particularly easy to eliminate. It certainly seems that the power company changed its procedures after Dr Kew’s death; there was more they could have done beforehand.

Both tragic deaths illustrate the importance of keeping risk assessments under review and critically re-evaluating them, even in the absence of actual failures. Engineers usually know where their arguments and rationales are thinnest. Just because we decided something was acceptable in the past does not mean we were right; we may simply have been lucky. New people joining the team present a particular opportunity to challenge orthodoxy and drive risk further out of the system. I wonder whether there should not be an additional column on every FMEA headed “confidence in reasoning”.