Music is silver but …

The other day I came across a report on the BBC website that non-expert listeners could pick out winners of piano competitions more reliably when presented with silent performance videos than when exposed to sound alone. In the latter case they performed no better than chance.

The report was based on the work of Chia-Jung Tsay at University College London, in a paper entitled “Sight over sound in the judgment of music performance”.

The news report immediately leads us to suspect that the expert evaluating a musical performance is not in fact analysing and weighing auditory complexity and aesthetics but instead falling under the subliminal influence of the proxy data of the artist’s demeanour and theatrics.

That is perhaps unsurprising. We want to believe, as does the expert critic, that performance evaluation is a reflective, analytical and holistic enterprise, demanding decades of exposure to subtle shades of interpretation and developing skills of discrimination by engagement with the ascendant generation of experts. This is what Daniel Kahneman calls a System 2 task. However, a wealth of psychological study shows only too well that System 2 is easily fatigued and distracted. When we believe we are thinking in System 2, we are all too often loafing in System 1 and using simplistic learned heuristics as a substitute. It is easy to imagine that the visual proxy data might be such a heuristic, a ready reckoner that provides a plausible result in a wide variety of commonly encountered situations.

These behaviours are difficult to identify, even for the most mindful individual. Kahneman notes:

… all of us live much of our lives guided by the impressions of System 1 – and we do not know the source of these impressions. How do you know that a statement is true? If it is strongly linked by logic or association to other beliefs or preferences you hold, or comes from a source you trust and like, you will feel a sense of cognitive ease. The trouble is that there may be other causes for your feeling of ease … and you have no simple way of tracing your feelings to their source

Thinking, Fast and Slow, p64

The problem is that what Kahneman describes is exactly what I was doing in finding my biases confirmed by this press report. I have since had a superficial look at the statistics in the study and I am now less persuaded than when I read the press item. I may blog later about the difficulties I had in interpreting the analysis. Really, this is quite a tentative and suggestive study on a very limited frame, and I would certainly like to see more inter-laboratory studies in psychology. The study is open to multiple interpretations and any individual will probably have difficulty making an exhaustive list. There is always a danger of falling into the trap of What You See Is All There Is (WYSIATI).

That notwithstanding, even anecdotally, the story is another reminder of an important lesson of process management: even though what we have been doing has worked in the past, we may not understand what it is that has been working.

Walkie-Talkie “death ray” and risk identification

News media have been full of the tale of London’s Walkie-Talkie office block raising temperatures on the nearby highway to car-melting levels.

The full story of how the architects and engineers created the problem has yet to be told. It is certainly the case that similar phenomena have been reported elsewhere. According to one news report, the Walkie-Talkie’s architect had worked on a Las Vegas hotel that caused similar problems back in September 2010.

More generally, an external hazard from a product’s optical properties is certainly something that has been noted in the past. It appears from this web page that domestic low-emissivity (low-E) glass was suspected of setting fire to adjacent buildings as long ago as 2007. I have not yet managed to find the Consumer Product Safety Commission report into low-E glass but I now know all about the hazards of snow globes.

The Walkie-Talkie phenomenon marks a signal failure in risk management and it will cost somebody to fix it. It is not yet clear whether this was a miscalculation of a known hazard or whether the hazard was simply neglected from the start.

Risk identification is the most fundamental part of risk management. If you have failed to identify a risk you are not in a position to control, mitigate or externalise it in advance. Risk identification is also the hardest part. In the case of the Walkie-Talkie, modern materials, construction methods and aesthetic tastes have conspired to create a phenomenon that was not, at least as an accidental feature, present in structures before this century. That means that risk identification is not a matter of running down a checklist of known hazards to see which apply. Novel and emergent risks are always the most difficult to identify, especially where they involve the impact of an artefact on its environment. This is, as Daniel Kahneman would put it, a real System 2 task. The standard checklist propels it back to the flawed System 1 level. As we know, even when we think we are applying a System 2 mindset, we may subconsciously be loafing in a subliminal System 1.

It is very difficult to spot when something has been missed out of a risk assessment, even in familiar scenarios. In a famous 1978 study, Fischhoff, Slovic and colleagues showed college students fault trees analysing the potential causes of a car’s failure to start. Some of the fault trees had been “pruned”: a branch representing, say, “battery charge” had been removed. The subjects were very poor at spotting that a major, and well known, source of failure had been omitted from the analysis. Where failure modes are unfamiliar, it is even more difficult to identify the lacuna.

Even where failure modes are identified, if they are novel then they still present challenges in effective design and risk management. Henry Petroski, in Design Paradigms, his historical analysis of human error in structural engineering, shows how novel technologies present challenges for the development of new engineering methodologies. As he says:

There is no finite checklist of rules or questions that an engineer can apply and answer in order to declare that a design is perfect and absolutely safe, for such finality is incompatible with the whole process, practice and achievement of engineering. Not only must engineers preface any state-of-the-art analysis with what has variously been called engineering thinking and engineering judgment, they must always supplement the results of their analysis with thoughtful and considered interpretations of the results.

I think there are three principles that can help guard against an overly narrow vision. Firstly, involve as broad a selection of people as possible in hazard identification. Perhaps take a diagonal slice of the organisation. Do not put everybody in a room together where they can converge rapidly. This is probably a situation where some variant of the Delphi method can be justified.

Secondly, be aware that all assessments are provisional. Make design assumptions explicit. Collect data at every stage, especially on your assumptions. Compare the data with what you predicted would happen. Respond to any surprises by protecting the customer and investigating. Even if you’ve not yet melted a Jaguar, if the glass is looking a little more reflective than you thought it would be, take immediate action. Do not wait until you are in the Evening Standard. There is a reputation management side to this too.

Thirdly, as Petroski advocates, analysis of case studies and reflection on the lessons of history helps to develop broader horizons and develop a sense of humility. It seems nobody’s life is actually in danger from this “death ray” but the history of failures to identify risk leaves a more tangible record of mortality.

Trust in data – II

I just picked up on this, now not so recent, news item about the prosecution of Steven Eaton. Eaton was gaoled for falsifying data in clinical trials. His prosecution was pursuant to the Good Laboratory Practice Regulations 1999. The Regulations apply to chemical safety assessments and come to us, in the UK, from that supra-national body the OECD. Sadly I have managed to find few details other than the press reports. I have had a look at the website of the prosecuting Medicines and Healthcare Products Regulatory Agency but found nothing beyond the press release. I thought about a request under the Freedom of Information Act 2000 but wonder whether an exemption is being claimed pursuant to section 31.

It’s a shame because it would have been an opportunity to compare and contrast with another notable recent case of industrial data fabrication, that concerning BNFL and the Kansai Electric contract. Fortunately, in that case, the HSE made public a detailed report.

In the BNFL case, technicians had fabricated measurements of the diameters of fuel pellets in nuclear fuel rods, it appears principally out of boredom at doing the actual job. The customer spotted it, BNFL didn’t. The matter caused huge reputational damage to BNFL and resulted in the shipment of nuclear fuel rods, necessarily under armed escort, being turned around mid-ocean and returned to the supplier.

For me, the important lesson of the BNFL affair is that businesses must avoid a culture where employees decide what parts of the job are important and interesting to them, what is called intrinsic motivation. Intrinsic motivation is related to a sense of cognitive ease. That sense rests, as Daniel Kahneman has pointed out, on an ecology of unknown and unknowable beliefs and prejudices. No doubt the technicians had encountered nothing but boringly uniform products. They took that as a signal, and felt a sense of cognitive ease in doing so, to stop measuring and conceal the fact that they had stopped.

However, nobody in the supply chain is entitled to ignore the customer’s wishes. Businesses need to foster the extrinsic motivation of the voice of the customer. That is what defines a job well done. Sometimes it will be irksome and involve a lot of measuring pellets whose dimensions look just the same as the last batch. We simply have to get over it!

The customer wanted the data collected, not simply as a sterile exercise in box-ticking, but as a basis for diligent surveillance of the manufacturing process and as a critical component of managing the risks attendant on real world nuclear industry operations. The customer showed that proper scrutiny of the data, exactly the scrutiny they had expected BNFL to perform as part of the contract, would have exposed its inauthenticity. BNFL were embarrassed, not only by their lack of management control of their own technicians, but by the exposure of their own incapacity to scrutinise data and act on its signal message. Even if all the pellets were of perfect dimension, the customer would be legitimately appalled that so little critical attention was being paid to keeping them so.

Fabricated data will readily be exposed where it is properly scrutinised, as part of a system of objective process management and with the correct statistical tools. That is part of incentivising technicians to do the job diligently. Dishonesty must not be tolerated. However, it is essential that everybody in an organisation understands the voice of the customer and understands the particular way in which they themselves add value. A scheme of goal deployment weaves the threads of the voice of the customer together with those of individual process management tactics. That is what provides an individual’s insight into how their work adds value for the customer. That is what provides the “nudge” towards honesty.
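As a crude illustration of the sort of statistical screen that exposes fabrication, consider a hypothetical check, not anything BNFL or the customer is reported to have actually run: genuine repeated measurements exercise the full resolution of the gauge, so an implausibly long run of identical recorded values is itself a signal worth investigating. The pellet diameters below are invented.

```python
def longest_repeat_run(values):
    """Length of the longest run of consecutive identical values."""
    if not values:
        return 0
    best = run = 1
    for previous, current in zip(values, values[1:]):
        run = run + 1 if current == previous else 1
        best = max(best, run)
    return best

# invented pellet diameters in mm showing genuine measurement noise...
genuine = [8.19, 8.21, 8.20, 8.18, 8.20, 8.22, 8.19, 8.21, 8.20, 8.19]
# ...versus one value copied down the record sheet
fabricated = [8.20] * 10

print(longest_repeat_run(genuine))     # → 1
print(longest_repeat_run(fabricated))  # → 10: investigate
```

On an XmR chart the same record would show up as a dead-flat run; either way, it is routine scrutiny of the raw figures, rather than filing them, that catches it.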

Late-night drinking laws saved lives

That was the headline in The Times (London) on 19 August 2013. The copy went on:

“Hundreds of young people have escaped death on Britain’s roads after laws were relaxed to allow pubs to open late into the night, a study has found.”

It was accompanied by a chart.

How death toll fell

This conclusion was apparently based on a report detailing work led by Dr Colin Green at Lancaster University Management School. The report is not on the web but Lancaster were very kind in sending me a copy and I extend my thanks to them for the courtesy.

This is very difficult data to analyse. Any search for a signal has to be interpreted against a sustained fall in recorded accidents involving personal injury that goes back to the 1970s and is well illustrated in the lower part of the graphic (see here for data). The base accident data is therefore manifestly not stable and predictable. To draw inferences we need to be able to model the long term trend in a persuasive manner so that we can eliminate its influence and work with a residual data sequence amenable to statistical analysis.

It is important to note, however, that the authors had good reason to believe that relaxation of licensing laws may have an effect so this was a proper exercise in Confirmatory Data Analysis.

Reading the Lancaster report I learned that The Times graphic is composed of five-month moving averages. I am not much attracted by that as a graphic. Shewhart’s Second Rule of Data Presentation is:

Whenever an average, range or histogram is used to summarise observations, the summary must not mislead the user into taking any action that the user would not take if the data were presented in context.
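To make the rule concrete, here is a toy demonstration with invented monthly counts: a five-month moving average, of the kind used in the Times graphic, smears a one-month spike into a gentle bump spread across five plotted points.

```python
import numpy as np

# invented monthly casualty counts: stable at 40, with a one-month spike to 80
counts = np.full(24, 40.0)
counts[11] = 80.0

# five-month moving average, as plotted in the newspaper graphic
window = np.ones(5) / 5
ma = np.convolve(counts, window, mode="valid")

print(float(counts.max()))  # 80.0 -- the spike is plain in the raw data
print(float(ma.max()))      # 48.0 -- the average turns it into a mild bump
```

A reader of the averaged series sees five slightly elevated months where the raw data shows one dramatic one, which is exactly the kind of action-misleading summary Shewhart warns against.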

I fear that moving-averages will always obscure the message in the data. I preferred this chart from the Lancaster report. The upper series are for England, the lower for Scotland.

Drink Drive scatter

Now we can see the monthly observations. Subjectively there looks to be, at least in some years, some structure of variation throughout the year. That is unsurprising but it does ruin all hope of justifying an assumption of “independent identically distributed” residuals. Because of that alone, I feel that the use of p-values here is inappropriate, the usual criticisms of p-values in general notwithstanding (see the advocacy of Stephen Ziliak and Deirdre McCloskey).

As I said, this is very tricky data from which to separate signal and noise. Because of the patterned variation within any year I think that there is not much point in analysing other than annual aggregates. The analysis I would like to have seen is a straight line regression through the whole of the annual data for England. There may be an initial disappointment that that gives us “less data to play with”. However, considering the correlation within the intra-year monthly figures, a little reflection confirms that there is very little sacrifice of real information. I’ve had a quick look at the annual aggregates for the period under investigation and I can’t see a signal. The analysis could be taken further by calculating an R2. That could then be compared with an R2 calculated for the Lancaster bi-linear “change point” model. Is the extra variation explained worth having for the extra parameters?
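To sketch the comparison I have in mind, here it is with wholly invented annual counts standing in for the real figures (the numbers below are placeholders, not the Lancaster data):

```python
import numpy as np

def r_squared(y, y_hat):
    """Proportion of variation explained by the fitted values."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Least-squares straight line; returns the fitted values."""
    return np.polyval(np.polyfit(x, y, 1), x)

def fit_changepoint(x, y):
    """Brute-force bi-linear fit: a separate line either side of a break."""
    best_r2, best_break = -np.inf, None
    for k in range(2, len(x) - 1):          # at least two points per segment
        y_hat = np.concatenate([fit_line(x[:k], y[:k]),
                                fit_line(x[k:], y[k:])])
        r2 = r_squared(y, y_hat)
        if r2 > best_r2:
            best_r2, best_break = r2, x[k]
    return best_r2, best_break

# invented annual casualty counts -- placeholders, not the real data
years = np.arange(2000, 2011)
deaths = np.array([560., 545., 530., 510., 500., 480.,
                   470., 450., 430., 420., 400.])

r2_line = r_squared(deaths, fit_line(years, deaths))
r2_cp, break_year = fit_changepoint(years, deaths)
print(f"straight line R2 = {r2_line:.3f}")
print(f"bi-linear R2     = {r2_cp:.3f} (break at {break_year})")
```

The bi-linear fit can never explain less variation than the single line, so the question posed in the text, whether the increment in R2 is worth two extra parameters, is a matter of judgment or of a formal comparison such as adjusted R2.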

I see that the authors calculated an R2 of 42%. However, that includes accounting for the difference between English and Scottish data which is the dominant variation in the data set. I’m not sure what the Scottish data adds here other than to inflate R2.

There might also be an analysis approach by taking out the steady long term decline in injuries using a LOWESS curve then looking for a signal in the residuals.

What that really identifies is three ways of trying to cope with the long term declining trend, which is a nuisance in this analysis: straight line regression, straight line regression with a “change point”, and LOWESS. If they do not yield the same conclusions then any inference has to be treated with great caution. Inevitably, any signal is confounded with lack of stability and predictability in the long term trend.
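The LOWESS route can be sketched in the same spirit. Below is a minimal local-linear smoother with tricube weights, which is LOWESS minus its robustness iterations, run on invented data; in practice one would reach for a library implementation rather than hand-roll it.

```python
import numpy as np

def lowess_hat(x, y, frac=0.6):
    """Local linear fit with tricube weights at each x[i]
    (LOWESS without the robustness iterations)."""
    n = len(x)
    k = max(3, int(np.ceil(frac * n)))      # neighbourhood size
    y_hat = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]             # k nearest neighbours
        w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3   # tricube weights
        coeffs = np.polyfit(x[idx], y[idx], 1, w=np.sqrt(w))
        y_hat[i] = np.polyval(coeffs, x[i])
    return y_hat

# invented annual counts: a steady decline with no licensing signal at all
years = np.arange(1995.0, 2011.0)
counts = 600.0 - 12.0 * (years - 1995.0)

residuals = counts - lowess_hat(years, counts)
# for a pure trend the residuals are essentially zero -- any licensing
# effect would have to show up here, against the residual noise
print(float(np.abs(residuals).max()))
```

Any step or slope change after the law took effect would then have to declare itself in the residual series, rather than being confounded with the long term decline.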

I comment on this really to highlight the way the press use graphics without explaining what they mean. I intend no criticism of the Lancaster team as this is very difficult data to analyse. Of course, the most important conclusion is that there is no signal that the relaxation in licensing resulted in an increase in accidents. I trust that my alternative world view will be taken constructively.

Sushi – not as lean as you thought

We all fondly imagine that the Toyota Production System (TPS) and lean permeate everything Japanese. And what could be more Japanese than conveyor belt sushi?

My wife and I went out for a quick sushi lunch yesterday. It was fairly quiet being a late August holiday Monday in London. The conveyor was not doing much business but I was alarmed to see the assistant preparing more and more plates and adding them to stock in increasingly precarious piles behind the conveyor belt.

Got an assistant with nothing to do? Get them to make for stock, apparently.