Data and anecdote revisited – the case of the lime jellybean

I have already blogged about the question of whether data is the plural of anecdote. Then I recently came across the following problem in the late Richard Jeffrey’s marvellous little book Subjective Probability: The Real Thing (2004, Cambridge) and it struck me as a useful template for thinking about data and anecdotes.

The problem looks like a staple of elementary statistics practice exercises.

You are drawing a jellybean from a bag in which you know half the beans are green, all the lime flavoured ones are green and the green ones are equally divided between lime and mint flavours.

You draw a green bean. Before you taste it, what is the probability that it is lime flavoured?

A mathematically neat answer would be 50%. But what if, asked Jeffrey, when you drew the green bean you caught a whiff of mint? Or the bean was a particular shade of green that you had come to associate with “mint”. Would your probability still be 50%?
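The 50% is a straightforward conditional probability from the stated proportions. Here is a minimal sketch of both the neat answer and the whiff-of-mint update; the 4-to-1 likelihood ratio is an invented figure, since the problem supplies no such number.

```python
# A bag of 100 beans consistent with the problem: half green, all lime
# beans green, and the green beans split equally between lime and mint.
green_lime, green_mint, not_green = 25, 25, 50

# Before tasting: probability the drawn green bean is lime.
p_lime_given_green = green_lime / (green_lime + green_mint)
print(p_lime_given_green)  # 0.5

# If a whiff of mint is judged, say, four times as likely from a mint bean
# as from a lime one (an invented likelihood ratio), Bayes' theorem shifts
# the odds accordingly.
prior_odds = 1.0                       # lime : mint, from the 50/50 split
posterior_odds = prior_odds * (1 / 4)  # multiply by the likelihood ratio
print(posterior_odds / (1 + posterior_odds))  # 0.2, no longer 50%
```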

The given proportions of beans in the bag are our data. The whiff of mint or subtle colouration is the anecdote.

What use is the anecdote?

It would certainly be open to a participant in the bean problem to maintain the 50% probability derived from the data and ignore the inferential power of the anecdote. However, the anecdote is evidence that we have and, if we choose to ignore it simply because it is difficult to deal with, then we base our assessment of risk on a more restricted picture than that actually available to us.

The difficulty with the anecdote is that it does not lead to any compelling inference in the same way as do the mathematical proportions. It is easy to see how the bean proportions would give rise to a quite extensive consensus about the probability of “lime”. There would be more variety in individual responses to the anecdote, in what weight to give the evidence and in what it tended to imply.

That illustrates the tension between data and anecdote. Data tends to consensus. If there is disagreement as to its weight and relevance then the community is likely to divide into camps rather than exhibit a spectrum of views. Anecdote does not lead to such a consensus. Individuals interpret anecdotes in diverse ways and invest them with varying degrees of credence.

Yet, the person who is best at weighing and interpreting the anecdotal evidence has the advantage over the broad community who are in agreement about what the proportion data tells them. It will often be the discipline specialist who is in the best position to interpret an anecdote.

From anecdote to data

One of the things that the “mint” anecdote might do is encourage us to start collecting future data on what we smelled when a bean was drawn. A sequence of such observations, along with the actual “lime/mint” outcome, potentially provides a potent decision support mechanism for future draws. At this point the anecdote has been developed into data.
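As a minimal sketch of what that developed data might look like (the observations below are invented for illustration), the obvious first summary is a conditional relative frequency:

```python
# Hypothetical record of past draws: (what we smelled, what the bean
# turned out to be).
draws = [("mint", "mint"), ("mint", "mint"), ("nothing", "lime"),
         ("mint", "lime"), ("nothing", "mint"), ("mint", "mint")]

# Of the draws where we caught a whiff of mint, how often was the bean
# actually mint?
whiff_outcomes = [flavour for smell, flavour in draws if smell == "mint"]
print(whiff_outcomes.count("mint") / len(whiff_outcomes))  # 0.75 here
```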

This may be a difficult process. The whiff of mint or subtle colouration could be difficult to articulate but recognising its significance (sic) is the beginning of operationalising and sharing.

Statistician John Tukey advocated the practice of exploratory data analysis (EDA) to identify such anecdotal evidence before settling prematurely on a model. As he observed:

The greatest value of a picture is when it forces us to notice what we never expected to see.

Of course, the person who was able to use the single anecdote on its own has the advantage over those who had to wait until they had compelling data. Data that they share with everybody else who has the same idea.

Data or anecdote

When I previously blogged about this I had trouble in coming to any definition that distinguished data and anecdote. Having reflected, I have a modest proposal. Data is the output of some reasonably well-defined process. Anecdote isn’t. It’s not clear how it was generated.

We are not told by what process the proportion of beans was established but I am willing to wager that it was some form of counting.

If we know the process generating evidence then we can examine its biases, non-responses, precision, stability, repeatability and reproducibility. Anecdote we cannot. It is because we can characterise the measurement process, through measurement systems analysis, that we can assess its reliability and make appropriate allowances and adjustments for its limitations. An assessment that most people will agree with most of the time. Because the most potent tools for assessing the reliability of evidence are absent in the case of anecdote, there are inherent difficulties in its interpretation and there will be a spectrum of attitudes from the community.

However, having had our interest piqued by the anecdote, we can set up a process to generate data.

Borrowing strength again

Using an anecdote as the basis for further data generation is one approach to turning anecdote into reliable knowledge. There is another way.

Today in the UK, a jury of 12 found nurse Victorino Chua, beyond reasonable doubt, guilty of poisoning 21 of his patients with insulin. Two died. There was no single compelling piece of evidence put before the jury. It was all largely circumstantial. The prosecution had sought to persuade the jury that those various items of circumstantial evidence reinforced each other and led to a compelling inference.

This is a common situation in litigation where there is no single conclusive piece of data but various pieces of circumstantial evidence that have to be put together. Where they reinforce one another, each borrows strength from the rest.
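On a Bayesian reading, that reinforcement works on the odds scale: pieces of evidence treated as independent multiply their likelihood ratios, so several individually weak items can compound into a compelling inference. A toy sketch with invented numbers, making no claim about the actual case:

```python
# Start from long prior odds against guilt, then apply several modest,
# assumed-independent likelihood ratios. All figures are invented.
prior_odds = 1 / 1000
likelihood_ratios = [5, 8, 3, 10, 6]  # each piece only weakly probative

posterior_odds = prior_odds
for lr in likelihood_ratios:
    posterior_odds *= lr

print(posterior_odds)                         # 7.2: odds now favour guilt
print(posterior_odds / (1 + posterior_odds))  # ~0.88 as a probability
```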

Anecdotal evidence is not really the sort of evidence we want to have. But those who know how to use it are way ahead of those embarrassed by it.

Data is the plural of anecdote, either through repetition or through borrowing.

Target and the Targeteers

This blog appeared on the Royal Statistical Society website Statslife on 29 May 2014

John Pullinger, newly appointed head of the UK Statistics Authority, has given a trenchant warning about the “unsophisticated” use of targets. As reported in The Times (London) (“Targets could be skewing the truth, statistics chief warns”, 26 May 2014 – paywall) he cautions:

Anywhere we have had targets, there is a danger that they become an end in themselves and people lose sight of what they’re trying to achieve. We have numbers everywhere but haven’t been well enough schooled on how to use them and that’s where problems occur.

He goes on:

The whole point of all these things is to change behaviour. The trick is to have a sophisticated understanding of what will happen when you put these things out.

Pullinger makes it clear that he is no opponent of targets, but that in the hands of the unskilled they can create perverse incentives, encouraging behaviour that distorts the system they sought to control and frustrating the very improvement they were implemented to achieve.

For example, two train companies are being assessed by the regulator for punctuality. A train is defined as “on-time” if it arrives within 5 minutes of schedule. The target is 95% punctuality.
[Table: arrival lateness data for Company 1 and Company 2, both recording 95% punctuality]
Evidently, simple management by target fails to reveal that Company 1 is doing better than Company 2 in offering a punctual service to its passengers. A simple statement of “95% punctuality (punctuality defined as arriving within 5 minutes of timetable)” discards much of the information in the data.
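A toy calculation makes the point. The lateness figures below are invented, but both companies hit the 95% target exactly while offering very different experiences to their passengers:

```python
import statistics

# Minutes behind schedule for 20 trains per company (invented data).
company_1 = [0, 0, 1, 0, 2, 1, 0, 0, 1, 0, 0, 2, 1, 0, 0, 1, 0, 0, 1, 6]
company_2 = [4, 3, 5, 4, 2, 5, 3, 4, 5, 4, 3, 5, 4, 5, 3, 4, 5, 4, 3, 45]

for name, lateness in [("Company 1", company_1), ("Company 2", company_2)]:
    on_time = sum(1 for m in lateness if m <= 5) / len(lateness)
    print(name, f"punctuality={on_time:.0%}",
          f"mean lateness={statistics.mean(lateness):.1f} min",
          f"worst={max(lateness)} min")
# Both report 95% punctuality; the means (0.8 vs 6.0 minutes) and the
# worst cases (6 vs 45 minutes) tell the real story.
```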

Further, when presented with a train that has slipped outside the 5 minute tolerance, a manager held solely to the target of 95% has no incentive to stop the late train from slipping even further behind. Certainly, if it puts further trains at risk of lateness, there will always be a temptation to strip it of all priority. Here, the target is not only a barrier to effective measurement and improvement, it is a threat to the proper operation of the railway. That is the point that Pullinger was seeking to make about the behaviour induced by the target.

And again, targets often provide only a “snapshot” rather than the “video” that discloses the information in the data that can be used for planning and managing an enterprise.

I am glad that Pullinger was not hesitant to remind users that proper deployment of system measurement requires an appreciation of psychology. Nobel Laureate psychologist Daniel Kahneman warns of the inherent human trait of thinking that What you see is all there is (WYSIATI). On their own, targets do little to guard against such bounded rationality.

In support of a corporate programme of improvement and integrated in a culture of rigorous data criticism, targets have manifest benefits. They communicate improvement priorities. They build confidence between interfacing processes. They provide constraints and parameters that prevent the system causing harm. Harm to others or harm to itself. What is important is that the targets do not become a shield to weak managers who wish to hide their lack of understanding of their own processes behind the defence that “all targets were met”.

However, all that requires some sophistication in approach. I think the following points provide a basis for auditing how an organisation is using targets.

Risk assessment

Targets should be risk assessed, anticipating realistic psychology and envisaging the range of behaviours the targets are likely to catalyse.

Customer focus

Anyone tasked with operating to a target should be periodically challenged with a review of the Voice of the Customer and how their own role contributes to the organisational system. The target is only an aid to the continual improvement of the alignment between the Voice of the Process and the Voice of the Customer. That is the only game in town.

Borrowed validation

Any organisation of any size will usually have independent data of sufficient borrowing strength to support mutual validation. There was a very good recent example of this in the UK where falling crime statistics, about which the public were rightly cynical and incredulous, were effectively validated by data collection from hospital emergency departments (Violent crime in England and Wales falls again, A&E data shows).

Over-adjustment

Mechanisms must be in place to deter over-adjustment, what W Edwards Deming called “tampering”, where naïve pursuit of a target adds variation and degrades performance.
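Deming demonstrated tampering with his famous funnel experiment. A minimal simulation of its “rule 2” (compensate the setting for the last deviation from target) shows the adjustment itself injecting variation:

```python
import random
import statistics

random.seed(1)
n, sigma, target = 10_000, 1.0, 0.0

# Leave a stable process alone: output is target plus common-cause noise.
left_alone = [target + random.gauss(0, sigma) for _ in range(n)]

# Tampering (rule 2 of the funnel experiment): after each result, move
# the setting to cancel the last deviation from target.
tampered, setting = [], target
for _ in range(n):
    y = setting + random.gauss(0, sigma)
    tampered.append(y)
    setting -= y - target  # the "correction" that makes things worse

print(statistics.stdev(left_alone))  # ~1.0
print(statistics.stdev(tampered))    # ~1.41: variance roughly doubled
```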

Discipline

Employees must be left in no doubt that lack of care in maintaining the integrity of the organisational system and pursuing customer excellence will not be excused by mere adherence to a target, no matter how heroic.

Targets are for the guidance of the wise. To regard them as anything else is to ask them to do too much.

Deconstructing Deming III – Cease reliance on inspection

3. Cease dependence on inspection to achieve quality. Eliminate the need for massive inspection by building quality into the product in the first place.

Point 3 of Deming’s 14 Points. This at least cannot be controversial. For me it goes to the heart of Deming’s thinking.

The point is that every defective item produced (or defective service delivered) has taken cash from the pockets of customers or shareholders. They should be more angry. One day they will be. Inputs have been purchased with their cash, their resources have been deployed to transform the inputs and they will get nothing back in return. They will even face the costs of disposing of the scrap, especially if it is environmentally noxious.

That you have an efficient system for segregating non-conforming from conforming is unimpressive. That you spend even more of other people’s money reworking the product ought to be a matter of shame. Lean Six Sigma practitioners often talk of the hidden factory where the rework takes place. A factory hidden out of embarrassment. The costs remain whether you recognise them or not. Segregation is still more problematic in service industries.

The insight is not unique to Deming. This is a common theme in Lean, Six Sigma, Theory of Constraints and other approaches to operational excellence. However, Deming elucidated the profound statistical truths that belie the superficial effectiveness of inspection.

Inspection is inefficient

When I used to work in the railway industry I was once asked to look at what percentage of signalling scheme designs needed to be rechecked to defend against the danger of a logical error creeping through. The problem requires a simple application of Bayes’ theorem. I was rather taken aback at the result. There were only two strategies that made sense: recheck everything or recheck nothing. I didn’t at that point realise that this is a standard statistical result in inspection theory. For a wide class of real world situations, where the objective is to segregate non-conforming from conforming, the only sensible sampling schemes are 100% or 0%.
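Deming stated the result as his “kp rule” for a stable process: compare the fraction defective p with the ratio of k1, the cost of inspecting one item, to k2, the cost of a defective item escaping downstream. A minimal sketch:

```python
def inspection_policy(p, k1, k2):
    """Deming's all-or-none 'kp rule' for a stable process.

    p  -- fraction defective in the incoming stream
    k1 -- cost of inspecting one item
    k2 -- cost incurred when a defective item escapes
    """
    # Expected cost per item is linear in the fraction inspected, so the
    # minimum always sits at an extreme: inspect everything or nothing.
    return "inspect 100%" if p > k1 / k2 else "inspect 0%"

print(inspection_policy(p=0.02, k1=1.0, k2=20.0))  # break-even at 0.05 -> "inspect 0%"
```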

Where the inspection technique is destructive, such as a weld strength test, there really is only one option.

Inspection is ineffective

All inspection methods are imperfect. There will be false-positives and false-negatives. You will spend some money scrapping product you could have sold for cash. Some defective product will escape onto the market. Can you think of any examples in your own experience? Further, some of the conforming product will be only marginally conforming. It won’t delight the customer.
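A sketch with invented rates shows how quickly this bites. Suppose a 2% defect rate, 95% sensitivity and a 5% false-alarm rate; then most flagged items are in fact conforming:

```python
# All three rates are assumed for illustration.
p_defect, sensitivity, false_alarm = 0.02, 0.95, 0.05

p_flagged = p_defect * sensitivity + (1 - p_defect) * false_alarm
p_defect_given_flagged = p_defect * sensitivity / p_flagged
print(f"{p_defect_given_flagged:.0%}")  # ~28%: most scrapped items were good
# Meanwhile 5% of the defectives sail through onto the market.
```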

So build quality into the product

… and the process for producing the product (or delivering the service). Deming was a champion of the engineering philosophy of Genichi Taguchi, who put forward a three-stage approach for achieving what he called off-line quality control.

  1. System design – in developing a product (or process) concept think about how variation in inputs and environment will affect performance. Choose concepts that are robust against sources of variation that are difficult or costly to control.
  2. Parameter design – choose product dimensions and process settings that minimise the sensitivity of performance to variation (see the sketch after this list).
  3. Tolerance design – work out the residual sources of variation to which performance remains sensitive. Develop control plans for measuring, managing and continually reducing such variation.
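A Monte Carlo sketch of the parameter design step, using a toy response surface of my own invention: the same uncontrolled disturbance passes almost entirely through a steep operating point, but barely registers at a flat one.

```python
import math
import random

random.seed(2)

def performance(setting, disturbance):
    # Toy response surface, invented for illustration.
    return math.sin(setting + disturbance)

def output_sd(setting, n=10_000, noise_sd=0.1):
    ys = [performance(setting, random.gauss(0, noise_sd)) for _ in range(n)]
    mean = sum(ys) / n
    return (sum((y - mean) ** 2 for y in ys) / (n - 1)) ** 0.5

print(output_sd(0.0))          # steep region: sd ~0.10, noise passes through
print(output_sd(math.pi / 2))  # flat region: sd ~0.007, performance robust
```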

Is there now no need to measure?

Conventional inspection aimed at approving or condemning a completed batch of output. The only thing of interest was the product and whether it conformed. Action would be taken on the batch. Deming called the application of statistics to such problems an enumerative study.

But the thing managers really need to know about is future outcomes and how they will be influenced by present decisions. There is no way of sampling the future. So sampling of the past has to go beyond mere characterisation and quantification of the outcomes. You are stuck with those and will have to take the consequences one way or another. Sampling (of the past) has to aim principally at understanding the causes of those historic outcomes. Only that enables managers to take a view on whether those causes will persist in the future, in what way they might change and how they might be adjusted. This is what Deming called an analytic study.

Essential to the ability to project data into the future is the recognition of common and special causes of variation. Only when managers are confident in thinking and speaking in those terms will their organisations have a sound basis for action. Then it becomes apparent that the results of inspection represent the occult interaction of inherent variation with threshold effects. Inspection obscures the distinction between common and special causes. It seduces the unwary into misguided action that exacerbates quality problems and reputational damage. It obscures the sad truth that, as Terry Weight put it, a disappointment is not necessarily a surprise.

The programme

  1. Drive out sensitivity to variation at the design stage.
  2. Routinely measure the inputs whose variation threatens product performance.
  3. Measure product performance too. Your bounded rationality may have led you to get (2) wrong.
  4. No need to measure every unit. We are trying to understand the cause system not segregate items.
  5. Plot data on a process behaviour chart (see the sketch after this list).
  6. Stabilise the system.
  7. Establish capability.
  8. Keep on measuring to maintain stability and improve capability.
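A minimal sketch of step 5, following the usual individuals (“XmR”) calculation of process behaviour chart limits; the readings are invented:

```python
# Invented process readings.
data = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.2]

moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
centre = sum(data) / len(data)
mr_bar = sum(moving_ranges) / len(moving_ranges)

# 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two observations.
upper, lower = centre + 2.66 * mr_bar, centre - 2.66 * mr_bar
print(f"centre={centre:.2f} limits=({lower:.2f}, {upper:.2f})")

# Points outside the limits signal special causes worth investigating;
# points inside are routine common-cause variation.
print([x for x in data if not lower <= x <= upper] or "no special causes")
```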

Some people think they have absorbed Deming’s thinking, mastered it even. Yet the test is the extent to which they are able to analyse problems in terms of common and special causes of variation. Is that the language that their organisation uses to communicate exceptions and business performance, and to share analytics, plans, successes and failures?

There has always been some distaste for Deming’s thinking among those who consider it cold, statistically driven and paralysed by data. But the data are only a means to getting beyond the emotional reaction to those two impostors: triumph and disaster. The language of common and special causes is a profound tool for building engagement, fostering communication and sharing understanding. Above that, it is the only sound approach to business measurement.

Trust in data – III – being honest about honesty

I found this presentation by Dan Ariely intriguing. I suspect that this is originally a TED talk with some patronising cartoons added. You can just listen.

When I started off in operational excellence learning about the Deming philosophy, my instructors always used to say These are honest men’s [sic] tools. From that point of view Ariely’s presentation is pretty pessimistic. I don’t think I am entirely surprised when I recall Matt Ridley’s summary of evolutionary psychology from his book The Origins of Virtue.

Human beings have some instincts that foster the greater good and others that foster self-interest and anti-social behaviour. We must design a society that encourages the former and discourages the latter.

When wearing a change management hat it’s easy to be sanguine about designing a system or organisation that fosters virtue and the sort of diligent data collection that confronts present reality. However, it is useful to have a toolkit of tactics to build such a system. I think Ariely’s ideas are helpful here.

His idea of “reminders” is something that resonates with maintaining a continual focus on the Voice of the Customer/Voice of the Business. Periodically exploring with data collectors the purpose of their data collection and the system-wide consequences of fabrication is something that seems worthwhile in itself. However, the work Ariely refers to suggests that there might be reasons why such a “nudge” would be particularly effective in improving data trustworthiness.

His idea of “confessions” is a little trickier. I might reflect for a while then blog some more.

Trust in data – II

I just picked up on this, now not so recent, news item about the prosecution of Steven Eaton. Eaton was gaoled for falsifying data in clinical trials. His prosecution was pursuant to the Good Laboratory Practice Regulations 1999. The Regulations apply to chemical safety assessments and come to us, in the UK, from that supra-national body the OECD. Sadly I have managed to find few details other than the press reports. I have had a look at the website of the prosecuting Medicines and Healthcare products Regulatory Agency but found nothing beyond the press release. I thought about a request under the Freedom of Information Act 2000 but wonder whether an exemption is being claimed pursuant to section 31.

It’s a shame because it would have been an opportunity to compare and contrast with another notable recent case of industrial data fabrication, that concerning BNFL and the Kansai Electric contract. Fortunately, in that case, the HSE made public a detailed report.

In the BNFL case, technicians had fabricated measurements of the diameters of fuel pellets in nuclear fuel rods, it appears principally out of boredom at doing the actual job. The customer spotted it, BNFL didn’t. The matter caused huge reputational damage to BNFL and resulted in the shipment of nuclear fuel rods, necessarily under armed escort, being turned around mid-ocean and returned to the supplier.

For me, the important lesson of the BNFL affair is that businesses must avoid a culture where employees decide what parts of the job are important and interesting to them, what is called intrinsic motivation. Intrinsic motivation is related to a sense of cognitive ease. That sense rests, as Daniel Kahneman has pointed out, on an ecology of unknown and unknowable beliefs and prejudices. No doubt the technicians had encountered nothing but boringly uniform products. They took that as a signal, and felt a sense of cognitive ease in doing so, to stop measuring and conceal the fact that they had stopped.

However, nobody in the supply chain is entitled to ignore the customer’s wishes. Businesses need to foster the extrinsic motivation of the voice of the customer. That is what defines a job well done. Sometimes it will be irksome and involve a lot of measuring pellets whose dimensions look just the same as the last batch. We simply have to get over it!

The customer wanted the data collected, not simply as a sterile exercise in box-ticking, but as a basis for diligent surveillance of the manufacturing process and as a critical component of managing the risks attendant in real world nuclear industry operations. The customer showed that a proper scrutiny of the data, exactly what they had thought that BNFL would perform as part of the contract, would have exposed its inauthenticity. BNFL were embarrassed, not only by their lack of management control of their own technicians, but by the exposure of their own incapacity to scrutinise data and act on its signal message. Even if all the pellets were of perfect dimension, the customer would be legitimately appalled that so little critical attention was being paid to keeping them so.
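One sketch of the kind of scrutiny that would have caught it, with invented pellet diameters: genuine measurements carry common-cause variation, whereas copied-down numbers are often suspiciously uniform:

```python
# Invented pellet diameters in millimetres.
genuine = [8.21, 8.19, 8.24, 8.18, 8.22, 8.25, 8.17, 8.23, 8.20, 8.26]
suspect = [8.20, 8.20, 8.20, 8.20, 8.20, 8.20, 8.20, 8.20, 8.20, 8.20]

def mean_moving_range(xs):
    # Average point-to-point movement: a simple gauge of natural variation.
    return sum(abs(b - a) for a, b in zip(xs, xs[1:])) / (len(xs) - 1)

for name, xs in [("genuine", genuine), ("suspect", suspect)]:
    mr = mean_moving_range(xs)
    print(name, f"mean moving range = {mr:.4f}")
    if mr == 0:
        print("  -> no variation at all: almost certainly not real measurement")
```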

Fabricated data will readily be exposed where data is properly scrutinised, as part of a system of objective process management and with the correct statistical tools. That is part of incentivising technicians to do the job diligently. Dishonesty must not be tolerated. However, it is essential that everybody in an organisation understands the voice of the customer and understands the particular way in which they themselves add value. A scheme of goal deployment weaves the threads of the voice of the customer together with those of individual process management tactics. That is what provides an individual’s insight into how their work adds value for the customer. That is what provides the “nudge” towards honesty.