I have already blogged about the question of whether data is the plural of anecdote. Then I recently came across the following problem in the late Richard Jeffrey’s marvellous little book Subjective Probability: The Real Thing (2004, Cambridge) and it struck me as a useful template for thinking about data and anecdotes.
The problem looks like a staple of elementary statistics practice exercises.
You are drawing a jellybean from a bag in which you know half the beans are green, all the lime flavoured ones are green and the green ones are equally divided between lime and mint flavours.
You draw a green bean. Before you taste it, what is the probability that it is lime flavoured?
A mathematically neat answer would be 50%. But what if, asked Jeffrey, when you drew the green bean you caught a whiff of mint? Or the bean was a particular shade of green that you had come to associate with “mint”. Would your probability still be 50%?
The given proportions of beans in the bag are our data. The whiff of mint or subtle colouration is the anecdote.
What use is the anecdote?
It would certainly be open to a participant in the bean problem to maintain the 50% probability derived from the data and ignore the inferential power of the anecdote. However, the anecdote is evidence that we have and, if we choose to ignore it simply because it is difficult to deal with, then we base our assessment of risk on a more restricted picture than that actually available to us.
The difficulty with the anecdote is that it does not lead to any compelling inference in the same way as do the mathematical proportions. It is easy to see how the bean proportions would give rise to a quite extensive consensus about the probability of “lime”. There would be more variety in individual responses to the anecdote, in what weight to give the evidence and in what it tended to imply.
That illustrates the tension between data and anecdote. Data tends to consensus. If there is disagreement as to its weight and relevance then the community is likely to divide into camps rather than exhibit a spectrum of views. Anecdote does not lead to such a consensus. Individuals interpret anecdotes in diverse ways and invest them with varying degrees of credence.
Yet, the person who is best at weighing and interpreting the anecdotal evidence has the advantage over the broad community who are in agreement about what the proportion data tells them. It will often be the discipline specialist who is in the best position to interpret an anecdote.
From anecdote to data
One of the things that the “mint” anecdote might do is encourage us to start collecting future data on what we smelled when a bean was drawn. A sequence of such observations, along with the actual “lime/ mint” outcome, potentially provides a potent decision support mechanism for future draws. At this point the anecdote has been developed into data.
This may be a difficult process. The whiff of mint or subtle colouration could be difficult to articulate but recognising its significance (sic) is the beginning of operationalising and sharing.
Statistician John Tukey advocated the practice of exploratory data analysis (EDA) to identify such anecdotal evidence before settling on a premature model. As he observed:
The greatest value of a picture is when it forces us to notice what we never expected to see.
Of course, the person who was able to use the single anecdote on its own has the advantage over those who had to wait until they had compelling data. Data that they share with everybody else who has the same idea.
Data or anecdote
When I previously blogged about this I had trouble in coming to any definition that distinguished data and anecdote. Having reflected, I have a modest proposal. Data is the output of some reasonably well-defined process. Anecdote isn’t. It’s not clear how it was generated.
We are not told by what process the proportion of beans was established but I am willing to wager that it was some form of counting.
If we know the process generating evidence then we can examine its biases, non-responses, precision, stability, repeatability and reproducibility. Anecdote we cannot. It is because we can characterise the measurement process, through measurement systems analysis, that we can assess its reliability and make appropriate allowances and adjustments for its limitations. An assessment that most people will agree with most of the time. Because the most potent tools for assessing the reliability of evidence are absent in the case of anecdote, there are inherent difficulties in its interpretation and there will be a spectrum of attitudes from the community.
However, having had our interest pricked by the anecdote, we can set up a process to generate data.
Borrowing strength again
Using an anecdote as the basis for further data generation is one approach to turning anecdote into reliable knowledge. There is another way.
Today in the UK, a jury of 12 found nurse Victorino Chua, beyond reasonable doubt, guilty of poisoning 21 of his patients with insulin. Two died. There was no single compelling piece of evidence put before the jury. It was all largely circumstantial. The prosecution had sought to persuade the jury that those various items of circumstantial evidence reinforced each other and led to a compelling inference.
This is a common situation in litigation where there is no single conclusive piece of data but various pieces of circumstantial evidence that have to be put together. Where these reinforce, they inherit borrowing strength from each other.
Anecdotal evidence is not really the sort of evidence we want to have. But those who know how to use it are way ahead of those embarrassed by it.
Data is the plural of anecdote, either through repetition or through borrowing.