Is data the plural of anecdote?

I seem to hear this intriguing quote everywhere these days.

The plural of anecdote is not data.

At least one website traces the saying back to Raymond Wolfinger, a political scientist at Berkeley, who claims to have said, sometime around 1969 or 1970:

The plural of anecdote is data.

So, which is it?

Anecdote

My Concise Oxford English Dictionary (“COED”) defines “anecdote” as:

Narrative … of amusing or interesting incident.

Wiktionary gives a further alternative definition.

An account which supports an argument, but which is not supported by scientific or statistical analysis.

[Image: Edward Jenner by James Northcote]

It’s clear that anecdote itself is a concept without a very exact meaning. It’s a story, not usually reported through an objective channel such as journalism or scientific or historical research, that carries some implication of its own unreliability. Perhaps it is inherently implausible when read against objective background evidence. Perhaps it is hearsay or multiple hearsay.

The anecdote’s suspect reliability is offset by the evidential weight it promises, either as a counterexample to a cherished theory or as compelling support for a controversial hypothesis. Lyall Watson’s hundredth monkey story is an anecdote. So, in eighteenth-century England, was the folk wisdom, recounted to Edward Jenner (pictured), that milkmaids were generally immune to smallpox.

Data

My COED defines “data” as:

Facts or information, esp[ecially] as basis for inference.

Wiktionary gives a further alternative definition.

Pieces of information.

Again, not much help. But the principal definition in the COED is:

Thing[s] known or granted, assumption or premise from which inferences may be drawn.

The word “data” suggests that what is given is a reliable starting point from which we can make deductions or even inductive inferences. Data carries the suggestion of reliability, soundness and objectivity captured in the familiar Arthur Koestler quote.

Without the little hard bits of marble which are called “facts” or “data” one cannot compose a mosaic …

Yet it is common knowledge that “data” cannot always be trusted. Trust in data is a recurring theme in this blog. Cyril Burt’s purported data on the heritability of IQ is a famous case. There are legions of others.

Smart investigators know that the provenance, reliability and quality of data cannot be taken for granted but must be subject to appropriate scrutiny. The modern science of Measurement Systems Analysis (“MSA”) has developed to satisfy this need. The defining characteristic of anecdote is that it has been subject to no such scrutiny.

Evidence

Anecdote and data, as broadly defined above, are both forms of evidence. All evidence is surrounded by a penumbra of doubt and unreliability. Even the most exacting engineering measurement is accompanied by a recognition of its uncertainty and of the limits that places on its use and on the inferences that can be drawn from it. In fact, it is precisely because such a measurement comes with a numerical characterisation of its precision and accuracy that its reliability and usefulness are validated.
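To make “a numerical characterisation of its precision and accuracy” concrete, here is a minimal Python sketch of a simplified bias-and-repeatability check, one of the simplest building blocks of MSA and not a full gauge R&R study. The readings and the 10.000 mm reference value are hypothetical, and `characterise_gauge` is an illustrative name rather than a library function.

```python
import math
import statistics


def characterise_gauge(readings: list[float], reference: float) -> dict[str, float]:
    """Summarise repeated readings of a reference standard whose true value is known.

    A simplified bias/repeatability check, not a full gauge R&R study.
    """
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)  # precision: spread of repeat readings (needs >= 2 values)
    return {
        "mean": mean,
        "bias": mean - reference,  # accuracy: systematic offset from the reference value
        "repeatability_sd": sd,
        "std_error": sd / math.sqrt(len(readings)),  # uncertainty in the mean itself
    }


# Hypothetical readings (mm) of a gauge block certified at 10.000 mm
readings = [10.002, 9.998, 10.003, 10.001, 9.999, 10.002]
for name, value in characterise_gauge(readings, reference=10.000).items():
    print(f"{name}: {value:.4f}")
```

The point is that the measurement arrives with its own statement of doubt: a bias and a spread that tell the user how far the number can be trusted.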

It seems inherent in the definition of anecdote that it should not be taken at face value. Happenstance or wishful fabrication, it may not be a reliable basis for inference or, still less, action. However, it was Jenner’s attention to the smallpox story that led him to develop vaccination against smallpox. No mean outcome. Against that, the hundredth monkey story is mere fantastical fiction.

Anecdotes about dogs sniffing out cancer stand at the beginning of the journey of confirmation and exploitation.

Two types of analysis

Part of the answer to the dilemma comes from statistician John Tukey’s observation that there are two kinds of data analysis: Exploratory Data Analysis (“EDA”) and Confirmatory Data Analysis (“CDA”).

EDA concerns the exploration of all the available data in order to suggest some interesting theories. As economist Ronald Coase put it:

If you torture the data long enough, it will confess.

Once a concrete theory or hypothesis comes to mind, a rigorous process of data generation allows formal statistical techniques to be brought to bear (“CDA”) in separating the signal in the data from the noise and in testing the theory. People who muddle up EDA and CDA tend to get into difficulties. It is a foundation of statistical practice to understand the distinction and its implications.
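The danger of muddling the two phases can be sketched with a small Python simulation. It is an illustration under stated assumptions, not Tukey’s own method: every candidate “effect” is pure noise (standard normal data with known variance), each is screened with a two-sided z-test at roughly the 5% level, and only the exploratory “discoveries” are then re-tested on freshly generated data. The function names and parameter values are hypothetical.

```python
import math
import random


def abs_z_of_noise(rng: random.Random, n: int) -> float:
    """Draw n standard-normal observations (pure noise) and return |z| for H0: mean = 0."""
    sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    return abs(sample_mean * math.sqrt(n))  # sigma is known to be 1, so z = mean * sqrt(n)


def explore_then_confirm(num_hypotheses: int = 2000, n: int = 30,
                         z_crit: float = 1.96, seed: int = 1) -> tuple[int, int]:
    """Count exploratory 'hits' among null effects and how many survive re-testing."""
    rng = random.Random(seed)
    # Exploratory phase: torture the data by screening every candidate effect.
    hits = [h for h in range(num_hypotheses) if abs_z_of_noise(rng, n) > z_crit]
    # Confirmatory phase: re-test only the hits, on freshly generated data.
    confirmed = [h for h in hits if abs_z_of_noise(rng, n) > z_crit]
    return len(hits), len(confirmed)


hits, confirmed = explore_then_confirm()
print(f"exploratory 'discoveries': {hits}, survived confirmation: {confirmed}")
```

Because every hypothesis here is false, roughly 5% of them look significant in the exploratory screen, and only about 5% of those survive an independent confirmatory test. The exploratory screen was still worth running; it simply cannot stand in for confirmation.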

Anecdote may be well suited to EDA. That is how Jenner successfully proceeded, though his CDA, testing his vaccine on live human subjects, would not get past many ethics committees today.

However, absent that confirmatory phase, the beguiling anecdote may be no more than the wrecker’s false light.

A basis for action

Tukey’s analysis is useful for the academic or the researcher in an R&D department where the environment is not dynamic and time not of the essence. Real life is more problematic. There is not always the opportunity to carry out CDA. The past does not typically repeat itself so that we can investigate outcomes with alternative factor settings. As economist Paul Samuelson observed:

We have but one sample of history.

History is the only thing that we have any data from. There is no data on the future. Tukey himself recognised the problem and coined the phrase “uncomfortable science” for inferences from observations whose repetition was not feasible or practical.

In his recent book Strategy: A History (Oxford University Press, 2013), Lawrence Freedman points out the risks of managing by anecdote under the heading “The Trouble with Stories” (pp. 615-618). As Nobel laureate psychologist Daniel Kahneman has investigated at length, our interpretation of anecdote is beset by all manner of cognitive biases, such as the availability heuristic and the base rate fallacy. The traps for the statistically naïve are perilous.

But only a fool would ignore all evidence that cannot be subjected to formal validation. With a background knowledge of statistical theory and psychological biases, it is possible to manage trenchantly. Bayes’ theorem suggests that all evidence has its value.
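One way to see what “all evidence has its value” means is the odds form of Bayes’ theorem: posterior odds = prior odds × likelihood ratio, where the likelihood ratio says how much more probable the evidence is if the hypothesis is true than if it is false. The Python sketch below assumes the pieces of evidence are conditionally independent given the hypothesis, which real anecdotes often are not. The prior and likelihood ratios in the usage lines are hypothetical, and `posterior_probability` is an illustrative name.

```python
from typing import Iterable


def posterior_probability(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Update a prior probability with a sequence of likelihood ratios (odds form of Bayes).

    Assumes each piece of evidence is conditionally independent given the hypothesis.
    A likelihood ratio above 1 supports the hypothesis; below 1 counts against it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        if lr < 0:
            raise ValueError("likelihood ratios cannot be negative")
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical: a 1% prior, and anecdotes each twice as likely if the hypothesis is true
print(posterior_probability(0.01, [2.0]))      # one weak anecdote
print(posterior_probability(0.01, [2.0] * 5))  # five independent weak anecdotes
```

With these assumed numbers, one weak anecdote moves a 1% prior only to about 2%, while five independent ones move it to about 24%. Weak evidence is not worthless, but it needs either a lot of company or formal confirmation.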

I think that the rather prosaic answer to the question posed at the head of this blog is that data is the plural of anecdote, as it is the singular, but anecdotes are not the best form of data. They may be all you have in the real world. It would be wise to have the sophistication to exploit them.


M5 “fireworks crash” – risk identification and reputation management

UK readers will recall this tragic accident in November 2011, when seven people were killed and 51 injured in a crash on a fog-bound motorway.

What marked out the accident from a typical collision in fog was the suggestion that the environmental conditions had been exacerbated by smoke that had drifted onto the motorway from a fireworks display at nearby Taunton Rugby Club.

This suggestion excited a lot of press comment. Geoffrey Counsell, the fireworks professional who had been contracted to organise the event, was subsequently charged with manslaughter. The prosecutor’s allegation was that he had fallen so far below the standard of care he purportedly owed to the motorway traffic that a reasonable person would think a criminal sanction appropriate.

It is very difficult to pick out from the press exactly how this whole prosecution unravelled. First, the prosecutors resiled from the manslaughter charge, a most serious matter that in the UK can attract a life sentence. They substituted a charge under section 3(2) of the Health and Safety at Work etc. Act 1974 that Mr Counsell had failed “to conduct his undertaking in such a way as to ensure, so far as is reasonably practicable, that … other persons (not being his employees) who may be affected thereby are not thereby exposed to risks to their health or safety.”

There has been much commentary from judges and others on the meaning of “reasonably practicable”, but suffice it to say, for the purposes of this blog, that a self-employed person is required to make substantial effort in protecting the public. That said, the section 3 offence carries a maximum sentence of two years’ imprisonment.

The trial on the section 3(2) indictment opened on 18 November 2013. “Serious weaknesses” in the planning of the event were alleged. There were vague press reports about Mr Counsell’s risk assessment but insufficient for me to form any exact view. It does seem that he had not considered smoke drifting onto the motorway and interacting with fog to create an especial hazard to drivers.

A more worrying feature of the prosecution was the press suggestion that an expert meteorologist had based his opinion on a biased selection of the witness statements he had been given, describing which way the smoke from the fireworks display had been drifting. I have only the journalistic account of the trial, but it looks far from certain that the smoke did in fact drift towards the motorway.

In any event, on 10 December 2013, following the close of the prosecution evidence, the judge directed the jury to acquit Mr Counsell. The prosecutors had brought forward insufficient evidence against Mr Counsell for a jury reasonably to return a conviction, even without any evidence in his defence.

An individual, no matter how expert, is at a serious disadvantage in identifying novel risks. An individual’s bounded rationality will always limit the futures he can conjure and the weight that he gives to them. To be fair to Mr Counsell, he says that he did seek input from the Highways Agency, Taunton Deane Borough Council and Avon and Somerset Police, but that they did not respond. If that is the case, I am sure that those public bodies will now reflect on how they could have assisted Mr Counsell’s risk assessment the better to protect the motorists and, in fact, Mr Counsell. The judge’s finding, that this was an accident that Mr Counsell could not reasonably have foreseen, feels like a just decision.

Against that, hypothetically, had the fireworks display been staged by a household-name corporation, it would rightly have felt ashamed at not having anticipated the risk and taken any necessary steps to protect the motorway drivers. There would have been reputational damage. A sufficient risk assessment would have provided the basis for investigating whether the smoke was in fact a cause of the accident and, where appropriate, advancing a robust and persuasive rebuttal of blame.

That is the power of risk assessment. Not only is it a critical foundational element of organisational management, it provides a powerful tool in managing reputation and litigation risk. Unfortunately, unless a critical mass of expertise is dedicated to risk identification, a risk assessment is more likely to provide a predatory regulator with evidence of slipshod practice. Its absence is, of course, damning.

As a matter of good business and efficient leadership, the Highways Agency, Taunton Deane Borough Council, and Avon and Somerset Police ought to have taken Mr Counsell’s risk assessment seriously if they were aware of it. They would surely have known that they were in a better position than Mr Counsell to assess risks to motorists. Fireworks displays are tightly regulated in the UK, yet all such regulation failed to protect the public in this case. Again, I think that the regulators might look to their own role.

Organisations must be aware of external risks. Where they are not engaged with the external assessment of such risks, they are really in an oppositional situation that must be managed accordingly. Where they are engaged, the external assessments must become integrated into their own risk strategy.

It feels as though Mr Counsell has been unjustly singled out in this tragic matter. There was a rush to blame somebody, and I suspect that an availability heuristic was at work. Mr Counsell attracted attention because the alleged causation of the accident seemed so exotic and unusual: the very grounds on which the court held him blameless.

Do I have to be a scientist to assess food safety?

I saw this BBC item on the web before Christmas: “Why are we more scared of raw egg than reheated rice?” Just after Christmas seemed like a good time to blog about food safety. Actually, the link I followed asked “Are some foods more dangerous than others?”, a question that has a really easy answer.

However, understanding the characteristic risks of various foods and how most safely to prepare them is less simple. Risk theorist John Adams draws a distinction between readily identified inherent and obvious risks, and risks that can only be perceived with the help of science. Food risks fall into the latter category. As far as I can see, “folk wisdom” is no reliable guide here, even partially. The BBC article refers to risks from rice, pasta and salad vegetables which are not obvious. At the same time, in the UK at least, the risk from raw eggs is very small.

Ironically, raw eggs are one food that springs readily to British people’s minds when food risk is raised, largely owing to the folk memory of a high-profile but ill-thought-out declaration by a government minister in the 1980s. This is an example of what Amos Tversky and Daniel Kahneman called the availability heuristic: if you can think of it, it must be important.

Food safety is an environment where an individual is best advised to follow the advice of scientists. We commonly receive this filtered, even if only for accessibility, through government agencies. That takes us back to the issue of trust in bureaucracy on which I have blogged before.

I wonder whether governments are in the best position to provide such advice. It is food suppliers who suffer from the public’s misallocated fears. The egg fiasco of the 1980s had a catastrophic effect on UK egg sales. All food suppliers have an interest in a market characterised by a perception that the products are safe. The food industry is also likely to be in the best position to know what is best practice, to improve such practice, to know how to communicate it to their customers, to tailor it to their products and to provide the effective behavioural “nudges” that promote safe handling. Consumers are likely to be cynical about governments, “one size fits all” advice and cycles of academic meta-analysis.

I think there are also lessons here for organisations. Some risks are assessed on the basis of scientific analysis. It is important that the prestige of that origin is communicated to all staff who will be involved in working with risk. The danger for any organisation is that an individual employee might make a reassessment based on local data and their own self-serving emotional response. As I have blogged before, some individuals have particular difficulty in aligning themselves with the wider organisation.

Of course, individuals must also be equipped with the means of detecting when the assumptions behind the science have been violated and initiating an agile escalation so that employee, customer and organisation can be protected while a reassessment is conducted. Social media provide new ways of sharing experience. I note from the BBC article that, in the UK at least, there is no real data on the origins of food poisoning outbreaks.

So the short answer to the question at the head of this blog still turns out to be “yes”. There are some things where we simply have to rely on science if we want to look after ourselves, our families and our employees.

But even scientists are limited by their own bounded rationality. Science is a work in progress. Using that science itself as a background against which to look for novel phenomena and neglected residual effects leverages that original risk analysis into a key tool in managing, improving and growing a business.