Science journal bans p-values

Interesting news here that psychology journal Basic and Applied Social Psychology (BASP) has banned the use of p-values in the academic research papers that it will publish in the future.

The dangers of p-values are widely known though their use seems to persist in any number of disciplines, from the Higgs boson to climate change.

There has been some wonderful recent advocacy deprecating p-values, from Deirdre McCloskey and Regina Nuzzo among others. BASP editor David Trafimow has indicated that the journal will not now publish formal hypothesis tests (of the Neyman-Pearson type) or confidence intervals purporting to support experimental results. I presume that appeals to “statistical significance” are proscribed too. Trafimow has no dogma as to what people should do instead but is keen to encourage descriptive statistics. That is good news.

However, Trafimow does say something that worries me.

… as the sample size increases, descriptive statistics become increasingly stable and sampling error is less of a problem.

It is trite statistics that merely increasing sample size, in the sense of the raw number of observations, is no guarantee of reducing sampling error. If the sample is not rich enough to capture all the relevant sources of variation then data is amassed in vain. A common example is that of inter-laboratory studies of analytical techniques. A researcher who takes 10 observations from Laboratory A and 10 from Laboratory B really only has two observations, at least as far as the really important and dominant sources of variation are concerned. Increasing the number of observations to 100 from each laboratory would simply be a waste of resources.
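The arithmetic can be sketched in a few lines of Python. The variance figures are hypothetical, chosen so that between-laboratory variation dominates within-laboratory repeatability:

```python
import random

random.seed(42)

def lab_study(n_per_lab, between_sd=5.0, within_sd=1.0):
    """Simulate a two-laboratory study and return the grand mean.

    Each laboratory has its own systematic offset (between-laboratory
    variation); every observation within a laboratory shares that offset.
    """
    lab_means = []
    for _ in range(2):  # two laboratories
        offset = random.gauss(0, between_sd)
        obs = [offset + random.gauss(0, within_sd) for _ in range(n_per_lab)]
        lab_means.append(sum(obs) / len(obs))
    return sum(lab_means) / 2

def sd_of_grand_mean(n_per_lab, trials=2000):
    """Standard deviation of the study result over many repetitions."""
    results = [lab_study(n_per_lab) for _ in range(trials)]
    mean = sum(results) / trials
    return (sum((r - mean) ** 2 for r in results) / trials) ** 0.5

# Going from 10 to 100 observations per laboratory barely moves the
# uncertainty, which is dominated by having only two laboratories.
print(sd_of_grand_mean(10))
print(sd_of_grand_mean(100))
```

Under these assumptions the sampling error is pinned near the between-laboratory level however many observations each laboratory contributes.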

But that is not all there is to it. Sampling error only addresses how well we have represented the sampling frame. In any reasonably interesting statistics, and certainly in any attempt to manage risk, we are only interested in the future. The critical question before we can engage in any, even tentative, statistical inference is “Is the data representative of the future?”. That requires that the data has the statistical property of exchangeability. Some people prefer the more management-oriented term “stable and predictable”. That’s why I wished Trafimow hadn’t used the word “stable”.

Assessment of stability and predictability is fundamental to any prediction or data based management. It demands confident use of process-behaviour charts and trenchant scrutiny of the sources of variation that drive the data. It is the necessary starting point of all reliable inference. A taste for p-values is a major impediment to clear thinking on the matter. They do not help. It would be encouraging to believe that scepticism was on the march but I don’t think prohibition is the best means of education.
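For illustration, here is a minimal sketch of the calculation behind an individuals (XmR) process-behaviour chart, using the conventional 2.66 scaling of the average moving range; the data are invented:

```python
def xmr_limits(values):
    """Natural process limits for an individuals (XmR) chart: the mean
    plus or minus 2.66 times the average moving range."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(a - b) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * avg_mr, mean + 2.66 * avg_mr, mean

data = [52, 49, 51, 50, 53, 48, 50, 52, 49, 51]
lo, hi, centre = xmr_limits(data)

# Points beyond the natural process limits are signals of an unstable,
# unpredictable process; none here, so no evidence against stability.
signals = [x for x in data if x < lo or x > hi]
print(f"centre {centre:.1f}, limits ({lo:.1f}, {hi:.1f}), signals: {signals}")
```

The chart itself, plotted in time order against those limits, is what supports the trenchant scrutiny of sources of variation.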

 

Deconstructing Deming IX – Break down barriers between staff areas

9. Break down barriers between staff areas.

W Edwards Deming

Something there is that doesn’t love a wall,
That wants it down!

Robert Frost
Mending Wall (1914)

Point 9 of Deming’s 14 Points. One that is always attractive to a self-describing iconoclast. Barriers must be bad if they prevent the exchange and interaction of ideas, or worse if they lead to optimisation within a subunit that suboptimises the wider system. Deming was thinking of managers such as John Browett. Browett was given charge of Apple’s retail operations and immediately started to cut staff numbers and hours in order to reduce his own budget. However, Apple’s avowed strategy is to foster reputation and brand loyalty through a distinctive, unconventional and delightfully effective Apple Store encounter. My wife is more of an enthusiast for Apple products than I, but I am always wowed by our Store visits.

I feel sorry for Browett as he was clearly left to guess the corporation’s strategy. Some organisational functions are just there because they enable the principal value streams. Without them profits would fall. Silo management is the term mockingly used to satirise a management dominated by pillars of functional expertise bolstered by professional status and mute to its “rival” silos.

Deming reminded us that somebody in a leadership position does need to maintain a synoptic view of the business system to prevent Browett-type misunderstandings.

Anybody who has been to a Deming seminar will have seen the Deming system diagram. Deming invited participants to focus on the system that created revenues for the organisation and, further, to see that system as a network of processes. Deming used the diagram to emphasise that the critical business processes transect organisational boundaries. Raw materials, whether physical or transactional, run into and out of the silos. Some processes don’t transform the raw materials but act as critical support for the supplier-customer strand. Deming argued that equipment maintenance, product development etc. are nonetheless processes transforming their own inputs into vital enablers and accelerants of the revenue generating activities.

Further, held Deming, those processes run across the external boundaries of the organisation into suppliers and customers. A manufacturer making car tyres is part of a bigger picture including the manufacture of the tyre rubber and even the way the end user drives his motor car. Only by understanding the whole can the tyre performance be optimised, customer value maximised, and growing market share and revenues realised.

Yet the power of the functions remains and is seldom mitigated by implementing process management. Process management is something with which organisations still struggle. Those who try to follow the idea of dispersing expertise into the processes frequently find that individuals embedded in cross-functional teams perform less well than within their concentrated centres of excellence. It is worth remembering how two counterbalancing forces arise.

Behaviour

Any proposed system of reward must be risk assessed against the behaviours it is likely to encourage or discourage. Managers given the job of reducing the cost of running their own silo will do just that. All managers are optimising within their own bounded rationality.

Goal deployment

One tactic that can help prevent managers from optimising their own subsystem at the expense of the greater is to adopt some system of goal deployment such as hoshin kanri. Visibility, both horizontally and vertically, of how individual results contribute to organisational goals, effected through objective supervision and strategic governance, ought to discourage suboptimisation and reveal any such trends at an early stage.

Professional expertise is important

In 1776, Scottish philosopher Adam Smith told the parable of the pin maker. Smith set out a detailed argument for the benefits of specialisation and the division of labour. The silos provide the means of rewarding the development of expertise in itself, something whose value may only be seen in the future, and of fostering the application of that expertise in management.

Deming was somewhat inimical to this idea and thought that managers should work in a variety of roles across functions as they ascended the hierarchy, as he felt they did in Japan. Yet it is critical in that environment to maintain the virtues of the silos as incubators of expertise. This is not so easily achieved.

Organisational boundaries exist for a reason

In The Democratic Corporation (1994) Russell Ackoff asked why we could not make a business out of mutually and severally co-operating individuals, each negotiating a web of personal contracts that made up the system that delivered the goods.

Nobel laureate economist Ronald Coase had already answered the question in his 1937 paper The Nature of the Firm. Coase explained why organisations are promoted and employ the people who might otherwise be a market of interacting individual contractors. It simply came down to the costs of operating such a market and the savings that could be made from making a global decision to bring some people and facilities under a single enduring roof.

Organisational and even functional boundaries often arise from subtle cost structures. Perhaps those cost structures will shift over time as more connected ways of remote working become commonplace. But it is important to analyse the forces that created and perpetuate the silos. Otherwise, it should be no surprise when the benefits of process management go unrealised.

Deconstructing Deming VIII – Drive out fear

8. Drive out fear.

W Edwards Deming

Point 8 of Deming’s 14 Points and quite my least favourite of all his slogans. As Harry Lime averred in the motion picture The Third Man:

Like the fella says, in Italy for 30 years under the Borgias they had warfare, terror, murder, and bloodshed, but they produced Michelangelo, Leonardo da Vinci, and the Renaissance. In Switzerland they had brotherly love – they had 500 years of democracy and peace, and what did that produce? The cuckoo clock.

It’s a wisecrack and not analysis but I quote Lime to remind myself that fear isn’t inevitably the debilitating sentiment that Deming made it out to be. Inspirational writer Helen Keller vividly captured an alternative reality.

Security is mostly a superstition. It does not exist in nature, nor do the children of humankind as a whole experience it. Avoiding danger is no safer in the long run than outright exposure. Life is either a daring adventure, or it is nothing at all.

In Out of the Crisis, Deming recounts several anecdotes of corrosive fear in the workplace. He directs his criticism at managers who threaten their subordinates with dire consequences for future outcomes that are, in fact, beyond the control of the workers. There is a recurring theme in Deming’s writing, and it is a good one, that many of the factors that determine an outcome are often outside the control of the person superficially held answerable. Any business process is influenced by diverse sources of variation. The aggregate of those sources determines the capability of the process and provides a fundamental bound on its future performance. An incapable process will never meet the aspirations of the business. Berating the person who works within it will never improve it because intervention is needed to re-engineer the process. Blind attempts to coax more out of an incapable process generally lead to over-adjustment and even worse outcomes.
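Deming illustrated over-adjustment with his funnel experiment. A small simulation in that spirit (assuming a stable process with unit standard deviation, and the rule-2-style tampering of compensating for each deviation) shows the effect:

```python
import random

random.seed(1)

def run(adjust, n=10000, sd=1.0, target=0.0):
    """Variance of outcomes from a stable process aimed at `target`.

    With adjust=True the operator tampers: after each outcome the
    process setting is moved back by the deviation from target.
    """
    setting = target
    outcomes = []
    for _ in range(n):
        x = setting + random.gauss(0, sd)
        outcomes.append(x)
        if adjust:
            setting -= x - target
    mean = sum(outcomes) / n
    return sum((v - mean) ** 2 for v in outcomes) / n

print(run(adjust=False))  # near 1.0: leave a stable process alone
print(run(adjust=True))   # near 2.0: tampering doubles the variance
```

Well-intentioned coaxing makes a stable but incapable process roughly twice as variable.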

However, there have to be some people in an organisation for whom “it wasn’t my fault” isn’t available as an analysis of unsatisfactory outcomes. Some people willingly and enthusiastically own the goal of re-engineering the business process, of achieving higher and higher degrees of capability, of influencing the organisation’s environment, desensitising the system to external variation, of (following Eliyahu Goldratt) bringing the constraint back inside the system, fostering radical thinking, of managing unknown and unknowable risks.

Brian Joiner used to argue that it was wishful thinking to expect a prescribed outcome next year when the responsible manager had been incapable of achieving it last year. Yet business is always a matter of resources and priorities. Typically, people do not energetically pursue objectives whose importance has not been urged upon them. They already have plenty to do. It is simply disingenuous to suggest that telling somebody that something is critical, and that they will be rewarded only for achieving it, is ultimately inexpedient.

Some people must manage and take responsibility for outcomes. They are responsible for the business system. They can change it.

There is nothing wrong in holding those who have the power to effect change responsible for outcomes.

Alternatively, some employees are responsible principally for operating a process in a disciplined and repeatable way. They are not responsible if that process is ultimately incapable but they are answerable for any lack of discipline. Their managers expect them to operate in a disciplined way, so do their co-workers. They should have no comfort that safety and security will be the consequence of failure to do their job.

Those workers will though, I fear, not be able to rest easily just because they turn up and do their job conscientiously. If management fail to take on the goal of the continual improvement of the alignment between the voice of the process and the voice of the customer then their diligence will be in vain. As business leader Ian MacGregor observed:

Management is a calling and people ought to be dedicated to it. British managers have far too much security. A poor manager should be dumped. What’s at stake is the happiness of society, not the comfort of managers.

Is data the plural of anecdote?

I seem to hear this intriguing quote everywhere these days.

The plural of anecdote is not data.

There is certainly one website that traces it back to Raymond Wolfinger, a political scientist from Berkeley, who claims to have said sometime around 1969 to 1970:

The plural of anecdote is data.

So, which is it?

Anecdote

My Concise Oxford English Dictionary (“COED”) defines “anecdote” as:

Narrative … of amusing or interesting incident.

Wiktionary gives a further alternative definition.

An account which supports an argument, but which is not supported by scientific or statistical analysis.

Edward Jenner, portrait by James Northcote

It’s clear that anecdote itself is a concept without a very exact meaning. It’s a story, not usually reported through an objective channel such as journalism, or scientific or historical research, that carries some implication of its own unreliability. Perhaps it is inherently implausible when read against objective background evidence. Perhaps it is hearsay or multiple hearsay.

The anecdote’s suspect reliability is offset by the evidential weight it promises, either as a counter example to a cherished theory or as compelling support for a controversial hypothesis. Lyall Watson’s hundredth monkey story is an anecdote. So, in eighteenth century England, was the folk wisdom, recounted to Edward Jenner (pictured), that milkmaids were generally immune to smallpox.

Data

My COED defines “data” as:

Facts or information, esp[ecially] as basis for inference.

Wiktionary gives a further alternative definition.

Pieces of information.

Again, not much help. But the principal definition in the COED is:

Thing[s] known or granted, assumption or premise from which inferences may be drawn.

The suggestion in the word “data” is that what is given is the reliable starting point from which we can start making deductions or even inductive inferences. Data carries the suggestion of reliability, soundness and objectivity captured in the familiar Arthur Koestler quote.

Without the little hard bits of marble which are called “facts” or “data” one cannot compose a mosaic …

Yet it is common knowledge that “data” cannot always be trusted. Trust in data is a recurring theme in this blog. Cyril Burt’s purported data on the heritability of IQ is a famous case. There are legions of others.

Smart investigators know that the provenance, reliability and quality of data cannot be taken for granted but must be subject to appropriate scrutiny. The modern science of Measurement Systems Analysis (“MSA”) has developed to satisfy this need. The defining characteristic of anecdote is that it has been subject to no such scrutiny.
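As a toy illustration of the kind of scrutiny MSA formalises, the hypothetical gauge study below (all figures invented) splits the variance we observe into repeatability, the noise of the measurement system itself, and everything else:

```python
# Hypothetical gauge study: three parts, each measured three times
# on the same instrument by the same operator.
measurements = {
    "part A": [10.1, 10.0, 10.2],
    "part B": [12.4, 12.5, 12.4],
    "part C": [11.0, 10.9, 11.1],
}

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Repeatability: pooled within-part variance, variation from the gauge alone.
repeatability_var = sum(variance(v) for v in measurements.values()) / len(measurements)

# Total variance of all readings lumped together, gauge plus product.
all_readings = [x for v in measurements.values() for x in v]
total_var = variance(all_readings)

print(f"measurement noise is {100 * repeatability_var / total_var:.1f}% of observed variance")
```

A full gauge R&R study adds reproducibility across operators, but the principle is the same: quantify how much of the data is the measurement process rather than the thing measured.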

Evidence

Anecdote and data, as broadly defined above, are both forms of evidence. All evidence is surrounded by a penumbra of doubt and unreliability. Even the most exacting engineering measurement is accompanied by a recognition of its uncertainty and the limitations that places on its use and the inferences that can be drawn from it. In fact, it is exactly because such a measurement comes accompanied by a numerical characterisation of its precision and accuracy that its reliability and usefulness are validated.

It seems inherent in the definition of anecdote that it should not be taken at face value. Happenstance or wishful fabrication, it may not be a reliable basis for inference or, still less, action. However, it was Jenner’s attention to the smallpox story that led him to develop vaccination against smallpox. No mean outcome. Against that, the hundredth monkey story is mere fantastical fiction.

Anecdotes about dogs sniffing out cancer stand at the beginning of the journey of confirmation and exploitation.

Two types of analysis

Part of the answer to the dilemma comes from statistician John Tukey’s observation that there are two kinds of data analysis: Exploratory Data Analysis (“EDA”) and Confirmatory Data Analysis (“CDA”).

EDA concerns the exploration of all the available data in order to suggest some interesting theories. As economist Ronald Coase put it:

If you torture the data long enough, it will confess.

Once a concrete theory or hypothesis comes to mind, a rigorous process of data generation allows formal statistical techniques to be brought to bear (“CDA”) in separating the signal in the data from the noise and in testing the theory. People who muddle up EDA and CDA tend to get into difficulties. It is a foundation of statistical practice to understand the distinction and its implications.
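The muddle can be made concrete with a small simulation (all numbers invented): trawling pure noise for a striking correlation stands in for EDA, and demanding fresh data under the suggested hypothesis stands in for CDA:

```python
import random

random.seed(7)

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

n, n_features = 50, 200
outcome = [random.gauss(0, 1) for _ in range(n)]
features = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]

# EDA: trawl pure noise for the feature best correlated with the outcome.
best = max(range(n_features), key=lambda j: abs(corr(features[j], outcome)))
print(f"exploratory correlation: {corr(features[best], outcome):.2f}")

# CDA: collect fresh data for the suggested hypothesis and test again;
# the striking correlation was noise and is unlikely to reappear.
fresh_outcome = [random.gauss(0, 1) for _ in range(n)]
fresh_feature = [random.gauss(0, 1) for _ in range(n)]
print(f"confirmatory correlation: {corr(fresh_feature, fresh_outcome):.2f}")
```

Treating the exploratory correlation as if it were a confirmed result is exactly the confession extracted by torture.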

Anecdote may be well suited to EDA. That’s how Jenner successfully proceeded though his CDA of testing his vaccine on live human subjects wouldn’t get past many ethics committees today.

However, absent that confirmatory CDA phase, the beguiling anecdote may be no more than the wrecker’s false light.

A basis for action

Tukey’s analysis is useful for the academic or the researcher in an R&D department where the environment is not dynamic and time not of the essence. Real life is more problematic. There is not always the opportunity to carry out CDA. The past does not typically repeat itself so that we can investigate outcomes with alternative factor settings. As economist Paul Samuelson observed:

We have but one sample of history.

History is the only thing that we have any data from. There is no data on the future. Tukey himself recognised the problem and coined the phrase uncomfortable science for inferences from observations whose repetition was not feasible or practical.

In his recent book Strategy: A History (Oxford University Press, 2013), Lawrence Freedman points out the risks of managing by anecdote “The Trouble with Stories” (pp615-618). As Nobel laureate psychologist Daniel Kahneman has investigated at length, our interpretation of anecdote is beset by all manner of cognitive biases such as the availability heuristic and base rate fallacy. The traps for the statistically naïve are perilous.

But it would be a fool who would ignore all evidence that could not be subjected to formal validation. With a background knowledge of statistical theory and psychological biases, it is possible to manage trenchantly. Bayes’ theorem suggests that all evidence has its value.
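As a toy sketch of that Bayesian point, with wholly hypothetical likelihoods, each weak anecdote nudges a sceptical prior without ever being decisive on its own:

```python
def posterior(prior, p_given_true, p_given_false):
    """Bayes' theorem: probability the hypothesis is true after one
    piece of supporting evidence."""
    numerator = p_given_true * prior
    return numerator / (numerator + p_given_false * (1 - prior))

# A weak anecdote: only somewhat more likely to be told if the
# hypothesis is true (0.6) than if it is false (0.4).
p = 0.10  # sceptical prior
for i in range(5):
    p = posterior(p, 0.6, 0.4)
    print(f"after anecdote {i + 1}: belief = {p:.3f}")
```

Five such anecdotes move belief from 10% to around 46%: evidence with value, but well short of confirmation.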

I think that the rather prosaic answer to the question posed at the head of this blog is that data is the plural of anecdote, as it is the singular, but anecdotes are not the best form of data. They may be all you have in the real world. It would be wise to have the sophistication to exploit them.

Bad Statistics I – the phantom line

I came across this chart on the web recently.

Scatter chart: national life expectancy against per capita health spending, 2011

This really is one of my pet hates: a perfectly informative scatter chart with a meaningless straight line drawn on it.

The scatter chart is interesting. Each individual blot represents a nation state. Its vertical position represents national average life expectancy. I take that to be mean life expectancy at birth, though it is not explained in terms. The horizontal axis represents annual per capita health spending, though there is no indication as to whether that is adjusted for purchasing power. The whole thing is a snapshot from 2011. The message I take from the chart is that Hungary and Mexico, and I think two smaller blots, represent special causes; they are outside the experience base represented by the balance of the nations. As to the other nations, the chart suggests that average life expectancy doesn’t depend very strongly on health spending.

Of course, there is much more to a thorough investigation of the impact of health spending on outcomes. The chart doesn’t reveal differential performance as to morbidity, or lost hours, or a host of important economic indicators. But it does put forward that one, slightly surprising, message that longevity is not enhanced by health spending. Or at least it wasn’t in 2011 and there is no explanation as to why that year was isolated.

The question is then as to why the author decided to put the straight line through it. As the chart “helpfully” tells me it is a “Linear Trend line”. I guess (sic) that this is a linear regression through the blots, possibly with some weighting as to national population. I originally thought that the size of the blot was related to population but there doesn’t seem to be enough variation in the blot sizes. It looks like there are only two sizes of blot and the USA (population 318.5 million) is the same size as Norway (5.1 million).

The difficulty here is that I can see that the two special cause nations, Hungary and Mexico, have very high leverage. That means that they have a large impact on where the straight line goes, because they are so unusual as observations. The impact of those two atypical countries drags the straight line down to the left and exaggerates the impact that spending appears to have on longevity. It really is an unhelpful straight line.
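A small sketch with invented numbers in the spirit of the chart shows the leverage effect on an ordinary least-squares slope:

```python
def ols(points):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sxy / sxx
    return my - b * mx, b

# Most nations cluster with little dependence of longevity on spend ...
cluster = [(3000, 80.1), (3500, 80.4), (4000, 79.8), (4500, 80.6),
           (5000, 80.0), (5500, 80.3), (6000, 79.9), (8500, 78.7)]
# ... while two low-spend, low-longevity outliers sit far to the left.
outliers = [(900, 74.5), (1100, 75.0)]

_, b_cluster = ols(cluster)
_, b_all = ols(cluster + outliers)
print(f"slope without outliers: {b_cluster:.5f} years per dollar")
print(f"slope with outliers:    {b_all:.5f} years per dollar")
```

Two atypical points, sitting far from the mass of the data, are enough to flip an essentially flat relationship into an apparently convincing upward trend line.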

These lines seem to appear a lot. I think that is because of the ease with which they can be generated in Excel. They are an example of what statistician Edward Tufte called chartjunk. They simply clutter the message of the data.

Of course, the chart here is a snapshot, not a video. If you do want to know how to use scatter charts to explain life expectancy then you need to learn here from the master, Hans Rosling.

There are no lines in nature, only areas of colour, one against another.

Edouard Manet