Populism and p-values

Time for The Guardian to get the bad data-journalism award for this headline (25 February 2019).

Vaccine scepticism grows in line with rise of populism – study

Surges in measles cases map tightly to countries where populism is on the march

The report was a journalistic account of a paper by Jonathan Kennedy of the Global Health Unit, Centre for Primary Care and Public Health, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, entitled Populist politics and vaccine hesitancy in Western Europe: an analysis of national-level data.1

Studies show a strong correlation between votes for populist parties and doubts that vaccines work, declared the newspaper, relying on support from the following single chart redrawn by The Guardian‘s own journalists.
PV Fig 1
It seemed to me there was more to the chart than the newspaper report. Is it possible this was all based on an uncritical regression line? Like this (hand drawn line but, believe me, it barely matters – you get the idea).
PV Fig 2
Perhaps there was a p-value too but I shall come back to that. However, looking back at the raw chart, I wondered if it wouldn’t bear a different analysis. The 10 countries, Portugal, …, Denmark and the UK all have “vaccine hesitancy” rates around 6%. That does not vary much with populist support varying between 0% and 27% between them. Again, France, Greece and Germany all have “hesitancy” rates of around 17%, such rate not varying much with populist support varying from 25% to 45%. In fact the Guardian journalist seems to have had that though too. The two groups are colour coded on the chart. So much for the relationship between “populism” and “vaccine hesitancy”. Austria seems not to fit into either group but perhaps that makes it interesting. France has three times the hesitancy of Denmark but is less “populist”.

So what about this picture?
PV Fig 3
Perhaps there are two groups, one with “hesitancy” around 6% and one, around 17%. Austria is an interesting exception. What differences are there between the two groups, aside from populist sentiment? I don’t know because it’s not my study or my domain of knowledge. But, whenever there is a contrast between two groups we ought to explore all the differences we can think of before putting forward an, even tentative, explanation. That’s what Ignaz Semmelweis did when he observed signal differences in post-natal mortality between two wards of the Vienna General Hospital in the 1840s.2 Austria again, coincidentally. He investigated all the differences that he could think of between the wards before advancing and testing his theory of infection. As to the vaccine analysis, we already suspect that there are particular dynamics in Italy around trust in bureaucracy. That’s where the food scare over hormone-treated beef seems to have started so there may be forces at work that make it atypical of the other countries.3, 4, 5

Slightly frustrated, I decided that I wanted to look at the original publication. This was available free of charge on the publisher’s website at the time I read The Guardian article. But now it isn’t. You will have to pay EUR 36, GBP 28 or USD 45 for 24 hour access. Perhaps you feel you already paid.

The academic research

The “populism” data comes from votes cast in the 2014 elections to the European Parliament. That means that the sampling frame was voters in that election. Turnout in the election was 42%. That is not the whole of the population of the respective countries and voters at EU elections are, I think I can safely say, are not representative of the population at large. The “hesitancy” data came from something called the Vaccine Confidence Project (“the VCP”) for which citations are given. It turns out that 65,819 individuals were sampled across 67 countries in 2015. I have not looked at details of the frame, sampling, handling of non-responses, adjustments and so on, but I start off by noting that the two variables are sampled from, inevitably, different frames and that is not really discussed in the paper. Of course here, We make no mockery of honest ad-hockery.6

The VCP put a number of questions to the respondents. It is not clear from the paper whether there were more questions than analysed here. Each question was answered “strongly agree”, “tend to agree”, “do not know”, “tend to disagree”, and “strongly disagree”. The “hesitancy” variable comes from the aggregate of the latter two categories. I would like to have seen the raw data.

The three questions are set out below, along with the associated R2s from the regression analysis.

Question R2
(1) Vaccines are important for children to have 63%
(2) Overall I think vaccines are effective 52%
(3) Overall I think vaccines are safe. 25%

Well the individual questions raise issues about variation in interpreting the respective meanings, notwithstanding translation issues between languages, fidelity and felicity.

As I guessed, there were p-values but, as usual, they add nil to the analysis.

We now see that the plot reproduced in The Guardian is for question (2) alone and has  R2 = 52%. I would have been interested in seeing R2 for my 2-level analysis. The plotted response for question (1) (not reproduced here) actually looks a bit more like a straight line and has better fit. However, in both cases, I am worried by how much leverage the Italy group has. Not discussed in the paper. No regression diagnostics.

So how about this picture, from Kennedy’s paper, for the response to question (3)?

PV Fig 5

Now, the variation in perceptions of vaccine safety, between France, Greece and Italy, is greater than between the remainder countries. Moreover, if anything, among that group, there is evidence that “hesitancy” falls as “populism” increases. There is certainly no evidence that it increases. In my opinion, that figure is powerful evidence that there are other important factors at work here. That is confirmed by the lousy R2 = 25% for the regression. And this is about perceptions of vaccine safety specifically.

I think that the paper also suffers from a failure to honour John Tukey’s trenchant distinction between exploratory data analysis and confirmatory data analysis. Such a failure always leads to trouble.

Confirmation bias

On the basis of his analysis, Kennedy felt confident to conclude as follows.

Vaccine hesitancy and political populism are driven by similar dynamics: a profound distrust in elites and experts. It is necessary for public health scholars and actors to work to build trust with parents that are reluctant to vaccinate their children, but there are limits to this strategy. The more general popular distrust of elites and experts which informs vaccine hesitancy will be difficult to resolve unless its underlying causes—the political disenfranchisement and economic marginalisation of large parts of the Western European population—are also addressed.

Well, in my opinion that goes a long way from what the data reveal. The data are far from from being conclusive as to association between “vaccine hesitancy” and “populism”. Then there is the unsupported assertion of a common causation in “political disenfranchisement and economic marginalisation”. While the focus remains there, the diligent search for other important factors is ignored and devalued.

We all suffer from a confirmation bias in favour of our own cherished narratives.7 We tend to believe and share evidence that we feel supports the narrative and ignore and criticise that which doesn’t. That has been particularly apparent over recent months in the energetic, even enthusiastic, reporting as fact of the scandalous accusations made against Nathan Phillips and dubious allegations made by Jussie Smollett. They fitted a narrative.

I am as bad. I hold to the narrative that people aren’t very good with statistics and constantly look for examples that I can dissect to “prove” that. Please tell me when you think I get it wrong.

Yet, it does seem to me the that the author here, and The Guardian journalist, ought to have been much more critical of the data and much more curious as to the factors at work. In my view, The Guardian had a particular duty of candour as the original research is not generally available to the public.

This sort of selective analysis does not build trust in “elites and experts”.

References

  1. Kennedy, J (2019) Populist politics and vaccine hesitancy in Western Europe: an analysis of national-level data, Journal of Public Health, ckz004, https://doi.org/10.1093/eurpub/ckz004
  2. Semmelweis, I (1860) The Etiology, Concept, and Prophylaxis of Childbed Fever, trans. K Codell Carter [1983] University of Wisconsin Press: Madison, Wisconsin
  3.  Kerr, W A & Hobbs, J E (2005). “9. Consumers, Cows and Carousels: Why the Dispute over Beef Hormones is Far More Important than its Commercial Value”, in Perdikis, N & Read, R, The WTO and the Regulation of International Trade. Edward Elgar Publishing, pp 191–214
  4. Caduff, L (August 2002). “Growth Hormones and Beyond” (PDF). ETH Zentrum. Archived from the original (PDF) on 25 May 2005. Retrieved 11 December 2007.
  5. Gandhi, R & Snedeker, S M (June 2000). “Consumer Concerns About Hormones in Food“. Program on Breast Cancer and Environmental Risk Factors. Cornell University. Archived from the original on 19 July 2011.
  6. I J Good
  7. Kahneman, D (2011) Thinking, Fast and Slow, London: Allen Lane, pp80-81
Advertisements

Is data the plural of anecdote?

I seem to hear this intriguing quote everywhere these days.

The plural of anecdote is not data.

There is certainly one website that traces it back to Raymond Wolfinger, a political scientist from Berkeley, who claims to have said sometime around 1969 to 1970:

The plural of anecdote is data.

So, which is it?

Anecdote

My Concise Oxford English Dictionary (“COED”) defines “anecdote” as:

Narrative … of amusing or interesting incident.

Wiktionary gives a further alternative definition.

An account which supports an argument, but which is not supported by scientific or statistical analysis.

Edward Jenner by James Northcote.jpg

It’s clear that anecdote itself is a concept without a very exact meaning. It’s a story, not usually reported through an objective channel such as a journalism, or scientific or historical research, that carries some implication of its own unreliability. Perhaps it is inherently implausible when read against objective background evidence. Perhaps it is hearsay or multiple hearsay.

The anecdote’s suspect reliability is offset by the evidential weight it promises, either as a counter example to a cherished theory or as compelling support for a controversial hypothesis. Lyall Watson’s hundredth monkey story is an anecdote. So, in eighteenth century England, was the folk wisdom, recounted to Edward Jenner (pictured), that milkmaids were generally immune to smallpox.

Data

My COED defines “data” as:

Facts or impormation, esp[ecially] as basis for inference.

Wiktionary gives a further alternative definition.

Pieces of information.

Again, not much help. But the principal definition in the COED is:

Thing[s] known or granted, assumption or premise from which inferences may be drawn.

The suggestion in the word “data” is that what is given is the reliable starting point from which we can start making deductions or even inductive inferences. Data carries the suggestion of reliability, soundness and objectivity captured in the familiar Arthur Koestler quote.

Without the little hard bits of marble which are called “facts” or “data” one cannot compose a mosaic …

Yet it is common knowledge that “data” cannot always be trusted. Trust in data is a recurring theme in this blog. Cyril Burt’s purported data on the heritability of IQ is a famous case. There are legions of others.

Smart investigators know that the provenance, reliability and quality of data cannot be taken for granted but must be subject to appropriate scrutiny. The modern science of Measurement Systems Analysis (“MSA”) has developed to satisfy this need. The defining characteristic of anecdote is that it has been subject to no such scrutiny.

Evidence

Anecdote and data, as broadly defined above, are both forms of evidence. All evidence is surrounded by a penumbra of doubt and unreliability. Even the most exacting engineering measurement is accompanied by a recognition of its uncertainty and the limitations that places on its use and the inferences that can be drawn from it. In fact, it is exactly because such a measurement comes accompanied by a numerical characterisation of its precision and accuracy, that  its reliability and usefulness are validated.

It seems inherent in the definition of anecdote that it should not be taken at face value. Happenstance or wishful fabrication, it may not be a reliable basis for inference or, still less, action. However, it was Jenner’s attention to the smallpox story that led him to develop vaccination against smallpox. No mean outcome. Against that, the hundredth monkey storey is mere fantastical fiction.

Anecdotes about dogs sniffing out cancer stand at the beginning of the journey of confirmation and exploitation.

Two types of analysis

Part of the answer to the dilemma comes from statistician John Tukey’s observation that there are two kinds of data analysis: Exploratory Data Analysis (“EDA”) and Confirmatory Data Analysis (“CDA”).

EDA concerns the exploration of all the available data in order to suggest some interesting theories. As economist Ronald Coase put it:

If you torture the data long enough, it will confess.

Once a concrete theory or hypothesis is to mind, a rigorous process of data generation allows formal statistical techniques to be brought to bear (“CDA”) in separating the signal in the data from the noise and in testing the theory. People who muddle up EDA and CDA tend to get into difficulties. It is a foundation of statistical practice to understand the distinction and its implications.

Anecdote may be well suited to EDA. That’s how Jenner successfully proceeded though his CDA of testing his vaccine on live human subjects wouldn’t get past many ethics committees today.

However, absent that confirmatory CDA phase, the beguiling anecdote may be no more than the wrecker’s false light.

A basis for action

Tukey’s analysis is useful for the academic or the researcher in an R&D department where the environment is not dynamic and time not of the essence. Real life is more problematic. There is not always the opportunity to carry out CDA. The past does not typically repeat itself so that we can investigate outcomes with alternative factor settings. As economist Paul Samuelson observed:

We have but one sample of history.

History is the only thing that we have any data from. There is no data on the future. Tukey himself recognised the problem and coined the phrase uncomfortable science for inferences from observations whose repetition was not feasible or practical.

In his recent book Strategy: A History (Oxford University Press, 2013), Lawrence Freedman points out the risks of managing by anecdote “The Trouble with Stories” (pp615-618). As Nobel laureate psychologist Daniel Kahneman has investigated at length, our interpretation of anecdote is beset by all manner of cognitive biases such as the availability heuristic and base rate fallacy. The traps for the statistically naïve are perilous.

But it would be a fool who would ignore all evidence that could not be subjected to formal validation. With a background knowledge of statistical theory and psychological biases, it is possible to manage trenchantly. Bayes’ theorem suggests that all evidence has its value.

I think that the rather prosaic answer to the question posed at the head of this blog is that data is the plural of anecdote, as it is the singular, but anecdotes are not the best form of data. They may be all you have in the real world. It would be wise to have the sophistication to exploit them.

Late-night drinking laws saved lives

That was the headline in The Times (London) on 19 August 2013. The copy went on:

“Hundreds of young people have escaped death on Britain’s roads after laws were relaxed to allow pubs to open late into the night, a study has found.”

It was accompanied by a chart.

How death toll fell

This conclusion was apparently based on a report detailing work led by Dr Colin Green at Lancaster University Management School. The report is not on the web but Lancaster were very kind in sending me a copy and I extend my thanks to them for the courtesy.

This is very difficult data to analyse. Any search for a signal has to be interpreted against a sustained fall in recorded accidents involving personal injury that goes back to the 1970s and is well illustrated in the lower part of the graphic (see here for data). The base accident data is therefore manifestly not stable and predictable. To draw inferences we need to be able to model the long term trend in a persuasive manner so that we can eliminate its influence and work with a residual data sequence amendable to statistical analysis.

It is important to note, however, that the authors had good reason to believe that relaxation of licensing laws may have an effect so this was a proper exercise in Confirmatory Data Analysis.

Reading the Lancaster report I learned that The Times graphic is composed of five-month moving averages. I do not think that I am attracted by that as a graphic. Shewhart’s Second Rule of Data Presentation is:

Whenever an average, range or histogram is used to summarise observations, the summary must not mislead the user into taking any action that the user would not take if the data were presented in context.

I fear that moving-averages will always obscure the message in the data. I preferred this chart from the Lancaster report. The upper series are for England, the lower for Scotland.

Drink Drive scatter

Now we can see the monthly observations. Subjectively there looks to be, at least in some years, some structure of variation throughout the year. That is unsurprising but it does ruin all hope of justifying an assumption of “independent identically distributed” residuals. Because of that alone, I feel that the use of p-values here is inappropriate, the usual criticisms of p-values in general notwithstanding (see the advocacy of Stephen Ziliak and Deirdre McCloskey).

As I said, this is very tricky data from which to separate signal and noise. Because of the patterned variation within any year I think that there is not much point in analysing other than annual aggregates. The analysis that I would have liked to have seen would have been a straight line regression through the whole of the annual data for England. There may be an initial disappointment that that gives us “less data to play with”. However, considering the correlation within the intra-year monthly figures, a little reflection confirms that there is very little sacrifice of real information. I’ve had a quick look at the annual aggregates for the period under investigation and I can’t see a signal. The analysis could be taken further by calculating an R2. That could then be compared with an R2 calculated for the Lancaster bi-linear “change point” model. Is the extra variation explained worth having for the extra parameters?

I see that the authors calculated an R2 of 42%. However, that includes accounting for the difference between English and Scottish data which is the dominant variation in the data set. I’m not sure what the Scottish data adds here other than to inflate R2.

There might also be an analysis approach by taking out the steady long term decline in injuries using a LOWESS curve then looking for a signal in the residuals.

What that really identifies are three ways of trying to cope with the long term declining trend, which is a nuisance in this analysis: straight line regression, straight line regression with “change point”, and LOWESS. If they don’t yield the same conclusions then any inference has to be treated with great caution. Inevitably, any signal is confounded with lack of stability and predictability in the long term trend.

I comment on this really to highlight the way the press use graphics without explaining what they mean. I intend no criticism of the Lancaster team as this is very difficult data to analyse. Of course, the most important conclusion is that there is no signal that the relaxation in licensing resulted in an increase in accidents. I trust that my alternative world view will be taken constructively.