Here are all the published opinion polls for the June 2017 UK general election, plotted as a Shewhart chart.
The Conservative lead over Labour had been pretty constant at 16% from February 2017, after May’s Lancaster House speech. The initial Natural Process Limits (“NPLs”) on the chart extend back to that date. Then something odd happened in the polls around Easter. There were several polls above the upper NPL. That does not seem to fit with any surrounding event. Article 50 had been declared two weeks before and had had no real immediate impact.
I suspect that the “fugue state” around Easter was reflected in the respective parties’ private polling. It is possible that public reaction to the election announcement somehow locked in the phenomenon for a short while.
Things then seem to settle down to the 16% lead level again. However, the local election results at the bottom of the range of polls ought to have sounded some alarm bells. Local election results are not a reliable predictor of general elections but this data should not have felt very comforting.
Then the slide in lead begins. But when exactly? A lot of commentators have assumed that it was the badly received Conservative Party manifesto that started the decline. It is not possible to be definitive from the chart but it is certainly arguable that it was the leak of the Labour Party manifesto that started to shift voting intention.
Then the swing from Conservative to Labour continued unabated to polling day.
How did the individual pollsters fair? I have, somewhat arbitrarily, summarised all polls conducted in the 10 days before the election (29 May to 7 June). Here is the plot along with the actual popular poll result which gave a 2.5% margin of Conservative over Labour. That is the number that everybody was trying to predict.
The red points are the surveys from the 5 days before the election (3 to 7 June). Visually, they seem to be no closer, in general, than the other points (6 to 10 days before). The vertical lines are just an aid for the eye in grouping the points. The absence of “closing in” is confirmed by looking at the mean squared error (MSE) (in %2) for the points over 10 days (31.1) and 5 days (34.8). There is no evidence of polls closing in on the final result. The overall Shewhart chart certainly doesn’t suggest that.
Taking the polls over the 10 day period, then, here is the performance of the pollsters in terms of MSE. Lower MSE is better.
Norstat and Survation pollsters will have been enjoying bonuses on the morning after the election. There are a few other commendable performances.
I should also mention the YouGov model (the green line on the Shewhart chart) that has an MSE of 2.25. YouGov conduct web-based surveys against at huge data base or around 50,000 registered participants. They also collect, with permission, deep demographic data on those individuals concerning income, profession, education and other factors. There is enough published demographic data from the national census to judge whether that is a representative frame from which to sample.
YouGov did not poll and publish the raw, or even adjusted, voting intention. They used their poll to construct a model, perhaps a logistic regression or an artificial neural network, they don’t say, to predict voting intention from demographic factors. They then input into that model, not their own demographic data but data from the national census. That then gave their published forecast. I have to say that this looks about the best possible method for eliminating sampling frame effects.
It remains to be seen how widely this approach is adopted next time.