Trust in data – IV – trusting the team

Today (20 November 2013) I was reading an item in The Times (London) with the headline “We fiddle our crime numbers, admit police”. This is a fairly unedifying business.

The blame is once again laid at the door of government targets and performance related pay. I fear that this is akin to blaming police corruption on the largesse of criminals. If only organised crime would stop offering bribes, the police would not succumb to taking them in consideration of repudiating their office as constable, so the argument might run (pace Brian Joiner). Of course, the argument is nonsense. What we expect of police constables is honesty even, perhaps especially, when temptation presents itself. We expect the police to give truthful evidence in court, to deal with the public fairly and to conduct their investigations diligently and rationally. The public expects the police to behave in this way even in the face of manifest temptation to do otherwise. The public expects the same honest approach to reporting their performance. I think Robert Frank put it well in Passions within Reason.

The honest individual … is someone who values trustworthiness for its own sake. That he might receive a material payoff for such behaviour is beyond his concern. And it is precisely because he has this attitude that he can be trusted in situations where his behaviour cannot be monitored. Trustworthiness, provided it is recognizable, creates valuable opportunities that would not otherwise be available.

Matt Ridley put it starkly in his overview of evolutionary psychology, The Origins of Virtue. He wasn’t speaking of policing in particular.

The virtuous are virtuous for no other reason than that it enables them to join forces with others who are virtuous, for mutual benefit.

What worried me most about the article was a remark from Peter Barron, a former detective chief superintendent in the Metropolitan Police. Should any individual challenge the distortion of data:

You are judged to be not a team player.

“Teamwork” can be a smokescreen for the most appalling bullying. In our current corporate cultures, to be branded as “not a team player” can be the most horrible slur, smearing the individual’s contribution to the overall mission. One can see how such an environment can allow a team’s behaviours and objectives to become misaligned with those of the parent organisation. That is a problem that can often be addressed by management with a proper system of goal deployment.

However, the problem is more severe when the team is in fact well aligned with organisational goals that are themselves distorted. The remedies for this lie in the twin processes of governance and whistleblowing. Neither seems to be working very well in UK policing at the moment, but that simply leaves an opportunity for process improvement. Work is underway. The English law of whistleblowing has been amended this year. If you aren’t familiar with it you can find it here.

Governance has to take scrutiny of data seriously. Reported performance needs to be compared with other sources of data. Reporting and recording processes need themselves to be assessed. Where there is no coherent picture, questions need to be asked.

Adoption statistics for England – signals of improvement?

I am adopted so I follow the politics of adoption fairly carefully. I was therefore interested to see this report on the BBC, claiming a “record” increase in adoptions. The quotation marks are the BBC’s. The usual meaning of such quotes is that the word “record” is not being used with its usual meaning. I note that the story was repeated in several newspapers this morning.

The UK government were claiming a 15% increase in children adopted from local authority care over the last year, and the highest total since data collection on this basis began in 1992.

Most people will, I think, recognise what Don Wheeler calls an executive time series: a comparison of two numbers that ignores any broader historical trend or context. Of course, any two consecutive numbers will be different. One will be greater than the other. Without the context that gives rise to the data, a comparison of two numbers is uninformative.

I decided to look at the data myself by following the BBC link to the GOV.UK website. I found a spreadsheet there but only with data from 2009 to 2013. I dug around a little more and managed to find 2006 to 2008. However, the website told me that to find any earlier data I would have to consult the National Archives. At the same time it told me that the search function at the National Archives did not work. I ended up browsing 30 web pages of Department for Education documents and managed to get figures back to 2004. However, when I tried to browse back beyond documents dated January 2008, I got “Sorry, the page you were looking for can’t be found” and an invitation to use the search facility. Needless to say, I failed to find the missing data back to 1992, there or on the Office for National Statistics website. It could just be my internet search skills that are wanting but I spent an hour or so on this.

Happily, Justin Ushie and Julie Glenndenning from the Department for Education were able to help me and provided much of the missing data. Many thanks to them both. Unfortunately, even they could not find the data for 1992 and 1993.

Here is the run chart.

[Figure: run chart of annual adoptions from local authority care in England]

Some caution is needed in interpreting this chart because there is clearly some substantial serial correlation in the annual data. That said, I am not able to quite persuade myself that the 2013 figure represents a signal. Things look much better than the mid-1990s but 2013 still looks consistent with a system that has been stable since the early years of the century.

The mid-1990s is a long time ago so I also wanted to look at adoptions as a percentage of children in care. I don’t think that that is automatically a better measure but I wanted to check that it didn’t yield a different picture.

[Figure: adoptions as a percentage of children in care in England, plotted over time]

That confirms the improvement since the mid-1990s but the 2013 figures now look even less remarkable against the experience base of the rest of the 21st century.

I would like to see these charts with all the interventions and policy changes of respective governments marked. That would then properly set the data in context and assist interpretation. There would be an opportunity to build a narrative, add natural process limits and come to a firmer view about whether there was a signal. Sadly, I have not found an easy way of building a chronology of intervention from government publications.
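By way of illustration, here is a minimal sketch of how natural process limits might be added to such a run chart, in the XmR style (mean ± 2.66 times the average moving range). The adoption figures in the sketch are invented placeholders rather than the real series, and the serial correlation noted above would still counsel caution in reading the limits.

```python
# Sketch only: run chart with natural process limits (XmR style).
# The figures below are invented placeholders, not the real adoption data.
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(2004, 2014)
adoptions = np.array([3700, 3650, 3600, 3300, 3200, 3300,
                      3200, 3100, 3450, 3950])            # hypothetical values

centre = adoptions.mean()
mr_bar = np.abs(np.diff(adoptions)).mean()   # average moving range
upper = centre + 2.66 * mr_bar               # conventional XmR constant
lower = centre - 2.66 * mr_bar

plt.plot(years, adoptions, marker="o")
plt.axhline(centre, linestyle="--")
plt.axhline(upper, color="red")
plt.axhline(lower, color="red")
plt.title("Annual adoptions from care (illustrative figures only)")
plt.xlabel("Year")
plt.show()
```

With the real series in place, the policy interventions could then be annotated on the same axes to build the narrative.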

Anyone holding themselves out as having made an improvement must bring forward the whole of the relevant context for the data. That means plotting data over time and flagging background events. It is only then that the decision maker, or citizen, can make a proper assessment of whether there has been an improvement. The simple chart of data against time, even without natural process limits, is immensely richer than a comparison of two selected numbers.

Properly capturing context is the essence of data visualization and the beginnings of graphical excellence.

One of my favourite slogans:

In God we trust. All others bring data.

W Edwards Deming

I plan to come back to this data in 2014.

The graph of doom – one year on

I recently came across the chart (sic) below on this web site.

[Figure: the “graph of doom” – Barnet’s projected spending on social services against its total budget]

It’s apparently called the “graph of doom”. It first came to public attention in May 2012 in the UK newspaper The Guardian. It purports to show how the London Borough of Barnet’s spending on social services will overtake the Borough’s total budget some time around 2022.

At first sight the chart doesn’t offend too much against the principles of graphical excellence as set down by Edward Tufte in his book The Visual Display of Quantitative Information. The bars could probably have been better replaced by lines and that would have saved some expensive, coloured non-data ink. That is a small quibble.

The most puzzling thing about the chart is that it shows very little data. I presume that the figures for 2010/11 are actuals. The 2011/12 figures may be provisional. But the rest of the area of the chart shows predictions. There is a lot of ink on this chart showing predictions and very little showing actual data. Further, the chart does not distinguish, graphically, between actual data and predictions. I worry that that might lend the dramatic picture more authority than it is really entitled to. The visible trend lies wholly in the predictions.

Some past history would have exposed variation in both funding and spending and enabled the viewer to set the predictions in that historical context. A chart showing a converging trend of historical data projected into the future is more impressive than a chart showing historical stability with all the convergence found in the future prediction. This chart does not tell us which is the actual picture.

Further, I suspect that this is not the first time the author has made a prediction of future funds or demand. What would interest me, were I in the position of decision maker, is some history of how those predictions have performed in the past.

We are now more than one year on from the original chart and I trust that the 2012/13 data is now available. Perhaps the authors have produced an updated chart but it has not made its way onto the internet.

The chart shows hardly any historical data. Such data would have been useful to a decision maker. The ink devoted to predictions could have been saved. All that was really needed was to say that spending was projected to exceed total income around 2022. Some attempt at quantifying the uncertainty in that prediction would also have been useful.

Graphical representations of data carry a potent authority. Unfortunately, when on the receiving end of most PowerPoint presentations we don’t have long to deconstruct them. We invest a lot of trust in the author of a chart, trusting that it can be taken at face value. That ought to be the chart’s function: to communicate the information in the data efficiently and as dramatically as the data and its context justify.

I think that the following principles can usefully apply to the charting of predictions and forecasts.

  • Use ink on data rather than speculation.
  • Ditto for chart space.
  • Chart predictions using a distinctive colour or symbol so as to be less prominent than measured data.
  • Use historical data to set predictions in context.
  • Update chart as soon as predictions become data.
  • Ensure everybody who got the original chart gets the updated chart.
  • Leave the prediction on the updated chart.

The last point is what really sets predictions in context.
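As a concrete illustration of those principles, here is a minimal matplotlib sketch using invented placeholder figures rather than Barnet’s numbers: the measured data carry the visual weight, the projection is drawn in a subordinate style, and the legend records when the projection was made so that it can be left on later, updated versions of the chart.

```python
# Sketch only: keep measured data visually dominant; mark predictions distinctly.
# All numbers are invented placeholders for illustration.
import matplotlib.pyplot as plt

actual_years = [2008, 2009, 2010, 2011, 2012]
actual_spend = [210, 215, 221, 228, 236]                  # hypothetical actuals (£m)

forecast_years = [2012, 2014, 2016, 2018, 2020, 2022]
forecast_spend = [236, 252, 268, 287, 308, 330]           # hypothetical projection (£m)

plt.plot(actual_years, actual_spend, "ko-", label="Actual spend")
plt.plot(forecast_years, forecast_spend, "x--", color="grey",
         label="Projection (made 2012)")                   # visually subordinate
plt.xlabel("Year")
plt.ylabel("Spend (£m)")
plt.legend()
plt.show()
```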

Note: I have tagged this post “Data visualization”, adopting the US spelling which I feel has become standard English.

Late-night drinking laws saved lives

That was the headline in The Times (London) on 19 August 2013. The copy went on:

“Hundreds of young people have escaped death on Britain’s roads after laws were relaxed to allow pubs to open late into the night, a study has found.”

It was accompanied by a chart.

[Chart from The Times: “How death toll fell”]

This conclusion was apparently based on a report detailing work led by Dr Colin Green at Lancaster University Management School. The report is not on the web but Lancaster were very kind in sending me a copy and I extend my thanks to them for the courtesy.

This is very difficult data to analyse. Any search for a signal has to be interpreted against a sustained fall in recorded accidents involving personal injury that goes back to the 1970s and is well illustrated in the lower part of the graphic (see here for data). The base accident data is therefore manifestly not stable and predictable. To draw inferences we need to be able to model the long term trend in a persuasive manner so that we can eliminate its influence and work with a residual data sequence amenable to statistical analysis.

It is important to note, however, that the authors had good reason to believe that relaxation of licensing laws may have an effect so this was a proper exercise in Confirmatory Data Analysis.

Reading the Lancaster report I learned that The Times graphic is composed of five-month moving averages. I do not find that attractive as a graphic. Shewhart’s Second Rule of Data Presentation is:

Whenever an average, range or histogram is used to summarise observations, the summary must not mislead the user into taking any action that the user would not take if the data were presented in context.

I fear that moving averages will always obscure the message in the data. I preferred this chart from the Lancaster report. The upper series are for England, the lower for Scotland.

[Figure: scatter plot of monthly figures from the Lancaster report – upper series England, lower series Scotland]

Now we can see the monthly observations. Subjectively there looks to be, at least in some years, some structure of variation throughout the year. That is unsurprising but it does ruin all hope of justifying an assumption of “independent identically distributed” residuals. Because of that alone, I feel that the use of p-values here is inappropriate, the usual criticisms of p-values in general notwithstanding (see the advocacy of Stephen Ziliak and Deirdre McCloskey).

As I said, this is very tricky data from which to separate signal and noise. Because of the patterned variation within any year I think that there is not much point in analysing other than annual aggregates. The analysis that I would have liked to have seen would have been a straight line regression through the whole of the annual data for England. There may be an initial disappointment that that gives us “less data to play with”. However, considering the correlation within the intra-year monthly figures, a little reflection confirms that there is very little sacrifice of real information. I’ve had a quick look at the annual aggregates for the period under investigation and I can’t see a signal. The analysis could be taken further by calculating an R². That could then be compared with an R² calculated for the Lancaster bi-linear “change point” model. Is the extra variation explained worth having for the extra parameters?
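Below is a sketch of the kind of comparison I have in mind. The casualty figures in it are invented placeholders (the real English annual aggregates are not reproduced here), and the 2005 knot for the change point is simply my assumption for the licensing relaxation.

```python
# Sketch only: straight-line fit versus bi-linear "change point" fit on
# annual aggregates. The figures below are invented placeholders.
import numpy as np

years = np.arange(2000, 2011, dtype=float)
casualties = np.array([100, 97, 95, 92, 90, 88, 85, 84, 82, 80, 78],
                      dtype=float)                          # hypothetical

def r_squared(y, fitted):
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def least_squares_fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

# Straight line: intercept + slope.
X_line = np.column_stack([np.ones_like(years), years])
r2_line = r_squared(casualties, least_squares_fit(X_line, casualties))

# Bi-linear "change point": add a hinge term at the assumed 2005 relaxation.
hinge = np.maximum(years - 2005, 0)
X_cp = np.column_stack([np.ones_like(years), years, hinge])
r2_cp = r_squared(casualties, least_squares_fit(X_cp, casualties))

print(r2_line, r2_cp)  # is the extra variation explained worth the extra parameter?
```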

I see that the authors calculated an R² of 42%. However, that includes accounting for the difference between the English and Scottish data, which is the dominant variation in the data set. I’m not sure what the Scottish data adds here other than to inflate R².

There might also be an analysis approach by taking out the steady long term decline in injuries using a LOWESS curve then looking for a signal in the residuals.
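A sketch of that approach with the statsmodels LOWESS smoother is below, again on invented placeholder figures; the point is only the shape of the method: detrend first, then examine the residuals.

```python
# Sketch only: remove the long-term decline with LOWESS, then inspect residuals.
# The figures below are invented placeholders, as before.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

years = np.arange(2000, 2011, dtype=float)
casualties = np.array([100, 97, 95, 92, 90, 88, 85, 84, 82, 80, 78],
                      dtype=float)                          # hypothetical

smoothed = lowess(casualties, years, frac=0.6, return_sorted=False)
residuals = casualties - smoothed

# Any signal from the 2005 licensing change would have to show itself in a
# run chart (or natural process limits) of these residuals.
print(np.round(residuals, 1))
```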

What that really identifies are three ways of trying to cope with the long term declining trend, which is a nuisance in this analysis: straight line regression, straight line regression with “change point”, and LOWESS. If they don’t yield the same conclusions then any inference has to be treated with great caution. Inevitably, any signal is confounded with lack of stability and predictability in the long term trend.

I comment on this really to highlight the way the press use graphics without explaining what they mean. I intend no criticism of the Lancaster team as this is very difficult data to analyse. Of course, the most important conclusion is that there is no signal that the relaxation in licensing resulted in an increase in accidents. I trust that my alternative world view will be taken constructively.

Trust in data – I

I was listening to the BBC’s election coverage on 2 May (2013) when Nick Robinson announced that UKIP supporters were five times more likely than other voters to believe that the MMR vaccine was dangerous.

I had a search on the web. The following graphic had appeared on Mike Smithson’s PoliticalBetting blog on 21 April 2013.

[Figure: PoliticalBetting bar chart of attitudes to MMR safety by voting intention]

It’s not an attractive bar chart. The bars are different colours. There is a “mean” bar that tends to make the variation look less than it is and makes the UKIP bar (next to it) look more extreme. I was, however, intrigued so I had a look for the original data which had come from a YouGov survey of 1765 respondents. You can find the data here.

Here is a summary of the salient points of the data from the YouGov website in a table which I think is less distracting than the graphic.

Voting intention       Con.   Lab.   Lib. Dem.   UKIP
No. of respondents      417    518    142         212
                          %      %      %           %
MMR safe                 99     85     84          72
MMR unsafe                1      3     12          28
Don’t know                0     12      3           0

My first question was: Where had Nick Robinson and Mike Smithson got their numbers from? It is possible that there was another survey I have not found. It is also possible that I am being thick. In any event, the YouGov data raises some interesting questions. This is an exploratory data analysis exercise. We are looking for interesting theories. I don’t think there is any doubt that there is a signal in this data. How do we interpret it? There does look to be some relationship between voting intention and attitude to public safety data.

Should anyone be tempted to sneer at people with political views other than their own, it is worth remembering that it is unlikely that anyone surveyed had scrutinised any of the published scientific research on the topic. All will have digested it, most probably at third hand, through the press, the internet, or water-cooler conversation. They may not have any clear idea of the provenance of the assurances as to the vaccination’s safety. They may not have clearly identified issues as to whether what they had absorbed was a purportedly independent scientific study or a governmental policy statement that sought to rely on the science. I suspect that most of my readers have given it no more thought.

The mental process behind the answers probably wouldn’t withstand much analysis. This would be part of Kahneman’s System 1 thinking. However, the question of how such heuristics become established is an interesting one. I suspect there is a factor here that can be labelled “trust in data”.

Trust in data is an issue we all encounter, in business and in life. How do we know when we can trust data?

A starting point for many in this debate is the often cited observation of Brian Joiner that, when presented with a numerical target, a manager has three options: manage the system so as to achieve the target; distort the system so that the target is achieved, but at the cost of performance elsewhere (possibly not on the dashboard); or simply distort the data. This, no doubt true, observation is then cited in support of the general proposition that management by numerical target is at best ineffective and at worst counterproductive. John Seddon is a particular advocate of the view that, whatever benefits may flow from management by target (and they are seldom championed with any great energy), they are outweighed by the inevitable corruption of the organisation’s data generation and reporting.

It is an unhappy view. One immediate objection is that the broader system cannot operate without targets. Unless the machine part’s diameter is between 49.99 and 50.01 mm it will not fit. Unless chlorine concentrations are below the safe limit, swimmers risk being poisoned. Unless demand for working capital is cut by 10% we will face the consequences of insolvency. Advocates of the target-free world respond that those matters can be characterised as the legitimate voice of the customer/business. It is only arbitrary targets that are corrosive.

I am not persuaded that the legitimate/arbitrary distinction is a real one, nor do I see how the distinction motivates two different kinds of behaviour. I will blog more about this later. Leadership’s urgent task is to ensure that all managers have the tools to measure present reality and work to improve it. Without knowing how much improvement is essential, a manager cannot make rational decisions about the allocation of resources. In that context, when the correct management control is exercised, improving the system is easier than cheating. I shall blog about goal deployment and Hoshin Kanri on another occasion.

Trust in data is just a factor of trust in general. In his popular book on evolutionary psychology and economics, The Origins of Virtue, Matt Ridley observes the following.

Trust is as vital a form of social capital as money is a form of actual capital. … Trust, like money, can be lent (‘I trust you because I trust the person who told me he trusts you’), and can be risked, hoarded or squandered. It pays dividends in the currency of more trust.

Within an organisation, trust in data is something for everybody to work on building collaboratively under diligent leadership. As to the public sphere, trust in data is related to trust in politicians and that may be a bigger problem to solve. It is also a salutary warning as to what happens when there is a failure of trust in leadership.