It was 20 years ago today …

Today, 20 December 2013, marks the twentieth anniversary of the death of W Edwards Deming. Deming was a hugely influential figure in management science, in Japan during the 1950s, 1960s and 1970s, then internationally from the early 1980s until his death. His memory persists in a continuing debate about his thinking among a small and ageing sector of the operational excellence community, and in a broader reputation as a “management guru”, one of the writers who from the 1980s onwards championed and popularised the causes of employee engagement and business growth through customer satisfaction.

Deming’s training had been in mathematics and physics but in his professional life he first developed into a statistician, largely because of the influence of Walter Shewhart, an early mentor. It was fundamental to Deming’s beliefs that an organisation could only be managed effectively with widespread business measurement and trenchant statistical criticism of data. In that way he anticipated writers of a later generation such as Nate Silver and Nassim Taleb.

Since Deming’s death the operational excellence landscape has become more densely populated. In particular, lean operations and Six Sigma have variously been seen as competitors to Deming’s approach, as successors or usurpers, as complementary, as developments, or as tools or tool sets to be deployed within Deming’s business strategy. In many ways, the pragmatic development of lean and Six Sigma has exposed the discursive, anecdotal and sometimes gnomic way Deming liked to communicate. In his book Out of the Crisis: Quality, Productivity and Competitive Position (1982) minor points are expanded over whole chapters while major ideas are finessed in a few words. Having elevated the importance of measurement and a proper system for responding to data, he goes on to observe that the most important numbers are unknown and unknowable. I fear that this has often been an obstacle to managers finding the hard science in Deming.

For me, the core of Deming’s thinking remains this. There is only one game in town, the continual improvement of the alignment between the voice of the process and the voice of the customer. That improvement is achieved by the diligent use of process behaviour charts. Pursuit of that aim will collaterally reduce organisational costs.
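For readers who have not met process behaviour charts, the arithmetic behind their natural process limits is simple. Here is a minimal sketch in Python of the usual XmR (individuals) chart construction, using the conventional 2.66 scaling constant for the average moving range; the function name is my own invention, not Deming's notation.

```python
def xmr_limits(values):
    """Natural process limits for an individuals (XmR) chart.

    The limits sit at the mean plus or minus 2.66 times the average
    moving range, the conventional scaling for two-point ranges."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_mr = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * average_mr, mean, mean + 2.66 * average_mr

lower, centre, upper = xmr_limits([10, 12, 11, 13, 12])
```

A point outside the limits is a signal worth investigating; points inside are noise, the common cause variation of a stable process.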

Deming pursued the idea further. He asked what kind of organisation could most effectively exploit process behaviour charts. He sought philosophical justifications for successful heuristics. It is here that his writing became more difficult to accept for many people. In his last book, The New Economics for Industry, Government, Education, he trespassed on broader issues usually reserved to politics and social science, areas in which he was poorly qualified to contribute. The problem with Deming’s later work is that where it is new, it is not economics, and where it is economics, it is not new. It is this part of his writing that has tended to attract a few persistent followers. What is sad about Deming’s continued following is the lack of challenge. Every seminal thinker’s works are subject to repeated criticism, re-evaluation and development. Not simply development by accumulation but development by revision, deletion and synthesis. It is here that Deming’s memory is badly served. At the top of the page is a link to Deming’s Wikipedia entry. It is disturbing that everything is stated as though a settled and triumphant truth, a treatment that contrasts with the fact that his work is now largely ignored in mainstream management. Managers have found in lean and Six Sigma systems they could implement, even if only partially. In Deming they have not.

What Deming deserves, now that a generation, a global telecommunications system and a world wide web separate us from him, is a robust criticism and challenge of his work. The statistical thinking at the heart is profound. For me, the question of what sort of organisation is best placed to exploit that thinking remains open. Now is the time for the re-evaluation because I believe that out of it we can join in reaching new levels of operational excellence.


Trouble at the EU

I enjoy Metro, the UK’s free national morning newspaper. It has a very straightforward, non-partisan style. This morning there was an article dealing with the European Union’s (EU’s) accounting difficulties. There were a couple of very telling admissions from an EU bureaucrat. We lawyers love an admission.

Aidas Palubinskas, from the European Court of Auditors, … described the error rate as ‘relatively stable from year to year’.

He admits that the EU’s accounting is a stable system of trouble. That is a system where there is only common cause variation, variation common to the whole of the output, but where the system is still incapable of reliably delivering what the customer wants. Recognising that one is embedded in such a problem is the first step towards operational improvement. W Edwards Deming addressed the implications of the stable system and the strategy for its improvement at length in his seminal book Out of the Crisis (1982). The problems are not intractable but the solution demands leadership and adoption of the correct improvement approach.

Unfortunately, the second half of the quote is less encouraging.

He said the errors highlighted in its report were ‘examples of inefficiency, but not necessarily of waste’.

This makes me fear that the correct approach is far off for the EU. Everything that is not efficient, timely and effective delivery of what the customer wants is waste, what Toyota calls muda. Waste represents the scope of opportunity for improvement, for improving service and simultaneously reducing its cost. The first step in improvement is taken by accepting that waste is not inevitable and that it can be incrementally eliminated through use of appropriate tools under competent leadership.

The next step to improvement is to commit to the discipline of eliminating waste progressively. That requires leadership. That sort of leadership is often found in successful organisations. The EU, however, faces particular difficulties as an international bureaucracy with a multi-partisan political master and a democratically disengaged public. It is not easy to see where leadership will come from. This is a common problem of state bureaucracies.

Palubinskas is right to seek to analyse the problems as a stable system of trouble. However, beyond that, the path to radical improvement lies in rejecting the casual acceptance of waste and in committing to continual improvement of every process for delivery of service.

Suicide statistics for British railways

I chose a prosaic title because it’s not a subject about which levity is appropriate. I remain haunted by this cyclist on the level crossing. As a result I thought I would delve a little into railway accident statistics. The data is here. Unfortunately, the data only goes back to 2001/2002. This is a common feature of government data. There is no long term continuity in measurement to allow proper understanding of variation, trends and changes. All this encourages the “executive time series” that are familiar in press releases. I think that I shall call this political amnesia. When I have more time I shall look for a longer time series. The relevant department is usually helpful if contacted directly.

However, while I was searching I found this recent report on Railway Suicides in the UK: risk factors and prevention strategies. The report is by Kamaldeep Bhui and Jason Chalangary of the Wolfson Institute of Preventive Medicine, and Edgar Jones of the Institute of Psychiatry, King’s College, London. Originally, I didn’t intend to narrow my investigation to suicides but there were some things in the paper that bothered me and I felt were worth blogging about.

Obviously this is really important work. No civilised society is indifferent to tragedies such as suicide, whose consequences are absorbed deeply into the community. The report analyses a wide base of theories and interventions concerning railway suicide risk. There is a lot of information and the authors have done an important job in bringing it together and seeking conclusions. However, I was bothered by this passage (at p5).

The Rail Safety and Standards Board (RSSB) reported a progressive rise in suicides and suspected suicides from 192 in 2001-02 to a peak 233 in 2009-10, the total falling to 208 in 2010-11.

Oh dear! An “executive time series”. Let’s look at the data on a process behaviour chart.

[Process behaviour chart: railway suicides and suspected suicides, 2001/02 to 2011/12]

There is no signal, even ignoring the last observation, for 2011/12, which the authors did not have to hand. There has been no increasing propensity for suicide since 2001. The writers have been, as Nassim Taleb would put it, “fooled by randomness”. In the words of Nate Silver, they have confused signal and noise. The common cause variation in the data has been over-interpreted by zealous and well-meaning policy makers as an upward trend. However, all diligent risk managers know that interpreting a chart is forbidden where there is no signal. Over-interpretation leads to (well-meaning) over-adjustment and the admixture of even more variation into a stable system of trouble.
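To make the “no signal” judgement concrete, here is the calculation sketched in Python. Only three of the annual figures (192, 233 and 208) appear in the text, so the remaining numbers below are invented stand-ins; the method, not the data, is the point.

```python
# Annual totals: 192 (2001/02), 233 (2009/10) and 208 (2010/11) are from
# the report; the other figures are invented for illustration.
counts = [192, 211, 205, 219, 214, 201, 222, 209, 233, 208]

mean = sum(counts) / len(counts)
moving_ranges = [abs(b - a) for a, b in zip(counts, counts[1:])]
average_mr = sum(moving_ranges) / len(moving_ranges)
upper = mean + 2.66 * average_mr
lower = mean - 2.66 * average_mr

# A signal is a point outside the natural process limits.
signals = [c for c in counts if not lower <= c <= upper]
print(signals)  # → [] : the "peak" of 233 and the "fall" to 208 are noise
```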

Looking at the development of the data over time I can understand that there will have been a temptation to perform a regression analysis and calculate a p-value for the perceived slope. This is an approach to avoid in general. It is beset with the dangers of testing effects suggested by the data and the general criticisms of p-values made by McCloskey and Ziliak. It is not a method that will be a reliable guide to future action. For what it’s worth I got a p-value of 0.015 for the slope but I am not impressed. I looked to see if I could find a pattern in the data then tested for the pattern my mind had created. It is unsurprising that it was “significant”.
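The danger of testing a pattern suggested by the data can be shown with a small simulation. This sketch (my own illustration, assuming SciPy is available, and nothing to do with the report’s actual method) generates pure-noise “annual” series, then computes the proportion reaching p < 0.05 among only those series that already look like a trend.

```python
import random
from scipy.stats import linregress

random.seed(42)
n, trials = 11, 2000   # eleven annual observations, pure noise, no trend
x = list(range(n))
naive_hits = screened_hits = screened_total = 0

for _ in range(trials):
    y = [random.gauss(0, 1) for _ in range(n)]
    fit = linregress(x, y)
    if fit.pvalue < 0.05:
        naive_hits += 1
    # only test the series whose plot already "suggests" a slope
    if abs(fit.rvalue) > 0.4:
        screened_total += 1
        if fit.pvalue < 0.05:
            screened_hits += 1

print(naive_hits / trials)             # near the nominal 0.05
print(screened_hits / screened_total)  # several times larger
```

Screening by eye before testing quietly conditions the p-value on the very pattern being tested, which is why the second proportion is inflated.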

The authors of the report go on to interpret the two figures for 2009/2010 (233 suicides) and 2010/2011 (208 suicides) as a “fall in suicides”. It is clear from the process behaviour chart that this is not a signal of a fall in suicides. It is simply noise, common cause variation from year to year.

Having misidentified this as a signal they go on to seek a cause. Of course they “find” a potential cause. A partnership between Network Rail and the Samaritans, Men on the Ropes, had started in January 2010. The programme’s aim was to reduce suicides by 20% over five years. I genuinely hope that the programme shows success. However, the programme will not be assisted by thinking that it has yet shown signs of improvement.

With the current mean annual total at 211, a 20% reduction entails a new mean of 169 annual suicides. That is an ambitious target, I think, and I want to emphasise that the programme is entirely laudable and plausible. However, whether it succeeds is to be judged by the figures on the process behaviour chart, not by any post hoc rationalisation. This is the tough discipline of the charts. It is no longer possible to claim an improvement where that is not supported by the data.

I will come back to this data next year and look to see if there are any signs of encouragement.

Trust in data – IV – trusting the team

Today (20 November 2013) I was reading an item in The Times (London) with the headline “We fiddle our crime numbers, admit police”. This is a fairly unedifying business.

The blame is once again laid at the door of government targets and performance related pay. I fear that this is akin to blaming police corruption on the largesse of criminals. If only organised crime would stop offering bribes, the police would not succumb to taking them in consideration of repudiating their office as constable, so the argument might run (pace Brian Joiner). Of course, the argument is nonsense. What we expect of police constables is honesty even, perhaps especially, when temptation presents itself. We expect the police to give truthful evidence in court, to deal with the public fairly and to conduct their investigations diligently and rationally. The public expects the police to behave in this way even in the face of manifest temptation to do otherwise. The public expects the same honest approach to reporting their performance. I think Robert Frank put it well in Passions within Reason.

The honest individual … is someone who values trustworthiness for its own sake. That he might receive a material payoff for such behaviour is beyond his concern. And it is precisely because he has this attitude that he can be trusted in situations where his behaviour cannot be monitored. Trustworthiness, provided it is recognizable, creates valuable opportunities that would not otherwise be available.

Matt Ridley put it starkly in his overview of evolutionary psychology, The Origins of Virtue. He wasn’t speaking of policing in particular.

The virtuous are virtuous for no other reason than that it enables them to join forces with others who are virtuous, for mutual benefit.

What worried me most about the article was a remark from Peter Barron, a former detective chief superintendent in the Metropolitan Police. Should any individual challenge the distortion of data:

You are judged to be not a team player.

“Teamwork” can be a smokescreen for the most appalling bullying. In our current corporate cultures, to be branded as “not a team player” can be the most horrible slur, smearing the individual’s contribution to the overall mission. One can see how such an environment can allow a team’s behaviours and objectives to become misaligned from those of the parent organisation. That is a problem that can often be addressed by management with a proper system of goal deployment.

However, the problem is more severe when the team is in fact well aligned to what are distorted organisational goals. The remedies for this lie in the twin processes of governance and whistleblowing. Neither seems to be working very well in UK policing at the moment but that simply leaves an opportunity for process improvement. Work is underway. The English law of whistleblowing has been amended this year. If you aren’t familiar with it you can find it here.

Governance has to take scrutiny of data seriously. Reported performance needs to be compared with other sources of data. Reporting and recording processes need themselves to be assessed. Where there is no coherent picture questions need to be asked.

Adoption statistics for England – signals of improvement?

I am adopted so I follow the politics of adoption fairly carefully. I was therefore interested to see this report on the BBC, claiming a “record” increase in adoptions. The quotation marks are the BBC’s. The usual meaning of such quotes is that the word “record” is not being used with its usual meaning. I note that the story was repeated in several newspapers this morning.

The UK government were claiming a 15% increase in children adopted from local authority care over the last year and the highest total since data had been collected on this basis starting in 1992.

Most people will, I think, recognise what Don Wheeler calls an executive time series. A comparison of two numbers ignoring any broader historical trends or context. Of course, any two consecutive numbers will be different. One will be greater than the other. Without the context that gives rise to the data, a comparison of two numbers is uninformative.
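A run chart supplies that context. Even without natural process limits, a simple run rule separates a plausible shift from a pair of cherry-picked numbers. The sketch below uses the common convention of eight consecutive points on one side of the median; the rule, the threshold and the function name are conventions and inventions of mine, not anything from the BBC story.

```python
import statistics

def run_signal(values, run_length=8):
    """Crude run-chart check: a run of `run_length` or more consecutive
    points on one side of the median is read as evidence of a shift."""
    med = statistics.median(values)
    longest = current = 0
    last_side = 0
    for v in values:
        side = (v > med) - (v < med)
        if side == 0:
            continue  # points on the median neither extend nor break a run
        if side == last_side:
            current += 1
        else:
            current, last_side = 1, side
        longest = max(longest, current)
    return longest >= run_length

# Two consecutive numbers always differ; a run chart asks for more.
print(run_signal([1, 9, 2, 8, 3, 7, 4, 6]))                   # False
print(run_signal(list(range(1, 9)) + list(range(100, 108))))  # True
```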

I decided to look at the data myself by following the BBC link to the GOV.UK website. I found a spreadsheet there but only with data from 2009 to 2013. I dug around a little more and managed to find 2006 to 2008. However, the website told me that to find any earlier data I would have to consult the National Archives. At the same time it told me that the search function at the National Archives did not work. I ended up browsing 30 web pages of Department of Education documents and managed to get figures back to 2004. However, when I tried to browse back beyond documents dated January 2008, I got “Sorry, the page you were looking for can’t be found” and an invitation to use the search facility. Needless to say, I failed to find the missing data back to 1992, there or on the Office for National Statistics website. It could just be my internet search skills that are wanting but I spent an hour or so on this.

Gladly, Justin Ushie and Julie Glenndenning from the Department for Education were able to help me and provided much of the missing data. Many thanks to them both. Unfortunately, even they could not find the data for 1992 and 1993.

Here is the run chart.

[Run chart: adoptions of children from local authority care, England]

Some caution is needed in interpreting this chart because there is clearly some substantial serial correlation in the annual data. That said, I am not able to quite persuade myself that the 2013 figure represents a signal. Things look much better than the mid-1990s but 2013 still looks consistent with a system that has been stable since the early years of the century.

The mid 1990s is a long time ago so I also wanted to look at adoptions as a percentage of children in care. I don’t think that that is automatically a better measure but I wanted to check that it didn’t yield a different picture.

[Run chart: adoptions as a percentage of children in care, England]

That confirms the improvement since the mid-1990s but the 2013 figures now look even less remarkable against the experience base of the rest of the 21st century.

I would like to see these charts with all the interventions and policy changes of respective governments marked. That would then properly set the data in context and assist interpretation. There would be an opportunity to build a narrative, add natural process limits and come to a firmer view about whether there was a signal. Sadly, I have not found an easy way of building a chronology of intervention from government publications.

Anyone holding themselves out as having made an improvement must bring forward the whole of the relevant context for the data. That means plotting data over time and flagging background events. It is only then that the decision maker, or citizen, can make a proper assessment of whether there has been an improvement. The simple chart of data against time, even without natural process limits, is immensely richer than a comparison of two selected numbers.

Properly capturing context is the essence of data visualization and the beginnings of graphical excellence.

One of my favourite slogans:

In God we trust. All else bring data.

W Edwards Deming

I plan to come back to this data in 2014.

The graph of doom – one year on

I recently came across the chart (sic) below on this web site.

[The “graph of doom”: London Borough of Barnet’s projected social services spending against total budget]

It’s apparently called the “graph of doom”. It first came to public attention in May 2012 in the UK newspaper The Guardian. It purports to show how the London Borough of Barnet’s spending on social services will overtake the Borough’s total budget some time around 2022.

At first sight the chart doesn’t offend too much against the principles of graphical excellence as set down by Edward Tufte in his book The Visual Display of Quantitative Information. The bars could probably have been better replaced by lines and that would have saved some expensive, coloured non-data ink. That is a small quibble.

The most puzzling thing about the chart is that it shows very little data. I presume that the figures for 2010/11 are actuals. The 2011/12 figures may be provisional. But the rest of the area of the chart shows predictions. There is a lot of ink on this chart showing predictions and very little showing actual data. Further, the chart does not distinguish, graphically, between actual data and predictions. I worry that that might lend the dramatic picture more authority than it is really entitled to. The visible trend lies wholly in the predictions.

Some past history would have exposed variation in both funding and spending and enabled the viewer to set the predictions in that historical context. A chart showing a converging trend of historical data projected into the future is more impressive than a chart showing historical stability with all the convergence found in the future prediction. This chart does not tell us which is the actual picture.

Further, I suspect that this is not the first time the author has made a prediction of future funds or demand. What would interest me, were I in the position of decision maker, is some history of how those predictions have performed in the past.

We are now more than one year on from the original chart and I trust that the 2012/13 data is now available. Perhaps the authors have produced an updated chart but it has not made its way onto the internet.

The chart shows hardly any historical data. Such data would have been useful to a decision maker. The ink devoted to predictions could have been saved. All that was really needed was to say that spending was projected to exceed total income around 2022. Some attempt at quantifying the uncertainty in that prediction would also have been useful.

Graphical representations of data carry a potent authority. Unfortunately, when on the receiving end of most PowerPoint presentations we don’t have long to deconstruct them. We invest a lot of trust in the author of a chart, trusting that it can be taken at face value. That ought to be the chart’s function: to communicate the information in the data efficiently, and as dramatically as the data and its context justify.

I think that the following principles can usefully apply to the charting of predictions and forecasts.

  • Use ink on data rather than speculation.
  • Ditto for chart space.
  • Chart predictions using a distinctive colour or symbol so as to be less prominent than measured data.
  • Use historical data to set predictions in context.
  • Update chart as soon as predictions become data.
  • Ensure everybody who got the original chart gets the updated chart.
  • Leave the prediction on the updated chart.

The last point is what really sets predictions in context.
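In charting terms the principles are easy to honour. This sketch (Python with matplotlib; the figures are invented, loosely in the spirit of the Barnet chart, and the file name is mine) draws measured data with a solid line and filled markers, and the projection with a dashed line and open markers, so speculation cannot masquerade as measurement.

```python
import matplotlib
matplotlib.use("Agg")  # render to file, no display needed
import matplotlib.pyplot as plt

# Invented figures for illustration: a short actual history, then a projection.
years_actual = [2009, 2010, 2011, 2012]
spend_actual = [240, 251, 247, 255]
years_pred = [2012, 2014, 2016, 2018, 2020, 2022]
spend_pred = [255, 268, 282, 297, 313, 330]

fig, ax = plt.subplots()
# measured data: solid line, filled markers, visually dominant
ax.plot(years_actual, spend_actual, "ko-", label="actual spend")
# prediction: dashed line, open markers, visually subordinate
ax.plot(years_pred, spend_pred, "k--", marker="o",
        markerfacecolor="none", label="projection")
ax.set_xlabel("year")
ax.set_ylabel("spend (£m)")
ax.legend()
fig.savefig("spend_projection.png")
```

When the 2013 actuals arrive, they join the solid series while the old dashed projection stays on the chart, which is what sets the prediction in context.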

Note: I have tagged this post “Data visualization”, adopting the US spelling which I feel has become standard English.

Late-night drinking laws saved lives

That was the headline in The Times (London) on 19 August 2013. The copy went on:

“Hundreds of young people have escaped death on Britain’s roads after laws were relaxed to allow pubs to open late into the night, a study has found.”

It was accompanied by a chart.

[Chart from The Times: “How death toll fell”]

This conclusion was apparently based on a report detailing work led by Dr Colin Green at Lancaster University Management School. The report is not on the web but Lancaster were very kind in sending me a copy and I extend my thanks to them for the courtesy.

This is very difficult data to analyse. Any search for a signal has to be interpreted against a sustained fall in recorded accidents involving personal injury that goes back to the 1970s and is well illustrated in the lower part of the graphic (see here for data). The base accident data is therefore manifestly not stable and predictable. To draw inferences we need to be able to model the long term trend in a persuasive manner so that we can eliminate its influence and work with a residual data sequence amenable to statistical analysis.

It is important to note, however, that the authors had good reason to believe that relaxation of licensing laws may have an effect so this was a proper exercise in Confirmatory Data Analysis.

Reading the Lancaster report I learned that The Times graphic is composed of five-month moving averages. I do not think that I am attracted by that as a graphic. Shewhart’s Second Rule of Data Presentation is:

Whenever an average, range or histogram is used to summarise observations, the summary must not mislead the user into taking any action that the user would not take if the data were presented in context.

I fear that moving averages will always obscure the message in the data. I preferred this chart from the Lancaster report. The upper series are for England, the lower for Scotland.

[Scatter chart from the Lancaster report: monthly data, England (upper series) and Scotland (lower series)]

Now we can see the monthly observations. Subjectively there looks to be, at least in some years, some structure of variation throughout the year. That is unsurprising but it does ruin all hope of justifying an assumption of “independent identically distributed” residuals. Because of that alone, I feel that the use of p-values here is inappropriate, the usual criticisms of p-values in general notwithstanding (see the advocacy of Stephen Ziliak and Deirdre McCloskey).

As I said, this is very tricky data from which to separate signal and noise. Because of the patterned variation within any year I think that there is not much point in analysing other than annual aggregates. The analysis that I would have liked to have seen would have been a straight line regression through the whole of the annual data for England. There may be an initial disappointment that that gives us “less data to play with”. However, considering the correlation within the intra-year monthly figures, a little reflection confirms that there is very little sacrifice of real information. I’ve had a quick look at the annual aggregates for the period under investigation and I can’t see a signal. The analysis could be taken further by calculating an R². That could then be compared with an R² calculated for the Lancaster bi-linear “change point” model. Is the extra variation explained worth having for the extra parameters?
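The mechanics of that comparison are easy to sketch. The data below is synthetic, a steady decline plus noise with no change point, and the knot year is arbitrary; nothing here comes from the Lancaster report. Because the straight line is nested within the bi-linear model, the bi-linear R² can only be at least as large, which is why the extra variation explained has to be weighed against the extra parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic annual figures: a steady decline plus noise, no change point
years = np.arange(2000, 2013)
y = 500.0 - 8.0 * (years - 2000) + rng.normal(0, 15, years.size)

def r_squared(y, fitted):
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Straight-line regression
coef = np.polyfit(years, y, 1)
r2_line = r_squared(y, np.polyval(coef, years))

# Bi-linear "change point" model: an extra slope is allowed after 2005
knot = np.maximum(years - 2005, 0)
X = np.column_stack([np.ones(years.size), years - 2000, knot])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2_bilinear = r_squared(y, X @ beta)

# The nested straight line can never explain more variation than the
# bi-linear model, even when, as here, there is no real change point.
print(round(r2_line, 3), round(r2_bilinear, 3))
```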

I see that the authors calculated an R² of 42%. However, that includes accounting for the difference between the English and Scottish data, which is the dominant variation in the data set. I’m not sure what the Scottish data adds here other than to inflate R².

There might also be an analysis approach by taking out the steady long term decline in injuries using a LOWESS curve then looking for a signal in the residuals.
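That approach can also be sketched. The series below is synthetic (a slow decline plus noise, nothing from the actual casualty data), and the smoothing fraction is an arbitrary choice; the sketch assumes statsmodels is available for its LOWESS implementation.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(7)
# Synthetic monthly series: a slow long-term decline plus noise
t = np.arange(120)
y = 300.0 - 0.5 * t + rng.normal(0, 10, t.size)

# Take out the long-term decline with a LOWESS curve ...
smoothed = lowess(y, t, frac=0.4, return_sorted=False)
residuals = y - smoothed

# ... then look for a signal in the residuals with XmR-style limits
moving_ranges = np.abs(np.diff(residuals))
half_width = 2.66 * moving_ranges.mean()
centre = residuals.mean()
outside = int(np.sum((residuals > centre + half_width) |
                     (residuals < centre - half_width)))
```

With pure noise around the trend, `outside` should be at or near zero; a cluster of residuals beyond the limits would be the signal worth explaining.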

What that really identifies are three ways of trying to cope with the long term declining trend, which is a nuisance in this analysis: straight line regression, straight line regression with “change point”, and LOWESS. If they don’t yield the same conclusions then any inference has to be treated with great caution. Inevitably, any signal is confounded with lack of stability and predictability in the long term trend.

I comment on this really to highlight the way the press use graphics without explaining what they mean. I intend no criticism of the Lancaster team as this is very difficult data to analyse. Of course, the most important conclusion is that there is no signal that the relaxation in licensing resulted in an increase in accidents. I trust that my alternative world view will be taken constructively.