Soccer management – signal, noise and contract negotiation

Some poor data journalism here from the BBC on 28 May 2015, concerning turnover in professional soccer managers in England. “Managerial sackings reach highest level for 13 years” says the headline. A classic executive time series. What is the significance of the 13 years? Other than it being the last year with more sackings than the present.

The data was purportedly from the League Managers’ Association (LMA) and their Richard Bevan thought the matter “very concerning”. The BBC provided a chart (fair use claimed).

[Chart: MgrSackingsto201503, the BBC's chart of managerial sackings by season]

Now, I had a couple of thoughts as soon as I saw this. Firstly, why chart only back to 2005/6? More importantly, this looked to me like a stable system of trouble (for football managers) with the possible exception of this (2014/15) season’s Championship coach turnover. Personally, I detest multiple time series on a common chart unless there is a good reason for doing so. I do not think it the best way of showing variation and/or association.

Signal and noise

The first task of any analyst looking at data is to seek to separate signal from noise. Nate Silver made this point powerfully in his book The Signal and the Noise: The Art and Science of Prediction. As Don Wheeler put it: all data has noise; some data has signal.

Noise is typically the irregular aggregate of many causes. It is predictable in the same way as a roulette wheel. A signal is a sign of some underlying factor that has had so large an effect that it stands out from the noise. Signals can herald a fundamental unpredictability of future behaviour.

If we find a signal we look for a special cause. If we start assigning special causes to observations that are simply noise then, at best, we spend money and effort to no effect and, at worst, we aggravate the situation.

The Championship data

In any event, I wanted to look at the data for myself. I was most interested in the Championship data as that was where the BBC and LMA had been quick to find a signal. I looked on the LMA’s website and this is the latest data I found. The data only records dismissals up to 31 March of the 2014/15 season. There were 16. The data in the report gives the total number of dismissals for each preceding season back to 2005/6. The report separates out “dismissals” from “resignations” but does not say exactly how the classification was made. It can be ambiguous. A manager may well resign because he feels his club have themselves repudiated his contract, a situation known in England as constructive dismissal.

The BBC’s analysis included dismissals right up to the end of each season including 2014/15. Reading from the chart they had 20. The BBC have added some data for 2014/15 that isn’t in the LMA report and not given the source. I regard that as poor data journalism.

I found one source of further data at website The Sack Race. That told me that since the end of March there had been four terminations.

Manager         Club             Termination        Date
Malky Mackay    Wigan Athletic   Sacked             6 April
Lee Clark       Blackpool        Resigned           9 May
Neil Redfearn   Leeds United     Contract expired   20 May
Steve McClaren  Derby County     Sacked             25 May

As far as I can tell, “dismissals” include contract non-renewals and terminations by mutual consent. There are then a further three dismissals, not four. However, Clark left Blackpool amid some corporate chaos. That is certainly a termination that is classifiable either way. In any event, I have taken the BBC figure at face value though I am alerted as to some possible data quality issues here.

Signal and noise

Looking at the Championship data, this was the process behaviour chart, plotted as an individuals chart.

[Chart: individuals chart of Championship dismissals by season]

There is a clear signal for the 2014/15 season with an observation, 20 dismissals, above the upper natural process limit of 19.18 dismissals. Where there is a signal we should seek a special cause. There is no guarantee that we will find a special cause. Data limitations and bounded rationality are always constraints. In fact, there is no guarantee that there was a special cause. The signal could be a false positive. Such effects cannot be eliminated. However, signals efficiently direct our limited energy for what Daniel Kahneman calls System 2 thinking towards the most promising enquiries.
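For readers who want to see how natural process limits are obtained, here is a minimal sketch of the usual individuals (XmR) chart calculation: the mean plus or minus 2.66 times the average moving range. The season counts below are hypothetical, chosen only for illustration. They are not the LMA or BBC figures, so the limits will not reproduce the 19.18 quoted above, and the function name is my own.

```python
from statistics import mean

def natural_process_limits(values):
    """Return (lower, centre, upper) natural process limits for an individuals chart."""
    centre = mean(values)
    # Moving range: absolute difference between each pair of successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_moving_range = mean(moving_ranges)
    # 2.66 is the conventional scaling constant for individuals charts (3 / 1.128).
    spread = 2.66 * average_moving_range
    return centre - spread, centre, centre + spread

# HYPOTHETICAL season counts, for illustration only; not the real dismissal data.
dismissals = [10, 8, 12, 9, 11, 10, 9, 12, 10, 20]

lower, centre, upper = natural_process_limits(dismissals)
# Any observation outside the limits is treated as a signal worth investigating.
signals = [x for x in dismissals if x > upper or x < lower]
print(f"centre={centre:.2f}, upper limit={upper:.2f}, signals={signals}")
```

A point outside the limits is only a prompt to go looking for a special cause. It does not say what the cause is, or even that there is one.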

Analysis

The BBC reports one narrative woven round the data.

Bevan said the current tenure of those employed in the second tier was about eight months. And the demand to reach the top flight, where a new record £5.14bn TV deal is set to begin in 2016, had led to clubs hitting the “panic button” too quickly.

It is certainly a plausible view. I compiled a list of the dismissals and non-renewals, not the resignations, with data from Wikipedia and The Sack Race. I only identified 17 which again suggests some data quality issue around classification. I have then charted a scatter plot of date of dismissal against the club’s then league position.

[Chart: MgrSackings201415, scatter plot of date of dismissal against the club’s league position]

It certainly looks as though risk of relegation is the major driver for dismissal. Aside from that, Watford dismissed Billy McKinlay after only two games when they were third in the league, equal on points with the top two. McKinlay had been an emergency appointment after Oscar Garcia had been compelled to resign through ill health. Watford thought they had quickly found a better manager in Slavisa Jokanovic. Watford ended the season in second place and were promoted to the Premiership.

There were two dismissals after the final game on 2 May by disappointed mid-table teams. Beyond that, the only evidence for impulsive managerial changes in pursuit of promotion is the three mid-season, mid-table dismissals.

                                    Club league position
Manager        Club                 On dismissal   At end of season
Nigel Adkins   Reading              16             19
Bob Peeters    Charlton Athletic    14             12
Stuart Pearce  Nottingham Forest    12             14

A table that speaks for itself. I am not impressed by the argument that there has been the sort of increase in panic sackings that Bevan fears. Both Blackpool and Leeds experienced chaotic executive management which will have resulted in an enhanced force of mortality on their respective coaches. That along with the data quality issues and the technical matter I have described below lead me to feel that there was no great enhanced threat to the typical Championship manager in 2014/15.

Next season I would expect some regression to the mean with a lower number of dismissals. Not much of a prediction really but that’s what the data tells me. If Bevan tries to attribute that to the LMA’s activism then I fear that he will be indulging in Langian statistical analysis. Will he be able to resist?

Techie bit

I have a preference for individuals charts but I did also try plotting the data on an np-chart where I found no signal. It is trite service-course statistics that a Poisson distribution with mean λ has standard deviation √λ so an upper 3-sigma limit for a (homogeneous) Poisson process with mean 11.1 dismissals would be 21.1 dismissals. Kahneman has cogently highlighted how people tend to see patterns in data as signals even where they are typical of mere noise. In this case I am aware that the data is not atypical of a Poisson process so I am unsurprised that I failed to identify a special cause.
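The arithmetic behind that limit can be checked in a few lines. This is a minimal sketch using the mean of 11.1 from the text and assuming a homogeneous Poisson model. The function names are mine, and the tail probability is an extra illustration rather than part of the chart.

```python
import math

def poisson_upper_limit(mean_count, sigmas=3.0):
    """Upper k-sigma limit for a Poisson count: mean + k * sqrt(mean)."""
    return mean_count + sigmas * math.sqrt(mean_count)

def poisson_tail(mean_count, threshold):
    """P(X >= threshold) for X ~ Poisson(mean_count), via the complement of the lower sum."""
    below = sum(
        math.exp(-mean_count) * mean_count ** k / math.factorial(k)
        for k in range(threshold)
    )
    return 1.0 - below

limit = poisson_upper_limit(11.1)  # mean of 11.1 dismissals, as quoted in the text
observed = 20                      # the BBC's 2014/15 figure
is_signal = observed > limit
print(f"upper 3-sigma limit: {limit:.1f}, signal: {is_signal}")
print(f"P(20 or more dismissals under the model): {poisson_tail(11.1, observed):.4f}")
```

On this model 20 dismissals sits inside the limit, which is why the count-based chart shows no signal where the individuals chart did.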

A Poisson process with mean 11.1 dismissals is a pretty good model going forwards and that is the basis I would press on any managers in contract negotiations.

Of course, the clubs should remember that when they look for a replacement manager they will then take a random sample from the pool of job seekers. Really!


Anecdotes and p-values

I have been feeling guilty ever since I recently published a p-value. It led me to sit down and think hard about why I could not resist doing so and what I really think it told me, if anything. I suppose that a collateral question is to ask why I didn’t keep it to myself. To be honest, I quite often calculate p-values though I seldom let on.

It occurred to me that there was something in common between p-values and the anecdotes that I have blogged about here and here. Hence more jellybeans.

What is a p-value?

My starting data was the conversion rates of 10 elite soccer penalty takers. Each of their conversion rates was different. Leighton Baines had the best figures having converted 11 out of 11. Peter Beardsley and Emmanuel Adebayor had the superficially weakest, having converted 18 out of 20 and 9 out of 10 respectively. To an analyst that raises a natural question. Was the variation in performance between the players signal or was it noise?

In his rather discursive book The Signal and the Noise: The Art and Science of Prediction, Nate Silver observes:

The signal is the truth. The noise is what distracts us from the truth.

In the penalties data the signal, the truth, that we are looking for is Who is the best penalty taker and how good are they? The noise is the sampling variation inherent in a short sequence of penalty kicks. Take a coin and toss it 10 times. Count the number of heads. Make another 10 tosses. And a third 10. It is unlikely that you got the same number of heads but that was not because anything changed in the coin. The variation between the three counts is all down to the short sequence of tosses, the sampling variation.
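The coin experiment is easy to try without a coin. Here is a minimal sketch, assuming a fair coin simulated with Python’s standard random module. The seed is arbitrary and is there only so the run can be repeated.

```python
import random

def count_heads(tosses, rng):
    """Count heads in a run of fair coin tosses."""
    return sum(rng.random() < 0.5 for _ in range(tosses))

rng = random.Random(2015)  # arbitrary seed, for repeatability only
# Three separate runs of 10 tosses with the same, unchanging "coin".
counts = [count_heads(10, rng) for _ in range(3)]
print(counts)
```

Change the seed and the three counts change too, though nothing about the coin has changed. That difference is the sampling variation.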

In Understanding Variation: The Key to Managing Chaos, Don Wheeler observes:

All data has noise. Some data has signal.

We first want to know whether the penalty statistics display nothing more than sampling variation or whether there is also a signal that some penalty takers are better than others, some extra variation arising from that cause.

The p-value told me the probability that we could have observed the data we did had the variation been solely down to noise, 0.8%. Unlikely.

p-Values do not answer the exam question

The first problem is that p-values do not give me anything near what I really want. I want to know, given the observed data, what is the probability that penalty conversion rates are just noise. The p-value tells me the probability that, were penalty conversion rates just noise, I would have observed the data I did.

The distinction is between the probability of data given a theory and the probability of a theory given the data. It is usually the latter that is interesting. Now this may seem like a fine distinction without a difference. However, consider the probability that somebody with measles has spots. It is, I think, pretty close to one. Now consider the probability that somebody with spots has measles. Many things other than measles cause spots so that probability is going to be very much less than one. I would need a lot of information to come to an exact assessment.

In general, Bayes’ theorem governs the relationship between the two probabilities. However, practical use requires more information than I have or am likely to get. The p-values consider all the possible data that you might have got if the theory were true. It seems more rational to consider all the different theories that the actual data might support or imply. However, that is not so simple.
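To make the measles example concrete, here is a minimal sketch of Bayes’ theorem for a yes/no theory. All three input probabilities are hypothetical numbers picked for illustration and are not epidemiological estimates. They are exactly the sort of extra information that is usually missing in practice.

```python
def posterior(prior, p_data_given_theory, p_data_given_not_theory):
    """P(theory | data) by Bayes' theorem for a binary theory."""
    evidence = (
        p_data_given_theory * prior
        + p_data_given_not_theory * (1.0 - prior)
    )
    return p_data_given_theory * prior / evidence

# HYPOTHETICAL inputs, for illustration only.
p_measles = 0.001                 # assumed prior: 1 person in 1,000 has measles
p_spots_given_measles = 0.95      # "pretty close to one"
p_spots_given_no_measles = 0.05   # spots from all other causes

p_measles_given_spots = posterior(
    p_measles, p_spots_given_measles, p_spots_given_no_measles
)
print(f"P(spots | measles) = {p_spots_given_measles:.2f}")
print(f"P(measles | spots) = {p_measles_given_spots:.4f}")
```

With these made-up inputs the two conditional probabilities differ by a factor of about fifty, which is the distinction the p-value glosses over.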

A dumb question

In any event, I know the answer to the question of whether some penalty takers are better than others. Of course they are. In that sense p-values fail to answer a question to which I already know the answer. Further, collecting more and more data increases the power of the procedure (the probability that it dodges a false negative). Thus, by doing no more than collecting enough data I can make the p-value as small as I like. A small p-value may have more to do with the number of observations than it has with anything interesting in penalty kicks.
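A small sketch shows how sample size alone drives the p-value down. It assumes a hypothetical penalty taker who converts 82% against a benchmark of 80%, and uses a two-sided normal-approximation test of a single proportion. Both the rates and the choice of test are my assumptions for illustration, not the analysis from the original penalties post.

```python
import math

def two_sided_p_value(successes, trials, null_rate):
    """Normal-approximation p-value for an observed proportion against a fixed null rate."""
    observed = successes / trials
    standard_error = math.sqrt(null_rate * (1.0 - null_rate) / trials)
    z = (observed - null_rate) / standard_error
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))

# HYPOTHETICAL: the same 82% conversion rate observed over ever longer sequences.
sample_sizes = (100, 1000, 10000, 100000)
p_values = [
    two_sided_p_value(round(0.82 * n), n, null_rate=0.80) for n in sample_sizes
]
for n, p in zip(sample_sizes, p_values):
    print(f"n={n:>6}  p-value={p:.3g}")
```

The observed difference of two percentage points never changes. Only the number of kicks does, yet the p-value falls from unremarkable to vanishingly small.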

That said, what I was trying to do in the blog was to set a benchmark for elite penalty taking. As such this was an enumerative study. Of course, had I been trying to select a penalty taker for my team, that would have been an analytic study and I would have to have worried additionally about stability.

Problems, problems

There is a further question about whether the data variation arose from happenstance such as one or more players having had the advantage of weather or ineffective goalkeepers. This is an observational study not a designed experiment.

And even if I observe a signal, the p-value does not tell me how big it is. And it doesn’t tell me who is the best or worst penalty taker. As R A Fisher observed, just because we know there had been a murder we do not necessarily know who was the murderer.

E pur si muove

It seems then that individuals will have different ways of interpreting p-values. They do reveal something about the data but it is not easy to say what it is. It is suggestive of a signal but no more. There will be very many cases where there are better alternative analytics about which there is less ambiguity, for example Bayes factors.

However, in the limited case of what I might call alternative-free model criticism I think that the p-value does provide me with some insight. Just to ask the question of whether the data is consistent with the simplest of models. However, it is a similar insight to that of an anecdote: of vague weight with little hope of forming a consensus round its interpretation. I will continue to calculate them but I think it better if I keep quiet about it.

R A Fisher often comes in for censure as having done more than anyone to advance the cult of p-values. I think that is unfair. Fisher only saw p-values as part of the evidence that a researcher would have to hand in reaching a decision. He saw the intelligent use of p-values and significance tests as very different from the, as he saw it, mechanistic practices of hypothesis testing and acceptance procedures on the Neyman-Pearson model.

In an acceptance procedure, on the other hand, acceptance is irreversible, whether the evidence for it was strong or weak. It is the result of applying mechanically rules laid down in advance; no thought is given to the particular case, and the tester’s state of mind, or his capacity for learning is inoperative. By contrast, the conclusions drawn by a scientific worker from a test of significance are provisional, and involve an intelligent attempt to understand the experimental situation.

“Statistical methods and scientific induction”
Journal of the Royal Statistical Society Series B 17: 69–78. 1955, at 74-75

Fisher was well known for his robust, sometimes spiteful, views on other people’s work. However, it was Maurice Kendall in his obituary of Fisher who observed that:

… a man’s attitude toward inference, like his attitude towards religion, is determined by his emotional make-up, not by reason or mathematics.

Deconstructing Deming VII – Adopt and institute leadership

7. Adopt and institute leadership.

Point 7 of Deming’s 14 Points. This point leaves me with some of the same uncertainty as Point 6 Institute training on the job. But everybody thinks they know what training is. Leadership is a much more elusive concept.

In a recent review of Archie Brown’s book The Myth of the Strong Leader: Political Leadership in the Modern Age (Times (London) 12 April 2014), Philip Collins observed as follows.

The problem with Brown’s book is his idea that there is a single entity called “leadership” that covers all these categories. It does not follow from the existence of leaders that there is such a thing as “leadership”. It may be no more possible to distil wisdom on leadership than it is on love. Every lover is different, I would imagine. There doesn’t seem to be much profit in the attempt to set out a theory of “lovership” as if there were common traits in every act of seduction.

Collins identifies a common discomfort. Yet there remain good and bad leaders, as there are good and bad lovers. All who aspire to improve must start by distinguishing the characteristics of the good and the bad.

Deming elaborates his own Point 7 further in Out of the Crisis and, predictably, several distinct positions emerge. I identify four but they don’t all help me understand what leadership is.

1. Abolish focus on outcomes

Deming’s point is well taken that, for the statistically naïve, day to day management based on historical outcomes typically leads to over adjustment, what Deming called tampering. The consequences are increased operating costs that have been themselves induced by the over active management.

However, outcomes must be the overriding benchmark by which all management is measured. The problem with the over adjustment that flows from a lack of rigorous criticism of data is that it frustrates the very outcomes it aspired to serve. There has ultimately to be some measure of success and failure, an outcome. That is the inevitable focus of every leader.

2. Remove barriers to pride in workmanship

This is picked up at greater depth in Deming’s Point 12. I shall come back to it then.

3. Leaders must know the work they supervise

Alan Clark was a British politician, a very minor, and comically gaffe-prone, minister in the Thatcher government of the 1980s. He is now mostly remembered as a notorious self styled bon viveur and womaniser. His diaries are as scandalous as they are apocryphal. A good read for those who like that sort of thing.

In 1961, Clark published an historical work about the First World War, The Donkeys. The book adopted a common popular sentiment of mid-twentieth-century Britain, that the enlisted men of the war were lions led by donkeys. The donkeys were the officer class, their leaders. Clark helped to reinforce the idea that the private soldier was brave and capable, but betrayed by a self styled elite who failed to equip and direct them with commensurate valour. Historian Basil Liddell Hart endorsed Clark’s proofs.

To be fair there is legitimate controversy about the matter. But I think that now academic, and certainly popular, sentiment has swung the other way, no longer regarding the leaders as incompetent and indifferent, but rather as diligent and compassionate though overwhelmed. Historian Robin Neillands put it thus:

… the idea that they were indifferent to the sufferings of their men is constantly refuted by the facts, and only endures because some commentators wish to perpetuate the myth that these generals, representing the upper classes, did not give a damn what happened to the lower orders.

I find Deming content to perpetuate a similar trope about industrial managers in his writings. In Out of the Crisis:

There was a time, years ago, when a foreman selected his people, trained them, helped them, worked with them. He knew the job. … Supervision on the factory floor is, I fear, in many companies, an entry position for college boys and girls [sic] to learn about the company, six months here, six months there. … He does not understand the problem and could get nothing done about it if he did.

I frankly don’t know where to start with that. It goes on. I constantly see Deming’s followers approving and sharing this sort of article. They all simply have the whiff of lamp oil about them. They fail to ring true and betray the same sort of lazy, chippy, defensive emotions as the donkeys attribution.

Other than in the simplest of endeavours, perhaps a window cleaning business, the value of an enterprise flows from the confluence and integration of diverse materials, skills, technologies, knowledge and people. A manager or leader is the person who makes that confluence occur. But for the manager it would not have happened. Inevitably that means that the leader’s domain knowledge of any particular element is limited. It is the manager’s ability to absorb and assimilate information from a variety of sources that enables the enterprise. Leadership demands capacity to trust that other people know what they are doing, and to use the borrowing strength of diverse sources of information to signal when assumptions are betrayed. The hope that the leader can be a craft master of all he or she seeks to integrate is forlorn.

4. Leaders understand variation

I dealt with this under Point 6. It is a strong point. Without understanding of statistics, rigorous criticism of historical data is impossible. Signal and noise cannot be efficiently separated. That leads to over adjustment, tampering, increased costs and frustrated outcome. Only managers who are not held to outcomes will ultimately be indulged in an innumerate pursuit of over adjustment. But it takes a long time for things to shake out.
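The cost of over adjustment can be shown with a short simulation. It is a minimal sketch in the spirit of Deming’s funnel experiment, assuming a stable process with normally distributed noise around a target of zero. The tampering rule used here (shift the aim to cancel the last deviation) is one illustrative form of over adjustment, and the function name and seed are mine.

```python
import random
from statistics import pvariance

def run_process(n, tamper, rng, sigma=1.0):
    """Simulate n outputs of a stable process aimed at a target of 0."""
    aim = 0.0
    outputs = []
    for _ in range(n):
        result = aim + rng.gauss(0.0, sigma)
        outputs.append(result)
        if tamper:
            # Over adjustment: move the aim to "correct" the last deviation,
            # even though that deviation was nothing but noise.
            aim -= result
    return outputs

rng = random.Random(7)  # arbitrary seed, for repeatability only
hands_off = pvariance(run_process(20000, tamper=False, rng=rng))
tampered = pvariance(run_process(20000, tamper=True, rng=rng))
ratio = tampered / hands_off
print(f"variance left alone: {hands_off:.2f}")
print(f"variance with tampering: {tampered:.2f}")
print(f"ratio: {ratio:.2f}")
```

Under this rule each output becomes the difference of two independent noise terms, so the well-meant adjustments roughly double the variance of a process that was stable to begin with.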

The role of a manager of people

Deming wrote under this head in his last book The New Economics. There are another 14 points with overlaps and extensions of his original 14. A lot of it expands Principal Point 12. I will need to come back to them at another time. However, Deming certainly saw a leader as somebody with a plan and an ability to explain the plan to the workforce.

Attempts to define leadership abound yet no single one is, to me, compelling. However, part of it must be engagement with strategy. Strategy is the way of dealing with the painful experience that plans do not survive for very long. I liked the way Lawrence Freedman put it in his recent Strategy: A History.

The strategist has to accept that even when there is an obvious climax (a battle or an election), the story line will still be open-ended … leaving a number of issues to be resolved later. Even when the desired endpoint is reached, it is not really the end. The enemy may have surrendered, the election won, the target company taken over, the revolutionary opportunity seized, but that just means there is now an occupied country to run, a new government to be formed, a whole new revolutionary order to be established, or distinctive sets of corporate activities to be merged. … The transition is immediate and may well be conditional on how the original endpoint was reached. This takes us back to the observation that much strategy is about getting to the next stage rather than some ultimate destination. Rather than think of strategy as a three-act play, it is better to think of it as a soap opera with a continuing cast of characters and plot lines that unfold over a series of episodes. Each of these episodes will be self-contained and set up the subsequent episode. Unlike a play with a definite ending, there is no need for a soap opera to ever reach a conclusion, even though the central characters and their circumstances change.

That leads us to my first response to Deming’s Point 7.

  • Leaders take responsibility for aligning outcomes to targets.
  • Targets are in constant motion.
  • Continual rigorous statistical criticism of historical data is the way to align outcomes and targets, by avoiding over adjustment and by navigating the sort of strategic soap opera Freedman describes.
  • Leaders need to trust that their team know what they are doing.
  • Leaders use the borrowing strength of diverse data to monitor performance.

There is much else to leadership. I have not addressed people or engagement. That takes me back to Deming’s Principal Point 12 (yet to come). I want to look closely at those topics at a later time within the framework of Max Weber’s ethics of responsibility.

I also want to come back to Freedman’s narrative approach to strategy and the work of G L S Shackle on statistics, economics and imagination. It will have to wait.

Deconstructing Deming VI – Institute training on the job

6. Institute training on the job.

Point 6 of Deming’s 14 Points. I think it was this point that made me realise that everybody projects their own anxieties onto Deming’s writings and finds what they want to find there.

Deming elaborates this point further in Out of the Crisis and several distinct positions emerge. I identify nine. In many ways, the slogan Institute training on the job is no very good description of what Deming was seeking to communicate. Not everything sits well under this heading.

“Training”, along with its sagacious uncle, “education”, is one of those things that everyone can be in favour of. The systems by which the accumulated knowledge of humanity is communicated, criticised and developed are the foundations of civilisation. But like all accepted truths some scrutiny repays the time and effort. Here are the nine topics I identified in Out of the Crisis.

1. People don’t spend enough on training because the benefits do not show on the balance sheet

This was one of Deming’s targets behind his sixth point. It reiterates a common theme of his. It goes back to the criticisms of Hayes and Abernathy that managers were incapable of understanding their own business. Without such understanding, a manager would lack a narrative to envision the future material rewards of current spending. Cash movements showed on the profit and loss account. The spending became merely an overhead to be attacked so as to enhance the current picture of performance projected by the accounts, the visible figures.

I have considered Hayes and Abernathy’s analysis elsewhere. Whatever the conditions of the early 1980s in the US, I think today’s global marketplace is a very different arena. Organisations vie to invest in their people, as this recent Forbes article shows (though the author can’t spell “bellwether”). True, the article confirms that development spending falls in a recession but cash flow and the availability of working capital are real constraints on a business and have to be managed. Once optimism returns, training spend takes off.

But as US satirist P J O’Rourke observed:

Getting people to give vast amounts of money when there’s no firm idea what that money will do is like throwing maidens down a well. It’s an appeal to magic. And the results are likely to be as stupid and disappointing as the results of magic usually are.

The tragedy of so many corporations is that training budgets are set and value measured on how much money is spent, in the idealistic but sentimental belief that training is an inherent good and that rewards will inevitably flow to those who have faith.

The reality is that it is only within a system of rigorous goal deployment that local training objectives can be identified so as to serve corporate strategy. Only then can training be designed to serve those objectives and only then can training’s value be measured.

2. Root Cause Analysis

The other arena in which the word “training” is guaranteed to turn up is during Root Cause Analysis. It is a moral certainty that somebody will volunteer it somewhere on the Ishikawa diagram. “To stop this happening again, let’s repeat the training.”

Yet, failure of training can never be the root cause of a problem or defect. Such an assertion yields too readily to the question Why did lack of training cause the failure? The Why? question exposes that there was something the training was supposed to do. It could be that the root cause is readily identified and training put in place as a solution. But, the question could expose that, whatever the perceived past failures in training, the root cause, that the training would have purportedly addressed, remains obscure. Forget worrying about training until the root cause is identified within the system.

In any event, training will seldom be the best way of eliminating a problem. Redesign of the system will always be the first thing to consider.

3. Train managers and new employees

Uncontroversial but I think Deming overstated businesses’ failure to appreciate this.

4. Managers need to understand the company

Uncontroversial but I think Deming overstated businesses’ failure to appreciate this.

5. Managers need to understand variation

So much of Deming’s approach was about rigorous criticism of business data and the diligent separation of signal and noise. Those are topics that certainly have greater salience than a quarter of a century ago. Nate Silver has done much to awaken appetites for statistical thinking and the Six Sigma discipline has alerted the many to the wealth of available tools and techniques. Despite that, I am unpersuaded that genuine statistical literacy and numeracy (both are important) are any more common now than in the days of the first IBM PC.

Deming’s banner headline here is Institute training on the job. I think the point sits uncomfortably. I would have imagined that it is business schools and not employers who should apply their energies to developing and promoting quantitative skills in executives. One of the distractions that has beset industrial statistics is its propensity to create a variety of vernacular approaches with conflicting vocabularies and competing champion priorities: Taguchi methods, Six Sigma, SPC, Shainin, … . The situation is aggravated by the differential enthusiasms between corporations for the individual brands. Even within a single strand such as Six Sigma there is a frustrating variety of nomenclature, content and emphasis.

It’s not training on the job that’s needed. It is the academic industry here that is failing to provide what business needs.

6. Recognise that people learn in different ways

Of this I remain unpersuaded. I do not believe that people learn to drive motor cars in different ways. It can’t be done from theory alone. It can’t be done by writing a song about it. It comes from a subtle interaction of experience and direction. Some people learn without the direction, perhaps because they watch Nelly (see below).

Many have found a resonance between Deming’s point and the Theory of Multiple Intelligences. I fear this has distracted from some of the important themes in business education. As far as I can see, the theory has no real empirical support. Professor John White of the University of London, Institute of Education has firmly debunked the idea (Howard Gardner: the myth of Multiple Intelligences).

7. Don’t rely on watch Nelly

After my academic and vocational training as a lawyer, I followed a senior barrister around for six months, then slightly less closely for another six months. I also went to court and sat behind barristers in their first few years of practice so that I could smell what I would be doing a few months later.

It was important. So was the academic study and so was the classroom vocational training. It comes back to understanding how the training is supposed to achieve its objectives and designing learning from that standpoint.

8. Be inflexible as to work standards

This is tremendously dangerous advice for anybody lacking statistical literacy and numeracy (both).

I will come back to this but it embraces some of my earlier postings on process discipline.

9. Teach customer needs

This is the gem. Employee engagement is a popular concern. Employees who have no sight of how their job impacts the customer, who pays their wages, will soon see the process discipline that is essential to operational excellence as arbitrary and vexatious. Their mindfulness and diligence cannot but be affected by the expectation that they can operate in a cognitive vacuum.

Walter Shewhart famously observed that Data have no meaning apart from their context. By extension, continual re-orientation to the Voice of the Customer gives meaning to structure, process and procedure on the shop floor; it resolves ambiguity as to method in favour of the end-user; it fosters extrinsic, rather than intrinsic, motivation; and it sets the external standard by which conduct and alignment to the business will be judged and governed.

It was 20 years ago today …

Today, 20 December 2013, marks the twentieth anniversary of the death of W Edwards Deming. Deming was a hugely influential figure in management science, in Japan during the 1950s, 1960s and 1970s, then internationally from the early 1980s until his death. His memory persists in a continuing debate about his thinking among a small and aging sector of the operational excellence community, and in a broader reputation as a “management guru”, one of the writers who from the 1980s onwards championed and popularised the causes of employee engagement and business growth through customer satisfaction.

Deming’s training had been in mathematics and physics but in his professional life he first developed into a statistician, largely because of the influence of Walter Shewhart, an early mentor. It was fundamental to Deming’s beliefs that an organisation could only be managed effectively with widespread business measurement and trenchant statistical criticism of data. In that way he anticipated writers of a later generation such as Nate Silver and Nassim Taleb.

Since Deming’s death the operational excellence landscape has become more densely populated. In particular, lean operations and Six Sigma have variously been seen as competitors for Deming’s approach, as successors, usurpers, as complementary, as development, or as tools or tool sets to be deployed within Deming’s business strategy. In many ways, the pragmatic development of lean and Six Sigma has exposed the discursive, anecdotal and sometimes gnomic way Deming liked to communicate. In his book Out of the Crisis: Quality, Productivity and Competitive Position (1982) minor points are expanded over whole chapters while major ideas are finessed in a few words. Having elevated the importance of measurement and a proper system for responding to data he goes on to observe that the most important numbers are unknown and unknowable. I fear that this has often been an obstacle to managers finding the hard science in Deming.

For me, the core of Deming’s thinking remains this. There is only one game in town, the continual improvement of the alignment between the voice of the process and the voice of the customer. That improvement is achieved by the diligent use of process behaviour charts. Pursuit of that aim will collaterally reduce organisational costs.

Deming pursued the idea further. He asked what kind of organisation could most effectively exploit process behaviour charts. He sought philosophical justifications for successful heuristics. It is here that his writing became more difficult to accept for many people. In his last book, The New Economics for Industry, Government, Education, he trespassed on broader issues usually reserved to politics and social science, areas in which he was poorly qualified to contribute. The problem with Deming’s later work is that where it is new, it is not economics, and where it is economics, it is not new. It is this part of his writing that has tended to attract a few persistent followers. What is sad about Deming’s continued following is the lack of challenge. Every seminal thinker’s works are subject to repeated criticism, re-evaluation and development. Not simply development by accumulation but development by revision, deletion and synthesis. It is here that Deming’s memory is badly served. At the top of the page is a link to Deming’s Wikipedia entry. It is disturbing that everything is stated as though a settled and triumphant truth, a treatment that contrasts with the fact that his work is now largely ignored in mainstream management. Managers have found in lean and Six Sigma systems they could implement, even if only partially. In Deming they have not.

What Deming deserves, now that a generation, a global telecommunications system and a world wide web separate us from him, is a robust criticism and challenge of his work. The statistical thinking at the heart is profound. For me, the question of what sort of organisation is best placed to exploit that thinking remains open. Now is the time for the re-evaluation because I believe that out of it we can join in reaching new levels of operational excellence.

Rationing in UK health care – signal or noise?

The NHS in England appears to be rationing access to vital non-emergency hospital care, a review suggests.

This was the rather weaselly BBC headline last Friday. It referred to a report from Dr Foster Intelligence which appears to be a trading arm of Imperial College London.

The analysis alleged that the number of operations in three categories (cataract, knee and hip) had risen steadily between 2002 and 2008 but then “plateaued”. As evidence for this the BBC reproduced the following chart.

NHS_DrFoster_Dec13

Dr Foster Intelligence apparently argued that, as the UK population had continued to age since 2008, a “plateau” in the number of such operations must be evidence of “rationing”. Otherwise the rising trend would have continued. I find myself using a lot of quotes when I try to follow the BBC’s “data journalism”.

Unfortunately, I was unable to find the report or the raw data on the Dr Foster Intelligence website. It could be that my search skills are limited but I think I am fairly typical of the sort of people who might be interested in this. I would be very happy if somebody pointed me to the report and data. If I try to interpret the BBC’s journalism, the argument goes something like this.

  1. The rise in cataract, knee and hip operations has “plateaued”.
  2. Need for such operations has not plateaued.
  3. That is evidence of a decreased tendency to perform such operations when needed.
  4. Such a decreased tendency is because of “rationing”.

Now there are a lot of unanswered questions and unsupported assertions behind 2, 3 and 4 but I want to focus on 1. What the researchers say is that the experience base showed a steady rise in operations but that ceased some time around 2008. In other words, since 2008 there has been a signal that something has changed over the historical data.

Signals are seldom straightforward to spot. As Nate Silver emphasises, signals need to be contrasted with, and understood in the context of, noise, the irregular variation that is common to the whole of the historical data. The problem with common cause variation is that it can lead us to be, as Nassim Taleb puts it, fooled by randomness.

Unfortunately, without the data, I cannot test this out on a process behaviour chart. Can I be persuaded that this data represents an increasing trend then a signal of a “plateau”?

The first question is whether there is a signal of a trend at all. I suspect that in this case there is if the data is plotted on a process behaviour chart. The next question is whether there is any variation in the slope of that trend. One simple approach to this is to fit a linear regression line through the data and put the residuals on a process behaviour chart. Only if there is a signal on the residuals chart is an inference of a “plateau” left open. Looking at the data my suspicion is that there would be no such signal.
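A minimal sketch of that residuals approach, in Python. The annual counts are made-up placeholder values, not the Dr Foster figures, and the function names are my own for illustration. The 2.66 multiplier is the usual scaling factor for the natural process limits of an individuals (XmR) chart, and the only detection rule applied here is a point falling outside those limits.

```python
def fit_line(ys):
    """Ordinary least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(ys):
    """Observed value minus fitted value for each period."""
    slope, intercept = fit_line(ys)
    return [y - (intercept + slope * x) for x, y in enumerate(ys)]


def xmr_limits(values):
    """Lower limit, central line and upper limit of an individuals chart."""
    centre = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_moving_range = sum(moving_ranges) / len(moving_ranges)
    return (centre - 2.66 * mean_moving_range,
            centre,
            centre + 2.66 * mean_moving_range)


def signals(values):
    """Indices of points outside the natural process limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical annual operation counts (thousands), invented for illustration.
made_up_operations = [100, 108, 113, 122, 127, 136, 139, 148, 151, 160]

trend_residuals = residuals(made_up_operations)
print("Residuals:", [round(r, 2) for r in trend_residuals])
print("Signals at positions:", signals(trend_residuals))
```

If the list of signals comes back empty, the residuals give no ground for saying the slope has changed, and so no ground for inferring a “plateau”.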

More complex analyses are possible. One possibility would be to adjust the number of operations by a measure of population age then look at the stability and predictability of those numbers. However, I see no evidence of that analysis either.

I think that where anybody claims to have detected a signal, the legal maxim should prevail: He who asserts must prove. I see no evidence in the chart alone to support the assertion of a rising trend followed by a “plateau”.

Suicide statistics for British railways

I chose a prosaic title because it’s not a subject about which levity is appropriate. I remain haunted by this cyclist on the level crossing. As a result I thought I would delve a little into railway accident statistics. The data is here. Unfortunately, the data only goes back to 2001/2002. This is a common feature of government data. There is no long term continuity in measurement to allow proper understanding of variation, trends and changes. All this encourages the “executive time series” that are familiar in press releases. I think that I shall call this political amnesia. When I have more time I shall look for a longer time series. The relevant department is usually helpful if contacted directly.

However, while I was searching I found this recent report on Railway Suicides in the UK: risk factors and prevention strategies. The report is by Kamaldeep Bhui and Jason Chalangary of the Wolfson Institute of Preventive Medicine, and Edgar Jones of the Institute of Psychiatry, King’s College, London. Originally, I didn’t intend to narrow my investigation to suicides but there were some things in the paper that bothered me and I felt were worth blogging about.

Obviously this is really important work. No civilised society is indifferent to tragedies such as suicide whose consequences are absorbed deeply into the community. The report analyses a wide base of theories and interventions concerning railway suicide risk. There is a lot of information and the authors have done an important job in bringing together and seeking conclusions. However, I was bothered by this passage (at p5).

The Rail Safety and Standards Board (RSSB) reported a progressive rise in suicides and suspected suicides from 192 in 2001-02 to a peak 233 in 2009-10, the total falling to 208 in 2010-11.

Oh dear! An “executive time series”. Let’s look at the data on a process behaviour chart.

RailwaySuicides1

There is no signal, even ignoring the last observation in 2011/2012 which the authors had not had to hand. There has been no increasing propensity for suicide since 2001. The writers have been, as Nassim Taleb would put it, “fooled by randomness”. In the words of Nate Silver, they have confused signal and noise. The common cause variation in the data has been over-interpreted by zealous and well-meaning policy makers as an upward trend. However, all diligent risk managers know that interpretation of a chart is forbidden if there is no signal. Over-interpretation will lead to (well-meaning) over-adjustment and admixture of even more variation into a stable system of trouble.
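For readers who want to see how the limits on such a chart arise, here is a rough sketch in Python. The annual totals are placeholders invented for illustration; they are not the RSSB figures, and the function names are hypothetical. The sketch uses the conventional XmR scaling factors, 2.66 for the natural process limits and 3.27 for the upper range limit, and checks only the simplest detection rule, a point outside the limits.

```python
def xmr_summary(values):
    """Central line and limits for an individuals and moving range (XmR) chart."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(later - earlier)
                     for earlier, later in zip(values, values[1:])]
    mean_moving_range = sum(moving_ranges) / len(moving_ranges)
    return {
        "mean": mean,
        "lower_limit": mean - 2.66 * mean_moving_range,
        "upper_limit": mean + 2.66 * mean_moving_range,
        "upper_range_limit": 3.27 * mean_moving_range,
    }


def points_outside_limits(values):
    """Indices of observations outside the natural process limits."""
    summary = xmr_summary(values)
    return [
        i for i, v in enumerate(values)
        if v < summary["lower_limit"] or v > summary["upper_limit"]
    ]


# Placeholder annual totals, invented for illustration only.
placeholder_totals = [200, 215, 204, 221, 209, 198, 217, 206, 225, 212]

summary = xmr_summary(placeholder_totals)
print(summary)
print("Signals at positions:", points_outside_limits(placeholder_totals))
```

Year-to-year movements that stay inside the limits are treated as noise. Only a point outside them (or, on a fuller treatment, a sustained run to one side of the mean) would invite a search for a special cause.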

Looking at the development of the data over time I can understand that there will have been a temptation to perform a regression analysis and calculate a p-value for the perceived slope. This is an approach to avoid in general. It is beset with the dangers of testing effects suggested by the data and the general criticisms of p-values made by McCloskey and Ziliak. It is not a method that will be a reliable guide to future action. For what it’s worth I got a p-value of 0.015 for the slope but I am not impressed. I looked to see if I could find a pattern in the data then tested for the pattern my mind had created. It is unsurprising that it was “significant”.

The authors of the report go on to interpret the two figures for 2009/2010 (233 suicides) and 2010/2011 (208 suicides) as a “fall in suicides”. It is clear from the process behaviour chart that this is not a signal of a fall in suicides. It is simply noise, common cause variation from year to year.

Having misidentified this as a signal they go on to seek a cause. Of course they “find” a potential cause. A partnership between Network Rail and the Samaritans, Men on the Ropes, had started in January 2010. The programme’s aim was to reduce suicides by 20% over five years. I genuinely hope that the programme shows success. However, the programme will not be assisted by thinking that it has yet shown signs of improvement.

With the current mean annual total at 211, a 20% reduction entails a new mean of 169 annual suicides. That is an ambitious target I think, and I want to emphasise that the programme is entirely laudable and plausible. However, whether it succeeds is to be judged by the figures on the process behaviour chart, not by any post hoc rationalisation. This is the tough discipline of the charts. It is no longer possible to claim an improvement where that is not supported by the data.

I will come back to this data next year and look to see if there are any signs of encouragement.