Soccer management – signal, noise and contract negotiation

Some poor data journalism here from the BBC on 28 May 2015, concerning turnover in professional soccer managers in England. “Managerial sackings reach highest level for 13 years” says the headline. A classic executive time series. What is the significance of the 13 years, other than that it was the last time there were more sackings than in the present season?

The data was purportedly from the League Managers’ Association (LMA) and their Richard Bevan thought the matter “very concerning”. The BBC provided a chart (fair use claimed).

[Chart: BBC, managerial sackings by league, 2005/6 to 2014/15]

Now, I had a couple of thoughts as soon as I saw this. Firstly, why chart only back to 2005/6? More importantly, this looked to me like a stable system of trouble (for football managers) with the possible exception of this (2014/15) season’s Championship coach turnover. Personally, I detest multiple time series on a common chart unless there is a good reason for doing so. I do not think it the best way of showing variation and/ or association.

Signal and noise

The first task of any analyst looking at data is to seek to separate signal from noise. Nate Silver made this point powerfully in his book The Signal and the Noise: The Art and Science of Prediction. As Don Wheeler put it: all data has noise; some data has signal.

Noise is typically the irregular aggregate of many causes. It is predictable in the same way as a roulette wheel. A signal is a sign of some underlying factor that has had so large an effect that it stands out from the noise. Signals can herald a fundamental unpredictability of future behaviour.

If we find a signal we look for a special cause. If we start assigning special causes to observations that are simply noise then, at best, we spend money and effort to no effect and, at worst, we aggravate the situation.

The Championship data

In any event, I wanted to look at the data for myself. I was most interested in the Championship data as that was where the BBC and LMA had been quick to find a signal. I looked on the LMA’s website and this is the latest data I found. The data only records dismissals up to 31 March of the 2014/15 season. There were 16. The data in the report gives the total number of dismissals for each preceding season back to 2005/6. The report separates out “dismissals” from “resignations” but does not say exactly how the classification was made. It can be ambiguous. A manager may well resign because he feels his club have themselves repudiated his contract, a situation known in England as constructive dismissal.

The BBC’s analysis included dismissals right up to the end of each season including 2014/15. Reading from the chart they had 20. The BBC have added some data for 2014/15 that isn’t in the LMA report and have not given the source. I regard that as poor data journalism.

I found one source of further data at the website The Sack Race. That told me that there had been four terminations since the end of March.

Manager          Club             Termination        Date
Malky Mackay     Wigan Athletic   Sacked             6 April
Lee Clark        Blackpool        Resigned           9 May
Neil Redfearn    Leeds United     Contract expired   20 May
Steve McClaren   Derby County     Sacked             25 May

As far as I can tell, “dismissals” include contract non-renewals and terminations by mutual consent. There are then a further three dismissals, not four. However, Clark left Blackpool amid some corporate chaos, so his departure could certainly be classified either way. In any event, I have taken the BBC figure at face value, though I am alert to some possible data quality issues here.

Signal and noise

Looking at the Championship data, this was the process behaviour chart, plotted as an individuals chart.

[Chart: individuals (process behaviour) chart of Championship managerial dismissals by season, 2005/6 to 2014/15]

There is a clear signal for the 2014/15 season, with an observation of 20 dismissals above the upper natural process limit of 19.18 dismissals. Where there is a signal we should seek a special cause. There is no guarantee that we will find one. Data limitations and bounded rationality are always constraints. In fact, there is no guarantee that there was a special cause. The signal could be a false positive. Such effects cannot be eliminated. However, signals efficiently direct our limited energy for what Daniel Kahneman calls System 2 thinking towards the most promising enquiries.
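For readers who want to reproduce this kind of chart, here is a minimal Python sketch of the usual individuals (XmR) construction as Don Wheeler describes it: the centre line is the mean, and the natural process limits sit 2.66 average moving ranges either side of it. The season counts below are hypothetical placeholders, not the LMA figures, and `xmr_limits` and `signals` are illustrative names. The sketch also assumes that the limits are computed from the earlier seasons and the new season judged against them; the chart above may have been constructed differently.

```python
from statistics import mean

SCALING = 2.66  # 3 / d2, with d2 = 1.128 for moving ranges of two successive points


def xmr_limits(values):
    """Return (lower limit, centre line, upper limit) for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    return centre - SCALING * mr_bar, centre, centre + SCALING * mr_bar


def signals(values, limits):
    """Return (index, value) pairs lying outside the natural process limits."""
    lower, _, upper = limits
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical dismissals per season: placeholders, NOT the LMA data.
history = [10, 12, 9, 13, 11, 10, 12, 11, 12]
latest = 20

limits = xmr_limits(history)
print("limits (lower, centre, upper):", [round(x, 2) for x in limits])
print("signals:", signals(history + [latest], limits))
```

The moving range, rather than the overall standard deviation, is used so that the limits reflect short-term, season-to-season noise and are not inflated by any shift or trend in the data.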

Analysis

The BBC reports one narrative woven round the data.

Bevan said the current tenure of those employed in the second tier was about eight months. And the demand to reach the top flight, where a new record £5.14bn TV deal is set to begin in 2016, had led to clubs hitting the “panic button” too quickly.

It is certainly a plausible view. I compiled a list of the dismissals and non-renewals, not the resignations, with data from Wikipedia and The Sack Race. I identified only 17, which again suggests some data quality issue around classification. I then charted a scatter plot of date of dismissal against the club’s league position at the time.

[Chart: scatter plot of date of dismissal against club’s league position, Championship 2014/15]

It certainly looks as though risk of relegation is the major driver for dismissal. Aside from that, Watford dismissed Billy McKinlay after only two games when they were third in the league, equal on points with the top two. McKinlay had been an emergency appointment after Oscar Garcia had been compelled to resign through ill health. Watford thought they had quickly found a better manager in Slavisa Jokanovic. Watford ended the season in second place and were promoted to the Premiership.

There were two dismissals by disappointed mid-table teams after the final game on 2 May. Beyond that, the only evidence for impulsive managerial changes in pursuit of promotion is three mid-season dismissals by mid-table clubs.

                                        Club league position
Manager         Club                  On dismissal   At end of season
Nigel Adkins    Reading                    16               19
Bob Peeters     Charlton Athletic          14               12
Stuart Pearce   Nottingham Forest          12               14

A table that speaks for itself. I am not impressed by the argument that there has been the sort of increase in panic sackings that Bevan fears. Both Blackpool and Leeds experienced chaotic executive management which will have resulted in an enhanced force of mortality on their respective coaches. That, along with the data quality issues and the technical matter I describe below, leads me to feel that there was no great enhanced threat to the typical Championship manager in 2014/15.

Next season I would expect some regression to the mean with a lower number of dismissals. Not much of a prediction really but that’s what the data tells me. If Bevan tries to attribute that to the LMA’s activism then I fear that he will be indulging in Langian statistical analysis. Will he be able to resist?

Techie bit

I have a preference for individuals charts but I did also try plotting the data on an np-chart where I found no signal. It is trite service-course statistics that a Poisson distribution with mean λ has standard deviation √λ so an upper 3-sigma limit for a (homogeneous) Poisson process with mean 11.1 dismissals would be 21.1 dismissals. Kahneman has cogently highlighted how people tend to see patterns in data as signals even where they are typical of mere noise. In this case I am aware that the data is not atypical of a Poisson process so I am unsurprised that I failed to identify a special cause.
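The Poisson arithmetic is easily checked. This sketch assumes, as the paragraph does, a homogeneous Poisson process with mean 11.1 dismissals a season. It computes the 3-sigma upper limit and also the probability of seeing 20 or more dismissals in a single season under that model, which comes out at roughly 1%. The helper names are illustrative.

```python
import math


def poisson_upper_limit(lam, k=3.0):
    """Upper k-sigma limit for a Poisson count with mean lam: lam + k * sqrt(lam)."""
    return lam + k * math.sqrt(lam)


def poisson_tail(lam, x):
    """P(X >= x) for X ~ Poisson(lam), as 1 minus the sum of P(X = i) for i < x."""
    lower = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(x))
    return 1.0 - lower


lam = 11.1  # mean dismissals per season, as quoted in the text
print(round(poisson_upper_limit(lam), 1))  # 21.1
print(round(poisson_tail(lam, 20), 4))
```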

A Poisson process with a mean of 11.1 dismissals per season is a pretty good model going forwards, and that is the basis I would press on any managers in contract negotiations.

Of course, the clubs should remember that when they look for a replacement manager they will then take a random sample from the pool of job seekers. Really!

Deconstructing Deming XI B – Eliminate numerical goals for management

11. Part B. Eliminate numerical goals for management.

A supposed corollary to the elimination of numerical quotas for the workforce.

This topic seems to form a very large part of what passes for exploration and development of Deming’s ideas in the present day. It gets tied in to criticisms of remuneration practices and annual appraisal, and target-setting in general (management by objectives). It seems to me that interest flows principally from a community who have some passionately held emotional attitudes to these issues. Advocates are enthusiastic to advance the views of theorists like Alfie Kohn who deny, in terms, the effectiveness of traditional incentives. It is sad that those attitudes stifle analytical debate. I fear that the problem started with Deming himself.

Deming’s detailed arguments are set out in Out of the Crisis (at pp75-76). There are two principal reasoned objections.

  1. Managers will seek empty justification from the most convenient executive time series to hand.
  2. Surely, if we can improve now, we would have done so previously, so managers will fall back on (1).

The executive time series

I’ve used the time series below in some other blogs (here in 2013 and here in 2012). It represents the annual number of suicides on UK railways. This is just the data up to 2013.
[Chart: process behaviour chart of annual suicides on UK railways, up to 2013]

The process behaviour chart shows a stable system of trouble. There is variation from year to year but no significant (sic) pattern. There is noise but no signal. There is an average of just over 200 fatalities, varying irregularly between around 175 and 250. Sadly, as I have discussed in earlier blogs, simply selecting a pair of observations enables a polemicist to advance any theory they choose.

In Railway Suicides in the UK: risk factors and prevention strategies, Kamaldeep Bhui and Jason Chalangary of the Wolfson Institute of Preventive Medicine, and Edgar Jones of the Institute of Psychiatry, King’s College, London quoted the Rail Safety and Standards Board (RSSB) in the following two assertions.

  • Suicides rose from 192 in 2001-02 to a peak of 233 in 2009-10; and
  • The total fell from 233 to 208 in 2010-11 because of actions taken.

Each of these points is what Don Wheeler calls an executive time series. Selective attention, or inattention, on just two numbers from a sequence of irregular variation can be used to justify any theory. Deming feared such behaviour could be perverted to justify satisfaction of any goal. Of course, the process behaviour chart, nowhere more strongly advocated than by Deming himself in Out of the Crisis, is the robust defence against such deceptions. Diligent criticism of historical data by means of process behaviour charts is exactly what is needed to improve the business and exactly what guards against success-oriented interpretations.

Wishful thinking, and the more subtle cognitive biases studied by Daniel Kahneman and others, will always assist us in finding support for our position somewhere in the data. Process behaviour charts keep us objective.

If not now, when?

If I am not for myself, then who will be for me?
And when I am for myself, then what am “I”?
And if not now, when?

Hillel the Elder

Deming criticises managerial targets on the grounds that, were the means of achieving the target known, it would already have been achieved and, further, that without having the means efforts are futile at best. It’s important to remember that Deming is not here, I think, talking about efforts to stabilise a business process. Deming is talking about working to improve an already stable, but incapable, process.

There are trite reasons why a target might legitimately be mandated where it has not been historically realised. External market conditions change. A manager might unremarkably be instructed to “Make 20% more of product X and 40% less of product Y”. That plays into the broader picture of targets’ role in co-ordinating the parts of a system, internal to the organisation or more widely. It may be a straightforward matter to change the output of a well-understood, stable system by an adjustment of the inputs.

Deming says:

If you have a stable system, then there is no use to specify a goal. You will get whatever the system will deliver.

But it is the manager’s job to work on a stable system to improve its capability (Out of the Crisis at pp321-322). That requires capital and a plan. It involves a target because the target captures the consensus of the whole system as to what is required, how much to spend, what the new system looks like to its customer. Simply settling for the existing process, being managed through systematic productivity to do its best, is exactly what Deming criticises at his Point 1 (Constancy of purpose for improvement).

Numerical goals are essential

… a manager is an information channel of decidedly limited capacity.

Kenneth Arrow
Essays in the Theory of Risk-Bearing

Deming’s followers have, to some extent, conceded those criticisms. They say that it is only arbitrary targets that are deprecated and not the legitimate Voice of the Customer/ Voice of the Business. But I think they make a distinction without a difference through the weasel words “arbitrary” and “legitimate”. Deming himself was content to allow managerial targets relating to two categories of existential risk.

However, those two examples are not of any qualitatively different type from the “Increase sales by 10%” that he condemns. Certainly back when Deming was writing Out of the Crisis most occupational exposure limits (OELs) were based on LD50 studies, a methodology that I am sure Deming would have been the first to criticise.

Properly defined targets are essential to business survival as they are one of the principal means by which the integrated function of the whole system is communicated. If my factory is producing more than I can sell, I will not work on increasing capacity until somebody promises me that there is a plan to improve sales. And I need to know the target of the sales plan to know where to aim with plant capacity. It is no good just to say “Make as much as you can. Sell as much as you can.” That is to guarantee discoordination and inefficiency. It is unsurprising that Deming’s thinking has found so little real-world implementation when he seeks to deprive managers of one of the principal tools of managing.

Targets are dangerous

I have previously blogged about what is needed to implement effective targets. An ill-judged target can induce perverse incentives. These can be catastrophic for an organisation, particularly one where the rigorous criticism of historical data is absent.

The art of managing footballers

… or is it a science? Robin van Persie’s penalty miss against West Bromwich Albion on 2 May 2015 was certainly welcome news to my ears. It eased the relegation pressures on West Brom and allowed us to advance to 40 points for the season. Relegation fears are only “mathematical” now. However, the miss also resulted in van Persie being relieved of penalty taking duties, by Manchester United manager Louis van Gaal, until further notice.

He is now at the end of the road. It is always [like that]. Wayne [Rooney] has missed also so when you miss you are at the bottom again.

The Daily Mail report linked above goes on to say that van Persie had converted his previous 6 penalties.

Van Gaal was, of course, referring to Rooney’s shot over the crossbar against West Ham in February 2013, when Rooney had himself invited then manager Sir Alex Ferguson to retire him as designated penalty taker. Rooney’s record had apparently been 9 misses from 27 penalties. I have all this from this Daily Telegraph report.

I wonder if statistics can offer any insight into soccer management?

The benchmark

It was very difficult to find, at short notice, any exhaustive statistics on penalty conversion rates on the web. However, I would like to start by establishing what constitutes “good” performance for a penalty taker. As a starting point I have looked at Table 2 on this Premier League website. The data is from February 2014 and shows the players with the best conversion rates in the League’s history at that date. Players who had taken fewer than 10 penalties were excluded. It shows that the ten top-converting players, who must rank as very good if not the ten best, converted 155 of 166 penalties in aggregate. That is a conversion rate of 93.4%. At first sight that suggests a useful baseline against which to assess any individual penalty taker.

Several questions come to mind. The aggregate statistics do not tell us how individual players have developed over time, whether improving or losing their nerve. That said, it is difficult to perform that sort of analysis on these comparatively low volumes of data when collected in this way. There is however data (Table 4) on the overall conversion rate in the Premier League since its inception.

[Chart: Premier League penalty conversion rate by season since the League’s inception]

That looks to me like a fairly stable system. That would be expected as players come and go and this is the aggregate of many effects. Perhaps there is latterly reduced season-to-season variation, which would be odd, but I am not really interested in that and have not pursued it. I am aware that during this period there has been a rule change allowing goalkeepers to move before the kick is taken but I have just spent 30 minutes on the web and failed to establish the date when that happened. The total aggregate statistics up to 2014 are 1,438 penalties converted out of 1,888. That is a conversion rate of 76.2%.

I did wonder if there was any evidence that some of the top ten players were better than others, or whether the data was consistent with a common elite conversion rate of 93.4%, in which case the table positions would reflect nothing more than sampling variation. Somewhat reluctantly I calculated the chi-squared statistic for the table of successes and failures (I know! But what else to do?). The statistic came out as 2.02 which, with 9 degrees of freedom, has an upper-tail p-value (I know!) of about 99%. That gives no evidence at all of a genuine ranking among the elite penalty takers. If anything, the lower-tail probability of under 1% suggests that their records are more alike than sampling variation alone would produce.
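To see which tail a figure under 1% belongs to, the sketch below evaluates both tails of the chi-squared distribution with 9 degrees of freedom at 2.02. It uses the power series for the regularised lower incomplete gamma function so that nothing beyond the Python standard library is needed; `chi2_lower_tail` is an illustrative name. The conventional p-value for a test of homogeneity is the upper tail, which here exceeds 99%. Bear in mind too that, with only about one expected miss per player, the chi-squared approximation is itself rough.

```python
import math


def chi2_lower_tail(x, df):
    """P(X <= x) for X ~ chi-squared(df).

    Evaluates the regularised lower incomplete gamma function P(df/2, x/2)
    by its power series, which converges quickly for moderate x.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the sum of z**n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


stat, df = 2.02, 9
lower = chi2_lower_tail(stat, df)
upper = 1.0 - lower  # the conventional p-value for a test of homogeneity
print(f"lower tail: {lower:.4f}")
print(f"upper tail: {upper:.4f}")
```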

It inevitably follows that the elite are doing better than the overall success rate of 76.2%. Considering all that together I am happy to proceed with 93.4% as the sort of benchmark for a penalty taker that a team like Manchester United would aspire to.

Van Persie

This website, dated 6 Sept 2012, told me that van Persie had converted 18 penalties with a 77% success rate. That does not quite fit either 18/23 or 18/24 but let us take it at face value. If that is accurate then that is, more or less, the data on which Ferguson gave van Persie the job in February 2013. It is a surprising appointment given the Premier League average of 76.2% and the elite benchmark but perhaps it was the best that could be mustered from the squad.

Rooney’s 9 misses out of 27 yields a success rate of 67%. Not so much lower than van Persie’s historical performance but, in all the circumstances, it was not good enough.

The dismissal

What is fascinating is that, whatever the historical record on which van Persie was appointed penalty taker, before his 2 May miss he had scored 6 out of 6. The miss made it 6 out of 7, 85.7%. That was his recent record of performance, even if selected to some extent to show him in a good light.

Selection of that run is a danger. It is often “convenient” to select a subset of data that favours a cherished hypothesis. Even allowing for that selectivity, where was the real signal that van Persie had deteriorated or that the club would perform better were he replaced?

The process

Of course, a manager has more information than the straightforward success/ fail ratio. A coach may have observed goalkeepers increasingly guessing a penalty taker’s shot direction. There may have been many near-saves, a hesitancy on the part of the player, trepidation in training. Those are all factors that a manager must take into account. That may lead to the rotation of even the most impressive performer. Perhaps.

But that is not the process that van Gaal advocates. Keep scoring until you miss then go to the bottom of the list. The bottom! Even scorers in the elite-10 miss sometimes. Is it rational to then replace them with an alternative that will most likely be more average (i.e. worse)? And then make them wait until everyone else has missed.
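A little binomial arithmetic puts the van Gaal rule in perspective. The sketch assumes, purely for illustration, that each penalty is an independent trial with a constant probability of scoring, taken from the aggregate figures above: 155/166 for the elite and 1,438/1,888 for the league as a whole. On that assumption an elite taker misses at least once in seven kicks about 38% of the time, and an average replacement is still more likely than not to score their first penalty.

```python
ELITE_RATE = 155 / 166     # top-ten converters in aggregate, about 93.4%
LEAGUE_RATE = 1438 / 1888  # Premier League aggregate to 2014, about 76.2%


def prob_at_least_one_miss(p_score, kicks):
    """P(at least one miss in `kicks` independent penalties, each scored with p_score)."""
    return 1.0 - p_score ** kicks


print(f"Elite taker misses at least once in 7: {prob_at_least_one_miss(ELITE_RATE, 7):.1%}")
print(f"Average replacement scores first kick: {LEAGUE_RATE:.1%}")
print(f"Expected goals forgone per penalty, elite to average: {ELITE_RATE - LEAGUE_RATE:.3f}")
```

A miss from an elite taker is therefore unremarkable, while the cost of swapping to an average taker is borne on every subsequent kick.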

With an average success rate of 76.2% it is more likely than not that van Persie’s replacement will score their first penalty. Van Gaal will be vindicated. That is the phenomenon called regression to the mean. An extreme event (a miss) is most likely followed by something more average (a goal). Psychologist Daniel Kahneman explores this at length in his book Thinking, Fast and Slow.

It is an odd strategy to adopt. Keep the able until they fail. Then replace them with somebody less able. But different.

 

Royal babies and the wisdom of crowds

In 2004 James Surowiecki published a book with the unequivocal title The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. It was intended as a gloss on Charles Mackay’s 1841 book Extraordinary Popular Delusions and the Madness of Crowds. Both books are essential reading for any risk professional.

I am something of a believer in the wisdom of crowds. The other week I was fretting about the possible relegation of English Premier League soccer club West Bromwich Albion. It’s an emotional and atavistic tie for me. I always feel there is merit, as part of my overall assessment of risk, in checking online bookmakers’ odds. They surely represent the aggregated risk assessment of gamblers if nobody else. I was relieved that bookmakers were offering typically 100/1 against West Brom being relegated. My own assessment of risk is, of course, contaminated with personal anxiety so I was pleased that the crowd was more phlegmatic.

However, while I was on the online bookmaker’s website, I couldn’t help but notice that they were also accepting bets on the imminent birth of the royal baby, the next child of the Duke and Duchess of Cambridge. It struck me as weird that anyone would bet on the sex of the royal baby. Surely this was a mere coin toss, though I know that people will bet on that. Being hopelessly inquisitive I had a look. I was somewhat astonished to find these odds being offered (this was 22 April 2015, ten days before the royal birth).

        Odds   Implied probability
Girl    1/2    0.67
Boy     6/4    0.40
Total          1.07

Here I have used the usual formula for converting between odds and implied probabilities: odds of m / n against an event imply a probability of n / (m + n) of the event occurring. Of course, the principle of finite additivity requires that probabilities add up to one. Here they don’t and there is an overround of 7%. Like the rest of us, bookmakers have to make a living and I was unsurprised to find a Dutch book.
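The conversion from fractional odds to implied probability, and the resulting overround, can be checked with exact fractions. A minimal sketch; the dictionary layout and the function name are illustrative choices.

```python
from fractions import Fraction


def implied_probability(m, n):
    """Odds of m/n against an event imply a probability of n / (m + n)."""
    return Fraction(n, m + n)


# Odds offered on 22 April 2015, as (m, n) for m/n against.
book = {"Girl": (1, 2), "Boy": (6, 4)}

probs = {outcome: implied_probability(m, n) for outcome, (m, n) in book.items()}
total = sum(probs.values())
overround = total - 1

for outcome, p in probs.items():
    print(f"{outcome:<6}{str(p):>6}  {float(p):.2f}")
print(f"Total {str(total):>6}  {float(total):.2f}")
print(f"Overround {float(overround):.1%}")
```

Exact arithmetic gives a total of 16/15, an overround of 6.7%, which rounds to the 7% above.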

The odds certainly suggested that the crowd thought a girl manifestly more probable than a boy. Bookmakers shorten the odds on the outcome that is attracting the money to avoid a heavy payout on an event that the crowd seems to know something about.

Historical data on sex ratio

I started, at this stage, to doubt my assumption that boy/ girl represented no more than a coin toss, 50:50, an evens bet. As with most things, sex ratio turns out to be an interesting subject. I found this research paper which showed that sex ratio was definitely dependent on factors such as the age and ethnicity of the mother. The narrative of this chart was particularly revealing.

[Chart: sex ratio at birth, from the research paper]

However, the paper confirmed that the sex of a baby is independent of previous births, conditioned on the factors identified, and that the ratio of girls to boys is nowhere and at no time greater than 1,100 to 1,000, about 52% girls.

So why the odds?

Bookmakers lengthen the odds on the outcome attracting the smaller value of bets in order to encourage stakes on the less fancied outcomes, on which there is presumably less risk of having to pay out. At odds of 6/4, a punter betting £10 on a boy would receive his stake back plus £15 ( = 6 × £10 / 4 ). If we assume an equal chance of boy or girl then that is an expected return of £12.50 ( = 0.5 × £25 ) for a £10.00 stake. I’m not sure I’d seen such a good value wager since we all used to bet against Tim Henman winning Wimbledon.
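The expected-return calculation in the paragraph is sketched below, assuming, as the text does, an equal chance of boy or girl. `expected_return` is an illustrative helper, not a bookmaker’s formula; it simply multiplies the probability of winning by the total returned on a win.

```python
from fractions import Fraction


def expected_return(stake, m, n, p_win):
    """Expected total returned (stake plus winnings) on a bet at m/n against
    that wins with probability p_win; a losing bet returns nothing."""
    payout_if_win = stake + stake * Fraction(m, n)
    return p_win * payout_if_win


er = expected_return(10, 6, 4, Fraction(1, 2))  # £10 on a boy at 6/4, 50:50 assumed
print(er, float(er))  # 25/2 12.5
```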

Ex ante there are two superficially suggestive explanations as to the asymmetry in the odds. At least this is all my bounded rationality could imagine.

  • A lot of people (mistakenly) thought that the run of five male royal births (Princes Andrew, Edward, William, Harry and George) escalated the probability of a girl being next. “It was overdue.”
  • A lot of people believed that somebody “knew something” and that they knew what it was.

In his book about cognitive biases in decision making (Thinking, Fast and Slow, Allen Lane, 2011) psychologist and Nobel laureate Daniel Kahneman describes widespread misconceptions concerning randomness of boy/ girl birth outcomes (at p115). People tend to see regularity in sequences of data as evidence of non-randomness, even where patterns are typical of, and unsurprising in, random events.

I had thought that there could not be sufficient gamblers who would be fooled by the baseless belief that a long run of boys made the next birth more likely to be a girl. But then Danny Finkelstein reminded me (The (London) Times, Saturday 25 April 2015) of a survey of UK politicians that revealed their limited ability to deal with chance and probabilities. Are politicians more or less competent with probabilities than online gamblers? That is a question for another day. I could add that the survey compared politicians of various parties but we have an on-going election campaign in the UK at the moment so I would, in the interest of balance, invite my voting-age UK readers not to draw any inferences therefrom.

The alternative is the possibility that somebody thought that somebody knew something. The parents avowed that they didn’t know. Medical staff may or may not have. The sort of people who work in VIP medicine in the UK are not the sort of people who divulge information. But one can imagine that a random shift in sentiment, perhaps because of the misconception that a girl was “overdue”, and a consequent drift in the odds, could lead others to infer that there was insight out there. It is not completely impossible. How many other situations in life and business does that model?

It’s a girl!

The wisdom of crowds or pure luck? We shall never know. I think it was Thomas Mann who observed that the best proof of the genuineness of a prophecy was that it turned out to be false. Had the royal baby been a boy we could have been sure that the crowd was mad.

To be complete, Bayes’ theorem tells us that the outcome should enhance our degree of belief in the crowd’s wisdom. But it is a modest increase (Bayes’ factor of 2, 3 deciban after Alan Turing’s suggestion) and as we were most sceptical before we remain unpersuaded.
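The Turing-style bookkeeping is short. This sketch assumes, as a simplification, that a genuinely wise crowd would have predicted a girl with certainty while a coin toss gives probability one half, which is one way to arrive at a Bayes factor of 2. The sceptical prior odds of 1 to 99 are purely hypothetical, and the function names are illustrative.

```python
import math


def decibans(bayes_factor):
    """Weight of evidence in decibans (Turing): 10 * log10(Bayes factor)."""
    return 10 * math.log10(bayes_factor)


def posterior_probability(prior_odds, bayes_factor):
    """Multiply prior odds by the Bayes factor and convert back to a probability."""
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# Assumption: P(girl | wise crowd) = 1 and P(girl | coin toss) = 0.5.
bayes_factor = 1.0 / 0.5
print(round(decibans(bayes_factor), 2))  # 3.01

# Hypothetical sceptical prior: odds of 1 to 99 that the crowd "knew".
prior_odds = 1 / 99
print(round(posterior_probability(prior_odds, bayes_factor), 3))
```

Doubling odds of 1 to 99 leaves them at 2 to 99: a modest shift from a sceptical starting point, which is the sense in which we remain unpersuaded.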

In his book, Surowiecki identified five factors that can impair crowd intelligence. One of these is homogeneity. Insufficient diversity frustrates the inherent virtue on which the principle is founded. I wonder how much variety there is among online punters? Similarly, where judgments are made sequentially there is a danger of influence. That was surely a factor at work here. There must also have been an element of emotion, the factor that led to all those unrealistically short odds on Henman at Wimbledon on which the wise dined so well.

But I’m trusting that none of that applies to the West Brom odds.

Deconstructing Deming X – Eliminate slogans!

10. Eliminate slogans, exhortations and targets for the workforce.

W Edwards Deming

Neither snow nor rain nor heat nor gloom of night stays these couriers from the swift completion of their appointed rounds.

Inscription on the James Farley Post Office, New York City, New York, USA
William Mitchell Kendall pace Herodotus

Now, that’s what I call a slogan. Is this what Point 10 of Deming’s 14 Points was condemning? There are three heads here, all making quite distinct criticisms of modern management. The important dimension of this criticism is the way in which managers use data in communicating with the wider organisation, in setting imperatives and priorities and in determining what individual workers will consider important when they are free from immediate supervision.

Eliminate slogans!

The US postal inscription at the head of this blog certainly falls within the category of slogans. Apparently the root of the word “slogan” is the Scottish Gaelic sluagh-ghairm meaning a battle cry. It seeks to articulate a solidarity and commitment to purpose that transcends individual doubts or rationalisation. That is what the US postal inscription seeks to do. Beyond the data on customer satisfaction, the demands of the business to protect and promote its reputation, the service levels in place for individual value streams, the tension between current performance and aspiration, the disappointment of missed objectives, it seeks to draw together the whole of the organisation around an ideal.

Slogans are part of the broader oral culture of an organisation. In the words of Lawrence Freedman (Strategy: A History, Oxford, 2013, p564) stories, and I think by extension slogans:

[make] it possible to avoid abstractions, reduce complexity, and make vital points indirectly, stressing the importance of being alert to serendipitous opportunities, discontented staff, or the one small point that might ruin an otherwise brilliant campaign.

But Freedman was quick to point out that the use of stories by consultants and in organisations frequently confused anecdote with data. They were commonly used selectively and often contrived. Freedman sought to extract some residual value from the culture of business stories, in particular drawing on the work of psychologist Jerome Bruner along with Daniel Kahneman’s System 1 and System 2 thinking. The purpose of the narrative of an organisation, including its slogans and shared stories, is not to predict events but to define a context for action when reality is inevitably overtaken by a special cause.

In building such a rich narrative, slogans alone are an inert and lifeless tactic unless woven with the continual, rigorous criticism of historical data. In fact, it is the process behaviour chart that acts as the armature around which the narrative can be wound. Building the narrative will be critical to how individuals respond to the messages of the chart.

Deming himself coined plenty of slogans: “Drive out fear”, “Create joy in work”, … . They are not forbidden. But to be effective they must form a verisimilar commentary on, and motivation for, the hard numbers and ineluctable signals of the process behaviour chart.

Eliminate exhortations!

I had thought I would dismiss this in a single clause. It is, though, a little more complicated. The sports team captain who urges her teammates onwards to take the last gasp scoring opportunity doesn’t necessarily urge in vain. There is no analysis of this scenario. It is only muscle, nerve, sweat and emotion.

The English team just suffered a humiliating exit from the Cricket World Cup. The head coach’s response was “We’ll have to look at the data.” Andrew Miller in The Times (London) (10 March 2015) reflected most cricket fans’ view when he observed that “a team of meticulously prepared cricketers suffered a collective loss of nerve and confidence.” Exhortations might not have gone amiss.

It is not, though, a management strategy. If your principal means of managing risk, achieving compelling objectives, creating value and consistently delivering customer excellence, day in, day out is to yell “one more heave!” then you had better not lose your voice. In the long run, I am on the side of the analysts.

Slogans and exhortations will prove a brittle veneer on a stable system of trouble (RearView). It is there that they will inevitably corrode engagement, breed cynicism, foster distrust, and mask decline. Only the process behaviour chart can guard against the risk.

Eliminate targets for the workforce!

This one is more complicated. How do I communicate to the rest of the organisation what I need from them? What are the consequences when they don’t deliver? How do the rest of the organisation communicate with me? This really breaks down into two separate topics and they happen to be the two halves of Deming’s Point 11.

I shall return to those in my next two posts in the Deconstructing Deming series.

 

Is data the plural of anecdote?

I seem to hear this intriguing quote everywhere these days.

The plural of anecdote is not data.

There is certainly one website that traces it back to Raymond Wolfinger, a political scientist from Berkeley, who claims to have said sometime around 1969 to 1970:

The plural of anecdote is data.

So, which is it?

Anecdote

My Concise Oxford English Dictionary (“COED”) defines “anecdote” as:

Narrative … of amusing or interesting incident.

Wiktionary gives a further alternative definition.

An account which supports an argument, but which is not supported by scientific or statistical analysis.

Edward Jenner, by James Northcote

It’s clear that anecdote itself is a concept without a very exact meaning. It’s a story, not usually reported through an objective channel such as journalism, or scientific or historical research, that carries some implication of its own unreliability. Perhaps it is inherently implausible when read against objective background evidence. Perhaps it is hearsay or multiple hearsay.

The anecdote’s suspect reliability is offset by the evidential weight it promises, either as a counter example to a cherished theory or as compelling support for a controversial hypothesis. Lyall Watson’s hundredth monkey story is an anecdote. So, in eighteenth century England, was the folk wisdom, recounted to Edward Jenner (pictured), that milkmaids were generally immune to smallpox.

Data

My COED defines “data” as:

Facts or information, esp[ecially] as basis for inference.

Wiktionary gives a further alternative definition.

Pieces of information.

Again, not much help. But the principal definition in the COED is:

Thing[s] known or granted, assumption or premise from which inferences may be drawn.

The suggestion in the word “data” is that what is given is a reliable starting point from which we can make deductions or even inductive inferences. Data carries the suggestion of reliability, soundness and objectivity captured in the familiar Arthur Koestler quote:

Without the little hard bits of marble which are called “facts” or “data” one cannot compose a mosaic …

Yet it is common knowledge that “data” cannot always be trusted. Trust in data is a recurring theme in this blog. Cyril Burt’s purported data on the heritability of IQ is a famous case. There are legions of others.

Smart investigators know that the provenance, reliability and quality of data cannot be taken for granted but must be subject to appropriate scrutiny. The modern science of Measurement Systems Analysis (“MSA”) has developed to satisfy this need. The defining characteristic of anecdote is that it has been subject to no such scrutiny.

Evidence

Anecdote and data, as broadly defined above, are both forms of evidence. All evidence is surrounded by a penumbra of doubt and unreliability. Even the most exacting engineering measurement is accompanied by a recognition of its uncertainty and of the limitations that places on its use and on the inferences that can be drawn from it. In fact, it is exactly because such a measurement comes accompanied by a numerical characterisation of its precision and accuracy that its reliability and usefulness are validated.
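To make that concrete, here is a minimal Python sketch, assuming the simplest possible case: one item measured repeatedly on one gauge. The function name `summarise_measurement` and the readings are hypothetical. A full MSA study (gauge R&R, bias, linearity, stability) goes well beyond this; the point is only that the reading travels with a number describing its own spread.

```python
import math
import statistics


def summarise_measurement(readings: list[float]) -> tuple[float, float, float]:
    """Summarise repeated readings of one item on one gauge (hypothetical helper).

    Returns (mean, repeatability standard deviation, standard error of the mean).
    At least two readings are needed for a sample standard deviation.
    """
    if len(readings) < 2:
        raise ValueError("need at least two repeated readings")
    mean = statistics.fmean(readings)
    # Sample standard deviation: the spread attributable to the measurement
    # process when the same item is measured again and again.
    repeatability = statistics.stdev(readings)
    # Standard error: the uncertainty in the average of these readings.
    std_error = repeatability / math.sqrt(len(readings))
    return mean, repeatability, std_error


if __name__ == "__main__":
    # Hypothetical repeated readings of a single part, in millimetres.
    mean, sd, se = summarise_measurement([10.0, 10.2, 9.8, 10.0])
    print(f"{mean:.3f} mm +/- {se:.3f} mm (repeatability s = {sd:.3f} mm)")
```

Reporting the mean alone would discard exactly the information that lets a reader judge how far the figure can be trusted.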

It seems inherent in the definition of anecdote that it should not be taken at face value. Whether happenstance or wishful fabrication, it may not be a reliable basis for inference or, still less, action. However, it was Jenner’s attention to the smallpox story that led him to develop vaccination against smallpox. No mean outcome. Against that, the hundredth monkey story is mere fantastical fiction.

Anecdotes about dogs sniffing out cancer stand at the beginning of the journey of confirmation and exploitation.

Two types of analysis

Part of the answer to the dilemma comes from statistician John Tukey’s observation that there are two kinds of data analysis: Exploratory Data Analysis (“EDA”) and Confirmatory Data Analysis (“CDA”).

EDA concerns the exploration of all the available data in order to suggest some interesting theories. As economist Ronald Coase put it:

If you torture the data long enough, it will confess.

Once a concrete theory or hypothesis is in mind, a rigorous process of data generation allows formal statistical techniques (“CDA”) to be brought to bear in separating the signal in the data from the noise and in testing the theory. People who muddle up EDA and CDA tend to get into difficulties. It is a foundation of statistical practice to understand the distinction and its implications.
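One way to see why the muddle causes trouble is a small simulation. This is a hedged Python sketch, not Tukey’s own procedure: every candidate “explanation” is pure noise by construction, the exploratory phase picks whichever one happens to correlate best with the outcome, and the confirmatory phase re-tests that choice on freshly generated data. The names, sample sizes and seed are all hypothetical.

```python
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def noise(rng: random.Random, n: int) -> list[float]:
    """n independent standard normal draws: data with no signal at all."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def explore_then_confirm(n_candidates: int = 50, n_obs: int = 20,
                         seed: int = 1) -> tuple[int, float, float]:
    """Dredge pure noise for the 'best' predictor, then re-test it on fresh data."""
    rng = random.Random(seed)
    # Exploratory phase: one outcome, many unrelated candidate explanations.
    outcome = noise(rng, n_obs)
    candidates = [noise(rng, n_obs) for _ in range(n_candidates)]
    scores = [pearson(c, outcome) for c in candidates]
    best = max(range(n_candidates), key=lambda i: abs(scores[i]))
    # Confirmatory phase: new data for the chosen candidate. Because there is
    # no real relationship, the fresh data are simply new, independent noise.
    r_confirm = pearson(noise(rng, n_obs), noise(rng, n_obs))
    return best, scores[best], r_confirm


if __name__ == "__main__":
    best, r_explore, r_confirm = explore_then_confirm()
    print(f"candidate {best}: exploratory r = {r_explore:.2f}, "
          f"confirmatory r = {r_confirm:.2f}")
```

The exact figures depend on the seed, but selecting the best of many noisy correlations systematically inflates the exploratory result. That is the data confessing under torture; only fresh, confirmatory data can say whether the confession was true.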

Anecdote may be well suited to EDA. That’s how Jenner successfully proceeded, though his CDA of testing his vaccine on live human subjects wouldn’t get past many ethics committees today.

However, absent that confirmatory CDA phase, the beguiling anecdote may be no more than the wrecker’s false light.

A basis for action

Tukey’s analysis is useful for the academic or the researcher in an R&D department where the environment is not dynamic and time not of the essence. Real life is more problematic. There is not always the opportunity to carry out CDA. The past does not typically repeat itself so that we can investigate outcomes with alternative factor settings. As economist Paul Samuelson observed:

We have but one sample of history.

History is the only thing that we have any data from. There is no data on the future. Tukey himself recognised the problem and coined the phrase uncomfortable science for inferences from observations whose repetition was not feasible or practical.

In his recent book Strategy: A History (Oxford University Press, 2013), Lawrence Freedman points out the risks of managing by anecdote in a section titled “The Trouble with Stories” (pp. 615-618). As Nobel laureate psychologist Daniel Kahneman has investigated at length, our interpretation of anecdote is beset by all manner of cognitive biases, such as the availability heuristic and the base rate fallacy. The traps for the statistically naïve are perilous.

But it would be a fool who would ignore all evidence that could not be subjected to formal validation. With a background knowledge of statistical theory and psychological biases, it is possible to manage trenchantly. Bayes’ theorem suggests that all evidence has its value.
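Bayes’ theorem in its odds form makes the point arithmetically: posterior odds equal prior odds multiplied by the likelihood ratio of the evidence. The following is a minimal Python sketch; the function name, the 1% prior and the likelihood ratio of 3 are hypothetical, and in practice putting a number on the likelihood ratio of an anecdote is itself a matter of judgment.

```python
def update_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio.

    likelihood_ratio = P(evidence | hypothesis true) / P(evidence | hypothesis false).
    A ratio of exactly 1 leaves the prior unchanged; anything else moves it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical: a 1% prior, then one anecdote judged three times more
    # likely to be told if the hypothesis is true than if it is false.
    print(f"after one anecdote: {update_probability(0.01, 3.0):.3f}")
```

The anecdote moves the probability, so it has value, but from a low base rate it moves it only a little. Treating it as decisive is precisely the base rate fallacy.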

I think that the rather prosaic answer to the question posed at the head of this blog is that data is the plural of anecdote, as it is the singular, but anecdotes are not the best form of data. They may be all you have in the real world. It would be wise to have the sophistication to exploit them.

How to use data to scare people …

… and how to use data for analytics.

Crisis hit GP surgeries forced to turn away millions of patients

That was the headline on the Royal College of General Practitioners (“RCGP” – UK family physicians) website today. The catastrophic tone was elaborated in The (London) Times: Millions shut out of doctors’ surgeries (paywall).
The GPs’ alarm was based on data from the GP Patient Survey, which is conducted on behalf of the National Health Service (“NHS”) by pollsters Ipsos MORI. The study is conducted by way of a survey questionnaire sent out to selected NHS patients. You can find the survey form here. Ipsos MORI’s careful analysis is here.

Participants were asked to recall their experience of trying to make an appointment the last time they wanted one. From this, the GPs have extracted the material for their blog’s lead paragraph.

GP surgeries are so overstretched due to the lack of investment in general practice that in 2015 on more than 51.3m occasions patients in England will be unable to get an appointment to see a GP or nurse when they contact their local practice, according to new research.

Now, this is not analysis. For the avoidance of doubt, the Ipsos MORI report cited above does not suffer from such tendentious framing. The RCGP blog features the following tropes of Langian statistical method.

  • Using emotive language such as “crisis”, “forced” and “turn away”.
  • Stating the cause of the avowed problem, “lack of investment”, without presenting any supporting argument.
  • Quoting an absolute number of affected patients rather than a percentage which would properly capture individual risk.
  • Casually extrapolating to a future round number, over 50 million.
  • Seeking to bolster their position by citing “new research”.
  • Failing to recognise the inevitable biases that beset human descriptions of past events.

Humans are notoriously susceptible to bias in how they recall and report past events. Psychologist Daniel Kahneman has spent a lifetime mapping out the various cognitive biases that afflict our thinking. The Ipsos MORI survey appears to me rigorously designed but no degree of rigour can eliminate the frailties of human memory, especially about an uneventful visit to the GP. An individual is much more likely to recall a frustrating attempt to make an appointment than a straightforward encounter.
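A toy model shows the direction of that bias. Assume, purely for illustration, that each respondent reports one recalled attempt, and that a frustrated attempt is more likely than a routine one to be the attempt recalled. The weights below are hypothetical; the survey provides no such figures, and the function name is invented for this sketch.

```python
def reported_proportion(true_rate: float, weight_frustrated: float,
                        weight_routine: float = 1.0) -> float:
    """Share of recalled attempts that were frustrated, under differential recall.

    true_rate: actual fraction of appointment attempts that failed.
    weight_frustrated, weight_routine: relative chances (hypothetical, not
    measured) that each kind of attempt is the one a respondent recalls.
    """
    frustrated = true_rate * weight_frustrated
    routine = (1.0 - true_rate) * weight_routine
    return frustrated / (frustrated + routine)


if __name__ == "__main__":
    # Hypothetical: 10% of attempts truly fail, but a failure is twice as
    # likely as a routine booking to be the attempt that gets recalled.
    print(f"{reported_proportion(0.10, 2.0):.3f}")  # prints 0.182
```

With equal weights the survey recovers the true rate; any preference for recalling frustration inflates it, and no amount of care in questionnaire design can correct for a weight that is unknown.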

Sometimes, such survey data will be the best we can do and will be the least bad guide to action though in itself flawed. As Charles Babbage observed:

Errors using inadequate data are much less than those using no data at all.

Yet the GPs’ use of this external survey data to support their funding campaign looks particularly out of place in this situation. This is a case where there is a better source of evidence. The point is that the problem under investigation lies entirely within the GPs’ own domain. The GPs themselves are in a vastly superior position to collect data on frustrated appointments, within their own practices. Data can be generated at the moment an appointment is sought. Memory biases and patient non-responses can be eliminated. The reasons for any diary difficulties can be recorded as they are encountered. And investigated before the trail has gone cold. Data can be explored within the practice, improvements proposed, gains measured, solutions shared on social media. The RCGP could play the leadership role of aggregating the data and fostering sharing of ideas.

It is only with local data generation that the capability of an appointments system can be assessed. Constraints can be identified, managed and stabilised. It is only when the system is shown to be incapable that a case can be made for investment. And the local data collected is exactly the data needed to make that case. Not only does such data provide a compelling visual narrative of the appointment system’s inability to heal itself but, when supported by rigorous analysis, it liquidates the level of investment and creates its own business case. Rigorous criticism of data inhibits groundless extrapolation. At the very least, local data would have provided some borrowing strength to validate the patient survey.
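A practice’s own data lends itself to a process behaviour chart. Below is a minimal Python sketch of an XmR (individuals and moving range) chart, assuming a series of daily counts of appointment requests that could not be met; the counts and function names are hypothetical. The limits use the conventional constant 2.66 for individuals charts, and only the simplest detection rule (a point outside the natural process limits) is implemented.

```python
import statistics


def xmr_limits(values: list[float]) -> tuple[float, float, float]:
    """Lower limit, centre line and upper limit for an XmR chart.

    Centre line is the mean; the natural process limits are
    mean +/- 2.66 * average moving range.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = statistics.fmean(values)
    # Moving ranges: absolute differences between successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar


def signals(values: list[float]) -> list[int]:
    """Indices of points falling outside the natural process limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical daily counts of unmet appointment requests at one practice.
    unmet = [10, 12, 11, 13, 12, 11, 10, 12, 11, 30]
    lower, centre, upper = xmr_limits(unmet)
    print(f"limits {lower:.1f} to {upper:.1f}, centre {centre:.1f}")
    print("days signalling a special cause:", signals(unmet))
```

Points inside the limits are noise to be addressed by changing the system; a point outside them is a signal worth investigating while the trail is still warm. A system that stays stable but with limits wider than patients can tolerate is the incapable system that makes the case for investment.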

Looking to external data to support a case when there is better data to be had internally, both to improve now what is in place and to support the business case for new investment, is neither pretty nor effective. And it is not analysis.