Soccer management – signal, noise and contract negotiation

Some poor data journalism here from the BBC on 28 May 2015, concerning turnover in professional soccer managers in England. “Managerial sackings reach highest level for 13 years” says the headline. A classic executive time series. What is the significance of the 13 years, other than it being the last year with more sackings than the present?

The data was purportedly from the League Managers’ Association (LMA) and their Richard Bevan thought the matter “very concerning”. The BBC provided a chart (fair use claimed).

[Figure: BBC chart of managerial sackings by season (MgrSackingsto201503)]

Now, I had a couple of thoughts as soon as I saw this. Firstly, why chart only back to 2005/6? More importantly, this looked to me like a stable system of trouble (for football managers) with the possible exception of this (2014/15) season’s Championship coach turnover. Personally, I detest multiple time series on a common chart unless there is a good reason for doing so. I do not think it the best way of showing variation and/or association.

Signal and noise

The first task of any analyst looking at data is to seek to separate signal from noise. Nate Silver made this point powerfully in his book The Signal and the Noise: The Art and Science of Prediction. As Don Wheeler put it: all data has noise; some data has signal.

Noise is typically the irregular aggregate of many causes. It is predictable in the same way as a roulette wheel. A signal is a sign of some underlying factor that has had so large an effect that it stands out from the noise. Signals can herald a fundamental unpredictability of future behaviour.

If we find a signal we look for a special cause. If we start assigning special causes to observations that are simply noise then, at best, we spend money and effort to no effect and, at worst, we aggravate the situation.

The Championship data

In any event, I wanted to look at the data for myself. I was most interested in the Championship data as that was where the BBC and LMA had been quick to find a signal. I looked on the LMA’s website and this is the latest data I found. The data only records dismissals up to 31 March of the 2014/15 season. There were 16. The data in the report gives the total number of dismissals for each preceding season back to 2005/6. The report separates out “dismissals” from “resignations” but does not say exactly how the classification was made. It can be ambiguous. A manager may well resign because he feels his club have themselves repudiated his contract, a situation known in England as constructive dismissal.

The BBC’s analysis included dismissals right up to the end of each season including 2014/15. Reading from the chart they had 20. The BBC have added some data for 2014/15 that isn’t in the LMA report and not given the source. I regard that as poor data journalism.

I found one source of further data at website The Sack Race. That told me that since the end of March there had been four terminations.

Manager | Club | Termination | Date
Malky Mackay | Wigan Athletic | Sacked | 6 April
Lee Clark | Blackpool | Resigned | 9 May
Neil Redfearn | Leeds United | Contract expired | 20 May
Steve McClaren | Derby County | Sacked | 25 May

As far as I can tell, “dismissals” include contract non-renewals and terminations by mutual consent. There are then a further three dismissals, not four. However, Clark left Blackpool amid some corporate chaos. That is certainly a termination that is classifiable either way. In any event, I have taken the BBC figure at face value though I am alerted as to some possible data quality issues here.

Signal and noise

Looking at the Championship data, this was the process behaviour chart, plotted as an individuals chart.

[Figure: process behaviour chart (individuals chart) of Championship dismissals by season]

There is a clear signal for the 2014/15 season with an observation, 20 dismissals, above the upper natural process limit of 19.18 dismissals. Where there is a signal we should seek a special cause. There is no guarantee that we will find a special cause. Data limitations and bounded rationality are always constraints. In fact, there is no guarantee that there was a special cause. The signal could be a false positive. Such effects cannot be eliminated. However, signals efficiently direct our limited energy for what Daniel Kahneman calls System 2 thinking towards the most promising enquiries.
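For readers who want to see how natural process limits of this kind are conventionally calculated, here is a minimal sketch in Python. It assumes the usual individuals (XmR) chart formula: the mean plus or minus 2.66 times the average moving range. The function names are hypothetical, and the counts are placeholders for illustration only; they are not the LMA or BBC figures, so the limits will not reproduce the 19.18 quoted above.

```python
from statistics import mean


def natural_process_limits(values):
    """Return (lower, centre, upper) limits for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = mean(values)
    # Moving ranges: absolute differences between successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is the conventional scaling constant (3 / 1.128) for ranges of two.
    half_width = 2.66 * mean(moving_ranges)
    return centre - half_width, centre, centre + half_width


def signals(values):
    """Return (index, value) pairs falling outside the natural process limits."""
    lower, _, upper = natural_process_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Placeholder seasonal counts, invented purely to exercise the functions.
example_counts = [10, 12, 9, 11, 10, 12, 9, 11, 10, 20]

lower, centre, upper = natural_process_limits(example_counts)
print(f"limits: {lower:.2f} to {upper:.2f}, centre {centre:.2f}")
print("signals:", signals(example_counts))
```

With the placeholder counts, only the final observation falls outside the limits. Whether limits should be calculated with or without the most recent point is a judgment call that this sketch does not settle.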

Analysis

The BBC reports one narrative woven round the data.

Bevan said the current tenure of those employed in the second tier was about eight months. And the demand to reach the top flight, where a new record £5.14bn TV deal is set to begin in 2016, had led to clubs hitting the “panic button” too quickly.

It is certainly a plausible view. I compiled a list of the dismissals and non-renewals, not the resignations, with data from Wikipedia and The Sack Race. I only identified 17 which again suggests some data quality issue around classification. I have then charted a scatter plot of date of dismissal against the club’s then league position.

[Figure: scatter plot of date of dismissal against club league position, 2014/15]

It certainly looks as though risk of relegation is the major driver for dismissal. Aside from that, Watford dismissed Billy McKinlay after only two games when they were third in the league, equal on points with the top two. McKinlay had been an emergency appointment after Oscar Garcia had been compelled to resign through ill health. Watford thought they had quickly found a better manager in Slavisa Jokanovic. Watford ended the season in second place and were promoted to the Premiership.

There were two dismissals after the final game on 2 May by disappointed mid-table teams. Beyond that, the only evidence for impulsive managerial changes in pursuit of promotion is the three mid-season, mid-table dismissals.

Manager | Club | League position on dismissal | League position at end of season
Nigel Adkins | Reading | 16 | 19
Bob Peeters | Charlton Athletic | 14 | 12
Stuart Pearce | Nottingham Forest | 12 | 14

A table that speaks for itself. I am not impressed by the argument that there has been the sort of increase in panic sackings that Bevan fears. Both Blackpool and Leeds experienced chaotic executive management which will have resulted in an enhanced force of mortality on their respective coaches. That along with the data quality issues and the technical matter I have described below lead me to feel that there was no great enhanced threat to the typical Championship manager in 2014/15.

Next season I would expect some regression to the mean with a lower number of dismissals. Not much of a prediction really but that’s what the data tells me. If Bevan tries to attribute that to the LMA’s activism then I fear that he will be indulging in Langian statistical analysis. Will he be able to resist?

Techie bit

I have a preference for individuals charts but I did also try plotting the data on an np-chart where I found no signal. It is trite service-course statistics that a Poisson distribution with mean λ has standard deviation √λ so an upper 3-sigma limit for a (homogeneous) Poisson process with mean 11.1 dismissals would be 21.1 dismissals. Kahneman has cogently highlighted how people tend to see patterns in data as signals even where they are typical of mere noise. In this case I am aware that the data is not atypical of a Poisson process so I am unsurprised that I failed to identify a special cause.
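The arithmetic behind that 3-sigma limit can be checked in a few lines. This is a minimal sketch assuming only what is stated above: a homogeneous Poisson process with mean 11.1, whose standard deviation is the square root of the mean. The function name is hypothetical.

```python
import math


def poisson_upper_limit(mean_count, sigmas=3.0):
    """Upper k-sigma limit for a Poisson count: mean + k * sqrt(mean)."""
    return mean_count + sigmas * math.sqrt(mean_count)


limit = poisson_upper_limit(11.1)
print(round(limit, 1))   # 21.1
print(20 > limit)        # False: 20 dismissals lies inside the limit
```

On this model the BBC’s figure of 20 dismissals falls below the upper limit, which is why no signal appears.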

A Poisson process with mean 11.1 dismissals is a pretty good model going forwards and that is the basis I would press on any managers in contract negotiations.

Of course, the clubs should remember that when they look for a replacement manager they will then take a random sample from the pool of job seekers. Really!


Deconstructing Deming X – Eliminate slogans!

10. Eliminate slogans, exhortations and targets for the workforce.

W Edwards Deming

Neither snow nor rain nor heat nor gloom of night stays these couriers from the swift completion of their appointed rounds.

Inscription on the James Farley Post Office, New York City, New York, USA
William Mitchell Kendall pace Herodotus

Now, that’s what I call a slogan. Is this what Point 10 of Deming’s 14 Points was condemning? There are three heads here, all making quite distinct criticisms of modern management. The important dimension of this criticism is the way in which managers use data in communicating with the wider organisation, in setting imperatives and priorities and in determining what individual workers will consider important when they are free from immediate supervision.

Eliminate slogans!

The US postal inscription at the head of this blog certainly falls within the category of slogans. Apparently the root of the word “slogan” is the Scottish Gaelic sluagh-ghairm meaning a battle cry. It seeks to articulate a solidarity and commitment to purpose that transcends individual doubts or rationalisation. That is what the US postal inscription seeks to do. Beyond the data on customer satisfaction, the demands of the business to protect and promote its reputation, the service levels in place for individual value streams, the tension between current performance and aspiration, the disappointment of missed objectives, it seeks to draw together the whole of the organisation around an ideal.

Slogans are part of the broader oral culture of an organisation. In the words of Lawrence Freedman (Strategy: A History, Oxford, 2013, p564) stories, and I think by extension slogans:

[make] it possible to avoid abstractions, reduce complexity, and make vital points indirectly, stressing the importance of being alert to serendipitous opportunities, discontented staff, or the one small point that might ruin an otherwise brilliant campaign.

But Freedman was quick to point out the use of stories by consultants and in organisations frequently confused anecdote with data. They were commonly used selectively and often contrived. Freedman sought to extract some residual value from the culture of business stories, in particular drawing on the work of psychologist Jerome Bruner along with Daniel Kahneman’s System 1 and System 2 thinking. The purpose of the narrative of an organisation, including its slogans and shared stories, is not to predict events but to define a context for action when reality is inevitably overtaken by a special cause.

In building such a rich narrative, slogans alone are an inert and lifeless tactic unless woven with the continual, rigorous criticism of historical data. In fact, it is the process behaviour chart that acts as the armature around which the narrative can be wound. Building the narrative will be critical to how individuals respond to the messages of the chart.

Deming himself coined plenty of slogans: “Drive out fear”, “Create joy in work”, … . They are not forbidden. But to be effective they must form a verisimilar commentary on, and motivation for, the hard numbers and ineluctable signals of the process behaviour chart.

Eliminate exhortations!

I had thought I would dismiss this in a single clause. It is, though, a little more complicated. The sports team captain who urges her teammates onwards to take the last gasp scoring opportunity doesn’t necessarily urge in vain. There is no analysis of this scenario. It is only muscle, nerve, sweat and emotion.

The English team just suffered a humiliating exit from the Cricket World Cup. The head coach’s response was “We’ll have to look at the data.” Andrew Miller in The Times (London) (10 March 2015) reflected most cricket fans’ view when he observed that “a team of meticulously prepared cricketers suffered a collective loss of nerve and confidence.” Exhortations might not have gone amiss.

It is not, though, a management strategy. If your principal means of managing risk, achieving compelling objectives, creating value and consistently delivering customer excellence, day in, day out is to yell “one more heave!” then you had better not lose your voice. In the long run, I am on the side of the analysts.

Slogans and exhortations will prove a brittle veneer on a stable system of trouble (RearView). It is there that they will inevitably corrode engagement, breed cynicism, foster distrust, and mask decline. Only the process behaviour chart can guard against the risk.

Eliminate targets for the workforce!

This one is more complicated. How do I communicate to the rest of the organisation what I need from them? What are the consequences when they don’t deliver? How do the rest of the organisation communicate with me? This really breaks down into two separate topics and they happen to be the two halves of Deming’s Point 11.

I shall return to those in my next two posts in the Deconstructing Deming series.

 

Richard Dawkins champions intelligent design (for business processes)

Richard Dawkins has recently had a couple of bad customer experiences. In each he was confronted with a system that seemed to him indifferent to his customer feedback. I sympathise with him on one matter but not the other. The two incidents do, in my mind, elucidate some important features of process discipline.

In the first, Dawkins spent a frustrating spell ordering a statement from his bank over the internet. He wanted to tell the bank about his experience and offer some suggestions for improvement, but he couldn’t find any means of channelling and communicating his feedback.

Embedding a business process in software will impose a rigid discipline on its operation. However, process discipline is not the same thing as process petrification. The design assumptions of any process include, or should include, the predicted range and variety of situations that the process is anticipated to encounter. We know that the bounded rationality of the designers will blind them to some of the situations that the process will subsequently confront in real world operation. There is no shame in that but the necessary adjunct is that, while the process is operated diligently as designed, data is accumulated on its performance and, in particular, on the customer’s experience. Once an economically opportune moment arrives (I have glossed over quite a bit there) the data can be reviewed, design assumptions challenged and redesign evaluated. Following redesign the process then embarks on another period of boring operation. The “boring” bit is essential to success. Perhaps I should say “mindful” rather than “boring” though I fear that does not really work with software.

Dawkins’ bank have missed an opportunity to listen to the voice of the customer. That weakens their competitive position. Ignorance cannot promote competitiveness. Any organisation that is not continually improving every process for planning, production and service (pace W Edwards Deming) faces the inevitable fact that its competitors will ultimately make such products and services obsolete. As Dawkins himself would appreciate, survival is not compulsory.

Dawkins’ second complaint was that security guards at a UK airport would not allow him to take a small jar of honey onto his flight because of a prohibition on liquids in the passenger cabin. Dawkins felt that the security guard should have displayed “common sense” and allowed it on board contrary to the black letter of the regulations. Dawkins protests against “rule-happy officials” and “bureaucratically imposed vexation”. Dawkins displays another failure of trust in bureaucracy. He simply would not believe that other people had studied the matter and come to a settled conclusion to protect his safety. It can hardly have been for the airport’s convenience. Dawkins was more persuaded by something he had read on the internet. He fell into the trap of thinking that What you see is all there is. I fear that Dawkins betrays his affinities with the cyclist on the railway crossing.

When we give somebody a process to operate we legitimately expect them to do so diligently and with self discipline. The risk of an operator departing from, adjusting or amending a process on the basis of novel local information is that, within the scope of the resources they have for taking that decision, there is no way of reliably incorporating the totality of assumptions and data on which the process design was predicated. Even were all the data available, when Dawkins talks of “common sense” he is demanding what Daniel Kahneman calls System 2 thinking. Whenever we demand System 2 thinking ex tempore we are more likely to get System 1 and it is unlikely to perform effectively. The rationality of an individual operator in that moment is almost certainly more tightly bounded than that of the process designers.

In this particular case, any susceptibility of a security guard to depart from process would be exactly the behaviour that a terrorist might seek to exploit once aware of it.

Further, departures from process will have effects on the organisational system, upstream, downstream and collateral. Those related processes themselves rely on the operator’s predictable compliance. The consequence of ill discipline can be far reaching and unanticipated.

That is not to say that the security process was beyond improvement. In an effective process-oriented organisation, operating the process would be only one part of the security guard’s job. Part of the bargain for agreeing to the boring/mindful diligent operation of the process is that part of work time is spent improving the process. That is something done offline, with colleagues, with the input of other parts of the organisation and with recognition of all the data including the voice of the customer.

Had he exercised the “common sense” Dawkins demanded, the security guard would have risked disciplinary action by his employers for serious misconduct. To some people, threats of sanctions appear at odds with engendering trust in an organisation’s process design and decision making. However, when we tell operators that something is important then fail to sanction others who ignore the process, we undermine the basis of the bond of trust with those that accepted our word and complied. Trust in the bureaucracy and sanctions for non-compliance are complementary elements of fostering process discipline. Both are essential.

The Monty Hall Problem redux

This old chestnut refuses to die and I see that it has turned up again on the BBC website. I have been intending for a while to blog about this so this has given me the excuse. I think that there has been a terrible history of misunderstanding this problem and I want to set down how the confusion comes about. People have mistaken a problem in psychology for a problem in probability.

Here is the classic statement of the problem that appeared in Parade magazine in 1990.

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

The rational way of approaching this problem is through Bayes’ theorem. Bayes’ theorem tells us how to update our views as to the probability of events when we have some new information. In this problem I have never seen anyone start from a position other than that, before any doors are opened, no door is more probably hiding the car than the others. I think it is uncontroversial to say that for each door the probability of its hiding the car is 1/3.

Once the host opens door No. 3, we have some more information. We certainly know that the car is not behind door No. 3 but does the host tell us anything else? Bayes’ theorem tells us how to ask the right question. The theorem can be illustrated like this.

[Figure: Bayes’ theorem illustrated]

The probability of observing the new data, if the theory is correct (the green box), is called the likelihood and plays a very important role in statistics.

Without giving the details of the mathematics, Bayes’ theorem leads us to analyse the problem in this way.

[Figure MH1: Bayes’ theorem applied to the Monty Hall problem]

We can work this out arithmetically but, because all three doors were initially equally probable, the matter comes down to deciding which of the two likelihoods is greater.

[Figure MH2: comparison of the two likelihoods]

So what are the respective probabilities of the host behaving in the way he did? Unfortunately, this is where we run into problems because the answer depends on the tactic that the host was adopting.

And we are not given that in the question.

Consider some of the following possible tactics the host may have adopted.

  1. Open an unopened door hiding a goat; if both unopened doors hide goats, choose between them at random.
  2. If the contestant chooses door 1 (or 2, or 3), always open door 3 (or 1, or 2) whether or not it hides a goat.
  3. Open either unopened door at random, but only if the contestant has chosen the door with the prize; otherwise don’t open a door (the devious strategy, suggested to me by a former girlfriend as the obviously correct answer).
  4. Choose an unopened door at random. If it hides a goat, open it. Otherwise do not open a door (not the same as tactic 1).
  5. Open either unopened door at random whether or not it hides a goat.

There are many more. All these various tactics lead to different likelihoods.

Tactic | P(host revealed a goat at door 3, given that the car is at 1) | P(host revealed a goat at door 3, given that the car is at 2) | Rational choice
1 | ½ | 1 | Switch
2 | 1 | 1 | No difference
3 | ½ | 0 | Don’t switch
4 | ½ | ½ | No difference
5 | ½ | ½ | No difference
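The table can be checked by applying Bayes’ theorem directly. The following is a minimal sketch in Python using exact fractions. It assumes the equal 1/3 priors described earlier and takes the likelihoods straight from the table; the likelihood for the car being behind door 3 is zero, because a goat was seen there. The names `TACTICS`, `posterior_stay` and `advice` are hypothetical.

```python
from fractions import Fraction

# Likelihoods from the table:
# (P(goat revealed at 3 | car at 1), P(goat revealed at 3 | car at 2))
TACTICS = {
    1: (Fraction(1, 2), Fraction(1)),
    2: (Fraction(1), Fraction(1)),
    3: (Fraction(1, 2), Fraction(0)),
    4: (Fraction(1, 2), Fraction(1, 2)),
    5: (Fraction(1, 2), Fraction(1, 2)),
}


def posterior_stay(like_car_at_1, like_car_at_2, prior=Fraction(1, 3)):
    """Posterior probability that the car is behind door 1 (the original pick)."""
    # Door 3 contributes nothing to the evidence: its likelihood is zero.
    evidence = prior * like_car_at_1 + prior * like_car_at_2
    return prior * like_car_at_1 / evidence


def advice(p_stay):
    """Translate the posterior for door 1 into the rational choice."""
    if p_stay < Fraction(1, 2):
        return "Switch"
    if p_stay > Fraction(1, 2):
        return "Don't switch"
    return "No difference"


for tactic, (l1, l2) in TACTICS.items():
    p = posterior_stay(l1, l2)
    print(tactic, p, advice(p))
```

Because the priors are equal they cancel, so the posterior for door 1 is simply its likelihood divided by the sum of the two likelihoods. Under tactic 1 that is 1/3, the familiar answer; under tactic 3 it is 1.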

So if we were given this situation in real life we would have to work out which tactic the host was adopting. The problem is presented as though it is a straightforward maths problem but it critically hinges on a problem in psychology. What can we infer from the host’s choice? What is he up to? I think that this leads to people’s discomfort and difficulty. I am aware that even people who start out assuming Tactic 1 struggle but I suspect that somewhere in the back of their minds they cannot rid themselves of the other possibilities. The seeds of doubt have been sown in the way the problem is set.

A participant in the game show would probably have to make a snap judgment about the meaning of the new data. This is the sort of thinking that Daniel Kahneman calls System 1 thinking. It is intuitive, heuristic and terribly bad at coping with novel situations. Fear of the devious strategy may well prevail.

A more ambitious contestant may try to embark on more reflective analytical System 2 thinking about the likely tactic. That would be quite an achievement under pressure. However, anyone with the inclination may have been able to prepare himself with some pre-show analysis. There may be a record of past shows from which the host’s common tactics can be inferred. The production company’s reputation in similar shows may be known. The host may be displaying signs of discomfort or emotional stress, the “tells” relied on by poker players.

There is a lot of data potentially out there. However, that only leads us to another level of statistical, and psychological, inference about the host’s strategy, an inference that itself relies on its own uncertain likelihoods and prior probabilities. And that then leads to the level of behaviour and cognitive psychology and the uncertainties in the fundamental science of human nature. It seems as though, as philosopher Richard Jeffrey put it, “It’s probabilities all the way down”.

Behind all this, it is always useful advice that, having once taken a decision, it should only be revised if there is some genuinely new data that was surprising given our initial thinking.

Economist G L S Shackle long ago lamented that:

… we habitually and, it seems, unthinkingly assume that the problem facing … a business man, is of the same kind as those set in examinations in mathematics, where the candidate unhesitatingly (and justly) takes it for granted that he has been given enough information to construe a satisfactory solution. Where, in real life, are we justified in assuming that we possess ‘enough’ information?

Music is silver but …

The other day I came across a report on the BBC website that non-expert listeners could pick out winners of piano competitions more reliably when presented with silent performance videos than when exposed to sound alone. In the latter case they performed no better than chance.

The report was based on the work of Chia-Jung Tsay at University College London, in a paper entitled Sight over sound in the judgment of music performance.

The news report immediately leads us to suspect that the expert evaluating a musical performance is not in fact analysing and weighing auditory complexity and aesthetics but instead falling under the subliminal influence of the proxy data of the artist’s demeanour and theatrics.

That is perhaps unsurprising. We want to believe, as does the expert critic, that performance evaluation is a reflective, analytical and holistic enterprise, demanding decades of exposure to subtle shades of interpretation and developing skills of discrimination by engagement with the ascendant generation of experts. This is what Daniel Kahneman calls a System 2 task. However, a wealth of psychological study shows only too well that System 2 is easily fatigued and distracted. When we believe we are thinking in System 2, we are all too often loafing in System 1 and using simplistic learned heuristics as a substitute. It is easy to imagine that the visual proxy data might be such a heuristic, a ready reckoner that provides a plausible result in a wide variety of commonly encountered situations.

These behaviours are difficult to identify, even for the most mindful individual. Kahneman notes:

… all of us live much of our lives guided by the impressions of System 1 – and we do not know the source of these impressions. How do you know that a statement is true? If it is strongly linked by logic or association to other beliefs or preferences you hold, or comes from a source you trust and like, you will feel a sense of cognitive ease. The trouble is that there may be other causes for your feeling of ease … and you have no simple way of tracing your feelings to their source

Thinking, Fast and Slow, p64

The problem is that what Kahneman describes is exactly what I was doing in finding my biases confirmed by this press report. I have had a superficial look at the statistics in this study and I am now less persuaded than when I read the press item. I shall maybe blog about this later and the difficulties I had in interpreting the analysis. Really, this is quite a tentative and suggestive study on a very limited frame. I would certainly like to see more inter-laboratory studies in psychology. The study is open to multiple interpretations and any individual will probably have difficulty making an exhaustive list. There is always a danger of falling into the trap of What You See Is All There Is (WYSIATI).

That notwithstanding, even anecdotally, the story is another reminder of an important lesson of process management that, even though what we have been doing has worked in the past, we may not understand what it is that has been working.

Walkie-Talkie “death ray” and risk identification

News media have been full of the tale of London’s Walkie-Talkie office block raising temperatures on the nearby highway to car melting levels.

The full story of how the architects and engineers created the problem has yet to be told. It is certainly the case that similar phenomena have been reported elsewhere. According to one news report, the Walkie-Talkie’s architect had worked on a Las Vegas hotel that caused similar problems back in September 2010.

More generally, an external hazard from a product’s optical properties is certainly something that has been noted in the past. It appears from this web page that domestic low-emissivity (low-E) glass was suspected of setting fire to adjacent buildings as long ago as 2007. I have not yet managed to find the Consumer Product Safety Commission report into low-E glass but I now know all about the hazards of snow globes.

The Walkie-Talkie phenomenon marks a signal failure in risk management and it will cost somebody to fix it. It is not yet clear whether this was a miscalculation of a known hazard or whether the hazard was simply neglected from the start.

Risk identification is the most fundamental part of risk management. If you have failed to identify a risk you are not in a position to control, mitigate or externalise it in advance. Risk identification is also the hardest part. In the case of the Walkie-Talkie, modern materials, construction methods and aesthetic tastes have conspired to create a phenomenon that was not, at least as an accidental feature, present in structures before this century. That means that risk identification is not a matter of running down a checklist of known hazards to see which apply. Novel and emergent risks are always the most difficult to identify, especially where they involve the impact of an artefact on its environment. This is a real, as Daniel Kahneman would put it, System 2 task. The standard checklist propels it back to the flawed System 1 level. As we know, even when we think we are applying a System 2 mindset, we may subconsciously be loafing in a subliminal System 1.

It is very difficult to spot when something has been missed out of a risk assessment, even in familiar scenarios. In a famous 1978 study by Fischhoff, Slovic and others, they showed to college students fault trees analysing potential causes of a car’s failure to start (this is 1978). Some of the fault trees had been “pruned”. One branch, representing say “battery charge”, had been removed. The subjects were very poor at spotting that a major, and well known, source of failure had been omitted from the analysis. Where failure modes are unfamiliar, it is even more difficult to identify the lacuna.

Even where failure modes are identified, if they are novel then they still present challenges in effective design and risk management. Henry Petroski, in Design Paradigms, his historical analysis of human error in structural engineering, shows how novel technologies present challenges for the development of new engineering methodologies. As he says:

There is no finite checklist of rules or questions that an engineer can apply and answer in order to declare that a design is perfect and absolutely safe, for such finality is incompatible with the whole process, practice and achievement of engineering. Not only must engineers preface any state-of-the-art analysis with what has variously been called engineering thinking and engineering judgment, they must always supplement the results of their analysis with thoughtful and considered interpretations of the results.

I think there are three principles that can help guard against an overly narrow vision. Firstly, involve as broad a selection of people as possible in hazard identification. Perhaps take a diagonal slice of the organisation. Do not put everybody in a room together where they can converge rapidly. This is probably a situation where some variant of the Delphi method can be justified.

Secondly, be aware that all assessments are provisional. Make design assumptions explicit. Collect data at every stage, especially on your assumptions. Compare the data with what you predicted would happen. Respond to any surprises by protecting the customer and investigating. Even if you’ve not yet melted a Jaguar, if the glass is looking a little more reflective than you thought it would be, take immediate action. Do not wait until you are in the Evening Standard. There is a reputation management side to this too.

Thirdly, as Petroski advocates, analysis of case studies and reflection on the lessons of history help to broaden horizons and develop a sense of humility. It seems nobody’s life is actually in danger from this “death ray” but the history of failures to identify risk leaves a more tangible record of mortality.