Soccer management – signal, noise and contract negotiation

Some poor data journalism here from the BBC on 28 May 2015, concerning turnover in professional soccer managers in England. “Managerial sackings reach highest level for 13 years” says the headline. A classic executive time series. What is the significance of the 13 years? Other than it being the last year with more sackings than the present.

The data was purportedly from the League Managers’ Association (LMA) and their Richard Bevan thought the matter “very concerning”. The BBC provided a chart (fair use claimed).

[Figure: BBC chart of managerial sackings by season, 2005/6 to 2014/15]

Now, I had a couple of thoughts as soon as I saw this. Firstly, why chart only back to 2005/6? More importantly, this looked to me like a stable system of trouble (for football managers) with the possible exception of this (2014/15) season’s Championship coach turnover. Personally, I detest multiple time series on a common chart unless there is a good reason for doing so. I do not think it the best way of showing variation and/or association.

Signal and noise

The first task of any analyst looking at data is to seek to separate signal from noise. Nate Silver made this point powerfully in his book The Signal and the Noise: The Art and Science of Prediction. As Don Wheeler put it: all data has noise; some data has signal.

Noise is typically the irregular aggregate of many causes. It is predictable in the same way as a roulette wheel. A signal is a sign of some underlying factor that has had so large an effect that it stands out from the noise. Signals can herald a fundamental unpredictability of future behaviour.

If we find a signal we look for a special cause. If we start assigning special causes to observations that are simply noise then, at best, we spend money and effort to no effect and, at worst, we aggravate the situation.

The Championship data

In any event, I wanted to look at the data for myself. I was most interested in the Championship data as that was where the BBC and LMA had been quick to find a signal. I looked on the LMA’s website and this is the latest data I found. The data only records dismissals up to 31 March of the 2014/15 season. There were 16. The data in the report gives the total number of dismissals for each preceding season back to 2005/6. The report separates out “dismissals” from “resignations” but does not say exactly how the classification was made. It can be ambiguous. A manager may well resign because he feels his club have themselves repudiated his contract, a situation known in England as constructive dismissal.

The BBC’s analysis included dismissals right up to the end of each season including 2014/15. Reading from the chart they had 20. The BBC have added some data for 2014/15 that isn’t in the LMA report and not given the source. I regard that as poor data journalism.

I found one source of further data at the website The Sack Race. That told me that since the end of March there had been four terminations.

Manager          Club             Termination        Date
Malky Mackay     Wigan Athletic   Sacked             6 April
Lee Clark        Blackpool        Resigned           9 May
Neil Redfearn    Leeds United     Contract expired   20 May
Steve McClaren   Derby County     Sacked             25 May

As far as I can tell, “dismissals” include contract non-renewals and terminations by mutual consent. There are then a further three dismissals, not four. However, Clark left Blackpool amid some corporate chaos. That is certainly a termination that is classifiable either way. In any event, I have taken the BBC figure at face value though I am alert to some possible data quality issues here.

Signal and noise

Looking at the Championship data, this was the process behaviour chart, plotted as an individuals chart.

[Figure: individuals (process behaviour) chart of Championship dismissals by season]

There is a clear signal for the 2014/15 season with an observation, 20 dismissals, above the upper natural process limit of 19.18 dismissals. Where there is a signal we should seek a special cause. There is no guarantee that we will find a special cause. Data limitations and bounded rationality are always constraints. In fact, there is no guarantee that there was a special cause. The signal could be a false positive. Such effects cannot be eliminated. However, signals efficiently direct our limited energy for what Daniel Kahneman calls System 2 thinking towards the most promising enquiries.
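
For readers wondering where a figure like an upper natural process limit comes from, here is a minimal sketch of the usual individuals (XmR) chart calculation: the mean plus or minus 2.66 times the average moving range. It assumes the conventional 2.66 scaling constant. The function name and the sample counts are illustrative placeholders, not the LMA figures, so it will not reproduce the 19.18 quoted above.

```python
from statistics import mean


def natural_process_limits(counts):
    """Return (lower, centre, upper) limits for an individuals (XmR) chart."""
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    centre = mean(counts)
    # Moving range: absolute difference between successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(counts, counts[1:])]
    # 2.66 is the conventional XmR scaling constant (an assumption here).
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre, centre + spread


# Placeholder season counts for illustration only, NOT the LMA data.
example_counts = [10, 12, 10, 12]
lower, centre, upper = natural_process_limits(example_counts)

# Any observation outside the limits would be flagged as a signal.
signals = [c for c in example_counts if c < lower or c > upper]
print(lower, centre, upper, signals)
```

An observation outside the limits is treated as a signal worth investigating. Everything inside them is treated as noise.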

Analysis

The BBC reports one narrative woven round the data.

Bevan said the current tenure of those employed in the second tier was about eight months. And the demand to reach the top flight, where a new record £5.14bn TV deal is set to begin in 2016, had led to clubs hitting the “panic button” too quickly.

It is certainly a plausible view. I compiled a list of the dismissals and non-renewals, not the resignations, with data from Wikipedia and The Sack Race. I only identified 17 which again suggests some data quality issue around classification. I have then charted a scatter plot of date of dismissal against the club’s then league position.

[Figure: scatter plot of date of dismissal against the club’s league position, Championship 2014/15]

It certainly looks as though risk of relegation is the major driver for dismissal. Aside from that, Watford dismissed Billy McKinlay after only two games when they were third in the league, equal on points with the top two. McKinlay had been an emergency appointment after Oscar Garcia had been compelled to resign through ill health. Watford thought they had quickly found a better manager in Slavisa Jokanovic. Watford ended the season in second place and were promoted to the Premiership.

There were two dismissals after the final game on 2 May by disappointed mid-table teams. Beyond that, the only evidence for impulsive managerial changes in pursuit of promotion is the three mid-season, mid-table dismissals.

                                      Club league position
Manager         Club                  On dismissal   At end of season
Nigel Adkins    Reading               16             19
Bob Peeters     Charlton Athletic     14             12
Stuart Pearce   Nottingham Forest     12             14

A table that speaks for itself. I am not impressed by the argument that there has been the sort of increase in panic sackings that Bevan fears. Both Blackpool and Leeds experienced chaotic executive management which will have resulted in an enhanced force of mortality on their respective coaches. That, along with the data quality issues and the technical matter I have described below, leads me to feel that there was no great enhanced threat to the typical Championship manager in 2014/15.

Next season I would expect some regression to the mean with a lower number of dismissals. Not much of a prediction really but that’s what the data tells me. If Bevan tries to attribute that to the LMA’s activism then I fear that he will be indulging in Langian statistical analysis. Will he be able to resist?

Techie bit

I have a preference for individuals charts but I did also try plotting the data on an np-chart where I found no signal. It is trite service-course statistics that a Poisson distribution with mean λ has standard deviation √λ so an upper 3-sigma limit for a (homogeneous) Poisson process with mean 11.1 dismissals would be 21.1 dismissals. Kahneman has cogently highlighted how people tend to see patterns in data as signals even where they are typical of mere noise. In this case I am aware that the data is not atypical of a Poisson process so I am unsurprised that I failed to identify a special cause.
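
The arithmetic behind that limit can be sketched in a few lines. This uses only the figures quoted above (a mean of 11.1 and a 3-sigma limit). The function names are illustrative, and the tail probability is an extra check under the same homogeneous Poisson assumption.

```python
import math


def poisson_upper_limit(mean_count, sigmas=3.0):
    """Upper limit: mean plus `sigmas` standard deviations, with sd = sqrt(mean)."""
    return mean_count + sigmas * math.sqrt(mean_count)


def poisson_tail(mean_count, k):
    """P(X >= k) for a Poisson variable with the given mean."""
    below = sum(
        math.exp(-mean_count) * mean_count**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


limit = poisson_upper_limit(11.1)  # about 21.1
chance_of_20_or_more = poisson_tail(11.1, 20)
print(f"upper 3-sigma limit: {limit:.1f}")
print(f"P(20 or more dismissals): {chance_of_20_or_more:.4f}")
```

On this model 20 dismissals sits inside the limit, which is why the count chart shows no signal where the individuals chart did.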

A Poisson process with mean 11.1 dismissals is a pretty good model going forwards and that is the basis I would press on any managers in contract negotiations.

Of course, the clubs should remember that when they look for a replacement manager they will then take a random sample from the pool of job seekers. Really!

Anecdotes and p-values

I have been feeling guilty ever since I recently published a p-value. It led me to sit down and think hard about why I could not resist doing so and what I really think it told me, if anything. I suppose that a collateral question is to ask why I didn’t keep it to myself. To be honest, I quite often calculate p-values though I seldom let on.

It occurred to me that there was something in common between p-values and the anecdotes that I have blogged about here and here. Hence more jellybeans.

What is a p-value?

My starting data was the conversion rates of 10 elite soccer penalty takers. Each of their conversion rates was different. Leighton Baines had the best figures having converted 11 out of 11. Peter Beardsley and Emmanuel Adebayor had the superficially weakest, having converted 18 out of 20 and 9 out of 10 respectively. To an analyst that raises a natural question. Was the variation in performance between the players signal or was it noise?

In his rather discursive book The Signal and the Noise: The Art and Science of Prediction, Nate Silver observes:

The signal is the truth. The noise is what distracts us from the truth.

In the penalties data the signal, the truth, that we are looking for is Who is the best penalty taker and how good are they? The noise is the sampling variation inherent in a short sequence of penalty kicks. Take a coin and toss it 10 times. Count the number of heads. Make another 10 tosses. And a third 10. It is unlikely that you got the same number of heads but that was not because anything changed in the coin. The variation between the three counts is all down to the short sequence of tosses, the sampling variation.
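
The coin experiment is easy to try without a coin. Here is a small sketch, assuming a fair coin simulated with Python’s standard random number generator. The function name and the seed are arbitrary choices for illustration.

```python
import random


def heads_in_batches(batches=3, tosses=10, seed=None):
    """Count heads in each of several batches of fair coin tosses."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < 0.5 for _ in range(tosses))
        for _ in range(batches)
    ]


# Three lots of ten tosses; the coin never changes, only the sample does.
counts = heads_in_batches(seed=2015)
print(counts)
```

Running it with different seeds gives different counts each time, even though the probability of heads is fixed at one half throughout.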

In Understanding Variation: The Key to Managing Chaos, Don Wheeler observes:

All data has noise. Some data has signal.

We first want to know whether the penalty statistics display nothing more than sampling variation or whether there is also a signal that some penalty takers are better than others, some extra variation arising from that cause.

The p-value told me the probability that we could have observed the data we did had the variation been solely down to noise, 0.8%. Unlikely.

p-Values do not answer the exam question

The first problem is that p-values do not give me anything near what I really want. I want to know, given the observed data, what is the probability that penalty conversion rates are just noise. The p-value tells me the probability that, were penalty conversion rates just noise, I would have observed the data I did.

The distinction is between the probability of data given a theory and the probability of a theory given the data. It is usually the latter that is interesting. Now this may seem like a fine distinction without a difference. However, consider the probability that somebody with measles has spots. It is, I think, pretty close to one. Now consider the probability that somebody with spots has measles. Many things other than measles cause spots so that probability is going to be very much less than one. I would need a lot of information to come to an exact assessment.

In general, Bayes’ theorem governs the relationship between the two probabilities. However, practical use requires more information than I have or am likely to get. The p-values consider all the possible data that you might have got if the theory were true. It seems more rational to consider all the different theories that the actual data might support or imply. However, that is not so simple.
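
To make the measles example concrete, here is a sketch of Bayes’ theorem with numbers plugged in. All three inputs are invented purely for illustration (they are not epidemiological estimates), and the function name is a placeholder.

```python
def posterior(prior, p_data_given_theory, p_data_given_not_theory):
    """P(theory | data) from Bayes' theorem for a two-way split of theories."""
    evidence = (
        p_data_given_theory * prior
        + p_data_given_not_theory * (1.0 - prior)
    )
    return p_data_given_theory * prior / evidence


# Invented inputs: 0.1% of people have measles, 95% of measles cases
# show spots, and 5% of people without measles have spots anyway.
p_measles_given_spots = posterior(
    prior=0.001,
    p_data_given_theory=0.95,
    p_data_given_not_theory=0.05,
)
print(f"P(measles | spots) = {p_measles_given_spots:.3f}")  # about 0.019
```

With those made-up inputs the probability of spots given measles is 0.95, yet the probability of measles given spots is under 2%. The practical difficulty is exactly the one described above: the prior and the alternative-theory likelihood are rarely available.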

A dumb question

In any event, I know the answer to the question of whether some penalty takers are better than others. Of course they are. In that sense p-values fail to answer a question to which I already know the answer. Further, collecting more and more data increases the power of the procedure (the probability that it dodges a false negative). Thus, by doing no more than collecting enough data I can make the p-value as small as I like. A small p-value may have more to do with the number of observations than it has with anything interesting in penalty kicks.
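
The effect of sample size can be sketched with a simple two-sided z-test for a proportion. The example holds a small, fixed discrepancy (an observed share of 0.52 against a null of 0.50) and changes only the number of observations. The shares, the sample sizes and the function name are illustrative assumptions, and the normal approximation stands in for an exact test.

```python
import math


def two_sided_p(observed_share, null_share, n):
    """Two-sided p-value for a proportion using the normal approximation."""
    standard_error = math.sqrt(null_share * (1.0 - null_share) / n)
    z = (observed_share - null_share) / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same tiny discrepancy every time; only the amount of data changes.
p_values = {n: two_sided_p(0.52, 0.50, n) for n in (100, 1_000, 10_000)}
for n, p in p_values.items():
    print(f"n = {n:>6}: p = {p:.5f}")
```

The discrepancy is no more interesting at 10,000 observations than at 100, but the p-value shrinks from unremarkable to tiny.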

That said, what I was trying to do in the blog was to set a benchmark for elite penalty taking. As such this was an enumerative study. Of course, had I been trying to select a penalty taker for my team, that would have been an analytic study and I would have to have worried additionally about stability.

Problems, problems

There is a further question about whether the data variation arose from happenstance such as one or more players having had the advantage of weather or ineffective goalkeepers. This is an observational study not a designed experiment.

And even if I observe a signal, the p-value does not tell me how big it is. And it doesn’t tell me who is the best or worst penalty taker. As R A Fisher observed, just because we know there had been a murder we do not necessarily know who was the murderer.

E pur si muove

It seems then that individuals will have different ways of interpreting p-values. They do reveal something about the data but it is not easy to say what it is. It is suggestive of a signal but no more. There will be very many cases where there are better alternative analytics about which there is less ambiguity, for example Bayes factors.

However, in the limited case of what I might call alternative-free model criticism I think that the p-value does provide me with some insight. Just to ask the question of whether the data is consistent with the simplest of models. However, it is a similar insight to that of an anecdote: of vague weight with little hope of forming a consensus round its interpretation. I will continue to calculate them but I think it better if I keep quiet about it.

R A Fisher often comes in for censure as having done more than anyone to advance the cult of p-values. I think that is unfair. Fisher only saw p-values as part of the evidence that a researcher would have to hand in reaching a decision. He saw the intelligent use of p-values and significance tests as very different from the, as he saw it, mechanistic practices of hypothesis testing and acceptance procedures on the Neyman-Pearson model.

In an acceptance procedure, on the other hand, acceptance is irreversible, whether the evidence for it was strong or weak. It is the result of applying mechanically rules laid down in advance; no thought is given to the particular case, and the tester’s state of mind, or his capacity for learning is inoperative. By contrast, the conclusions drawn by a scientific worker from a test of significance are provisional, and involve an intelligent attempt to understand the experimental situation.

“Statistical methods and scientific induction”
Journal of the Royal Statistical Society Series B 17: 69–78. 1955, at 74-75

Fisher was well known for his robust, sometimes spiteful, views on other people’s work. However, it was Maurice Kendall in his obituary of Fisher who observed that:

… a man’s attitude toward inference, like his attitude towards religion, is determined by his emotional make-up, not by reason or mathematics.

The art of managing footballers

… or is it a science? Robin van Persie’s penalty miss against West Bromwich Albion on 2 May 2015 was certainly welcome news to my ears. It eased the relegation pressures on West Brom and allowed us to advance to 40 points for the season. Relegation fears are only “mathematical” now. However, the miss also resulted in van Persie being relieved of penalty taking duties, by Manchester United manager Louis van Gaal, until further notice.

He is now at the end of the road. It is always [like that]. Wayne [Rooney] has missed also so when you miss you are at the bottom again.

The Daily Mail report linked above goes on to say that van Persie had converted his previous 6 penalties.

Van Gaal was, of course, referring to Rooney’s shot over the crossbar against West Ham in February 2013, when Rooney had himself invited then manager Sir Alex Ferguson to retire him as designated penalty taker. Rooney’s record had apparently been 9 misses from 27 penalties. I have all this from this Daily Telegraph report.

I wonder if statistics can offer any insight into soccer management?

The benchmark

It was very difficult to find, very quickly, any exhaustive statistics on penalty conversion rates on the web. However, I would like to start by establishing what constituted “good” performance for a penalty taker. As a starting point I have looked at Table 2 on this Premier League website. The data is from February 2014 and shows, at that date, data on the players with the best conversion rates in the League’s history. Players who took fewer than 10 penalties were excluded. It shows that of the ten top converting players, who must rank as the very good if not the ten best, in the aggregate they converted 155 of 166 penalties. That is a conversion rate of 93.4%. At first sight that suggests a useful baseline against which to assess any individual penalty taker.

Several questions come to mind. The aggregate statistics do not tell us how individual players have developed over time, whether improving or losing their nerve. That said, it is difficult to perform that sort of analysis on these comparatively low volumes of data when collected in this way. There is however data (Table 4) on the overall conversion rate in the Premier League since its inception.

[Figure: overall Premier League penalty conversion rate by season]

That looks to me like a fairly stable system. That would be expected as players come and go and this is the aggregate of many effects. Perhaps there is latterly reduced season-to-season variation, which would be odd, but I am not really interested in that and have not pursued it. I am aware that during this period there has been a rule change allowing goalkeepers to move before the kick is taken but I have just spent 30 minutes on the web and failed to establish the date when that happened. The total aggregate statistics up to 2014 are 1,438 penalties converted out of 1,888. That is a conversion rate of 76.2%.

I did wonder if there was any evidence that some of the top ten players were better than others or whether the data was consistent with a common elite conversion rate of 93.4%. In that case the table positions would reflect nothing more than sampling variation. Somewhat reluctantly I calculated the chi-squared statistic for the table of successes and failures (I know! But what else to do?). The statistic came out as 2.02 which, with 9 degrees of freedom, has a p-value (I know!) of 0.8%. That is very suggestive of a genuine ranking among the elite penalty takers.
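
With any quoted chi-squared p-value it is worth checking which tail it refers to. The sketch below takes only the two figures given above (a statistic of 2.02 on 9 degrees of freedom) and evaluates both tails, using the closed-form series for odd degrees of freedom so that nothing beyond the standard library is needed. The function name is illustrative, and the underlying table of successes and failures is not reproduced here.

```python
import math


def chi2_upper_tail_odd_df(x, df):
    """P(X >= x) for a chi-squared variable with odd degrees of freedom."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch only handles odd degrees of freedom")
    root = math.sqrt(x)
    # Twice the standard normal upper tail at sqrt(x).
    normal_part = math.erfc(root / math.sqrt(2.0))
    # Standard normal density at sqrt(x).
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    # Sum of root^(2r-1) / (1*3*...*(2r-1)) for r = 1 .. (df-1)/2.
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term = term * x / (2 * r + 1)
    return normal_part + 2.0 * density * total


upper = chi2_upper_tail_odd_df(2.02, 9)
lower = 1.0 - upper
print(f"upper tail: {upper:.4f}, lower tail: {lower:.4f}")
```

For these inputs the lower tail comes out a little under 1% and the upper tail a little over 99%. It is the upper tail that would indicate more variation between players than sampling alone produces, so it appears worth confirming which tail the 0.8% represents before leaning on it.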

It inevitably follows that the elite are doing better than the overall success rate of 76.2%. Considering all that together I am happy to proceed with 93.4% as the sort of benchmark for a penalty taker that a team like Manchester United would aspire to.

Van Persie

This website, dated 6 Sept 2012, told me that van Persie had converted 18 penalties with a 77% success rate. That does not quite fit either 18/23 or 18/24 but let us take it at face value. If that is accurate then that is, more or less, the data on which Ferguson gave van Persie the job in February 2013. It is a surprising appointment given the Premier League average of 76.2% and the elite benchmark but perhaps it was the best that could be mustered from the squad.

Rooney’s 9 misses out of 27 yields a success rate of 67%. Not so much lower than van Persie’s historical performance but, in all the circumstances, it was not good enough.
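
For anyone who wants to check the percentages quoted so far, the arithmetic is a one-liner per player. The counts are those given in the text; the helper name and labels are illustrative.

```python
def rate(converted, taken):
    """Conversion rate as a percentage."""
    return 100.0 * converted / taken


rates = {
    "elite ten (155 of 166)": rate(155, 166),
    "Premier League overall (1,438 of 1,888)": rate(1438, 1888),
    "van Persie if 18 of 23": rate(18, 23),
    "van Persie if 18 of 24": rate(18, 24),
    "Rooney (9 misses from 27)": rate(27 - 9, 27),
}
for label, value in rates.items():
    print(f"{label}: {value:.1f}%")
```

The two van Persie candidates come out either side of the reported 77%, which is why neither quite fits.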

The dismissal

What is fascinating is that, no matter what van Persie’s historical record on which he was appointed penalty taker, before his 2 May miss he had scored 6 out of 6. The miss made it 6 out of 7, 85.7%. That was his recent record of performance, even if selected to some extent to show him in a good light.

Selection of that run is a danger. It is often “convenient” to select a subset of data that favours a cherished hypothesis. Though there might be that selectivity, where was the real signal that van Persie had deteriorated or that the club would perform better were he replaced?

The process

Of course, a manager has more information than the straightforward success/fail ratio. A coach may have observed goalkeepers increasingly guessing a penalty taker’s shot direction. There may have been many near-saves, a hesitancy on the part of the player, trepidation in training. Those are all factors that a manager must take into account. That may lead to the rotation of even the most impressive performer. Perhaps.

But that is not the process that van Gaal advocates. Keep scoring until you miss then go to the bottom of the list. The bottom! Even scorers in the elite-10 miss sometimes. Is it rational to then replace them with an alternative that will most likely be more average (i.e. worse)? And then make them wait until everyone else has missed.
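
A rough simulation shows what the “miss and go to the bottom” rule costs. The squad here is hypothetical: one taker at the elite benchmark of 93.4% and three at the league average of 76.2%. It also assumes each taker has a fixed conversion rate and that kicks are independent. The function name and seed are placeholders.

```python
import random


def conversion_rate(rates, kicks, rotate_on_miss, seed=0):
    """Simulate penalties; on a miss, optionally send the taker to the bottom."""
    rng = random.Random(seed)
    order = list(range(len(rates)))  # order[0] is the current taker
    scored = 0
    for _ in range(kicks):
        taker = order[0]
        if rng.random() < rates[taker]:
            scored += 1
        elif rotate_on_miss:
            # The taker who missed drops to the bottom of the list.
            order.append(order.pop(0))
    return scored / kicks


# Hypothetical squad: one elite taker, three league-average takers.
squad = [0.934, 0.762, 0.762, 0.762]
keep_best = conversion_rate(squad, 100_000, rotate_on_miss=False, seed=1)
rotate = conversion_rate(squad, 100_000, rotate_on_miss=True, seed=1)
print(f"always use the best taker: {keep_best:.3f}")
print(f"rotate after every miss:   {rotate:.3f}")
```

Under these assumptions rotation scores less often than simply sticking with the best taker, because every miss hands the job to somebody less able for a while.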

With an average success rate of 76.2% it is more likely than not that van Persie’s replacement will score their first penalty. Van Gaal will be vindicated. That is the phenomenon called regression to the mean. An extreme event (a miss) is most likely followed by something more average (a goal). Psychologist Daniel Kahneman explores this at length in his book Thinking, Fast and Slow.

It is an odd strategy to adopt. Keep the able until they fail. Then replace them with somebody less able. But different.