UK railway suicides – 2017 update

The latest UK rail safety statistics were published on 23 November 2017, again absent much of the press fanfare we had seen in the past. Regular readers of this blog will know that I have followed the suicide data series, and the press response, closely in 2016, 2015, 2014, 2013 and 2012. Again I have re-plotted the data myself on a Shewhart chart.

[Figure: Shewhart chart of UK railway suicide figures, 2017 update]

Readers should note the following about the chart.

  • Many thanks to Tom Leveson Gower at the Office of Rail and Road who confirmed that the figures are for the year up to the end of March.
  • Some of the numbers for earlier years have been updated by the statistical authority.
  • I have recalculated natural process limits (NPLs) as there are still no more than 20 annual observations, and because the historical data has been updated. The NPLs have therefore changed but, this year, not by much.
  • Again, the pattern of signals, with respect to the NPLs, is similar to last year.

The current chart again shows two signals, an observation above the upper NPL in 2015 and a run of 8 below the centre line from 2002 to 2009. As I always remark, the Terry Weight rule says that a signal gives us licence to interpret the ups and downs on the chart. So I shall have a go at doing that.
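
For anyone who wants to check the mechanics, here is a minimal sketch, in Python, of how natural process limits for an individuals (XmR) chart are conventionally calculated, together with a test for a run of eight below the centre line. The counts and the helper functions are mine, invented purely for illustration; they are not the published figures.

```python
import numpy as np

def natural_process_limits(values):
    """Individuals (XmR) chart limits: mean +/- 2.66 x mean moving range."""
    x = np.asarray(values, dtype=float)
    moving_ranges = np.abs(np.diff(x))      # absolute differences between successive years
    centre = x.mean()
    width = 2.66 * moving_ranges.mean()     # 2.66 scales the mean moving range to 3-sigma
    return centre - width, centre, centre + width

def longest_run_below(values, centre):
    """Length of the longest run of consecutive points below the centre line."""
    longest = run = 0
    for v in values:
        run = run + 1 if v < centre else 0
        longest = max(longest, run)
    return longest

# Invented annual counts, for illustration only -- not the published figures.
counts = [192, 185, 178, 181, 175, 183, 179, 188, 206, 214, 238, 246, 252, 279, 253, 237]
lower_npl, centre, upper_npl = natural_process_limits(counts)
print(lower_npl, centre, upper_npl)
print("point above upper NPL:", any(c > upper_npl for c in counts))
print("run of 8 below centre:", longest_run_below(counts, centre) >= 8)
```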

It will not escape anybody’s attention that this is now the second year in which there has been a fall in the number of fatalities.

I haven’t yet seen any real contemporaneous comment on the numbers from the press. This item appeared on the BBC, a weak performer in the field of data journalism but clearly with privileged access to the numbers, on 30 June 2017, confidently attributing the fall to past initiatives.

Sky News clearly also had advance sight of the numbers and made the bold claim that:

… for every death, six more lives were saved through interventions.

That item goes on to highlight a campaign to encourage fellow train users to engage with anybody whose behaviour attracted attention.

But what conclusions can we really draw?

In 2015 I was coming to the conclusion that the data increasingly looked like a gradual upward trend. The 2016 data offered a challenge to that but my view was still that it was too soon to say that the trend had reversed. There was nothing in the data incompatible with a continuing trend. This year, 2017, has seen 2016’s fall repeated. A welcome development but does it really show conclusively that the upward trending pattern is broken? Regular readers of this blog will know that Langian statistics like “lowest for six years” carry no probative weight here.

Signal or noise?

Has there been a change to the underlying cause system that drives the suicide numbers? Last year, I fitted a trend line through the data and asked which narrative best fitted what I observed, a continuing increasing trend or a trend that had plateaued or even reversed. You can review my analysis from last year here.

Here is the data and fitted trend updated with this year’s numbers, along with NPLs around the fitted line, calculated the same way as last year.

[Figure: railway suicide data with fitted trend and natural process limits, updated to 2017]

Let’s think a little deeper about how to analyse the data. The first step of any statistical investigation ought to be the cause and effect diagram.

[Figure: cause and effect diagram for railway suicides]

The difficulty with the suicide data is that there is very little reproducible and verifiable knowledge as to its causes. I have seen claims, of whose provenance I am uncertain, that railway suicide is virtually unknown in the USA. There is a lot of useful thinking from common human experience and from more general theories in psychology. But the uncertainty is great. It is not possible to come up with a definitive cause and effect diagram on which all will agree, other than from the point of view of identifying candidate factors.

The earlier evidence of a trend, however, suggests that there might be some causes that are developing over time. It is not difficult to imagine that economic trends and the cumulative awareness of other fatalities might have an impact. We are talking about a number of things that might appear on the cause and effect diagram and some that do not, the “unknown unknowns”. When I identified “time” as a factor, I was taking sundry “lurking” factors and suspected causes from the cause and effect diagram that might have a secular impact. I aggregated them under the proxy factor “time” for want of a more exact analysis.

What I have tried to do is to split the data into two parts:

  • A trend (linear, simply for the sake of exploratory data analysis (EDA)); and
  • The residual variation about the trend.

The question I want to ask is whether the residual variation is stable, just plain noise, or whether there is a signal there that might give me a clue that a linear trend does not hold.
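
Here, on the same invented counts as in the sketch above, is the split I mean: fit a straight line for EDA purposes only, take the residuals, and put moving-range limits around them. A residual outside those limits would be a signal that the linear trend is not holding.

```python
import numpy as np

# Invented year/count pairs, for illustration only -- not the published figures.
years = np.arange(2002, 2018)
counts = np.array([192, 185, 178, 181, 175, 183, 179, 188,
                   206, 214, 238, 246, 252, 279, 253, 237], dtype=float)

# Split the data into a linear trend (EDA only) and the residual variation about it.
slope, intercept = np.polyfit(years, counts, 1)
fits = slope * years + intercept
residuals = counts - fits

# Natural process limits on the residuals: 0 +/- 2.66 x mean moving range.
limit = 2.66 * np.abs(np.diff(residuals)).mean()
print("signal of trend not holding:", bool(np.any(np.abs(residuals) > limit)))
```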

There is no signal in the detrended data, no signal that the trend has reversed. The tough truth of the data is that it supports either narrative.

  • The upward trend is continuing and is stable. There has been no reversal of trend yet.
  • The data is not stable. True there is evidence of an upward trend in the past but there is now evidence that deaths are decreasing.

Of course, there is no particular reason, absent the data, to believe in an increasing trend and the initiative to mitigate the situation might well be expected to result in an improvement.

Sometimes, with data, we have to be honest and say that we do not have the conclusive answer. That is the case here. All that can be done is to continue the existing initiatives and look to the future. Nobody ever likes that as a conclusion but it is no good pretending things are unambiguous when that is not the case.

Next steps

Previously I noted proposals to repeat a strategy from Japan of bathing railway platforms with blue light. In the UK, I understand that such lights were installed at Gatwick in summer 2014. In fact my wife and I were on the platform at Gatwick just this week and I had the opportunity to observe them. I also noted, on my way back from court the other day, blue strip lights along the platform edge at East Croydon. I think they were recently installed. However, I have not seen any data or heard of any analysis.

A huge amount of sincere endeavour has gone into this issue but further efforts have to be against the background that there is still no conclusive evidence of improvement.

Suggestions for alternative analyses are always welcomed here.


Building targets, constructing behaviour

Recently, the press reported that UK construction company Bovis Homes Group PLC have run into trouble for encouraging new homeowners to move into unfinished homes and have therefore faced a barrage of complaints about construction defects. It turns out that these practices were motivated by a desire to hit ambitious growth targets. Yet it has all had a substantial impact on the company’s trading position and led to markdowns for Bovis shares.1

I have blogged about targets before. It is worth repeating what I said there about the thoughts of John Pullinger, head of the UK Statistics Authority. He gave a trenchant warning about the “unsophisticated” use of targets. He cautioned:2

Anywhere we have had targets, there is a danger that they become an end in themselves and people lose sight of what they’re trying to achieve. We have numbers everywhere but haven’t been well enough schooled on how to use them and that’s where problems occur.

He went on.

The whole point of all these things is to change behaviour. The trick is to have a sophisticated understanding of what will happen when you put these things out.

That message was clearly one that Bovis didn’t get. They legitimately adopted an ambitious growth target but they forgot a couple of things. They forgot that targets, if not properly risk assessed, can create perverse incentives to distort the system. They forgot to think about how manager behaviour might be influenced. Leaders need to be able to harness insights from behavioural economics. Further, a mature system of goal deployment imposes a range of metrics across a business, each of which has to contribute to the global organisational plan. It is no use only measuring sales if measures of customer satisfaction and input measures about quality are neglected or even deliberately subverted. An organisation needs a rich dashboard and needs to know how to use it.

Critically, it is a matter of discipline. Employees must be left in no doubt that lack of care in maintaining the integrity of the organisational system and pursuing customer excellence will not be excused by mere adherence to a target, no matter how heroic. Bovis was clearly a culture where attention to customer requirements was not thought important by the staff. That is inevitably a failure of leadership.

Compare and contrast

Bovis are an interesting contrast with supermarket chain Sainsbury’s who featured in a law report in the same issue of The Times.3 Bovis and Sainsbury’s clearly have very different approaches as to how they communicate to their managers what is important.

Sainsbury’s operated a rigorous system of surveying staff engagement which aimed to embrace all employees. It was “deeply engrained in Sainsbury’s culture and was a critical part of Sainsbury’s strategy”. An HR manager sent an email to five store managers suggesting that the rigour could be relaxed. Not all employees needed to be engaged, he said, and participation could be restricted to the most enthusiastic. That would have been a clear distortion of the process.

Mr Colin Adesokan was a senior manager who subsequently learned of the email. He asked the HR manager to explain what he had meant but received no response and the email was recirculated. Adesokan did nothing. When his inaction came to the attention of the chief executive, Adesokan was dismissed summarily for gross misconduct.

He sued his employer and the matter ended up in the Court of Appeal, Adesokan arguing that such mere inaction over a colleague’s behaviour was incapable of constituting gross misconduct. The Court of Appeal did not agree. They found that, given the significance placed by Sainsbury’s on the engagement process, the trial judge had been entitled to find that Adesokan had been seriously in dereliction of his duty. That failing constituted gross misconduct because it had the effect of undermining the trust and confidence in the employment relationship. Adesokan seemed to have been indifferent to what, in Sainsbury’s eyes, was a very serious breach of an important procedure. Sainsbury’s had been entitled to dismiss him summarily for gross misconduct.

That is process discipline. That is how to manage it.

Display constancy of purpose in communicating what is important. Do not turn a blind eye to breaches. Do not tolerate those who would turn the blind eye. When you combine that with mature goal deployment and sophistication as to how to interpret variation in metrics then you are beginning to master, at least some parts of, how to run a business.

References

  1. “Share price plunges as Bovis tries to rebuild customers’ trust” (paywall), The Times (London), 20 February 2017
  2. “Targets could be skewing the truth, statistics chief warns” (paywall), The Times (London), 26 May 2014
  3. Adesokan v Sainsbury’s Supermarkets Ltd [2017] EWCA Civ 22, The Times, 21 February 2017 (paywall)

Why would a lawyer blog about statistics?

… is a question I often get asked. I blog here about statistics, data, quality, data quality, productivity, management and leadership. And evidence. I do it from my perspective as a practising lawyer and some people find that odd. Yet it turns out that the collaboration between law and quantitative management science is a venerable one.

The grandfather of scientific management is surely Frederick Winslow Taylor (1856-1915). Taylor introduced the idea of scientific study of work tasks, using data and quantitative methods to redesign and control business processes.

Yet one of Taylorism’s most effective champions was a lawyer, Louis Brandeis (1856-1941). In fact, it was Brandeis who coined the term scientific management.

Taylor

Taylor was a production engineer who advocated a four stage strategy for productivity improvement.

  1. Replace rule-of-thumb work methods with methods based on a scientific study of the tasks.
  2. Scientifically select, train, and develop each employee rather than passively leaving them to train themselves.
  3. Provide “Detailed instruction and supervision of each worker in the performance of that worker’s discrete task”.1
  4. Divide work nearly equally between managers and workers, so that the managers apply scientific management principles to planning the work and the workers actually perform the tasks.

Points (3) and (4) tend to jar with millennial attitudes towards engagement and collaborative work. Conservative political scientist Francis Fukuyama criticised Taylor’s approach as “[epitomising] the carrying of the low-trust, rule based factory system to its logical conclusion”.2 I have blogged many times on here about the importance of trust.

However, (1) and (2) provided the catalyst for pretty much all subsequent management science from W Edwards Deming, Elton Mayo, and Taiichi Ohno through to Six Sigma and Lean. Subsequent thinking has centred around creating trust in the workplace as inseparable from (1) and (2). Peter Drucker called Taylor the “Isaac Newton (or perhaps the Archimedes) of the science of work”.

Taylor claimed substantial successes with his redesign of work processes based on the evidence he had gathered, avant la lettre, in the gemba. His most cogent lesson was to exhort managers to direct their attention to where value was created rather than to confine their horizons to monthly accounts and executive summaries.

Of course, Taylor was long dead before modern business analytics began with Walter Shewhart in 1924. There is more than a whiff of the #executivetimeseries about some of Taylor’s work. Once management had Measurement System Analysis and the Shewhart chart there would no longer be any hiding place for groundless claims to non-existent improvements.

Brandeis

Brandeis practised as a lawyer in the US from 1878 until he was appointed a Justice of the Supreme Court in 1916. Brandeis’ principles as a commercial lawyer were, “first, that he would never have to deal with intermediaries, but only with the person in charge…[and] second, that he must be permitted to offer advice on any and all aspects of the firm’s affairs”. Brandeis was trenchant about the benefits of a coherent commitment to business quality. He also believed that these things were achieved, not by chance, but by the application of policy deployment.

Errors are prevented instead of being corrected. The terrible waste of delays and accidents is avoided. Calculation is substituted for guess; demonstration for opinion.

Brandeis clearly had a healthy distaste for muda.3 Moreover, he was making a land grab for the disputed high ground that these days often earns the vague and fluffy label strategy.

The Eastern Rate Case

The worlds of Taylor and Brandeis embraced in the Eastern Rate Case of 1910. The Eastern Railroad Company had applied to the Interstate Commerce Commission (“the ICC”) arguing that their cost base had inflated and that an increase in their carriage rates was necessary to sustain the business. The ICC was the then regulator of those utilities that had a monopoly element. Brandeis by this time had taken on the role of the People’s Lawyer, acting pro bono in whatever he deemed to be the public interest.

Brandeis opposed the rate increase arguing that the escalation in Eastern’s cost base was the result of management failure, not an inevitable consequence of market conditions. The cost of a monopoly’s ineffective governance should, he submitted, not be borne by the public, nor yet by the workers. In court Brandeis was asked what Eastern should do and he advocated scientific management. That is where and when the term was coined.4

Taylor-Brandeis

The insight that profit cannot simply be wished into being by the fiat of cost plus, a fortiori of the hourly rate, is the Milvian bridge to Lean.

But everyone wants to occupy the commanding heights of an integrated policy nurturing quality, product development, regulatory compliance, organisational development and the economic exploitation of customer value. What’s so special about lawyers in the mix? I think we ought to remind ourselves that if lawyers know about anything then we know about evidence. And we just might know as much about it as the statisticians, the engineers and the enforcers. Here’s a tale that illustrates our value.

Thereza Imanishi-Kari was a postdoctoral researcher in molecular biology at the Massachusetts Institute of Technology. In 1986 a co-worker raised inconsistencies in Imanishi-Kari’s earlier published work that escalated into allegations that she had fabricated results to validate publicly funded research. Over the following decade, the allegations grew in seriousness, involving the US Congress, the Office of Scientific Integrity and the FBI. Imanishi-Kari was ultimately exonerated by a departmental appeal board constituted of an eminent molecular biologist and two lawyers. The board heard cross-examination of the relevant experts including those in statistics and document examination. It was that cross-examination that exposed the allegations as without foundation.5

Lawyers can make a real contribution to discovering how a business can be run successfully. But we have to live the change we want to be. The first objective is to bring management science to our own business.

The black-letter man may be the man of the present but the man of the future is the man of statistics and the master of economics.

Oliver Wendell Holmes, 1897

References

  1. Montgomery, D (1989) The Fall of the House of Labor: The Workplace, the State, and American Labor Activism, 1865-1925, Cambridge University Press, p250
  2. Fukuyama, F (1995) Trust: The Social Virtues and the Creation of Prosperity, Free Press, p226
  3. Kraines, O (1960) “Brandeis’ philosophy of scientific management” The Western Political Quarterly 13(1), 201
  4. Freedman, L (2013) Strategy: A History, Oxford University Press, pp464-465
  5. Kevles, D J (1998) The Baltimore Case: A Trial of Politics, Science and Character, Norton

Regression done right: Part 3: Forecasts to believe in

There are three Sources of Uncertainty in a forecast.

  1. Whether the forecast is of “an environment that is sufficiently regular to be predictable”.1
  2. Uncertainty arising from the unexplained (residual) system variation.
  3. Technical statistical sampling error in the regression calculation.

Source of Uncertainty (3) is the one that fascinates statistical theorists. Sources (1) and (2) are the ones that obsess the rest of us. I looked at the first in Part 1 of this blog and the second in Part 2. Now I want to look at the third Source of Uncertainty and try to put everything together.

If you are really most interested in (1) and (2), read “Prediction intervals” then skip forwards to “The fundamental theorem of prediction”.

Prediction intervals

A prediction interval2 captures the range in which a future observation is expected to fall. Bafflingly, not all statistical software generates prediction intervals automatically so it is necessary, I fear, to know how to calculate them from first principles. However, understanding the calculation is, in itself, instructive.

But I emphasise that prediction intervals rely on a presumption that what is being forecast is “an environment that is sufficiently regular to be predictable”, that the (residual) business process data is exchangeable. If that presumption fails then all bets are off and we have to rely on a Cardinal Newman analysis. Of course, when I say that “all bets are off”, they aren’t. You will still be held to your existing contractual commitments even though your confidence in achieving them is now devastated. More on that another time.

Sources of variation in predictions

In the particular case of linear regression we need further to break down the third Source of Uncertainty.

  2. Uncertainty arising from the unexplained (residual) variation.
  3. Technical statistical sampling error in the regression calculation.
    3A. Sampling error of the mean.
    3B. Sampling error of the slope.

Remember that we are, for the time being, assuming Source of Uncertainty (1) above can be disregarded. Let’s look at the other Sources of Uncertainty in turn: (2), (3A) and (3B).

Source of Variation (2) – Residual variation

We start with the Source of Uncertainty arising from the residual variation. This is the uncertainty because of all the things we don’t know. We talked about this a lot in Part 2. We are content, for the moment, that they are sufficiently stable to form a basis for prediction. We call this common cause variation. This variation has variance s², where s is the residual standard deviation that will be output by your regression software.

[Figure: residual variation about the fitted regression line]

Source of Variation (3A) – Sampling error in mean

To understand the next Source of Variation we need to know a little bit about how the regression is calculated. The calculations start off with the respective means of the X values ( X̄ ) and of the Y values ( Ȳ ). Uncertainty in estimating the mean of the Ys is the next contribution to the global prediction uncertainty.

An important part of calculating the regression line is to calculate the mean of the Ys. That mean is subject to sampling error. The variance of the sampling error is the familiar result from the statistics service course.

s²/n

— where n is the number of pairs of X and Y. Obviously, as we collect more and more data this term gets more and more negligible.

[Figure: sampling error in estimating the mean of the Ys]

Source of Variation (3B) – Sampling error in slope

This is a bit more complicated. Skip forwards if you are already confused. Let me first give you the equation for the variance of predictions referable to sampling error in the slope.

s²(X - X̄)²/SXX

This has now introduced the mysterious sum of squares, SXX. However, before we learn exactly what this is, we immediately notice two things.

  1. As we move away from the centre of the training data the variance gets larger.3
  2. As SXX gets larger the variance gets smaller.

The reason for the increasing sampling error as we move from the mean of X is obvious from thinking about how variation in slope works. The regression line pivots on the mean. Travelling further from the mean amplifies any disturbance in the slope.

[Figure: how sampling error in the slope grows away from the mean of X]

Let’s look at where SXX comes from. The sum of squares is calculated from the Xs alone without considering the Ys. It is a characteristic of the sampling frame that we used to train the model. We take the difference of each X value from the mean of X, and then square that distance. To get the sum of squares we then add up all those individual squares. Note that this is a sum of the individual squares, not their average.

[Table: calculation of SXX from the individual X values]
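
As a minimal sketch, with invented Xs, the calculation is simply this.

```python
import numpy as np

# Invented training Xs, for illustration only.
X = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])

# Square each X's distance from the mean of X, then sum (not average) the squares.
SXX = np.sum((X - X.mean()) ** 2)
print(SXX)   # more data, or a wider spread of Xs, makes SXX larger
```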

Two things then become obvious (if you think about it).

  1. As we get more and more data, SXX gets larger.
  2. As the individual Xs spread out over a greater range of X, SXX gets larger.

What that (3B) term does emphasise is that even sampling error escalates as we exploit the edge of the original training data. As we extrapolate clear of the original sampling frame, the pure sampling error can quickly exceed even the residual variation.

Yet it is only a lower bound on the uncertainty in extrapolation. As we move away from the original range of Xs, however happy we were previously with Source of Uncertainty (1), that the data was from “an environment that is sufficiently regular to be predictable”, the question barges back in. We are now remote from our experience base in time and boundary. Nothing outside the original X-range will ever be a candidate for a comfort zone.

The fundamental theorem of prediction

Variances, generally, add up so we can sum the three Sources of Variation (2), (3A) and (3B). That gives the variance of an individual prediction, spred². By an individual prediction I mean that somebody gives me an X and I use the regression formula to give them the (as yet unknown) corresponding Ypred.

spred² = s² + s²/n + s²(X - X̄)²/SXX

It is immediately obvious that s² is common to all three terms. However, the second and third terms, the sampling errors, can be made as small as we like by collecting more and more data. Collecting more and more data will have no impact on the first term. That arises from the residual variation. The stuff we don’t yet understand. It has variance s², where s is the residual standard deviation that will be output by your regression software.

This, I say, is the fundamental theorem of prediction. The unexplained variation provides a hard limit on the precision of forecasts.

It is then a very simple step to convert the variance into a standard deviation, spred. This is the standard error of the prediction.4,5

spred = s √(1 + 1/n + (X - X̄)²/SXX)

Now, in general, where we have a measurement or prediction z whose uncertainty can be characterised by a standard error u, there is an old trick for putting an interval round it. Remember that u is a measure of the variation in z. We can therefore put an interval around z as a number of standard errors, z ± ku. Here, k is a constant of your choice. A prediction interval for the regression that generates prediction Ypred then becomes:

Ypred ± k spred

Choosing k=3 is very popular, conservative and robust.6,7 Other choices of k are available on the advice of a specialist mathematician.
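
To pull Sources of Variation (2), (3A) and (3B) together, here is a minimal sketch of the whole calculation on invented data: fit the line, estimate s, compute SXX and put a k = 3 interval around a prediction. The prediction_interval helper is my own illustration, not a substitute for your regression software.

```python
import numpy as np

def prediction_interval(X, Y, x_new, k=3.0):
    """Prediction at x_new with a k-standard-error interval from a simple linear regression."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n = len(X)
    slope, intercept = np.polyfit(X, Y, 1)
    fits = slope * X + intercept
    s = np.sqrt(np.sum((Y - fits) ** 2) / (n - 2))   # residual standard deviation
    SXX = np.sum((X - X.mean()) ** 2)
    # Sum the three variance contributions: residual, mean sampling error, slope sampling error.
    s_pred = s * np.sqrt(1.0 + 1.0 / n + (x_new - X.mean()) ** 2 / SXX)
    y_pred = slope * x_new + intercept
    return y_pred - k * s_pred, y_pred, y_pred + k * s_pred

# Invented training data, for illustration only.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.3, 8.8]
print(prediction_interval(X, Y, x_new=9.0))   # widens as x_new moves away from the mean of X
```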

It was Shewhart himself who took this all a bit further and defined tolerance intervals which contain a given proportion of future observations with a given probability.8 They are very much for the specialist.

Source of Variation (1) – Special causes

But all that assumes that we are sampling from “an environment that is sufficiently regular to be predictable”, that the residual variation is solely common cause. We checked that out on our original training data but the price of predictability is eternal vigilance. It can never be taken for granted. At any time fresh causes of variation may infiltrate the environment, or become newly salient because of some sensitising event or exotic interaction.

The real trouble with this world of ours is not that it is an unreasonable world, nor even that it is a reasonable one. The commonest kind of trouble is that it is nearly reasonable, but not quite. Life is not an illogicality; yet it is a trap for logicians. It looks just a little more mathematical and regular than it is; its exactitude is obvious, but its inexactitude is hidden; its wildness lies in wait.

G K Chesterton

The remedy for this risk is to continue plotting the residuals, the differences between the observed value and, now, the prediction. This is mandatory.

[Figure: Shewhart chart of the prediction residuals]

Whenever we observe a signal of a potential special cause it puts us on notice to protect the forecast-user because our ability to predict the future has been exposed as deficient and fallible. But it also presents an opportunity. With timely investigation, a signal of a possible special cause may provide deeper insight into the variation of the cause-system. That in itself may lead to identifying further factors to build into the regression and a consequential reduction in s².

It is reducing s², by progressively accumulating understanding of the cause-system and developing the model, that leads to more precise, and more reliable, predictions.

Notes

  1. Kahneman, D (2011) Thinking, Fast and Slow, Allen Lane, p240
  2. Hahn, G J & Meeker, W Q (1991) Statistical Intervals: A Guide for Practitioners, Wiley, p31
  3. In fact s²/SXX is the sampling variance of the slope. The standard error of the slope is, notoriously, s/√SXX. A useful result sometimes. It is then obvious from the figure how variation in slope is amplified as we travel farther from the centre of the Xs.
  4. Draper, N R & Smith, H (1998) Applied Regression Analysis, 3rd ed., Wiley, pp81-83
  5. Hahn & Meeker (1991) p232
  6. Wheeler, D J (2000) Normality and the Process Behaviour Chart, SPC Press, Chapter 6
  7. Vysochanskij, D F & Petunin, Y I (1980) “Justification of the 3σ rule for unimodal distributions”, Theory of Probability and Mathematical Statistics 21: 25–36
  8. Hahn & Meeker (1991) p231

Regression done right: Part 1: Can I predict the future?

I recently saw an article in the Harvard Business Review called “Refresher on Regression Analysis”. I thought it was horrible so I wanted to set the record straight.

Linear regression from the viewpoint of machine learning

Linear regression is important, not only because it is a useful tool in itself, but because it is (almost) the simplest statistical model. The issues that arise in a relatively straightforward form are issues that beset the whole of statistical modelling and predictive analytics. Anyone who understands linear regression properly is able to ask probing questions about more complicated models. The complex internal algorithms of Kalman filters, ARIMA processes and artificial neural networks are accessible only to the specialist mathematician. However, each has several general features in common with simple linear regression. A thorough understanding of linear regression enables a due diligence of the claims made by the machine learning advocate. Linear regression is the paradigmatic exemplar of machine learning.

There are two principal questions that I want to talk about that are the big takeaways of linear regression. They are always the first two questions to ask in looking at any statistical modelling or machine learning scenario.

  1. What predictions can I make (if any)?
  2. Is it worth the trouble?

I am going to start looking at (1) in this blog and complete it in a future Part 2. I will then look at (2) in a further Part 3.

Variation, variation, variation

Variation is a major problem for business, the tendency of key measures to fluctuate irregularly. Variation leads to uncertainty. Will the next report be high or low? Or in the middle? Because of the uncertainty we have to allow safety margins or swallow some non-conformances. We have good days and bad days, good products and not so good. We have to carry costly working capital because of variation in cash flow. And so on.

We learned in our high school statistics class to characterise variation in a key process measure, call it the Big Y, by an histogram of observations. Perhaps we are bothered by the fluctuating level of monthly sales.

[Figure: histogram of monthly sales]

The variation arises from a whole ecology of competing and interacting effects and factors that we call the cause-system of the outcome. In general, it is very difficult to single out individual factors as having been the cause of a particular observation, so entangled are they. It is still useful to capture them for reference on a cause and effect diagram.

[Figure: cause and effect diagram for the Big Y]

One of the strengths of the cause and effect diagram is that it may prompt the thought that one of the factors is particularly important, call it Big X. Perhaps it is “hours of TV advertising” (my age is showing). Motivated by that, we can generate a sample of corresponding measurements of both Y and X and plot them on a scatter plot.

[Figure: scatter plot of Y against X]

Well what else is there to say? The scatter plot shows us all the information in the sample. Scatter plots are an important part of what statistician John Tukey called Exploratory Data Analysis (EDA). We have some hunches and ideas, or perhaps hardly any idea at all, and we attack the problem by plotting the data in any way we can think of. So much easier now than when W Edwards Deming wrote:1

[Statistical practice] means tedious work, such as studying the data in various forms, making tables and charts and re-making them, trying to use and preserve the evidence in the results and to be clear enough to the reader: to endure disappointment and discouragement.

Or as Chicago economist Ronald Coase put it:

If you torture the data enough, nature will always confess.

The scatter plot is a fearsome instrument of data torture. It tells me everything. It might even tempt me to think that I have a basis on which to make predictions.

Prediction

In machine learning terms, we can think of the sample used for the scatter plot as a training set of data. It can be used to set up, “train”, a numerical model that we will then fix and use to predict future outcomes. The scatter plot strongly suggests that if we know a future X alone we can have a go at predicting the corresponding future Y. To see that more clearly we can draw a straight line by hand on the scatter plot, just as we did in high school before anybody suggested anything more sophisticated.

[Figure: scatter plot with a straight line drawn by hand]

Given any particular X we can read off the corresponding Y.

[Figure: reading off the predicted Y for a given X]

The immediate insight that comes from drawing in the line is that not all the observations lie on the line. There is variation about the line so that there is actually a range of values of Y that seem plausible and consistent for any specified X. More on that in Parts 2 and 3.

In understanding machine learning it makes sense to start by thinking about human learning. Psychologists Gary Klein and Daniel Kahneman investigated how firefighters were able to perform so successfully in assessing a fire scene and making rapid, safety critical decisions. Lives of the public and of other firefighters were at stake. This is the sort of human learning situation that machines, or rather their expert engineers, aspire to emulate. Together, Klein and Kahneman set out to describe how the brain could build up reliable memories that would be activated in the future, even in the agony of the moment. They came to the conclusion that there are two fundamental conditions for a human to acquire a skill.2

  • An environment that is sufficiently regular to be predictable.
  • An opportunity to learn these regularities through prolonged practice.

The first bullet point is pretty much the most important idea in the whole of statistics. Before we can make any prediction from the regression, we have to be confident that the data has been sampled from “an environment that is sufficiently regular to be predictable”. The regression “learns” from those regularities, where they exist. The “learning” turns out to be the rather prosaic mechanics of matrix algebra as set out in all the standard texts.3 But that, after all, is what all machine “learning” is really about.

Statisticians capture the psychologists’ “sufficiently regular” through the mathematical concept of exchangeability. If a process is exchangeable then we can assume that the distribution of events in the future will be like the past. We can project our historic histogram forward. With regression we can do better than that.

Residuals analysis

Formally, the linear regression calculations calculate the characteristics of the model:

Y = mX + c + “stuff”

The “mX+c” bit is the familiar high school mathematics equation for a straight line. The “stuff” is variation about the straight line. What the linear regression mathematics does is (objectively) to calculate the m and c and then also tell us something about the “stuff”. It splits the variation in Y into two components:

  • What can be explained by the variation in X; and
  • The, as yet unexplained, variation in the “stuff”.

The first thing to learn about regression is that it is the “stuff” that is the interesting bit. In 1849 British astronomer Sir John Herschel observed that:

Almost all the greatest discoveries in astronomy have resulted from the consideration of what we have elsewhere termed RESIDUAL PHENOMENA, of a quantitative or numerical kind, that is to say, of such portions of the numerical or quantitative results of observation as remain outstanding and unaccounted for after subducting and allowing for all that would result from the strict application of known principles.

The straight line represents what we guessed about the causes of variation in Y and which the scatter plot confirmed. The “stuff” represents the causes of variation that we failed to identify and that continue to limit our ability to predict and manage. We call the predicted Ys that correspond to the measured Xs, and lie on the fitted straight line, the fits.

fiti = mXi + c

The residual values, or residuals, are obtained by subtracting the fits from the respective observed Y values. The residuals represent the “stuff”. Statistical software does this for us routinely. If yours doesn’t then bin it.

residuali = Yi – fiti

[Figure: scatter plot showing fits and residuals]

There are a number of properties that the residuals need to satisfy for the regression to work. Investigating those properties is called residuals analysis.4 As far as use for prediction is concerned, it is sufficient that the “stuff”, the variation about the straight line, be exchangeable.5 That means that the “stuff” so far must appear from the data to be exchangeable and further that we have a rational belief that such a cause system will continue unchanged into the future. Shewhart charts are the best heuristics for checking the requirement for exchangeability, certainly as far as the historical data is concerned. Our first and, be under no illusion, mandatory check on the ability of the linear regression, or any statistical model, to make predictions is to plot the residuals against time on a Shewhart chart.

[Figure: Shewhart chart of the regression residuals]
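
As a minimal sketch of that mandatory check, on invented data: compute the fits and residuals, then plot the residuals in observation order with moving-range limits. Any point outside the limits is a signal of a special cause.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented training data in time order, for illustration only.
X = np.array([3.1, 4.0, 2.5, 5.2, 6.0, 4.8, 7.1, 5.9, 6.8, 8.0])
Y = np.array([6.4, 8.1, 5.2, 10.3, 12.1, 9.8, 14.0, 11.9, 13.6, 15.8])

# Fit Y = mX + c; the residuals are the "stuff" the straight line leaves unexplained.
m, c = np.polyfit(X, Y, 1)
residuals = Y - (m * X + c)

# Individuals-chart limits on the residuals: 0 +/- 2.66 x mean moving range.
limit = 2.66 * np.abs(np.diff(residuals)).mean()

# Plot the residuals against observation order and look for signals of special causes.
order = np.arange(1, len(residuals) + 1)
plt.plot(order, residuals, marker="o")
plt.axhline(0.0)
plt.axhline(limit, linestyle="--")
plt.axhline(-limit, linestyle="--")
plt.xlabel("Observation order")
plt.ylabel("Residual")
plt.show()
```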

If there are any signals of special causes then the model cannot be used for prediction. It just can’t. For prediction we need residuals that are all noise and no signal. However, like all signals of special causes, such will provide an opportunity to explore and understand more about the cause system. The signal that prevents us from using this regression for prediction may be the very thing that enables an investigation leading to a superior model, able to predict more exactly than we ever hoped the failed model could. And even if there is sufficient evidence of exchangeability from the training data, we still need to continue vigilance and scrutiny of all future residuals to look out for any novel signals of special causes. Special causes that arise post-training provide fresh information about the cause system while at the same time compromising the reliability of the predictions.

Thorough regression diagnostics will also be able to identify issues such as serial correlation, lack of fit, leverage and heteroscedasticity. It is essential to regression and its omission is intolerable. Residuals analysis is one of Stephen Stigler’s Seven Pillars of Statistical Wisdom.6 As Tukey said:

The greatest value of a picture is when it forces us to notice what we never expected to see.

To come:

Part 2: Is my regression significant? … is a dumb question.
Part 3: Quantifying predictions with statistical intervals.

References

  1. Deming, W E (1975) “On probability as a basis for action”, The American Statistician 29(4) pp146-152
  2. Kahneman, D (2011) Thinking, Fast and Slow, Allen Lane, p240
  3. Draper, N R & Smith, H (1998) Applied Regression Analysis, 3rd ed., Wiley, p44
  4. Draper & Smith (1998) Chs 2, 8
  5. I have to admit that weaker conditions may be adequate in some cases but these are far beyond any other than a specialist mathematician.
  6. Stigler, S M (2016) The Seven Pillars of Statistical Wisdom, Harvard University Press, Chapter 7

UK railway suicides – 2015 update

The latest UK rail safety statistics were published in September 2015 absent the usual press fanfare. Regular readers of this blog will know that I have followed the suicide data series, and the press response, closely in 2014, 2013 and 2012.

This year I am conscious that one of those units is not a mere statistic but a dear colleague, Nigel Clements. It was poet W B Yeats who observed, in his valedictory verse Under Ben Bulben that “Measurement began our might.” He ends the poem by inviting us to “Cast a cold eye/ On life, on death.” Sometimes, with statistics, we cast the cold eye but the personal reminds us that it must never be an academic exercise.

Nigel’s death gives me an additional reason for following this series. I originally latched onto it because I felt that exaggerated claims as to trends were being made. It struck me as a closely bounded problem that should be susceptible to taut measurement. And it was something important. Again I have re-plotted the data myself on a Shewhart chart.

[Figure: Shewhart chart of UK railway suicide figures, 2015 update]

Readers should note the following about the chart.

  • Some of the numbers for earlier years have been updated by the statistical authority.
  • I have recalculated natural process limits as there are still no more than 20 annual observations.
  • The signal noted last year has persisted (in red) with two consecutive observations above the upper natural process limit. There are also now eight points below the centre line at the beginning of the series.

As my colleague Terry Weight always taught me, a signal gives us licence to interpret the ups and downs on the chart. This increasingly looks like a gradual upward trend.

Though there was this year little coverage in the press, I did find this article in The Guardian newspaper. I had previously wondered whether the railway data simply reflected an increasing trend in UK suicide in general. The Guardian report is eager to emphasise:

The total number [of suicides] in the UK has risen in recent years, with the latest Office for National Statistics figures showing 6,233 suicides registered in the UK in 2013, a 4% increase on the previous year.

Well, #executivetimeseries! I have low expectations of press data journalism so I do not know why I am disappointed. In any event I decided to plot the data. There were a few problems. The railway data is not collected by calendar year so the latest observation is 2014/15. I have not managed to identify which months are included though, while I was hunting I found out that the railway data does not include London Underground. I can find no railway data before 2001/02. The national suicide data is collected by calendar year and the last year published is 2013. I have done my best by (not quite) arbitrarily identifying 2013/14 in the railway data with 2013 nationally. I also tried the obvious shift by one year and it did not change the picture.

[Figure: railway suicides plotted alongside national suicide figures]

I have added a LOWESS line (with smoothing parameter 0.4) to the national data, the better to pick out the minimum around 2007, just before the start of the financial crisis. That is where the steady decline over the previous quarter century reverses. It is in itself an arresting statistic. But I don’t see the national trend mirrored in the railway data, so the national picture does not explain the railway trend.
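
For anyone wanting to reproduce the smoothing, this is roughly how such a LOWESS line can be produced, sketched here on invented numbers rather than the ONS series; statsmodels calls the smoothing parameter frac.

```python
import numpy as np
import statsmodels.api as sm

# Invented annual counts standing in for the national series -- not the ONS figures.
years = np.arange(1985, 2014)
rng = np.random.default_rng(1)
counts = 7000.0 - 40.0 * (years - 1985) + rng.normal(0.0, 120.0, len(years))
counts[years >= 2007] += 60.0 * (years[years >= 2007] - 2007)   # crude upturn after 2007

# LOWESS with smoothing parameter 0.4; returns (year, smoothed count) pairs sorted by year.
smoothed = sm.nonparametric.lowess(counts, years, frac=0.4)
print(smoothed[-3:])
```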

Previously I noted proposals to repeat a strategy from Japan of bathing railway platforms with blue light. Professor Michiko Ueda of Syracuse University was kind enough to send me details of the research. The conclusions were encouraging but tentative and, unfortunately, the Japanese rail companies have not made any fresh data available for analysis since 2010. In the UK, I understand that such lights were installed at Gatwick in summer 2014 but I have not seen any data.

A huge amount of sincere endeavour has gone into this issue but further efforts have to be against the background that there is an escalating and unexplained problem.

Things and actions are what they are and the consequences of them will be what they will be: why then should we desire to be deceived?

Joseph Butler

How to predict floods

I started my grown-up working life on a project seeking to predict extreme ocean currents off the north west coast of the UK. As a result I follow environmental disasters very closely. I fear that it’s natural that incidents in my own country have particular salience. I don’t want to minimise disasters elsewhere in the world when I talk about recent flooding in the north of England. It’s just that they are close enough to home for me to get a better understanding of the essential features.

The causes of the flooding are multi-factorial and many of the factors are well beyond my expertise. However, The Times (London) reported on 28 December 2015 that “Some scientists say that [the UK Environment Agency] has been repeatedly caught out by the recent heavy rainfall because it sets too much store by predictions based on historical records” (p7). Setting store by predictions based on historical records is very much where my hands-on experience of statistics began.

The starting point of prediction is extreme value theory, developed by Sir Ronald Fisher and L H C Tippett in the 1920s. Extreme value analysis (EVA) aims to put probabilistic bounds on events outside the existing experience base by predicating that such events follow a special form of probability distribution. Historical data can be used to fit such a distribution using the usual statistical estimation methods. Prediction is then based on a double extrapolation: firstly in the exact form of the tail of the extreme value distribution and secondly from the past data to future safety. As the old saying goes, “Interpolation is (almost) always safe. Extrapolation is always dangerous.”
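
As a minimal sketch of the idea, and emphatically not of the Environment Agency’s actual models, here is a generalised extreme value distribution fitted to invented annual maxima with scipy, from which a notional “100-year” level is read off. Both the fitted tail and the projection forward are extrapolations, which is exactly where the danger lies.

```python
import numpy as np
from scipy.stats import genextreme

# Invented annual maximum river levels in metres, for illustration only.
annual_maxima = genextreme.rvs(c=-0.1, loc=4.0, scale=0.5, size=40, random_state=7)

# Fit a generalised extreme value distribution to the historical maxima.
shape, loc, scale = genextreme.fit(annual_maxima)

# A notional "100-year" level is the 0.99 quantile of the fitted annual-maximum distribution:
# a double extrapolation, first in the tail shape, then from past data to future safety.
level_100yr = genextreme.ppf(0.99, shape, loc=loc, scale=scale)
print(round(float(level_100yr), 2))
```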

EVA rests on some non-trivial assumptions about the process under scrutiny. No statistical method yields more than was input in the first place. If we are being allowed to extrapolate beyond the experience base then there are inevitably some assumptions. Where the real world process doesn’t follow those assumptions the extrapolation is compromised. To some extent there is no cure for this other than to come to a rational decision about the sensitivity of the analysis to the assumptions and to apply a substantial safety factor to the physical engineering solutions.

One of those assumptions also plays to the dimension of extrapolation from past to future. Statisticians often demand that the data be independent and identically distributed. However, that is a weird thing to demand of data. Real world data is hardly ever independent as every successive observation provides more information about the distribution and alters the probability of future observations. We need a better idea to capture process stability.

Historical data can only be projected into the future if it comes from a process that is “sufficiently regular to be predictable”. That regularity is effectively characterised by the property of exchangeability. Deciding whether data is exchangeable demands, not only statistical evidence of its past regularity, but also domain knowledge of the physical process that it measures. The exchangeability must continue into the predictable future if historical data is to provide any guide. In the matter of flooding, knowledge of hydrology, climatology, planning and engineering, and law, in addition to local knowledge about economics and infrastructure changes already in development, is essential. Exchangeability is always a judgment. And a critical one.

Predicting extreme floods is a complex business and I send my good wishes to all involved. It is an example of something that is essentially a team enterprise as it demands the co-operative inputs of diverse sets of experience and skills.

In many ways this is an exemplary model of how to act on data. There is no mechanistic process of inference that stands outside a substantial knowledge of what is being measured. The secret of data analysis, which often hinges on judgments about exchangeability, is to visualize the data in a compelling and transparent way so that it can be subjected to collaborative criticism by a diverse team.