Just says, “in mice”; just says, “in boys”

If anybody doubts that Twitter has a valuable role in the world, they should turn their attention to the Twitter sensation that is @justsaysinmice.

The Twitter feed exposes bad science journalism where extravagant claims are advanced with a penumbra of implication that something relevant to human life or happiness has been certified by peer-reviewed science. It often turns out that, when the original research is interrogated (and, in fairness, at the very bottom of the journalistic puff piece), it just says, “in mice”. “Cauliflower, cabbage, broccoli harbour prostate cancer inhibiting compound” was a recent sub-editor’s attention-grabbing headline. But the body of the article just says, “in mice”. Most days the author finds at least one item to tweet.

Population – Frame – Sample

The big point here is one of the really big points in understanding statistics.
We start generating data and doing statistics because there is something out there we are interested in. Some things or events. We call the things and events we are bothered about the population. The problem is that, in the real world, it is often difficult to get hold of all those things or events. In an opinion poll, we don’t know who will vote at the next election, or even who will still be alive. We don’t know all the people who follow a particular sports club. We can’t find everyone who’s ever tasted Marmite and expressed an opinion. Sometimes the events or things we are interested in don’t even exist yet and lie wholly in the future. That’s called prediction and forecasting.

In order to do the sort of statistical sampling that text books tell us about, we need to identify some relevant material that is available to us to measure or interrogate. For the opinion poll it would be everyone on the electoral register, perhaps. Or everyone who can be reached by dialling random numbers in the region of interest. Or everyone who signs up to an online database (seriously). Those won’t be the exact people who will be doing the voting at the next election. Some of them likely will be. But we have to make a judgment that they are, somehow, representative.

Similarly, if we want to survey sports club supporters we could use the club’s supporter database. Or the people who buy tickets online. Or who tweet. Not perfect but, hey! And, perhaps, in some way representative.

The collection of things we are going to do the sampling on is called the sampling frame. We don’t need to look at the whole of the frame. We can sample. And statistical theory assures us about how much the sample can tell us about the frame, usually quite a lot if done properly. But as to the differences between population and frame, that is another question.

Enumerative and analytic statistics

These real world situations lie in contrast to the sort of simplified situations found in statistics text books. An inspector randomly samples 5 widgets from a batch of 100 and decides whether to accept or reject the batch (though why anyone would do this still defies rational explanation). Here the frame and population are identical. No need to worry.

W Edwards Deming was a statistician who, among his other achievements, developed the sampling techniques used in the 1940 US census. Deming thought deeply about sampling and continually emphasised the distinction between the sort of problems where population and frame were identical, what he called enumerative statistics, and the sundry real world situations where they were not, analytic statistics.1

The key to Deming’s thinking is that, where we are doing analytic statistics, we are not trying to learn about the frame, that is not what interests us, we are trying to learn something useful about the population of concern. That means that we have to use the frame data to learn about the cause system that is common to frame and population. By cause system, Deming meant the aggregate of competing, interacting and evolving factors, inherent and environmental, that influence the outcomes both in frame and population. As Donald Rumsfeld put it, the known knowns, the known unknowns and the unknown unknowns.

The task of understanding how any particular frame and population depend on a common cause-system requires deep subject matter knowledge. As does knowing the scope for reading across conclusions.

Just says, “in mice”

Experimenting on people is not straightforward. That’s why we do experiments on mice.

But here the frame and population are wildly disjoint.
So why? Well, apparently, their genetic, biological and behavioural characteristics closely resemble those of humans, and many symptoms of human conditions can be replicated in mice.2 That is, their cause systems have something in common. Not everything, but things useful to researchers and subject matter experts.


Now, that means that experimental results in mice can’t just be read across as though we had done the experiment on humans. But they help subject matter experts learn more about those parts of the cause-system that are common. That might then lead to tentative theories about human welfare that can then be tested in the inevitably more ethically stringent regime of human trials.

So, not only is bad, often sensationalist, data journalism exposed, but we learn a little more about how science is done.

Just says, “in boys”

If the importance of this point needed emphasising then Caroline Criado Perez makes the case compellingly in her recent book Invisible Women.3

It turns out that much medical research, much development of treatments and even the assessment of motor vehicle safety have historically been performed on frames dominated by men, but with results then read across as though representative of men and women alike. Perez goes on to show how this has made women’s lives less safe and less healthy than they need have been.

It seems that it is not only journalists who are addicted to bad science.

Anyone doing statistics needs aggressively to scrutinise their sampling frame and how it matches the population of interest. Contrasts in respective cause systems need to be interrogated and distinguished with domain knowledge, background information and contextual data. Involvement in statistics carries responsibilities.


  1. Deming, W E (1975) “On probability as a basis for action”, The American Statistician, 29, 146
  2. Melina, R (2010) “Why Do Medical Researchers Use Mice?”, Live Science, retrieved 18:32 UTC 2/6/19
  3. Perez, C C (2019) Invisible Women: Exposing Data Bias in a World Designed for Men, Chatto & Windus

The risks of lead in the environment – social choice and individual values


Almost one in five deaths in the US can be linked to lead pollution, with even low levels of exposure potentially fatal, researchers have said.

That, in any event, was the headline in the Times (London) (£paywall) last week.

Gas pump lead warning

Historical environmental lead

The item turned out to be based on academic research by Professor Bruce Lanphear of Simon Fraser University, and others. You can find their published paper here in The Lancet: Public Health.1 It is publicly available at no charge, a practice very much to be encouraged. You know that I bristle at publicly funded research not being made available to the public.

As it was, no specific thing in either news report or the academic research struck me as wholly wrong. However, it made me wonder about the implied message of the news item and broader issues about communicating risk. I have some criticisms of the academic work, or at least how it is presented, but I will come to those below. I don’t have major doubts about the conclusions.

The pot odds of a jaywalker

Lanphear’s principal result concerned hazard rates so it is worth talking a little about what they are. Suppose I stand still in the middle of the carriageway at Hyde Park Corner (London) or Times Square (New York) or … . Suppose the pedestrian lights are showing “Don’t walk”. The probability that I get hit by a motor car is fairly high. A good 70 to 80% in my judgment, if I stand there long enough.

Now, suppose I sprint across under the same conditions. My chances of emerging unscathed still aren’t great but I think they are better. A big difference is what engineers call the Time at Risk (TAR). In general, the longer I expose myself to a hazardous situation, the greater the probability that I encounter my nemesis.

Now, there might be other differences between the risks in the two situations. A moving target might be harder to hit or less easy to avoid. However, it feels difficult to make a fair comparison of the risk because of the different TARs. Hazard rates provide a common basis for comparing what actuaries call the force of mortality without the confounding effect of exposure time. Hazard rates, effectively, offer a probability per unit time. They are measured in units like “percent per hour”. The math is actually quite complicated but hazard rates translate into probabilities when you multiply them by TAR. Roughly.
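That rough translation can be sketched in a few lines of code. The hazard figure here is purely illustrative and has nothing to do with any study discussed on this page:

```python
import math

def probability_from_hazard(hazard_rate: float, time_at_risk: float) -> float:
    """Probability of an event given a constant hazard rate and a Time at Risk.

    For a constant hazard h, the chance of surviving time t is exp(-h * t),
    so the probability of the event is 1 - exp(-h * t). For small h * t this
    is roughly h * t, which is the "multiply by TAR" shortcut in the text.
    """
    return 1.0 - math.exp(-hazard_rate * time_at_risk)

# An illustrative hazard of 10% per hour: a 5-minute sprint versus
# 30 minutes of lingering in the carriageway.
print(round(probability_from_hazard(0.10, 5 / 60), 4))   # → 0.0083
print(round(probability_from_hazard(0.10, 30 / 60), 4))  # → 0.0488
```

Note how, at these small values, the probability is close to hazard × TAR; the approximation only breaks down as the exposure grows.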

I was recently reading of the British Army’s mission to Helmand Province in Afghanistan.2 In Operation Oqab Tsuka, military planners had to analyse the ground transport of a turbine to an hydroelectric plant. Terrain made the transport painfully slow along a route beset with insurgents and hostile militias. The highway had been seeded with IEDs (“Improvised Explosive Devices”) which slowed progress still more. The analysis predicted in the region of 50 British service deaths to get the turbine to its destination. The extended time to traverse the route escalated the TAR and hence the hazard, literally the force of mortality. That analysis led to a different transport route being explored and adopted.

So hazard rates provide a baseline of risk disregarding exposure time.

Lanphear’s results

Lanphear was working with a well established sampling frame of 18,825 adults in the USA whose lead levels had been measured some time in 1988 to 1994 when they were recruited to the panel. The cohort had been followed up in a longitudinal study so that data was to hand as to their subsequent morbidity and mortality.

What Lanphear actually looked at was a ratio of hazard rates. For the avoidance of doubt, the hazard that he was looking at was death from heart disease. There was already evidence of a link with lead exposure. He looked at, among other things, how much the hazard rate changed between the cohort members with the lowest measured blood-lead levels and with the highest. That is, as measured back in the period 1988 to 1994. He found, this is his headline result, that an increase in historical blood-lead from 1.0 μg/dL (microgram per decilitre) to 6.7 μg/dL was associated with an estimated 37% increase in hazard rate for heart disease.

Moreover, 1.0 and 6.7 μg/dL represented the lower and upper limits of the middle 80% of the sample. These were not wildly atypical levels. So in going from the blood-lead level that marks the 10% least exposed to the level of the 10% most exposed we get a 37% increase in instantaneous risk from heart disease.

Now there are a few things to note. Firstly, it is fairly obvious that historical lead in blood would be associated with other things that influence the onset of heart disease, location in an industrial zone, income, exercise regime etc. Lanphear took those into account, as far as is possible, in his statistical modelling. These are the known unknowns. It is also obvious that some things have an impact on heart disease that we don’t know about yet or which are simply too difficult, or too costly or too unethical, to measure. These are the unknown unknowns. Variation in these factors causes variation in morbidity and mortality. But we can’t assign the variation to an individual cause. Further, that variation causes uncertainty in all the estimates. It’s not exactly 37%. However, bearing all that in mind rather tentatively, this is all we have got.

Despite those other sources of variation, I happen to know my personal baseline risk of suffering cardiovascular disease. As I explored here, it is 5% over 10 years. Well, that was 4 years ago so it’s 3% over the next 6. Now, I was brought up in the industrial West Midlands of the UK, Rowley Regis to be exact, in the 1960s. Our nineteenth-century-built house had water supplied through lead pipes and there was no diligent running-off of drinking water before use. Who knew? Our house was beside a busy highway.3 I would guess that, on any determination of historical exposure to environmental lead, I would rate in the top 10%.
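That history invites a sum. Done slightly more carefully than bare multiplication, it means converting the baseline risk to a hazard, scaling by the 1.37 hazard ratio, and converting back. A sketch, using only the figures quoted on this page:

```python
import math

def scale_risk(baseline_prob: float, baseline_years: float,
               hazard_ratio: float, horizon_years: float) -> float:
    """Scale a baseline risk by a hazard ratio over a new time horizon.

    Assumes a constant hazard: recover it from the baseline probability,
    multiply by the hazard ratio, then convert back to a probability.
    """
    baseline_hazard = -math.log(1.0 - baseline_prob) / baseline_years
    return 1.0 - math.exp(-hazard_ratio * baseline_hazard * horizon_years)

# 5% over 10 years, scaled by the 1.37 hazard ratio, over the next 6 years.
print(round(scale_risk(0.05, 10, 1.37, 6), 3))  # → 0.041, i.e. 4% or so
```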

That gives me a personal probability over the next 6 years of 1.37 × 3% = 4%. Or so. Am I bothered?

Well, no. Neither should you be.

But …

Social Choice and Individual Values

That was the title of a seminal 1951 book by Nobel laureate economist Kenneth Arrow.4 Arrow applied his mind to the question of how society as a whole should respond when individuals in the society had differing views as to the right and the good, or even the true and the just.

The distinction between individual choice and social policy lies, I think, at the heart of the confusion of tone of the Times piece. The marginal risk to an individual, myself in particular, from historical lead is de minimis. I have taken a liberty in multiplying my hazard rate for morbidity by a hazard ratio for mortality but I think you get my point. There is no reason at all why I, or you, should be bothered in the slightest as to our personal health. Even with an egregious historical exposure. However, those minimal effects, aggregated across a national scale, add up to a real impact on the economy. Loss of productive hours, resources diverted to healthcare, developing professional expertise terminated early by disease. All these things have an impact on national wealth. A little elementary statistics, and a few not unreasonable assumptions, allows an estimate of the excess number of deaths that would not have occurred “but for” the environmental lead exposure. That number turns out to be 441,000 US deaths each year with an estimated annual impact on the economy of over $100 billion. If you are skeptical, perhaps it is one tenth of that.
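The “elementary statistics” is, in essence, the population attributable fraction. A minimal sketch, with invented inputs rather than Lanphear’s actual figures or method:

```python
def attributable_deaths(total_deaths: float, exposed_fraction: float,
                        hazard_ratio: float) -> float:
    """Excess deaths attributable to an exposure, via the population
    attributable fraction: PAF = p(HR - 1) / (1 + p(HR - 1)),
    where p is the fraction of the population exposed."""
    excess = exposed_fraction * (hazard_ratio - 1.0)
    paf = excess / (1.0 + excess)
    return total_deaths * paf

# Invented inputs for illustration only: 600,000 deaths a year, 90% of
# the population carrying some historical exposure, hazard ratio 1.37.
print(round(attributable_deaths(600_000, 0.9, 1.37)))
```

Even with these made-up numbers the answer lands in the hundreds of thousands. That is the whole point: a hazard that is negligible to any individual, aggregated across a population, becomes a large public number.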

Now, nobody is suggesting that environmental lead has precipitated some crisis in public health that ought to make us fear for our lives. That is where the Times article was badly framed. Lanphear and his colleagues are at pains to point out just how deaths from heart disease have declined over the past 50 years, how much healthier and long-lived we now are.

The analysis kicks in when policy makers come to consider choices between various taxation schemes, trade deals, international political actions, or infrastructure investment strategies. There, the impact of policy choices on environmental lead can be mapped directly into economic consequences. Here the figures matter a great deal. But to me? Not so much.

What is to be done?

How do we manage economy level policy when an individual might not perceive much of a stake? Arrow found that neither the ballot box nor markets offered a tremendously helpful solution. That leaves us with dependence on the bureaucratic professions, or the liberal elite as we are told we have to call them in these politically correct times. That in turn leads us back to Robert Michels’ Iron Law of Oligarchy. Historically, those elites have proved resistant to popular sentiments and democratic control. The modern solution is democratic governance. However, that is exactly what Michels viewed as doomed to fail. The account of the British Army in Afghanistan that I referred to above is a further anecdote of failure.5

But I am going to remain an optimist that bureaucrats can be controlled. Much of the difficulty arises from governance functions’ statistical naivety and lack of data smarts. Politicians aren’t usually the most data critical people around. The Times piece does not help. One of the things everyone can do is to be clearer that there are individual impacts and economy-wide impacts, and that they are different things. Just because you can discount a personal hazard does not mean there is not something that governments should be working to improve.

It’s not all about me.

Some remarks on the academic work

As I keep on saying, the most (sic) important part of any, at least conventional, regression modelling is residuals analysis and regression diagnostics.6 However, Lanphear and his colleagues were doing something a lot more complicated than the simple linear case. They were using proportional hazards modelling. Now, I know that there are really serious difficulties in residuals analysis for such models and in giving a neat summary figure of how much of the variation in the data is “explained” by the factors being investigated. However, there are diagnostic tools for proportional hazards and I would like to have seen something reported. Perhaps the analysis was done but my trenchant view is that it is vital that it is shared. For all the difficulties in this, progress will only be made by domain experts trying to develop practice collaboratively.

My mind is always haunted by the question Was the regression worth it? And please remember that p-values in no way answer that question.

References and notes

  1. Lanphear, B P (2018) “Low-level lead exposure and mortality in US adults: a population-based cohort study”, The Lancet: Public Health. Published online.
  2. Farrell, T (2017) Unwinnable: Britain’s War in Afghanistan 2001-2014, London: The Bodley Head, pp239-244
  3. During the industrial revolution, this had been the important Oldbury to Halesowen turnpike-road. Even in the 1960s it carried a lot of traffic. My Black Country grandfather always referred to it as the ‘oss road. a road so significant that one might find horses on it. Keep out o’ the ‘oss road, m’ mon. He knew about risk.
  4. Arrow, KJ [1951] (2012) Social Choice and Individual Values, Martino Fine Books
  5. Farrell Op. cit.
  6. Draper, N R & Smith, H (1998) Applied Regression Analysis, 3rd ed., New York: Wiley, Chapters 2 and 8

UK Election of June 2017 – Polling review


Here are all the published opinion polls for the June 2017 UK general election, plotted as a Shewhart chart.

The Conservative lead over Labour had been pretty constant at 16% from February 2017, after May’s Lancaster House speech. The initial Natural Process Limits (“NPLs”) on the chart extend back to that date. Then something odd happened in the polls around Easter. There were several polls above the upper NPL. That does not seem to fit with any surrounding event. Article 50 had been declared two weeks before and had had no real immediate impact.
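For anyone wanting to reproduce such a chart, the NPLs on an individuals (XmR) chart are conventionally set at the mean ± 2.66 times the average moving range. A sketch with invented poll leads, not the actual polling series:

```python
def natural_process_limits(values):
    """XmR-chart Natural Process Limits: mean ± 2.66 × average absolute
    moving range (the standard individuals-chart construction)."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

# Invented Conservative leads (%) for illustration.
leads = [16, 17, 15, 16, 18, 16, 15, 17, 16, 14]
lower, upper = natural_process_limits(leads)
print(round(lower, 2), round(upper, 2))  # → 11.86 20.14
```

A run of points above the upper limit, as around Easter, is then a signal, not noise.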

I suspect that the “fugue state” around Easter was reflected in the respective parties’ private polling. It is possible that public reaction to the election announcement somehow locked in the phenomenon for a short while.

Things then seem to settle down to the 16% lead level again. However, the local election results at the bottom of the range of polls ought to have sounded some alarm bells. Local election results are not a reliable predictor of general elections but this data should not have felt very comforting.

Then the slide in lead begins. But when exactly? A lot of commentators have assumed that it was the badly received Conservative Party manifesto that started the decline. It is not possible to be definitive from the chart but it is certainly arguable that it was the leak of the Labour Party manifesto that started to shift voting intention.

Then the swing from Conservative to Labour continued unabated to polling day.

Polling performance

How did the individual pollsters fare? I have, somewhat arbitrarily, summarised all polls conducted in the 10 days before the election (29 May to 7 June). Here is the plot along with the actual popular poll result which gave a 2.5% margin of Conservative over Labour. That is the number that everybody was trying to predict.


The red points are the surveys from the 5 days before the election (3 to 7 June). Visually, they seem to be no closer, in general, than the other points (6 to 10 days before). The vertical lines are just an aid for the eye in grouping the points. The absence of “closing in” is confirmed by looking at the mean squared error (MSE) (in %²) for the points over 10 days (31.1) and 5 days (34.8). There is no evidence of polls closing in on the final result. The overall Shewhart chart certainly doesn’t suggest that.

Taking the polls over the 10 day period, then, here is the performance of the pollsters in terms of MSE. Lower MSE is better.

Pollster MSE
Norstat 2.25
Survation 2.31
Kantar Public 6.25
Survey Monkey 8.25
YouGov 9.03
Opinium 16.50
Qriously 20.25
Ipsos MORI 20.50
Panelbase 30.25
ORB 42.25
ComRes 74.25
ICM 78.36
BMG 110.25
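For the record, each MSE above is just the average squared distance between a pollster’s final predicted leads and the actual 2.5% margin. A sketch (the two-poll pollster is hypothetical):

```python
def mean_squared_error(predicted_leads, actual_lead=2.5):
    """Mean squared error (in %²) of predicted Conservative leads
    against the actual popular-vote margin."""
    errors = [(p - actual_lead) ** 2 for p in predicted_leads]
    return sum(errors) / len(errors)

# A hypothetical pollster showing leads of 1% and 4% against the actual 2.5%.
print(mean_squared_error([1.0, 4.0]))  # → 2.25
```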

Norstat and Survation pollsters will have been enjoying bonuses on the morning after the election. There are a few other commendable performances.

YouGov model

I should also mention the YouGov model (the green line on the Shewhart chart) that has an MSE of 2.25. YouGov conduct web-based surveys against a huge database of around 50,000 registered participants. They also collect, with permission, deep demographic data on those individuals concerning income, profession, education and other factors. There is enough published demographic data from the national census to judge whether that is a representative frame from which to sample.

YouGov did not poll and publish the raw, or even adjusted, voting intention. They used their poll to construct a model, perhaps a logistic regression or an artificial neural network, they don’t say, to predict voting intention from demographic factors. They then input into that model, not their own demographic data, but data from the national census. That then gave their published forecast. I have to say that this looks about the best possible method for eliminating sampling frame effects.
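In outline, that approach weights the model’s prediction for each demographic cell by the cell’s size in the census rather than in the survey. Everything below, cells, probabilities and counts alike, is invented for illustration; YouGov do not publish their model:

```python
def poststratify(cell_predictions, census_counts):
    """Weight a model's predicted support in each demographic cell by
    that cell's population share in the census, not in the survey."""
    total = sum(census_counts.values())
    return sum(cell_predictions[cell] * count / total
               for cell, count in census_counts.items())

# Hypothetical age cells: modelled P(Conservative) and census counts.
p_con = {"18-24": 0.25, "25-49": 0.38, "50-64": 0.47, "65+": 0.58}
census = {"18-24": 5.8e6, "25-49": 21.3e6, "50-64": 12.1e6, "65+": 11.9e6}
print(round(poststratify(p_con, census), 3))  # → 0.433
```

Because the forecast leans on census cell sizes, an unrepresentative survey frame matters far less, provided the model captures how demographics drive voting intention.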

It remains to be seen how widely this approach is adopted next time.

Grenfell Tower – Elites on trial – Trust in bureaucracy revisited


Grenfell Tower fire1

Nobody can react to the Grenfell Tower fire with anything other than horror, sadness, anger and resolve.

Much of that anger is, legitimately, directed at the elite professions who make the decisions on which individual safety turns. I am proud to have been a member of two elite professions during my lifetime: engineering and law. I wanted to say something about the nature of practice, of responsibility and of blame.

It is too early to be confident of causes, remedies or punishments. Those will have to await full investigation but professionals of all disciplines will need little encouragement to spend the coming weeks searching their own souls over their wider obligations to society. For, that is what membership of a profession entails.

The need for bureaucracy

“Bureaucracy” is a word most often used pejoratively, as a rebuke to a turgid rigidity that frustrates spontaneity, creativity, efficiency and expedition. It is that. But once society starts to enjoy systems of reasonable complexity, civil aviation, networked electricity supply, international transport of goods etc., much decision making is going to be reserved to a cadre of experts. Diane Vaughan’s analysis of the Space Shuttle Challenger disaster2 is a relevant and salutary account of engineering as a bureaucratic profession.

Of course, you can embed some of your bureaucracy in software but don’t expect that to improve spontaneity, creativity or flexibility. Efficiency and expedition, perhaps. Even putting a bunch of flowers on your dining table requires this.

I have often, on this blog, cited Robert Michels’ iron law of oligarchy. Michels contended that any team of bureaucrats soon realised the power they held in controlling the levers of policy. A willingness to pull those levers in the direction of their own self-interest, and a jealous protection of their professional status and expertise, soon followed. That sometimes put them at odds with the objectives they were supposed to be implementing on behalf of their principals. As one political scientist put it:3

Many governance dysfunctions arise because the agents have different agendas from the principals, and the problem of institutional design is related to incentivising the agents to do the principal’s bidding.

Max Weber had realised all this even earlier, at the turn of the twentieth century. Weber was a child of that mother of all bureaucracies, the Prussian civil service. The self-interest of managing elites bothered him and he sought to inculcate an ethic of responsibility whereby professionals thought hard about the wider consequences of their decisions.4 That is the ethic that modern professions seek to foster among their members. Engagement in a profession carries responsibilities.

Faith in regulation

So much marginally informed debate among journalists has been about “the building regulations”. I guess that they mean the Building Regulations 2010. You can examine the relevant parts here, for what they are worth. Scroll down to Part B. Of course the Regulations themselves are supported by the statutory guidance. Here that is. The guidance refers to a legion of British Standards. As I learned during my time in the railway industry, the scientific basis of the guidance is not always easy to trace. Expect to hear more of that as inquiries progress.

The fundamental truth of such regulations is that it is the building professionals themselves who write them. Who else? That does not mean that the professionals, even in conclave, are infallible. Nobel laureate psychologist Daniel Kahneman has written extensively about the bounded rationality that limits everybody’s individual, or group, vision beyond a limited range of experiences, values and prejudices. Experts are just as prone as anybody. You and me too.5 It is unlikely that software will do a better job. Expert systems will work, here I go again, in “an environment that is sufficiently regular to be predictable”. I heard Daniel Dennett speak in London recently. Software will provide us with tools, not colleagues.

All this feeds into the collateral phenomenon whereby businesses actively use their expert involvement in setting regulations as a strategy to capture market share, promote their own products and erect barriers against entry for would-be competitors. The extreme consequence here is regulatory capture, where the regulator becomes so dependent upon the expertise of the regulated that she is glad to let them define the regulatory regime.

To some extent, the self validating nature of expertise is reinforced, at least in the UK, by the approach of the courts. In assessing the negligence of a professional an individual is judged against the standards of his profession. Only where no reasonable member of his profession would have acted as he did is he negligent.6, 7 But the courts have warned that, in some circumstances, they might call into question the standards of a whole profession if there were a failure of logic. That is something that the courts would never do lightly.8 It would be a spectacle indeed.

When it comes to professional responsibility, the courts refuse to be dazzled by statutory regulations or industry standards. Professionals are expected to exercise their judgment and not hide behind mere compliance.9 In 2003, giving judgment against a firm of architects for inadequate fire precautions in a food factory refurbishment, Judge Bowsher QC observed:10

I should add that I was not the slightest impressed by the submission that since the defendants had complied with their statutory requirements … they had fully performed their duties.

This is what a judge said in a different case concerning the safety of a flight of stairs.11

Looking at a photograph of the stairs, I myself would form the view that they are reasonably safe … But it is the fact that the stairs did not comply with the Building Regulations, or the relevant British Standard. That is evidence which we must certainly take into account. It represents the current professional opinion as to what is desirable in order that accidents should be avoided. But it is one thing to lay down regulations and standards, with that objective, and another to define what is reasonably safe in the circumstances of a particular case [emphasis added].

In any event, trying to manage a risk by statutory regulation is not so efficient a means as you might think. Regulations do not always ensure the best outcome for society.12

Trust in elites

That all leaves the elite professions with grave responsibilities. Let none of us deny that another salient feature of the professions is that they are businesses run to make a profit for the professionals. Members get the further reward of status in society. I know that we have all constructed narratives of our own expertise and that challenges, particularly from clients, are not always welcome. We think we know best and we don’t always want to waste the client’s time explaining what to us seems so obvious.

And when things go wrong, and they will, all that is thrown back at us. Quite justifiably. Trust in bureaucracy has been a recurrent theme on this blog. It is a complex matter. When it leads to herd immunity from disease it is good. When it leads to complicity in torture it is bad. The public trust we aspire to is not blind faith. It is collaboration. Blind faith leads to bad consequences, collaboration to an environment where professionals are able to explain and reassure. Reflecting on that, I think there are some things we all can do to improve that relationship of trust.

Listen The most useful person on a project is often the person who knows nothing about it. She can ask the dumb question. Physicists told Guglielmo Marconi he would not be able to transmit a radio signal across the Atlantic. But he did, not because he knew something the physicists didn’t but because sometimes it takes an unashamed maverick to test an orthodoxy.13 There are sundry examples of rumours and folk tales that have sparked scientific curiosity and discovery. Sometimes data is the plural of anecdote. It’s not even all about testing scientific theories. People sometimes need confidence and reassurance in unfamiliar situations. They need to be told, in language they understand, what is happening and why you think this is a good idea. Their questions and reservations need to be taken seriously.

A signal is a signal One of the key skills for any professional is being able to distinguish signal from noise. Where there is a signal, a surprise, that suggests an established orthodoxy has stopped working then you must immediately take action to protect those at risk. The “regular environment” you relied on is blown. Don’t wait to see if it happens again. Don’t dismiss it as a “one off” or, heaven forbid, the most useless word in the English language, an “outlier”. It is the signals that contain all the information. Don’t relax when the signal isn’t repeated immediately. That is just regression to the mean. It’s what signals do. Something that you didn’t expect has happened. Pierce the veil of bounded rationality. Protect the client, investigate and look to update your practice.

Noise is noise The corollary to taking signals seriously is not mistaking noise for signal. When that happens we start looking for causes specific to an individual outcome when the true causes were generic to all outcomes. Professionals also need to know when they are embedded in a “stable system of trouble”. That brings its own challenges, not least of which is the cost and effort of perpetually protecting the client.

Humility Professionals don’t always get it right. There are individual errors. There are systemic failures of practice. If you have to start hiding behind the shield that you are beyond challenge and that dissenting views are outlawed then you are probably dismissing the best hope you have for avoiding problems.

Continual improvement We have to keep listening to counsels of despair from politicians about productivity. It is down to us. Continual improvement is not just for our individual domain expertise, it’s also about getting better at listening, distinguishing signal from noise and practising humility. It’s about getting better at improving too.

Professionals have bodies to kick and souls to damn

Over two hundred years ago, English judge Edward Thurlow famously observed that corporations have neither bodies to kick nor souls to damn. I am always baffled by calls, in cases like that of Grenfell Tower, for prosecutions for corporate manslaughter. The calls seem to reflect a mistaken sentiment that corporate manslaughter is some sort of aggravated form of manslaughter. This isn’t just manslaughter, it’s corporate manslaughter. But why would anybody want to relieve individuals of responsibility and impose it on a faceless abstraction?

Part of the deal when seeking certification as a professional is that you assume a responsibility to society. That’s where you get your status from. When you fail you will be held to account. There are always voices calling for an end to blame culture. But have no doubt, it is a professional’s duty to act within the standards she has adopted. If she falls below those standards then reparation is expected, to the extent that it remains possible. Anybody who causes death when they fall sufficiently far below standard can expect to be indicted for manslaughter and, on conviction, punished and shamed. There is a principle in criminal law called fair labelling. The name of a crime must reflect the offence. Manslaughter is a fair label in such cases.

There has been an increasing tendency in the UK for legislators to take power to order reparation away from the civil courts and to attempt to regulate with criminal sanctions. I am not persuaded that is always the right approach.

Trust in elites, bureaucrats, experts, call them what you will, is important. South African statesman Paul Kruger once remarked:

When I look at history I’m a pessimist. When I look at pre-history I’m an optimist.

If you live in the UK then, on any measure you can dream up, life is getting safer and better. That is the triumph of elite engineers, planners, security professionals, physicians … I could go on. If people at large lose faith in professionals then it will be to our common ruin. Only the professionals can work on building the trust we need. Politicians won’t do it.

What did you do today?


  1. Wikimedia Commons contributors, “File:Grenfell Tower fire (wider view).jpg,” Wikimedia Commons, the free media repository, https://commons.wikimedia.org/w/index.php?title=File:Grenfell_Tower_fire_(wider_view).jpg&oldid=248417865 (accessed June 25, 2017)
  2. Vaughan, D (1996) The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, University Of Chicago Press
  3. Fukuyama, F (2012) The Origins of Political Order: From Prehuman Times to the French Revolution, Profile Books, p207
  4. Kim, Sung Ho, “Max Weber”, The Stanford Encyclopedia of Philosophy (Fall 2012 Edition), Edward N. Zalta (ed.)
  5. Kahneman, D (2011) Thinking, Fast and Slow, Allen Lane, pp199-254
  6. Bolam v Friern Hospital Management Committee [1957] 1 WLR 582
  7. Pantelli Associates Ltd v Corporate City Developments Number Two Ltd [2010] EWHC 3189 (TCC)
  8. Bolitho v City and Hackney Health Authority [1996] 4 All ER 771
  9. Charlesworth & Percy on Negligence, 12th ed., 2010 and supplements, 7-46
  10. Sahib Foods Ltd & Ors v Paskin Kyriakides Sands (A Firm) [2003] EWHC 142 (TCC) at [43]
  11. Green v Building Scene Limited [1994] PIQR P259, CA at 269
  12. Coase, R H (1960) “The problem of social cost” Journal of Law and Economics 3, 1-44
  13. Raboy, M (2016) Marconi: The Man Who Networked the World, Oxford, p176

UK railway suicides – 2016 update

The latest UK rail safety statistics were published in September 2016, again absent much of the press fanfare we had seen in the past. Apologies for the long delay but the day job has been busy. Regular readers of this blog will know that I have followed the suicide data series, and the press response, closely in 2015, 2014, 2013 and 2012. Again, I “Cast a cold eye/ On life, on death.” Again I have re-plotted the data myself on a Shewhart chart.


Readers should note the following about the chart.

  • Many thanks to Tom Leveson Gower at the Office of Rail and Road who confirmed that the figures are for the year up to the end of March.
  • Some of the numbers for earlier years have been updated by the statistical authority.
  • I have recalculated natural process limits (NPLs) as there are still no more than 20 annual observations, and because the historical data has been updated. The NPLs have therefore changed in that the 2014 total is no longer above the upper NPL.
  • The observation above the upper NPL in 2015 has not persisted. The latest total is within the NPLs. We have to think about how to interpret this.
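
For readers who want to check the arithmetic, natural process limits on an individuals (XmR) chart are conventionally set at the mean plus or minus 2.66 times the mean moving range. A minimal sketch in Python, using invented annual counts rather than the actual ORR figures:

```python
# Sketch of how natural process limits (NPLs) are computed for an
# individuals (XmR) chart. The annual counts below are invented for
# illustration; they are not the ORR data.
counts = [238, 251, 244, 260, 272, 279, 268, 286, 292, 279]

mean = sum(counts) / len(counts)

# Moving ranges: absolute differences between successive observations.
moving_ranges = [abs(b - a) for a, b in zip(counts, counts[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)

# Shewhart's 2.66 factor converts the mean moving range into
# three-sigma limits for individual values.
upper_npl = mean + 2.66 * mr_bar
lower_npl = mean - 2.66 * mr_bar

print(round(lower_npl, 1), round(mean, 1), round(upper_npl, 1))
```

Updating the historical data changes the moving ranges and hence the limits, which is how an observation such as the 2014 total can move back inside the recalculated NPLs.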

The current chart shows two signals, an observation above the upper NPL in 2015 and a run of 8 below the centre line from 2002 to 2009. As I always remark, the Terry Weight rule says that a signal gives us licence to interpret the ups and downs on the chart. So I shall have a go at doing that. Last year I was coming to the conclusion that the data increasingly looked like a gradual upward trend. Has the 2016 data changed that?

The Samaritans posted on their website, “Rail suicides fall by 12%,” and went on to say:

Suicide prevention measures put in place as part of the partnership between Samaritans, Network Rail and the wider rail industry are saving more lives on the railways.

In fairness, the Samaritans qualified their headline with the following footnote.

We must be mindful that suicide data is best understood by looking at trends over longer periods of time, and year-on-year fluctuations may not be indicative of longer term trends. It is however very encouraging to see such a decrease which we would hope to see continuing in future years.

The Huffington Post, no, not sure I really think of them as part of the MSM, were less cautious in banking the 12% by stating, “It is the first time the number has dropped in three years.” True, but #executivetimeseries!

Signal or noise?

What shall we make of the decrease, a decrease to “back within” the NPLs? First, the mere fact that there are fewer suicides is good news. That is a “better” outcome. The question still remains as to whether we are making progress in reducing the frequency of suicides. Has there been a change to the underlying cause system that drives the suicide numbers? We might just be observing noise unrelated to an underlying signal or trend. Remember that extremely high measurements are usually followed by lower ones because of the principle of regression to the mean.1 Such a decrease is no evidence of an underlying improvement but merely a deceptive characteristic of common cause variation.

One thing that I can do is to try to fit a trend line through the data and to ask which narrative best fits what I observe, a continuing increasing trend or a trend that has plateaued or even reversed. As you know, I am very critical of the uncritical casting of regression lines on data plots. However, this time I have a definite purpose in mind. Here is the data with a fitted linear regression line.


What I wanted to do was to split the data into two parts:

  • A trend (linear, simply for the sake of exploratory data analysis (EDA)); and
  • The residual variation about the trend.

The question I want to ask is whether the residual variation is stable, just plain noise, or whether there is a signal there that might give me a clue that a linear trend does not hold. The way that I do that is to plot the residuals on a Shewhart chart.
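
That two-step procedure can be sketched in a few lines of Python. The counts below are invented for illustration; the point is the split into a fitted trend and charted residuals:

```python
# Sketch of the detrending described above: fit a least-squares line,
# then screen the residuals against XmR-style limits. Illustrative data only.
years = list(range(2002, 2017))
counts = [215, 222, 231, 226, 240, 238, 252, 249, 263, 270, 266, 281, 289, 284, 278]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(counts) / n

# Ordinary least squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, counts))
         / sum((x - x_bar) ** 2 for x in years))
intercept = y_bar - slope * x_bar

residuals = [y - (intercept + slope * x) for x, y in zip(years, counts)]

# Natural process limits for the residuals, from their mean moving range.
mrs = [abs(b - a) for a, b in zip(residuals, residuals[1:])]
mr_bar = sum(mrs) / len(mrs)
limit = 2.66 * mr_bar

signals = [yr for yr, r in zip(years, residuals) if abs(r) > limit]
print(slope, signals)
```

A residual signal would contradict the linear-trend-plus-noise narrative; in this invented example the slope is positive and no residual breaches the limits.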


That shows a stable pattern of residuals. If I try to interpret the chart as a linear trend plus exchangeable noise then nothing in the data contradicts that. The original chart invites an interpretation, because of the signals. I adopt the interpretation of an increasing trend. Nothing in the data contradicts that. I can put the pictures together to show this model.


My opinion is that, when I plot the data that way, I have a compelling picture of a growing trend about which there is some stable common cause variation. Had there been an observation below the lower NPL on the last chart then that could have been evidence that the trend was slowing or even reversing. But not here.

I note that there’s also a report here from Anna Taylor and her colleagues at the University of Bristol. They too find an increasing trend with no signal of amelioration. They have used a different approach from mine and the fact that we have both got to the same broad result should reinforce confidence in our common conclusion.

Measurement Systems Analysis

Of course, we should not draw any conclusions from the data without thinking about the measurement system. In this case there is a legal issue. It concerns the standard of proof that the law requires coroners to apply before finding suicide as the cause of death. Findings of fact in inquests in England and Wales are generally made if they satisfy the civil standard of proof, the balance of probabilities. However, a finding of suicide can only be returned if such a conclusion satisfies the higher standard of beyond reasonable doubt, the typical criminal standard.2 There have long been suggestions that that leads to under-reporting of suicides.3 The Matthew Elvidge Trust is currently campaigning for the general civil standard of balance of probabilities to be adopted.4

Next steps

Previously I noted proposals to repeat a strategy from Japan of bathing railway platforms with blue light. In the UK, I understand that such lights were installed at Gatwick in summer 2014 but I have not seen any data or heard anything more about it.

A huge amount of sincere endeavour has gone into this issue but further efforts have to be against the background that there is an escalating and unexplained problem.


  1. Kahneman, D (2011) Thinking, Fast and Slow, Allen Lane, pp175-184
  2. Jervis on Coroners 13th ed. 13-70
  3. Chambers, D R (1989) “The coroner, the inquest and the verdict of suicide”, Medicine, Science and the Law 29, 181
  4. “Trust responds to Coroner’s Consultation”, Matthew Elvidge Trust, retrieved 4/1/17

Plan B, gut feel and Shewhart charts

I honestly had the idea for this blog and started drafting it six months ago when I first saw this, now quite infamous, quote being shared around the internet.

The minute you have a back-up plan, you’ve admitted you’re not going to succeed.

Elizabeth Holmes

Good advice? I think not! Let’s review some science.

Confidence and trustworthiness

As far back as the 1970s, psychologists carried out a series of experiments on individual confidence.1 They took a sample of people and set each of them a series of general knowledge questions. The participants were to work independently of each other. The questions were things like What is the capital city of France? The respondents had not only to do their best to answer the question but also to state the probability that they had answered correctly.

As a headline to their results the researchers found that, of all those answers in the aggregate about which people said they were 100% sure that they had answered correctly, more than 20% were answered incorrectly.

Now, we know that people who go around assigning 100% probabilities to things that happen only 80% of the time are setting themselves up for inevitable financial loss.2 Yet, this sort of over confidence in the quality and reliability of our individual, internal cognitive processes has been identified and repeated over multiple experiments and sundry real life situations.
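
The headline analysis in such calibration studies amounts to grouping answers by stated confidence and comparing each group’s hit rate with the confidence claimed. A toy sketch with invented answer records:

```python
from collections import defaultdict

# Each record is (stated probability of being correct, actually correct?).
# The records are invented for illustration.
answers = [
    (1.0, True), (1.0, True), (1.0, False), (1.0, True), (1.0, True),
    (0.8, True), (0.8, False), (0.8, True), (0.8, True), (0.8, False),
    (0.6, True), (0.6, False), (0.6, False), (0.6, True), (0.6, True),
]

by_confidence = defaultdict(list)
for stated, correct in answers:
    by_confidence[stated].append(correct)

# A well calibrated respondent's hit rate tracks the stated probability.
calibration = {p: sum(hits) / len(hits) for p, hits in by_confidence.items()}
print(calibration)  # here the "100% sure" answers are right only 80% of the time
```

In this invented sample, as in the experiments, certainty outruns accuracy.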

There is even a theory that the only people whose probabilities are reliably calibrated against frequencies are those suffering from clinically diagnosed depression. The theory of depressive realism remains, however, controversial.

Psychologists like Daniel Kahneman have emphasised that human reasoning is limited by a bounded rationality. All our cognitive processes are built on individual experience, knowledge, cultural assumptions, habits for interpreting data (good, bad and indifferent) … everything. All those things are aggregated imperfectly, incompletely and partially. Nobody can take the quality of their own judgments for granted.

Kahneman points out that, in particular, wherever individuals engage sophisticated techniques of analysis and rationalisation, and especially those tools that require long experience, education and training to acquire, there is over confidence in outcomes.3 Kahneman calls this the illusion of validity. The more thoroughly we construct an internally consistent narrative for ourselves, the more we are seduced by it. And it is instinctive for humans to seek such cogent models for experience and aspiration. Kahneman says:4

Confidence is a feeling, which reflects the coherence of the information and the cognitive ease of processing it. It is wise to take admissions of uncertainty seriously, but declarations of high confidence mainly tell you that an individual has constructed a coherent story in their mind, not necessarily that the story is true.

If illusion is the spectre of confidence then having a Plan B seems like a good idea. Of course, Holmes is correct that having a Plan B will tempt you to use it. When disappointments accumulate, in escalating costs, stagnating revenues or emerging political risks, it is very tempting to seek the repose of a lesser ambition or even a managed mitigation of residual losses.

But to proscribe a Plan B in order to motivate success is to display the risk appetite of a Kamikaze pilot. Sometimes reality tells you that your business plan is predicated on a false prospectus. Given the science of over confidence and the narrative of bounded rationality, we know that it will happen a lot of the time.

Holmes is also correct that disappointment is, in itself, no reason to change plan. What she neglects is that there is a phenomenon that does legitimately invite change: a surprise. It is a surprise that alerts us to an inconsistency between the real world and our design. A surprise ought to make us go back to our working business plan and examine the assumptions against the real world data. A switch to Plan B is not inevitable. There may be other means of mitigation: Act, Adapt or Abandon. The surprise could even be an opportunity to be grasped. The Plan B doesn’t have to be negative.

How then are we to tell a surprise from a disappointment? With a Shewhart chart of course. The chart has the benefits that:

  • Narrative building is shared not personal.
  • Narratives are challenged with data and context.
  • Surprise and disappointment are distinguished.
  • Predictive power is tested.
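
How the chart draws that distinction can be sketched in code. The centre line, limits and figures below are hypothetical; the two detection rules (a point beyond the natural process limits, and a run of eight successive points on one side of the centre line) are the ones used elsewhere on this blog:

```python
# Sketch: separate surprises (signals) from mere disappointments.
def signals(series, centre, lower_npl, upper_npl, run_length=8):
    flagged = set()
    # Rule 1: any point beyond the natural process limits is a surprise.
    for i, x in enumerate(series):
        if x < lower_npl or x > upper_npl:
            flagged.add(i)
    # Rule 2: a run of eight successive points on one side of the centre line.
    for i in range(len(series) - run_length + 1):
        window = series[i:i + run_length]
        if all(x > centre for x in window) or all(x < centre for x in window):
            flagged.update(range(i, i + run_length))
    return sorted(flagged)

# Disappointing but unsurprising Plan A figures: inside limits, no long run.
plan_a = [104, 96, 99, 107, 93, 101, 98, 105]
print(signals(plan_a, centre=100, lower_npl=80, upper_npl=120))  # -> []
```

A flagged point is the surprise that sends you back to the business plan; an unflagged dip is just common cause variation.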

Analysis versus “gut feel”

I suppose that what lies behind Holmes’ quote is the theory that commitment and belief can, in themselves, overcome opposing forces, and that a commitment borne of emotion and instinctive confidence is all the more potent. Here is an old Linkedin post that caught my eye a while ago celebrating the virtues of “gut feel”.

The author believed that gut feel came from experience and that individuals with long exposure to a complex world should be able to trump data with their intuition. Intuition forms part of what Kahneman called System 1 thinking which he contrasted with the System 2 thinking that we engage in when we perform careful and lengthy data analysis (we hope).5 System 1 thinking can be valuable. Philip Tetlock, a psychologist who researched the science of forecasting, noted this.6

Whether intuition generates delusion or insight depends on whether you work in a world full of valid cues you can unconsciously register for future use.

In fact, whether the world is full of the sorts of valid cues that support useful predictions is exactly the question that Shewhart charts are designed to answer. Whether we make decisions on data or on gut feel, either can mislead us with the illusion of validity.

Again, what the chart supports is the continual testing of the reliability and utility of intuitions. Gut feel is not forbidden but be sure that the successive predictions and revisions will be recorded and subjected to the scrutiny of the Shewhart chart. Impressive records of forecasting will form the armature of a continually developing shared narrative of organisational excellence. Unimpressive forecasters will have to yield ground.


  1. Lichtenstein, S et al. (1982) “Calibration of probabilities: The state of the art to 1980” in Kahneman, D et al. Judgment Under Uncertainty: Heuristics and Biases, Cambridge University Press
  2. De Finetti, B (1974) Theory of Probability: A Critical Introductory Treatment, Vol.1, trans. Machi, A & Smith, A; Wiley, p113
  3. Kahneman, D (2011) Thinking, Fast and Slow, Allen Lane, p217
  4. p212
  5. pp19-24
  6. Tetlock, P (2015) Superforecasting: The Art and Science of Prediction, Crown Publishing, Kindle loc 1031

Productivity and how to improve it: II – Profit = Customer value – Cost

I said I would be posting on this topic way back here. Perhaps that says something about my personal productivity but I have been productive on other things. I have a day job.

I wanted to start off addressing customer value and waste. Here are a couple of revealing stories from the press.

Blue dollars and green dollars


This story appeared on the BBC website about a pizza restaurant transferring the task of slicing lemons from the waiters to the kitchen staff. As you know I am rarely impressed by standards of data journalism at the state-owned BBC. This item makes one of the gravest errors of attempted business improvement. It had been the practice that waiters, as their first job in the morning, would chop lemons for the day’s anticipated drinks orders. A pizza chef commented that chopping was one of the chefs’ trade skills, so lemon chopping should be transferred to the chefs. That would, purportedly, save the waiters from having to “take a break from their usual tasks, wash their hands, clear a space and then clean up after themselves.” The item goes on:

“Just by changing who chops the lemons, we were able to make a significant saving in hours which translates into a significant financial saving,” says Richard Hodgson, Pizza Express’ chief executive.

This looks, to the uncritical eye, like a saving. But it is a saving in what we call blue dollars (or pounds or euros). It appears in blue ink on an executive summary or monthly report. Did Pizza Express actually save any cash, what we call green dollars (or …)? Did the initiative put a ding in the profit and loss account?

Perhaps it did but perhaps not. It is, actually, very easy to eliminate, or perhaps hide or redeploy, tasks or purchases and claim a saving in blue dollars. Demonstrating that this then mapped into a saving in green dollars requires committed analytics and the trenchant criticism of historical data. The blue dollars will turn into green dollars if Pizza Express can achieve a time saving that allows:

  • A reduction in payroll; or
  • Redeployment of time into an activity that creates greater value for the customer.

That is assuming that the initiative did result in a time saving. What it certainly lost was a team building opportunity between waiters and chefs and a signal for waiters to wash their hands.

The jury is out as to whether Pizza Express improved productivity. Translation of blue dollars into green dollars is not easy. It is certainly not automatic. Turning blue dollars into green dollars is the really tricky bit in improvement. The bit that requires all the skill and know-how. It turns on the Nolan and Provost question: How will you know when a change is an improvement? More work is needed here to persuade anybody of anything. More work is certainly needed by the BBC in improving their journalism.

Politicians don’t get it

I asked above if the freed time could be translated into an activity that creates greater value for the customer. The value of a thing is what somebody is willing to pay for it. When we say that an activity creates value we mean that it increases the price at which we can sell output. The importance of price is that it captures a revealed preference rather than just a casual attitude for which the subject will never have to give an account. Any activity that does not create value for the customer is waste. The Japanese word muda has become fashionable. It is at the core of achieving operational excellence that unrelenting, gradual and progressive elimination of waste is a daily activity for everybody in the organisation. Waste, everything that does not create value for the customer. Everything that does not make the customer willing to pay more. If the customer will not pay more there is no value for them.

John Redwood was a middle ranking official in John Major’s government of the 1990s though he had frustrated ambitions for higher office. He offered us his personal thoughts on productivity here. I think he illustrates how poorly politicians understand what productivity is. Redwood thinks that we are over simplifying things when we say that productivity is:


or, a better definition:


Redwood thinks that, in the service sector, “labour intensity is often seen as better service rather than as worse productivity”. It may be true but only in so far as the customer sees it as such and is willing to pay proportionately for the staffing. Where the customer will not pay then productivity is reduced and insisting that labour intensity is an inherent virtue is a delusion. I think this is the basis of what Redwood is trying to say about purchasing coffee from a store. The test is that the customer is willing to pay for the experience.

However imperfect the statistics, they do seek to capture what the customers have been willing to pay. The spend at the coffee stand should show up on the aggregated statistics for “customer value created” and so the retail coffee phenomenon will not manifest itself as a decrease in productivity. Redwood has completely misunderstood.

Of course there are measurement issues and they are serious ones. There is nothing though that suggests that the concept or its definition are at fault.

What is worrying is that Redwood’s background is in banking though I certainly know bankers who are less out of touch with the real world. Redwood needs to get that the fundamental theorem of business is that:

profit = price – cost

— and that price is set by the market. There are only two things to do to improve.

  • Develop products that enhance customer value.
  • Eliminate costs that do not contribute to customer value.

UK figures

I could not find a long-term productivity time series on the UK Office for National Statistics website (“ONS”). I think that is shameful. You know that I am always suspicious of politicians’ unwillingness to encourage sharing long term statistical series. I managed to find what I was looking for here at www.tradingeconomics.com. Click on the “MAX” tab on the chart.

That chart gave me a suspicion. The ONS website does have the data from 2008. There is a link to this data after Figure 3 of the ONS publication Labour Productivity: Oct to Dec 2015. However, all the charts in that publication are fairly hideous and lacking in graphical excellence. Here is the 2008 to 2015 data replotted.


I am satisfied that, following the steep drop in UK productivity coinciding with the world financial crisis of 2007/08, there has been a (fairly) steady rise in productivity to the region of pre-crash levels. Confirming that with a Shewhart chart is left as an exercise for the reader. Of course, there is common cause variation around the upward trend. And, I suspect, some special causes too. However, I think that inferences of gloom following the Quarter 4 2015 figures, the last observation plotted, are premature. A bad case of #executivetimeseries.

I think that makes me less gloomy about UK productivity than the press and politicians. I have a suspicion that growth since 2008 has been slower than historically but I do not want to take that too far here.

Coming next: Productivity and how to improve it III – Signal and noise