#executivetimeseries

ExecTS1OxfordDon Wheeler coined the term executive time series. I was just leaving court in Oxford the other day when I saw this announcement on a hoarding. I immediately thought to myself “#executivetimeseries”.

Wheeler introduced the phrase in his 2000 book Understanding Variation: The Key to Managing Chaos. He meant to criticise the habitual way that statistics are presented in business and government. A comparison is made between performance at two instants in time. Grave significance is attached as to whether performance is better or worse at the second instant. Well, it was always unlikely that it would be the same.

The executive time series has the following characteristics.

  • It as applied to some statistic, metric, Key Performance Indicator (KPI) or other measure that will be perceived as important by its audience.
  • Two time instants are chosen.
  • The statistic is quoted at each of the two instants.
  • If the latter is greater than the first then an increase is inferred. A decrease is inferred from the converse.
  • Great significance is attached to the increase or decrease.

Why is this bad?

At its best it provides incomplete information devoid of context. At its worst it is subject to gross manipulation. The following problems arise.

  • Though a signal is usually suggested there is inadequate information to infer this.
  • There is seldom explanation of how the time points were chosen. It is open to manipulation.
  • Data is presented absent its context.
  • There is no basis for predicting the future.

The Oxford billboard is even worse than the usual example because it doesn’t even attempt to tell us over what period the carbon reduction is being claimed.

Signal and noise

Let’s first think about noise. As Daniel Kahneman put it “A random event does not … lend itself to explanation, but collections of random events do behave in a highly regular fashion.” Noise is a collection of random events. Some people also call it common cause variation.

Imagine a bucket of thousands of beads. Of the beads, 80% are white and 20%, red. You are given a paddle that will hold 50 beads. Use the paddle to stir the beads then draw out 50 with the paddle. Count the red beads. Repeat this, let us say once a week, until you have 20 counts. The data might look something like this.

RedBeads1

What we observe in Figure 1 is the irregular variation in the number of red beads. However, it is not totally unpredictable. In fact, it may be one of the most predictable things you have ever seen. Though we cannot forecast exactly how many red beads we will see in the coming week, it will most likely be in the rough range of 4 to 14 with rather more counts around 10 than at the extremities. The odd one below 4 or above 14 would not surprise you I think.

But nothing changed in the characteristics of the underlying process. It didn’t get better or worse. The percentage of reds in the bucket was constant. It is a stable system of trouble. And yet measured variation extended between 4 and 14 red beads. That is why an executive time series is so dangerous. It alleges change while the underlying cause-system is constant.

Figure 2 shows how an executive time series could be constructed in week 3.

RedBeads2

The number of beads has increase from 4 to 10, a 150% increase. Surely a “significant result”. And it will always be possible to find some managerial initiative between week 2 and 3 that can be invoked as the cause. “Between weeks 2 and 3 we changed the angle of inserting the paddle and it has increased the number of red beads by 150%.”

But Figure 2 is not the only executive time series that the data will support. In Figure 3 the manager can claim a 57% reduction from 14 to 6. More than the Oxford banner. Again, it will always be possible to find some factor or incident supposed to have caused the reduction. But nothing really changed.

RedBeads3

The executive can be even more ambitious. “Between week 2 and 17 we achieved a 250% increase in red beads.” Now that cannot be dismissed as a mere statistical blip.

RedBeads4

#executivetimeseries

Data has no meaning apart from its context.

Walter Shewhart

Not everyone who cites an executive time series is seeking to deceive. But many are. So anybody who relies on an executive times series, devoid of context, invites suspicion that they are manipulating the message. This is Langian statistics. par excellence. The fallacy of What you see is all there is. It is essential to treat all such claims with the utmost caution. What properly communicates the present reality of some measure is a plot against time that exposes its variation, its stability (or otherwise) and sets it in the time context of surrounding events.

We should call out the perpetrators. #executivetimeseries

Techie note

The data here is generated from a sequence of 20 Bernoulli experiments with probability of “red” equal to 0.2 and 50 independent trials in each experiment.

Advertisements

Deconstructing Deming XI B – Eliminate numerical goals for management

11. Part B. Eliminate numerical goals for management.

W. Edwards Deming.jpgA supposed corollary to the elimination of numerical quotas for the workforce.

This topic seems to form a very large part of what passes for exploration and development of Deming’s ideas in the present day. It gets tied in to criticisms of remuneration practices and annual appraisal, and target-setting in general (management by objectives). It seems to me that interest flows principally from a community who have some passionately held emotional attitudes to these issues. Advocates are enthusiastic to advance the views of theorists like Alfie Kohn who deny, in terms, the effectiveness of traditional incentives. It is sad that those attitudes stifle analytical debate. I fear that the problem started with Deming himself.

Deming’s detailed arguments are set out in Out of the Crisis (at pp75-76). There are two principle reasoned objections.

  1. Managers will seek empty justification from the most convenient executive time series to hand.
  2. Surely, if we can improve now, we would have done so previously, so managers will fall back on (1).

The executive time series

I’ve used the time series below in some other blogs (here in 2013 and here in 2012). It represents the anual number of suicides on UK railways. This is just the data up to 2013.
RailwaySuicides2

The process behaviour chart shows a stable system of trouble. There is variation from year to year but no significant (sic) pattern. There is noise but no signal. There is an average of just over 200 fatalities, varying irregularly between around 175 and 250. Sadly, as I have discussed in earlier blogs, simply selecting a pair of observations enables a polemicist to advance any theory they choose.

In Railway Suicides in the UK: risk factors and prevention strategies, Kamaldeep Bhui and Jason Chalangary of the Wolfson Institute of Preventive Medicine, and Edgar Jones of the Institute of Psychiatry, King’s College, London quoted the Rail Safety and Standards Board (RSSB) in the following two assertions.

  • Suicides rose from 192 in 2001-02 to a peak 233 in 2009-10; and
  • The total fell from 233 to 208 in 2010-11 because of actions taken.

Each of these points is what Don Wheeler calls an executive time series. Selective attention, or inattention, on just two numbers from a sequence of irregular variation can be used to justify any theory. Deming feared such behaviour could be perverted to justify satisfaction of any goal. Of course, the process behaviour chart, nowhere more strongly advocated than by Deming himself in Out of the Crisis, is the robust defence against such deceptions. Diligent criticism of historical data by means of process behaviour charts is exactly what is needed to improve the business and exactly what guards against success-oriented interpretations.

Wishful thinking, and the more subtle cognitive biases studied by Daniel Kahneman and others, will always assist us in finding support for our position somewhere in the data. Process behaviour charts keep us objective.

If not now, when?

If I am not for myself, then who will be for me?
And when I am for myself, then what am “I”?
And if not now, when?

Hillel the Elder

Deming criticises managerial targets on the grounds that, were the means of achieving the target known, it would already have been achieved and, further, that without having the means efforts are futile at best. It’s important to remember that Deming is not here, I think, talking about efforts to stabilise a business process. Deming is talking about working to improve an already stable, but incapable, process.

There are trite reasons why a target might legitimately be mandated where it has not been historically realised. External market conditions change. A manager might unremarkably be instructed to “Make 20% more of product X and 40% less of product Y“. That plays in to the broader picture of targets’ role in co-ordinating the parts of a system, internal to the organisation of more widely. It may be a straightforward matter to change the output of a well-understood, stable system by an adjustment of the inputs.

Deming says:

If you have a stable system, then there is no use to specify a goal. You will get whatever the system will deliver.

But it is the manager’s job to work on a stable system to improve its capability (Out of the Crisis at pp321-322). That requires capital and a plan. It involves a target because the target captures the consensus of the whole system as to what is required, how much to spend, what the new system looks like to its customer. Simply settling for the existing process, being managed through systematic productivity to do its best, is exactly what Deming criticises at his Point 1 (Constancy of purpose for improvement).

Numerical goals are essential

… a manager is an information channel of decidedly limited capacity.

Kenneth Arrow
Essays in the Theory of Risk-Bearing

Deming’s followers have, to some extent, conceded those criticisms. They say that it is only arbitrary targets that are deprecated and not the legitimate Voice of the Customer/ Voice of the Business. But I think they make a distinction without a difference through the weasel words “arbitrary” and “legitimate”. Deming himself was content to allow managerial targets relating to two categories of existential risk.

However, those two examples are not of any qualitatively different type from the “Increase sales by 10%” that he condemns. Certainly back when Deming was writing Out of the Crisis most OELs were based on LD50 studies, a methodology that I am sure Deming would have been the first to criticise.

Properly defined targets are essential to business survival as they are one of the principal means by which the integrated function of the whole system is communicated. If my factory is producing more than I can sell, I will not work on increasing capacity until somebody promises me that there is a plan to improve sales. And I need to know the target of the sales plan to know where to aim with plant capacity. It is no good just to say “Make as much as you can. Sell as much as you can.” That is to guarantee discoordination and inefficiency. It is unsurprising that Deming’s thinking has found so little real world implementation when he seeks to deprive managers of one of the principle tools of managing.

Targets are dangerous

I have previously blogged about what is needed to implement effective targets. An ill judged target can induce perverse incentives. These can be catastrophic for an organisation, particularly one where the rigorous criticism of historical data is absent.

UK railway suicides – 2014 update

It’s taken me a while to sit down and blog about this news item from October 2014: Sharp Rise in Railway Suicides Say Network Rail . Regular readers of this blog will know that I have followed this data series closely in 2013 and 2012.

The headline was based on the latest UK government data. However, I baulk at the way these things are reported by the press. The news item states as follows.

The number of people who have committed suicide on Britain’s railways in the last year has almost reached 300, Network Rail and the Samaritans have warned. Official figures for 2013-14 show there have already been 279 suicides on the UK’s rail network – the highest number on record and up from 246 in the previous year.

I don’t think it’s helpful to characterise 279 deaths as “almost … 300”, where there is, in any event, no particular significance in the number 300. It arbitrarily conveys the impression that some pivotal threshold is threatened. Further, there is no especial significance in an increase from 246 to 279 deaths. Another executive time series. Every one of the 279 is a tragedy as is every one of the 246. The experience base has varied from year to year and there is no surprise that it has varied again. To assess the tone of the news report I have replotted the data myself.

RailwaySuicides3

Readers should note the following about the chart.

  • Some of the numbers for earlier years have been updated by the statistical authority.
  • I have recalculated natural process limits as there are still no more than 20 annual observations.
  • There is now a signal (in red) of an observation above the upper natural process limit.

The news report is justified, unlike the earlier ones. There is a signal in the chart and an objective basis for concluding that there is more than just a stable system of trouble. There is a signal and not just noise.

As my colleague Terry Weight always taught me, a signal gives us license to interpret the ups and downs on the chart. There are two possible narratives that immediately suggest themselves from the chart.

  • A sudden increase in deaths in 2013/14; or
  • A gradual increasing trend from around 200 in 2001/02.

The chart supports either story. To distinguish would require other sources of information, possibly historical data that can provide some borrowing strength, or a plan for future data collection. Once there is a signal, it makes sense to ask what was its cause. Building  a narrative around the data is a critical part of that enquiry. A manager needs to seek the cause of the signal so that he or she can take action to improve system outcomes. Reliably identifying a cause requires trenchant criticism of historical data.

My first thought here was to wonder whether the railway data simply reflected an increasing trend in suicide in general. Certainly a very quick look at the data here suggests that the broader trend of suicides has been downwards and certainly not increasing. It appears that there is some factor localised to railways at work.

I have seen proposals to repeat a strategy from Japan of bathing railway platforms with blue light. I have not scrutinised the Japanese data but the claims made in this paper and this are impressive in terms of purported incident reduction. If these modifications are implemented at British stations we can look at the chart to see whether there is a signal of fewer suicides. That is the only real evidence that counts.

Those who were advocating a narrative of increasing railway suicides in earlier years may feel vindicated. However, until this latest evidence there was no signal on the chart. There is always competition for resources and directing effort on a false assumptions leads to misallocation. Intervening in a stable system of trouble, a system featuring only noise, on the false belief that there is a signal will usually make the situation worse. Failing to listen to the voice of the process on the chart risks diverting vital resources and using them to make outcomes worse.

Of course, data in terms of time between incidents is much more powerful in spotting an early signal. I have not had the opportunity to look at such data but it would have provided more, better and earlier evidence.

Where there is a perception of a trend there will always be an instinctive temptation to fit a straight line through the data. I always ask myself why this should help in identifying the causes of the signal. In terms of analysis at this stage I cannot see how it would help. However, when we come to look for a signal of improvement in future years it may well be a helpful step.

Bang! UK Passport Office hits the kerb

Her Majesty's Passport OfficeThe UK’s Passport Office is in difficulties. They have a backlog that is resulting in customers’ passport applications being delayed. This is not a mere internal procedural inconvenience. The public has noticed the problem and started complaining. Emergency measures are being put in place to deal with the backlog. Politicians have become involved and are looking over their shoulders at their careers.

It is a typical organisational mess. There is a problem. Resources are thrown at it. Personalities wager their reputations. Any hero able to solve the problem will be feted and rewarded. There will be blame and punishment. Solutions will involve huge cost. The costs will be passed on to the customer because, in the end, there is no one else to pay.

A suggestion for investigation

From the outside, it is impossible to know the realities of what has caused the problem at HM Passport Office. However, I think I can respectfully and tentatively suggest some questions to ask in any inquiry as to how the mess occurred.

  • Had any surprising variation in passport processing occurred before the crisis hit?
  • If so, what action, if any, was taken?
  • Why was the action ineffective?
  • If no surprising variation was observed, were the managers measuring “upstream” indicators of process performance in addition to mere volumes?
  • Was historic data routinely interrogated to find signals among the noise?
  • If signals were only observed once it was too late to protect the customer, was the issuing process only marginally capable?

“Managing the passport issuing process on historical data is like …”

… trying to drive a car by watching the line in the rear-view mirror.

Myron Tribus

And, of course, that is what HM Passport Office and every manager has to do. There is only historical data. There is no data on the future. You cannot see out of the windscreen of the organisational SUV. Management is about subjecting the historic experience base to continual, rigorous statistical criticism to separate signal from noise. It is about having a good rear view mirror.

A properly managed, capable process will operate reliably, well within customer expectations. In process management terms, the Voice of the Process will be reliably aligned with the Voice of the Customer.

Forever improving the capability of the process gives it the elbow room or “rattle space” within which signals can occur that the customer never perceives. Those signals could represent changes in customer behaviour, problems within the organisation, or external events that have an impact. But the fact that they are unnoticed by the customer does not mean those signals are unimportant or can be neglected. It is by taking action to investigate those signals when they are detected, and by making necessary adjustments to work processes, that a future crisis can be averted.

While the customer is unaffected, the problem can be thoroughly investigated, solutions considered calmly and alternative remedies tested. Because the problem is invisible to the outside world there will be no sense of panic, political pressure, cash-flow deficit, reputational damage or destruction of employee engagement. The matter can be addressed soundly and privately.

Continual statistical analysis is the “rear view mirror”. It gives an historical picture as to how well the Voice of the Process emulates the Voice of the Customer. Coupled with a “roadmap” of the business, some supportive data from the “speedometer” and a little basic numeracy, the “rear view mirror” enables sensible predictions to be made about the near future.

Without that historical data, properly presented on live process behaviour charts to provide running statistical insight, then there is no rear view mirror. That is when the only business guidance is the Bang! when the organisation hits the kerb.

It looks like that is what happened at HM Passport Office. Everything was fine until the customers started complaining to the press. Bang! That’s how it looks to the customer and that is the only reality that counts.

#Bang!youhitthekerb

Deconstructing Deming III – Cease reliance on inspection

3. Cease dependence on inspection to achieve quality. Eliminate the need for massive inspection by building quality into the product in the first place.

W Edwards Deming Point 3 of Deming’s 14 Points. This at least cannot be controversial. For me it goes to the heart of Deming’s thinking.

The point is that every defective item produced (or defective service delivered) has taken cash from the pockets of customers or shareholders. They should be more angry. One day they will be. Inputs have been purchased with their cash, their resources have been deployed to transform the inputs and they will get nothing back in return. They will even face the costs of disposing of the scrap, especially if it is environmentally noxious.

That you have an efficient system for segregating non-conforming from conforming is unimpressive. That you spend even more of other people’s money reworking the product ought to be a matter of shame. Lean Six Sigma practitioners often talk of the hidden factory where the rework takes place. A factory hidden out of embarrassment. The costs remain whether you recognise them or not. Segregation is still more problematic in service industries.

The insight is not unique to Deming. This is a common theme in Lean, Six Sigma, Theory of Constraints and other approaches to operational excellence. However, Deming elucidated the profound statistical truths that belie the superficial effectiveness of inspection.

Inspection is inefficient

When I used to work in the railway industry I was once asked to look at what percentage of signalling scheme designs needed to be rechecked to defend against the danger of a logical error creeping through. The problem requires a simple application of Bayes’ theorem. I was rather taken aback at the result. There were only two strategies that made sense: recheck everything or recheck nothing. I didn’t at that point realise that this is a standard statistical result in inspection theory. For a wide class of real world situations, where the objective is to segregate non-conforming from conforming, the only sensible sampling schemes are 100% or 0%.

Where the inspection technique is destructive, such as a weld strength test, there really is only one option.

Inspection is ineffective

All inspection methods are imperfect. There will be false-positives and false-negatives. You will spend some money scrapping product you could have sold for cash. Some defective product will escape onto the market. Can you think of any examples in your own experience? Further, some of the conforming product will be only marginally conforming. It won’t delight the customer.

So build quality into the product

… and the process for producing the product (or delivering the service). Deming was a champion of the engineering philosophy of Genechi Taguchi who put forward a three-stage approach for achieving, what he called, off-line quality control.

  1. System design – in developing a product (or process) concept think about how variation in inputs and environment will affect performance. Choose concepts that are robust against sources of variation that are difficult or costly to control.
  2. Parameter design – choose product dimensions and process settings that minimise the sensitivity of performance to variation.
  3. Tolerance design – work out the residual sources of variation to which performance remains sensitive. Develop control plans for measuring, managing and continually reducing such variation.

Is there now no need to measure?

Conventional inspection aimed at approving or condemning a completed batch of output. The only thing of interest was the product and whether it conformed. Action would be taken on the batch. Deming called the application of statistics to such problems an enumerative study.

But the thing managers really need to know about is future outcomes and how they will be influenced by present decisions. There is no way of sampling the future. So sampling of the past has to go beyond mere characterisation and quantification of the outcomes. You are stuck with those and will have to take the consequences one way or another. Sampling (of the past) has to aim principally at understanding the causes of those historic outcomes. Only that enables managers to take a view on whether those causes will persist in the future, in what way they might change and how they might be adjusted. This is what Deming called an analytic study.

Essential to the ability to project data into the future is the recognition of common and special causes of variation. Only when managers are confident in thinking and speaking in those terms will their organisations have a sound basis for action. Then it becomes apparent that the results of inspection represent the occult interaction of inherent variation with threshold effects. Inspection obscures the distinction between common and special causes. It seduces the unwary into misguided action that exacerbates quality problems and reputational damage. It obscures the sad truth that, as Terry Weight put it, a disappointment is not necessarily a surprise.

The programme

  1. Drive out sensitivity to variation at the design stage.
  2. Routinely measure the inputs whose variation threatens product performance.
  3. Measure product performance too. Your bounded rationality may have led you to get (2) wrong.
  4. No need to measure every unit. We are trying to understand the cause system not segregate items.
  5. Plot data on a process behaviour chart.
  6. Stabilise the system.
  7. Establish capability.
  8. Keep on measuring to maintain stability and improve capability.

Some people think they have absorbed Deming’s thinking, mastered it even. Yet the test is the extent to which they are able to analyse problems in terms of common and special causes of variation. Is that the language that their organisation uses to communicate exceptions and business performance, and to share analytics, plans, successes and failures?

There has always been some distaste for Deming’s thinking among those who consider it cold, statistically driven and paralysed by data. But the data are only a means to getting beyond the emotional reaction to those two impostors: triumph and disaster. The language of common and special causes is a profound tool for building engagement, fostering communication and sharing understanding. Above that, it is the only sound approach to business measurement.

Trouble at the EU

I enjoy Metro the UK national free morning newspaper. It has a very straightforward non-partisan style. This morning there was an article dealing with the European Union’s (EU’s) accounting difficulties. There were a couple of very telling admissions from an EU bureaucrat. We lawyers love an admission.

Aidas Palubinskas, from the European Court of Auditors, … described the error rate as ‘relatively stable from year to year’.

He admits that the EU’s accounting is a stable system of trouble. That is a system where there is only common cause variation, variation common to the whole of the output, but where the system is still incapable of reliably delivering what the customer wants. Recognising that one is embedded in such a problem is the first step towards operational improvement. W Edwards Deming addressed the implications of the stable system and the strategy for its improvement at length in his seminal book Out of the Crisis (1982). The problems are not intractable but the solution demands leadership and adoption of the correct improvement approach.

Unfortunately, the second half of the quote is less encouraging.

He said the errors highlighted in its report were ‘examples of inefficiency, but not necessarily of waste’.

This makes me fear that the correct approach is far off for the EU. Everything that is not efficient, timely and effective delivery of what the customer wants is waste, as Toyota call it muda. Waste represents the scope of opportunity for improvement, for improving service and simultaneously reducing its cost. The first step in improvement is taken by accepting that waste is not inevitable and that it can be incrementally eliminated through use of appropriate tools under competent leadership.

The next step to improvement is to commit to the discipline of eliminating waste progressively. That requires leadership. That sort of leadership is often found in successful organisations. The EU, however, faces particular difficulties as an international bureaucracy with a multi-partisan political master and a democratically disengaged public. It is not easy to see where leadership will come from. This is a common problem of state bureaucracies.

Palubinskas is right to seek to analyse the problems as a stable system of trouble. However, beyond that, the path to radical improvement lies in rejecting the casual acceptance of waste and in committing to continual improvement of every process for delivery of service.

Rationing in UK health care – signal or noise?

The NHS in England appears to be rationing access to vital non-emergency hospital care, a review suggests.

This was the rather weaselly BBC headline last Friday. It referred to a report from Dr Foster Intelligence which appears to be a trading arm of Imperial College London.

The analysis alleged that the number of operations in three categories (cataract, knee and hip) had risen steadily between 2002 and 2008 but then “plateaued”. As evidence for this the BBC reproduced the following chart.

NHS_DrFoster_Dec13

Dr Foster Intelligence apparently argued that, as the UK population had continued to age since 2008, a “plateau” in the number of such operations must be evidence of “rationing”. Otherwise the rising trend would have continued. I find myself using a lot of quotes when I try to follow the BBC’s “data journalism”.

Unfortunately, I was unable to find the report or the raw data on the Dr Foster Intelligence website. It could be that my search skills are limited but I think I am fairly typical of the sort of people who might be interested in this. I would be very happy if somebody pointed me to the report and data. If I try to interpret the BBC’s journalism, the argument goes something like this.

  1. The rise in cataract, knee and hip operations has “plateaued”.
  2. Need for such operations has not plateaued.
  3. That is evidence of a decreased tendency to perform such operations when needed.
  4. Such a decreased tendency is because of “rationing”.

Now there are a lot of unanswered questions and unsupported assertions behind 2, 3 and 4 but I want to focus on 1. What the researchers say is that the experience base showed a steady rise in operations but that ceased some time around 2008. In other words, since 2008 there has been a signal that something has changed over the historical data.

Signals are seldom straightforward to spot. As Nate Silver emphasises, signals need to be contrasted with, and understood in the context of, noise, the irregular variation that is common to the whole of the historical data. The problem with common cause variation is that it can lead us to be, as Nassim Taleb puts it, fooled by randomness.

Unfortunately, without the data, I cannot test this out on a process behaviour chart. Can I be persuaded that this data represents an increasing trend then a signal of a “plateau”?

The first question is whether there is a signal of a trend at all. I suspect that in this case there is if the data is plotted on a process behaviour chart. The next question is whether there is any variation in the slope of that trend. One simple approach to this is to fit a linear regression line through the data and put the residuals on a process behaviour chart. Only if there is a signal on the residuals chart is an inference of a “plateau” left open. Looking at the data my suspicion is that there would be no such signal.

More complex analyses are possible. One possibility would be to adjust the number of operations by a measure of population age then look at the stability and predictability of those numbers. However, I see no evidence of that analysis either.

I think that where anybody claims to have detected a signal, the legal maxim should prevail: He who asserts must prove. I see no evidence in the chart alone to support the assertion of a rising trend followed by a “plateau”.