Why would a lawyer blog about statistics?

… is a question I often get asked. I blog here about statistics, data, quality, data quality, productivity, management and leadership. And evidence. I do it from my perspective as a practising lawyer and some people find that odd. Yet it turns out that the collaboration between law and quantitative management science is a venerable one.

The grandfather of scientific management is surely Frederick Winslow Taylor (1856-1915). Taylor introduced the idea of scientific study of work tasks, using data and quantitative methods to redesign and control business processes.

Yet one of Taylorism’s most effective champions was a lawyer, Louis Brandeis (1856-1941). In fact, it was Brandeis who coined the term scientific management.

Taylor

Taylor was a production engineer who advocated a four stage strategy for productivity improvement.

  1. Replace rule-of-thumb work methods with methods based on a scientific study of the tasks.
  2. Scientifically select, train, and develop each employee rather than passively leaving them to train themselves.
  3. Provide “Detailed instruction and supervision of each worker in the performance of that worker’s discrete task”.1
  4. Divide work nearly equally between managers and workers, so that the managers apply scientific management principles to planning the work and the workers actually perform the tasks.

Points (3) and (4) tend to jar with millennial attitudes towards engagement and collaborative work. Conservative political scientist Francis Fukuyama criticised Taylor’s approach as “[epitomising] the carrying of the low-trust, rule based factory system to its logical conclusion”.2 I have blogged many times on here about the importance of trust.

However, (1) and (2) provided the catalyst for pretty much all subsequent management science from W Edwards Deming, Elton Mayo, and Taiichi Ohno through to Six Sigma and Lean. Subsequent thinking has centred around creating trust in the workplace as inseparable from (1) and (2). Peter Drucker called Taylor the “Isaac Newton (or perhaps the Archimedes) of the science of work”.

Taylor claimed substantial successes with his redesign of work processes based on the evidence he had gathered, avant la lettre, in the gemba. His most cogent lesson was to exhort managers to direct their attention to where value was created rather than to confine their horizons to monthly accounts and executive summaries.

Of course, Taylor was long dead before modern business analytics began with Walter Shewhart in 1924. There is more than a whiff of the #executivetimeseries about some of Taylor’s work. Once management had Measurement System Analysis and the Shewhart chart there would no longer be any hiding place for groundless claims to non-existent improvements.

Brandeis

Brandeis practised as a lawyer in the US from 1878 until he was appointed a Justice of the Supreme Court in 1916. Brandeis’ principles as a commercial lawyer were, “first, that he would never have to deal with intermediaries, but only with the person in charge…[and] second, that he must be permitted to offer advice on any and all aspects of the firm’s affairs”. Brandeis was trenchant about the benefits of a coherent commitment to business quality. He also believed that these things were achieved, not by chance, but by the application of policy deployment.

Errors are prevented instead of being corrected. The terrible waste of delays and accidents is avoided. Calculation is substituted for guess; demonstration for opinion.

Brandeis clearly had a healthy distaste for muda.3 Moreover, he was making a land grab for the disputed high ground that these days often earns the vague and fluffy label strategy.

The Eastern Rate Case

The worlds of Taylor and Brandeis embraced in the Eastern Rate Case of 1910. The Eastern Railroad Company had applied to the Interstate Commerce Commission (“the ICC”) arguing that their cost base had inflated and that an increase in their carriage rates was necessary to sustain the business. The ICC was the then regulator of those utilities that had a monopoly element. Brandeis by this time had taken on the role of the People’s Lawyer, acting pro bono in whatever he deemed to be the public interest.

Brandeis opposed the rate increase arguing that the escalation in Eastern’s cost base was the result of management failure, not an inevitable consequence of market conditions. The cost of a monopoly’s ineffective governance should, he submitted, not be borne by the public, nor yet by the workers. In court Brandeis was asked what Eastern should do and he advocated scientific management. That is where and when the term was coined.4

Taylor-Brandeis

The insight that profit cannot simply be wished into being by the fiat of cost plus, a fortiori of the hourly rate, is the Milvian bridge to lean.

But everyone wants to occupy the commanding heights of an integrated policy nurturing quality, product development, regulatory compliance, organisational development and the economic exploitation of customer value. What’s so special about lawyers in the mix? I think we ought to remind ourselves that if lawyers know about anything then we know about evidence. And we just might know as much about it as the statisticians, the engineers and the enforcers. Here’s a tale that illustrates our value.

Thereza Imanishi-Kari was a postdoctoral researcher in molecular biology at the Massachusetts Institute of Technology. In 1986 a co-worker raised inconsistencies in Imanishi-Kari’s earlier published work that escalated into allegations that she had fabricated results to validate publicly funded research. Over the following decade, the allegations grew in seriousness, involving the US Congress, the Office of Scientific Integrity and the FBI. Imanishi-Kari was ultimately exonerated by a departmental appeal board constituted of an eminent molecular biologist and two lawyers. The board heard cross-examination of the relevant experts including those in statistics and document examination. It was that cross-examination that exposed the allegations as without foundation.5

Lawyers can make a real contribution to discovering how a business can be run successfully. But we have to live the change we want to be. The first objective is to bring management science to our own business.

The black-letter man may be the man of the present but the man of the future is the man of statistics and the master of economics.

Oliver Wendell Holmes, 1897

References

  1. Montgomery, D (1989) The Fall of the House of Labor: The Workplace, the State, and American Labor Activism, 1865-1925, Cambridge University Press, p250
  2. Fukuyama, F (1995) Trust: The Social Virtues and the Creation of Prosperity, Free Press, p226
  3. Kraines, O (1960) “Brandeis’ philosophy of scientific management” The Western Political Quarterly 13(1), 201
  4. Freedman, L (2013) Strategy: A History, Oxford University Press, pp464-465
  5. Kevles, D J (1998) The Baltimore Case: A Trial of Politics, Science and Character, Norton

Data science sold down the Amazon? Jeff Bezos and the culture of rigour

This blog appeared on the Royal Statistical Society website Statslife on 25 August 2015

This recent item in the New York Times has catalysed discussion among managers. The article tells of Amazon’s founder, Jeff Bezos, and his pursuit of rigorous data-driven management. It also tells employees’ own negative stories of how that felt emotionally.

The New York Times says that Amazon is pervaded with abundant data streams that are used to judge individual human performance and which drive reward and advancement. They inform termination decisions too.

The recollections of former employees are not the best source of evidence about how a company conducts its business. Amazon’s share of the retail market is impressive and they must be doing something right. What everybody else wants to know is, what is it? Amazon are very coy about how they operate and there is a danger that the business world at large takes the wrong messages.

Targets

Targets are essential to business. The marketing director predicts that his new advertising campaign will create demand for 12,000 units next year. The operations director looks at her historical production data. She concludes that the process lacks the capability reliably to produce those volumes. She estimates the budget required to upgrade the process and to achieve 12,000 units annually. The executive board considers the business case and signs off the investment. Both marketing and operations directors now have a target.

Targets communicate improvement priorities. They build confidence between interfacing processes. They provide constraints and parameters that prevent the system causing harm. Harm to others or harm to itself. They allow the pace and substance of multiple business processes, and diverse entities, to be matched and aligned.

But everyone who has worked in business sees it as less simple than that. The marketing and operations directors are people.

Signal and noise

Drawing conclusions from data might be an uncontroversial matter were it not for the most common feature of data, fluctuation. Call it variation if you prefer. Business measures do not stand still. Every month, week, day and hour is different. All data features noise. Sometimes it goes up, sometimes down. A whole ecology of occult causes, weakly characterised, unknown and as yet unsuspected, interacts to cause irregular variation. They are what cause a coin variously to fall “heads” or “tails”. That variation may often be stable enough, or if you like “exchangeable”, so as to allow statistical predictions to be made, as in the case of the coin toss.

If all data features noise then some data features signals. A signal is a sign, an indicator that some palpable cause has made the data stand out from the background noise. It is that assignable cause which enables inferences to be drawn about what interventions in the business process have had a tangible effect and what future innovations might cement any gains or lead to bigger prospective wins. Signal and noise lead to wholly different business strategies.
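One common way to make the distinction operational is the individuals (XmR) chart, in the Shewhart tradition. The following is a minimal sketch, not a full charting procedure: the monthly figures are invented for illustration, the function names are hypothetical, and only the simplest detection rule (a point outside the natural process limits) is applied.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower, centre, upper) natural process limits for an XmR chart."""
    centre = mean(values)
    # Moving ranges: absolute differences between successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is the usual scaling constant applied to the mean moving range.
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre, centre + spread


def signals(values, baseline):
    """Indices of points in `values` outside limits computed from `baseline`."""
    lower, _, upper = xmr_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly sales, invented purely for illustration.
baseline = [980, 1010, 995, 1020, 990, 1005, 1000, 985, 1015, 1000]
new_months = [1008, 992, 1075]

print(xmr_limits(baseline))           # limits from the historical noise
print(signals(new_months, baseline))  # which new months stand out
```

With these made-up numbers the limits sit at roughly 947 and 1,053, so only the third new month counts as a signal. The first two are indistinguishable from the background noise, however tempting a story about them might be.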

The relevance for business is that people, where not exposed to rigorous decision support, are really bad at telling the difference between signal and noise. Nobel laureate economist and psychologist Daniel Kahneman has amassed a lifetime of experimental and anecdotal data capturing noise misinterpreted as signal and judgments in the face of compelling data, distorted by emotional and contextual distractions.

Signal and accountability

It is a familiar trope of business, and government, that extravagant promises are made, impressive business cases set out and targets signed off. Yet the ultimate scrutiny as to whether that envisaged performance was realised often lacks rigour. Noise, with its irregular ups and downs, allows those seeking solace from failure to pick out select data points and cast self-serving narratives on the evidence.

Our hypothetical marketing director may fail to achieve his target but recount how there were two individual months where sales exceeded 1,000, construct elaborate rationales as to why only they are representative of his efforts and point to purported external factors that frustrated the remaining ten reports. Pairs of individual data points can always be selected to support any story, Don Wheeler’s classic executive time series.

This is where the ability to distinguish signal and noise is critical. To establish whether targets have been achieved requires crisp definition of business measures, not only outcomes but also the leading indicators that provide context and advise judgment as to prediction reliability. Distinguishing signal and noise requires transparent reporting that allows diverse streams of data criticism. It requires a rigorous approach to characterising noise and a systematic approach not only to identifying signals but to reacting to them in an agile and sustainable manner.

Data is essential to celebrating a target successfully achieved and to responding constructively to a failure. But where noise is gifted the status of signal to confirm a fanciful business case, or to protect a heavily invested reputation, then the business is misled, costs increased, profits foregone and investors cheated.

Where employees believe that success and reward are being fudged, whether because of wishful thinking or lack of data skills, or mistakenly through lack of transparency, then cynicism and demotivation will breed virulently. Employees watch the behaviours of their seniors carefully as models of what will lead to their own advancement. Where it is deceit or innumeracy that succeed, that is what will thrive.

Noise and blame

Here is some data of the number of defects caused by production workers last month.

Worker Defects
Al 10
Simone 6
Jose 10
Gabriela 16
Stan 10

What is to be done about Gabriela? Move to an easier job? Perhaps retraining? Or should she be let go? And Simone? Promote to supervisor?

Well, the numbers were just random numbers that I generated. I didn’t add anything in to make Gabriela’s score higher and there was nothing in the way that I generated the data to suggest who would come top or bottom. The data are simply noise. They are the sort of thing that you might observe in a manufacturing plant that presented a “stable system of trouble”. Nothing in the data signals any behaviour, attitude, skill or diligence that Gabriela lacked or wrongly exercised. The next month’s data would likely show a different candidate for dismissal.
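The claim that the table is all noise can be checked with a rough c-chart calculation. This is a sketch under a stated assumption: the counts are treated as Poisson, which the post does not say is how the numbers were generated, and five observations is a thin basis for setting limits.

```python
from math import sqrt

# Defect counts from the table above.
defects = {"Al": 10, "Simone": 6, "Jose": 10, "Gabriela": 16, "Stan": 10}

centre = sum(defects.values()) / len(defects)
# Assumption: Poisson counts, so the standard deviation is sqrt(mean).
upper = centre + 3 * sqrt(centre)
lower = max(0.0, centre - 3 * sqrt(centre))

outside = [name for name, count in defects.items()
           if count < lower or count > upper]
print(f"centre={centre:.1f}, limits=({lower:.2f}, {upper:.2f}), outside={outside}")
# -> centre=10.4, limits=(0.73, 20.07), outside=[]
```

On that assumption, neither Gabriela’s 16 nor Simone’s 6 comes anywhere near the limits. Nothing here distinguishes one worker from another.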

Mistaking noise for signal is, like mistaking signal for noise, the path to business underperformance and employee disillusionment. It has a particularly corrosive effect where used, as it might be in Gabriela’s case, to justify termination. The remaining staff will be bemused as to what Gabriela was actually doing wrong and start to attach myriad and irrational doubts to all sorts of things in the business. There may be a resort to magical thinking. The survivors will be less open and less willing to share problems with their supervisors. The business itself has the costs of recruitment to replace Gabriela. The saddest aspect of the whole business is the likelihood that Gabriela’s replacement will perform better than did Gabriela, vindicating the dismissal in the mind of her supervisor. This is the familiar statistical artefact of regression to the mean. An extreme event is likely to be followed by one less extreme. Again, Kahneman has collected sundry examples of managers so deceived by singular human performance and disappointed by its modest follow-up.
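Regression to the mean is easy to demonstrate by simulation. The sketch below assumes five workers whose monthly defect counts are independent draws from one and the same distribution. The normal distribution, its mean of 10 and its standard deviation of 3 are illustrative choices, not figures from any real plant.

```python
import random


def simulate(months=10_000, workers=5, mean_defects=10.0, sd=3.0, seed=1):
    """Average score of each month's worst worker, and of that same worker next month."""
    rng = random.Random(seed)
    worst_now = 0.0
    same_worker_next = 0.0
    for _ in range(months):
        # Every worker draws from the same distribution: pure noise, no skill effect.
        this_month = [rng.gauss(mean_defects, sd) for _ in range(workers)]
        next_month = [rng.gauss(mean_defects, sd) for _ in range(workers)]
        worst = max(range(workers), key=this_month.__getitem__)
        worst_now += this_month[worst]
        same_worker_next += next_month[worst]
    return worst_now / months, same_worker_next / months


worst_now, same_worker_next = simulate()
print(f"worst worker this month: {worst_now:.1f}")
print(f"same worker next month:  {same_worker_next:.1f}")
```

The worst performer in any month scores well above the average, yet the same individual is merely average the following month. A replacement drawn from the same system would look just as "improved", with no change in anybody's skill or diligence.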

It was W Edwards Deming who observed that every time you recruit a new employee you take a random sample from the pool of job seekers. That’s why you get the regression to the mean. It must be true at Amazon too as their human resources executive Mr Tony Galbato explains their termination statistics by admitting that “We don’t always get it right.” Of course, everybody thinks that their recruitment procedures are better than average. That’s a management claim that could well do with rigorous testing by data.

Further, mistaking noise for signal brings the additional business expense of over-adjustment, spending money to add costly variation while degrading customer satisfaction. Nobody in the business feels good about that.

Target quality, data quality

I admitted above that the evidence we have about Amazon’s operations is not of the highest quality. I’m not in a position to judge what goes on at Amazon. But all should fix in their minds that setting targets demands rigorous risk assessment, analysis of perverse incentives and intense customer focus.

It is a sad reality that, if you set incentives perversely enough, some individuals will find ways of misreporting data. BNFL’s embarrassment with Kansai Electric and Steven Eaton’s criminal conviction were not isolated incidents.

One thing that especially bothered me about the Amazon report was the soi-disant Anytime Feedback Tool that allowed unsolicited anonymous peer appraisal. Apparently, this formed part of the “data” that determined individual advancement or termination. The description was unchallenged by Amazon’s spokesman (sic) Mr Craig Berman. I’m afraid, and I say this as a practising lawyer, unsourced and unchallenged “evidence” carries the spoor of the Star Chamber and the party purge. I would have thought that a pretty reliable method for generating unreliable data would be to maximise the personal incentives for distortion while protecting it from scrutiny or governance.

Kahneman observed that:

… we pay more attention to the content of messages than to information about their reliability, and as a result end up with a view of the world around us that is simpler and more coherent than the data justify.

It is the perverse confluence of fluctuations and individual psychology that makes statistical science essential, data analytics interesting and business, law and government difficult.

Productivity and how to improve it: I - The foundational narrative

Again, much talk in the UK media recently about weak productivity statistics. Chancellor of the Exchequer (Finance Minister) George Osborne has launched a 15 point macroeconomic strategy aimed at improving national productivity. Some of the points are aimed at incentivising investment and training. There will be few who argue against that though I shall come back to the investment issue when I come to talk about signal and noise. I have already discussed training here. In any event, the strategy is fine as far as these things go. Which is not very far.

There remains the microeconomic task for all of us of actually improving our own productivity and that of the systems we manage. That is not the job of government.

Neither can I offer any generalised system for improving productivity. It will always be industry and organisation dependent. However, I wanted to write about some of the things that you have to understand if your efforts to improve output are going to be successful and sustainable.

  • Customer value and waste.
  • The difference between signal and noise.
  • How to recognise flow and manage a constraint.

Before going on to those in future weeks I first wanted to go back and look at what has become the foundational narrative of productivity improvement, the Hawthorne experiments. They still offer some surprising insights.

The Hawthorne experiments

In 1923, the US electrical engineering industry was looking to increase the adoption of electric lighting in American factories. Uptake had been disappointing despite the claims being made for increased productivity.

[Tests in nine companies have shown that] raising the average initial illumination from about 2.3 to 11.2 foot-candles resulted in an increase in production of more than 15%, at an additional cost of only 1.9% of the payroll.

Earl A Anderson
General Electric
Electrical World (1923)

E P Hyde, director of research at GE’s National Lamp Works, lobbied government for the establishment of a Committee on Industrial Lighting (“the CIL”) to co-ordinate marketing-oriented research. Western Electric volunteered to host tests at their Hawthorne Works in Cicero, IL.

Western Electric came up with a study design that comprised a team of experienced workers assembling relays, winding their coils and inspecting them. Tests commenced in November 1924 with active support from an elite group of academic and industrial engineers including the young Vannevar Bush, who would himself go on to an eminent career in government and science policy. Thomas Edison became honorary chairman of the CIL.

It’s a tantalising historical fact that Walter Shewhart was employed at the Hawthorne Works at the time but I have never seen anything suggesting his involvement in the experiments, nor that of his mentor George D Edwards, nor protégé Joseph Juran. In later life, Juran was dismissive of the personal impact that Shewhart had had on operations there.

However, initial results showed no influence of light level on productivity at all. Productivity rose throughout the test but was wholly uncorrelated with lighting level. Theories about the impact of human factors such as supervision and motivation started to proliferate.

A further schedule of tests was programmed starting in September 1926. Now, the lighting level was to be reduced to near darkness so that the threshold of effective work could be identified. Here is the summary data (from Richard Gillespie Manufacturing Knowledge: A History of the Hawthorne Experiments, Cambridge, 1991).

[Figure: Hawthorne data-1]

It requires no sophisticated statistical analysis to see that the data is all noise and no signal. Much to the disappointment of the CIL, and the industry, there was no evidence that illumination made any difference at all, even down to conditions of near darkness. It’s striking that the highest lighting levels embraced the full range of variation in productivity from the lowest to the highest. What had seemed so self-evidently a boon to productivity was purely incidental. It is never safe to assume that a change will be an improvement. As W Edwards Deming insisted, “In God we trust. All others bring data.”

But the data still seemed to show a relentless improvement of productivity over time. The participants were all very experienced in the task at the start of the study so there should have been no learning by doing. There seemed no other explanation than that the participants were somehow subliminally motivated by the experimental setting. Or something.

[Figure: Hawthorne data-2]

That subliminally motivated increase in productivity came to be known as the Hawthorne effect. Attempts to explain it led to the development of whole fields of investigation and organisational theory, by Elton Mayo and others. It really was the foundation of the management consulting industry. Gillespie (supra) gives a rich and intriguing account.

A revisionist narrative

Because of the “failure” of the experiments’ purpose there was a falling off of interest and only the above summary results were ever published. The raw data were believed destroyed. Now “you know, at least you ought to know, for I have often told you so” about Shewhart’s two rules for data presentation.

  1. Data should always be presented in such a way as to preserve the evidence in the data for all the predictions that might be made from the data.
  2. Whenever an average, range or histogram is used to summarise observations, the summary must not mislead the user into taking any action that the user would not take if the data were presented in context.

The lack of any systematic investigation of the raw data led to the development of a discipline myth that every single experimental adjustment had led forthwith to an increase in productivity.

In 2009, Steven Levitt, best known to the public as the author of Freakonomics, along with John List and their research team, miraculously discovered a microfiche of the raw study data at a “small library in Milwaukee, WI” and the remainder in Boston, MA. They went on to analyse the data from scratch (Was there Really a Hawthorne Effect at the Hawthorne Plant? An Analysis of the Original Illumination Experiments, National Bureau of Economic Research, Working Paper 15016, 2009).

[Figure: Levitt and List, Figure 3, raw productivity measurements for each experiment]

Figure 3 of Levitt and List’s paper (reproduced above) shows the raw productivity measurements for each of the experiments. Levitt and List show how a simple plot such as this reveals important insights into how the experiments developed. It is a plot that yields a lot of information.

Levitt and List note that, in the first phase of experiments, productivity rose then fell when experiments were suspended. They speculate as to whether there was a seasonal effect with lower summer productivity.

The second period of experiments is that between the third and fourth vertical lines in the figure. Only room 1 experienced experimental variation in this period yet Levitt and List contend that productivity increased in all three rooms, falling again at the end of experimentation.

During the final period, data was only collected from room 1 where productivity continued to rise, even beyond the end of the experiment. Looking at the data overall, Levitt and List find some evidence that productivity responded more to changes in artificial light than to natural light. The evidence that increases in productivity were associated with every single experimental adjustment is weak. To this day, there is no compelling explanation of the increases in productivity.

Lessons in productivity improvement

Deming used to talk of “disappointment in great ideas”, the propensity for things that looked so good on paper simply to fail to deliver the anticipated benefits. Nobel laureate psychologist Daniel Kahneman warns against our individual bounded rationality.

To guard against entrapment by the vanity of imagination we need measurement and data to answer the ineluctable question of whether the change we implemented so passionately resulted in improvement. To be able to answer that question demands the separation of signal from noise. That requires trenchant data criticism.

And even then, some factors may yet be beyond our current knowledge. Bounded rationality again. That is why the trick of continual improvement in productivity is to use the rigorous criticism of historical data to build collective knowledge incrementally.

If you torture the data enough, nature will always confess.

Ronald Coase

Eventually.

Does noise make you fat?

“A new study has unearthed some eye-opening facts about the effects of noise pollution on obesity,” proclaimed The Huffington Post recently in another piece of poor, uncritical data journalism.

Journalistic standards notwithstanding, in Exposure to traffic noise and markers of obesity (BMJ Occupational and environmental medicine, May 2015) Andrei Pyko and eight (sic) collaborators found “evidence of a link between traffic noise and metabolic outcomes, especially central obesity.” The particular conclusion picked up by the press was that each 5 dB increase in traffic noise could add 2 mm to the waistline.

Not trusting the press I decided I wanted to have a look at this research myself. I was fortunate that the paper was available for free download for a brief period after the press release. It took some finding though. The BMJ insists that you will now have to pay. I do find that objectionable as I see that the research was funded in part by the European Union. We European citizens have all paid once. Why should we have to pay again?

On reading …

I was, though, shocked on reading Pyko’s paper, which the Huffington Post journalists obviously hadn’t read. They state “Lack of sleep causes reduced energy levels, which can then lead to a more sedentary lifestyle and make residents less willing to exercise.” Pyko’s paper says no such thing. The researchers had, in particular, conditioned on level of exercise so that effect had been taken out. It cannot stand as an explanation of the results. Pyko’s narrative concerned noise-induced stress and cortisol production, not lack of exercise.

In any event, the paper is densely written and not at all easy to analyse and understand. I have tried to pick out the points that I found most bothering but first a statistics lesson.

Prediction 101

(Almost) the first thing to learn in statistics is the relationship between population, frame and sample. We are concerned about the population. The frame is the enumerable and accessible set of things that approximate the population. The sample is a subset of the frame, selected in an economic, systematic and well characterised manner.

In Some Theory of Sampling (1950), W Edwards Deming drew a distinction between two broad types of statistical studies, enumerative and analytic.

  • Enumerative: Action will be taken on the frame.
  • Analytic: Action will be on the cause-system that produced the frame.

It is explicit in Pyko’s work that the sampling frame was metropolitan Stockholm, Sweden between the years 2002 and 2006. It was a cross-sectional study. I take it from the institutional funding that the study intended to advise policy makers as to future health interventions. Concern was beyond the population of Stockholm, or even Sweden. This was an analytic study. It aspired to draw generalised lessons about the causal mechanisms whereby traffic noise aggravated obesity so as to support future society-wide health improvement.

How representative was the frame of global urban areas stretching over future decades? I have not the knowledge to make a judgment. The issue is mentioned in the paper but, I think, with insufficient weight.

There are further issues as to the sampling from the frame. Data was taken from participants in a pre-existing study into diabetes that had itself specific criteria for recruitment. These are set out in the paper but intensify the questions of whether the sample is representative of the population of interest.

The study

The researchers chose three measures of obesity: waist circumference, waist-hip ratio and BMI. Each has been put forward, from time to time, as a measure of health risk.

There were 5,075 individual participants in the study, a sample of 5,075 observations. The researchers performed both a linear regression simpliciter and a logistic regression. For want of time and space I am only going to comment on the former. It is the origin of the headline 2 mm per 5 dB claim.

The researchers have quoted p-values but they haven’t committed the worst of sins as they have shown the size of the effects with confidence intervals. It’s not surprising that they found so many soi-disant significant effects given the sample size.
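A rough calculation shows how little it takes to reach "significance" with a sample this large. The sketch below is for a simple two-variable correlation, which is an assumption made for illustration and not the multivariable model in Pyko's paper. It uses 1.96 as a normal approximation to the critical t value.

```python
from math import sqrt


def smallest_significant_r(n, critical=1.96):
    """Smallest |r| reaching two-sided 5% significance with n pairs.

    Solves t = r * sqrt(n - 2) / sqrt(1 - r**2) = critical for r.
    """
    return critical / sqrt(n - 2 + critical ** 2)


n = 5075  # sample size reported for the study
r = smallest_significant_r(n)
print(f"n={n}: |r| >= {r:.4f}, i.e. R^2 >= {r * r:.5f}")
```

On these assumptions a correlation of under 0.03, accounting for less than a tenth of one per cent of the variation, would already be reported as significant. That is why the size of the effect matters more than the p-value.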

However, there was little assistance in judging how much of the observed variation in obesity was down to traffic noise. I would have liked to see a good old fashioned analysis of variance table. I could then at least have had a go at comparing variation from the measurement process, traffic noise and other effects. I could also have calculated myself an adjusted R².

Measurement Systems Analysis

Understanding variation from the measurement process is critical to any analysis. I have looked at the World Health Organisation’s definitive 2011 report on the effects of waist circumference on health. Such Measurement Systems Analysis as there is occurs at p7. They report a “technical error” (me neither) of 1.31 cm from intrameasurer error (I’m guessing repeatability) and 1.56 cm from intermeasurer error (I’m guessing reproducibility). They remark that “Even when the same protocol is used, there may be variability within and between measurers when more than one measurement is made.” They recommend further research but I have found none. There is no way of knowing from what is published by Pyko whether the reported effects are real or flow from confounding between traffic noise and intermeasurer variation.

When it comes to waist-hip ratio I presume that there are similar issues in measuring hip circumference. When the two dimensions are divided then the individual measurement uncertainties aggregate. More problems, not addressed.
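A first-order propagation of error gives a feel for the scale of the problem. This is a sketch under several assumptions. The WHO "technical error" is treated as a standard deviation, the waist figure of 1.56 cm is reused for hip (for which no figure is quoted), the two errors are taken as independent, and the 85 cm and 100 cm dimensions are illustrative.

```python
from math import sqrt


def ratio_uncertainty(waist, hip, sd_waist, sd_hip):
    """First-order standard deviation of waist / hip for independent errors."""
    ratio = waist / hip
    # Relative uncertainties of a quotient combine in quadrature.
    relative = sqrt((sd_waist / waist) ** 2 + (sd_hip / hip) ** 2)
    return ratio * relative


# 1.56 cm is the intermeasurer figure quoted above for waist;
# applying it to hip as well is an assumption, as are the dimensions.
sd_ratio = ratio_uncertainty(waist=85.0, hip=100.0, sd_waist=1.56, sd_hip=1.56)
print(f"waist-hip ratio 0.85, measurement sd of about {sd_ratio:.3f}")
```

On those assumptions the measurement process alone contributes a standard deviation of about 0.02 to a ratio of 0.85. Any reported effect on waist-hip ratio needs to be read against that.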

Noise data

The key predictor of obesity was supposed to be noise. The noise data used were not in situ measurements in the participants’ respective homes. The road traffic noise data were themselves predicted from a mathematical model using “terrain data, ground surface, building height, traffic data, including 24 h yearly average traffic flow, diurnal distribution and speed limits, as well as information on noise barriers”. The model output provided 5 dB contours. The authors then applied some further ad hoc treatments to the data.

The authors recognise that there is likely to be some error in the actual noise levels, not least from the granularity. However, they then seem to assume that this is simply an errors-in-variables situation. That would do no more than (conservatively) bias any observed effect towards zero. It does seem to me, though, that there is potential for much more structured systematic effects to be introduced here and I think this should have been explored further.
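The errors-in-variables point can be seen in a small simulation. This is only a sketch on synthetic data in arbitrary units, with a true slope of 1.0 and random, unstructured error added to the predictor. None of the numbers come from the study.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000
x_true = [rng.gauss(0, 1) for _ in range(n)]         # true predictor
y = [1.0 * x + rng.gauss(0, 1) for x in x_true]      # response, true slope 1.0
x_observed = [x + rng.gauss(0, 1) for x in x_true]   # predictor with random error

slope_true_x = ols_slope(x_true, y)
slope_noisy_x = ols_slope(x_observed, y)
print(f"slope using true predictor:  {slope_true_x:.2f}")
print(f"slope using noisy predictor: {slope_noisy_x:.2f}")
```

With error variance equal to the predictor's own variance, the estimated slope shrinks to roughly half its true value. That conservative behaviour holds only for purely random error. Structured error, such as 5 dB contours that line up with other features of a neighbourhood, need not bias towards zero.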

Model criticism

The authors state that they carried out a residuals analysis but they give no details and there are no charts, even in the supplementary material. I would like to have had a look myself as the residuals are actually the interesting bit. Residuals analysis is essential in establishing stability.

In fact, in the current study there is so much data that I would have expected the authors to have saved some of the data for cross-validation. That would have provided some powerful material for model criticism and validation.
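A hold-out check of the kind I have in mind can be sketched as follows. The data are synthetic and every name and number is a stand-in: a made-up noise level in dB, a made-up waist measurement in cm, and a simple straight-line fit rather than the authors' model.

```python
import math
import random

def fit_line(x, y):
    """Return (intercept, slope) of a least-squares straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(x, y, intercept, slope):
    """Root mean square prediction error of the fitted line on (x, y)."""
    return math.sqrt(sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)) / len(x))

rng = random.Random(0)
n = 1000
noise_db = [rng.uniform(40, 70) for _ in range(n)]               # synthetic predictor
waist_cm = [90 + 0.04 * d + rng.gauss(0, 10) for d in noise_db]  # synthetic response

# Hold back 20% of the observations before any fitting is done.
indices = list(range(n))
rng.shuffle(indices)
train, test = indices[:800], indices[800:]

intercept, slope = fit_line([noise_db[i] for i in train], [waist_cm[i] for i in train])
rmse_train = rmse([noise_db[i] for i in train], [waist_cm[i] for i in train], intercept, slope)
rmse_test = rmse([noise_db[i] for i in test], [waist_cm[i] for i in test], intercept, slope)
print(f"training RMSE {rmse_train:.1f} cm, held-out RMSE {rmse_test:.1f} cm")
```

If the held-out error is much worse than the training error, the model has been fitted to accidents of the sample. If the two agree, that is at least some evidence of predictive value.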

Given that this is an analytic study these are all very serious failings. With nine researchers on the job I would have expected some effort on these matters and some attention from whoever was the statistical referee.

Results

Separate results are presented for road, rail and air traffic noise. Again, for brevity I am looking at the headline 2 mm / 5 dB quoted for road traffic noise. Now, waist circumference is dependent on gross body size. Men are bigger than women and have larger waists. Similarly, the tall are larger-waisted than the short. Pyko’s regression does not condition on height (as a gross characterisation of body size).

BMI is a factor that attempts to allow for body size. Pyko found no significant influence on BMI from road traffic noise.

Waist-hip ratio is another parameter that attempts to allow for body size. It is often now cited as a better predictor of morbidity than BMI. That of course is irrelevant to the question of whether noise makes you fat. As far as I can tell from Pyko’s published results, a 5 dB increase in road traffic noise accounted for a 0.16 increase in waist-hip ratio. Now, let us look at this broadly. Consider a woman with waist circumference 85 cm, hip 100 cm, hence waist-hip ratio, 0.85. All pretty typical for the study. Predictively the study is suggesting that a 5 dB increase in road traffic noise might unremarkably take her waist-hip ratio up over 1.0. That seems barely consistent with the results from waist circumference alone, where there would only be millimetres of growth. It is incredible physically.
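The arithmetic behind that comparison is short enough to set out. The sketch assumes, for simplicity, that hip circumference stays fixed at 100 cm, so that the whole change in ratio has to come from the waist.

```python
waist_cm, hip_cm = 85.0, 100.0
ratio_before = waist_cm / hip_cm        # 0.85
ratio_after = ratio_before + 0.16       # reported change per 5 dB, as I read it

# Waist needed to produce the new ratio if the hip measurement is unchanged.
implied_waist_cm = ratio_after * hip_cm
implied_growth_mm = (implied_waist_cm - waist_cm) * 10
headline_growth_mm = 2.0                # headline waist effect per 5 dB

print(f"implied waist growth: {implied_growth_mm:.0f} mm")
print(f"headline waist growth: {headline_growth_mm:.0f} mm")
```

The ratio result implies about 160 mm on the waist where the headline result claims 2 mm, a discrepancy of nearly two orders of magnitude.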

I must certainly have misunderstood what the waist-hip result means but I could find no elucidation in Pyko’s paper.

Policy

Research such as this has to be aimed at advising future interventions to control traffic noise in urban environments. Broadly speaking, 5 dB is a level of noise change that is noticeable to human hearing but no more. All the same, achieving such a reduction in an urban environment is something that requires considerable economic resources. Yet, taking the research at its highest, it only delivers 2 mm on the waistline.

I had many criticisms other than those above and I do not, in any event, consider this study adequate for making any prediction about a future intervention. Nothing in it makes me feel the subject deserves further study. Or that I need to avoid noise to stay slim.

Deconstructing Deming XI B – Eliminate numerical goals for management

11. Part B. Eliminate numerical goals for management.

A supposed corollary to the elimination of numerical quotas for the workforce.

This topic seems to form a very large part of what passes for exploration and development of Deming’s ideas in the present day. It gets tied in to criticisms of remuneration practices and annual appraisal, and target-setting in general (management by objectives). It seems to me that interest flows principally from a community who have some passionately held emotional attitudes to these issues. Advocates are enthusiastic to advance the views of theorists like Alfie Kohn who deny, in terms, the effectiveness of traditional incentives. It is sad that those attitudes stifle analytical debate. I fear that the problem started with Deming himself.

Deming’s detailed arguments are set out in Out of the Crisis (at pp75-76). There are two principal reasoned objections.

  1. Managers will seek empty justification from the most convenient executive time series to hand.
  2. Surely, if we can improve now, we would have done so previously, so managers will fall back on (1).

The executive time series

I’ve used the time series below in some other blogs (here in 2013 and here in 2012). It represents the annual number of suicides on UK railways. This is just the data up to 2013.
[Figure: process behaviour chart of annual suicides on UK railways, data up to 2013]

The process behaviour chart shows a stable system of trouble. There is variation from year to year but no significant (sic) pattern. There is noise but no signal. There is an average of just over 200 fatalities, varying irregularly between around 175 and 250. Sadly, as I have discussed in earlier blogs, simply selecting a pair of observations enables a polemicist to advance any theory they choose.
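For readers who have not met one, the limits on a process behaviour chart (an individuals, or XmR, chart) are easy to compute. The sketch below uses invented annual counts in the same general range as the chart. They are not the RSSB figures and are there only to show the calculation.

```python
def xmr_limits(values):
    """Return (centre line, lower limit, upper limit) for an individuals chart."""
    centre = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_moving_range = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the usual XmR scaling factor (3 divided by 1.128).
    spread = 2.66 * mean_moving_range
    return centre, centre - spread, centre + spread

# Hypothetical annual counts, for illustration only.
annual_counts = [201, 188, 215, 197, 224, 209, 183, 230, 205, 194, 219, 211]

centre, lower, upper = xmr_limits(annual_counts)
signals = [v for v in annual_counts if v < lower or v > upper]
print(f"centre {centre:.1f}, natural process limits {lower:.1f} to {upper:.1f}")
print(f"points outside the limits: {signals}")
```

For this invented series every point falls inside the limits, which is what "noise but no signal" looks like numerically. A full chart would also apply run rules and plot the moving ranges, both omitted here.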

In Railway Suicides in the UK: risk factors and prevention strategies, Kamaldeep Bhui and Jason Chalangary of the Wolfson Institute of Preventive Medicine, and Edgar Jones of the Institute of Psychiatry, King’s College, London quoted the Rail Safety and Standards Board (RSSB) in the following two assertions.

  • Suicides rose from 192 in 2001-02 to a peak 233 in 2009-10; and
  • The total fell from 233 to 208 in 2010-11 because of actions taken.

Each of these points is what Don Wheeler calls an executive time series. Selective attention, or inattention, on just two numbers from a sequence of irregular variation can be used to justify any theory. Deming feared such behaviour could be perverted to justify satisfaction of any goal. Of course, the process behaviour chart, nowhere more strongly advocated than by Deming himself in Out of the Crisis, is the robust defence against such deceptions. Diligent criticism of historical data by means of process behaviour charts is exactly what is needed to improve the business and exactly what guards against success-oriented interpretations.
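How much freedom the polemicist has can be shown with a few lines of code. The series below is invented for illustration (it is not the RSSB data) and varies irregularly around 200. The sketch simply searches for the pair of years that tells the most alarming story and the pair that tells the most reassuring one.

```python
from itertools import combinations

# Hypothetical annual counts, for illustration only.
annual_counts = [201, 188, 215, 197, 224, 209, 183, 230, 205, 194, 219, 211]

def change(pair):
    """Change in count from the earlier year to the later year of the pair."""
    earlier, later = pair
    return annual_counts[later] - annual_counts[earlier]

# Every (earlier, later) pair of positions in the series.
pairs = list(combinations(range(len(annual_counts)), 2))

biggest_rise = max(pairs, key=change)
biggest_fall = min(pairs, key=change)
print(f"'Crisis':  rose by {change(biggest_rise)} between years {biggest_rise}")
print(f"'Success': fell by {-change(biggest_fall)} between years {biggest_fall}")
```

The same twelve numbers yield a rise of 47 or a fall of 41 depending on which two are quoted, which is why two-point comparisons prove nothing on their own.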

Wishful thinking, and the more subtle cognitive biases studied by Daniel Kahneman and others, will always assist us in finding support for our position somewhere in the data. Process behaviour charts keep us objective.

If not now, when?

If I am not for myself, then who will be for me?
And when I am for myself, then what am “I”?
And if not now, when?

Hillel the Elder

Deming criticises managerial targets on the grounds that, were the means of achieving the target known, it would already have been achieved and, further, that without having the means efforts are futile at best. It’s important to remember that Deming is not here, I think, talking about efforts to stabilise a business process. Deming is talking about working to improve an already stable, but incapable, process.

There are trite reasons why a target might legitimately be mandated where it has not been historically realised. External market conditions change. A manager might unremarkably be instructed to “Make 20% more of product X and 40% less of product Y”. That plays in to the broader picture of targets’ role in co-ordinating the parts of a system, internal to the organisation or more widely. It may be a straightforward matter to change the output of a well-understood, stable system by an adjustment of the inputs.

Deming says:

If you have a stable system, then there is no use to specify a goal. You will get whatever the system will deliver.

But it is the manager’s job to work on a stable system to improve its capability (Out of the Crisis at pp321-322). That requires capital and a plan. It involves a target because the target captures the consensus of the whole system as to what is required, how much to spend, what the new system looks like to its customer. Simply settling for the existing process, being managed through systematic productivity to do its best, is exactly what Deming criticises at his Point 1 (Constancy of purpose for improvement).

Numerical goals are essential

… a manager is an information channel of decidedly limited capacity.

Kenneth Arrow
Essays in the Theory of Risk-Bearing

Deming’s followers have, to some extent, conceded those criticisms. They say that it is only arbitrary targets that are deprecated and not the legitimate Voice of the Customer/ Voice of the Business. But I think they make a distinction without a difference through the weasel words “arbitrary” and “legitimate”. Deming himself was content to allow managerial targets relating to two categories of existential risk.

However, those two examples are not of any qualitatively different type from the “Increase sales by 10%” that he condemns. Certainly back when Deming was writing Out of the Crisis most OELs were based on LD50 studies, a methodology that I am sure Deming would have been the first to criticise.

Properly defined targets are essential to business survival as they are one of the principal means by which the integrated function of the whole system is communicated. If my factory is producing more than I can sell, I will not work on increasing capacity until somebody promises me that there is a plan to improve sales. And I need to know the target of the sales plan to know where to aim with plant capacity. It is no good just to say “Make as much as you can. Sell as much as you can.” That is to guarantee discoordination and inefficiency. It is unsurprising that Deming’s thinking has found so little real world implementation when he seeks to deprive managers of one of the principal tools of managing.

Targets are dangerous

I have previously blogged about what is needed to implement effective targets. An ill judged target can induce perverse incentives. These can be catastrophic for an organisation, particularly one where the rigorous criticism of historical data is absent.

Deconstructing Deming XI A – Eliminate numerical quotas for the workforce

11. Part A. Eliminate numerical quotas for the workforce.

I find this probably the most confused part of Deming’s thinking. On a careful reading of Out of the Crisis (at pp70-75), Deming’s attack is not on standardised work, which is advocated as central to his message, but on specifications for the volume of work: calls answered per hour, finished parts per day.

Deming recognises management’s need to predict costs and revenues but condemns quotas as destructive of achieving productivity.

Deming also deprecates such quotas as corroding workplace pride. I shall return to that in Point 12.

Deming’s criticism of work quotas goes as follows.

  • Some individuals may achieve them easily and their productive capacity will then stand idle.
  • Some individuals may struggle and suffer poor morale.
  • Some individuals may compromise quality so as to make a quota or so as to make it sooner.
  • Achievement of quotas may be frustrated by faults in “the system” which are outside the individual worker’s control.

Deming gives the following example of how he would advise financial planning in a call centre of 500 people (at pp73-74).

  1. Set a preliminary budget.
  2. Make it clear to every one of the 500 that their aim is to give satisfaction to the customer, to take pride in their work.
  3. Everybody will keep a record of calls made.
  4. Customers with special problems will be referred to the supervisor.
  5. At the end of each week, sample 100 individuals’ records and summarise the data.
  6. Repeat steps 2 to 5 for several weeks.
  7. Analyse the data.
  8. Establish a continuing study following the above steps but on a reducing basis.
  9. Use the data to predict costs.
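The sampling and cost prediction steps above can be sketched in code. Everything numerical here is hypothetical: the simulated call records, the six-week study period and the weekly wage bill are all invented, and the summary is a bare average where Deming would no doubt want charts.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is repeatable
STAFF = 500               # call centre size, from Deming's example
SAMPLE_SIZE = 100         # weekly sample size, from Deming's example
WEEKS = 6                 # assumed reading of "several weeks"

def weekly_records(rng):
    """Simulated calls handled by each operator in one week (invented data)."""
    return {operator: rng.randint(150, 250) for operator in range(STAFF)}

weekly_means = []
for week in range(WEEKS):
    records = weekly_records(rng)
    # Step 5: sample 100 individuals' records and summarise them.
    sampled = rng.sample(sorted(records), SAMPLE_SIZE)
    weekly_means.append(statistics.mean(records[op] for op in sampled))

# Step 9: use the accumulated data to predict costs.
mean_calls_per_operator = statistics.mean(weekly_means)
weekly_wage_bill = 250_000.0   # hypothetical figure
cost_per_call = weekly_wage_bill / (mean_calls_per_operator * STAFF)
print(f"estimated cost per call: {cost_per_call:.2f}")
```

Note what the sketch does not contain: any statement of how many calls the business needs answered, or at what cost it remains viable.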

Now there is much merit in forecasting costs based on actual data. Further, improving performance based on the relentless criticism of historical data is essential. However, I think Deming’s prescription naïve and idealistic. The trick is to extract the ideals and industrialise them.

Planning

The simple matter is that any new enterprise has to be established on the basis of a robust business plan. There is competition for resources: people, capital, infrastructure … and everyone has to make their case. It is impossible to do that without judgment. No matter how much historical data or even qualitative experience is to hand we cannot simply project it into the future without establishing further conditions (RearView). It is unlikely this can ever be done exactly in a new establishment.

That competition for resources then prevents us from taking an overly conservative view of what can be achieved. Setting the bar too low for call centre operators starts off from an uncompetitive position. Further, the modest answering rate in the plan has to be resourced with infrastructure. Intentions to improve the answering rate post-launch are all very well but what will happen to the personnel and materiel that we bought in to accommodate the unambitious start-up?

Sometimes work needs to be set at a rate that is recognised by a team of co-workers and other parts of the organisation. Excess production is as contrary to the philosophy of lean operations as is shortage. The idea of takt time allows production lines to be balanced, receipts and deliveries co-ordinated, stock turns to be minimised and cash flows improved. In many situations that is sufficient to answer Deming’s fears about individuals distorting production to bank an accomplished target.
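Takt time itself is a simple calculation: available working time divided by customer demand over the same period. The shift length and demand below are hypothetical.

```python
def takt_time_seconds(available_minutes, units_demanded):
    """Seconds available per unit if output is to match demand exactly."""
    return available_minutes * 60 / units_demanded

# Hypothetical shift: 450 working minutes, customer demand of 300 units.
takt = takt_time_seconds(450, 300)
print(f"takt time: {takt:.0f} seconds per unit")
```

A line that works faster than this builds stock and one that works slower builds a backlog, so the number is a target set by the customer rather than by managerial whim.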

Stretch

What is now proved was once but imagined.

William Blake

Is it so wrong to set a target that nobody involved has seen achieved before? Deming would say that it was fine so long as there was a plan defining the means by which this could be achieved. There are many compelling stories from sports science telling how records have been broken by incremental improvement (e.g. Dave Brailsford and the GB cycling team).

But what about setting an ambitious stretch target without a plan for achieving it? That would be brave indeed. It would be based on no more than an exhortation to the call centre operators to work more furiously, more furiously than anyone had ever done before. I cannot say that would never work. In my athletics days I ran some of my best times when team mates were urging me on from the sidelines. However, as a business strategy it faces the social realities of employees’ collective ability to resist quietly that to which they do not assent. With a carefully recruited and motivated team it could work. It would certainly require a high degree of collective problem solving and improvement by the operators. But of all strategies for operational excellence it looks the most limited and the most risky. There is no obvious Plan B.

The Ringelmann effect

There is a tension between unrealistic stretch targets and a further problem that Deming ignores entirely, the Ringelmann effect. It may sadden the hearts of those who believe in the inherent fulfilling joy of work and best intentions of workers to do a good job but evidence is overwhelming that there are situations where individuals exert less effort in a group environment than they would if acting individually.

In 1913, Max Ringelmann conducted experiments that showed that individuals pulled less strenuously on a rope when pulling in a group than when pulling alone.

A realistically set and communicated takt time can assist in concentrating effort and communicating common work standards and the expectations of peers.

The poor supervisor

If Deming was so pessimistic as to believe that workers would sacrifice quality to hit targets then they would surely be more than happy to shunt enquiries off to their supervisor in order to post commendable performance. All that Deming’s proposal does is to divert the whole problem of difficult calls to the supervisor who, presumably, is either beset with his own performance problems or operates outside business measurement.

Deconstructing Deming X – Eliminate slogans!

10. Eliminate slogans, exhortations and targets for the workforce.


Neither snow nor rain nor heat nor gloom of night stays these couriers from the swift completion of their appointed rounds.

Inscription on the James Farley Post Office, New York City, New York, USA
William Mitchell Kendall pace Herodotus

Now, that’s what I call a slogan. Is this what Point 10 of Deming’s 14 Points was condemning? There are three heads here, all making quite distinct criticisms of modern management. The important dimension of this criticism is the way in which managers use data in communicating with the wider organisation, in setting imperatives and priorities and in determining what individual workers will consider important when they are free from immediate supervision.

Eliminate slogans!

The US postal inscription at the head of this blog certainly falls within the category of slogans. Apparently the root of the word “slogan” is the Scottish Gaelic sluagh-ghairm meaning a battle cry. It seeks to articulate a solidarity and commitment to purpose that transcends individual doubts or rationalisation. That is what the US postal inscription seeks to do. Beyond the data on customer satisfaction, the demands of the business to protect and promote its reputation, the service levels in place for individual value streams, the tension between current performance and aspiration, the disappointment of missed objectives, it seeks to draw together the whole of the organisation around an ideal.

Slogans are part of the broader oral culture of an organisation. In the words of Lawrence Freedman (Strategy: A History, Oxford, 2013, p564) stories, and I think by extension slogans:

[make] it possible to avoid abstractions, reduce complexity, and make vital points indirectly, stressing the importance of being alert to serendipitous opportunities, discontented staff, or the one small point that might ruin an otherwise brilliant campaign.

But Freedman was quick to point out that the use of stories by consultants and in organisations frequently confused anecdote with data. They were commonly used selectively and often contrived. Freedman sought to extract some residual value from the culture of business stories, in particular drawing on the work of psychologist Jerome Bruner along with Daniel Kahneman’s System 1 and System 2 thinking. The purpose of the narrative of an organisation, including its slogans and shared stories, is not to predict events but to define a context for action when reality is inevitably overtaken by a special cause.

In building such a rich narrative, slogans alone are an inert and lifeless tactic unless woven with the continual, rigorous criticism of historical data. In fact, it is the process behaviour chart that acts as the armature around which the narrative can be wound. Building the narrative will be critical to how individuals respond to the messages of the chart.

Deming himself coined plenty of slogans: “Drive out fear”, “Create joy in work”, … . They are not forbidden. But to be effective they must form a verisimilar commentary on, and motivation for, the hard numbers and ineluctable signals of the process behaviour chart.

Eliminate exhortations!

I had thought I would dismiss this in a single clause. It is, though, a little more complicated. The sports team captain who urges her teammates onwards to take the last gasp scoring opportunity doesn’t necessarily urge in vain. There is no analysis of this scenario. It is only muscle, nerve, sweat and emotion.

The English team just suffered a humiliating exit from the Cricket World Cup. The head coach’s response was “We’ll have to look at the data.” Andrew Miller in The Times (London) (10 March 2015) reflected most cricket fans’ view when he observed that “a team of meticulously prepared cricketers suffered a collective loss of nerve and confidence.” Exhortations might not have gone amiss.

It is not, though, a management strategy. If your principal means of managing risk, achieving compelling objectives, creating value and consistently delivering customer excellence, day in, day out is to yell “one more heave!” then you had better not lose your voice. In the long run, I am on the side of the analysts.

Slogans and exhortations will prove a brittle veneer on a stable system of trouble (RearView). It is there that they will inevitably corrode engagement, breed cynicism, foster distrust, and mask decline. Only the process behaviour chart can guard against the risk.

Eliminate targets for the workforce!

This one is more complicated. How do I communicate to the rest of the organisation what I need from them? What are the consequences when they don’t deliver? How do the rest of the organisation communicate with me? This really breaks down into two separate topics and they happen to be the two halves of Deming’s Point 11.

I shall return to those in my next two posts in the Deconstructing Deming series.