First thoughts on VW’s emissions debacle

It is far too soon to tell exactly what went on at VW, in the wider motor industry, within the respective regulators and within governments. However, the way that the news has come out, and the financial and operational impact that it is likely to have, are enough to encourage all enterprises to revisit their risk management, governance and customer reputation management policies. Corporate scandals are not a new phenomenon: from the collapse of the Medici Bank in 1494 and Warren Hastings’ alleged despotism in the British East India Company, down to the FIFA corruption allegations that broke earlier this year. Organisational scandals are as old as organisations. The bigger the organisations get, the bigger the scandals are going to be.

Normal Scandals

In 1984, Charles Perrow published his pessimistic analysis of what he saw as the inevitability of Normal Accidents in complex technologies. I am sure that there is a market for a book entitled Normal Scandals: Living with High-Risk Organisational Structures. But I don’t share Perrow’s pessimism. Life is getting safer. Let’s adopt the spirit of continual improvement to make investment safer too. That’s investment for those of us trying to accumulate a modest portfolio for retirement. Those who aspire to join the super rich will still have to take their chances.

I fully understand that organisations sometimes have to take existential risks to stay in business. The development of Rolls-Royce’s RB211 aero-engine well illustrates what happens when a manufacturer finds itself with proven technologies that are inadequately aligned with the Voice of the Customer. The market will not wait while the business catches up. There is time to develop a response but only if that solution works first time. In the case of Rolls-Royce it didn’t and insolvency followed. However, there was no alternative but to try.

What happened at VW? I just wonder whether the Iron Law of Oligarchy was at work. To imagine that a supervisory board sits around discussing the details of engine management software is naïve. In fact, it was just such a signal failure of management to delegate that the RB211 crisis laid bare. Do VW’s woes flow from a decision taken by a middle manager, or a blind eye turned, that escaped an inadequate system of governance? Perhaps a short-term patch in anticipation of an ultimate solution?

Cardinal Newman’s contribution to governance theory

John Henry Newman learned about risk management the hard way. Newman was an English Anglican divine who converted to the Catholic Church in 1845. In 1850 Newman became involved in the controversy surrounding Giacinto Achilli, a priest expelled from the Catholic Church for rape and sexual assault but who was making a name for himself in England as a champion of the Protestant evangelical cause. Conflict between Catholic and Protestant was a significant feature of the nineteenth-century English political landscape. Newman was minded to ensure that Achilli’s background was widely known. He took legal advice from counsel James Hope-Scott about the risks of a libel action from Achilli. Hope-Scott was reassuring and Newman published. The publication resulted in Newman’s prosecution and conviction for criminal libel.

Speculation about what legal advice VW have received as to their emissions strategy would be inappropriate. However, I trust that, if they imagined they were externalising any risk thereby, they checked the value of their legal advisors’ professional indemnity insurance.

Newman certainly seems to have learned his lesson and subsequently had much to teach the modern world about risk management and governance. After the Achilli trial Newman started work on his philosophical apologia, The Grammar of Assent. One argument in that book has had such an impact on modern thinking about evidence and probability that it was quoted in full by Bruno de Finetti in Volume 1 of his 1974 Theory of Probability.

Suppose a thesis (e.g. the guilt of an accused man) is supported by a great deal of circumstantial evidence of different forms, but in agreement with each other; then even if each piece of evidence is in itself insufficient to produce any strong belief, the thesis is decisively strengthened by their joint effect.

De Finetti set out the detailed mathematics and called this the Cardinal Newman principle. It is fundamental to the modern concept of borrowing strength.
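By way of a toy illustration, and on the simplifying assumption that the pieces of evidence are independent and can each be summarised by a modest likelihood ratio (all of the numbers below are invented for the sketch), the arithmetic of the Cardinal Newman principle runs like this:

```python
# Toy illustration of the Cardinal Newman principle: several independent
# pieces of evidence, each too weak on its own to produce strong belief,
# combine into decisive support. All numbers are invented.

prior_odds = 1.0                                  # start at evens on the thesis
likelihood_ratios = [2.0, 1.5, 3.0, 2.5, 1.8]     # each item only mildly favours the thesis

posterior_odds = prior_odds
for lr in likelihood_ratios:
    posterior_odds *= lr                          # independent evidence multiplies the odds

posterior_probability = posterior_odds / (1.0 + posterior_odds)
print(round(posterior_odds, 1))                   # 40.5 -- jointly decisive
print(round(posterior_probability, 3))            # 0.976, though no single item gets beyond 0.75
```

No single item moves the odds beyond three to one, yet their joint effect is decisive. That is Newman’s point, and it is the sense in which diverse, individually weak data sources can borrow strength from one another.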

The standard means of defeating governance are all well known to oligarchs: regulatory capture; “stake-driving” – taking actions outside the oversight of governance that cannot be undone without engaging the regulator in controversy; “whipsawing” – promising A that approval will be forthcoming from B while telling B that A has relied upon her anticipated, and surely “uncontroversial”, approval. There are plenty of others. Robert Caro’s biography The Power Broker: Robert Moses and the Fall of New York sets out the locus classicus.

Governance functions need to exploit the borrowing strength of diverse data sources to identify misreporting and misconduct. And continually improve how they do that. The answer is trenchant and candid criticism of historical data. That’s the only data you have. A rigorous system of goal deployment and mature use of process behaviour charts delivers a potent stimulus to reluctant data sharers.

Things and actions are what they are and the consequences of them will be what they will be: why then should we desire to be deceived?

Bishop Joseph Butler

 

Amazon II: The sales story

I recently commented on an item in the New York Times about Amazon’s pursuit of “rigorous data driven management”. Dina Vaccari, one of the employees cited in the original New York Times article, has taken the opportunity to tell her own story in this piece. I found it enlightening as to what goes on at Amazon. Of course, it is only another anecdote from a former employee, a data source of notoriously limited quality. However, as Arthur Koestler once observed:

Without the hard little bits of marble which are called ‘facts’ or ‘data’ one cannot compose a mosaic; what matters, however, are not so much the individual bits, but the successive patterns into which you arrange them, then break them up and rearrange them.

Vaccari’s role was to sell Amazon gift cards. The measure of her success was how many she sold. Vaccari had read Timothy Ferriss’ transgressive little book The 4-Hour Workweek. She decided to employ a subcontractor from Chennai, India to generate for her 100 leads daily for $10. The idea worked out well. Another illustration of the law of comparative advantage.

Vaccari then emailed the leads, not with the standard email that she had been instructed to use by Amazon, but with a formula of her own. Vaccari claims a 10 to 50% response rate. She then followed up using her traditional sales skills, exceeding her sales target and besting the rest of the sales team.

That drew attention from her supervisor. Not unnaturally, he wanted to capture good practice. When he saw Vaccari’s non-standard email he was critical. We now know that process discipline is important at Amazon. Nothing wrong with that, though if you really want to exercise your mind on the topic you would do well to watch the Hollywood movie Crimson Tide.

What is more interesting is that, when Vaccari answered the criticism by pointing to her response and sales figures, the supervisor retorted that this was “just luck”.

So there we have it. Somebody made a change and the organisation couldn’t agree whether or not it was an improvement. Vaccari said she saw a signal. Her supervisor said that it was just noise.

The supervisor’s response was particularly odd as he was shadowing Vaccari because of his favourable perception of her performance. It is as though his assessment as to whether Vaccari’s results were signal or noise depended on his approval or disapproval of how she had achieved them. It certainly seems that this is not normative behaviour at Amazon. Vaccari criticises her supervisor for failing to display Amazon Leadership Principles. The exchange illustrates what happens if an organisation generates data but is then unable to turn it into a reliable basis for action because there is no systematic and transparent method for creating a consensus around what is signal and what, noise. Vaccari’s exchange with her supervisor is reassuring in that both recognised that there is an important distinction. Vaccari knew that a signal should be a tocsin for action, in this case to embed a successful innovation through company-wide standardisation. Her supervisor knew that to mistake noise for a signal would lead to degraded process performance. Or at least he hid behind that to project his disapproval. Vaccari’s recall of the incident makes her “cringe”. Numbers aren’t just about numbers.

Trenchant data criticism, motivated by the rigorous segregation of signal and noise, is the catalyst of continual improvement in sales, product quality, economic efficiency and market agility.

The goal is not to be driven by data but to be led by the insights it yields.

Productivity and how to improve it: I -The foundational narrative

Again, much talk in the UK media recently about weak productivity statistics. Chancellor of the Exchequer (Finance Minister) George Osborne has launched a 15-point macroeconomic strategy aimed at improving national productivity. Some of the points are aimed at incentivising investment and training. There will be few who argue against that, though I shall come back to the investment issue when I come to talk about signal and noise. I have already discussed training here. In any event, the strategy is fine as far as these things go. Which is not very far.

There remains the microeconomic task for all of us of actually improving our own productivity and that of the systems we manage. That is not the job of government.

Neither can I offer any generalised system for improving productivity. It will always be industry and organisation dependent. However, I wanted to write about some of the things that you have to understand if your efforts to improve output are going to be successful and sustainable.

  • Customer value and waste.
  • The difference between signal and noise.
  • How to recognise flow and manage a constraint.

Before going on to those in future weeks I first wanted to go back and look at what has become the foundational narrative of productivity improvement, the Hawthorne experiments. They still offer some surprising insights.

The Hawthorne experiments

In 1923, the US electrical engineering industry was looking to increase the adoption of electric lighting in American factories. Uptake had been disappointing despite the claims being made for increased productivity.

[Tests in nine companies have shown that] raising the average initial illumination from about 2.3 to 11.2 foot-candles resulted in an increase in production of more than 15%, at an additional cost of only 1.9% of the payroll.

Earl A Anderson
General Electric
Electrical World (1923)

E P Hyde, director of research at GE’s National Lamp Works, lobbied government for the establishment of a Committee on Industrial Lighting (“the CIL”) to co-ordinate marketing-oriented research. Western Electric volunteered to host tests at their Hawthorne Works in Cicero, IL.

Western Electric came up with a study design that comprised a team of experienced workers assembling relays, winding their coils and inspecting them. Tests commenced in November 1924 with active support from an elite group of academic and industrial engineers including the young Vannevar Bush, who would himself go on to an eminent career in government and science policy. Thomas Edison became honorary chairman of the CIL.

It’s a tantalising historical fact that Walter Shewhart was employed at the Hawthorne Works at the time but I have never seen anything suggesting his involvement in the experiments, nor that of his mentor George D Edwards, nor protégé Joseph Juran. In later life, Juran was dismissive of the personal impact that Shewhart had had on operations there.

However, initial results showed no influence of light level on productivity at all. Productivity rose throughout the test but was wholly uncorrelated with lighting level. Theories about the impact of human factors such as supervision and motivation started to proliferate.

A further schedule of tests was programmed starting in September 1926. Now, the lighting level was to be reduced to near darkness so that the threshold of effective work could be identified. Here is the summary data (from Richard Gillespie Manufacturing Knowledge: A History of the Hawthorne Experiments, Cambridge, 1991).

[Figure: summary data from the Hawthorne illumination experiments (Gillespie, 1991)]

It requires no sophisticated statistical analysis to see that the data is all noise and no signal. Much to the disappointment of the CIL, and the industry, there was no evidence that illumination made any difference at all, even down to conditions of near darkness. It’s striking that the highest lighting levels embraced the full range of variation in productivity from the lowest to the highest. What had seemed so self-evidently a boon to productivity was purely incidental. It is never safe to assume that a change will be an improvement. As W Edwards Deming insisted, “In God we trust. All others bring data.”

But the data still seemed to show a relentless improvement of productivity over time. The participants were all very experienced in the task at the start of the study so there should have been no learning by doing. There seemed no other explanation than that the participants were somehow subliminally motivated by the experimental setting. Or something.

[Figure: Hawthorne experiments – productivity over time]

That subliminally motivated increase in productivity came to be known as the Hawthorne effect. Attempts to explain it led to the development of whole fields of investigation and organisational theory, by Elton Mayo and others. It really was the foundation of the management consulting industry. Gillespie (supra) gives a rich and intriguing account.

A revisionist narrative

Because of the “failure” of the experiments’ purpose there was a falling off of interest and only the above summary results were ever published. The raw data were believed destroyed. Now “you know, at least you ought to know, for I have often told you so” about Shewhart’s two rules for data presentation.

  1. Data should always be presented in such a way as to preserve the evidence in the data for all the predictions that might be made from the data.
  2. Whenever an average, range or histogram is used to summarise observations, the summary must not mislead the user into taking any action that the user would not take if the data were presented in context.

The lack of any systematic investigation of the raw data led to the development of a myth within the discipline that every single experimental adjustment had led forthwith to an increase in productivity.

In 2009, Steven Levitt, best known to the public as the co-author of Freakonomics, along with John List and their research team, miraculously discovered a microfiche of the raw study data at a “small library in Milwaukee, WI” and the remainder in Boston, MA. They went on to analyse the data from scratch (Was there Really a Hawthorne Effect at the Hawthorne Plant? An Analysis of the Original Illumination Experiments, National Bureau of Economic Research, Working Paper 15016, 2009).

[Figure 3 from Levitt and List (2009): raw productivity measurements for each experiment]

Figure 3 of Levitt and List’s paper (reproduced above) shows the raw productivity measurements for each of the experiments. Levitt and List show how a simple plot such as this reveals important insights into how the experiments developed. It is a plot that yields a lot of information.

Levitt and List note that, in the first phase of experiments, productivity rose then fell when experiments were suspended. They speculate as to whether there was a seasonal effect with lower summer productivity.

The second period of experiments is that between the third and fourth vertical lines in the figure. Only room 1 experienced experimental variation in this period yet Levitt and List contend that productivity increased in all three rooms, falling again at the end of experimentation.

During the final period, data was only collected from room 1 where productivity continued to rise, even beyond the end of the experiment. Looking at the data overall, Levitt and List find some evidence that productivity responded more to changes in artificial light than to natural light. The evidence that increases in productivity were associated with every single experimental adjustment is weak. To this day, there is no compelling explanation of the increases in productivity.

Lessons in productivity improvement

Deming used to talk of “disappointment in great ideas”, the propensity for things that looked so good on paper simply to fail to deliver the anticipated benefits. Nobel laureate psychologist Daniel Kahneman warns against our individual bounded rationality.

To guard against entrapment by the vanity of imagination we need measurement and data to answer the ineluctable question of whether the change we implemented so passionately resulted in improvement. To be able to answer that question demands the separation of signal from noise. That requires trenchant data criticism.

And even then, some factors may yet be beyond our current knowledge. Bounded rationality again. That is why the trick of continual improvement in productivity is to use the rigorous criticism of historical data to build collective knowledge incrementally.

If you torture the data enough, nature will always confess.

Ronald Coase

Eventually.

FIFA and the Iron Law of Oligarchy

In 1911, Robert Michels embarked on one of the earliest investigations into organisational culture. Michels was a pioneering sociologist, a student of Max Weber. In his book Political Parties he aggregated evidence about a range of trade unions and political groups, in particular the German Social Democratic Party.

He concluded that, as organisations become larger and more complex, a bureaucracy inevitably forms to take, co-ordinate and optimise decisions. It is the most straightforward way of creating alignment in decision making and unified direction of purpose and policy. Decision taking power ends up in the hands of a few bureaucrats and they increasingly use such power to further their own interests, isolating themselves from the rest of the organisation to protect their privilege. Michels called this the Iron Law of Oligarchy.

These are very difficult matters to capture quantitatively and Michels’ limited evidential sampling frame has more of the feel of anecdote than data. “Iron Law” surely takes the matter too far. However, when we look at the allegations concerning misconduct within FIFA it is tempting to feel that Michels’ theory is validated, or at least has gathered another anecdote to take the evidence base closer to data.

But beyond that, what Michels surely identifies is a danger that a bureaucracy, a management cadre, can successfully isolate itself from superior and inferior strata in an organisation, limiting the mobility of business data and fostering their own ease. The legitimate objectives of the organisation suffer.

Michels failed to identify a realistic solution, being seduced by the easy, but misguided, certainties of fascism. However, I think that a rigorous approach to the use of data can guard against some abuses without compromising human rights.

Oligarchs love traffic lights

I remember hearing the story of a CEO newly installed in a mature organisation. His direct reports had instituted a “traffic light” system to report status to the weekly management meeting. A green light meant all was well. An amber light meant that some intervention was needed. A red light signalled that threats to the company’s goals had emerged. At his first meeting, the CEO found that nearly all “lights” were green, with a few amber. The new CEO perceived an opportunity to assert his authority and show his analytical skills. He insisted that could not be so. There must be more problems and he demanded that the next meeting be an opportunity for honesty and confronting reality.

At the next meeting there was a kaleidoscope of red, amber and green “lights”. Of course, it turned out that the managers had flagged as red the things that were either actually fine or could be remedied quickly. They could then report green at the following meeting. Real career-limiting problems were hidden behind green lights. The direct reports certainly didn’t want those exposed.

Openness and accountability

I’ve quoted Nobel laureate economist Kenneth Arrow before.

… a manager is an information channel of decidedly limited capacity.

Essays in the Theory of Risk-Bearing

Perhaps the fundamental problem of organisational design is how to enable communication of information so that:

  • Individual managers are not overloaded.
  • Confidence in the reliable satisfaction of process and organisational goals is shared.
  • Systemic shortfalls in process capability are transparent to the managers responsible, and their managers.
  • Leading indicators yield early warnings of threats to the system.
  • Agile responses to market opportunities are catalysed.
  • Governance functions can exploit the borrowing strength of diverse data sources to identify misreporting and misconduct.

All that requires using analytics to distinguish between signal and noise. Traffic lights offer a lousy system of intra-organisational analytics. Traffic light systems leave it up to the individual manager to decide what is “signal” and what “noise”. Nobel laureate psychologist Daniel Kahneman has studied how easily managers are confused and misled in subjective attempts to separate signal and noise. It is dangerous to think that “what you see is all there is”. Traffic lights offer a motley cloak to an oligarch wishing to shield his sphere of responsibility from scrutiny.

The answer is trenchant and candid criticism of historical data. That’s the only data you have. A rigorous system of goal deployment and mature use of process behaviour charts delivers a potent stimulus to reluctant data sharers. Process behaviour charts capture the development of process performance over time, for better or for worse. They challenge the current reality of performance through the Voice of the Customer. They capture a shared heuristic for characterising variation as signal or noise.
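As a minimal sketch of that shared heuristic, assuming an individuals (XmR) chart with the conventional 2.66 constant (the function names and the monthly figures below are invented for illustration):

```python
import statistics

def xmr_limits(values):
    """Centre line and natural process limits for an individuals (XmR) chart.

    Limits are the mean plus or minus 2.66 times the average moving range,
    the usual constant for an individuals chart.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = statistics.mean(values)
    mr_bar = statistics.mean(moving_ranges)
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar

def signals(values):
    """Points outside the natural process limits: the shared, inspectable
    heuristic for calling a result signal rather than noise.
    (Other detection rules, such as runs rules, are omitted here.)"""
    lower, _, upper = xmr_limits(values)
    return [(i, x) for i, x in enumerate(values) if x < lower or x > upper]

# Invented monthly figures reported by one business unit.
monthly = [48, 52, 47, 51, 49, 53, 50, 46, 54, 70]
print(xmr_limits(monthly))   # approximately (37.2, 52.0, 66.8)
print(signals(monthly))      # [(9, 70)] -- the final month is a signal worth investigating
```

The value of such a calculation lies less in the arithmetic than in its being shared and inspectable: anyone in the organisation applying the same rules to the same data reaches the same characterisation of signal and noise.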

Individual managers may well prefer to interpret the chart with various competing narratives. The message of the data, the Voice of the Process, will not always be unambiguous. But collaborative sharing of data compels an organisation to address its structural and people issues. Shared data generation and investigation encourage an organisation to find practical ways of fostering team work, enabling problem solving and motivating participation. It is the data that can support the organic emergence of a shared organisational narrative that adds further value to the data and how it is used and developed. None of these organisational and people matters have generalised solutions but a proper focus on data drives an organisation to find practical strategies that work within their own context. And to test the effectiveness of those strategies.

Every week the press discloses allegations of hidden or fabricated assets, repudiated valuations, fraud, misfeasance, regulators blindsided, creative reporting, anti-competitive behaviour, abused human rights and freedoms.

Where a proper system of intra-organisational analytics is absent, you constantly have to ask yourself whether you have another FIFA on your hands. The FIFA allegations may be true or false but that they can be made surely betrays an absence of effective governance.

#oligarchslovetrafficlights

Deconstructing Deming XI B – Eliminate numerical goals for management

11. Part B. Eliminate numerical goals for management.

A supposed corollary to the elimination of numerical quotas for the workforce.

This topic seems to form a very large part of what passes for exploration and development of Deming’s ideas in the present day. It gets tied in to criticisms of remuneration practices and annual appraisal, and target-setting in general (management by objectives). It seems to me that interest flows principally from a community who have some passionately held emotional attitudes to these issues. Advocates are enthusiastic to advance the views of theorists like Alfie Kohn who deny, in terms, the effectiveness of traditional incentives. It is sad that those attitudes stifle analytical debate. I fear that the problem started with Deming himself.

Deming’s detailed arguments are set out in Out of the Crisis (at pp75-76). There are two principal reasoned objections.

  1. Managers will seek empty justification from the most convenient executive time series to hand.
  2. Surely, if we could improve now, we would have done so previously, so managers will fall back on (1).

The executive time series

I’ve used the time series below in some other blogs (here in 2013 and here in 2012). It represents the annual number of suicides on UK railways. This is just the data up to 2013.
[Chart: process behaviour chart of annual suicides on UK railways, data to 2013]

The process behaviour chart shows a stable system of trouble. There is variation from year to year but no significant (sic) pattern. There is noise but no signal. There is an average of just over 200 fatalities, varying irregularly between around 175 and 250. Sadly, as I have discussed in earlier blogs, simply selecting a pair of observations enables a polemicist to advance any theory they choose.

In Railway Suicides in the UK: risk factors and prevention strategies, Kamaldeep Bhui and Jason Chalangary of the Wolfson Institute of Preventive Medicine, and Edgar Jones of the Institute of Psychiatry, King’s College London, quoted the Rail Safety and Standards Board (RSSB) in the following two assertions.

  • Suicides rose from 192 in 2001-02 to a peak of 233 in 2009-10; and
  • The total fell from 233 to 208 in 2010-11 because of actions taken.

Each of these points is what Don Wheeler calls an executive time series. Selective attention, or inattention, on just two numbers from a sequence of irregular variation can be used to justify any theory. Deming feared such behaviour could be perverted to justify satisfaction of any goal. Of course, the process behaviour chart, nowhere more strongly advocated than by Deming himself in Out of the Crisis, is the robust defence against such deceptions. Diligent criticism of historical data by means of process behaviour charts is exactly what is needed to improve the business and exactly what guards against success-oriented interpretations.
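To make the deception concrete, here is a small sketch that checks the quoted RSSB figures against the approximate natural process limits described for the chart above (roughly 175 to 250); the limits are taken from that description rather than recomputed, so this is illustrative only.

```python
# Limits below are the approximate values described for the chart above
# (centre just over 200, limits roughly 175 to 250); they are not recomputed
# from the RSSB data, so treat this purely as an illustration of the logic.
lower_limit, upper_limit = 175, 250

quoted_figures = {"2001-02": 192, "2009-10": 233, "2010-11": 208}

for year, fatalities in quoted_figures.items():
    within = lower_limit <= fatalities <= upper_limit
    verdict = "noise (within natural process limits)" if within else "signal"
    print(f"{year}: {fatalities} -> {verdict}")

# All three quoted figures sit inside the limits, so neither the "rise to a
# peak" nor the "fall because of actions taken" is evidence of anything
# beyond routine year-to-year variation.
```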

Wishful thinking, and the more subtle cognitive biases studied by Daniel Kahneman and others, will always assist us in finding support for our position somewhere in the data. Process behaviour charts keep us objective.

If not now, when?

If I am not for myself, then who will be for me?
And when I am for myself, then what am “I”?
And if not now, when?

Hillel the Elder

Deming criticises managerial targets on the grounds that, were the means of achieving the target known, it would already have been achieved and, further, that without having the means efforts are futile at best. It’s important to remember that Deming is not here, I think, talking about efforts to stabilise a business process. Deming is talking about working to improve an already stable, but incapable, process.

There are trite reasons why a target might legitimately be mandated where it has not been historically realised. External market conditions change. A manager might unremarkably be instructed to “Make 20% more of product X and 40% less of product Y”. That plays into the broader picture of targets’ role in co-ordinating the parts of a system, internal to the organisation or more widely. It may be a straightforward matter to change the output of a well-understood, stable system by an adjustment of the inputs.

Deming says:

If you have a stable system, then there is no use to specify a goal. You will get whatever the system will deliver.

But it is the manager’s job to work on a stable system to improve its capability (Out of the Crisis at pp321-322). That requires capital and a plan. It involves a target because the target captures the consensus of the whole system as to what is required, how much to spend, what the new system looks like to its customer. Simply settling for the existing process, being managed through systematic productivity to do its best, is exactly what Deming criticises at his Point 1 (Constancy of purpose for improvement).

Numerical goals are essential

… a manager is an information channel of decidedly limited capacity.

Kenneth Arrow
Essays in the Theory of Risk-Bearing

Deming’s followers have, to some extent, conceded those criticisms. They say that it is only arbitrary targets that are deprecated and not the legitimate Voice of the Customer/ Voice of the Business. But I think they make a distinction without a difference through the weasel words “arbitrary” and “legitimate”. Deming himself was content to allow managerial targets relating to two categories of existential risk.

However, those two examples are not of any qualitatively different type from the “Increase sales by 10%” that he condemns. Certainly back when Deming was writing Out of the Crisis most occupational exposure limits (OELs) were based on LD50 studies, a methodology that I am sure Deming would have been the first to criticise.

Properly defined targets are essential to business survival as they are one of the principal means by which the integrated function of the whole system is communicated. If my factory is producing more than I can sell, I will not work on increasing capacity until somebody promises me that there is a plan to improve sales. And I need to know the target of the sales plan to know where to aim with plant capacity. It is no good just to say “Make as much as you can. Sell as much as you can.” That is to guarantee discoordination and inefficiency. It is unsurprising that Deming’s thinking has found so little real-world implementation when he seeks to deprive managers of one of the principal tools of managing.

Targets are dangerous

I have previously blogged about what is needed to implement effective targets. An ill-judged target can induce perverse incentives. These can be catastrophic for an organisation, particularly one where the rigorous criticism of historical data is absent.