Imagine …

No, not John Lennon’s dreary nursery rhyme for hippies.

In his memoir of the 2007-2008 banking crisis, The Courage to Act, Ben Bernanke wrote about his surprise when the crisis materialised.

We saw, albeit often imperfectly, most of the pieces of the puzzle. But we failed to understand – “failed to imagine” might be a better phrase – how those pieces would fit together to produce a financial crisis that compared to, and arguably surpassed, the financial crisis that ushered in the Great Depression.

That captures the three essentials of any attempt to foresee a complex future.

  • The pieces
  • The fit
  • Imagination

In any well-managed organisation, “the pieces” consist of the established Key Performance Indicators (KPIs) and leading measures. Diligent and rigorous criticism of historical data using process behaviour charts allows departures from stability to be identified timeously. A robust and disciplined system of management and escalation enables an agile response when special causes arise.
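By way of illustration, here is a minimal sketch in Python of the natural process limits behind an individuals (XmR) process behaviour chart. The 2.66 constant is the standard Shewhart/Wheeler factor; the KPI series itself is invented purely for illustration.

```python
# Minimal sketch: natural process limits for an individuals (XmR) chart.
# The KPI values below are invented for illustration.
kpi = [52.1, 49.8, 50.6, 51.2, 48.9, 50.3, 57.4, 50.1, 49.5, 51.0]

mean = sum(kpi) / len(kpi)

# Average moving range between successive observations
moving_ranges = [abs(b - a) for a, b in zip(kpi, kpi[1:])]
avg_mr = sum(moving_ranges) / len(moving_ranges)

# Wheeler's constant 2.66 converts the average moving range
# into three-sigma-equivalent natural process limits.
upper_limit = mean + 2.66 * avg_mr
lower_limit = mean - 2.66 * avg_mr

# Flag any observations that signal a potential special cause.
signals = [(i, x) for i, x in enumerate(kpi) if x > upper_limit or x < lower_limit]
print(f"Limits: [{lower_limit:.2f}, {upper_limit:.2f}], signals: {signals}")
```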

Of course, “the fit” demands a broader view of the data, recognising interactions between factors and the possibility of non-simple global responses remote from a locally well behaved response surface. As the old adage goes, “Fit locally. Think globally.” This is where the Cardinal Newman principle kicks in.

“The pieces” and “the fit”, taken at their highest, yield a map of historical events with some limited prediction as to how key measures will behave in the future. Yet it is common experience that novel factors persistently invade. The “bow wave” of such events will not fit a recognised pattern where there will be a ready consensus as to meaning, mechanism and action. These are the situations where managers are surprised by rapidly emerging events, only to protest, “We never imagined …”.

Nassim Taleb’s analysis of the financial crisis hinged on such surprises and took him back to the work of British economist G L S Shackle. Shackle had emphasised the importance of imagination in economics. Put at its most basic, any attempt to assign probabilities to future events depends upon the starting point of listing the alternatives that might occur. Statisticians call it the sample space. If we don’t imagine some specific future we won’t bother thinking about the probability that it might come to be. Imagination is crucial to economics but it turns out to be much more pervasive an engine of improvement than is at first obvious.

Imagination and creativity

Frank Whittle had to imagine the jet engine before he could bring it into being. Alan Turing had to imagine the computer. They were both fortunate in that they were able to test their imagination by construction. It was all realised in a comparatively short period of time. Whittle’s and Turing’s respective imaginations were empirically verified.

What is now proved was once but imagined.

William Blake

Not everyone has had the privilege of seeing their imagination condense into reality within their lifetime. In 1946, Sir George Paget Thomson and Moses Blackman imagined a plentiful source of inexpensive civilian power from nuclear fusion. As of writing, prospects of a successful demonstration seem remote. Frustratingly, as far as I can see, the evidence still refuses to tip the balance as to whether future success is likely or failure inevitable.

Something as elusive as imagination can have a testable factual content. As we know, not all tests are conclusive.

Imagination and analysis

Imagination turns out to be essential to something as prosaic as Root Cause Analysis. And essential in a surprising way. Establishing an operative cause of a past event is an essential task in law and engineering. It entails the search for a counterfactual, not what happened but what might have happened to avoid the regrettable outcome. That is inevitably an exercise in imagination.

In almost any interesting situation there will be multiple imagined pasts. If there is only one then it is time to worry. Sometimes it is straightforward to put our ideas to the test. This is where the Shewhart cycle comes into its own. In other cases we are in the realms of uncomfortable science. Sometimes empirical testing is frustrated because the trail has gone cold.

The issues of counterfactuals, Root Cause Analysis and causation have been explored by psychologists Daniel Kahneman [1] and Ruth Byrne [2] among others. Reading their research is a corrective to the optimistic view that Root Cause Analysis is some sort of inevitably objective process. It is distorted by all sorts of heuristics and biases. Empirical testing is vital, if only through finding some data with borrowing strength.

Imagine a millennium bug

In 1984, Jerome and Marilyn Murray published Computers in Crisis in which they warned of a significant future risk to global infrastructure in telecommunications, energy, transport, finance, health and other domains. It was exactly those areas where engineers had been enthusiastic to exploit software from the earliest days, often against severe constraints of memory and storage. That had led to the frequent use of just two digits to represent a year, “71” for 1971, say. From the 1970s, software became more commonly embedded in devices of all types. As the year 2000 approached, the Murrays envisioned a scenario in which the dawn of 1 January 2000 was heralded by multiple system failures as software registers reset to the year 1900, frustrating functions dependent on timing and forcing devices into a fault mode or a graceless degradation. Still worse, systems might simply malfunction abruptly and without warning, the only discernible signal being when human wellbeing was compromised. And the ruinous character of such a threat was that failure would be inherently simultaneous and global, with safeguarding systems possibly beset with the same defects as the primary devices. It was easy to imagine a calamity.

You might like to assess that risk yourself (ex ante) by locating it on a risk assessment matrix of probability against consequence. It would be a brave analyst who would categorise it as “Low”, I think. Governments and corporations were impressed and embarked on a massive review of legacy software and embedded systems, estimated to have cost around $300 billion at year 2000 prices. A comprehensive upgrade programme was undertaken by nearly all substantial organisations, public and private.
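For readers who want to see the mechanics, here is a minimal sketch in Python of the sort of probability × consequence lookup such a matrix embodies. The category labels and band boundaries are invented for illustration and are not the specific matrix referred to above.

```python
# Minimal sketch of a probability x consequence risk matrix lookup.
# Categories and thresholds are invented for illustration only.
LIKELIHOOD = ["Rare", "Unlikely", "Possible", "Likely", "Almost certain"]
CONSEQUENCE = ["Negligible", "Minor", "Moderate", "Major", "Catastrophic"]

def risk_rating(likelihood: str, consequence: str) -> str:
    """Map a likelihood/consequence pair onto a Low/Medium/High/Extreme band."""
    score = (LIKELIHOOD.index(likelihood) + 1) * (CONSEQUENCE.index(consequence) + 1)
    if score >= 15:
        return "Extreme"
    if score >= 8:
        return "High"
    if score >= 4:
        return "Medium"
    return "Low"

# Ex ante, the millennium bug plausibly sat towards the top-right corner.
print(risk_rating("Likely", "Catastrophic"))   # -> "Extreme"
```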

Then, on 1 January 2000, there was no catastrophe. And that caused consternation. The promoters of the risk were accused of having caused massive expenditure and diversion of resources against a contingency of negligible impact. Computer professionals were accused, in terms, of self-serving scare mongering. There were a number of incidents which will not have been considered minor by the people involved. For example, in a British hospital, tests for Down’s syndrome were corrupted by the bug resulting in contra-indicated abortions and births. However, there was no global catastrophe.

This is the locus classicus of a counterfactual. Forecasters imagined a catastrophe. They persuaded others of their vision and the necessity of vast expenditure in order to avoid it. The preventive measures were implemented at great cost. The catastrophe did not occur. Ex post, the forecasters were disbelieved. The danger had never been real. Even Cassandra would have sympathised.

Critics argued that there had been a small number of relatively minor incidents that would have been addressed most economically on a “fix on failure” basis. Much of this turns out to be a debate about the much neglected column of the risk assessment headed “Detectability”. A failure that will inflict immediate pain is far more critical to manage and mitigate than one that presents an opportunity for detection and protection before a broader loss occurs. Here, forecasting Detectability was just as important as Probability and Consequences in arriving at an economic strategy for management.
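The point about Detectability is the one formalised in FMEA-style risk priority numbers, where a detection score sits alongside severity and occurrence. Here is a minimal sketch under that assumption; the scales and example scores are invented for illustration.

```python
# Minimal sketch of an FMEA-style Risk Priority Number (RPN),
# illustrating why Detectability matters as much as Probability and Consequences.
# All scores are invented; scales run 1 (best) to 10 (worst).

def rpn(severity: int, occurrence: int, detection: int) -> int:
    """RPN = severity x occurrence x detection (higher = more critical)."""
    return severity * occurrence * detection

# Two failure modes with identical severity and occurrence:
easily_detected = rpn(severity=8, occurrence=3, detection=2)    # caught well before harm
silent_until_harm = rpn(severity=8, occurrence=3, detection=9)  # first signal is the loss itself

print(easily_detected, silent_until_harm)   # 48 vs 216
```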

It is the fundamental paradox of risk assessment that, where control measures eliminate a risk, it is not obvious whether the benign outcome was caused by the control or whether the risk assessment was just plain wrong and the risk never existed. Another counterfactual. Again, finding some alternative data with borrowing strength can help though it will ever be difficult to build a narrative appealing to a wide population. There are links to some sources of data on the Wikipedia article about the bug. I will leave it to the reader.

Imagine …

Of course it is possible to find this all too difficult and to adopt the Biblical outlook.

I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.

Ecclesiastes 9:11
King James Bible

That is to adopt the outlook of the lady on the level crossing. Risk professionals look for evidence that their approach works.

The other day, I was reading the annual report of the UK Health and Safety Executive (pdf). It shows a steady improvement in the safety of people at work though oddly the report is too coy to say this in terms. The improvement occurs over the period where risk assessment has become ubiquitous in industry. In an individual work activity it will always be difficult to understand whether interventions are being effective. But using the borrowing strength of the overall statistics there is potent evidence that risk assessment works.

References

  1. Kahneman, D & Tversky, A (1979) “The simulation heuristic”, reprinted in Kahneman et al. (1982) Judgment under Uncertainty: Heuristics and Biases, Cambridge, p201
  2. Byrne, R M J (2007) The Rational Imagination: How People Create Alternatives to Reality, MIT Press

Superforecasting – the thing that TalkTalk didn’t do

I have just been reading Superforecasting: The Art and Science of Prediction by Philip Tetlock and Dan Gardner. The book has attracted much attention and enthusiasm in the press. It makes a bold claim that some people, superforecasters, though inexpert in the conventional sense, are possessed of the ability to make predictions with a striking degree of accuracy, that those individuals exploit a strategy for forecasting applicable even to the least structured evidence, and that the method can be described and learned. The book summarises results of a study sponsored by US intelligence agencies as part of the Good Judgment Project but, be warned, there is no study data in the book.

I haven’t found any really good distinction between forecasting and prediction so I might swap between the two words arbitrarily here.

What was being predicted?

The forecasts/ predictions in question were all in the field of global politics and economics. For example, a question asked in January 2011 was:

Will Italy restructure or default on its debt by 31 December 2011?

This is a question that invited a yes/ no answer. However, participants were encouraged to answer with a probability, a number between 0% and 100% inclusive. If they were certain of the outcome they could answer 100%, if certain that it would not occur, 0%. The participants were allowed, I think encouraged, to update and re-update their forecasts at any time. So, as far as I can see, a forecaster who predicted 60% for Italian debt restructuring in January 2011 could revise that to 0% in December, even up to the 30th. Each update was counted as a separate forecast.

The study looked for “Goldilocks” problems, not too difficult, not too easy, but just right.

Bruno de Finetti was very sniffy about using the word “prediction” in this context and preferred the word “prevision”. It didn’t catch on.

Who was studied?

The study was conducted by means of a tournament among volunteers. I gather that the participants wanted to be identified and thereby personally associated with their scores. Contestants had to be college graduates and, as a preliminary, had to complete a battery of standard cognitive and general knowledge tests designed to characterise their given capabilities. The competitors in general fell in the upper 30 percent of the general population for intelligence and knowledge. When some book reviews comment on how the superforecasters included housewives and unemployed factory workers I think they give the wrong impression. This was a smart, self-selecting, competitive group with an interest in global politics. As far as I can tell, backgrounds in mathematics, science and computing were typical. It is true that most were amateurs in global politics.

With such a sampling frame, of what population is it representative? The authors recognise that problem though don’t come up with an answer.

How measured?

Forecasters were assessed using Brier scores. I fear that Brier scores fail to be intuitive, are hard to understand and raise all sorts of conceptual difficulties. I don’t feel that they are sufficiently well explained, challenged or justified in the book. Suppose that a competitor predicts a probability p of 60% for the Italian default. Rewrite this as a probability in the range 0 to 1 for convenience, 0.6. If the competitor accepts finite additivity then the probability of “no default” is 1 − 0.6 = 0.4. Now suppose that outcomes f are coded as 1 when confirmed and 0 when disconfirmed. That means that if a default occurs then f(default) = 1 and f(no default) = 0. If there is no default then f(default) = 0 and f(no default) = 1. It’s not easy to get. We then take the difference between the ps and the fs, square the differences and sum them. This is illustrated below for the case of “no default”, which yields a Brier score of 0.72.

Event         p     f     (p − f)²
Default       0.6   0     0.36
No default    0.4   1     0.36
Sum           1.0         0.72

Suppose we were dealing with a fair coin toss. Nobody would criticise a forecasting probability of 50% for heads and 50% for tails. The long-run Brier score would be 0.5 (think about it). Brier scores were averaged for each competitor and used as the basis of ranking them. If a competitor updated a prediction then every fresh update was counted as an individual prediction and each prediction was scored; more on this later. An average of 0.5 would be similar to a chimp throwing darts at a target, which is about how well expert professional forecasters had performed in a previous study. The lower the score the better. Zero would be perfect foresight.
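To make the arithmetic concrete, here is a minimal sketch in Python of the two-category Brier score as described above, applied to the Italian-default example and to the fair-coin case. The numbers simply reproduce the worked example; nothing here comes from the study data.

```python
# Minimal sketch of the two-category Brier score described above.

def brier(forecast: dict, outcome: str) -> float:
    """Sum of squared differences between forecast probabilities and the
    0/1 coding of the realised outcome, over all categories."""
    return sum((p - (1.0 if event == outcome else 0.0)) ** 2
               for event, p in forecast.items())

# Worked example from the table: 60% default forecast, no default occurred.
italy = {"default": 0.6, "no default": 0.4}
print(brier(italy, "no default"))          # 0.36 + 0.36 = 0.72

# A 50/50 forecast on a fair coin scores 0.5 whichever way the coin lands,
# so its long-run average Brier score is 0.5.
coin = {"heads": 0.5, "tails": 0.5}
print(brier(coin, "heads"), brier(coin, "tails"))   # 0.5 0.5

# Competitors were ranked on the average score over all their forecasts.
scores = [0.72, 0.5, 0.2]                  # invented illustrative scores
print(sum(scores) / len(scores))
```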

I would have liked to have seen some alternative analyses and I think that a Hosmer-Lemeshow statistic or detailed calibration study would in some ways have been more intuitive and yielded more insight.

What were the results?

The results are not given in the book, only some anecdotes. Competitor Doug Lorch, a former IBM programmer it says, answered 104 questions in the first year and achieved a Brier score of 0.22. He was fifth on the drop list. The top 58 competitors, the superforecasters, had an average Brier score of 0.25 compared with 0.37 for the balance. In the second year, Lorch joined a team of other competitors identified as superforecasters and achieved an average Brier score of 0.14. He beat a prediction market of traders dealing in futures in the outcomes, the book says by 40% though it does not explain what that means.

I don’t think that I find any of that, in itself, persuasive. However, there is a limited amount of analysis here on the (old) Good Judgment Project website. Despite the reservations I have set out above there are some impressive results, in particular this chart.

The competitors’ Brier scores were measured over the first 25 questions. The 100 with the lowest scores were identified, the blue line. The chart then shows the performance of that same group of competitors over the subsequent 175 questions. Group membership is not updated. It is the same 100 competitors as performed best at the start who are plotted across the whole 200 questions. The red line shows the performance of the worst 100 competitors from the first 25 questions, again with the same cohort plotted for all 200 questions.

Unfortunately, it is not raw Brier scores that are plotted but standardised scores. The scores have been adjusted so that the mean is zero and standard deviation one. That actually adds nothing to the chart but obscures somewhat how it is interpreted. I think that violates all Shewhart’s rules of data presentation.

That said, over the first 25 questions the blue cohort outperform the red. Then that same superiority of performance is maintained over the subsequent 175 questions. We don’t know how much is the difference in performance because of the standardisation. However, the superiority of performance is obvious. If that is a mere artefact of the data then I am unable to see how. Despite the way that data is presented and my difficulties with Brier scores, I cannot think of any interpretation other than there being a cohort of superforecasters who were, in general, better at prediction than the rest.

Conclusions

Tetlock comes up with some tentative explanations as to the consistent performance of the best. In particular he notes that the superforecasters updated their predictions more frequently than the remainder. Each of those updates was counted as a fresh prediction. I wonder how much of the variation in Brier scores is accounted for by variation in the time of making the forecast? If superforecasters are simply more active than the rest, making lots of forecasts once the outcome is obvious then the result is not very surprising.

That may well not be the case as the book contends that superforecasters predicting 300 days in the future did better than the balance of competitors predicting 100 days. However, I do feel that the variation arising from the time a prediction was made needs to be taken out of the data so that the variation in, shall we call it, foresight can be visualised. The book is short on actual analysis and I would like to have seen more. Even in a popular management book.

The data on the website on purported improvements from training is less persuasive, a bit of an #executivetimeseries.

Some of the recommendations for being a superforecaster are familiar ideas from behavioural psychology. Be a fox not a hedgehog, don’t neglect base rates, be astute to the distinction between signal and noise, read widely and richly, etc.

Takeaways

There was one unanticipated and intriguing result. The superforecasters updated their predictions not only frequently but by fine degrees, perhaps from 61% to 62%. I think that some further analysis is required to show that that is not simply an artefact of the measurement. Because Brier scores contain a squared term, they would be expected to punish large adjustments disproportionately.

However, taking the conclusion at face value, it has some important consequences for risk assessment which often proceeds by broadly granular ranking on a rating scale of 1 to 5, say. The study suggests that the best predictions will be those where careful attention is paid to fine gradations in probability.

Of course, continual updating of predictions is essential to even the most primitive risk management though honoured more often in the breach than the observance. I shall come back to the significance of this for risk management in a future post.

There is also an interesting discussion about making predictions in teams but I shall have to come back to that another time.

The amateurs out-performed the professionals on global politics. I wonder if the same result would have been encountered against experts in structural engineering.

And TalkTalk? They forgot, pace Stanley Baldwin, that the hacker will always get through.

Professor Tetlock invites you to join the programme at http://www.goodjudgment.com.

Royal babies and the wisdom of crowds

In 2004 James Surowiecki published a book with the unequivocal title The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. It was intended as a gloss on Charles Mackay’s 1841 book Extraordinary Popular Delusions and the Madness of Crowds. Both books are essential reading for any risk professional.

I am something of a believer in the wisdom of crowds. The other week I was fretting about the possible relegation of English Premier League soccer club West Bromwich Albion. It’s an emotional and atavistic tie for me. I always feel there is merit, as part of my overall assessment of risk, in checking online bookmakers’ odds. They surely represent the aggregated risk assessment of gamblers if nobody else. I was relieved that bookmakers were offering typically 100/1 against West Brom being relegated. My own assessment of risk is, of course, contaminated with personal anxiety so I was pleased that the crowd was more phlegmatic.

However, while I was on the online bookmaker’s website, I couldn’t help but notice that they were also accepting bets on the imminent birth of the royal baby, the next child of the Duke and Duchess of Cambridge. It struck me as weird that anyone would bet on the sex of the royal baby. Surely this was a mere coin toss, though I know that people will bet on that. Being hopelessly inquisitive I had a look. I was somewhat astonished to find these odds being offered (this was 22 April 2015, ten days before the royal birth).

         odds   implied probability
Girl     1/2    0.67
Boy      6/4    0.40
Total           1.07

Here I have used the usual formula for converting between odds and implied probabilities: odds of m / n against an event imply a probability of n / (m + n) of the event occurring. Of course, the principle of finite additivity requires that probabilities add up to one. Here they don’t and there is an overround of 7%. Like the rest of us, bookmakers have to make a living and I was unsurprised to find a Dutch book.
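A minimal sketch of that conversion and the resulting overround, in Python; the fractions are the ones quoted above.

```python
from fractions import Fraction

# Fractional odds of m/n against an event imply probability n / (m + n).
def implied_probability(m: int, n: int) -> Fraction:
    return Fraction(n, m + n)

book = {"girl": (1, 2), "boy": (6, 4)}
probs = {event: implied_probability(*odds) for event, odds in book.items()}

total = sum(probs.values())
overround = total - 1

for event, p in probs.items():
    print(f"{event}: {float(p):.2f}")       # girl: 0.67, boy: 0.40
print(f"total: {float(total):.2f}, overround: {float(overround):.0%}")   # 1.07, 7%
```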

The odds certainly suggested that the crowd thought a girl manifestly more probable than a boy. Bookmakers shorten the odds on the outcome that is attracting the money to avoid a heavy payout on an event that the crowd seems to know something about.

Historical data on sex ratio

I started, at this stage, to doubt my assumption that boy/ girl represented no more than a coin toss, 50:50, an evens bet. As with most things, sex ratio turns out to be an interesting subject. I found this interesting research paper which showed that sex ratio was definitely dependent on factors such as the age and ethnicity of the mother. The narrative of this chart was very interesting.

[Chart: sex ratio]

However, the paper confirmed that the sex of a baby is independent of previous births, conditional on the factors identified, and that the ratio of girls to boys is at no place or time greater than 1,100 to 1,000, about 52% girls.

So why the odds?

Bookmakers lengthen the odds on the outcome attracting the smaller value of bets in order to encourage stakes on the less fancied outcomes, on which there is presumably less risk of having to pay out. At odds of 6/4, a punter betting £10 on a boy would receive his stake back plus £15 ( = 6 × £10 / 4 ). If we assume an equal chance of boy or girl then that is an expected return of £12.50 ( = 0.5 × £25 ) for a £10.00 stake. I’m not sure I’d seen such a good value wager since we all used to bet against Tim Henman winning Wimbledon.
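The expected-value arithmetic, as a minimal Python sketch; the 50% figure is the coin-toss assumption discussed above, not the bookmaker’s view.

```python
# Expected return on a £10 bet on "boy" at fractional odds of 6/4,
# assuming the true probability is the coin-toss 50% discussed above.
stake = 10.0
m, n = 6, 4                       # odds of 6/4 against

winnings = stake * m / n          # £15 profit if the bet wins
total_return = stake + winnings   # £25 returned, including the stake

p_boy = 0.5                       # assumed true probability
expected_return = p_boy * total_return
print(expected_return)            # £12.50 expected back for a £10.00 stake
```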

Ex ante there are two superficially suggestive explanations as to the asymmetry in the odds. At least this is all my bounded rationality could imagine.

  • A lot of people (mistakenly) thought that the run of five male royal births (Princes Andrew, Edward, William, Harry and George) escalated the probability of a girl being next. “It was overdue.”
  • A lot of people believed that somebody “knew something” and that they knew what it was.

In his book about cognitive biases in decision making (Thinking, Fast and Slow, Allen Lane, 2011) Nobel laureate economist Daniel Kahneman describes widespread misconceptions concerning randomness of boy/ girl birth outcomes (at p115). People tend to see regularity in sequences of data as evidence of non-randomness, even where patterns are typical of, and unsurprising in, random events.

I had thought that there could not be sufficient gamblers who would be fooled by the baseless belief that a long run of boys made the next birth more likely to be a girl. But then Danny Finkelstein reminded me (The (London) Times, Saturday 25 April 2015) of a survey of UK politicians that revealed their limited ability to deal with chance and probabilities. Are politicians more or less competent with probabilities than online gamblers? That is a question for another day. I could add that the survey compared politicians of various parties but we have an on-going election campaign in the UK at the moment so I would, in the interest of balance, invite my voting-age UK readers not to draw any inferences therefrom.

The alternative is the possibility that somebody thought that somebody knew something. The parents avowed that they didn’t know. Medical staff may or may not have. The sort of people who work in VIP medicine in the UK are not the sort of people who divulge information. But one can imagine that a random shift in sentiment, perhaps because of the misconception that a girl was “overdue”, and a consequent drift in the odds, could lead others to infer that there was insight out there. It is not completely impossible. How many other situations in life and business does that model?

It’s a girl!

The wisdom of crowds or pure luck? We shall never know. I think it was Thomas Mann who observed that the best proof of the genuineness of a prophecy was that it turned out to be false. Had the royal baby been a boy we could have been sure that the crowd was mad.

To be complete, Bayes’ theorem tells us that the outcome should enhance our degree of belief in the crowd’s wisdom. But it is a modest increase (Bayes’ factor of 2, 3 deciban after Alan Turing’s suggestion) and as we were most sceptical before we remain unpersuaded.
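For the record, here is the small calculation behind that figure, as a sketch: the Bayes factor of 2 is the one stated above, the deciban is Turing’s unit of ten times the base-10 logarithm of the factor, and the sceptical prior odds are invented for illustration.

```python
import math

# Bayes factor of 2 in favour of the crowd's wisdom, as stated above.
bayes_factor = 2.0

# Turing's deciban: ten times the base-10 logarithm of the Bayes factor.
decibans = 10 * math.log10(bayes_factor)
print(round(decibans, 1))          # 3.0 -- a modest nudge to prior belief

# Updating invented, sceptical prior odds of 1 to 9 on the crowd being wise
# still leaves us largely unpersuaded.
prior_odds = 1 / 9
posterior_odds = prior_odds * bayes_factor   # roughly 2 to 9
print(round(posterior_odds, 2))
```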

In his book, Surowiecki identified five factors that can impair crowd intelligence. One of these is homogeneity. Insufficient diversity frustrates the inherent virtue on which the principle is founded. I wonder how much variety there is among online punters? Similarly, where judgments are made sequentially there is a danger of influence. That was surely a factor at work here. There must also have been an element of emotion, the factor that led to all those unrealistically short odds on Henman at Wimbledon on which the wise dined so well.

But I’m trusting that none of that applies to the West Brom odds.

Proposition 65

I had a break from posting following my recent family vacation to California. While I was out there I noticed a rather alarming notice at a beach hotel and restaurant in Santa Monica. After a bit of research it turned out that the notice was motivated by California Proposition 65 (1986). Everywhere we went in California we saw similar notices.

I come to this issue not solely as somebody professionally involved in risk but also as an individual concerned for his own health and that of his family. If there is an audience for warnings of harm then it is me.

I am aware of having embarked on a huge topic here but, as I say, I write as a concerned consumer of risk advice. The notice, and I hesitate to call it a warning, was unambiguous. Apparently, this hotel, teeming with diners and residents enjoying the Pacific coast, did contain chemicals emphatically “known” to cause cancer, birth defects or reproductive harm. Yet for such dreadful risks to be present the notice gave alarmingly vague information. I saw that a brochure was available within the hotel but my wife was unwilling to indulge my professional interest. I suspect that most visitors showed even less curiosity.

As far as discharging any legal duty goes, vague notices offer no protection to anybody. In the English case of Vacwell Engineering Co. Ltd v B.D.H. Chemicals Ltd [1969] 3 All ER 1681, Vacwell purchased ampules of boron tribromide from B.D.H.. The ampules bore the label “Harmful Vapour”. While the ampules were being washed, one was dropped into a sink where it fractured allowing the contents to come into contact with water. Mixing water with boron tribromide caused an explosion that killed one employee and extensively damaged a laboratory building. The label had given B.D.H. no information as to the character or possible severity of the hazard, nor any specific details that would assist in avoiding the consequences.

Likewise the Proposition 65 notice gives me no information on the severity of the hazard. There is a big difference between “causing” cancer and posing a risk of cancer. The notice doesn’t tell me whether cancer is an inevitable consequence of exposure or whether I should just shorten my odds against mortality. There is no quantification of risk on which I can base my own decisions.

Nor does the notice give me any guidance on what to do to avoid or mitigate the risk. Will setting foot inside the premises imperil my health? Or are there only certain areas that are hazardous? Are these delineated with further and more specific warnings? Or even ultimately segregated in secure areas? Am I even safe immediately outside the premises? Ten yards away? A mile? I have to step inside to acquire the brochure so I think I should be told.

The notice ultimately fulfils no socially useful purpose whatever. I looked at the State of California’s own website on the matter but found it too opaque to extract any useful information within the time I was willing to spend on it, which I suspect is more time than most visitors would be willing to spend.

It is most difficult for members of the public, even those engaged and interested, to satisfy themselves as to the science on these matters. The risks fall within what John Adams at University College London characterises as risks that are known to science but on which normal day to day intuition is of little use. The difficulty we all have is that our reflection on the risks is conditioned on the anecdotal hearsay that we pick up along the way. I have looked before at the question of whether anecdote is data.

In 1962, Rachel Carson published the book Silent Spring. The book aggregated anecdotes and suggestive studies leading Carson to infer that industrial pesticides were harming agriculture, wildlife and human health. Again, proper evaluation of the case she advanced demands more attention to scientific detail than any lay person is willing to spare. However, the fear she articulated lingers and conditions our evaluation of other claims. It seems so plausible that synthetic chemicals developed for lethal effect, rather than evolved in symbiosis with the natural world, would pose a threat to human life and be an explanation for increasing societal morbidity.

However, where data is sparse and uncertain, it is important to look for other sources of information that we can “borrow” to add “strength” to our preliminary assessment (Persi Diaconis’ classic paper Theories of Data Analysis: From Magical Thinking through Classical Statistics has some lucid insights on this). I found the Cancer Research UK website provided me with some helpful borrowing strength. Cancer is becoming more prevalent largely because we are living longer. Cancer Research helpfully referred me to this academic research published in the British Journal of Cancer.

Despite the difficulty in disentangling and interpreting data on specific risks of alleged pathogens we have the strength of borrowing from life expectancy data. Life expectancy has manifestly improved in the half century since Carson’s book, belying her fear of a toxic catastrophe flowing from our industrialised society. I think that is why there was so much indifference to the Santa Monica notice.

I should add that, inside the hotel, I spotted five significant trip hazards. I suspect these posed a much more substantial threat to visitors’ wellbeing than the virtual risks of contamination with hotel carcinogens.