Bad Statistics I – the phantom line

I came across this chart on the web recently.


This really is one of my pet hates: a perfectly informative scatter chart with a meaningless straight line drawn on it.

The scatter chart is interesting. Each individual blot represents a nation state. Its vertical position represents national average life expectancy. I take that to be mean life expectancy at birth, though it is not explained in terms. The horizontal axis represents annual per capita health spending, though there is no indication as to whether that is adjusted for purchasing power. The whole thing is a snapshot from 2011. The message I take from the chart is that Hungary and Mexico, and I think two smaller blots, represent special causes: they lie outside the experience base represented by the balance of the nations. As to the other nations, the chart suggests that average life expectancy doesn’t depend very strongly on health spending.

Of course, there is much more to a thorough investigation of the impact of health spending on outcomes. The chart doesn’t reveal differential performance as to morbidity, or lost hours, or a host of important economic indicators. But it does put forward that one, slightly surprising, message that longevity is not enhanced by health spending. Or at least it wasn’t in 2011 and there is no explanation as to why that year was isolated.

The question is then why the author decided to put the straight line through it. As the chart “helpfully” tells me, it is a “Linear Trend line”. I guess (sic) that this is a linear regression through the blots, possibly with some weighting as to national population. I originally thought that the size of the blot was related to population but there doesn’t seem to be enough variation in the blot sizes. It looks like there are only two sizes of blot, and the USA (population 318.5 million) is the same size as Norway (5.1 million).

The difficulty here is that I can see that the two special cause nations, Hungary and Mexico, have very high leverage. That means that they have a large impact on where the straight line goes, because they are so unusual as observations. The impact of those two atypical countries drags the straight line down to the left and exaggerates the impact that spending appears to have on longevity. It really is an unhelpful straight line.
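For anyone who wants to see the leverage effect in numbers, here is a minimal sketch in Python. The figures are invented purely for illustration and are not read off the chart: two low-spending, low-longevity points stand in for the special cause nations, and the remaining eight form a roughly flat cloud. The function names `ols_fit` and `leverages` are my own. Leverage is computed with the standard formula for a straight-line fit, h_i = 1/n + (x_i - x̄)² / Sxx.

```python
from statistics import mean


def ols_fit(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverages(xs):
    """Leverage of each observation in a straight-line fit:
    h_i = 1/n + (x_i - x_bar)^2 / Sxx."""
    n = len(xs)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Invented illustrative data, NOT the values from the chart.
# Spending per head (dollars) and life expectancy (years).
# The first two points play the part of the atypical nations.
spend = [900, 1000, 3000, 3300, 3600, 4000, 4300, 4600, 5000, 5500]
life = [74.5, 75.0, 81.0, 80.2, 81.5, 80.6, 81.8, 80.4, 81.2, 80.9]

h = leverages(spend)

# Fit once with every point, then again without the two atypical ones.
_, slope_all = ols_fit(spend, life)
_, slope_trimmed = ols_fit(spend[2:], life[2:])

for x, h_i in zip(spend, h):
    print(f"spend={x:5d}  leverage={h_i:.3f}")
print(f"slope, all points:         {slope_all * 1000:.2f} years per $1000")
print(f"slope, two points removed: {slope_trimmed * 1000:.2f} years per $1000")
```

With these made-up numbers the two low-spending points carry the largest leverage, and removing them flattens the fitted line to a small fraction of its original slope. That is the mechanism at work in the chart, whatever the real figures are.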

These lines seem to appear a lot. I think that is because of the ease with which they can be generated in Excel. They are an example of what statistician Edward Tufte called chartjunk. They simply clutter the message of the data.

Of course, the chart here is a snapshot, not a video. If you do want to know how to use scatter charts to explain life expectancy then you need to learn here from the master, Hans Rosling.

There are no lines in nature, only areas of colour, one against another.

Edouard Manet

