
How people lie to your face – with factual numbers.

Published 17 Aug 2020

By Victor Tsang



“Numbers don’t lie… but people do.”

As data-literate individuals in a data-drenched world, we need to keep our wits about us and retain our critical thinking when people present us with studies and visualisations that seem to have all the answers. Here’s why, with some examples.

People hate looking at raw, plain, boring numbers, so they put them into easily digestible visualisations that grab your attention and stick in your mind. But, like all forms of communication, visualisations emphasise some things and hide others by design. People have agendas to push, which often leads them to obscure details, distort figures, or make invalid inferences to support their argument. Heck, people do this in daily conversation too, with white lies or exaggerated stories to capture their friends’ attention!

In benign situations, that could be okay. But in some scenarios – like a global pandemic – this could also be deadly.

Correlation != Causation

Geographic Profile Maps which are basically just population maps

Source: XKCD

We all know about this one. It’s taught in basic statistics, but it’s also a trap we unconsciously fall into, just by human nature.

For example, I could say that shoe size is highly correlated with reading ability. The average person might find that interesting and infer that a bigger shoe size causes better reading ability. However, our inner statistician should be incredibly suspicious of this. In fact, it becomes clear how misleading this statement is once you consider the entire population of a country, children included.

Why? Clearly, the older you get, the better at reading you get. Likewise, the older you get, the bigger you get and therefore your shoe size gets bigger. In this example, shoe size may be correlated with reading ability – but there is no causal relationship between the two. Actually, the causal factor (age) has been hidden away!
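To make the hidden-factor idea concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the age range, the coefficients, and the noise levels are assumptions, not real measurements. Age drives both shoe size and reading score, and neither variable touches the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up model: age drives both variables; neither affects the other.
ages = [rng.randint(5, 18) for _ in range(5000)]
shoe_size = [0.5 * age + rng.gauss(0, 1) for age in ages]
reading_score = [5.0 * age + rng.gauss(0, 10) for age in ages]

overall = pearson(shoe_size, reading_score)

# Hold the hidden factor fixed: look only at ten-year-olds.
ten = [(s, r) for a, s, r in zip(ages, shoe_size, reading_score) if a == 10]
within_age = pearson([s for s, _ in ten], [r for _, r in ten])

print(f"all ages:    r = {overall:.2f}")
print(f"age 10 only: r = {within_age:.2f}")
```

Across all ages the correlation should come out strongly positive, while within a single age group it should sit close to zero. Once age is held fixed, there is nothing left linking the two.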

Likewise, the XKCD comic here shows a correlation between Martha Stewart Living subscribers and consumers of furry porn, which may imply a causal relationship at first! Of course, they are not causally related – both maps simply follow the population of America!
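The population-map effect can be sketched the same way. The regions, populations, and per-person rates below are all hypothetical. The two behaviours are generated independently, yet their raw counts correlate because both scale with how many people live in each region. Dividing by population removes the effect.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical regions with very different population sizes.
populations = [rng.randint(50_000, 5_000_000) for _ in range(200)]

# Two unrelated behaviours, each with its own independent per-person rate.
rate_a = [rng.uniform(0.01, 0.03) for _ in populations]
rate_b = [rng.uniform(0.001, 0.005) for _ in populations]
count_a = [round(p * r) for p, r in zip(populations, rate_a)]
count_b = [round(p * r) for p, r in zip(populations, rate_b)]

raw_r = pearson(count_a, count_b)

# Normalise by population before comparing regions.
per_capita_a = [c / p for c, p in zip(count_a, populations)]
per_capita_b = [c / p for c, p in zip(count_b, populations)]
per_capita_r = pearson(per_capita_a, per_capita_b)

print(f"raw counts: r = {raw_r:.2f}")
print(f"per capita: r = {per_capita_r:.2f}")
```

The raw counts should show a clearly positive correlation, and the per-capita figures should show almost none. This is why a heatmap of counts often ends up being a population map in disguise.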

Misleading Graphs

Graphs are more often than not the best way of communicating information. They’re visual, they’re succinct, and they’re helpful. But they can also be manipulated to push certain agendas. For example:

bush tax cuts chart

Source: flowingdata.com

The underlying data is good and reliable, but the graph has clearly been manipulated to mislead the audience. Can you see it? By starting the y-axis at 34% instead of 0, the change looks far more dramatic: the second bar is drawn several times taller than the first! However, if you look at the numbers, it is only a 4.6 percentage point increase, or roughly a 13% relative increase in the tax rate (from 35% to 39.6% on the chart), nowhere near the ridiculous difference in bar size.
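A quick way to check a bar chart like this is to compare how tall the bars are drawn with how large the values actually are. The sketch below assumes the two rates are 35% and 39.6%. Those values are an assumption read off the chart, chosen to match the 4.6 percentage point gap. `bar_height_ratio` is a hypothetical helper, not a function from any library.

```python
def bar_height_ratio(before, after, axis_start):
    """How many times taller the 'after' bar is drawn than the 'before' bar."""
    return (after - axis_start) / (before - axis_start)

# Assumed tax rates in percent; they differ by 4.6 percentage points.
before, after = 35.0, 39.6

print(f"percentage-point change: {after - before:.1f}")
print(f"relative change:         {(after - before) / before:.1%}")
print(f"bar ratio, axis at 34%:  {bar_height_ratio(before, after, 34.0):.1f}x")
print(f"bar ratio, axis at 0%:   {bar_height_ratio(before, after, 0.0):.2f}x")
```

The closer the axis start sits to the smaller value, the more the drawn ratio is inflated. With the axis at 0, the drawn ratio equals the true ratio of the two values.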

The numbers make more sense if we start the y-axis at 0:

y-axis starting at 0%

Source: venngage.com

In this example, the manipulation was clearly done to sensationalise the data and change the audience’s beliefs. However, these manipulations are not necessarily bad! After all, effectiveness is contextual and in the eye of the beholder. Consider the following graph:

Side by side comparison of linear and log scale COVID-19 graphs

Source: The New York Times

Log transformations are common and incredibly useful in statistics for making relationships easier to see. Here, the log scale makes changes in the growth rate of coronavirus cases easier to understand. It’s not clear on the linear scale, but the growth of Italy’s cases was actually slowing compared to the US. As a result, it seems perfectly fine to use this graph.
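Why does a log scale reveal growth rates? On a log axis, a constant percentage growth shows up as a constant slope, so a flattening line means the growth rate is falling. Here is a small sketch with two hypothetical case series (not real COVID-19 data). One grows at a steady 30% per day, and the other has a growth rate that shrinks each day.

```python
import math

def log_steps(series):
    """Day-to-day change in log10(cases): the slope seen on a log-scale chart."""
    return [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

# Hypothetical case counts, not real data.
steady = [100 * 1.3 ** day for day in range(8)]  # constant 30% daily growth

slowing = [100.0]
for day in range(7):
    rate = 0.3 * 0.8 ** day  # daily growth rate shrinks by 20% each day
    slowing.append(slowing[-1] * (1 + rate))

print("steady: ", [round(s, 3) for s in log_steps(steady)])   # constant slope
print("slowing:", [round(s, 3) for s in log_steps(slowing)])  # shrinking slope
```

Both series keep climbing on a linear chart, which makes them hard to tell apart at a glance. On the log scale the difference is immediate: one line stays straight and the other bends.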

However, a pitfall of the log graph is that absolute comparisons are far less intuitive than on a linear scale. As the values get larger, the scale compresses them more and more, so a gap that looks small near the top of the chart can represent a huge absolute difference.

Log scales compress large numbers.
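The compression is easy to see with a little arithmetic. This sketch compares two illustrative gaps, chosen purely as an example, and measures how far apart the values sit on a log10 axis. `log_axis_distance` is a hypothetical helper.

```python
import math

def log_axis_distance(low, high):
    """Distance between two values on a log10 axis, in decades."""
    return math.log10(high) - math.log10(low)

pairs = [(10, 100), (10_000, 100_000)]
for low, high in pairs:
    print(f"{low:>7,} -> {high:>7,}: absolute gap {high - low:>6,}, "
          f"log-axis distance {log_axis_distance(low, high):.1f}")
```

Both pairs sit exactly one decade apart on the log axis, even though the second gap is a thousand times larger in absolute terms. That is the trade-off: ratios become easy to read, absolute differences do not.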

In summary, graphical manipulations are often a necessity for effective communication – but we must be aware of the intention and the consequences of these manipulations.

I could go on and on about this topic, covering things like extrapolation, cherry picking, and sampling bias – but I would highly recommend you look into these yourself. For reference, I would recommend these videos and articles:

The point is to encourage critical thinking before consuming data, statistics, and information in general. We must all stay aware and alert, unless we want to be at the beck and call of someone else!


Tags: statistics communication