Misuse Of Statistics: Why Statistics Cannot Always Be Trusted

by Belinda Chau



When we come across a percentage, a figure or a graph, do we ever feel sceptical about the validity of the evidence behind the claim, or about the motives of the people publishing it?

The reality is that many of the statistics we encounter have been manipulated to influence us to think in a certain way; this can occur anywhere, including advertising, the media, the news and science. For instance, according to a 2009 meta-analysis of surveys by Dr Daniele Fanelli from the University of Edinburgh, up to 33.7% of scientists surveyed admitted to questionable research practices such as modifying results to improve outcomes and withholding analytical details. It is the impression that statistics are based on solid facts that makes their misuse so misleading.

Misuse of statistics can be accidental or intentional, and it can happen at any stage of a statistical study: collection, organisation or presentation. When people misuse statistics intentionally, it is often to achieve a result in their own interest, such as boosting sales by advertising a product as 99% effective at its job; in the end, consumers are likely to be disappointed.

A common form of statistical misuse is confusing correlation with causation, where the results seem to suggest that one variable causes another, when in fact both may be driven by a third variable. For example, a positive correlation between smoking and lung cancer does not, on its own, prove that smoking causes lung cancer, because such a pattern could in principle be produced by air pollution or other factors; it took extensive further research to establish that smoking is indeed a cause. Sources should make this distinction clear to avoid misleading the reader, especially when the link seems highly believable, as with Alzheimer’s disease and old age.
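
To make the idea of a hidden third variable concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `z` is a hypothetical hidden variable, `x` and `y` are two quantities that each depend on `z` but not on each other, and the helper names `pearson` and `residuals` are my own. It is a toy simulation, not real data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(values, zs):
    """What is left of `values` after subtracting a straight-line fit on `zs`."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_v = sum(values) / n
    slope = (sum((z - mean_z) * (v - mean_v) for z, v in zip(zs, values))
             / sum((z - mean_z) ** 2 for z in zs))
    intercept = mean_v - slope * mean_z
    return [v - (intercept + slope * z) for v, z in zip(values, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden variable z drives both x and y; x and y never touch each other.
z = [rng.uniform(0, 10) for _ in range(1000)]
x = [2 * zi + rng.gauss(0, 1) for zi in z]
y = [3 * zi + rng.gauss(0, 1) for zi in z]

raw_corr = pearson(x, y)
adjusted_corr = pearson(residuals(x, z), residuals(y, z))

print(f"correlation of x and y:           {raw_corr:.2f}")
print(f"after removing the effect of z:   {adjusted_corr:.2f}")
```

In this toy setup, `x` and `y` look strongly related until the influence of `z` is removed, at which point the correlation drops to around zero. Real studies are rarely this tidy, but the pattern is the same one the paragraph above warns about.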



Accurate visual representation of data is equally important. A striking example is the graph above, published during the Covid-19 pandemic by the Georgia Department of Public Health in the US, presenting the top 5 counties with the greatest number of confirmed cases over time. At first glance, the figures appear to be declining. However, a careful look at the x-axis labels shows that the dates are not arranged in chronological order! This was dangerous, as it might have misled people into believing that the virus was weakening, leading fewer people to stick strictly to protective measures and increasing the risk of transmission and infection as a result.

In addition, some graphs have a truncated axis, where the y-axis does not start at zero. Such scaling is usually chosen to amplify the differences between figures, and it tends to confuse readers. In the right graph below, if we ignore the labelling of the y-axis, the heights of the bars seem to suggest that the quantity of Y is several times the quantity of X. But in the left graph, which presents the same data with the y-axis starting from 0, the two figures are actually very close to each other.
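
The effect of a truncated axis can be worked out with simple arithmetic. The short Python sketch below uses hypothetical values (X = 100, Y = 104, and an axis starting at 99), which are my own and not taken from the graphs; the function name `drawn_height_ratio` is also just an illustrative choice.

```python
def drawn_height_ratio(x, y, baseline):
    """How many times taller bar Y is drawn than bar X
    when the y-axis starts at `baseline` instead of 0."""
    if x <= baseline:
        raise ValueError("bar X would have no visible height")
    return (y - baseline) / (x - baseline)


# Hypothetical quantities: Y is only 4% larger than X.
x_value, y_value = 100, 104

honest = drawn_height_ratio(x_value, y_value, baseline=0)
truncated = drawn_height_ratio(x_value, y_value, baseline=99)

print(f"axis from 0:  bar Y is {honest:.2f} times as tall as bar X")
print(f"axis from 99: bar Y is {truncated:.2f} times as tall as bar X")
```

With these made-up numbers, a 4% difference is drawn as a bar five times as tall once the axis starts at 99, which is why it is worth checking where the y-axis begins before comparing bar heights.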



Furthermore, we often rely on statistics to make comparisons and decide what is in our best interests, but we might come across a statistical phenomenon called Simpson’s paradox, where the same set of data can show opposite trends depending on how the data are grouped.

Imagine we are comparing the effectiveness of drug A and drug B in curing a disease in a trial: drug A has a higher overall success rate of 76% compared to drug B's 74%. However, if we break down the data: on day one, drug A cured 49 out of 60 people (82%) whereas drug B cured 25 out of 30 people (83%); on day two, drug A cured 27 out of 40 people (68%) whereas drug B cured 49 out of 70 people (70%). Drug B seems to perform better on both days, but the number of people taking part each day was not the same for the two drugs, so the daily percentages carry different weights in each drug's total. Because each drug was taken by 100 people in total over the two days, the overall figure gives the fairer comparison in this case, provided the day itself made no difference to how easily patients were cured.
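
The figures in this example can be checked with a few lines of Python. The sketch below uses only the numbers quoted in the trial; the names `trial`, `rate` and `overall_rate` are my own choices for illustration.

```python
# (cured, total) for each drug on each day, as quoted in the example.
trial = {
    "A": {"day one": (49, 60), "day two": (27, 40)},
    "B": {"day one": (25, 30), "day two": (49, 70)},
}


def rate(cured, total):
    """Success rate as a percentage."""
    return 100 * cured / total


def overall_rate(drug):
    """Success rate over both days combined."""
    cured = sum(c for c, _ in trial[drug].values())
    total = sum(t for _, t in trial[drug].values())
    return rate(cured, total)


for day in ("day one", "day two"):
    a = rate(*trial["A"][day])
    b = rate(*trial["B"][day])
    print(f"{day}: drug A {a:.1f}%, drug B {b:.1f}%")

print(f"overall: drug A {overall_rate('A'):.1f}%, drug B {overall_rate('B'):.1f}%")
```

Drug B comes out ahead on each day taken separately, yet drug A comes out ahead once the two days are combined, because most of drug A's patients were treated on the day with the higher cure rates.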

Misuse of data can take many more forms, which is why we should always assess the reliability of data carefully. It is crucial to be wary of the false sense of reality that inaccurate or false statistics present us with, so that we can avoid falling into the trap and make informed decisions.
