How To Debug Your Approach To Data Analysis

Seven common biases that influence how we understand, use, and interpret the world around us.

By Shafique Gajdhar, FusionCharts.

In 2005, UCLA economics graduate Michael Burry saw the writing on the wall: the ticking numbers behind the American mortgage market. Burry’s analysis of US lending practices in 2003 and 2004 led him to believe that housing prices would fall drastically as early as 2007.

And he turned his insight to good use, pocketing net profits close to a whopping 489% between 2001 and 2008! Investors who overlooked his analysis earned a little over 2% in the same period.

In the modern world, we can’t overstate the impact of accurate data analysis. The price of small mistakes can be significant, running into millions of dollars, or a failure to predict election results by a laughably wide margin.

So, why do we make these errors? Why do even the best of us, with years of experience in making data-led decisions and equipped with the latest tools, often struggle to read between the numbers?

1. Approaching data sets with a pre-existing idea

Also called confirmation bias, this is the tendency of decision makers to use data to prove (or debunk) a theory they already hold.


Unlike Burry, most stakeholders looked at the data with preconceived notions of how the investment market was supposed to behave.

C-level executives rarely approach data from a neutral stance; they often leverage it with a predetermined goal in mind. That’s where the data scientist comes in: it’s their job to perform an accurate and objective analysis, surfacing insights that may or may not validate the business users’ assumptions, or that may turn out to be completely irrelevant.

2. Not looking at the data, ALL the data, and nothing but the data

The broad umbrella of selection bias covers both unwitting biases (like survivorship bias) and hard-to-avoid ones, such as availability bias.

Take, for example, the 7 million Americans living outside the country who weren’t included in the 2016 US pre-election polls. Incomplete data sets sent the NYT Presidential Forecast ticker from 80% to under 5% in around 12 hours.

In fact, most surveys fall prey to selection bias. “Many businesses only capture a small piece of the pie when it comes to data available to their segment or industry, and this means their data and subsequent analysis are skewed,” said Powerlytics CEO Kevin Sheets in an interview with InformationWeek.
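
To see how quickly a skewed sample drifts from reality, here is a minimal Python sketch of a phone poll. Every figure in it (population size, support share, response rates) is invented purely for illustration:

```python
import random

random.seed(42)

# Hypothetical population of 1,000,000 voters; the true share
# favoring candidate A is 52%. All figures here are invented.
population = [1] * 520_000 + [0] * 480_000

def poll(respond_prob_a, respond_prob_b, n=2_000):
    """Survey n respondents, where supporters of A and B answer
    the phone with different probabilities (the selection mechanism)."""
    responders = [v for v in population
                  if random.random() < (respond_prob_a if v else respond_prob_b)]
    sample = random.sample(responders, n)
    return sum(sample) / n

# Equal response rates: the estimate lands close to the true 52%.
print(f"unbiased poll: {poll(0.10, 0.10):.1%}")

# A's supporters are only slightly more reachable, yet the poll
# overstates A's support by roughly seven points (~59% vs. 52%).
print(f"biased poll:   {poll(0.12, 0.09):.1%}")
```

Note that collecting more respondents doesn’t fix this: the error comes from who gets sampled, not from how many.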

3. Ignoring the impact of outliers (or rejecting them altogether)

Outliers are extreme data points that differ vastly from the mean. Left unaccounted for, they tend to generate ‘false’ averages that don’t reflect the real picture.

Research shows that in 2014, the bottom 50% of the American population earned USD 25,000 on average, while the top 1% cashed in around 81 times that amount every year: a sizable difference.
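
A quick sketch makes the distortion concrete. The toy group below mirrors the 81:1 ratio above, though the group sizes themselves are invented:

```python
import statistics

# Ten hypothetical earners: nine at USD 25,000 and one at 81x that,
# mirroring the 81:1 ratio above (group sizes invented for illustration).
incomes = [25_000] * 9 + [81 * 25_000]

print(f"mean income:   USD {statistics.mean(incomes):,.0f}")    # 225,000
print(f"median income: USD {statistics.median(incomes):,.0f}")  # 25,000
```

A single extreme earner multiplies the ‘average’ ninefold, while the median still describes a typical member of the group.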

However, removing the outliers isn’t always the way forward. In the insurance industry, for example, a set of exceptional claims can impact revenues, but these must be analyzed and addressed separately, as the sketch below illustrates.
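
One common convention, sketched here in Python (3.8+ for statistics.quantiles) with invented claim amounts, is Tukey’s rule: flag anything beyond 1.5 times the interquartile range and route it to its own analysis rather than deleting it:

```python
import statistics

# Hypothetical daily insurance claims in USD, including one
# exceptional claim (all amounts invented for illustration).
claims = [875, 950, 980, 990, 1_020, 1_050,
          1_100, 1_150, 1_200, 1_250, 1_300, 75_000]

# Tukey's rule: treat points beyond 1.5 * IQR from the quartiles
# as outliers, and analyze them separately instead of dropping them.
q1, _, q3 = statistics.quantiles(claims, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

typical  = [c for c in claims if low <= c <= high]
outliers = [c for c in claims if c < low or c > high]
print(f"typical claims:     {typical}")
print(f"analyze separately: {outliers}")  # [75000]
```

The exceptional claim is kept, not discarded; it simply feeds a different part of the analysis.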