Sports, Taxes, Criminal Investigations and Elections: Practical Use Cases for Data Preparation?

In this special guest feature, Jon Pilkington, Chief Product Officer at Datawatch, discusses how self-service data preparation solutions are often work-life hacks, enabling users to easily derive business insights that impact a corporation’s bottom line. But the use cases for this technology are actually quite widespread. The article provides four unique ways data preparation technology assists in predicting outcomes and significantly improving everyday life. As chief product officer, Jon brings more than two decades of business analytics experience to Datawatch, including 18 years in the business intelligence market. Jon joined Datawatch from Sonian Systems, a public cloud email archiving vendor, where he served as vice president of marketing and product management.

When you think of common data preparation and analytics use cases, the image that most likely comes to mind is one of business analysts hunched over their computers, eyes quickly moving over endless rows and columns of data in an effort to find the right information for analysis. While research has shown that analysts spend, on average, up to 80 percent of their time combing through reams of data and preparing it for analysis, this vicious cycle can be broken.

Self-service data preparation technology enables even novice business users to easily and rapidly extract, manipulate, enrich and combine disparate data from virtually any source, and then prepare it for analysis and reporting – without coding, manual data entry or involvement from IT. This means individuals can spend the majority of their time on analysis that results in insights that deliver business value, improve operational processes and increase the bottom line.

Self-service data preparation, analytics and the resulting business insights have a profound impact on the corporate world. But they can also play a critical role in everyday use cases that have broad implications for society. Following are just four examples of the role data analytics can play in daily life.

1. Finding the who in “who done it?”

A colleague used self-service data preparation and analytics technology to look into the recent scandal at Baylor University regarding the institutional suppression of multiple accusations of sexual and physical abuse by players on the school’s football team. He obtained suspect and arrest reports for various types of assault cases that took place between 2010 and 2016 via a Freedom of Information Act (FOIA) request from the Waco Police Department. This data arrived in a messy 401-page PDF filled with nearly 14,000 semi-structured individual records. Rather than manually trying to parse it all into Excel, he used a self-service data preparation solution to transform the entire document from a semi-structured mess into orderly rows and columns.

With the 13,900 records formatted properly, the next step was identifying who within them played on the Baylor football team during the incidents in question. Here again, self-service data preparation came into play, enabling him to open and blend data – specifically the football rosters from the pre-determined years – directly from the Internet.

Following a thorough analysis focused on players’ ages, years played and common name elimination, he was able to narrow the 13,900-case list down to 16 separate incidents involving 11 men who were most likely on the Baylor University football team. The dates for these incidents ranged from March 2010 to August 2015, with seven of those players having been on the team the season following the incident.
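
The blend-and-filter step described above can be pictured as a join between the parsed police records and the team rosters. What follows is only a minimal sketch in Python, not the colleague's actual workflow: the field names, the sample rows, the common-name list and the age window are all invented for illustration.

```python
from collections import Counter

# Hypothetical parsed police records (all names and values are made up).
records = [
    {"case_id": "A1", "name": "john smith", "age": 20, "year": 2012},
    {"case_id": "A2", "name": "marcus doe", "age": 21, "year": 2013},
    {"case_id": "A3", "name": "marcus doe", "age": 47, "year": 2013},
    {"case_id": "A4", "name": "alan roe", "age": 19, "year": 2011},
]

# Hypothetical roster rows: (player name, season).
roster = [
    ("marcus doe", 2013),
    ("john smith", 2012),
    ("alan roe", 2014),
]

# Assumed list of names too common to match on name alone.
common_names = {"john smith"}


def likely_players(records, roster, common_names, min_age=17, max_age=24):
    """Keep records whose name and year match a roster entry, whose age is
    plausible for a college athlete, and whose name is not too common."""
    roster_keys = set(roster)
    matches = []
    for rec in records:
        if rec["name"] in common_names:
            continue  # common-name elimination
        if not (min_age <= rec["age"] <= max_age):
            continue  # age filter (the window is an assumption)
        if (rec["name"], rec["year"]) in roster_keys:
            matches.append(rec)  # on the roster in the year of the incident
    return matches


matches = likely_players(records, roster, common_names)
incidents_per_person = Counter(rec["name"] for rec in matches)
```

In this toy data only case A2 survives all three filters. A match of this kind is a lead to check, not proof of identity, which is why the narrowed list is best treated as a starting point.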

Although these results didn’t solve the crime, the list would give a journalist or crime analyst a very interesting starting point for an investigation. While people don’t often equate self-service data preparation with investigative journalism or criminal investigations, this use case presents a fantastic example of how the technology can replace hours of tedious, mind-numbing work, and result in faster time-to-insight from raw data.

2. Preventing tax fraud

Tax fraud and tax evasion cost the government billions of dollars in lost revenue each year. They can result from something as simple as individuals and businesses under-reporting their earnings – or failing to submit their taxes altogether – or from more complex cases, such as criminals submitting false tax reports using stolen personally identifiable information (e.g., names and Social Security numbers).

Automated systems compare the information filed by individuals and businesses with that reported by employers to select the returns that require an audit, along with an additional one percent of randomly selected individuals. It may not sound like much, but the one percent alone totals more than one million audits yearly. This means manually identifying areas of discrepancy on each incorrect or randomly selected return!

With self-service data preparation’s unique ability to create negative joins – finding the information that doesn’t match up between two tax documents – tax professionals can spend more time identifying the criminals responsible for committing fraud and cutting the overall tax gap.
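
A negative join can be illustrated with a few lines of Python. This is a hedged sketch rather than any vendor's implementation: the taxpayer IDs and wage figures are invented, and the function name `negative_join` is a hypothetical helper. A strict negative (anti) join returns only the keys that have no counterpart on the other side; the sketch also reports keys that appear on both sides with different values, since that is the discrepancy an auditor would care about.

```python
# Hypothetical wage figures keyed by taxpayer ID (all values invented).
filed_by_taxpayer = {"T-001": 52000, "T-002": 48000, "T-003": 61000}
reported_by_employer = {"T-001": 52000, "T-002": 55000, "T-004": 39000}


def negative_join(left, right):
    """Return what does NOT line up between two keyed datasets."""
    # Keys present on one side only (the classic anti-join).
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    # Keys present on both sides whose values disagree.
    mismatched = {
        key: (left[key], right[key])
        for key in sorted(left.keys() & right.keys())
        if left[key] != right[key]
    }
    return only_left, only_right, mismatched


only_filed, only_employer, mismatched = negative_join(
    filed_by_taxpayer, reported_by_employer
)
```

With the sample data, T-003 appears only in the taxpayer filings, T-004 appears only in the employer reports, and T-002 appears in both with different amounts. T-001 matches and drops out of the result.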

3. Making your dream fantasy team a reality

Many companies use fantasy football and other fan sports as a social exercise or team-building initiative – which is great for those who closely follow the sport and know how to pick players. For those who don’t, what should be a fun exercise becomes a homework nightmare, entailing hours spent researching “expert” opinions on whom to select and in which rounds, along with other important information. Because of the work involved, many people rely on the autodraft selection, hoping that the system gives them a competitive team.

However, with self-service data preparation, fantasy players can easily gather NFL fantasy draft advice from the web, blend it together into one spreadsheet, and then organize it into categories for analysis (e.g., where players fall in the fantasy rankings, “sleepers” that could impact a team, etc.). And, just like that, someone who never follows sports can build a competitive team that goes the distance.
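
The gather-and-blend idea can be sketched as combining several ranking lists into one draft board. The example below is an illustration under stated assumptions: the source names, player names and ranks are placeholders, and averaging ranks is just one simple way to blend them.

```python
from statistics import mean

# Hypothetical rankings from three advice sources (1 = best).
sources = {
    "site_a": {"Player A": 1, "Player B": 2, "Player C": 3},
    "site_b": {"Player A": 2, "Player B": 1, "Player D": 3},
    "site_c": {"Player A": 1, "Player C": 2, "Player B": 3},
}


def blend_rankings(sources):
    """Merge per-source rankings into rows of
    (player, average rank, number of sources), best average first."""
    collected = {}
    for ranking in sources.values():
        for player, rank in ranking.items():
            collected.setdefault(player, []).append(rank)
    rows = [
        (player, mean(ranks), len(ranks))
        for player, ranks in collected.items()
    ]
    # Sort by average rank, breaking ties alphabetically by player name.
    rows.sort(key=lambda row: (row[1], row[0]))
    return rows


draft_board = blend_rankings(sources)
```

The third column shows how many sources mention each player. A player listed by only one source, such as Player D here, could be reviewed by hand as a possible “sleeper”, though that reading is an assumption of this sketch.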

4. Predicting who will win the next election

In a 2016 University of New Hampshire (UNH) hackathon, students combined numerous, disparate data sources and performed automated, predictive analysis to see if they could predict which presidential candidate would win the popular vote in the last election. Ten teams used self-service data preparation to unlock and blend U.S. Census data, state polling information and data from previous elections that was housed within difficult-to-access PDFs, CSV files, Excel and Access databases, web content and social media networks.

By combining self-service data preparation and analytics technology, the winning team was able to generate a line graph of voter loyalty for each candidate throughout the previous 10 months. The team determined that if voters cast their ballots in the few weeks leading up to the election, Hillary Clinton would win the popular vote by only four percent. However, if Clinton’s voters, who have been historically quick to change their opinion of the Democratic candidate, moved to a third party, Trump could win the popular vote. And, as history proves, Trump won the Electoral College vote, but Clinton secured the popular vote.