Was Big Data wrong about the American elections?
How can we trust Big Data with medicine and our finances when it was wrong about the election last week?
Big Data Analytics, especially applying Machine Learning to large data sets, has been a popular topic in Silicon Valley for the last decade. In just the last few years, it has reached celebrity status on popular media sites and at dinner tables across the country. The issue lies in how much the umbrella term “Big Data” covers, and in our inability to take data results with a grain of salt. A popular saying in the Data Science realm is “Garbage In, Garbage Out”. It means that the results of a data analysis depend, to a large extent, on the quality of the data feeding that analysis.
There is something to be said about the quality of the data feeding the election forecasting systems this time around, specifically the omission of certain groups. In 1968, the “Silent Majority” helped Nixon win his presidential race. Wikipedia describes the term as “an unspecified large group of people in a country or group who do not express their opinions publicly”. Staying silent was an easier feat before the internet and, more importantly, Social Media created a global discussion. Yet a simple Google search turns up a collection of news articles using this term to describe a group of voters who checked off Trump on their ballot.
With the growth of Social Media, there is now a wealth of information on public opinion to be tapped. If I were a man of Statistics, I would choose a sample of 10,000 opinions over a focus group of 6 any day (margin of error, conformity bias, the list goes on). That being said, the majority of pollsters, data scientists, and predictors were wrong on Election Night. The predictive analyses showed Clinton winning the presidency. A few variables were under-considered during the prediction process. One was the “Silent” Trump voters: those who were less vocal about their views to save themselves from the moral shaming commonly found online and on Social Media.
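To make the sample-size comparison concrete, here is a minimal sketch of the textbook margin-of-error formula for a proportion, alongside a simple illustration of what happens when one group answers less often than another. It assumes a simple random sample, which opinions gathered from Social Media generally are not, and the response rates used are hypothetical numbers chosen for illustration, not measurements from the election.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def observed_share(true_share, response_rate_a, response_rate_b):
    """Share of candidate A among people who actually respond,
    when A's and B's supporters respond at different rates."""
    a = true_share * response_rate_a
    b = (1 - true_share) * response_rate_b
    return a / (a + b)


# Sampling error shrinks quickly as the sample grows.
for n in (6, 10_000):
    print(f"n={n}: +/-{margin_of_error(n) * 100:.1f} points")
# n=6: +/-40.0 points
# n=10000: +/-1.0 points

# Hypothetical rates: A's supporters speak up 80% of the time, B's always.
# A truly has 50% support, but the data shows less, at any sample size.
print(f"observed share for A: {observed_share(0.5, 0.8, 1.0) * 100:.1f}%")
# observed share for A: 44.4%
```

A larger sample narrows the margin of error, but it does nothing about the second effect: if a group stays quiet, collecting more opinions only measures the vocal group more precisely.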
Many predictive systems employ Machine Learning to perform the number crunching, and those models are usually trained on historical data. The historical data did not prepare the analysis to account for another previously invisible group: first-time voters. Trump was able to motivate people who had been jaded by politics and felt their vote did not matter.
What does this say about Big Data and Machine Learning? In the end, these are tools that we can leverage. As businesses and brands start to implement data-driven strategies for their Marketing, Sales, or Customer Service teams, they need to ensure their data solutions properly ingest and clean the data before analyzing it. Only a small number of Data Analytics companies collect and cleanse the data in-house, Semeon Analytics being one of them. Like everything else concerning Artificial Intelligence, Big Data, and Machine Learning, these tools are not meant to replace humans; instead, they are meant to elevate the human process. The failed election predictions do not prove that Big Data Analytics is flawed. Rather, they should remind us that we still have work to do in adapting these tactics to businesses around the world.
About Semeon: Based in Montreal, Canada, Semeon combines the best of semantic, sentiment, intent and statistical analysis. Drawing on 100 person-years of experience with natural language processing systems, our team of experts has developed a unique platform that lets Semeon’s customers track what is being said about their brands, products, customers and competitors, and do so more rapidly and efficiently than with competing products.