Why Google’s Big Data Pill May Not Be Best Medicine For Flu Bug
Numbers and data have become an indispensable part of our lives. From ordinary citizens to world leaders, everyone uses numbers and statistics to guide decisions in politics, entertainment, and even healthcare. Complex algorithms, many available as apps, help predict the spread of disease. But as recent research suggests, blindly following numbers and data without context can be misleading, according to a press release issued Thursday.
"The Parable of Google Flu: Traps in Big Data Analysis" is published in the journal Science, funded, in part, by a grant from the National Science Foundation, where the author Ryan Kennedy, examines Google's data-aggregating tool Google Flu Trend (GFT). GFT is a web service operated by Google and provides up-to-date estimates of influenza activity for more than 25 countries. It works by aggregating Google search queries on flu related topics and helps predict outbreaks of flu.
"Google Flu Trend is an amazing piece of engineering and a very useful tool, but it also illustrates where 'big data' analysis can go wrong," said Kennedy who is a political science professor in University of Houston. Big data is a collection of large and complex data sets that allow capturing, searching, analyzing, visualizing, and sharing of data. Big data is used in areas of business, disease prevention, real time prediction of traffic, and crime fighting among others.
But in their research, Kennedy and his team give a detailed explanation of the shortcomings of using big data from aggregators such as Google.
GFT was first launched in 2008 by Google.org, and the tool has since undergone several modifications to improve the accuracy of its flu predictions. But in the past two years it has grossly overestimated the number of flu cases in the U.S.
"Many sources of 'big data' come from private companies, who, just like Google, are
constantly changing their service in accordance with their business model," said Kennedy. "We need a better understanding of how this affects the data they produce; otherwise we run the risk of drawing incorrect conclusions and adopting improper policies."
GFT went wrong in its estimate for the 2012-2013 flu season, predicting more cases than actually occurred. In 2011-2012, it overestimated the number of cases by 50 percent. And from August 2011 to September 2013, GFT failed to accurately predict the prevalence of flu in 100 out of 108 weeks.
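The scale of that miss is easy to restate in plain terms; the snippet below simply recomputes the article's own figures:

```python
# Restating the article's numbers: GFT missed in 100 of 108 weeks
# between August 2011 and September 2013.
weeks_total = 108
weeks_missed = 100
miss_rate = weeks_missed / weeks_total
print(f"{miss_rate:.1%}")  # prints 92.6%
```

In other words, the tool was off in roughly nine out of every ten weeks over that two-year stretch.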
The team is also skeptical of data collected on platforms like Twitter and Facebook, since companies can manipulate results for commercial gain. But Kennedy does not completely discount the use of big data. He says that combined with more traditional methodologies, it can produce better predictions.
"Our analysis of Google Flu demonstrates that the best results come from combining information and techniques from both sources. Instead of talking about a 'big
data revolution,' we should be discussing an 'all data revolution,' where new technologies and techniques allow us to do more and better analysis of all kinds", said Kennedy.
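Kennedy's "all data" point can be sketched as a simple blend of the two sources. The weighting scheme and all figures here are hypothetical, not from the paper:

```python
# Hypothetical blend of a fast but noisy big-data signal with a
# slower, better-calibrated traditional estimate (e.g., official
# surveillance counts). The 0.3 weight is illustrative only.

def blended_estimate(big_data_est, traditional_est, weight=0.3):
    """Weighted average: lean on the traditional source, and use
    the big-data signal to nudge the result toward recent activity."""
    return weight * big_data_est + (1 - weight) * traditional_est

# A GFT-style estimate running high, tempered by surveillance data.
print(blended_estimate(900, 600))  # prints 690.0
```

The design choice mirrors the article's conclusion: rather than replacing traditional surveillance, the big-data signal supplements it, and a miscalibrated signal does bounded damage.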
© 2012 iScience Times All rights reserved. Do not reproduce without permission.