There are many reasons for big data project failures; lack of planning, lack of resources, and a lack of understanding about how to identify the right data streams are among the most common. Because any big data project represents a significant investment, failures can be very expensive. For some companies, big data is truly too big to fail, so they continue to refine their strategies and processes to turn big data project failures into successes.
As with anything, you can learn more from your big data project failures than from your successes. Big data is still a new discipline, and learning from mistakes helps refine best practices. According to survey findings from Infochimps, “inaccurate scope” was the number one reason for big data project failures, cited by 58 percent of those surveyed. Other common reasons were a lack of business context for the data (51 percent) and a lack of expertise to “connect the dots” once the project is complete (51 percent).
The survey suggests that more education and experience with big data lead to greater expertise and fewer big data project failures.
The Four Primary Reasons for Big Data Failures
Stephen Brobst, CTO of Teradata Corporation, cites four primary reasons why big data projects fail:
- Focus on technology rather than addressing business problems.
- Inability to deliver data access to the subject experts for analysis.
- Failure to achieve enterprise adoption.
- Lack of sophistication to align the enterprise with the project’s total cost of ownership.
Too often big data projects become all about the data rather than the insight. To be successful, you need to understand the business objectives behind the big data initiative and align the cost of the resources with the potential benefits. Big data is all about actionable insight; the data findings may be interesting, but without a plan to capitalize on those findings the project will have failed.
InformationWeek quoted Jim Kaskade, CEO of Infochimps, about their survey findings, noting, "Too many big data projects are structured like boil-the-ocean experiments." While experimenting with Hadoop and seeing what might fall out of big data analytics is interesting, such projects will prove expensive without yielding business-oriented results.
To make big data pay off, you have to have an objective in mind, such as increasing sales in a new market or improving customer retention; something that will yield ROI. And you need to challenge and test assumptions at each step.
There are many moving parts to any big data project so assessing and reassessing the process at each step will minimize big data project failures.
Google’s Big Data Flu Misdiagnosis
A case in point is Google’s highly publicized failure to apply big data to show the spread of influenza in the U.S.
Google Flu Trends was launched to provide real-time monitoring of flu cases. Granted, this kind of project doesn’t have immediate value for Google, other than to demonstrate it can be done, but it could be useful to health care providers or manufacturers of flu vaccines. While the premise seemed a good way to prove the power of big data, Google got it wrong.
In building its analytics, Google assumed that there was a close correlation between the number of people who search for flu-related information on the web and the number of people who actually have the flu. Drawing from an average of 6 billion searches per day, Google seemed to have found the perfect way to demonstrate the power of big data.
However, Google Flu Trends overestimated the incidence of flu in 2011-2012 and 2012-2013 by more than 50 percent. During the 2013 flu season, Google Flu Trends indicated that 11 percent of Americans had the flu; the Centers for Disease Control and Prevention (CDC) put that number at 6 percent.
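To put the two 2013 figures side by side, here is a minimal Python sketch of the arithmetic. It assumes the overestimate is measured relative to the CDC figure; the function name `relative_overestimate` is hypothetical and used only for illustration, and the survey and flu numbers are the ones quoted above.

```python
def relative_overestimate(estimate: float, reference: float) -> float:
    """Return how far `estimate` exceeds `reference`, as a fraction of `reference`."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (estimate - reference) / reference


# Figures quoted above for the 2013 flu season (percent of Americans with the flu).
google_flu_trends = 11.0
cdc = 6.0

gap = relative_overestimate(google_flu_trends, cdc)
print(f"Overestimate relative to the CDC figure: {gap:.0%}")
```

Measured this way, the gap between 11 percent and 6 percent is roughly 83 percent of the CDC figure, which is consistent with an overestimate of "more than 50 percent."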
The reason for the failure was the data selected for analytics. Google wrongly assumed that its search data was an accurate measurement of flu infection. What was missing was what researchers call “small data,” i.e. traditional predictors such as past flu seasons, regional factors, etc. And by using its own search analytics as a metric, Google was modifying its own findings as it measured results, creating a feedback loop that would skew the findings.
What this “garbage-in, garbage-out” big data scenario demonstrates is that big data project failures often stem from wrong assumptions, unreliable data sources, and a failure to test big data findings. A big data project is not a one-time event; the analysis needs to be repeated to refine the results.
What do you see as the biggest factor driving big data project failures?