There are a number of reasons why big data solutions fail. They go over budget. They go over schedule. The single biggest reason that big data initiatives fail is lack of planning and an inability to adequately define the scope of the project. Big data systems succeed only if they deliver actionable insights. Without a properly defined scope at the outset, it is nearly impossible to determine which data sources you will need or to ensure that the analytics will be meaningful.
According to a study conducted by InfoChimps, 58 percent of IT professionals say their big data initiatives fail because of an inaccurate scope for the project, while 55 percent of big data projects don’t get completed for other reasons, such as lack of communication or lack of staff. If you scope the project properly, you can anticipate the big data solutions you will require, and the project will have a much greater chance of success.
So how do you make sure that you have adequately defined the scope of your big data project? Here are 12 tips to consider:
- Staff assessment – Determine what you need in terms of big data staffing. Do you have the right architects? Do you have data experts to handle Hadoop programming and analytics? What about visualizing and interpreting big data results? Big data solutions require diverse expertise, so assess your team and determine who needs special training and whether you need to recruit additional talent.
- Business discovery – You need to define the use case for your big data initiative. What question are you trying to answer and what are the broader implications of the answer for the company? Will it affect operations, product development, revenue, or all of the above? Have you narrowed the use case to make the findings actionable, i.e. will the results show how to develop a tactical program? Have a clear understanding of what you want to achieve with big data, including the corporate stakeholders involved.
- Architecture assessment – Now that you have defined the use case, determine if you have the technical resources needed as part of your big data solution. What resources do you already have available?
- Information discovery – As part of the architectural assessment, define what data sources you need for your use case. What information is available in-house, and where is it stored or siloed? What external data resources, both structured and unstructured, will add insight to your big data initiative?
- Infrastructure design – Now that you have an inventory of what’s available, you need to design your big data solution. What additional hardware and software do you need? How much data storage? Do you buy more hardware or use hosted technology, or both? What about analytics and application stacks?
- Procurement and setup – Now that you have the big data solution mapped out you can start assembling the components. Shop for whatever cloud sources you need. Add servers, storage, software, and whatever else you determined is required.
- Make sure everything works – More than just making sure everything is connected and running smoothly, you need to do a data check. Go through the system and make sure all the processes are in place and you can integrate the necessary data streams into the big data system.
- Data gathering and ingestion – Now you are ready to start. Gather the data, both structured and unstructured, from in-house and external sources and feed it into the big data platform for processing.
- Analytics – If you defined your use case properly, then your development team has been able to apply the right Hadoop or NoSQL magic to ingest the data and spit out answers.
- Applying the data – Now that you have the big data output, you need to deliver the results. This is where data science comes in. Interpret the analytics in a way that is meaningful for the use case. Use visualization and reports to present the big data findings in such a way that the insight becomes actionable.
- Test and retest – Test the findings and make sure you are getting consistent, meaningful results. You should be testing your results on an ongoing basis to make sure your big data solutions are delivering as expected.
- Repeat – Now that you have completed a successful pilot project you can repeat the process. Develop new use cases or refine the questions you need answered for your use case.
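
The data check described under "Make sure everything works" can be as simple as pulling a small sample from each source and confirming that the fields your use case depends on are actually there. Here is a minimal sketch in Python. The source names, field names, and the `check_source` helper are all hypothetical and only for illustration; a real check would read its samples from your own systems.

```python
from dataclasses import dataclass


@dataclass
class SourceCheck:
    """Result of checking one data source (hypothetical structure)."""
    name: str
    missing_fields: list
    empty: bool

    @property
    def ok(self):
        # A source passes only if it returned records and no required
        # field was absent or blank in the sample.
        return not self.empty and not self.missing_fields


def check_source(name, records, required_fields):
    """Check a sample of records (dicts) from one source."""
    records = list(records)
    missing = sorted(
        field
        for field in required_fields
        if any(field not in rec or rec[field] in (None, "") for rec in records)
    )
    return SourceCheck(name=name, missing_fields=missing, empty=not records)


# Made-up samples standing in for in-house and external sources.
samples = {
    "crm_export": [{"customer_id": 1, "region": "EMEA"}],
    "web_logs": [{"customer_id": 1, "region": ""}],
    "partner_feed": [],
}

# Fields the (assumed) use case needs from every source.
required = ["customer_id", "region"]

report = [check_source(name, recs, required) for name, recs in samples.items()]
for result in report:
    status = "OK" if result.ok else "CHECK"
    print(status, result.name, result.missing_fields, "empty" if result.empty else "")
```

Running a check like this before full ingestion, and again as part of the ongoing testing step, helps show whether each source still supports the question you scoped at the start.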
These are the basic steps toward big data success. The important thing to remember is to start with the right question. If you understand where you are going you will know when the big data solutions take you there. What’s your biggest challenge when scoping a big data project?