VARs are increasingly looking to big data projects as a new means of creating value for customers, as well as an opportunity to sell additional hardware, software, and services. However, to provide ongoing value, you have to have a strategy for scaling big data projects.
According to “CIOs & Big Data: What Your IT Team Wants You to Know,” a report compiled by Infochimps, more than 55 percent of big data projects are never completed, and 58 percent of respondents point to “inaccurate scope” as the primary reason for failure. Other experts say big data failures also stem from poor planning and a lack of resources, such as trying to run SAS computing models on a single computer, or using pre-built software that fails to take advantage of open source frameworks like Hadoop. Big data projects may also fail because IT experts underestimate the technical prowess required. Scaling big data projects takes pre-planning to provide adequate resources, so it pays to start small before thinking big.
Achieve Success by Starting Small and Scaling Big Data Projects
The real value of big data is in deriving actionable insights by analyzing disparate information to uncover patterns, find hidden meaning, and ultimately improve decision-making. Big data value comes from bringing together the three V’s:
- Volume – The amount of unstructured data available for analysis is growing at an exponential rate, much faster than traditional storage and analytical solutions can manage. The more data you can incorporate into a big data project, the better the insight.
- Variety – One of the characteristics of big data projects is that data is unstructured and comes from various sources, such as social media, video, mobile data, etc. All that data needs to be organized for scalable big data processing, including “shadow data,” such as web search histories and access trace logs.
- Velocity – The closer you can get to real-time data, the better the quality of your big data insights. Velocity is also about how quickly your data sets change, and adapting to data coming in at different speeds.
Accommodating the three V’s requires a lot of storage and computing capacity as well as the right analytics applications. If you don’t want to be part of the 58 percent who miscalculated their needs, scale big data projects in stages, starting small and planning for project growth.
Four Key Steps to Successfully Scaling Big Data
- Identify a business problem – Before you can design a big data architecture, you have to know what you want to discover. Start with a business question. Identify the stakeholders in the big data project – IT professionals, marketing professionals, big data scientists, et al. – identify the business problem they want to solve, and determine why that problem has been impossible to address using existing data sources and analytics systems. This gives you a baseline from which to launch your big data project.
- Start small but design to scale – You can learn a lot from a pilot project. Determine what your current production loads are and re-architect your computing solutions to handle expanding data sets. Scaling big data projects requires you to go from handling gigabytes to petabytes of data, and from supporting one application to hundreds. Hadoop is designed to handle scalability, but be sure the entire architecture can grow with the demands of the project.
- Develop a use case – Identify the use cases your big data project will need to support as a proof of concept. Map out data flows and identify the technology you need to support them. Identify what data to include and what to leave out. Determine the complexity of the business rules and how the data interrelates. And identify the analytical queries and algorithms you will need to generate the desired results. Creating use cases in advance will highlight weak points before you start scaling big data projects.
- Identify gaps between current and future capabilities – Assess your big data project and determine what additional requirements you will need for collecting, cleansing, and aggregating data. What usable data formats will you need? What governance policies will you need to classify data, including defining its relevance and ways to store, access, and analyze that data? What additional infrastructure will you need to ensure scalability, including maintaining low latency and high performance? How do you need to deliver the findings to users? All of these factors have an impact when scaling big data projects.
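The “start small but design to scale” step above leans on Hadoop’s MapReduce model, where the same map and reduce logic runs unchanged whether it processes one file on a pilot cluster or petabytes across hundreds of nodes. The sketch below illustrates that split in plain Python; the function names are illustrative, not a Hadoop API, though the same mapper/reducer shape is what a tool like Hadoop Streaming would run at scale:

```python
from collections import Counter
from typing import Iterable, Tuple

def map_phase(lines: Iterable[str]) -> Iterable[Tuple[str, int]]:
    # Mapper: emit a (word, 1) pair for each word, as a Hadoop
    # Streaming mapper would emit key/value lines on stdout.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Counter:
    # Reducer: sum the values for each key. Hadoop delivers pairs
    # grouped by key; a Counter gives the same totals locally.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

if __name__ == "__main__":
    sample = ["big data scales", "big data grows"]
    print(dict(reduce_phase(map_phase(sample))))
```

Because the per-record logic never assumes all data fits on one machine, scaling is a deployment decision rather than a rewrite – which is exactly what designing to scale means.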
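The “develop a use case” step above – mapping data flows, deciding what to leave out, and defining business rules and analytical queries – can be made concrete in a few lines. This is a minimal sketch: the record fields, the drop-dirty-durations rule, and the average-per-source query are all hypothetical stand-ins for whatever your stakeholders define:

```python
from statistics import mean

# Hypothetical raw records from two sources (variety): a web log and a mobile feed.
raw_records = [
    {"source": "web",    "user": "a1", "duration_sec": "34"},
    {"source": "mobile", "user": "b2", "duration_sec": "n/a"},  # dirty value
    {"source": "web",    "user": "a1", "duration_sec": "51"},
]

def cleanse(records):
    # Business rule (illustrative): drop any record whose duration
    # cannot be parsed as an integer.
    for r in records:
        try:
            yield {**r, "duration_sec": int(r["duration_sec"])}
        except ValueError:
            continue

def aggregate(records):
    # Analytical query (illustrative): average session duration per source.
    by_source = {}
    for r in records:
        by_source.setdefault(r["source"], []).append(r["duration_sec"])
    return {src: mean(vals) for src, vals in by_source.items()}

print(aggregate(cleanse(raw_records)))
```

Walking even a toy flow end to end like this exposes the weak points early: here, the cleansing rule silently discards the entire mobile source, the kind of gap a use case is meant to surface before you scale.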
Successfully scaling big data projects is a matter of careful planning and testing before you deploy. If you consider the three V’s – volume, variety, and velocity – you can design a scalable big data architecture that can accommodate more data, more types of data, and faster access and processing of data.