Big data solutions can cost big money. A big data project can require costly hardware, lots of data storage, and expensive talent to handle the data processing and analytics. For those who are still unsure of the value big data offers for their business, the sticker shock could be enough to keep them away. To keep big data solutions affordable, it’s best to keep your big data small and start with a pilot program with limited scope and big returns.
A survey conducted by IDG Enterprise revealed that nearly half of the 751 professionals surveyed are anticipating big data initiatives, and one third will be planning big data projects within the next 12 months. However, 39 percent have no plans and 49 percent are in the planning or pilot stage. Part of the reluctance among companies is that they don’t know where to start and are unclear on what it will cost.
By removing, or at least minimizing, the financial risk of big data solutions, you have a better chance of launching a successful pilot project. Once they have a proof of concept, companies tend to embrace big data in a big way.
There are specific aspects of big data solutions that will drive up costs. If you can identify those high-priced components and limit the cost of your big data solutions, you can reduce the overall financial risk of the project. Most big data solutions have three major cost centers:
- Computing Hardware – Big data projects require massively parallel computing and rapid I/O to handle complex analytics. Despite what some big data pundits say, commodity hardware probably won’t be powerful enough. Computer systems have to have the elasticity to scale to meet changing processing demands, and they require more processing power and memory. However, there are ways to offset hardware costs.
Virtualization delivers faster performance and optimizes server utilization; it even lets you share servers across business units. Virtualization will reduce the number of servers required, save data center space, and yield added savings from reduced power and cooling demands, maintenance, management, and software licensing.
When entering into a big data project, assess how much computing power you really need to buy, and how much you can cover with a cloud solution or a virtualization approach.
- Data Storage – One of the most costly aspects of big data solutions is data storage. If your big data project calls for petabytes of data, that data has to be stored somewhere. An analyst at Aberdeen said that 12 percent of the average IT budget goes to storage, and storage costs are doubling every two years. That ratio escalates dramatically with big data solutions.
Predicting the financial risk from climbing storage costs will be challenging, since big data pools tend to only get bigger. If you can create a hybrid cloud environment that gives you elasticity but allows you to lease storage as you go, you will have better control over climbing storage costs. Leveraging public cloud resources to handle data processing and data storage can save a lot of money.
- Analytics Personnel – Hadoop is the open source framework used for the majority of big data solutions, but just because open source software is free doesn’t mean it’s cheap. Hadoop is designed to process large datasets very quickly, much faster than a conventional database. Experts say that a Hadoop management system costs about $1,000 a terabyte, or one fifth to one twentieth of the cost of other data management platforms. Much of those savings comes from Hadoop’s ability to structure data processing to support lower cost hardware and virtual resources.
However, even though Hadoop costs substantially less than expanding your data warehouse, there are still hidden costs. The biggest expense is finding Hadoop experts. Big data is still young, which means the number of professionals who understand how to build a Hadoop infrastructure is still relatively small, and that makes them expensive.
Hadoop training and certification programs are starting to emerge, but if you want to save on personnel costs you might consider training your in-house experts. If you have IT staff who are well-versed in DBMS and RDBMS and have some background in statistics, they might be well suited to add NoSQL programming and Hadoop to their tool belt. Check the capabilities of your own team before hiring a big data scientist.
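To put rough numbers on the storage and Hadoop figures above, here is a minimal back-of-the-envelope sketch in Python. It simply applies the "doubling every two years" and "$1,000 a terabyte, one fifth to one twentieth of other platforms" figures quoted in this article. The function names, the $1,000,000 IT budget, and the 100-terabyte dataset are hypothetical values chosen for illustration, not numbers from any survey.

```python
def projected_storage_cost(initial_cost, years, doubling_period_years=2.0):
    """Project a cost that doubles every `doubling_period_years` years.

    The two-year doubling period is the analyst figure quoted in the
    article; treat it as an assumption, not a guarantee.
    """
    return initial_cost * 2 ** (years / doubling_period_years)


def hadoop_vs_other_platforms(terabytes, hadoop_cost_per_tb=1000.0):
    """Return (hadoop_cost, other_low, other_high) for a dataset size.

    "One fifth to one twentieth" of other platforms' cost means those
    platforms would run roughly 5x to 20x the Hadoop estimate.
    """
    hadoop_cost = terabytes * hadoop_cost_per_tb
    return hadoop_cost, hadoop_cost * 5, hadoop_cost * 20


# Hypothetical inputs: a $1,000,000 IT budget with 12 percent spent on storage.
it_budget = 1_000_000
storage_today = it_budget * 0.12

for year in (0, 2, 4, 6):
    cost = projected_storage_cost(storage_today, year)
    print(f"Year {year}: estimated storage spend ${cost:,.0f}")

# Hypothetical 100 TB pilot dataset.
hadoop, other_low, other_high = hadoop_vs_other_platforms(100)
print(f"Hadoop: ${hadoop:,.0f}; other platforms: ${other_low:,.0f} to ${other_high:,.0f}")
```

Swapping in your own budget, storage share, and dataset size gives a quick sense of how fast the storage line grows and how wide the platform cost gap could be before you commit to a pilot.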
As you assess financial risks from big data solutions, remember that big data is still a new frontier. You may already have the resources you need, or you may be able to identify alternative big data solutions that cost less and can scale with your needs.
What do you see as the biggest financial risk from big data?