As far back as 500 BC, the Greek philosopher Heraclitus observed that everything changes and nothing stands still, and nowhere is that more true than in technology. Emerging technologies are continually shaping the future of business computing, promising more speed, more efficiency, more insight and more profits. Big data is the latest technology to drive changes in the enterprise, promoting more demand for cloud computing, bigger data pipes, more data-processing horsepower and more complex analytics. However, even big data strategies do not remain constant. To stay current with the latest big data trends, IT managers have to plan for the future, relying on integrators and solution providers to guide them in order to make sure that today’s big data investments still pay off in 2020.
Spending on big data continues to show strong growth. The most recent research from IDC predicts that the big data market will climb at a compound annual growth rate (CAGR) of 23.1 percent through 2019, with overall annual spending expected to reach $187 billion by 2019, a 50 percent increase over the size of the market in 2015. Wikibon is more conservative, predicting a CAGR of 14.4 percent, with the market growing to $92.2 billion by 2026. Most of the spending will go to professional services (40 percent), followed by hardware (31 percent) and software (29 percent).
If professional services continue to be the big spending category for big data, then solution providers can play an increasingly important role, especially if they can help customers realize big data savings in hardware and software. Helping design the right infrastructure today can reduce big data spending in the future.
Back to Big Data Basics
When planning big data infrastructure design, start with the basics. Big data refers to mining insight from very large volumes of structured and unstructured data. It is characterized by the three Vs—volume, velocity and variety—so to design a big data infrastructure that will scale, you have to determine how best to accommodate all three.
When we talk about big data volume, we’re talking about petabytes and exabytes of data, most of which is extremely difficult to integrate with traditional systems. That data also arrives in a variety of formats, which means you can’t rely on traditional relational database techniques for analysis. Instead, raw data has to be stored in a vast data lake, where machine learning and other artificial intelligence programs mine it for insight using complex algorithms. The infrastructure therefore has to meet the demands of both data volume and data variety.
Analyzing big data in real time for applications such as enterprise security, social media monitoring and manufacturing requires a new type of data analysis platform, one that can integrate structured and unstructured data. That’s why more users are adopting Hadoop, NoSQL, MapReduce and similar tools: to accommodate big data velocity and deliver results in real time.
Integrate Using the Fourth V: Virtualization
So to create an extensible big data infrastructure that can accommodate growing volume, variety and velocity, you need:
- Distributed and redundant data storage capable of handling vast quantities of data
- The ability to process tasks in parallel
- Data-processing capabilities for functions such as MapReduce
- Centralized data management
- Easy data accessibility
- Extensibility to expand the infrastructure in order to accommodate new requirements
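The parallel, MapReduce-style processing called out in the list above can be sketched in a few lines. The example below is a minimal word-count job using Python’s standard `multiprocessing` module as a stand-in for a full framework like Hadoop; the function names and sample data are illustrative, not part of any real big data product:

```python
from collections import Counter
from multiprocessing import Pool

def map_count(chunk):
    """Map step: count word occurrences within one chunk of text."""
    return Counter(chunk.split())

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one global tally."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    # Each chunk could be a file block distributed across cluster nodes.
    chunks = ["big data big pipes", "big data lake", "data velocity"]
    with Pool(processes=2) as pool:
        partials = pool.map(map_count, chunks)   # map phase runs in parallel
    totals = reduce_counts(partials)             # reduce phase merges results
    print(totals)
```

The key property, as in a real cluster, is that the map phase runs independently on each chunk, so adding nodes (here, processes) scales the work horizontally.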
The cloud can accommodate data volume and variety. Cloud-based data storage is elastic, growing with any company’s big data needs, and the cloud is indifferent to what kind of data it stores or how that data needs to be accessed.
Accommodating velocity requires a combination of data streaming and processing. To support real-time applications—such as retail click tracking, sensor monitoring and financial transactions—a big data application has to create a real-time feedback loop: data flows into the system and goes from input to decision with lightning speed. Streaming data processing handles data on the fly, processing it as it enters the system to minimize storage requirements and maximize response time.
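That on-the-fly approach can be illustrated with a short sketch. Here a Python generator stands in for a live feed (a real system would read from a message queue or stream platform), and only a small rolling window is kept in memory rather than the raw history; all names and sample values are illustrative assumptions:

```python
from collections import deque

def sensor_stream():
    """Stand-in for a live event source, e.g., response times in ms."""
    for event in [120, 95, 310, 80, 200]:
        yield event

def rolling_average(stream, window=3):
    """Process each event as it arrives, retaining only the last few values."""
    recent = deque(maxlen=window)    # bounded memory: old events are dropped
    for event in stream:
        recent.append(event)
        yield sum(recent) / len(recent)   # decision-ready metric per event

for avg in rolling_average(sensor_stream()):
    print(avg)
```

Because the raw events are discarded once they leave the window, storage stays constant no matter how fast or how long the stream runs, which is exactly the trade-off streaming processing makes.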
Virtualization is a necessary strategy for big data success because it helps the infrastructure handle data volume and velocity in real time. Analytics requires a lot of processing power and memory. Virtualization takes hardware components such as random access memory, central processing units, data storage and network controllers and abstracts them into a series of virtual machines, allowing workloads to use the underlying hardware far more efficiently. Virtualization also makes it easier to extend processing power, adding virtual machines from the enterprise or in the cloud. With virtualization, it’s easier to scale the infrastructure to handle big data demands.
So how do you devise a big data infrastructure that can grow with customers’ needs? The simple answer is to design with extensibility in mind. Using software-defined networking and virtualization, you can create an infrastructure made of virtual building blocks that can be added and swapped out as big data demands change. More big data applications are emerging all the time; consider the new demands created by the Internet of Things, for example. Because you can’t predict future demands (beyond knowing that change is inevitable), building a more versatile, elastic big data infrastructure is your best strategy.