Part of the role of big data consulting is helping customers make the right hardware choices for big data infrastructures. How do you balance high-volume data storage with low latency? How much processing power does big data analytics require? Any big data consulting project calls for hardware choices that balance the three Vs of big data: volume, velocity, and variety. One of the most challenging decisions is determining which servers are best suited for the job.
The biggest big data practitioners – Google, Apple, eBay, Facebook, and others – use hyperscale computing environments in which vast numbers of commodity servers are clustered together using direct-attached storage (DAS) to deliver big data analytics. Other experts insist that off-the-shelf servers aren’t up to the task and that big data processes require enterprise-grade servers customized for big data volume and analytics. As with most network infrastructure questions, the right answer is “it depends,” and that’s where big data consulting comes in.
The market seems to be gravitating toward high-performance servers for big data. IDC forecasts that the market for what it calls high-performance data analysis (HPDA) servers will hit $2.7 billion by 2018, and that the storage market, which is also driven by big data, will reach $1.6 billion. (HPDA refers to servers that can handle high-performance computing workloads for modeling and simulation.)
When offering big data consulting, here are some considerations for determining the right big data servers:
The Three Vs Define Server Performance
Big data server hardware is defined by the three Vs – data storage capacity (volume), rapid retrieval (velocity), and analysis (variety).
Servers need to be able to handle lots of data, up to terabytes or petabytes. While a single server can’t be expected to store all that data, accommodating high-volume, high-speed storage is essential, which is why big data servers tend to have solid-state drives and DAS.
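As a rough illustration of the volume question, the sketch below estimates how many DAS-equipped servers a cluster would need for a given dataset. The function name, the replication factor, the per-server capacity, and the 25% headroom are all assumptions chosen for the example, not figures from any vendor or from this article.

```python
import math


def servers_needed(raw_tb, replication, usable_tb_per_server, headroom=0.25):
    """Estimate how many DAS nodes a cluster needs to hold a dataset.

    All parameters are illustrative assumptions:
      raw_tb: size of the data before replication, in TB
      replication: number of copies kept across the cluster
      usable_tb_per_server: usable direct-attached capacity per node, in TB
      headroom: fraction of total capacity kept free for growth
    """
    if usable_tb_per_server <= 0:
        raise ValueError("usable_tb_per_server must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    # Capacity the cluster must provide once copies and free space are counted.
    total_tb = raw_tb * replication / (1 - headroom)
    return math.ceil(total_tb / usable_tb_per_server)


# Hypothetical case: 1 PB of raw data, 3 copies, 48 TB usable per node.
print(servers_needed(1000, 3, 48))  # 84
```

A back-of-the-envelope number like this is only a starting point for a sizing conversation; compression, data growth, and the storage engine's own overhead will all shift it.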
To accommodate velocity, such as real-time analysis for stock trading or machine-to-machine processes, servers have to be able to support concurrent users with multiple inputs per second.
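One hedged way to turn "inputs per second" into a hardware estimate is Little's Law, which says the average amount of work in flight equals the arrival rate multiplied by the time each item takes to process. The sketch below applies it to a core count. The event rate, the per-event service time, and the 70% target utilization are made-up inputs, and the function name is hypothetical.

```python
import math


def cores_needed(events_per_sec, service_ms, target_utilization=0.7):
    """Estimate processing cores for a steady event stream.

    Uses Little's Law: average work in flight = arrival rate x service time.
    The inputs are illustrative assumptions, not measured values.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in the range (0, 1]")
    # Average number of events being processed at any instant.
    in_flight = events_per_sec * service_ms / 1000.0
    # Leave spare capacity so bursts do not immediately queue up.
    return math.ceil(in_flight / target_utilization)


# Hypothetical feed: 50,000 events per second, 2 ms of CPU time per event.
print(cores_needed(50_000, 2))  # 143
```

This treats the load as steady and CPU-bound. Bursty traffic or I/O-bound processing would need measured latency figures rather than a single average.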
The big data breakthrough is its ability to handle data variety, mingling traditional databases with unstructured data using the same analytics. While traditional databases tend to run on a single server, big data analytics tend to use clusters or cloud resources that offer more cost-effective capacity.
Operational and Analytical Platforms
Part of your big data consulting role is helping customers separate their operational requirements from their analytical needs, and scaling server hardware accordingly.
Operational big data platforms are designed to handle real-time, interactive data processing. NoSQL was developed specifically to address the limitations of relational databases for big data computing, scaling faster and at lower cost. Where traditional SQL databases are typically designed to run on a single server, NoSQL databases are spread across a number of servers or the cloud. NoSQL deployments often take advantage of cloud computing resources to handle massive computations.
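To make "spread across a number of servers" concrete, here is a minimal sketch of hash-based partitioning, one common way a distributed data store decides which server holds a given record. It is an illustration only: the node names and the `shard_for` helper are hypothetical, and production systems generally use more elaborate schemes such as consistent hashing or range partitioning so that adding a server does not reshuffle most keys.

```python
import hashlib
from typing import Sequence


def shard_for(key: str, nodes: Sequence[str]) -> str:
    """Pick the node responsible for a key using a stable hash.

    A cryptographic hash is used here only because it is deterministic
    across processes and machines, unlike Python's built-in hash().
    """
    if not nodes:
        raise ValueError("at least one node is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the node list.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical three-node cluster.
cluster = ["node-a", "node-b", "node-c"]
for record_key in ("customer:42", "customer:43", "order:9001"):
    print(record_key, "->", shard_for(record_key, cluster))
```

Because each key maps to exactly one node, reads and writes for different keys can be served by different servers in parallel, which is where the scaling benefit comes from.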
For analytical big data loads, massively parallel processing (MPP) databases and MapReduce are also structured to scale beyond a single server. Some NoSQL systems have MapReduce built in as native functionality, so the analytics can scale across a server cluster or the cloud along with the NoSQL data store.
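The MapReduce pattern itself is simple enough to sketch in a few lines. The example below runs the map, shuffle, and reduce phases in a single Python process to count words. It is a teaching sketch with hypothetical function names, not the API of Hadoop or of any NoSQL product; in a real cluster each chunk would be mapped on a different server and the shuffle would move data over the network.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Shuffle: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk stands in for the slice of data held on one server.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


print(word_count(["big data servers", "Big clusters", "data data"]))
```

The map step needs no coordination between chunks, which is why the work spreads naturally across as many servers as there are slices of data.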
The Argument for Enterprise-Grade Servers
Although white box servers are being used in extremely large big data operations, for most big data environments you will need to build customized enterprise servers that can gather, connect, reason, and adapt to unlock big data insights. Off-the-shelf hardware can’t support the compute intensity, high throughput, and low latency required for processing the data in motion for real-time analytics.
Some argue that using software to customize a distributed server environment will deliver big data results, but this approach has hidden pitfalls such as wasted resources and data center real estate, security concerns, and licensing headaches. Using enterprise-hardened servers offers a number of advantages, including:
- Higher compute intensity (more operations per I/O);
- More parallel processing capacity;
- Better virtualization capabilities with more virtual machines per core;
- Modular design and more scalability;
- Higher memory and processor utilization.
Servers that are designed for the enterprise are also easier to scale and manage. They are designed and tuned to work together, with improved utilization, parallel processing, and virtualization.
So in assessing the server needs of your big data consulting customers, consider their long-term analytics requirements. Big data is driven by volume, velocity, and variety, and when you bring real-time analytics into the mix to make lightning-fast business decisions, then you need the server performance to handle the data and the analytics without delays in processing time or latency. That means specifying server hardware that can do the job while maximizing enterprise and cloud computing resources.