Big data is, well, big. It requires more data processing capacity, more storage capacity, bigger pipes for data transmission—it just requires more. Big data is also still relatively new. Early big data implementations are starting to mature and reach their capacity, and network architects are applying new best practices based on a better understanding of the limitations of big data and big data infrastructure. If you can spot the early warning signs that your customers are overtaxing their big data systems, you will be in an ideal position to offer your expertise and show them how to get big data analytics to the next level.
New demands from big data are pushing the limits of enterprise infrastructure, and the problem is growing. According to Gartner, by 2017, 75 percent of enterprises will be using analytics or building an analytics foundation. Companies are also becoming more willing to invest in big data even when the ROI is unclear: 43 percent of companies planning big data projects, and 38 percent of those that have already invested, say they are unclear about the return. At the same time, big data is becoming more important throughout the entire organization. In 2014, the CIO initiated 37 percent of big data projects and department heads initiated 25 percent; in 2015, 32 percent came from the CIO and 31 percent from department heads. With more big data initiatives being launched by multiple stakeholders, enterprise capacity is starting to be overextended.
When looking for signs of big data overload, there are some obvious symptoms:
1. Lack of capacity
The “big” in big data refers to the petabytes of disparate data sets needed for analysis. Structured and unstructured data are intermingled using Hadoop clusters to reveal insights you can’t get from a data warehouse. That means that the infrastructure has to be scalable.
If the infrastructure can’t scale out with added data storage and computing capacity, then it can’t handle expanding big data analytics. Newcomers to big data often start with smaller projects and size the infrastructure accordingly. Alternatively, big data architects can create a use case that requires more data and more data types than the system was designed to handle. If the infrastructure can’t accommodate more storage and more embedded processing power, then it’s time to expand it.
More data also means more files and more file types. Managing an ever-growing set of metadata for file systems can slow down performance. Traditional NAS systems can handle unstructured data, but they have a capacity limit. When NAS runs out of gas, it’s time to expand file-handling capacity. Object-based storage, for example, can handle billions of files without overhead issues, and it scales nicely geographically, across multiple locations.
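To illustrate why object-based storage sidesteps file-system metadata bottlenecks, here is a minimal sketch in plain Python. It is purely illustrative (not any vendor's API): objects live in a flat namespace keyed by a content hash rather than in a directory tree, so there is no hierarchy of metadata to traverse as the object count grows.

```python
import hashlib

class ToyObjectStore:
    """Illustrative flat-namespace object store: no directory tree,
    no per-directory metadata to walk -- just key -> blob."""

    def __init__(self):
        # Flat key space; a real object store shards this across many nodes.
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key: the hash alone locates the object, so a
        # lookup costs the same whether the store holds ten objects or billions.
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

store = ToyObjectStore()
key = store.put(b"sensor reading 42")
assert store.get(key) == b"sensor reading 42"
```

The design point is the flat, content-addressed key space: because placement depends only on the key, real object stores can also replicate objects across geographic locations without coordinating a shared directory hierarchy.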
2. Too much latency

Another telltale sign is latency. An organization’s big data infrastructure often needs to deliver real-time results. For example, big data can be used to deliver custom Web advertising based on historical data. If storage systems can’t grow to handle real-time analysis, latency results in stale data (i.e., data can’t be accessed fast enough to produce viable results).
One way to address this problem is to upgrade to computing systems that deliver more input/output operations per second (IOPS). Server virtualization is one way to increase effective IOPS. Migrating from a server-based cache to a faster storage tier built on flash technology may also be called for.
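As a back-of-envelope illustration of why an IOPS upgrade matters, here is the arithmetic. The device figures below are rough assumptions for the sketch, not benchmarks: a spinning disk is taken at roughly 200 IOPS and a modest flash SSD at 50,000.

```python
# Rough, assumed figures for illustration only -- real devices vary widely.
HDD_IOPS = 200        # ~15K RPM spinning disk, small random reads
SSD_IOPS = 50_000     # modest flash SSD

io_requests = 1_000_000   # hypothetical analytics workload

# Time to drain the queue at each device's service rate
hdd_seconds = io_requests / HDD_IOPS
ssd_seconds = io_requests / SSD_IOPS

print(f"HDD: {hdd_seconds:.0f} s, SSD: {ssd_seconds:.0f} s")
# prints "HDD: 5000 s, SSD: 20 s"
```

Under these assumptions the same random-I/O workload drains 250 times faster on flash, which is why queue-driven latency (and the stale data it causes) tends to disappear when a flash tier is introduced.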
3. Insufficient data access
As big data adoption grows and more departments want to compare different data sets, more users will start sharing data. To extract more business value, companies are looking for new ways to cross-reference data objects stored on multiple platforms.
Addressing this problem may require expanding to a global file system. Allowing multiple users on multiple hosts to access multiple back-end storage systems can improve cross-reference analysis.
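A global file system presents a single namespace that routes logical paths to multiple back-end storage systems. Here is a toy sketch of that routing idea (the class, mount map, and paths are hypothetical, and real global file systems add caching, locking, and replication on top):

```python
class GlobalNamespace:
    """Toy illustration of a global file system's core idea: one logical
    path tree, with each subtree served by a different back-end store."""

    def __init__(self):
        # Hypothetical mount map: logical path prefix -> back-end store
        # (each back end is just a dict here for illustration).
        self.mounts = {}

    def mount(self, prefix, backend):
        self.mounts[prefix] = backend

    def read(self, path):
        # Longest-prefix match decides which back end serves the path.
        prefix = max((p for p in self.mounts if path.startswith(p)), key=len)
        return self.mounts[prefix][path]

ns = GlobalNamespace()
ns.mount("/finance", {"/finance/q3.csv": b"revenue,region,..."})
ns.mount("/ops", {"/ops/logs.txt": b"ok"})
assert ns.read("/ops/logs.txt") == b"ok"
```

Because every user sees the same logical tree, an analyst can cross-reference `/finance` and `/ops` data in one query without knowing which physical platform holds each object.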
4. Stricter security and compliance requirements

As part of infrastructure design, specific types of sensitive and personal data have to be protected. Standards and regulations dictate how financial data, medical records, and data in regulated industries are archived and shared. This could mean placing greater controls on data storage and ensuring that specific data are not commingled or stored on servers outside of the U.S.
5. Growing expense
Big data also can translate to a big price tag. The bigger the infrastructure, the more cost containment matters. Data deduplication, for example, is a simple storage feature, but scaled across petabytes of data it saves real space and real dollars. Thin provisioning is another way to improve efficiency and cut overhead. Even saving a few percentage points can translate into big money.
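To see how a few percentage points scale, here is the deduplication arithmetic. The 4:1 dedup ratio and the cost per terabyte below are assumptions chosen for illustration, not quotes from any vendor:

```python
# Illustrative figures -- dedup ratios and $/TB vary by workload and vendor.
raw_capacity_pb = 2     # petabytes of data before deduplication (assumed)
dedup_ratio = 4         # assume a 4:1 reduction
cost_per_tb = 100       # assumed all-in cost in dollars per TB stored

stored_pb = raw_capacity_pb / dedup_ratio        # what actually hits disk
saved_tb = (raw_capacity_pb - stored_pb) * 1000  # PB -> TB
savings = saved_tb * cost_per_tb

print(f"Savings: ${savings:,.0f}")
# prints "Savings: $150,000"
```

Even at these modest assumed numbers, 2 PB shrinks to 0.5 PB on disk; the same arithmetic applied to tens of petabytes, or combined with thin provisioning, is where the "big money" comes from.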
Companies that need to archive data may find tape storage is still more cost-efficient. Others have legacy storage devices that retain data on commodity hardware. These types of strategies can save money because they require no additional capital expenditure, but operating expenditure may add to overhead. Storage is being implemented in software as well as hardware, and at some point, these legacy systems may prove too expensive to maintain as part of the infrastructure.
These are just a few of the signs that your customers’ big data infrastructure might be overextended. Applying best practices and big data expertise can help your customers address performance problems and contain costs, and it opens new opportunities for you to upgrade their infrastructure with an eye toward future big data demand.