Data storage often proves to be the biggest hurdle for big data projects. Big data requires a great amount of data storage and, more importantly, storage versatility. Not only must the big data infrastructure have scalable storage that can accommodate massive quantities of data, but it also must deliver high-speed data access to meet the demands of big data analytics. Object data stores and software-defined storage (SDS) are gaining market momentum specifically because they address these big data storage headaches.
Software-defined storage is an evolving strategy that gives enterprise users more flexibility with data storage by using policy-based storage provisioning and management independent of the actual storage hardware. Storage virtualization separates the storage media from the software that manages it, thus enabling policy management to oversee storage functionality, including replication, deduplication, backup, and more.
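The idea of policy-driven provisioning decoupled from hardware can be sketched in a few lines. This is an illustrative model, not any vendor's API; the names (`StoragePolicy`, `provision`, the pool names) are hypothetical.

```python
# Minimal sketch of policy-based provisioning independent of hardware:
# a policy object captures replication/dedupe/backup behavior, and the
# SDS layer places volumes on whichever registered pool has capacity.
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    name: str
    replicas: int          # number of copies to keep
    dedupe: bool           # enable inline deduplication
    backup_schedule: str   # e.g. a cron-style expression

# Heterogeneous hardware pools registered with the SDS layer (free GB)
POOLS = {"ssd-pool": 10_000, "hdd-pool": 100_000}

def provision(volume_gb: int, policy: StoragePolicy) -> dict:
    """Pick any pool with room; the policy, not the hardware, drives behavior."""
    for pool, free in POOLS.items():
        needed = volume_gb * policy.replicas
        if free >= needed:
            POOLS[pool] -= needed
            return {"pool": pool, "size_gb": volume_gb, "policy": policy.name}
    raise RuntimeError("no pool can satisfy the request")

gold = StoragePolicy("gold", replicas=3, dedupe=True, backup_schedule="0 2 * * *")
vol = provision(500, gold)
```

The application asks for capacity and a policy name; where the bytes land, and how they are replicated and backed up, is decided by the management layer.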
New data storage demands for mobile computing, social media, and big data analytics are driving SDS demand because it is the best way to enable storage elasticity at substantial savings. IBM research shows that most companies are overspending on data storage to try to keep pace with demand, and that only 1 in 5 has an efficient IT storage infrastructure; storage provisioning is a bottleneck for 58 percent of enterprise cloud deployments. At the same time, customers can realize greater returns by applying SDS strategies for big data analytics. For example, financial service companies lose an average of $3.5 trillion in fraud every year, 80 percent of healthcare data is stored using unstructured forms that are difficult to manage, and communications service providers can increase revenue by as much as $300 million annually by making better use of available customer data.
New Data Demands Need Virtualized Storage
The demands of big data require a new storage paradigm. File-level storage that relies on direct-attached storage, such as hard drives, has historically been valuable for structured data applications, such as data warehouses. As the volume of data has increased, more enterprises have been adopting block-level storage using storage area networks (SANs). However, big data is driving demand for more data storage and new types of storage beyond SANs. Wikibon also notes that the amount of unstructured data is outpacing structured data, which means big data users need new ways to handle data storage and retrieval for both structured and unstructured information.
Object storage is the most logical approach. By eliminating file-based hierarchies and abstracting stored data as objects with unique identifiers, data is virtualized, which makes storage scalability virtually unlimited. Data storage is handled using clusters of commodity hardware, which reduces costs, and SDS systems can span enterprise infrastructures and cloud data storage resources.
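The flat namespace described above can be illustrated with a toy object store: objects are opaque blobs addressed by a unique identifier rather than a file path. This is a conceptual sketch, not a real object-storage client; deriving the ID from a content hash is one common assumption.

```python
# Toy object store: no directories, no file paths -- just opaque objects
# addressed by a unique, content-derived identifier.
import hashlib

class ObjectStore:
    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, metadata=None) -> str:
        oid = hashlib.sha256(data).hexdigest()  # unique identifier
        self._objects[oid] = (data, metadata or {})
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid][0]

store = ObjectStore()
oid = store.put(b"sensor reading 42", {"source": "device-7"})
```

Because clients hold only identifiers, the system is free to place, replicate, or migrate the underlying bytes across commodity nodes without applications noticing.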
Object stores are becoming the norm because they reduce storage costs while supporting endless scalability. They also are well suited to the volume and variety of data that big data analytics demands.
The Benefit of SANs without the Constraints
Big data solutions and NoSQL platforms are proliferating, as well as commercialized Hadoop variants such as Cloudera, Hortonworks, and MapR. Many of these solutions are being adopted in pilot projects that are scaling in size. Gartner analysts have seen a jump in Hadoop cluster size from 10 nodes to up to 50 nodes.
With more processing power comes demand for more data access, both inside the enterprise and in the cloud. By their very nature, Hadoop clusters are designed to hyperscale, handling terabytes of structured and unstructured data to reveal new market trends, customer sentiment, and more.
SDS is a good fit for this scenario because it is platform-neutral. As big data applications grow, developers will be writing analytics for different Hadoop platforms. One of the business advantages of big data is data transparency; it breaks down the silos of operations so any corporate data can be incorporated for analysis. That requires a level playing field where any data can be accessed from any resource on demand.
The old data storage model using RAID arrays is not well suited for big data. However, by virtualizing RAID assets using SDS, you can eliminate the islands of data storage managed by individual applications and create a common data pool. SDS delivers a consistent provisioning workflow and central control over data access, no matter where the data is stored. You get the benefits of SAN storage, but without hardware or management constraints.
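Pooling per-application RAID islands into one virtual resource can be sketched as follows. The backend names and placement rule are hypothetical; the point is that free capacity is summed and allocated through a single path rather than managed array by array.

```python
# Hypothetical sketch: isolated per-application RAID arrays aggregated
# into one virtual pool with a single provisioning workflow.
islands = {"app-db": 2_000, "app-web": 1_500, "app-logs": 3_500}  # free GB each

class VirtualPool:
    def __init__(self, backends):
        self.backends = dict(backends)

    @property
    def free_gb(self) -> int:
        # One pool-wide capacity figure instead of three separate islands
        return sum(self.backends.values())

    def allocate(self, size_gb: int) -> str:
        # Place on the backend with the most headroom, transparent to the app
        backend = max(self.backends, key=self.backends.get)
        if self.backends[backend] < size_gb:
            raise RuntimeError("pool exhausted")
        self.backends[backend] -= size_gb
        return backend

pool = VirtualPool(islands)
chosen = pool.allocate(1_000)
```

Applications request capacity from the pool and never see which physical array serves them, which is what removes the per-silo management burden.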
What to Look for in SDS for Big Data
When shopping for SDS solutions that are best suited for big data, look for specific capabilities:
- Automated provisioning – As data storage needs for big data grow and change, especially with new cloud resources being added to the mix, the IT team can’t be expected to provision new storage systems constantly. The SDS platform should provide self-service and plug-and-play provisioning.
- Replication controls – Many big data solutions have replication built in, so deploying them on a platform that also replicates multiplies redundant copies of the same data. Be sure the SDS platform includes controls to tune or disable its own replication.
- Global, inline deduplication – Along with a central means of managing data, the SDS platform should provide in-line deduplication across all data sets. This reduces the raw disk capacity required in each Hadoop cluster.
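The deduplication capability in the last bullet works roughly as follows: incoming writes are split into chunks, each chunk is fingerprinted, and a chunk already stored anywhere in the pool is never written again. This is a simplified illustration (a 4-byte chunk size for readability; real systems chunk at kilobyte scale), not a production design.

```python
# Sketch of global, inline deduplication: writes are chunked and hashed;
# chunks seen anywhere in the pool before are stored only once.
import hashlib

CHUNK = 4  # tiny chunk size for illustration only

chunk_store = {}  # fingerprint -> unique chunk bytes, shared pool-wide

def write(data: bytes) -> list:
    """Deduplicate inline; return the recipe of fingerprints for this write."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(fp, chunk)  # store only if unseen
        recipe.append(fp)
    return recipe

def read(recipe) -> bytes:
    return b"".join(chunk_store[fp] for fp in recipe)

r1 = write(b"AAAABBBBCCCC")
r2 = write(b"AAAACCCCDDDD")  # shares two chunks with the first write
```

Here 24 bytes are written logically but only four unique chunks (16 bytes) are stored, which is why pool-wide deduplication shrinks the raw disk capacity each Hadoop cluster needs.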
Flexible data storage management is essential to big data success. By adopting the right SDS strategies from the outset, you can eliminate barriers to data access while centralizing and automating access control. Apply an SDS strategy that allows you to incorporate all data assets into a single, virtual platform that can deliver maximum insight and returns.