Big data requires a versatile enterprise infrastructure with virtualized resources for computing and data storage. One of the latest trends is adoption of software defined storage because it provides the kind of elasticity that big data projects demand. To master big data architectural design, your big data team needs to understand how software defined storage can optimize big data projects.
Mobile computing, social media, big data, and analytics are all driving the need for increasingly versatile data storage. According to IBM, 78 percent of CFOs are under pressure to reduce spending, and four out of five enterprises have inefficient infrastructures that waste money on data storage. Storage provisioning and management is a significant bottleneck for 58 percent of enterprise cloud deployments. Your big data team can use software defined storage both to increase efficiency and to reduce costs.
Software Defined Storage Is Essential for Big Data
Software defined storage abstracts data management from the storage hardware, which makes it ideal for applications such as big data: storage virtualization makes it easier to use commodity servers and cloud resources. The beauty of software defined storage for big data is that it provides agile, elastic storage for changing big data demands; conventional storage arrays simply can't deliver that flexibility.
By definition, a big data system has to handle large volumes of data and scale with changing analytics demands. Large-scale practitioners such as Google and Facebook use commodity servers with direct-attached storage (DAS) and heavy redundancy. Most big data architectures can't afford that much raw storage, so software defined storage lets designers combine local server storage with cloud-based resources.
EMEA Research reports that 58 percent of IT managers cite storage provisioning as a major bottleneck for enterprise cloud deployments, and storage automation was cited as a top integration requirement by 32 percent of those surveyed. More than 84 percent also said they were planning some form of hardware-independent storage system. One of the biggest challenges your big data team will face is developing a workable virtual data storage architecture.
What to Consider in Software Defined Storage
The big data team needs to appreciate that certain characteristics should be part of any software defined storage system:
- Software defined storage needs to be open. Intermixing physical and cloud storage means you cannot control which storage environments will be used. The big data team has to design the system with open storage in mind, managing both local and cloud-based resources of any type without adding complexity.
- Software defined storage needs to be intelligent, self-optimizing, and policy driven. Big data also demands continual changes in workload, including data storage. That means the system has to be self-administering, supporting automated tiering across virtual machines and storage systems. For example, to handle analytics the system needs to be able to move active data to the fastest storage tier.
- Software defined storage needs to be application aware. The big data team also needs to design the system so it’s not disrupted by routine storage provisioning requests. Automating storage provisioning should help maximize productivity.
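The self-optimizing, policy-driven behavior described above can be illustrated with a minimal sketch. This is a hypothetical example: the tier names, threshold, and `apply_tiering_policy` function are illustrative assumptions, not the API of any real software defined storage product.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    """A stored data set with a measured access rate and an assigned tier."""
    name: str
    accesses_per_hour: int
    tier: str = "capacity"  # default to the slower, cheaper tier

# Illustrative policy threshold: data accessed this often counts as "active"
HOT_THRESHOLD = 100

def apply_tiering_policy(datasets):
    """Move active data to the fast tier and cold data back to capacity."""
    for ds in datasets:
        ds.tier = "performance" if ds.accesses_per_hour >= HOT_THRESHOLD else "capacity"
    return datasets

workloads = [DataSet("clickstream", 450), DataSet("archive-2019", 2)]
for ds in apply_tiering_policy(workloads):
    print(ds.name, "->", ds.tier)
```

In a real system the policy engine would run continuously and drive actual data movement between storage systems; the point here is only that tier placement is decided by policy, not by an administrator.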
When done correctly, the big data team can expect a number of benefits from software defined storage:
- Automated use of both on-premises and cloud-based storage to create a single heterogeneous big data infrastructure.
- Policy-based orchestration of storage resources to optimize performance.
- Analytics-driven optimization of software defined storage and other resources to adapt to unpredictable big data needs.
- The use of open APIs and tools to make it easy to reuse storage across hybrid cloud environments.
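The open-API benefit above amounts to a backend-agnostic provisioning interface: callers request storage without knowing whether a local array or a cloud pool serves it. A minimal sketch, assuming hypothetical class and method names (none of these correspond to a real vendor API):

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """Common interface that any storage environment can implement."""
    @abstractmethod
    def provision(self, name: str, size_gb: int) -> str: ...

class LocalBackend(StorageBackend):
    def provision(self, name, size_gb):
        return f"local volume {name} ({size_gb} GB)"

class CloudBackend(StorageBackend):
    def provision(self, name, size_gb):
        return f"cloud volume {name} ({size_gb} GB)"

def provision_volume(backend: StorageBackend, name: str, size_gb: int) -> str:
    # The caller never sees which physical environment serves the request.
    return backend.provision(name, size_gb)

print(provision_volume(LocalBackend(), "scratch", 500))
print(provision_volume(CloudBackend(), "lake", 2000))
```

Because both backends satisfy the same interface, storage can be reused across hybrid cloud environments without changing the calling code.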
Next, the Software Defined Data Center
Software defined storage is the next step in the evolution of the software defined data center (SDDC). Like software defined networking, it abstracts hardware resources to the software layer to speed enterprise configuration and management changes. Each configuration or policy change creates an avalanche of management data that is best handled using virtualized resources.
With enterprises adding, on average, 24 percent more storage capacity to their infrastructure annually, they see only 30 to 50 cents on the dollar in return when that capacity is not utilized effectively. Using virtualization to make optimal use of data storage improves the return on the big data investment.
If the virtualization framework can inspect and control data storage, then it can assess storage constraints and availability. Using those variables the SDDC can apply configuration and policy changes to alleviate congestion and streamline new deployments. That means both local and cloud-based storage can be applied to optimal capacity.
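The control loop described above can be sketched in a few lines: the framework inspects pool utilization and steers new allocations toward whichever pool has headroom. The pool names, sizes, and congestion threshold below are illustrative assumptions, not measurements from any real deployment.

```python
# Hypothetical storage pools as seen by the SDDC control plane
pools = {
    "local": {"capacity_gb": 1000, "used_gb": 920},
    "cloud": {"capacity_gb": 5000, "used_gb": 1200},
}

CONGESTION_THRESHOLD = 0.9  # avoid pools that would exceed 90% utilization

def utilization(pool):
    return pool["used_gb"] / pool["capacity_gb"]

def place_volume(size_gb):
    """Place a new volume in the least-utilized pool that stays under the threshold."""
    candidates = sorted(pools.items(), key=lambda kv: utilization(kv[1]))
    for name, pool in candidates:
        if (pool["used_gb"] + size_gb) / pool["capacity_gb"] < CONGESTION_THRESHOLD:
            pool["used_gb"] += size_gb
            return name
    raise RuntimeError("no pool has headroom for this volume")

# The congested local pool is skipped, so this allocation lands in the cloud pool
print(place_volume(200))
```

Extending the same loop to migrate existing data away from congested pools is how an SDDC alleviates congestion rather than merely avoiding it.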
What are the biggest storage challenges facing your big data team? Is software defined storage part of your big data strategy?