The rise of big data is contributing to the rise of cloud services. Any enterprise can now tap into big data analytics thanks to cloud computing. In many ways, the rise of cloud services democratizes big data by providing the means to store and analyze a virtually unlimited amount of data.
Spending reports are validating the rise of cloud services. According to IDC, global IT spending for big data and cloud computing will exceed $2 trillion in 2014. Cloud, big data, mobile, and social are all driving sales, but 30 percent of spending will be on analyzing pools of unstructured data. Cloud computing revenues will be booming because the cloud is the only way to manage these huge data pools for analysis.
How does the rise of cloud services enable big data? Primarily by offloading data storage and processing, and providing added scalable resources on demand, including those needed for big data analytics.
The Rise of Cloud Storage
Cloud storage delivers high availability and durability. Amazon Web Services Simple Storage Service (S3) promises 99.9 percent availability and 99.999999999 percent durability of objects over a given year. In practice, that availability figure translates to less than an hour of outage per month, and error checking and self-healing processes are completely transparent to the user. Basically, the cloud offers plug-and-play data storage that can scale from a few bytes to petabytes, and many big data projects require petabytes of data.
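The relationship between an availability percentage and real-world outage time is simple arithmetic. A minimal sketch (the function name is illustrative, not part of any AWS API):

```python
def monthly_downtime_minutes(availability_pct: float) -> float:
    """Maximum downtime per 30-day month implied by an availability SLA."""
    minutes_per_month = 30 * 24 * 60  # 43,200 minutes
    return minutes_per_month * (1 - availability_pct / 100)

# 99.9 percent availability allows roughly 43 minutes of outage per month.
print(round(monthly_downtime_minutes(99.9), 1))
```

Run against 99.9 percent, this comes out to about 43.2 minutes per month, which is the "less than an hour" figure cited above.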
Cloud data storage also works as a perfect complement to data stored in the enterprise. For example, structured data in a DBMS can be combined with unstructured data in the cloud for analytics. If a company wants to use big data to assess customer attitudes, transactional data and customer history in the company database can be combined with unstructured and semi-structured data from Twitter, Facebook, email, and other sources. Using the cloud to store the unstructured data means you can mine as much data as you need.
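The structured-plus-unstructured pattern described above can be sketched in a few lines. This is a hypothetical illustration, not a production pipeline: the in-memory SQLite table stands in for the enterprise DBMS, and the list of posts stands in for unstructured social-media text pulled from cloud object storage.

```python
import sqlite3

# Stand-in for the enterprise DBMS holding structured customer records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

# Stand-in for unstructured feed data fetched from cloud storage.
social_posts = [
    "Acme support was great today",
    "Globex shipping is slow",
    "Loving the new Acme release",
]

# Combine the two sources: count brand mentions per customer.
mentions = {}
for _, name in conn.execute("SELECT id, name FROM customers"):
    mentions[name] = sum(name.lower() in post.lower() for post in social_posts)

print(mentions)  # {'Acme': 2, 'Globex': 1}
```

In a real deployment the unstructured side would be fetched from a cloud store and processed at far larger scale, but the join of transactional identity to unstructured sentiment is the same.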
Using Virtualized Cloud Computing
Another factor contributing to the rise in cloud services is virtualization. Both cloud computing and big data projects rely heavily on it. Data virtualization makes it practical to access and optimize heterogeneous environments, such as those used for big data projects. The cloud computing model gives users a virtual data center that exposes previously disassociated data sets through a common application programming interface (API).
Hadoop was designed specifically for distributed systems that scale horizontally, adding commodity nodes rather than bigger machines. Distributed storage underpins Hadoop's file system and NoSQL databases such as Cassandra, making it easier to mix enterprise, private cloud, and public cloud resources.
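The horizontal scale-out pattern that Hadoop popularized can be sketched without any cluster at all: split a workload across independent workers (the map step), then merge their partial results (the reduce step). The example below is a toy word count in plain Python, standing in for what a real framework would distribute across nodes.

```python
from collections import Counter

# Each document would live on a different node in a real cluster.
documents = [
    "big data in the cloud",
    "the cloud stores big data",
    "data data everywhere",
]

def map_node(doc: str) -> Counter:
    """Work each node can do independently: count words in its shard."""
    return Counter(doc.split())

partials = [map_node(d) for d in documents]  # runs in parallel on the cluster
total = sum(partials, Counter())             # reduce: merge partial counts

print(total.most_common(1))  # [('data', 4)]
```

Because each map step touches only its own shard, adding nodes adds capacity linearly, which is what "horizontal scalability" means in practice.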
Public, Private, and Hybrid Cloud
Private cloud models are at the forefront of the rise of cloud services. While public cloud resources are available to handle data transfers, data storage, and analytics processing, security remains a concern, and accessing third-party resources through virtualization is a challenge. Tapping public cloud services for big data has other drawbacks, such as fluctuations in performance and the cost of copying data out to local systems. For big data analytics, the problem is balancing the load on public cloud resources. If other customers are vying for the same resources, the performance hit could affect big data analytics, especially if you are relying on real-time or near real-time data.
And the physical infrastructure behind the virtual resources changes over time. In a public cloud, resources shift and must be throttled back to accommodate all customers while guaranteeing a specific level of performance. A private cloud provides more control over virtualized resources and ultimately better performance.
The rise of cloud services also has led to the evolution of the hybrid cloud, which merges the public and private cloud computing models. Hybrid clouds are usually designed to combine scalability and security with lower operating costs. Some organizations that require lots of computing resources for a short time use a hybrid cloud approach. In times of extremely high loads, such as during big data projects, baseline resources can be expanded on demand using a hybrid model. The advantage is that cloud access, like cloud data storage, is a pay-as-you-go model.
However, hybrid cloud infrastructures are usually more expensive than dedicated private clouds because providers want long-term commitments. In general, a private cloud model is more cost effective and easier to administer, making it better suited for big data.
Big data adoption is clearly contributing to the rise of cloud services, especially private clouds. Allocating dedicated cloud services is the best way to ensure that the storage and virtual computing resources needed for big data analytics are available when you need them. It’s also the only way to accommodate the petabytes of data required by many big data projects.
How do cloud services fit into your reseller strategy?