Many experts say that big data is the killer app for cloud computing. Understanding big data means understanding the infrastructure needed to support massive data storage, intensive I/O processing, and Hadoop analytics, and the cloud is well suited to support all three.
Big data is one of the factors driving cloud adoption. Technology Business Research reports that the top 50 public cloud providers increased revenues by 47 percent in Q4 of 2013 to $6.2 billion. What the cloud offers is extensible data storage and computing power for rent, making it ideal for big data applications.
Understanding big data needs means knowing how much data storage and computing power your projects will demand, and that means weighing the demands placed on your own data center against what you could deploy using cloud services. If your big data project requires terabytes or petabytes of data, then cloud data storage is typically less expensive to rent than installing more data center storage or larger data warehouses. Similarly, leasing processing power close to your data repositories makes more sense than moving massive amounts of data for processing in your own data center.
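To put a rough number on why moving massive data sets is impractical, here is a minimal Python sketch that estimates how long a bulk transfer takes over a network link. The function name transfer_days, the 100 TB data set, the link speeds, and the 80 percent link efficiency are all illustrative assumptions, not figures from any provider or from the report cited above.

```python
def transfer_days(data_terabytes: float, link_megabits_per_s: float,
                  efficiency: float = 0.8) -> float:
    """Estimate the days needed to move a data set over a network link.

    efficiency is an assumed fraction of the nominal link rate that is
    actually usable after protocol overhead and contention.
    """
    total_bits = data_terabytes * 1e12 * 8            # decimal terabytes to bits
    usable_bits_per_s = link_megabits_per_s * 1e6 * efficiency
    seconds = total_bits / usable_bits_per_s
    return seconds / 86_400                           # seconds per day


if __name__ == "__main__":
    # Hypothetical link speeds for comparison.
    for label, mbps in [("100 Mbit/s", 100), ("1 Gbit/s", 1_000), ("10 Gbit/s", 10_000)]:
        days = transfer_days(100, mbps)
        print(f"{label}: about {days:.1f} days to move 100 TB")
```

Under these assumptions, 100 TB takes well over a week to move even on a gigabit link, which is why running the processing next to the stored data tends to be the more practical choice.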
Big Data Performs Better in the Cloud
To appreciate the potential value cloud resources offer for big data, you have to start by understanding big data infrastructure. The factors that make cloud computing so attractive are the same ones that differentiate big data from other types of business analytics:
Volume – Big data is about processing volumes of data beyond the capacity of conventional business intelligence software. Hadoop is capable of managing and processing petabytes of data, so where are you going to store all that data? Expanding your server farms is going to be cost-prohibitive, despite the declining cost of data storage. Even though the cost per gigabyte has dropped from $437,000 in 1980 to $0.05 in 2013, that doesn’t mean it’s cheaper to buy data storage than to rent it. If you buy data storage, you have to buy enough to accommodate your biggest big data project, and that capacity sits idle the rest of the time. The cloud offers an elastic storage resource that can expand and contract with your needs.
Velocity – Big data velocity is what makes it possible to analyze large data sets for real-time results to automate decision-making. This means increasing network bandwidth from megabit rates to gigabit rates to accommodate big data analytics. Accommodating velocity also requires virtualization and the ability to handle much of the data pre-processing where the data is stored, so that the reads, writes, and other I/O processing happen in the cloud. Cloud resources are more versatile and expandable and can promote higher data velocity.
Variety – Big data also handles a variety of data from a variety of applications, such as app, web, database, file, mail, and print data. Each data type requires different CPU utilization and storage handling. Many of the processes to manage these unstructured data types are best handled in the cloud.
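The buy-versus-rent argument under Volume can be illustrated with a small Python sketch. It compares storage that must be sized for the peak month against elastic storage billed on actual use. The function names, the monthly demand figures, and the per-terabyte prices are hypothetical values chosen only to show the arithmetic; real pricing varies widely.

```python
def owned_cost(monthly_demand_tb, price_per_tb_month):
    """Owned storage is sized for the peak month and paid for every month."""
    peak_tb = max(monthly_demand_tb)
    return peak_tb * price_per_tb_month * len(monthly_demand_tb)


def elastic_cost(monthly_demand_tb, price_per_tb_month):
    """Elastic storage is billed only for what each month actually uses."""
    return sum(monthly_demand_tb) * price_per_tb_month


if __name__ == "__main__":
    # Hypothetical six-month demand with one large project in month three.
    spiky_demand = [50, 50, 400, 60, 50, 50]
    # Hypothetical prices: rented capacity costs more per terabyte-month
    # than the amortized cost of owned capacity.
    owned_price, rented_price = 20, 30

    print("owned:  ", owned_cost(spiky_demand, owned_price))
    print("elastic:", elastic_cost(spiky_demand, rented_price))
```

With these made-up numbers, the elastic option costs less despite the higher unit price, because the owned capacity is paid for in every month but fully used in only one. If demand were flat at the peak level, the comparison would reverse, so the result depends on how uneven your workload is.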
Virtualization of cloud resources is a logical way to approach big data. Virtualization lets you access, administer, and optimize any heterogeneous infrastructure, including the cloud, as if it were a single unified resource. Virtualizing data in the cloud provides the means to optimize volume, velocity, and variety of data and processing resources.
The Cloud Offers Faster Provisioning
One of the greatest advantages of using cloud computing for big data is shorter time to deployment. It can take months to assemble and configure the hardware necessary to handle a big data project. Rather than building out an on-premises data center, use cloud computing resources to shorten the time to deliver a proof of concept.
Understanding big data means finding the right use case to yield ROI – return on insight. Once you have defined the appropriate use case, you can determine the data sources required, as well as the data storage and processing power needed. Assembling those components as part of the data center is going to be expensive and time-consuming. Creating the same infrastructure in the cloud should take days or weeks, not months. In addition, refining big data resources and processes is much easier to do in the cloud. Since the cloud is elastic, you can apply whatever processing power and storage capacity you need almost on demand.
Granted, not all big data projects are going to be candidates for cloud computing. The sensitive nature of the data or the applications may require source data to be handled more securely in an enterprise setting – for example, in secure government applications. However, to accommodate the volume, velocity, and variety of data needed for big data projects, cloud computing generally delivers less expensive and more flexible computing resources.