Part of your role as the big data expert is to stay ahead of emerging big data technologies. You need to understand which tools and techniques are being adopted so you can advise your customers about new big data technologies that meet their needs. There is still a big gap between the returns that big data promises and the capabilities that companies actually have to deliver on it.
According to a CFO survey by McKinsey, big data is one of four emerging technologies (along with the cloud, mobile, and social computing) that will boost profits by 10 percent by 2015. While 49 percent of the CFOs surveyed identified organizational challenges as the biggest barrier to success, 39 percent said that lack of IT capabilities is their biggest challenge. Helping your customers identify emerging big data technologies that will augment their data center capabilities will make it easier for your customers to cash in on big data.
The following are 10 new big data technologies that experts expect to have a direct impact on big data projects:
1. Column-oriented databases – Online transaction processing, which requires fast updates, typically uses row-oriented databases. However, as data volume increases (as is the case with petabyte-scale big data pools) and the data becomes more unstructured, query performance starts to decline. Column-oriented databases store the values of each column together, which allows for very high data compression and shortens query time. The trade-off is that updates typically have to be applied in batches.
2. NoSQL databases – Schema-less databases, such as document stores and key-value stores, will be required to retrieve large volumes of unstructured and structured data. Using schema-less databases means relaxing restrictions such as read-write consistency in exchange for distributed processing and scalability.
3. MapReduce – MapReduce is going to be an essential part of big data analytics because it allows execution to scale across thousands of servers and server clusters. The Map task converts the input data set into a set of key/value pairs (tuples), and the Reduce task combines the output from the Map task into a smaller set of tuples.
4. Hadoop – Apache Hadoop is an open source platform that has become the de facto standard for implementing MapReduce. Hadoop is versatile enough to handle multiple data sources and can aggregate data for large-scale processing. Hadoop can be used in many ways, but for big data projects it’s often used to handle large volumes of changing data, such as social media content or traffic sensor readings.
5. Hive – Originally developed by Facebook, Hive provides a SQL-like query language (HiveQL) for running queries against a Hadoop cluster to extract business intelligence (BI). It offers a higher-level abstraction of the data stored in Hadoop and makes that data more accessible to BI users.
6. Pig – Pig also lets you create MapReduce programs for Hadoop. Its scripting language is called Pig Latin, and it abstracts MapReduce programming into higher-level scripts that describe how data should be loaded, filtered, joined, and grouped on a Hadoop cluster. Pig is also open source and was originally developed at Yahoo!
7. WibiData – This is a combination of web analytics and Hadoop, built on top of HBase, the database layer for Hadoop. WibiData enables real-time responses for web sites, such as serving personalized content in response to user behavior.
8. Platfora – One of the challenges of using Hadoop for big data BI is that it is a very low-level implementation of MapReduce, which means it takes a lot of developer know-how and testing to use. Platfora is a big data analytics platform (as well as a company) that automatically converts user queries into Hadoop jobs, similar to querying a conventional database.
9. SkyTree – Manually exploring massive amounts of data is both impractical and expensive, so SkyTree provides a data analytics platform for high-performance machine learning.
10. New storage technologies – Big data proliferation means a demand for more storage. This is going to drive sales of conventional SANs and enterprise storage as well as demand for cloud storage. It will also drive demand for new solutions for data compression and storage virtualization.
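To make the compression point in item 1 more concrete, here is a minimal Python sketch of why storing values column by column helps. It is an illustration only, not how any particular column-oriented database is implemented. The function names (rows_to_columns, run_length_encode, count_matching) and the sample field names are hypothetical, and the sketch assumes every row has the same fields.

```python
from itertools import groupby


def rows_to_columns(rows):
    """Pivot a list of row dicts into one list of values per column.

    Assumes every row has the same keys (a simplification for this sketch).
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_matching(encoded, target):
    """Count occurrences of target by reading only the compressed column."""
    return sum(count for value, count in encoded if value == target)


# Hypothetical sample data: three sales records stored row by row.
rows = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 20},
    {"region": "west", "amount": 5},
]
columns = rows_to_columns(rows)
encoded_region = run_length_encode(columns["region"])
```

Because similar values sit next to each other in a column, repeated values collapse into short runs, and a query that touches only one column can be answered without reading the others. Changing a single row, on the other hand, means touching every column, which is one reason updates tend to be batched.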
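The Map and Reduce tasks described in item 3 can be sketched in a few lines of plain Python. This is a single-machine illustration of the programming model using the classic word-count example, not Hadoop code. The function names (map_task, shuffle, reduce_task, word_count) are hypothetical, and the grouping step that a real framework performs across servers is simulated here with a dictionary.

```python
from collections import defaultdict


def map_task(document):
    """Map: emit a (word, 1) key/value pair for every word in one record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single tuple."""
    return (key, sum(values))


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_task(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_task(key, values) for key, values in grouped.items())


counts = word_count(["big data", "Big deal"])
```

In a real cluster, each call to map_task and reduce_task could run on a different server, which is where the scalability comes from. The sketch only shows how the input is turned into key/value pairs and then combined into a smaller set of tuples.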
Of course, there will be other big data technologies emerging as well. For example, big data is going to have a big impact on cloud usage. Rather than investing in more enterprise infrastructure, organizations are going to look to cloud vendors for hosted Hadoop clusters that can scale with their needs. The more data capacity companies have, the more processing power they will need to analyze the data, and that increase in demand for data processing means more opportunity for resellers.
How do you plan to profit from emerging big data technologies? Are you going to focus on new hardware, new software development, services, or all of the above?