Any organization can benefit from the insight delivered by big data. The ability to assimilate massive quantities of data from any source and turn it into intelligible, actionable insight is invaluable, whether you are trying to lure new customers or streamline operations. Many organizations hesitate to dive into the big data pool because they lack the resources to provide infrastructure support.
Demand for big data infrastructure support is growing. According to Wikibon, the big data market hit $18.6 billion last year, an increase of 58 percent. Big data services make up 40 percent of the market, hardware 38 percent, and software 22 percent. The service component commands the largest share because of the need to develop new architectures and maintain performance.
Big data is defined as data that can’t be processed using conventional technology because of its size and complexity. Gartner defined big data as consisting of the three V’s: volume, velocity, and variety. Big data infrastructure support addresses each of them: supporting storage for data volume, maintaining throughput for data velocity, and providing specialty programming to accommodate data variety.
Volume: Storage Infrastructure Support
Big data analytics requires more data, so it has a major impact on data storage. The “big” in big data can translate to petabytes of both structured and unstructured data. That means big data infrastructure support has to handle large quantities of data both within the enterprise and in the cloud.
Storage has to be elastic and scalable: the infrastructure has to be able to add modules or arrays without disruption. It also means scaling using clustered systems with embedded processors, and it means handling a larger number of files.
The amount of metadata required to deal with the file capacity needed is a challenge for traditional network attached storage (NAS). Instead, you will have to use object-based storage architectures in order to scale the number of files supported. Object-based storage makes it possible to handle billions of files across multiple locations.
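To make the contrast with a file hierarchy concrete, here is a minimal sketch of the object-storage idea: a flat namespace where each object is addressed by an ID and carries its own metadata. The class and method names (ToyObjectStore, put, get, find) are hypothetical, and deriving the ID from a SHA-256 hash of the content is an assumption made for illustration; real object stores assign and distribute IDs in their own ways.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StoredObject:
    data: bytes
    metadata: dict


class ToyObjectStore:
    """Illustrative flat key -> object map; there is no directory tree to walk."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        # Assumption: the object ID is the SHA-256 digest of the content.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        # Metadata travels with each object, so lookups filter on it directly.
        return [
            object_id
            for object_id, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]
```

Because every lookup goes through a single ID, a store built this way could in principle be partitioned across many nodes or sites by ID range, which is the property that lets object storage scale past the file counts that strain NAS.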
Velocity: Supporting Real-time Analytics
Many big data applications, such as financial transactions or web advertising, need to deliver real-time information, so latency is a big concern. To keep data fresh, the infrastructure has to provide enough processing power and connectivity to increase capacity without adding latency.
To deliver data velocity, big data environments need high IOPS (input/output operations per second), so the infrastructure typically includes server virtualization and flash-based storage as well.
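As a rough way to reason about IOPS figures, the sketch below converts an IOPS rating into throughput for a given block size and estimates average latency from the number of outstanding requests using Little's law. The function names and the sample numbers (100,000 IOPS, 4 KiB blocks, 32 outstanding requests) are illustrative assumptions, not measurements from any particular system.

```python
def throughput_mib_per_s(iops: float, block_size_kib: float) -> float:
    """Throughput implied by an IOPS figure at a fixed block size."""
    return iops * block_size_kib / 1024


def average_latency_ms(iops: float, outstanding_requests: int) -> float:
    """Average latency from Little's law: concurrency = rate * latency."""
    return outstanding_requests / iops * 1000


# Hypothetical flash array: 100,000 IOPS at 4 KiB blocks, queue depth 32.
print(throughput_mib_per_s(100_000, 4))  # 390.625
print(average_latency_ms(100_000, 32))
```

The same arithmetic shows why block size matters: the identical IOPS rating moves far more data per second with large blocks than with small ones, so an IOPS number is only meaningful alongside the block size it was measured at.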
Variety: Comparing Disparate Data Sets
Big data analytics compares different data sets, such as customer sales records and web traffic or social media content. The challenge is to find ways to cross-reference different data objects from different platforms. Data migration is no longer part of infrastructure support. Instead, the storage infrastructure is designed to accommodate different use cases and data scenarios.
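Cross-referencing usually comes down to finding a key that both data sets share. The following minimal sketch joins sales records with web traffic on a customer ID. The field names (customer_id, amount, page) and the sample records are hypothetical, and it assumes both platforms already use the same customer identifier, which in practice is often the hard part.

```python
from collections import defaultdict

# Hypothetical records exported from two different platforms.
sales = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c1", "amount": 80.0},
    {"customer_id": "c2", "amount": 45.5},
]
web_visits = [
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c1", "page": "/signup"},
    {"customer_id": "c3", "page": "/blog"},
]


def cross_reference(sales_records, visit_records):
    """Combine spend and browsing history per customer who has a sale."""
    pages_by_customer = defaultdict(list)
    for visit in visit_records:
        pages_by_customer[visit["customer_id"]].append(visit["page"])

    combined = {}
    for sale in sales_records:
        customer_id = sale["customer_id"]
        entry = combined.setdefault(
            customer_id,
            {"total_spent": 0.0, "pages": pages_by_customer.get(customer_id, [])},
        )
        entry["total_spent"] += sale["amount"]
    return combined


profiles = cross_reference(sales, web_visits)
print(profiles["c1"])
```

At big data scale the same join would typically run on a distributed engine rather than in memory, but the logic of grouping by a shared key stays the same.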
Hadoop and NoSQL programming is an essential part of big data infrastructure support, and it remains one of the most challenging parts to provide because big data programmers are very much in demand.
Ventana surveyed data analysts, IT managers, and programmers and determined that 54 percent are using Hadoop, 82 percent would benefit from faster analyses, and 94 percent are performing analytics on data volumes that they never could before. Demand for Hadoop programmers is up 35 percent over last year, making it the third-fastest-growing specialty behind NoSQL and cyber-security. Clearly, demand for programming talent for infrastructure support will continue to grow.
Providing the elastic data storage and massively parallel processing demanded by big data usually requires cloud computing. Data can be hosted by third parties, so the big data infrastructure will have to include object-based data storage.
Cloud storage can have a negative impact on both capacity and performance if it isn’t designed properly. Creating highly virtualized cloud storage can be tricky since virtualized loads spread tasks across various storage volumes. You need careful load balancing to maintain both capacity and performance.
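One simple way to picture load balancing across storage volumes is a greedy placement that always writes to the least-utilized volume with room to spare. The sketch below is only an illustration under that assumption: the function name place_objects, the volume names, and the sizes are hypothetical, and production systems also weigh I/O load, locality, and replication.

```python
def place_objects(object_sizes: dict, volume_capacities: dict):
    """Greedily assign each object to the least-utilized volume that fits it."""
    used = {name: 0 for name in volume_capacities}
    placement = {}
    # Placing the largest objects first tends to give a more even spread.
    for name, size in sorted(object_sizes.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [v for v in used if used[v] + size <= volume_capacities[v]]
        if not candidates:
            raise ValueError(f"no volume has room for {name!r} ({size} units)")
        target = min(candidates, key=lambda v: used[v] / volume_capacities[v])
        used[target] += size
        placement[name] = target
    return placement, used


placement, used = place_objects(
    {"a": 50, "b": 30, "c": 20},
    {"v1": 100, "v2": 100},
)
print(placement)  # {'a': 'v1', 'b': 'v2', 'c': 'v2'}
print(used)       # {'v1': 50, 'v2': 50}
```

Balancing on utilization rather than raw bytes keeps volumes of different sizes filling at the same relative rate, which helps protect both capacity and performance.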
Big Data Security
And, of course, there is security. In addition to the technical expertise to handle data access and analytics, there have to be additional security measures, both to ensure the integrity of the data and to protect sensitive information.
The volume/velocity/variety nature of big data makes it hard to ensure data integrity, and accessing data from multiple sources makes access control challenging. Infrastructure support needs to include monitoring of big data activity from applications and users. And for organizations concerned with regulatory compliance, an audit trail is required.
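To show what a tamper-evident audit trail can look like, here is a minimal sketch of an append-only log in which each entry stores the hash of the previous one, so any later modification breaks the chain. The AuditTrail class and its field names are hypothetical, and hash chaining is one assumed technique among several; compliance regimes each define their own requirements.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64
FIELDS = ("user", "action", "resource", "prev_hash")


def _digest(body: dict) -> str:
    # Serialize with sorted keys so the hash is stable across runs.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    """Append-only log where each entry is chained to the one before it."""

    def __init__(self):
        self.entries = []

    def record(self, user: str, action: str, resource: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {"user": user, "action": action, "resource": resource,
                "prev_hash": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {key: entry[key] for key in FIELDS}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


trail = AuditTrail()
trail.record("analyst1", "read", "sales_2013")
trail.record("etl_job", "write", "web_traffic")
print(trail.verify())  # True
```

Editing any recorded entry after the fact changes its hash and invalidates every entry that follows, which gives auditors a cheap way to confirm the log has not been altered.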
These are just some of the more important aspects of big data infrastructure support, and areas where third-party service providers can have a major impact. How many of these areas fall into your big data practice? Do you have what it takes to deliver big data infrastructure support?