The data center is the heart of any big data initiative, which is why data center training is so important to big data success. Your IT professionals need to be well-versed in the latest technologies and strategies to optimize data center performance for big data processing. That means reassessing your IT team’s skill set and giving them more data center training.
Managing data storage, computing, and testing requires new skills because of the sheer volume of data involved. Since big data is still in its infancy, a lot of companies are still developing the big data tools they need for configuration and management. For example, Virendra Vase, CTO at Klout, estimates that 20 to 25 percent of his engineering resources are devoted to productivity tools and workflow management, and using those tools in a big data context is going to require more data center training.
So where do you focus your data center training efforts? Here are some of the key components of big data that your data center team needs to master:
1. Configuration and management of enterprise-grade servers: Big data experts are finding that off-the-shelf server hardware doesn’t have the capacity to handle processing for big data analytics. Big data workloads require enterprise hardware with more capacity:
- Higher computing intensity and the ability to support high I/O operations.
- More parallel processing capacity.
- More virtual machines per CPU.
- Modular design and elasticity so servers can be expanded to handle larger big data workloads.
- Increased memory and processor capacity and utilization.
- Better security and compliance, including hardware-aided encryption.
Your data center training needs to address these custom server requirements so servers can be optimized to provide the resiliency needed for big data processing. That includes designing, building, and tuning servers, and keeping them manageable and scalable as demands increase.
2. Data Storage – Big data can require petabytes of information. That means your data center training has to include storage management. There are a variety of ways that enterprise systems deal with data storage: storage area networks (SANs), network-attached storage (NAS), and direct-attached storage (DAS). For big data processing, some systems architects prefer to add more commodity servers with DAS. Others advocate scale-out or clustered NAS. Storage virtualization is another strategy that big data system architects are using to more effectively share physical data storage, reduce storage costs, and simplify management.
Understanding how to integrate these data storage strategies as they relate to big data analytics is crucial, so it should be included as part of your data center training regimen.
3. Virtualization – Although high-speed networking is important, network speeds can’t keep up with big data demands. It can take hours to transfer the terabytes or petabytes of data needed for big data analytics. That’s why most big data projects use virtualization to share resources more efficiently. Virtualization skills will have to span a variety of computing platforms:
- Server virtualization lets you take a single server and partition it into multiple virtual servers, each with the same functions as a physical machine.
- Application virtualization provides a means to manage applications based on demand, which helps provide the elasticity needed for big data analytics.
- Network virtualization is a more efficient means of pooling network resources, which can be valuable for data gathering and reducing network bottlenecks.
- Processor and memory virtualization decouples memory from the server to deliver the additional processing power for advanced analytic algorithms.
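The transfer-time claim above is easy to verify with back-of-the-envelope arithmetic. The sketch below assumes an illustrative 10 Gbps link; the figures are hypothetical, but they show why moving petabytes across the network is impractical and why sharing resources through virtualization matters:

```python
# Rough transfer-time estimate for moving a big data set over the network.
# The 10 Gbps link speed is an illustrative assumption, not a recommendation.

def transfer_hours(data_bytes: float, link_gbps: float) -> float:
    """Hours needed to move data_bytes over a link of link_gbps gigabits/second."""
    seconds = (data_bytes * 8) / (link_gbps * 1e9)  # bytes -> bits, then divide by bits/sec
    return seconds / 3600

one_tb = 1e12  # 1 terabyte in bytes
one_pb = 1e15  # 1 petabyte in bytes

print(f"1 TB over 10 Gbps: {transfer_hours(one_tb, 10):.2f} hours")   # about 0.22 hours
print(f"1 PB over 10 Gbps: {transfer_hours(one_pb, 10):.0f} hours")   # about 222 hours
```

Even on a fast dedicated link, a petabyte takes more than a week to move, which is why big data architectures bring the computation to the data rather than the reverse.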
Additional data center training in virtualization, and even specialized training such as VMware certification, will be extremely valuable.
4. Cloud Computing – Since big data requires lots of data storage as well as virtualized computing resources, almost every big data initiative uses some kind of hybrid cloud architecture. A hybrid cloud strategy is particularly valuable for big data since it offers reliability and scalability and can be deployed quickly. It also offers cost savings by adding storage and computing power only as needed.
VMware, SAP, and other vendors are developing new hybrid cloud strategies with big data specifically in mind. Certification classes could easily be included in data center training.
A different set of programming skills is required for big data analytics. Expertise in Hadoop, NoSQL, MapReduce, Pig, Cassandra, and other software platforms is needed to create a big data analytics framework. The real demand is for skill in R programming, the software environment for statistical computing and graphics that is at the core of big data design. Many of these skills fall outside the scope of data center training, but it’s worthwhile identifying the database gurus on your team; DBMS and RDBMS skills adapt well to big data development.
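To give your team a feel for the MapReduce model mentioned above, here is a minimal in-process sketch in Python using word count, the canonical example. Real frameworks such as Hadoop distribute the map and reduce tasks across a cluster; this only illustrates the programming model:

```python
# Minimal single-process sketch of the MapReduce pattern (word count).
from collections import defaultdict
from itertools import chain

def map_phase(doc: str):
    # Map: emit (key, value) pairs -- here, (word, 1) for each word in the document.
    for word in doc.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values -- here, summing the counts.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big clusters", "data centers store big data"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
print(counts["big"], counts["data"])  # prints: 3 3
```

The appeal of the model is that the map and reduce functions are independent per key, so the framework can scale them across hundreds of data center nodes without changing the analyst’s code.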
Where do you see the skills-gap in your data center team? Is there one area where they need extra training to be ready for big data?