Before you enter into any big data engagement, take inventory of your big data skills. Big data is part of enterprise technology, but how the data is stored, managed, and analyzed requires specialized expertise. Big data skills can be hired, rented, or taught, but before you go looking for expertise, have a clear understanding of the big data skills you need and the big data skills you may already have available.
The market pundits have been predicting a shortage of big data talent for some time. A report issued by Gartner in 2012 predicted that 4.4 million IT jobs would be created worldwide to support big data, including 1.9 million jobs in the United States. McKinsey predicts that big data will be one of the five “game changers” to boost national GDP by 2020, and will represent as much as $325 billion. McKinsey also predicts that by 2018 the United States could face a shortage of 140,000 to 190,000 people with sufficient analytical skills to mine big data, as well as a shortfall of 1.5 million managers with the big data skills to interpret findings and make effective decisions.
So with a shortage of big data skills that only promises to get worse, where are you going to find the big data talent you need? If you can’t hire the talent, you will either have to rent it and pay the consulting fees, or find skilled in-house experts who can adapt their knowledge and acquire big data skills.
What It Takes to Manage Big Data
Big data is a team effort, so big data experts will come from all parts of the company. An interdisciplinary team representing all those who will benefit from or be affected by big data will set the scope of the big data project and review data sources and potential outcomes. It will be up to the IT experts, however, to apply the hands-on expertise. Those data skills fall into three basic areas: infrastructure, programming, and analytics.
The enterprise architects you probably already have on staff can use their understanding of data storage and high-speed data delivery to sharpen their big data skills. To support analytics, you need to develop an enterprise infrastructure that can store massive amounts of data (not just terabytes but petabytes). The storage required for high-speed big data processing isn’t usually network-attached storage (NAS) or storage area networks (SANs) but direct-attached storage (DAS) scattered across clustered computing nodes. Big data systems have to be scalable, so that storage can readily be added either within the enterprise or in the cloud. They also have to be designed for high-speed I/O, parallel processing, virtualization, and high throughput.
The programmers help manage the stored data streams. Although big data is too big to be handled by standard database management systems, whether a conventional DBMS, a relational RDBMS, or an object-relational ORDBMS, those database programming skills are still useful. The structured data from DBMS and RDBMS databases has to be extracted and tagged for analysis. Unstructured data is also part of the mix: as much as 80 percent of business data is unstructured, in the form of Word files, spreadsheets, audio, video, and external social media. Database programming skills adapt well to big data NoSQL platforms like MongoDB, Cassandra, Solr, Redis, and Neo4j.
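To make "extracted and tagged for analysis" a little more concrete, here is a minimal sketch of the idea in Python. It assumes a small in-memory SQLite table standing in for an enterprise RDBMS; the table name `customers`, the helper `rows_to_documents`, and the `_source` tag are all hypothetical names chosen for illustration. The output is a list of plain dictionaries of the kind a document-oriented NoSQL store could accept.

```python
import json
import sqlite3


def rows_to_documents(conn, table, source_tag):
    """Read every row of `table` and return dicts tagged with their origin.

    `table` is interpolated into the SQL, so it must come from trusted code,
    never from user input. This helper is illustrative, not a library API.
    """
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    cursor = conn.execute(f"SELECT * FROM {table}")
    documents = []
    for row in cursor:
        doc = dict(row)              # one relational row -> one document
        doc["_source"] = source_tag  # tag records where the data came from
        documents.append(doc)
    return documents


# Demo data: an in-memory database standing in for a real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, region) VALUES (?, ?)",
    [("Acme", "west"), ("Globex", "east")],
)

docs = rows_to_documents(conn, "customers", "crm")
print(json.dumps(docs, indent=2))
```

The point for staffing purposes is that nothing here is exotic: a programmer who already knows SQL and one scripting language has most of what this step needs.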
Finding Data Scientists
While database programmers can fill some of the gap in converting data for analysis, the big burden is going to fall to the data scientist.
Data science is a relatively new discipline: the term was coined in 2008 by D.J. Patil and Jeff Hammerbacher when they were working for LinkedIn and Facebook, respectively. Although there is no degree program or formal training in data science, at least not yet, those with a background in statistics and mathematics are well suited to the discipline.
Data scientists are responsible for building and managing the algorithms that deliver big data insights. Apache Hadoop has become the most common big data framework, and big data scientists are adept at using Hadoop for distributed file processing, as well as using open source tools, cloud computing resources, and data visualization tools. If you can’t find data scientists for these tasks, you can try enlisting programmers with a mathematics background, or statisticians who understand business issues.
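For readers who have not seen the programming model associated with Hadoop, the following is a rough sketch of the map, shuffle, and reduce pattern in plain Python. It does not use Hadoop or any of its APIs, and it runs in a single process; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and exist only for this illustration. In a real cluster, each chunk would be handled on a different node and the framework would take care of moving the intermediate pairs between them.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined pairs."""
    mapped = []
    for chunk in chunks:  # in a cluster, each chunk would go to its own node
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


# Two chunks stand in for blocks of a much larger distributed file.
chunks = ["big data needs big storage", "data scientists mine data"]
counts = word_count(chunks)
print(counts)
```

A programmer with a mathematics background can usually pick up this way of decomposing a problem quickly, which is one reason the fallback hiring options above are realistic.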
Big data management is still a relatively new discipline, so big data skills and best practices are still evolving. Many existing IT skills can be readily applied to big data, while others will have to be acquired. Have you taken a hard look at your big data staffing needs? How do you plan to fill the gaps?