Big data projects have proven they can yield insights that translate into more revenue and greater efficiency. But with big data projects come bigger security issues: the sheer volume of data adds to security concerns, and the projects themselves create new risks for enterprise networks. That’s why security has to be part of your big data skills.
Big data is being harnessed for fraud detection and for detecting cyber-attacks. However, the process of gathering big data for analytics brings its own security headaches. Hackers often set their sights on big data repositories because that’s where the crown jewels may be buried. Storing sensitive customer information in big data repositories makes sense if you want to learn more about your customers, but by breaking into the big data pool hackers can access customer data, employee data, trade secrets and more. Consider the Target security breach: Target is estimated to have lost $1.1 billion to it. What would happen if the same data breach affected a major healthcare provider or financial services company?
Securing big data is not all that different from securing any enterprise data, but there are some differences: securing data collection and aggregation, monitoring the infrastructure that stores the data, and managing the technologies needed to handle structured and unstructured data. Here are some specific big data skills to consider for big data security:
Securing the Data
Secure access and management of enterprise data becomes more challenging as more data reservoirs have to be tapped for big data projects. Each data source is going to have its own access protocols and security policies, so as part of your big data skills you need to be able to secure every data pipe as needed. You may have different sources containing proprietary research, personally identifiable information (PII), or datasets subject to regulatory compliance. Big data skills include being able to balance security with analytics requirements on a case-by-case basis.
Every new data source is also a potential point of entry for cyber attackers. Data sources may use different repositories, and each may create its own data transfer workflow, so each one offers a potential line of attack that has to be secured.
Securing the Distributed Infrastructure
Most IT professionals design their data centers to secure one or two high-end data servers. Big data uses virtual resources, and that means distributed data repositories, each residing on its own hardware with its own security protocols.
When you have big data sources scattered across different enterprises and geographies, it’s hard to standardize security configurations, which means managing another point of vulnerability.
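One way to picture the standardization problem is a simple drift check that compares each node's settings against a shared baseline. The following is only a minimal sketch: the setting names, node names, and the `find_drift` helper are hypothetical and not taken from any particular product.

```python
from typing import Any, Dict

# Hypothetical security baseline; the setting names are illustrative only.
BASELINE: Dict[str, Any] = {
    "encrypt_in_transit": True,
    "require_authentication": True,
    "audit_logging": True,
}

# Hypothetical per-node configurations collected from distributed repositories.
NODES: Dict[str, Dict[str, Any]] = {
    "node-a": {
        "encrypt_in_transit": True,
        "require_authentication": True,
        "audit_logging": True,
    },
    # node-b disables encryption and never sets audit_logging at all.
    "node-b": {
        "encrypt_in_transit": False,
        "require_authentication": True,
    },
}


def find_drift(
    baseline: Dict[str, Any], nodes: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Return, for each non-compliant node, the settings that differ from the baseline.

    A setting that is missing on a node is reported with the value None.
    """
    drift: Dict[str, Dict[str, Any]] = {}
    for node, config in nodes.items():
        diffs = {
            key: config.get(key)
            for key, expected in baseline.items()
            if config.get(key) != expected
        }
        if diffs:
            drift[node] = diffs
    return drift


if __name__ == "__main__":
    for node, diffs in find_drift(BASELINE, NODES).items():
        print(f"{node} deviates from baseline: {diffs}")
```

In practice the per-node settings would come from whatever configuration management tooling you already run; the point is only that a single baseline makes deviations visible.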
Open Source is Less Than Secure
When big data tools like Hadoop and NoSQL databases were developed, the focus was on analyzing vast amounts of information. Security was not a consideration.
Early versions of Hadoop didn’t authenticate services or users, and didn’t encrypt data transmitted between network nodes for processing. Similarly, NoSQL databases lacked some of the security features of other database technologies, such as role-based access control. NoSQL was designed so you could add new data types on the fly, but defining security for those data types was an afterthought.
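For readers unfamiliar with role-based access control, the idea is that permissions attach to roles and users only acquire permissions through the roles they hold. Here is a minimal sketch of that idea; the role names, user names, and the `is_allowed` helper are all hypothetical and do not reflect the API of any specific database.

```python
from typing import Dict, Set

# Hypothetical roles and the actions each role permits.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}

# Hypothetical assignment of users to roles.
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"analyst"},
    "bob": {"engineer"},
}


def is_allowed(user: str, action: str) -> bool:
    """Return True if any role held by the user permits the action.

    Unknown users and unknown roles are denied by default.
    """
    roles = USER_ROLES.get(user, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(is_allowed("alice", "read"))
    print(is_allowed("alice", "write"))
```

The deny-by-default behavior is the design choice worth noting: a new data type or a new user gets no access until someone assigns a role, which is the opposite of treating security as an afterthought.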
Best Practices to Secure Big Data
When thinking about how to hone your big data skills for better security, think about how you might apply traditional database management security to big data:
- Secure Application Software – Hadoop, NoSQL, and other open source platforms have secure versions. Use technologies like Apache Accumulo or secure versions of Hadoop. Also consider adopting add-on technologies that offer enhanced security at the application layer, like Cloudera Sentry, to provide access control security for both Hadoop and NoSQL.
- Monitor and Audit Activity – There are logging technologies you can use to monitor big data clusters, such as the audit logs built into Hadoop and the job logs kept by a workflow scheduler like Apache Oozie. Assign an in-house expert to examine log files and monitor for suspicious activity. Be sure to implement logging and auditing across the enterprise.
- Secure Hardware and Software Configuration – Be sure the configurations for all your enterprise hardware are current and patches are up to date. Consider automating configuration to ensure that all big data servers are secure and configurations are uniform.
- Account Management and Monitoring – Use security best practices such as strong password controls to manage user access. Be sure to deactivate inactive accounts and lock out users after a set number of failed log-in attempts. The objective is to prevent cyber-crooks from getting access to the big data cluster. Monitoring accounts closely can help you stop a security breach before it starts.
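The account management practices above (locking out users after repeated failed log-ins and deactivating inactive accounts) can be sketched in a few lines. This is an illustrative sketch only: the `Account` class, the threshold of five attempts, and the 90-day inactivity limit are assumptions chosen for the example, not recommendations from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed policy values, chosen only for illustration.
MAX_FAILED_ATTEMPTS = 5
INACTIVITY_LIMIT = timedelta(days=90)


@dataclass
class Account:
    """Hypothetical account record for a big data cluster user."""
    username: str
    last_login: datetime
    failed_attempts: int = 0
    locked: bool = False
    active: bool = True


def record_login(account: Account, success: bool, now: datetime) -> bool:
    """Record a log-in attempt and return True only if access is granted."""
    # Locked or deactivated accounts are refused even with valid credentials.
    if account.locked or not account.active:
        return False
    if success:
        account.failed_attempts = 0
        account.last_login = now
        return True
    account.failed_attempts += 1
    if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
        account.locked = True
    return False


def deactivate_if_inactive(account: Account, now: datetime) -> bool:
    """Deactivate the account if it has been idle too long; return its active flag."""
    if now - account.last_login > INACTIVITY_LIMIT:
        account.active = False
    return account.active


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    user = Account("alice", last_login=now)
    for _ in range(MAX_FAILED_ATTEMPTS):
        record_login(user, success=False, now=now)
    print(f"{user.username} locked: {user.locked}")
```

A real deployment would enforce these rules in the identity system the cluster already uses rather than in application code, but the logic being enforced is the same.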
As you can see, the big data skills needed to secure your infrastructure aren’t very different from conventional enterprise security. It’s mostly a matter of understanding the weaknesses in big data frameworks and how to apply preventive security practices. What do you see as the biggest security risk in big data?