Big data gives organizations the ability to assemble and analyze vast amounts of information for better business intelligence and decision-making. Unfortunately, with big data come concerns about security and privacy. Much of the information collected includes “toxic data”: data that, if compromised or stolen, would create real problems for the organization. Any big data security and privacy program has to include strategies to protect toxic data such as credit card numbers, PINs, social security numbers, and sensitive intellectual property.
Some 2.5 quintillion bytes of data are created every day, including sensitive information that cyber-criminals can turn into profit. The estimated cost of a security-related data breach is about $40 million. The more data a project requires, the bigger the threat to big data security and privacy.
Most big data projects use data sources both inside and outside the organization, so big data security and privacy concerns are compounded. Not only do you have to look after your own company’s data, you also have to make sure that data provided by customers, partners, or vendors isn’t compromised. There may be legal liabilities as well. For example, analyzing Twitter traffic may provide great insight into customer attitudes but may also be considered a breach of privacy in some European countries. Big data security and privacy are covered by 50 separate industry and legal mandates worldwide.
The Cloud Security Alliance’s Big Data Working Group has developed a list of 10 big data security and privacy challenges to be aware of:
- Secure computations in distributed frameworks – To handle big data computations, distributed frameworks such as MapReduce rely on parallel processing and storage, and the input data is split into chunks. A mapper reads and processes each chunk and generates a list of key/value pairs. An untrusted or compromised mapper can alter the data or sniff private information, such as customer transactions.
- Security best practices for non-relational data stores – NoSQL databases have become very popular for handling non-relational data, but they were built for scale and analytics, with security as an afterthought. As a result, security and privacy controls are usually implemented in middleware rather than in the database itself.
- Secure data storage and transactions logs – Data and transaction logs are stored on multi-tiered media, and as the volume of data grows, moving it between tiers becomes too much to manage manually. Auto-tiering storage solutions do not keep track of where the data is stored, which creates a challenge for data security.
- End-point input validation/filtering – Big data systems collect data from a wide variety of end points, including employee-owned (BYOD) devices. Since IT cannot control all the end points, there has to be a methodology to validate end points and filter their input to prevent malicious data from being introduced.
- Real-time security monitoring – One of the benefits of big data is the ability to monitor infrastructure and data in real time to catch events such as fraudulent activity. The problem is that inspecting real-time data generates a large number of alerts, including many false positives; the more data in the system, the more false positives.
- Scalable and composable privacy-preserving data mining and analytics – Anonymizing user data for analytics is not enough on its own; users can often be re-identified by combining the attributes that remain with other data. Guidelines need to be implemented to prevent privacy disclosures.
- Cryptographically enforced data-centric security – Historically, enterprise security has focused on securing the underlying infrastructure from attack. With big data, the infrastructure is so widespread and virtualized that it’s difficult to protect. The experts are leaning toward encrypting the data itself to provide end-to-end protection.
- Granular access control – Implementing more granular access control for big data security and privacy has two benefits: a) it allows IT to maintain secrecy by preventing unauthorized access, and b) it helps prevent data useful for analytics from being inadvertently locked away under a more restrictive category.
- Granular audits – In addition to real-time detection, you need detailed auditing of security incidents. Not only do audits reveal what went wrong, they are also useful for regulatory compliance and forensics.
- Data provenance – Provenance metadata reveals where data came from and what other data it depends on. Analyzing those dependencies to uncover security and privacy restrictions can be computationally intensive. For example, detecting insider trading requires very fast algorithms to assess time-sensitive transactions.
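To make the point about anonymization in the list above more concrete, here is a minimal Python sketch of a k-anonymity check, one common way to measure how easily records can be re-identified. It is an illustration under stated assumptions, not a method prescribed by the Cloud Security Alliance: the records, the field names (`zip`, `birth_year`, `gender`), and the helper functions are all hypothetical. A data set is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    return smallest_group(records, quasi_identifiers) >= k


def generalize(record):
    """Coarsen a record: keep a 3-digit ZIP prefix and drop the birth year."""
    return {"zip": record["zip"][:3], "gender": record["gender"]}


# Hypothetical "anonymized" records: names are gone, but ZIP code,
# birth year, and gender remain and can act as a fingerprint.
records = [
    {"zip": "02138", "birth_year": 1975, "gender": "F"},
    {"zip": "02138", "birth_year": 1975, "gender": "F"},
    {"zip": "02139", "birth_year": 1982, "gender": "M"},
    {"zip": "02140", "birth_year": 1990, "gender": "M"},
]

# Two of the four people are unique on (zip, birth_year, gender).
unique_risk = smallest_group(records, ["zip", "birth_year", "gender"])

# After generalizing, every (zip prefix, gender) pair covers two people.
coarse = [generalize(r) for r in records]
coarse_ok = is_k_anonymous(coarse, ["zip", "gender"], k=2)
```

In this toy data set, removing names still leaves two people who are unique on ZIP code, birth year, and gender, so anyone holding an outside list with those three fields could single them out. Generalizing the fields raises the smallest group size to two, at the cost of less precise data for analytics.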
These are the major issues the experts have identified for big data security and privacy. How you address them is another matter. Depending on your big data applications, you can apply various tactics, but because of its scale and its reliance on real-time monitoring, big data requires new security strategies. If you need help determining how to implement those strategies, we are always here to help.
What do you see as your greatest challenge in dealing with big data security and privacy?