Security Considerations with Big Data
While companies are very concerned about the security and governance of their data in general, big data initiatives come with certain complexities and unforeseen issues that many companies are not prepared to handle.
Big data analysis often draws on a vast array of data sources, many of which may be unvetted. Your organization also needs to know which security and governance policies apply to each of those sources.
Your organization might be trying to determine the importance of large amounts of new data culled from many different unstructured or semi-structured sources. Does your newly sourced data contain protected health information (PHI) covered by the Health Insurance Portability and Accountability Act (HIPAA), or personally identifiable information (PII) such as names and addresses?
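As a rough first pass at answering that question, a script can count text that looks like common identifiers. The following Python sketch is illustrative only: the pattern names and regular expressions are assumptions (US-style Social Security and phone number formats), and pattern matching alone will miss most PHI, such as diagnoses written in free text, so it cannot replace a proper data review.

```python
import re

# Hypothetical patterns for a few common PII shapes (US-centric, illustrative only).
# Real PII/PHI detection needs much broader coverage plus human review.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def scan_for_pii(records):
    """Return {pattern_name: match_count} across an iterable of text records."""
    counts = {name: 0 for name in PII_PATTERNS}
    for text in records:
        for name, pattern in PII_PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return counts


sample = [
    "Contact jane.doe@example.com or 555-123-4567",
    "SSN on file: 123-45-6789",
    "no identifiers here",
]
print(scan_for_pii(sample))  # {'email': 1, 'us_ssn': 1, 'us_phone': 1}
```

Nonzero counts do not prove the data is regulated, and zero counts do not prove it is clean; the scan only tells you where to look more closely.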
Security is something you can never really relax about because the state of the art is constantly evolving. Combining security with governance helps ensure accountability among all parties involved in your information management deployment.
Managing the security of information needs to be viewed as a shared responsibility across the organization. You can implement all the latest technical security controls and still face security risks if your end users don’t clearly understand their role in keeping the data they work with secure.
Assess the risk for big data
Big data is becoming critical to business executives who are trying to set new product direction, understand customer requirements, or gauge the health of their overall environment. However, if data drawn from a variety of sources introduces security risks, the unintended consequences can endanger the company.
You have a lot to consider, and security is a moving target, especially now that big data has entered the data management landscape. Ultimately, education is key.
Risks that lurk inside big data
Security and governance are corporate-wide concerns, but some issues are specific to big data. For example, if you are collecting data from unstructured sources such as social media sites, you have to make sure that viruses or bogus links are not buried in the content. If you make this data part of your analytics system without checking it, you could be putting your company at risk.
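One basic safeguard is to check links in collected posts against a list of known-bad domains before the content reaches your analytics system. The Python sketch below assumes a hypothetical `BLOCKED_DOMAINS` set (in practice it would come from a threat-intelligence feed) and a simple URL regular expression. It only catches links to domains already known to be malicious; it does not scan attachments or linked files for viruses.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist; a real one would be fed by threat intelligence.
BLOCKED_DOMAINS = {"malware.example", "phish.example"}

# Simple URL matcher: http(s) scheme followed by non-space, non-quote characters.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def suspicious_links(text, blocked=BLOCKED_DOMAINS):
    """Return URLs whose host is a blocked domain or a subdomain of one."""
    flagged = []
    for url in URL_RE.findall(text):
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in blocked):
            flagged.append(url)
    return flagged


def filter_posts(posts, blocked=BLOCKED_DOMAINS):
    """Split posts into (clean, quarantined) before loading them for analysis."""
    clean, quarantined = [], []
    for post in posts:
        (quarantined if suspicious_links(post, blocked) else clean).append(post)
    return clean, quarantined
```

Quarantining rather than silently dropping flagged posts keeps them available for review, which matters if the blocklist produces false positives.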
Also consider where this data originally came from. An unstructured source with interesting commentary about the type of customer you are trying to understand may also contain a lot of extraneous noise. You need to know the nature of each data source.
Has the data been verified? Is it secure and vetted against intrusion? The more reputable social media sites, for example, closely watch for patterns of malicious behavior and delete offending accounts before they cause damage. That kind of monitoring requires a level of sophisticated big data analysis that not every site is capable of.
Big data protection options
Some experts believe that different kinds of data require different forms of protection and that, in some cloud environments, data encryption might, in fact, be overkill. You could encrypt everything: for example, when you write data to your own hard drive, when you send it to a cloud provider, and when you store it in a cloud provider's database.
Encrypting everything comprehensively reduces your exposure; however, encryption carries a performance penalty. It also creates a key-management burden: many experts advise managing your own keys rather than letting a cloud provider do so, and that can become complicated. Keeping track of too many keys can be a nightmare.
Managing the storage, archiving, and retrieval of keys is difficult. To ease this problem, generate or compute encryption keys as needed instead of storing every one, which reduces complexity and improves security.
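One way to compute keys as needed is to derive each per-dataset key from a single master key with a key-derivation function, so only the master key has to be stored and protected. The Python sketch below uses HKDF with SHA-256 (RFC 5869), a technique chosen here for illustration rather than one the text prescribes; `dataset_key` and the `dataset:` label are hypothetical names. Production systems should rely on a vetted cryptography library or a key-management service rather than hand-rolled code.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def hkdf_sha256(ikm: bytes, info: bytes = b"", length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA256 output is limited to 255 * 32 bytes")
    # Extract: PRK = HMAC(salt, IKM); an empty salt means HASH_LEN zero bytes.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until long enough.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def dataset_key(master_key: bytes, dataset_id: str) -> bytes:
    """Hypothetical helper: derive a 32-byte key for one dataset on demand."""
    return hkdf_sha256(master_key, info=b"dataset:" + dataset_id.encode("utf-8"))
```

In use, the master key would be generated once (for example with `secrets.token_bytes(32)`) and held in a key-management system. Calling `dataset_key` again with the same master key and a dataset ID such as the hypothetical `"web-logs"` recreates the same key, so derived keys never need to be archived. Rotating the master key, however, changes every derived key, so re-encryption still has to be planned.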
Here are some other available data-safeguarding techniques:
Data anonymization: When data is anonymized, you remove all data that can be uniquely tied to an individual. This technique can protect personal identity, and hence privacy, but you need to be careful about how much information you strip out: remove too little and individuals may still be re-identified; remove too much and the data may lose its analytic value.
Tokenization: This technique protects sensitive data by replacing it with random tokens or alias values that mean nothing to someone who gains unauthorized access to the data. It reduces the chance that thieves can do anything useful with what they steal.
Cloud database controls: With this technique, access controls built into the database protect the database as a whole, so each individual piece of data doesn’t need to be encrypted.
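The first two techniques can be sketched in a few lines of Python. The field names (`name`, `email`, `street_address`, `age`, `zip`), the ten-year age bands, and the three-digit ZIP prefix are assumptions for illustration; coarsening age and ZIP code is one common way to reduce re-identification risk, not a guarantee of anonymity. The `TokenVault` class is a hypothetical in-memory stand-in for a real token vault, which would be a separately secured, persistent service.

```python
import secrets

# Hypothetical direct identifiers; adjust to the real schema.
DIRECT_IDENTIFIERS = {"name", "email", "street_address"}


def anonymize(record):
    """Return a copy without direct identifiers, with age and ZIP coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        low = (out["age"] // 10) * 10
        out["age"] = f"{low}-{low + 9}"  # e.g. 37 -> "30-39"
    if "zip" in out:
        out["zip"] = out["zip"][:3] + "**"  # keep only the ZIP prefix
    return out


class TokenVault:
    """Replace sensitive values with random tokens; the mapping stays in the vault."""

    def __init__(self):
        self._token_for = {}
        self._value_for = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value not in self._token_for:
            token = "tok_" + secrets.token_hex(8)
            while token in self._value_for:  # guard against an unlikely collision
                token = "tok_" + secrets.token_hex(8)
            self._token_for[value] = token
            self._value_for[token] = value
        return self._token_for[value]

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault can recover the original value.
        return self._value_for[token]
```

Because the vault returns the same token for the same value, tokenized columns can still be joined and counted during analysis, while only systems with access to the vault can call `detokenize`.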