Tuesday, January 13, 2015

Securing Big Data Part 6 - Classifying risk

So now your Information Governance groups consider Information Security to be important you have to then think about how they should be classifying the risk.  Now there are docs out there on some of these which talk about frameworks.  British Columbia's government has one for instance that talks about High, Medium and Low risk, but for me that really misses the point and over simplifies the problem which ends up complicating implementation and operational decisions.

In a Big Data world its not simply about the risk of an individual piece of information, its about the risk in context.  So the first stage of classification is "what is the risk of this information on its own?" its that sort of classification that the BC Government framework helps you with.  There are some pieces of information (The Australian Tax File Number for instance) where their corporate risk is high just as an individual piece of information.  The Australian TFN has special handling rules and significant fines if handled incorrectly.  This means its well beyond "Personal Identification Information" which many companies consider to be the highest level.  So at this level I'd recommend having Five risk statuses

  1. Special Risk - Specific legislation and fines apply to this piece of information
  2. High - losing this information has corporate reputation and financial risk
  3. Medium - losing this information can impact corporate competitiveness
  4. Low - losing this information has no corporate risk
  5. Public - the information is already public
The point here is that this is about information as a single entity, a personal address, a business registration, etc.  That is only the first stage when considering risk.

The next stage is considering the Direct Aggregation Risk this is about what happens when you combine two pieces of information together, do that change the risk.  The categories remain the same but here we are looking at other elements.  So for instance address information would be low risk or public, but when combined with a person that link becomes higher risk.  When looking at corporate information on sales that might be medium risk, but when that is tied to specific companies or revenue it could become a bigger risk.  Also at this stage you need to look at the policy of allowing information to be combined and you don't want to have a "always no" policy.

So what if someone wants to combine personal information with twitter information to get personal preferences?  Is that allowed?  What is the policy for getting approval for new aggregations, how quickly is risk assessed and is business work allowed to continue while the risk is assessed? When looking at Direct Aggregation you are often looking at where the new value will come from in Big Data so you cannot just prevent that value being created.  So setting up clear boundaries of where approval is required (combining PII information with new sources requires approval for instance) and where you can get approval after the fact (sales data with anything is ok, we'll approve at the next quarterly meeting or modify policy).

The final stage is the most complex its the Indirect Aggregation Risk that is the risk of where two sets of aggregated results are combined and though independently they are not high risk the pulling together of that information constitutes a higher level risk.  The answer to this is actually to simplify the problem and consider aggregations not as just aggregations but as information sources in their own right. 

This brings us to the final challenge in all this classification: Where do you record the risk?

Well this is just meta-data, but that is often the area that companies spend the least amount of time thinking about but when looking at massive amounts of data and particularly disparate data sources and their results then Meta-Data becomes key to big data.  But lets look just at the security side at the moment.


Data Type Direct Risk
Customer Collection Medium
Tax File Number Field Special
Twitter Feed Collection Public

and for Aggregations

Source 1 Source 2 Source 3 Source 4 Aggregation Name Aggregation Risk
Customer Address Invoice Payments Outstanding Consumer Debt High
Customer Twitter Locaiton Customer Locations Medium
Organization Address Invoice Payments Outstanding Company Debt Low


The point here is that you really need to start thinking about how you automate this, what tools you need.  In a Big Data world the heart of security is about being able to classify the risk and having that inform the Big Data anomaly detection so you can inform the right people and drive the risk.

This gives us the next piece of classification that is required which is about understanding who gets informed when there is an information breach.  This is a core part of the Information Governance and classification approach, because its hear that the business needs to say "I'm interested when that specific risk is triggered".  This is another piece of Meta-data and one that then informs the Big Data security algorithms who should be alerted.

If classification isn't part of your Information Governance group, or indeed you don't even have a business centric IG group then you really don't consider either information or its security to be important.

Other Parts in the series
  1. Securing Big Data is about layers
  2. Use the power of Big Data to secure Big Data
  3. How maths and machine learning helps
  4. Why its how you alert that matters
  5. Why Information Security is part of Information Governance

No comments: