‘Reasonably anonymised’ data is off the hook when it comes to GDPR. But how can organisations deal with the complexity of anonymisation, and the risks of re-identification down the line? And what does ‘reasonably’ mean anyway?
In this post, the ODI's Head of Technology Olivier Thereaux shares current work in this area, and asks for your input on where we head next.
Author: Olivier Thereaux, Head of Technology
Earlier this year, the EU General Data Protection Regulation (GDPR) came into force. As everyone was busy batting away a hailstorm of emails seeking renewed consent for data collection, our team started looking more closely at one of the interesting edges of the fledgling law: anonymisation.
GDPR is all about personal data. In other words, data that is not personally identifiable is out of scope, and anonymisation is a (figurative, if not literal) get-out-of-jail card. Or, in the words of the regulation’s 26th recital: ‘the principles of data protection should therefore not apply to anonymous information’ and ‘this Regulation does not therefore concern the processing of such anonymous information’.
Meanwhile, the UK Parliament was working through its implementing legislation, the Data Protection Act 2018, which states that: ‘It is an offence for a person knowingly or recklessly to re-identify information that is de-identified personal data without the consent of the controller responsible for de-identifying the personal data’.
The choice to make re-identification an offence was not without controversy, as it may have a chilling effect on crucially important security research. But even for the typical data practitioner in an organisation, these twin pieces of legislation can be read in several ways:
- Am I at risk of unwittingly being complicit in a crime if I anonymise poorly and someone re-identifies personal data as a result?
- Is this law intended to nudge me to think more seriously about anonymisation, and in this case is the legal protection against hostile re-identification intended to give me peace of mind?
- Or, more subtly, is this a perverse incentive to not worry too much about anonymising appropriately, since the act of willingly breaking poor anonymisation becomes someone else’s problem?
And that’s only looking at the legal framework! As we have seen time and time again on the news, data-related scandals often have more to do with breaking trust than breaking the law.
If we were confused, others may have been too. Scared, even, perhaps.
This is particularly important to us as we aim to create a healthy data ecosystem where data is as open as possible. What does the above mean for open data? What about shared data? If we are to help increase access to data while retaining trust, managing the risk of re-identification needs to be high on our list.
Where next?
Before we started our research in earnest, we turned to experts to help us get our facts straight. The great people at Eticas gave us a crash course on anonymisation: the good, the bad and the ugly; the actors and the techniques; and how to place the strength of anonymisation and its converse, utility (how much usable information remains after anonymisation), on a spectrum. And we do like data spectrums.
This preliminary research helped us confirm our intuition that different types of data in different parts of the data spectrum come with massively varying challenges, and require very different approaches to data protection in general, and anonymisation in particular.
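To make that trade-off concrete, here is a minimal, hypothetical sketch in Python (a toy illustration, not a method recommended by Eticas or by us). It coarsens two quasi-identifiers, age and postcode, and measures k-anonymity: the size of the smallest group of records sharing the same quasi-identifier values. The more we coarsen, the stronger the anonymisation, and the less usable detail remains.

```python
from collections import Counter

# Toy records: (age, postcode) act as quasi-identifiers.
records = [
    (31, "EC1A"), (36, "EC1V"),
    (52, "N1"),   (58, "N7"),
    (71, "SW2"),  (74, "SW9"),
]

def k_anonymity(rows):
    """Size of the smallest group of records sharing identical
    quasi-identifier values. Higher k means stronger anonymisation."""
    return min(Counter(rows).values())

def generalise(rows, age_band, postcode_chars):
    """Coarsen the quasi-identifiers: band ages, truncate postcodes.
    More coarsening means stronger anonymisation, but less utility."""
    return [((age // age_band) * age_band, postcode[:postcode_chars])
            for age, postcode in rows]

# Walk along the spectrum, from full detail to near-total suppression.
for age_band, postcode_chars in [(1, 4), (10, 1), (100, 0)]:
    coarse = generalise(records, age_band, postcode_chars)
    print(f"age band={age_band:>3}, postcode chars={postcode_chars}: "
          f"k = {k_anonymity(coarse)}")
```

At one end of the spectrum every record is unique (k = 1) and the data is highly useful but barely protected; at the other, nothing distinguishes anyone, and little useful information is left. Real decisions sit somewhere in between, which is exactly why context matters.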
Armed with this knowledge, we are now in the midst of a series of interviews with data controllers across the public and private sectors. In this first phase of research, we want to better understand where fear or confusion may lie, and where we may be able to create new resources to complement and improve on existing ones, such as the excellent (but somewhat weighty) guidance from the Information Commissioner's Office (ICO) or its anonymisation code of practice, or the Anonymisation Decision-making Framework from the UK Anonymisation Network (UKAN).
In particular, we are still in the process of understanding and deciding:
- As the ICO updates its anonymisation code of practice and UKAN its decision-making framework, what kind of companion resource could we make available to help make those in-depth documents more accessible?
- Can we help reduce confusion about approaches to anonymisation by looking at how context differs along the data spectrum? Can we explore how the notions of utility and strength of anonymisation fit on the data spectrum? Are there specific approaches that work well for shared data but not open data, and vice versa?
- Can we better understand the relationship between our Data Ethics Canvas and Privacy Impact Assessments? Should we look into why, and how, organisations are using these and thinking about ‘Data Ethics Impact Assessments’?
- Building on our recent work on tools and guidance for data publishers, should we look into developing or improving open source tools and automation for more reliable anonymisation, for example as new plugins for Lintol? (A rough sketch of the kind of check such a tool might run appears after this list.)
- … and finally, what can we learn by observing or helping organisations currently working through this process?
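On the question of tooling: to give a flavour of what automated checks could look like, here is a rough, hypothetical sketch in Python. It does not use Lintol's actual plugin API, and the file name in the usage comment is made up; it simply scans a CSV file and flags columns whose values are mostly distinct, a crude signal that a column may act as a direct or quasi-identifier and deserves attention before the data is shared or opened.

```python
import csv
import sys

def uniqueness_report(path, threshold=0.9):
    """Report, for each column, the share of distinct values; flag columns
    above the threshold as potential direct or quasi-identifiers."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    report = {}
    for column in (rows[0].keys() if rows else []):
        values = [row[column] for row in rows if row[column]]
        if not values:
            continue
        distinct_ratio = len(set(values)) / len(values)
        report[column] = (distinct_ratio, distinct_ratio >= threshold)
    return report

if __name__ == "__main__":
    # Usage (hypothetical file name): python uniqueness_check.py survey_responses.csv
    for column, (ratio, flagged) in uniqueness_report(sys.argv[1]).items():
        marker = "!" if flagged else " "
        print(f"{marker} {column}: {ratio:.2f} of values are distinct")
```

A real tool would need to go much further, for example looking at combinations of columns, rare values and the context in which the data will be shared, but even simple checks like this could make it harder to anonymise poorly by accident.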
Learning from your experience
This list is still in flux, however, and we welcome thoughts on these ideas and suggestions on other helpful outputs.
Moreover, if you are a data practitioner at an organisation that is considering, or in the process of, opening or sharing anonymised data, we would love to hear from you and arrange a short interview, and perhaps discuss the possibility of taking part in a pilot project where we observe and help guide the process.
Get in touch: [email protected]
Photo by Greyson Joralemon on Unsplash