Data Sharing is Not Open Data
One disturbing trend that we've noticed over the past year is the government justifying data sharing as if it were part of satisfying its wider open data policy.
Data sharing is not open data.
This post explains the difference between data sharing and open data, and how the two overlap.
Data sharing is providing restricted data to restricted organisations or individuals. Access to this data is usually restricted because it is sensitive in some way, either because it is personal or because its general release might cause security problems. Data sharing might be with the individuals that the data is about (midata), within government, or outside government such as to researchers, charities or commercial companies. Examples that we have seen proposed by government are:
- the Department for Education providing more access to the National Pupil Database to data analytics companies
- HMRC providing the full VAT register only to credit reference agencies
- NHS England providing pseudonymised health data to pharmaceutical firms
Open data is providing unrestricted data to everyone. By definition, open data must be available to all without restrictions on what they do with it. The corollary is that there cannot be any legal restrictions on making that data available. For example, personal data can only be open data if it is written in law that it must be published (for example, insolvency notices in the London Gazette) or if the affected individuals have given their permission for that publication.
More data is always more useful (if you know what to do with it).
There are lots of companies, charities and individuals who would benefit if all the data the public sector holds were shared with them, particularly if it were shared only with them. Those benefits have to be balanced against the rights of individuals to have their data protected by government and the risks to individuals and to society of too much data being available (for example, through making fraud easier).
We believe that releasing open data is the right way of achieving that balance. It can satisfy a lot of the demand for data. It ensures a level playing field because everyone receives the same information. And it limits the release of personal data.
When data sharing arrangements are put in place, we recommend using open data to minimise the number of restricted data releases that are made, to make the process transparent, and to ensure that everyone benefits from it. Namely:
- Release open data. Provide aggregate and anonymised information that satisfies the majority of the demands that organisations have for data.
- Document the data and the process to access it. It should be clear what the data holds, and what robust process applicants have to go through to be granted access. See the information on the National Pupil Database as a good example.
- Publish all requests for access as open data. The public has a right to know who is asking for, and being granted, access to their data and for what purpose. When this is transparent, it reassures the public that there is a robust process in place for granting access, and it discourages organisations from requesting data that they don't have a good reason for needing.
- Require the publication of the results of data analysis as open data. The public should benefit from the results of analyses of public data – there should be a requirement for publication of anonymised derived datasets that result from privileged access to the data.
- Monitor repeated requests. When the same data is requested repeatedly, including by different organisations, this is a signal that the analysis should be carried out by the organisation that holds the data and the result then published as open data for all to benefit from.
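Two of these recommendations are mechanical enough to sketch in code: releasing aggregate counts rather than individual records, and spotting repeated requests in a published register of access requests. The Python below is a minimal illustration, not a description of any real government process. The record fields (`dataset`, `organisation`), the function names and the suppression threshold of 5 are all assumptions made for the example, and suppressing small counts is only one part of anonymising data properly.

```python
from collections import Counter, defaultdict

# Assumed threshold: counts smaller than this are withheld, because very
# small groups can make it easier to identify individuals.
MIN_CELL_COUNT = 5


def aggregate_with_suppression(records, key, min_count=MIN_CELL_COUNT):
    """Count records per value of `key`; counts below `min_count` become None."""
    counts = Counter(record[key] for record in records)
    return {
        value: (count if count >= min_count else None)
        for value, count in counts.items()
    }


def repeated_requests(register, min_requests=2):
    """Return datasets requested at least `min_requests` times.

    `register` is a list of dicts with hypothetical keys "dataset" and
    "organisation". The result maps each repeatedly requested dataset to
    the sorted list of distinct organisations that asked for it.
    """
    by_dataset = defaultdict(list)
    for entry in register:
        by_dataset[entry["dataset"]].append(entry["organisation"])
    return {
        dataset: sorted(set(organisations))
        for dataset, organisations in by_dataset.items()
        if len(organisations) >= min_requests
    }
```

A data holder could run something like `repeated_requests` over its published request register, and treat any dataset it returns as a candidate for carrying out the analysis in-house and publishing the result as open data.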
There is lots of data within the public sector that should be open. There is also lots of data that the public sector holds that should not be shared. We should take care not to confuse the two.