The Proliferation of Open Government Licences

As part of my work exploring compatibility between open data licences, I recently began looking at licences for government and public sector data. I was a little surprised to discover quite how many different licences and licence variants have been created.

UK Open Government Licences

The UK Open Government Licence (UK-OGL) is now used to support the publication of the majority of public sector open data in the UK. Based on feedback from the community, the original version of the licence (UK-OGL-1.0) has been updated to version 2 (UK-OGL-2.0) which, among other improvements, clarifies its compatibility with the Creative Commons Attribution licence.

But not all public sector data in the UK uses this licence. The Ordnance Survey have their own OS Open Data Licence that is derived from the Open Government Licence. Originally based on version 1, it was recently updated to version 2. The licence adds some guidance on attribution and a clause on sub-licensing, but it is unclear what this clause actually covers.

Natural England also use variants of both the Open Government and OS Open Data Licences. Their simplest licence, NE-OGL, is based on UK-OGL Version 1 but adds an attribution statement. Their other licences, NE-OS and NE-OS-LNR, are variations of the first version of the OS Open Data Licence. The two have slightly different attribution statements but share the same sub-licensing clause.

Note: a previous version of this post wrongly suggested that the Natural England licences were versions of the same licence, when in fact they are different licences used for different types of datasets. See comments for discussion.

Data publishers have also created other “anonymous” copies of the UK-OGL by copying the text of the licence to the terms and conditions page of their website. See, for example, Tunbridge Wells Borough Council and the Health & Social Care Information Centre (HSCIC). In both cases those pages include a full copy of the text of UK-OGL Version 1, although you’d have to read through it in detail to confirm that. In fact the HSCIC have modified the licence to include their own attribution statement.

To add a little more confusion, the references to the original copy of the licence on those pages use a URL that automatically redirects to the latest version of UK-OGL (Version 2), not the version they’ve chosen to apply (Version 1). It would have been more useful to have simply linked directly to the text of the version being applied.
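This kind of mismatch is easy to detect once you know where a link ends up. Here is a minimal sketch in Python that compares the version a publisher declares with the version named in the URL their link resolves to. It assumes the version number appears in the URL path as ".../version/<n>/"; that convention, the function names and the example.org URL are illustrative assumptions, and the resolved URL is assumed to have already been obtained by following the redirect.

```python
import re
from typing import Optional

# Assumed URL convention: the version number appears as ".../version/<n>/".
VERSION_IN_URL = re.compile(r"/version/(\d+)/?$")


def version_from_url(url: str) -> Optional[int]:
    """Return the version number found at the end of a licence URL, if any."""
    match = VERSION_IN_URL.search(url)
    return int(match.group(1)) if match else None


def link_matches_declared_version(declared: int, resolved_url: str) -> bool:
    """True if the resolved licence URL names the version the publisher declared."""
    return version_from_url(resolved_url) == declared


# Hypothetical publisher page: the text declares Version 1,
# but the licence link redirects to a Version 2 URL.
resolved = "http://example.org/doc/open-government-licence/version/2/"
print(link_matches_declared_version(1, resolved))
```

A check like this only works if licence URLs carry a version at all, which is one more reason to prefer stable, versioned links.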

Canadian Open Government Licences

The UK Open Government Licence has also been used as a basis for creating licences in other jurisdictions, most notably in Canada. Unfortunately this has led to the creation of yet more variants.

The CIPPIC Licensing Information Project for Open Licences has some great tools for exploring open licences, including tools for visualising relationships between licences and comparing their text.

For example, here is a diagram that shows some of the family of licences derived from the UK-OGL. Some of these are derived from Version 1 whilst others are derived from Version 2.

Canada also has a whole other family of licences used in various city data portals. The “G4 Canada family” consists of a number of variants of a core licence originally created by Vancouver.

A review of the creation and spread of the G4 licences suggests that some cities are moving towards adopting variants of the UK-OGL.

For example, the Toronto licence is a variant of the Ontario licence, and notes that:

The only substantive changes in this licence are to replace direct references to the Province of Ontario with the City of Toronto and the inclusion of a provision for the Ontario Personal Health Information Protection Act, 2004.

The County of Grande Prairie licence explains that:

The only substantive changes in this licence are to replace direct references to the Province of Alberta with the County of Grande Prairie.

Creating a new licence simply to add some branding seems entirely unnecessary. Updating a licence to include references to local legislation may be reasonable, but is still arguably unnecessary: regardless of the licence text you still need to conform to local laws, e.g. on data protection. An open data licence can’t give you a pass to ignore relevant legislation. It’s debatable whether the best place to include references to such legislation is in the licence text or in supporting documentation.

Curbing Licence Proliferation

Clearly there are some repeated patterns here. Standard licences are being forked, and their text (sometimes modified) is being copied around, for various reasons, including:

  • to replace generic attribution requirements with a specific attribution statement desired by the publisher
  • to localise the licence to refer to local legislation
  • to add branding, e.g. logos and names of cities or jurisdictions

This is all adding largely unnecessary overhead for data consumers who need to understand whether any of these changes are relevant, often by reviewing the licence text. Licence proliferation adds complexity for re-users, especially if licences may be incompatible.
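Some of that review can be automated by comparing a publisher's copy against the standard text. The following is a rough sketch using Python's standard difflib module. The two texts are invented placeholders rather than real licence wording, and the function name is my own.

```python
import difflib
from typing import List


def changed_lines(standard_text: str, copied_text: str) -> List[str]:
    """List the lines removed from ("- ") or added to ("+ ") the standard text."""
    diff = difflib.ndiff(standard_text.splitlines(), copied_text.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; keep only real changes.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Placeholder texts standing in for a standard licence and a publisher's copy.
standard = (
    "You must acknowledge the source of the Information.\n"
    "You may copy and adapt the Information."
)
publisher_copy = (
    "You must acknowledge the source using the statement: (c) Example Council.\n"
    "You may copy and adapt the Information."
)

for line in changed_lines(standard, publisher_copy):
    print(line)
```

A diff only shows that the text has changed. Deciding whether a change affects compatibility still needs a human reader.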

Open source software also suffers from a proliferation of licences, but these tend to be completely independent licences. There’s a danger that the open data community might end up in a worse situation with dozens of slightly different licences based on slightly different versions of a few core licences.

Needless proliferation doesn’t help anyone. Now would be a good time for the community to provide feedback to publishers to try and curb further proliferation. It would be far better if data publishers were to:

  1. provide a clear stable link to a standard (country-level) licence, not complete copies of licences
  2. clearly indicate which version of a licence they are using, or note that they are intentionally tracking the latest version
  3. supplement links with guidance on the context for data re-use, specifically their preferred attribution statement and pointers to relevant legislation

The ODI has already published some guidance for publishers on how to publish rights statements in a clear, machine-readable format.
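To illustrate the general idea, here is a small Python sketch of what a machine-readable rights statement might carry, based on the three recommendations above. The field names, URLs and attribution text are hypothetical and are not taken from the ODI guidance; a real statement should follow whatever vocabulary that guidance specifies.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RightsStatement:
    """Hypothetical rights statement; field names are illustrative only."""
    licence_url: str            # stable link to the standard licence
    licence_version: str        # the version the publisher has chosen to apply
    attribution_text: str       # the publisher's preferred attribution statement
    legislation_urls: List[str] = field(default_factory=list)  # relevant local law


statement = RightsStatement(
    licence_url="http://example.org/licence/version/2/",
    licence_version="2.0",
    attribution_text="Contains public sector information from Example Council.",
    legislation_urls=["http://example.org/legislation/data-protection"],
)

# Serialise to JSON so the statement can be published alongside the dataset.
print(json.dumps(asdict(statement), indent=2))
```

Keeping the attribution statement and legislation pointers in the rights statement means the licence text itself can stay untouched.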

If a custom licence is deemed to be necessary then this should have:

  1. a stable URL for each version of the licence, and a pointer to the latest version of the document
  2. a clear indication of the version of the document, e.g. a version number or publication date, in the body of the licence
  3. a statement about compatibility of the licence with common standard licences
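The first of these points can be sketched from the publisher's side as follows. This is a minimal Python illustration, assuming a ".../version/<n>/" URL scheme and a hypothetical LicenceVersions class. It shows one way to keep a stable URL per version alongside a pointer to the latest, and is not a description of how any existing licence is hosted.

```python
from typing import Dict, Tuple


class LicenceVersions:
    """Hypothetical registry: one stable URL per version, plus a 'latest' pointer."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._versions: Dict[Tuple[int, ...], str] = {}

    def publish(self, version: str) -> str:
        """Register a version such as "2.0" and return its stable URL."""
        key = tuple(int(part) for part in version.split("."))
        url = f"{self.base_url}/version/{version}/"
        self._versions[key] = url
        return url

    def latest(self) -> str:
        """Return the URL of the highest version (raises ValueError if none published)."""
        return self._versions[max(self._versions)]


registry = LicenceVersions("http://example.org/custom-licence")
registry.publish("1.0")
registry.publish("2.0")
registry.publish("1.1")  # a later fix to an old version does not become "latest"
print(registry.latest())
```

Publishers who want to pin a version link to the per-version URL, while those who intentionally track the latest version link to the pointer.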

Obviously, custom licences should conform to the Open Definition and ideally be passed through the licence approval process.

The above recommendations would help stop some of the anti-patterns that are leading to licence proliferation and help the community clearly understand how data can be re-used.