Published Guides to Licensing

We have now published two guides to open data licensing to help guide you through the process of opening up and reusing data whilst avoiding any potential legal pitfalls.

They are a publisher's guide and a reuser's guide.

These guides were produced with great help from Francis Davey, Leigh Dodds, Tony Hirst, and the anonymous commenters who gave feedback on the draft versions of the publisher's guide and reuser's guide on Crocodoc.

One major change to the guides from the draft version has been to remove a section that gave technical details about how to link a licence to a piece of content or data in a machine-readable way. This raised more questions than it answered:

  • what if there are separate licences that apply to the content and the data?
  • how should we reference a licence from JSON data?
  • how can we articulate complex attribution requirements in a machine-readable way?

We intend to start a separate piece of work that will unpack the technical issues in specifying licensing requirements for data.

Another area that is intentionally omitted from these guides is specific advice for public sector organisations. There are additional legal constraints on how the public sector publishes its information, including that released under Freedom of Information requests. Our aim is to create a separate guide on this in conjunction with ODUG.

The biggest addition to the drafts was around the question of attribution. Leigh Dodds has recently published a detailed analysis of how we attribute data that highlights some of the issues that reusers can find with overly complex attribution requirements. The guide for publishers now includes advice to keep things simple.

The hardest comments to address were the "what if..." questions about scraping:

  • "What about scraping a collection of webpages that are generated from a database that you don't have access to? If eg each of the rows of a SQL database table are expressed in a templated HTML web page, and someone writes a scraper that recreates the database table, have they taken a copy of the database? Or have they created their own database? Would it depend on how they got the list of pages to scrape?"
  • "This section talks about content and data largely separately. But one common form of republishing/derivation is taking content and turning into data. Scraping is the obvious case here."
  • "What about publishing analyses or the results of processing data? For example, VOSA vehicle recall stats are protected by Crown Copyright. What if I scrape and process the data and produce a list of top car manufacturers who have had product recalls on matters related to brakes. Can I publish that ordered, processed list without having to go through copyright clearance?"
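
To make the first scenario concrete, here is a minimal sketch of what "a scraper that recreates the database table" can look like. Everything in it is an assumption made for illustration: the page template, the field names, the records, and the helper names `scrape_row` and `rebuild_table` are all invented, and the pages are generated in memory rather than fetched from a real site. It shows only the mechanics of turning templated content back into rows of data, and says nothing about whether doing so is lawful.

```python
from html.parser import HTMLParser

# Hypothetical templated page: one database row rendered as a definition list.
PAGE_TEMPLATE = """<html><body><dl>
<dt>make</dt><dd>{make}</dd>
<dt>model</dt><dd>{model}</dd>
<dt>defect</dt><dd>{defect}</dd>
</dl></body></html>"""


class RowParser(HTMLParser):
    """Collects <dt>/<dd> pairs from one page into a dict (one table row)."""

    def __init__(self):
        super().__init__()
        self.row = {}
        self._tag = None  # "dt" or "dd" while inside one of those elements
        self._key = None  # most recent <dt> text, waiting for its <dd>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return  # ignore whitespace and text outside <dt>/<dd>
        if self._tag == "dt":
            self._key = text
        elif self._key is not None:
            self.row[self._key] = text
            self._key = None


def scrape_row(html):
    """Recover one row (as a dict) from one templated page."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.row


def rebuild_table(pages):
    """Turn a list of templated pages back into a list of row dicts."""
    return [scrape_row(page) for page in pages]


# Stand-in for the publisher's database; these records are invented.
source_rows = [
    {"make": "Acme", "model": "Roadster", "defect": "brakes"},
    {"make": "Bolt", "model": "Van", "defect": "airbag"},
]
pages = [PAGE_TEMPLATE.format(**row) for row in source_rows]
table = rebuild_table(pages)
```

In this toy case `table` ends up holding the same rows as `source_rows`, which is exactly why the commenter's question is a hard one: the scraper never touches the original database, yet its output can be indistinguishable from a copy of it.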

One reason these questions are hard to answer is that there has been very little case law (judgements by the courts about what legislation means) that would help to firm up the currently fuzzy boundaries. For example, it is not known for certain what you have to do to have a database right in the first place. Nor is it known exactly what amount of extraction or which types of reutilisation are lawful.

And this is the big reason why open data is helpful for reusers. If data is published as open data, you do not need to consult a knowledgeable lawyer about what you are doing to get the data you need, and then make a judgement about whether you're prepared to take the risk of a court case.

Publishing data as open data significantly lowers the legal overhead for both publishers and reusers, leaving more resources available for everyone to get the most value out of the data itself.