
Publishing open data in times of crisis

During times of crisis, decision making needs to be informed by the highest quality, most up-to-date data we have. This can only happen if the data is published in a reusable form: as open data.

In this guide we are going to cover the basics of how to publish open data in times of crisis and highlight some of the best and worst examples from across the world.

This guide is written for:

  • those who hold data that could be beneficial to others in respect of the current crisis
  • those who do not have an existing method of publishing data

We have made the following assumptions:

  • The data has been collected by you/your organisation
  • You have been given approval to publish the data from those accountable for it within your organisation
  • There are no legal or regulatory reasons why you cannot publish the data
  • The data does not contain individual-level data; for example, you don't have a spreadsheet where each row contains information about one person.
  • There are no intellectual property issues with the data, and no third parties hold any intellectual property rights over it.

This document is not legal advice and if you are uncertain about your legal right to publish data you should seek guidance from legal professionals.

Step 1: Put the data online!

There are three ways to do this:

1. Do it yourself

This doesn’t have to be complicated. At its simplest you could upload a file to a file-sharing platform such as Dropbox, Google Drive or OneDrive and share the link with people.

ODI recommends: GitHub – A popular code/data hosting site where you remain in control. If you are new to GitHub, register for an account at github.com and then head to octopub.io, a tool created by the ODI that makes it easy to publish data on GitHub. Log in to Octopub using your GitHub account and follow the step-by-step guide to publishing a dataset.

2. Use an existing open data platform

Many organisations, including most governments, already have an open data platform. Alternatively there are community platforms such as data.world, AWS public datasets and OpenStreetMap that allow for user contributions. These platforms often have specific restrictions on the type or format of data you need to provide, but offer additional functionality for reusers.

3. Get in touch with someone who can help you, or do it for you.

If you are facing challenges, contact us and we’ll either help you get it online or point you to a community who can help you.

If you contact others, remember to ask them to make the data available to everyone, under an open licence (see step 2); be careful not to donate your data to organisations who restrict who can reuse it.

Step 2: License the data openly for others to use

Without a licence, others cannot lawfully reuse your data.

When you put care and thought into creating something, such as writing a blog post, taking a photograph or collecting data, you have certain intellectual property rights over that work. For others to use your work, they must seek permission from you. This permission is best given explicitly in the form of a licence.

There are a number of off-the-shelf open licences available. We recommend using the Creative Commons Attribution Licence. This licence allows a reuser to share data in any medium or format and adapt and build on the material for any purpose, including commercially. It also ensures that you and the data you publish are acknowledged.

Many platforms, including Octopub, have common licences built in, so you don’t have to do anything extra: just select one during your data publishing process.

Embedding a Creative Commons licence on your website:

  1. Open the “choose a licence” page on the Creative Commons website.
  2. Allow adaptations of your work to be shared – YES
  3. Allow commercial uses of your work – YES
  4. Fill in the “help others attribute you” section.
  5. Follow the guide to help you embed the licence alongside the data in your output or website.

We recommend that you allow both adaptations and commercial use of your data to enable the broadest impact of this data release. Changing either of these to NO means you are no longer making data open and restricts who can benefit from it. Read more about the impacts of not allowing adaptations or commercial reuse.

Adding a plain text licence alongside your dataset:

  1. Open Wordpad or a plain text editor and paste in the following text (editing the name of the dataset and the authors):

     <DATASET NAME> (c) by <AUTHOR(S)>

     <DATASET NAME> is licensed under a Creative Commons Attribution 4.0 International License.

     You should have received a copy of the license along with this work. If not, see <http://creativecommons.org/licenses/by/4.0/>.
  2. Save this in a file called “LICENCE” alongside your data files.
  3. [Optional] In the LICENCE file, you can also include the full licence text. The full texts are available here. Make sure you select the text that matches your licence from above, for example BY 4.0 (plaintext).
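If you would rather script this step than use a text editor, the following is a minimal sketch of the same idea in Python. The function names and the example dataset and author names are hypothetical, chosen only for illustration; the notice wording is the one given in the steps above.

```python
from pathlib import Path

# Short licence notice, with placeholders for the dataset name and authors.
LICENCE_TEMPLATE = """{dataset} (c) by {authors}

{dataset} is licensed under a Creative Commons Attribution 4.0 International License.

You should have received a copy of the license along with this work. If not, see <http://creativecommons.org/licenses/by/4.0/>.
"""


def licence_text(dataset: str, authors: str) -> str:
    """Fill in the short licence notice for one dataset."""
    return LICENCE_TEMPLATE.format(dataset=dataset, authors=authors)


def write_licence(folder: str, dataset: str, authors: str) -> Path:
    """Save the notice as a file called LICENCE inside the given folder."""
    path = Path(folder) / "LICENCE"
    path.write_text(licence_text(dataset, authors), encoding="utf-8")
    return path


# Hypothetical example values: print the notice so it can be checked by eye.
print(licence_text("Daily ward occupancy", "Example Health Trust"))
```

Calling write_licence with the folder that holds your data files would then place the LICENCE file alongside them.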

Step 3: Tell people about it!

Data becomes findable when it is well described and when people link to it.

Make the data findable

To make data findable, it needs to be described in a way that lets both humans and machines find it.

Try to include the following clearly alongside the dataset:

  1. A title and description.
  2. The licence under which the data can be reused.
  3. The publishing organisation, with a link to the organisation’s website, including contact details.
  4. Details on when the data will be updated and how often.
  5. Details about how the data was collected or otherwise generated, and any biases that might arise from that process.
  6. A description of the structure of the data, such as how to interpret column names and codes used in the dataset.

Octopub.io, like most publishing platforms, will ensure your data is well described. It does this through the creation of a human and machine friendly web page through which the data can be accessed without needing knowledge of GitHub.

If you are publishing your own dataset on GitHub, create a file containing all the details above in Wordpad (or another plain text editor) and save it as README.md alongside the data.
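As an illustration, a README.md covering the six items above could also be generated from a small set of fields. This is only a sketch: the field names, the section headings and every example value below are assumptions made for illustration, not a required layout.

```python
def build_readme(meta: dict) -> str:
    """Render the six recommended descriptive items as Markdown text."""
    columns = "\n".join(
        f"- `{name}`: {meaning}" for name, meaning in meta["columns"].items()
    )
    publisher = f'{meta["publisher"]} ({meta["publisher_url"]}), contact: {meta["contact"]}'
    sections = [
        ("Description", meta["description"]),
        ("Licence", meta["licence"]),
        ("Publisher", publisher),
        ("Update schedule", meta["updates"]),
        ("How the data was collected", meta["method"]),
        ("Data structure", columns),
    ]
    lines = [f"# {meta['title']}", ""]
    for heading, body in sections:
        lines += [f"## {heading}", "", body, ""]
    return "\n".join(lines)


# Hypothetical example metadata; replace every value with your own.
example = {
    "title": "Daily ward occupancy",
    "description": "Number of occupied beds per ward, reported once a day.",
    "licence": "Creative Commons Attribution 4.0 International",
    "publisher": "Example Health Trust",
    "publisher_url": "http://mysite.org",
    "contact": "data@mysite.org",
    "updates": "Updated daily; each file covers the previous day.",
    "method": "Counted by ward staff at midnight; late entries may be missed.",
    "columns": {"date": "Reporting date (YYYY-MM-DD)", "ward": "Ward code", "beds": "Occupied beds"},
}
print(build_readme(example))
```

Saving the printed text as README.md next to the data files gives reusers the description in one predictable place.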

Communicate

To make the data findable, you need to get people to link to it. This can be done by blogging about it, listing it in others’ data collections or portals or sending tweets.

Here is a list of community resources we recommend:

  1. The Coronavirus Tech Handbook - Newspeak House’s crowdsourced library of tools, services and resources relating to COVID-19 response. It is a rapidly evolving resource with thousands of expert contributors.
  2. #Data4Covid19 - GovLab’s living repository to build a responsible infrastructure for data-driven pandemic response.
  3. #opendatasaveslives - A community established by ODI Leeds to help gather useful resources, create things openly, and enable others to engage with data about the crisis.

You can also contact us directly and we’ll amplify your contribution in our networks.

Frequently Asked Questions

I’ve read this and still don’t know where to start! What data would be most useful?

  1. Try reaching out and telling people what you have before you put the effort into publishing it. Maybe:
    • Share what you have on Twitter and tag it appropriately, e.g. #covid19 #opendata #data #opendatasaveslives.
    • Reach out to those who may be interested in the data or are already publishing similar data.
    • Get in touch with convening organisations, including us, who may also be able to advise.

What file type or format should you make available?

  1. Choose a data format that lots of people can reuse and that is easy for you to produce, such as CSV, XLS (Excel) or JSON.
  2. You don’t have to upload the data in only one format; use many. Make at least one of them a simple, widely used and accessible format that doesn’t require complex or proprietary software or code libraries to access. There is nothing wrong with the latter, but provide them in addition to a simple format!
  3. Data in a PDF is not easy to reuse. Use PDFs for informative documents, not as a data format.
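Offering a second format need not be much work. As a rough sketch, the Python standard library can turn a CSV file with a header row into JSON; the function name and the sample rows are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (first row = column names) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Every value stays a string, exactly as it appears in the CSV.
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Hypothetical sample data.
sample = "date,region,cases\n2020-04-19,North,12\n2020-04-20,North,15\n"
print(csv_to_json(sample))
```

Because the conversion keeps every value as text, reusers still rely on your documentation to know which columns are numbers or dates.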

How do I make my data interoperable with others?

  1. Make the data available in an easily reusable format (see the previous section). If possible, document your data structure: describe the columns you are using and what they contain.
  2. If you can, create and share the schema for your data. A schema is a machine-readable document that describes the structure of the data. Octopub allows uploading of a schema if you have one, but don’t make this a blocker to getting the data out there. Even if you can only write a few paragraphs about the syntax, column types, etc., it will help others use the data and publish data in the same way.
  3. Do use existing standards where you can, especially when describing geographic areas. But, don’t worry about trying to comply with complex data standards if it’s going to delay the publication of data. If your data does conform to standards, tell people this. Put this in the description or upload a complementary informative document alongside your dataset to help people understand the data as much as possible.
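To give a feel for what a machine-readable description of columns might contain, here is a small sketch that reads a CSV sample and records each column's name, a guessed type and your own description. The output layout and the type names are assumptions made for illustration; check which schema format your publishing platform actually expects before relying on it.

```python
import csv
import io
import json


def guess_type(values):
    """Guess a simple type name for a column from its non-empty values."""
    values = [v for v in values if v]  # ignore blanks and missing cells
    if not values:
        return "string"
    for name, cast in (("integer", int), ("number", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def describe_columns(csv_text, descriptions):
    """Build a dictionary describing every column in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = []
    for name in reader.fieldnames or []:
        fields.append({
            "name": name,
            "type": guess_type([row[name] for row in rows]),
            "description": descriptions.get(name, ""),
        })
    return {"fields": fields}


# Hypothetical sample data and descriptions.
sample = "date,cases,rate\n2020-04-19,12,0.5\n2020-04-20,15,\n"
notes = {"date": "Reporting date (YYYY-MM-DD)", "cases": "New confirmed cases"}
print(json.dumps(describe_columns(sample, notes), indent=2))
```

The guessed types are only a starting point; a date column, for example, is reported as a string here and is worth describing by hand.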

Should we all try and convene to get our data into one global standard and repository?

  1. NO! It’ll take too long. But do check whether there is a standard you can reuse. If not, publish the data anyway; a standard will have a chance to emerge later.
  2. Try and find existing communities of practice, peer organisations or researchers who might need your help and work with them.
  3. Make sure you describe the column titles/fields in your data so others can understand and translate them. Also link to related data and resources!

Do I need an API for my data? What is one? How do I create one?

  1. An Application Programming Interface (API) is best described as a promise by one system to another on how the two can interact. The human API is our spoken (and written) language; it is how we interact.
  2. Do you need an API for your data? Yes and No. The web has an API and by putting your files online we hope you are using the web’s API.
  3. The basis of the web’s API is web addresses (e.g. https://www.theodi.org/news). These locators (or URLs) let humans and machines consistently access content. When publishing data the best thing you can do is to make the location of the data itself consistent so both humans and machines can access and link to it easily.

Here are some examples of good consistent data URLs:

  • http://mysite.org/data/all-data.csv
  • http://mysite.org/data/2020-04-19.csv
  • http://mysite.org/data/2020-04-20.csv

Try to guess what the URL will be for the 30th April data. Easy, huh!

  1. DON’T make the data available using a one time link that only works if a human navigates to the page and clicks a specific button. Data should be easy for machines to consume, in the same location, using the same URL.
  2. 100% DON’T put the data behind a captcha, forcing a human to come to the page and complete a complex puzzle to get hold of the data!
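To see why consistent URLs matter to a machine, consider this small sketch written from a reuser's point of view. It assumes the placeholder site and the date-based file naming from the example URLs above; the function names are hypothetical.

```python
from datetime import date, timedelta

BASE_URL = "http://mysite.org/data"  # placeholder site from the examples above


def daily_url(day: date) -> str:
    """Return the predictable URL for one day's file."""
    return f"{BASE_URL}/{day.isoformat()}.csv"


def urls_between(start: date, end: date) -> list:
    """List the URLs for every day from start to end, inclusive."""
    days = (end - start).days
    return [daily_url(start + timedelta(days=n)) for n in range(days + 1)]


print(daily_url(date(2020, 4, 30)))  # http://mysite.org/data/2020-04-30.csv
```

A script like this can work out each day's address with no human involved, which is exactly what a one-time link or a captcha prevents.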

Case studies

Excellent

Italy has been publishing open data daily on GitHub since the beginning of March, with regional breakdowns, and numbers of people self-isolating, hospitalised and in intensive care. All openly licensed!

In the UK, Public Health England provide a dashboard outlining the daily statistics related to infections and deaths. You can also download the data in CSV format. Use is covered by the UK Open Government Licence.

Could do better

Belgium is providing province-level open data on cases and deaths, broken down by gender and age group, and numbers of people in hospital, ICU, and receiving respiratory support. Fantastic stuff; however, the licence could be clearer, as there doesn’t appear to be one. It says “All rights reserved” at the bottom, but it is unclear if that has any legal implication.

Google and Apple have both published community mobility reports. These reports are valuable to evidence the adoption of social distancing and provide insights into how people are conforming to lockdowns in different regions across the world.

Google were first to publish their reports, in PDF format alongside their standard terms and conditions. They have since published the underlying data in CSV format to download and use with attribution.

Apple chose to publish their data in a more web friendly way and also made the underlying data available in CSV format for anyone to use during the COVID-19 crisis with clearer terms.

Google’s CSV link is the best as it follows the API guidelines above while the Apple links are not consistent and/or accessible by a machine.

Avoid

Singapore has been publishing detailed data about every infected person, including their age, gender, workplace, where they have visited and whether they had contact with other infected people. Fundamentally this undermines people's right to privacy, has the potential to enable the targeting of individuals and groups and could lead to unintended consequences relating to autonomy and surveillance.

Related guides

Sharing data safely in times of crisis 

Key points to consider to ensure legal, ethical, commercial and societal risks are managed when sharing data.

Coming soon.

Anonymising data in times of crisis

How to anonymise data in order to reduce the risk of individuals being identifiable while maintaining the utility of data.

Free access to ‘Anonymising data in times of crisis’ is available here.

Further reading

eLearning: What is open data

Open data is data that anyone can access, use and share. Governments, businesses and individuals can use open data to bring about social, economic and environmental benefits.

This module explores:

  • What is open data?
  • What is data?
  • What makes data open?
  • Why do we need open data?

Free access to 'What is open data' is available here.

Guide: Publishers guide to open data licensing

What does it mean to license data? What licence should you use? How can you indicate the licence that a dataset is available under? This guide answers these questions.

Read 'Publishers guide to open data licensing' here

eLearning: Open data licensing

In order for data to be open, it should be accessible (this usually means being published online) and licensed for anyone to access, use and share.

This module explores:

  • Why open data needs to be licensed
  • How licences unlock the value of open data
  • What type of licence suits open data
  • How to provide for open data licensing in the tender, procurement and contracting lifecycle

Free access to 'eLearning: Open data licensing' is available here.

eLearning: Open data formats

The 'format' of an open dataset refers to the way in which the data is structured and made available for humans and machines.

Choosing the right format helps ensure the data can be simply managed and reused. To maximise reuse of data, it may be necessary for a publisher to use a number of formats and structures available across different platforms to suit users' needs.

This module explores:

  • Why formats matter to open data
  • Choosing the correct structure
  • Accessing different open data formats
  • Keeping it simple with CSV

Free access to 'eLearning: Open data formats module' is available here.

Guide: Marking up your dataset with DCAT

The Data Catalog Vocabulary (DCAT) defines a standard way to publish machine-readable metadata about a dataset.

This guide will help you to add machine-readable data to your dataset in order to increase its findability.

Read the guide 'Marking up your dataset with DCAT' here

About

Open for feedback

This guide has been produced by the Open Data Institute, and published in April 2020. Its lead author is David Tarrant with contributions from Deborah Yates, Jeni Tennison and Olivier Thereaux.

CC BY-SA

This guide is published under the Creative Commons Attribution-ShareAlike 4.0 International licence.