Our Head of Research and Development, Olivier Thereaux, describes the importance of interoperability and standards for data in times of crisis and recovery, and outlines lightweight steps that data stewards can take.
There are many ways to ensure that data leads to better decisions: making it as open as possible, and ensuring that it is accurate, of high quality and as free as possible of problematic bias. All of these make data findable and usable by the people who need it, contributing to positive outcomes while avoiding harmful impacts.
Another mechanism we can use to make data better and more impactful is standardisation: reaching broad agreement on the processes and technical aspects of data means that it can be used more widely and more effectively, and that shared tools and systems can be built.
This is all the more true in times of crisis and recovery. In the current worldwide health crisis, many governments and organisations are publishing or sharing data: government health agencies are publishing regional data on cases and deaths; symptom trackers are sharing data with researchers, and some are making their data open; and technology companies are opening up mobility data that can help us understand the impact of the virus on our lives.
Inconsistency hurts efforts to use this data, and may even cause harm. Health authorities and journalists collating information from various agencies and governments have found that data is being shared in a variety of formats and with different vocabularies. This makes the work of data analysts slower and harder, as they need to write new code to handle each different dataset. When data from different sources tells different stories, the resulting confusion can further erode public trust. All of this is made harder still by the uncertainty inherent in this crisis: we don’t yet fully understand the virus or its effects, and that is reflected in the data.
In this context, standards for data seem like a necessity, but an elusive one. Creating standards typically takes time, and it is particularly hard in a volatile situation where new knowledge keeps coming to light; it is also often unclear where standards are needed to make an impact. In most cases, the wisest move is not to create new standards, but to find, reuse and adopt existing ones wherever possible.
The importance of interoperability and standards
It is important to recognise that many aspects of data can be standardised independently. While overall consistency is desirable, using shared identifiers, agreeing on a vocabulary for key terms, or committing to consistent formatting can each have a large impact on their own. The Open Standards for Data guidebook suggests that one viable approach is to standardise ‘from the outside in’: that is, to adopt common formats and identifiers to reduce friction, before standardising more complex things such as taxonomies, vocabularies and processes.
In some cases, standards may be decided and imposed from the top down. But for most things that do need to be agreed as new standards, creating them effectively will require collaboration, coordination and a healthy dose of listening and building buy-in (the Open Data Charter has launched one such initiative). This may feel like a waste of time, especially in the midst of a crisis, but the alternative is a harmful fragmentation of data formats, vocabularies and identifiers, which puts undue pressure on the users of the data and may cause them to make mistakes that could cost lives.
This does not mean that data stewards should pause their efforts to share or open data until standards can be agreed. It does mean that, for data to be used successfully in this crisis, data holders must first understand the importance and impact of interoperability and standards.
Data stewards can also take lightweight, effective steps to improve interoperability without launching full-blown standardisation efforts:
- Anyone aiming to publish data should first check whether someone else has already published similar data, and whether they can follow the same approach. It is worth looking at the online presence of similar agencies or organisations in other countries, for example.
- If there is no precedent to follow, it is all the more important to create clear, open documentation about the data you hold and how it is structured, so that others can follow your approach.
Such steps can create a gradual, organic increase in interoperability and consistency across data publishers. When publishers do not need to reinvent the wheel time and again, and when it is easier for data stewards to help each other, consistency can emerge, leading to common tools for collecting, sharing and analysing this data. Some of these shared approaches will mature into full-blown standards. In other cases, they will at least reduce the fragmentation of data and create a healthier data infrastructure, which will help us deal better with the next wave of this pandemic, or with the next public health crisis, whenever it hits.
Open Standards for Data
The ODI offers a free resource, the Open Standards for Data guidebook, which is worth reading for clarity on the different types of standards and for guidance on when to create standards, and when not to. If you are a data steward looking for guidance on how to use the guidebook, or need advice on standards, consistency and interoperability for data related to Covid-19, the ODI may be able to support you: please get in touch.