An open future: how the National Institute for Health Research is opening up data


Sun Sep 23, 2018

How can open publishing help to streamline business processes and help patients make better-informed decisions about clinical trials? Richard Dudley, Chief Data Officer of the NIHR’s Centre for Business Intelligence, explains

At the National Institute for Health Research (NIHR) – the most integrated clinical research system in the world – we handle a lot of data. Collectively, we represent all publicly funded research in the NHS, and there are NIHR centres across England, all of which generate datasets. These range from data about the research itself through to administrative and operational data about how that research is managed.

The issue

As is often the way in large organisations, there is a huge variety in the format and structure of data, the methods of publishing, and the type and maturity of systems used.

Just as train timetable data, for example, used to be locked away in separate systems, much of the data we hold is siloed across different parts of the organisation. And just as train timetables are now standardised and centralised, the NIHR is moving towards publishing as much data as possible in a centralised, standardised and open format.

It is really important to note that we are publishing data about our processes, funding and research – and not the raw data from within the research studies, some of which is of course protected.

The current inaccessibility of some datasets can cause problems. At the NIHR Centre for Business Intelligence we deal with many requests for data from both NIHR staff and external audiences.

Often our internal teams will need some information for a report or presentation – and sometimes can’t find what they need quickly. Staff in our core team here are responsible for providing the data to satisfy parliamentary questions, Freedom of Information (FOI) and other external requests, and spend a lot of time collating information from different sources.


Open data solution

An obvious solution to improve access is to centralise and publish datasets. On top of this, if we publish using an open data platform under an open licence, we provide access not only to internal teams, but to anyone to view and use, under the terms of the licence.

As well as being practical, publishing open data helps us meet an obligation: as a custodian of public funds, the NIHR is required to be transparent and accountable. It also visibly demonstrates our commitment to transparency.

But it’s more than this. We want to look beyond our statutory obligations and internal benefits: we are passionate about opening up data to enable better decision making, help identify trends and gaps, and make health research faster, easier and more efficient.


The first dataset

The first dataset to be published openly is the NIHR funded research portfolio. It is data about the research we fund, the funding amounts awarded, the programmes of funding and health category information for particular studies.

This is the most requested dataset from the NIHR – and thanks to a lot of work across the major NIHR coordinating centres much of it is now centrally held, robust and well structured: it is therefore ideal to showcase the use of open data. It is published using the OpenDataSoft platform.

Why publish open data?

By streamlining data access for researchers, clinicians and policymakers, we hope to improve health research, leading to faster delivery of better treatments that directly benefit patients.

A single point of access for these datasets will make life easier all round. For my team, knowing that there is one access point for up-to-date, accurate, open data, about the most-requested dataset, is invaluable. We get loads of internal requests for information, as well as parliamentary questions and FOI requests: it should be win-win from a resource and budget point of view.

I’m also excited about the potential benefits to our external audiences. At a grass-roots level, being able to see the scope and scale of what’s been funded could help patients make better-informed decisions about signing up to the latest trial. For a clinical trial to succeed, a minimum number of patients must be recruited so that the trial has enough statistical power to produce a meaningful result. In general, better recruitment leads to better trials.

Also, funders and researchers will be able to spot trends and gaps, collaborate more easily, pool resources and avoid duplicating effort.

For technical and data innovators, I’m hoping the datasets will be used to power new web services and apps that interrogate the data to produce useful services for researchers, funders and patients.

Risks

There are of course risks with publishing data openly, ranging from the quality of the data itself, to the technology, to the risk of accidental or wilful misinterpretation. While the risks around data quality and technology are within our control to manage and mitigate, the risk of misinterpretation arises after publication, at the hands of third parties, and so is harder to predict.

However, rather than increasing this risk, publishing accurate open data is an opportunity to decrease it. The data was previously available to analysts and the media, but it was spread across various sites in different formats rather than presented as a full picture, and siloed data in inconsistent formats was difficult to compare and analyse accurately. Open publication lets us contextualise the data and present an accurate, comprehensive picture, actually reducing the risk of misinterpretation by the public and the media.

Context is key here: alongside the data we have published metadata that fully describes each dataset and explains any limitations or omissions, as well as contextual information such as coding and apportionment details and the taxonomies used to code our portfolio.

OpenDataSoft provides documentation to support use of its platform, making it clear and user-friendly, and any APIs provided in the future will also include service documentation to support their use.

Similar projects and next steps

We’ve been watching with interest the progress of the CrossRef Funder Registry, which lists who funds research and where it has been published, across the globe. As well as the obvious win of having everything listed in one place, CrossRef has found that funders can track the published results of their grants; research institutions can monitor the published outputs of employees; publishers can analyse the sources of funding for their authors; and the public can see how research is funded, and by whom.

We’re hoping to realise similar benefits with our open data publishing project. As well as demonstrating transparency, driving efficiencies and speeding up responses to information requests, we want to deliver tangible benefits to the health research community and beyond.

If you have comments or experience that you'd like to share, pitch us a blog or tweet us at @ODIHQ.
