Earlier this year, the Labour manifesto laid out a vision for technological innovation going hand in hand with economic growth and societal change, building the UK into a science and technology superpower by 2030 while enabling it to play a leading role in wider geopolitical conversations. Here at the ODI, we are particularly interested in Labour's plans for digital and technology policy in the UK.

One policy now gaining momentum is the creation of a national data library (NDL) to “bring together existing research programmes and help deliver data-driven public services, whilst maintaining strong safeguards and ensuring all of the public benefit.” Since the election, there has been discussion - some of it behind closed doors - about what an NDL should look like in practice. This blog post explores some of these discussions. It articulates the process, the priorities and the dos and don’ts that we think the government must work through to ensure that the National Data Library works for government, industry, research and society.

What could the National Data Library be like?

At the ODI, we’ve set out a vision for an AI-ready National Data Library built on robust, sustainable data infrastructure. The NDL should enable the opportunities arising from AI through user-centric design, curation of high-value datasets and an infrastructure based on open standards, federation, and interoperability. We see the NDL as a fundamental piece of digital infrastructure for innovation in the UK going forward, especially if the government is to realise the opportunities presented by new technologies including AI. We said more about this in our recent input on developing the AI Action Plan.

From a policy point of view, the concept of a National Data Library has been credited as having emerged from Onward who called for “a British Library for Data — a centralised, secure platform to collate high-quality data for scientists and start-ups”. Onward’s vision for an NDL focuses on making AI-ready data available for “AI advancement, particularly for new powerful, tailored foundation models” with “robust privacy-preserving mechanisms”. This version of the library posits an NDL as a centralised mechanism for the government to become a provider of data for AI, enabling technological and economic growth off the back of it.

The Labour version appears slightly different, focusing specifically on 'bringing together existing research programmes, [...] data-driven public services that all of the public benefit from'. There is limited information about how the National Data Library will support these aims as work is ongoing to define and shape them. However, it will likely involve creating a more interoperable and accessible data infrastructure across government, enabling safer and easier data access, which delivers social and economic benefits.

Others have considered the NDL in different forms. On the eve of the General Election, Gavin Freeguard (an advisor to the ODI) took the concept of a traditional library as the basis for how an NDL should act as a custodian of data, playing a “catalysing and coordinating role for a wider ecosystem” and improving the UK’s data practices more broadly. Meanwhile, Dr Emma Gordon, Director of ADR UK, describes how an NDL should be about making data which already exists more accessible, streamlining access for researchers. Theo Blackwell, the Chief Data Officer for the Greater London Authority, has proposed a London Data Library, riffing on the idea of a national library. The London Data Library would exist in tandem with the London Data Store to enable smoother data access and sharing, and ultimately to improve data discovery for a “smarter, more sustainable, and inclusive London”. There are other proposals from Tom Westgarth (Tony Blair Institute), focused on scientific data infrastructure. Soon after the election, the Connected Places Catapult set out a vision for a decentralised NDL, building on lessons learned from other data sharing infrastructure projects like Stream and the EU Data Spaces.

The government could also take inspiration from other countries and organisations that curate datasets or enable data sharing for public benefit. In the health domain, several such organisations are already thriving. UK Biobank is one example. Data from the UK Biobank has been used to understand the impact of Covid-19 on the brain, obesity and more. The UK Biobank has a robust process for giving the right people safe access to data for the right reasons. Further afield, there are efforts in the US to build a National Artificial Intelligence Research Resource which makes available “government-funded, industry and other contributed resources in support of the nation's research and education community”. If bringing data together is one of the main aims of the Library, then X-Road from Estonia is a fantastic example of data infrastructure built for interoperability and seamless data access. Smoother and safer data sharing between different government institutions saves time and enables better delivery of public services.

How do we decide what version of the National Data Library to build?

In our view, there are several prerequisites to designing data infrastructure like a National Data Library. You must use open, interoperable standards and technologies to share and safely access data; identify and fix data gaps and other limitations through continuous assessments and user feedback; set up clear governance structures that are conducive to innovation while maintaining public trust; and give people and communities meaningful tools to have control over how their data is used and by whom. An NDL will require high-quality datasets, which must be curated from across existing government departments, research programmes and beyond. For a National Data Library to be AI-ready, processes to ensure this is the case must be built in from the start. Finally, the current lack of public finance available means affordability of any solution will be key.

This is not the first attempt to bring government datasets together in one place, and this effort must learn the lessons of the past. For example, previous efforts to centralise government data in one place have not worked, for reasons including too much bureaucracy and the lack of a sustainability plan. The NDL should also take lessons from different aspects of other efforts around the world to steward and connect data.

At the ODI we’ve worked on the topic of data stewardship, the process of collecting, maintaining, using and sharing data, since 2020. This includes a range of socio-technical architectures, including decentralised solutions that empower data holders. Ultimately, many questions about the design of a National Data Library are about how to steward data safely, securely and effectively.

When we work with organisations to design and build data institutions, we utilise our facilitating safe access framework. We start by defining the purpose of stewarding data. What role will the organisation play in helping data get to the right people for the right uses? A user-centric approach to design will be key to building something that delivers on its promise, so who are the key audiences for the Library - researchers, AI developers, public servants or even members of the public? Once the purpose and users of the NDL are clear, subsequent decisions about the legal foundations, the governance model, the technical infrastructure, the funding and the sustainability model to enable the institution to survive and thrive in the long-term will flow.

We conclude that there are many different ways to build a National Data Library from a legal, technical, commercial and governance perspective. A robust diagnosis of the problem you are trying to solve is a crucial part of building this data infrastructure - what are the current issues, where are the gaps in our data ecosystem, who is currently not accessing data that they need, and what are the aspirations? Answering these questions will ensure we build a National Data Library to meet the needs of the UK for years to come.

We’re thinking about what a National Data Library and other innovative UK data infrastructure could look like in practice. Keep an eye out for further posts exploring these ideas in more depth. If you’d like to learn more about our thinking on data stewardship, get in touch at [email protected].