
Ahead of the much-anticipated publication of the government's AI Action Plan, this is the evidence we submitted for consideration in August 2024.

According to the Labour Party manifesto, the National Data Library aims to ‘bring together existing research programmes and help deliver data-driven public services, whilst maintaining strong safeguards and ensuring all of the public benefit’. While the details are yet to be agreed, the primary aim is to create a more integrated and accessible data infrastructure across government for social and economic benefit. Whatever the specifics of its socio-technical architecture, there are considerable benefits in ensuring that the Library is AI-ready by design.

There is understandably a lot of excitement about the potential for AI to improve public services. For instance, The Alan Turing Institute has estimated that, of the 143 million complex tasks performed by civil servants each year, approximately 1,200 person-years of work could be saved if AI-enabled automation freed up even one minute per task. Realising such gains requires high-quality data and robust data management and governance; the National Data Library should draw on the UK's world-leading capabilities to get these right from the start.
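As a rough check on this figure, the conversion from minutes saved to person-years can be sketched in a few lines of Python. The working-hours figure is our assumption, not the Turing Institute's: we take one person-year to be about 2,000 working hours, which the original estimate does not state here.

```python
# Back-of-envelope check: convert per-task minute savings into person-years.
# ASSUMPTION: a person-year is roughly 2,000 working hours (not from the source).

TASKS_PER_YEAR = 143_000_000   # complex tasks per year (figure from the text)
MINUTES_SAVED_PER_TASK = 1     # "even just one minute" per task
HOURS_PER_PERSON_YEAR = 2_000  # hypothetical working hours in one person-year


def person_years_saved(tasks: int, minutes_per_task: float, hours_per_year: float) -> float:
    """Total minutes saved -> hours -> full-time person-years."""
    hours_saved = tasks * minutes_per_task / 60
    return hours_saved / hours_per_year


saved = person_years_saved(TASKS_PER_YEAR, MINUTES_SAVED_PER_TASK, HOURS_PER_PERSON_YEAR)
print(f"{saved:,.0f} person-years")
```

Under these assumptions the result is roughly 1,190 person-years, consistent with the approximately 1,200 quoted; a different working-hours assumption shifts the figure proportionally.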

Getting them right includes some well-tested data hygiene measures that the ODI has helped establish over the years: using open, interoperable standards and technologies to share and safely access data; identifying and fixing data gaps and other limitations through continuous assessment and user feedback; setting up clear governance structures that are conducive to innovation while maintaining public trust; and giving people and communities meaningful tools to control how their data is used and by whom. It also includes emerging AI-related measures to ensure that the data assets of the National Data Library can be used effectively and responsibly by AI innovators in the public and private sectors. Our data-centric AI programme and our recent white paper on building a better future with data and AI discuss these measures in detail, and we would welcome your feedback on them.

Beyond that, the latest AI advances create new opportunities to rethink the design, operation, use and impact of data infrastructure:

  • Data prompting tools allow civil servants to publish and analyse complex data even when they are not trained data specialists. The same technologies can help data users find the data they need and repurpose it easily, without having to browse large data repositories using technical language.
  • AI can help integrate data across silos, whether by identifying mappings between data vocabularies or by automatically translating from one technical format to another. It can also undertake basic data cleaning and augmentation, such as removing duplicates and suggesting plausible values for missing fields.
  • Pre-trained models can create realistic synthetic data to supplement or replace data that is not fit for purpose or cannot be widely shared, encouraging prototyping and experimentation. Federated learning and similar approaches allow models to be trained where the data resides, without compromising on data governance.
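To illustrate the federated learning point, here is a minimal sketch of federated averaging in Python. It assumes a toy one-variable linear model and two hypothetical data holders; only model parameters leave each holder, and the coordinating server weights each holder's parameters by its number of records. It illustrates the idea only and is not a proposed design for the Library.

```python
# Minimal federated averaging (FedAvg-style) sketch: each data holder trains
# locally and shares only model parameters, never its raw records.
# The two clients and their data are hypothetical toy values.

from typing import List, Tuple

Params = Tuple[float, float]         # (weight, bias) of a 1-D linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_update(params: Params, data: Dataset, lr: float = 0.05, epochs: int = 20) -> Params:
    """Full-batch gradient descent on mean squared error, using local data only."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighted by each client's number of examples."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return w, b


# Hypothetical clients whose private data both follow y = 2x + 1.
clients: List[Dataset] = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]

global_params: Params = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [(local_update(global_params, d), len(d)) for d in clients]
    global_params = federated_average(updates)

print(f"w={global_params[0]:.2f}, b={global_params[1]:.2f}")
```

Because both hypothetical clients' data follow y = 2x + 1, the shared model converges to roughly w = 2 and b = 1 without either client disclosing its records. Real deployments typically add safeguards such as secure aggregation or differential privacy, since model updates alone can still leak some information.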

All this requires more investment in the UK's data-centric AI ecosystem: the start-ups and scale-ups providing services and tools that collect, package and optimise data for AI use, label it for fine-tuning and safety testing, carry out data audits, or document provenance and other quality trails so that others can use the data with confidence. It also calls for new open standards for data exchange, provenance, documentation and assurance for AI, stewarded by trusted, neutral conveners to ensure all relevant voices are heard. The National Data Library could be the perfect showcase and resource to boost this sometimes undervalued part of the AI economy, while also leading the field in responsible AI-data practices, which are badly lacking worldwide.
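A provenance trail can start as simply as a record that fingerprints each dataset and links it to its inputs. The Python sketch below uses a hypothetical minimal format (the field names and identifier are ours, not from any standard); established vocabularies such as W3C PROV describe provenance far more fully.

```python
# Minimal provenance record: links a derived dataset to its source and
# fingerprints both with SHA-256 so downstream users can check integrity.
# Field names and the dataset identifier are hypothetical.

import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


@dataclass
class ProvenanceRecord:
    dataset_id: str
    content_sha256: str
    derived_from: List[str]  # SHA-256 digests of the input datasets
    process: str             # human-readable description of the transformation


def verify(record: ProvenanceRecord, content: bytes) -> bool:
    """True if the content matches the fingerprint stored in the record."""
    return sha256_hex(content) == record.content_sha256


raw = b"region,count\nNorth,120\nNorth,120\nSouth,95\n"
cleaned = b"region,count\nNorth,120\nSouth,95\n"

record = ProvenanceRecord(
    dataset_id="example-cleaned-counts",  # hypothetical identifier
    content_sha256=sha256_hex(cleaned),
    derived_from=[sha256_hex(raw)],
    process="removed exact duplicate rows",
)

print(json.dumps(asdict(record), indent=2))
print(verify(record, cleaned), verify(record, raw))
```

Anyone holding the published record can recompute the hash of the file they received and confirm it is the version described, and can trace it back to the exact input it was derived from.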

What data to include

Whether the data is used for AI or in other contexts, the success of the National Data Library ultimately hinges on the quality, relevance, and accessibility of the data it stewards. We therefore recommend bootstrapping it with data from three areas: high-quality public data; federated Trusted Research Environments (TREs), such as NHS Digital’s TRE and the INSIGHT Health Data Research Hub; and cultural heritage data (from the National Archives, museums, etc.).

While the specific use cases will have to be co-designed with relevant stakeholders, we believe there would be considerable benefit in starting with these areas in combination. The data published across them varies in content, in the technologies used, and in its potential for AI disruption. These are high-value datasets, fairly well structured and of sufficient quality to be deployed reliably in public-sector projects without major governance changes or DevOps investment.

When made available to diverse digital innovators, these datasets could unlock substantial benefits for public service delivery and catalyse the ecosystem that will be essential for sustainable impact. We already have many solutions for making open data available and extensive experience of what works well and what does not. When data should not be made publicly available, TREs such as OpenSAFELY offer strong guarantees that it is used safely and ethically for innovation and research. Cultural heritage data will ensure that the Library stewards data about people and places, capturing the diversity of the UK.
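One common safeguard behind the guarantees TREs offer is output checking, where aggregate results are screened before they leave the environment. Below is a minimal Python sketch of small-count suppression; the threshold of 10 and the example table are hypothetical, and actual TREs, OpenSAFELY included, apply their own, more detailed rules.

```python
# Small-count suppression: withhold any aggregate count below a threshold
# before results are released from a trusted research environment.
# The threshold and the example table are hypothetical.

from typing import Dict, Optional

SUPPRESSION_THRESHOLD = 10  # hypothetical cut-off; real TREs set their own


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Return a copy of the table with counts below the threshold replaced by None."""
    return {group: (n if n >= threshold else None) for group, n in counts.items()}


table = {"Region A": 154, "Region B": 7, "Region C": 42}
released = suppress_small_counts(table)
print(released)
```

Suppressing individual cells is not always enough: if totals are also released, a withheld value can sometimes be recovered by subtraction, so disclosure control usually also involves secondary suppression or rounding.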

Beyond this high-impact Library MVP, it is also crucial to consider how other data assets (closed, shared or open) could be collected, stewarded and shared safely. Undoubtedly, the richest and most valuable data in the public sector relates to individuals; however, public trust in how personal data is handled is low. Besides AI, this has been a priority area for the ODI in recent years: helping government and industry build trustworthy data ecosystems that enable safe data sharing without compromising people's right to privacy or their desire to have more say over how their data is used.

One privacy-enhancing technology that is particularly relevant in this context is Solid, an open standard for structuring data, digital identities and online applications, which allows people and organisations to store their data securely in a decentralised way. Solid, of which the ODI recently became the custodian, is already in use across a number of pilots. It would give the National Data Library an edge over prior public sector data projects, where this level of data agency was at most an afterthought, with serious consequences for adoption.

Putting users at the core of the Library

There will be a number of options for how to design the Library from a technical point of view, and a lot of excitement in some parts of the community about going for the shiniest technical solution. As noted earlier, we believe in the power of AI and other emerging technologies to tackle some of the challenges faced by past data initiatives. At the same time, we know that a user-centric approach will be key to designing something that delivers on its promise. This includes reviewing key sectors, engaging with experts and a diverse range of stakeholders, deciding on strategic use cases, and identifying the technological, legal, and operational enablers for sustainable data sharing and use.

We have learned from other government projects the importance of prioritising use cases that make a material difference to people's lives, and of engaging diverse voices from the start, including those affected by data- or AI-enabled public services. Smart data schemes, as well as similar programmes that the ODI has pioneered over the years, like data incubators, have taught us how to bootstrap innovation around public sector data early on and create a community that is invested in improving and maintaining the data over time. With AI in mind, there are exciting opportunities to attract specialised solution providers, typically start-ups and scale-ups, to strengthen the UK's data-centric AI market.

A socio-technical data infrastructure

Our vision for the National Data Library is grounded in creating an inclusive resource that enables access to current and future public sector data while addressing known challenges around data heterogeneity, curation, maintenance, safe access, privacy, and responsible use.

Data across the public sector resides in a mixture of pooled and federated systems. This can be a feature, as it allows data holders and stewards to attend to their data work independently, leveraging existing assets and capabilities and meeting critical use cases. But it also requires a modular, extensible, decentralised architecture that can manage this complexity without compromising on safeguards, and that can accommodate new sources of data and new ways of sharing them. This architecture is not just about technology. Among other things, it must include clear governance structures that give businesses the confidence to invest in AI innovations using this data and that maintain public trust. Bootstrapping the user ecosystem from day one will be crucial. We have seen too many well-intentioned data initiatives fail because they focused on making data available rather than on prioritising its use and capturing the impact needed to sustain its maintenance.

We outlined some of the key design principles of the technical infrastructure earlier; the ODI and other organisations have followed them to create impact in a number of sectors. Whatever the choice of technology provider and their delivery credentials, bringing in expertise from academia and civil society will make a real difference in ensuring that the infrastructure supports ethical data use, engenders public trust, and aligns the Library with national digital standards and practices.

While the technical foundations of the Library are being built, including catalogues of interlinked data assets, access safeguards, standardised APIs, and other more accessible means of exploring and making sense of the data (e.g. using generative AI), it is important to decide on the governance structure. The aim must be to protect individual rights while fostering innovation; achieving both for AI use cases may require further thought, as there are very few examples of well-governed AI data.
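As an illustration of what an entry in a catalogue of interlinked data assets might look like, the Python sketch below describes one dataset using the W3C DCAT vocabulary serialised as JSON-LD. The dataset, publisher and related-resource URIs are hypothetical placeholders, and choosing DCAT is our assumption rather than a stated design decision for the Library.

```python
# Build a catalogue entry for one dataset using the W3C DCAT vocabulary,
# serialised as JSON-LD. All example.org URIs are hypothetical placeholders.

import json
from typing import Any, Dict, Iterable


def dcat_dataset(
    dataset_uri: str,
    title: str,
    description: str,
    publisher_uri: str,
    download_url: str,
    media_type_uri: str,
    licence_uri: str,
    related: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return a JSON-LD dict describing a dcat:Dataset with one distribution."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:publisher": {"@id": publisher_uri},
        # dct:relation links this entry to other assets in the catalogue.
        "dct:relation": [{"@id": uri} for uri in related],
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": {"@id": media_type_uri},
            "dct:license": {"@id": licence_uri},
        },
    }


entry = dcat_dataset(
    dataset_uri="https://example.org/datasets/library-visits",
    title="Library visits by region (illustrative)",
    description="Hypothetical example dataset used to illustrate a catalogue entry.",
    publisher_uri="https://example.org/org/example-department",
    download_url="https://example.org/files/library-visits.csv",
    media_type_uri="https://www.iana.org/assignments/media-types/text/csv",
    licence_uri="https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/",
    related=["https://example.org/datasets/regional-population"],
)

print(json.dumps(entry, indent=2))
```

Using a shared, open vocabulary like this means that catalogue entries from different departments can be harvested and cross-referenced by the same tools, which is one way the interlinking described above could be achieved.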

Who we are - the ODI

For 12 years, the ODI has been at the forefront of exploring, researching, and designing many key features and structures essential to the National Data Library's success. We were instrumental in launching data.gov.uk, the UK government's flagship open data platform, and have provided expert advice on data infrastructure to over 50 countries. We have designed data standards that have generated significant economic value, such as Open Banking and OpenActive, and trained over 100,000 people in data literacy skills across businesses, governments, and the third sector. Our previous applied research programmes in data ethics, institutions, and governance have resulted in new data governance structures that allow very sensitive data to be used in cutting-edge health research, such as our work on the INSIGHT project, which makes anonymised data from eye scans and images, along with advanced analytics, available to NHS, academic, and industry researchers.

We at the ODI are particularly excited by the government’s vision for the National Data Library and its transformative potential because it represents the realisation of our long-standing calls for strong, comprehensive data foundations upon which advanced public services can be built and cutting-edge technologies developed.

Thank you once again for requesting our input to the AI Action Plan. If any or all of our ideas are of interest, we would welcome the opportunity to discuss them further.

Below is a blog post we published in September 2024 about our evidence submission.