
In January 2025, the UK Government and Matt Clifford launched the AI Opportunities Action Plan, which aims to support AI development, helping to “turbocharge growth and boost living standards”. A key aspect of the plan is to unlock data assets in the public and private sector, with a National Data Library (NDL) at the core of these efforts. The plan recognises the key role that data plays in enabling AI development, which in turn will lead to further innovation and growth.

The ODI has been contributing to the growing discussion about the government’s proposal to establish a National Data Library. We have set out our vision for an AI-ready NDL as a data institution that enables safe access to data and provides robust foundations for modern AI-driven public services. We have also outlined how we think the NDL should be built: using open standards such as Croissant; giving people control of their data using technologies such as Solid; and learning from existing, successful data sharing initiatives such as Stream or UK Biobank.

We recognise that many sectors have an interest in contributing to the design of the NDL. Our research team recently responded to a call from the Wellcome Trust and the Economic and Social Research Council (ESRC) for NDL white papers that set out technical visions and architectures for a UK NDL, with the aim of making public sector datasets more accessible to researchers and enabling future science to thrive.

The UK is data rich, with a wealth of data generated by key sectors including health, crime, education and financial services. These data hold immense potential for scientific research and innovation, particularly if combined with AI technologies. However, as we outline in our feedback on the European Commission’s GenAI Code of Practice, their value often remains untapped due to challenges in transparency, accessibility, interoperability, and analysis. These challenges hinder the UK's progress as a science and AI superpower. The ODI strongly believes that the NDL, and the wider Government, have the potential to help solve these problems, and we explore this potential in our white paper.

In the white paper we explore how the ODI’s model of facilitating safe access to sensitive data across the data lifecycle can be used to guide the development of the NDL. Our framework provides four dimensions (technical, governance, legal and commercial) across which we recommend those designing the NDL should analyse and co-design systems that balance provision of value, ensure interoperability, and enable protection against harm.

Although commercial and legal considerations like adequate data licensing are important, our white paper focuses on the technical and governance dimensions. This is because these are the critical dimensions that those designing the NDL need to consider if they are to succeed in making the NDL AI ready and of most value to UK science.

Technical Dimension

The technical infrastructure of the NDL needs to meet the principles and capabilities of the National AI Strategy. Our conceptual blueprint shows the key elements of the NDL's architecture that are essential for its AI-readiness. These are:

  • Metadata & dataset documentation, which describe datasets broadly according to various relevant metadata formats and standards (such as Croissant, mentioned earlier), and lay solid groundwork for achieving data interoperability;
  • Analysis, which contains services and tools that examine datasets at a deep level, inferring statistical distributions, outliers, biases, dataset uses, links, etc., and lifting this information to enrich metadata & documentation;
  • Benchmarks and dashboards, which assess the AI-readiness of underlying datasets through the execution of metrics, KPIs, visualisations, etc., and provide a global decision-level view of how AI-ready the datasets in the NDL are;
  • Open standards, which are orthogonal to all AI-readiness services.
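
To make the relationship between these elements more concrete, here is a minimal Python sketch of how an analysis step might lift statistics into dataset documentation, and how a simple benchmark could then be computed from it. Everything in it is an illustrative assumption rather than part of the white paper: the function names, the metadata fields, the two-standard-deviation outlier rule and the scoring formula are all hypothetical, and the metadata record is a simplified dictionary, not a valid Croissant document.

```python
import statistics

# Hypothetical documentation fields a dataset record is expected to carry.
REQUIRED_FIELDS = ("name", "description", "license")


def profile_column(values):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    if not present:
        return {"count": len(values), "missing": len(values),
                "mean": None, "stdev": None, "outliers": []}
    mean = statistics.mean(present)
    stdev = statistics.pstdev(present)
    # Illustrative rule: flag values more than 2 standard deviations from the mean.
    outliers = [v for v in present if stdev > 0 and abs(v - mean) > 2 * stdev]
    return {"count": len(values), "missing": len(values) - len(present),
            "mean": mean, "stdev": stdev, "outliers": outliers}


def enrich_metadata(metadata, columns):
    """Return a copy of the metadata with per-column profiles attached."""
    enriched = dict(metadata)
    enriched["column_profiles"] = {
        name: profile_column(values) for name, values in columns.items()
    }
    return enriched


def readiness_score(metadata):
    """Average of documentation coverage and data completeness, from 0 to 1."""
    documented = sum(1 for f in REQUIRED_FIELDS if metadata.get(f)) / len(REQUIRED_FIELDS)
    profiles = metadata.get("column_profiles", {})
    total = sum(p["count"] for p in profiles.values())
    missing = sum(p["missing"] for p in profiles.values())
    completeness = 1 - missing / total if total else 0.0
    return round((documented + completeness) / 2, 3)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    record = {"name": "example-dataset", "description": "Toy data", "license": ""}
    data = {"age": [10, 12, 11, None, 13, 12, 11, 100]}
    enriched = enrich_metadata(record, data)
    print(enriched["column_profiles"]["age"]["outliers"])
    print(readiness_score(enriched))
```

In a real system the profile would be far richer and would be written back into a standards-based metadata document, while the score would be one of many metrics feeding a dashboard. The sketch only shows the direction of flow: analysis enriches documentation, and benchmarks read from that documentation.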

The paper explores each of these areas in more detail, recommending approaches and standards that could be adopted and summarising considerations those designing the NDL should be making.

Governance Dimension

We know that organisations that steward data make important decisions about how data is accessed, used, and shared, including who has access to data, for what purposes and to whose benefit. Governance of data can look different across organisations and types of data, and requires different mechanisms to ensure that data is accessed, used, and shared in the right way.

We know that a user-centric approach will be key to co-designing governance that enables the NDL to deliver data that is AI-ready. This includes reviewing key sectors, engaging with experts and a diverse range of stakeholders including the public, deciding on strategic use cases, and ultimately identifying the governance model which best suits the aims of the NDL.

Next steps

We look forward to continuing discussions with Wellcome, ESRC, DSIT and other interested parties to shape the future of the NDL. We see this as a critical component of the UK's data infrastructure, providing the trusted access to data that will be required to make the UK an AI powerhouse. More broadly, as part of our data-centric AI programme, we are thinking about what AI readiness means in practice.

If you’d like to learn more about our work on the National Data Library, AI-readiness, or other data stewardship initiatives, please get in touch at [email protected].

Read the white paper