In our response to the UK Government's call for input on the forthcoming AI Action Plan, the Open Data Institute (ODI) focuses on establishing an AI-ready National Data Library. While other stakeholders may address various aspects of AI technology, we believe that developing a robust data infrastructure is fundamental to the AI Action Plan's success.
The need for an AI-Ready National Data Library
The Labour Party’s manifesto outlined plans for a National Data Library to centralise research programmes and deliver data-driven public services, maintaining strong safeguards while benefiting the public. The objective is to create a more integrated and accessible data infrastructure across government, delivering social and economic benefits. A key consideration is ensuring that this Library is designed to be AI-ready from the outset.
The ODI agrees that AI has the potential to enhance public services significantly. It can enable better targeting of resources, reduce inefficiencies and improve quality, but delivering these benefits requires high-quality data and robust management and governance practices. The National Data Library must leverage the UK's leading capabilities to establish a strong data infrastructure for UK public services.
To ensure safe data sharing and access, the Library should incorporate well-tested data hygiene measures, including open, interoperable standards and technologies. It should also implement emerging AI-related measures to allow its data assets to be effectively and responsibly utilised by AI innovators in the public and private sectors. The ODI’s Data-centric AI programme and recent white paper on data and AI provide detailed insights into these measures.
There will be several technical options for designing the Library, and some parts of the community will be keen to adopt the shiniest solution. As noted earlier, we believe in the power of AI and other emerging technologies to tackle current challenges. At the same time, we know that a user-centric approach will be key to designing something that delivers on its promise. This includes reviewing key sectors, engaging with experts and diverse stakeholders, deciding on strategic use cases, and identifying the technological, legal, and operational enablers for sustainable data sharing and use.
Opportunities through AI advances
Recent AI advances offer opportunities to rethink infrastructure design, operation, and impact. For example, data prompting tools can enable civil servants to analyse complex data without specialist training. AI can also facilitate data integration across silos and perform basic data cleaning and augmentation. Pre-trained models can generate realistic synthetic data, encouraging experimentation. Federated learning techniques allow models to be trained where the data is held, so data need not be pooled centrally and existing data governance is not compromised.
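To make the federated learning point concrete, the following is a minimal sketch of federated averaging in Python. It is illustrative only and not a description of how the Library would be built: the two "sites", their toy data, the linear model and all function names are assumptions made for this example. Each site fits the model on its own records and shares only the resulting parameters, which a coordinator averages.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one organisation
Weights = Tuple[float, float]        # (slope, intercept) of a linear model


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run gradient descent on one site's data; the data never leaves the site."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine site models, weighting each by the number of records it holds."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train_federated(sites: List[Dataset], rounds: int = 50) -> Weights:
    """Coordinator loop: only model parameters are exchanged, never raw data."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in sites]
        weights = federated_average(updates, [len(d) for d in sites])
    return weights


# Hypothetical data: both sites observe the same relationship y = 2x + 1.
site_a: Dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
site_b: Dataset = [(3.0, 7.0), (4.0, 9.0)]
model = train_federated([site_a, site_b])
```

In this sketch the coordinator only ever sees two numbers per site per round. A real deployment would also need safeguards such as secure aggregation and access controls, which are outside the scope of this example.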
Greater investment in the UK's data-centric AI ecosystem is needed to realise these opportunities. This includes supporting start-ups that optimise data for AI use and establishing open standards for data exchange and documentation. The National Data Library could serve as a showcase, leading the field in responsible AI data practices.
Selecting and stewarding data
The success of the National Data Library will hinge on the quality, relevance, and accessibility of its data. We recommend that the Library initially focus on three key areas: high-quality public data, federated Trusted Research Environments (TREs), and cultural heritage data. These datasets offer significant benefits and are well-structured for reliable public-sector deployment.
Making these datasets available to digital innovators could enhance public service delivery and stimulate the ecosystem necessary for sustainable impact. Including cultural heritage data will ensure the Library captures the diverse history and culture of the UK.
A user-centric approach
Designing the National Data Library calls for a user-centric approach that prioritises practical use cases with a material impact on people's lives. Engaging with experts and stakeholders from the outset is crucial. Lessons from other government projects highlight the importance of bootstrapping innovation around public sector data early on and creating a community invested in maintaining the data over time.
Beyond a high-impact minimum viable product (MVP) for the Library, it is also crucial to consider how other data assets, whether closed, shared or open, could be collected, stewarded, and shared safely. Undoubtedly, the richest and most valuable data in the public sector relates to individuals; however, public trust in how personal data is handled is low. Alongside AI, this has been a priority area at the ODI in recent years: helping government and industry build trustworthy data ecosystems that enable safe data sharing without compromising people's rights to privacy or their desire to have more say over how their data is used.
Building a socio-technical data infrastructure
The ODI’s vision for the National Data Library focuses on creating an inclusive resource that enables access to public sector data while addressing challenges related to data heterogeneity, curation, safe access, privacy, and responsible use.
Public sector data often resides in a mix of pooled and federated systems. This requires a modular, decentralised architecture capable of managing complexity without compromising safeguards. Such an architecture must also include governance structures that inspire confidence in businesses to invest in AI innovations while maintaining public trust. The National Data Library’s success will depend on designing a technical infrastructure that balances protecting individual rights with fostering innovation.
The ODI supports the government's vision for the National Data Library, seeing it as a significant step toward establishing the robust data foundations needed to build advanced public services and develop cutting-edge technologies. It is also a cornerstone for successfully implementing the AI Action Plan that will emerge in the coming weeks and months.