
AI systems depend heavily on high-quality, well-governed data. Yet those building, using, assuring or regulating AI often focus on the models rather than on the data that shapes them. As other countries begin to treat open, assured data infrastructure as part of their national AI strategies, the UK risks overlooking a core point: without secure, reliable, AI-ready data foundations that are equitably accessible to everyone, no AI system will truly serve the public. This problem will not go away as AI advances; in fact, it is becoming more acute with agentic or embodied AI (e.g. robots, autonomous systems), which do not have access to the vast troves of public data that were available to large language and media models.

The ODI’s Data-centric AI programme tackles these challenges and their implications for healthy, trustworthy AI ecosystems. We study causes, question assumptions, and unpack hyped or misleading trends, while also building critical components of the public data infrastructure for AI, such as standards, user-centric benchmarks, and policy tools.

Three areas of focus

We are working on three linked challenges:

  • Getting data AI-ready - For AI to be reliable and fair, the data it uses must be well-governed, clearly structured, and accompanied by information about its provenance and content.
  • Making AI-ready data available - AI can only deliver on its promise if access to data in an AI-ready format is not restricted to a few organisations. Open data remains a driver of AI progress and innovation.
  • Ensuring responsible data practices - Transparency about training and other forms of data is essential for oversight and public trust. Governments and organisations need clearer information about which datasets AI systems use and how well those systems perform.

We respond to these challenges with practical tools grounded in evidence and research: open foundational standards, open frameworks with actionable recommendations, open AI datasets, and open-source tools that help organisations build AI systems on transparent, secure, and dependable data foundations.

Our current work

Building on the programme's earlier achievements, our current work focuses on AI that demonstrably works in practice rather than in press releases. This means applying tools in real settings of public interest and supporting organisations that want to improve the data foundations of their AI systems.

We continue to apply our framework for AI-ready data to new contexts and operationalise it for new data types, high-impact sectors, AI use cases, and international contexts. A central component of the framework is AI-native data practices and standards, such as MLCommons' Croissant. We advise and support data holders and AI technology providers in adopting Croissant to make machine-learning datasets easily discoverable, governable, and usable across tools and platforms.
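To make this concrete: a Croissant description is a JSON-LD document that attaches structured metadata (licence, files, formats, provenance) to a machine-learning dataset. The sketch below builds a minimal Croissant-style record using only Python's standard library. The field names follow the Croissant 1.0 vocabulary, but the dataset name, URLs, and values are hypothetical, and a real description would include record sets, checksums, and richer provenance.

```python
import json

# Minimal Croissant-style dataset description.
# Illustrative only: the dataset name and "example.org" URLs are hypothetical.
croissant = {
    "@context": {
        "@vocab": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "Dataset",
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    "name": "uk-public-data-sample",
    "description": "An illustrative AI-ready dataset with provenance metadata.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    # Each distribution entry describes one downloadable file.
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "data.csv",
            "contentUrl": "https://example.org/data.csv",
            "encodingFormat": "text/csv",
        }
    ],
}

# Serialise to JSON-LD, ready to publish alongside the dataset.
print(json.dumps(croissant, indent=2))
```

Because the description is plain JSON-LD, any tool or platform that understands the vocabulary can discover the files, their formats, and the licence without bespoke integration work.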

Benchmark datasets and evaluations are AI's engine of progress. We are building open benchmarks that measure things that people care about rather than abstract notions of "intelligence". We want to advance the field of AI measurement and evaluation with people-centric, transparent, and collaborative approaches to assess, audit, and assure how AI tools perform in areas such as public services or education.

We develop frameworks and best practices for the design of useful AI-ready data products, which can allow more people, not just those from a data background, to make better-informed decisions based on data that is currently siloed, noisy, or out of date.

Building on our previous work, we develop open-source tools to create general-purpose AI-ready datasets from diverse public sources. Our NDL-lite project shows how public data held across different organisations can be brought together more coherently, demonstrating what can be delivered quickly, openly and at low cost, using practical, state-of-the-art AI-assisted engineering accessible to more and more people in the UK.

Alongside our technical work, we are increasingly engaging with policymakers and regulators to ensure that emerging frameworks for data-centric AI are grounded in practical evidence. Our Global Data-centric AI Policy Observatory informs these engagements, providing insights grounded in hundreds of AI policies and regulations from around the world. Moreover, in January 2026, our work was used widely to inform the UK Government’s Guidelines and best practices for making government datasets ready for AI.

Working together

There is much to do, and as technology and regulation move at lightning speed, we must prioritise this field of enquiry and work quickly to transform ideas into action. Over the years, we have collaborated with Microsoft, the Industry Data for Society Partnership, King's College London, Duke University, the University of Oxford, and many others to make progress and make a difference.

We are keen to hear from funders, partners, and other organisations interested in helping to develop our programme of work. If you would like to contribute, discuss funding our work, or challenge our thinking, we'd love to hear from you.