The first version of the AI-ready data framework was developed in spring 2025. The framework defines the distinct aspects of what makes data ready to be used for AI, and explores the specific criteria for data publishers to follow, across three critical dimensions of AI-readiness:

  • Dataset properties: standards compliance, semantic consistency, identifiable imbalances, de-identification, and appropriate file formats
  • Metadata: machine-readable formats, attached documentation, technical specifications, supply chain information, and legal clarity (see the sketch after this list)
  • Surrounding infrastructure: accessible portals, APIs, and version control
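
To make the metadata dimension a little more concrete, the sketch below shows one hypothetical way a publisher might represent a machine-readable metadata record and check it for the kinds of fields the framework highlights. The field names, URLs and the completeness check are illustrative assumptions rather than part of the framework or any particular standard.

```python
# A minimal, hypothetical sketch of a machine-readable metadata record for a
# published dataset, loosely inspired by DCAT-style fields. Field names and the
# completeness check are illustrative assumptions, not part of the framework.

dataset_metadata = {
    "title": "Example traffic counts",
    "description": "Hourly traffic counts collected by roadside sensors.",
    "licence": "https://creativecommons.org/licenses/by/4.0/",   # legal clarity
    "documentation": "https://example.org/docs/traffic-counts",  # attached documentation
    "distribution": {
        "format": "text/csv",                                    # appropriate file format
        "schema": "https://example.org/schemas/traffic-counts.json",  # technical specification
    },
    "provenance": [                                              # supply chain information
        {"step": "collection", "actor": "Example City Council"},
        {"step": "aggregation", "actor": "Example data team"},
    ],
    "contact_point": "data-steward@example.org",
}

# Fields the metadata dimension would expect a publisher to populate.
REQUIRED_FIELDS = [
    "title", "description", "licence", "documentation",
    "distribution", "provenance", "contact_point",
]

def missing_fields(metadata: dict) -> list[str]:
    """Return the expected fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]

if __name__ == "__main__":
    gaps = missing_fields(dataset_metadata)
    print("Metadata gaps:", gaps or "none")
```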

Over the past months, we have been working on an update to the framework, informed by feedback from the community as well as several projects in which we applied the framework to organisations' data assets and AI use cases. In the first project, joint work with the Estonian company Nortal, we applied the framework to data and real-life AI projects in UK local government. The analysis involved assessing ten high-impact use cases where local councils are exploring or piloting AI, across social care, health, traffic, fire, homelessness and environmental settings.

In the second project, we started from a series of open-data use cases from around the world, which form part of our brand-new Open Data Use Case Observatory, developed in partnership with Microsoft. The case studies focus on the intricacies of each dataset and how these enabled the data to be used for impact, whether through training AI models or through other technological, data-led interventions.

In September and October 2025, we conducted further research to explore the different features which should be considered in the governance of AI-ready data. This research involved a strategic review of key literature, including ODI research on AI data transparency and AI data governance.

In the first iteration of the framework, we focused explicitly on a technical view of the conditions required to make data AI-ready. We developed the framework from the perspective of a prospective data publisher, framed by the question ‘what information would a potential data user need to know to feel able to use a dataset for AI in a frictionless way?’. While many engineers could adapt a dataset to suit their needs, this takes time, and our framing therefore prioritised datasets which are as immediately usable as possible. For the governance pillar of the framework, we wanted to focus on the decision-making aspects of governance that sit around the dataset, and to continue using the lens of a data publisher. Our guiding question for this governance pillar is therefore ‘what does a data user need to know about the governance of data to use it for AI?’.

This decision was made for two reasons. Firstly, during the development of the framework, we debated whether it should take a normative view of making data AI-ready responsibly, or focus on the core technical requirements a dataset must meet to be AI-ready. We ultimately settled on the latter, because it is important to articulate the concrete steps someone needs to take to achieve AI-ready data and to increase the availability of usable data. Data governance is particularly difficult to treat normatively in this regard; responsible data governance can mean different things to different groups and in different contexts. That said, there are certain normative elements which hold across different groups and contexts, such as licences and terms of use. Secondly, many of the factors that would be included in a broader view of data governance, such as anonymisation or transparency, are already covered in other aspects of the existing framework.

Therefore, we have developed the following criteria as additions to the governance pillar of the framework:

  • Governance policy-as-code. Key data governance policies (including data access, compliance checks, audit logging, and consent management) should be codified into machine-readable formats executable across governance tools and platforms; a rough sketch follows this list.
  • Documented roles and responsibilities. The dataset's documentation must clearly define and assign key governance roles with explicit accountability, such as a data owner or data steward.
  • Publicly identifiable points of contact. Metadata should include contact information for the designated data steward, providing a feedback loop for users and a channel for reporting data quality or governance issues.
  • Clear data access processes. The processes required for data users to access data must be clearly articulated, with the requirements, criteria and timelines described in full.
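
As a rough illustration of the first and third criteria above, the sketch below encodes a hypothetical access policy and steward contact as plain data structures, then evaluates an access request against the policy while appending an audit-log entry. The field names, roles, rules and addresses are assumptions made for illustration only; real policy-as-code would more likely target an established policy engine or governance platform.

```python
# A minimal, hypothetical sketch of "governance policy-as-code": the access
# policy, governance roles and audit log below are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

ACCESS_POLICY = {
    "allowed_purposes": {"research", "service-improvement"},
    "prohibited_uses": {"re-identification"},
    "approval_required": True,          # requests are reviewed before access
    "review_timeline_days": 10,         # stated timeline for decisions
    "consent_basis": "statutory-duty",  # consent / legal basis management
}

GOVERNANCE_ROLES = {
    "data_owner": "Example City Council",
    "data_steward": "data-steward@example.org",  # public point of contact
}

@dataclass
class AccessRequest:
    requester: str
    purpose: str
    audit_log: list[str] = field(default_factory=list)

def evaluate(request: AccessRequest, policy: dict) -> str:
    """Apply the codified policy to a request and record an audit-log entry."""
    if request.purpose in policy["prohibited_uses"]:
        decision = "rejected"
    elif request.purpose not in policy["allowed_purposes"]:
        decision = "rejected"
    elif policy["approval_required"]:
        decision = "pending-review"  # human review within the stated timeline
    else:
        decision = "approved"
    request.audit_log.append(
        f"{datetime.now(timezone.utc).isoformat()} {request.requester} "
        f"purpose={request.purpose} decision={decision}"
    )
    return decision

if __name__ == "__main__":
    req = AccessRequest(requester="analyst@example.org", purpose="research")
    print(evaluate(req, ACCESS_POLICY))  # -> "pending-review"
    print(req.audit_log[-1])
```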

During this research, other themes emerged. Some, like transparency over anonymisation, were already covered in other aspects of the framework, while others, such as transparency around update frequency and data usage controls, were not. We have decided to adapt and iterate the framework to include these new themes, most of which fall under the legal and sociotechnical information within the metadata dimension. The updated version two of the framework, which includes the new governance pillar, is available below. We will continue to explore further potential extensions to the framework in the future.

If you’d like to learn more about our work on the framework for AI-ready data, or our data-centric AI work more broadly, please do get in touch at [email protected].