

Data is the foundation of AI. Poor-quality data drives up costs and can cause hidden problems for AI systems, especially in complex fields like healthcare. Biased data degrades the performance of AI models, and uninspected evaluation datasets can lead to false positives or overestimates of model accuracy. AI practitioners must therefore dedicate time to ensuring the datasets they use are ready, which leaves them less time to innovate and deliver AI solutions. Their work would be more efficient and productive if datasets were published in a way that makes them AI-ready.

In this research, we sought to define AI readiness and its underlying principles. We focused on creating specific criteria for data publishers to follow, aiming to offer actionable recommendations for data, metadata, and infrastructure design rather than only high-level concepts.

To collect and refine the requirements for AI-ready data, we drew on insights from a literature review, expert interviews, and the lived experience of the ODI team, which has worked on data practices for over a decade. We present the requirements alongside contextual information in the main report and summarise them in a stand-alone visual framework that can be used to assess and improve dataset publishing practices.

This is only the first iteration of the AI-ready data framework. We anticipate feedback from dataset publishers and users will guide its development and refinement as it is applied and implemented across the data ecosystem. If you use the framework or want to know more about it, contact us at [email protected]