Without data, there is no AI. Data plays a defining role at every stage of the AI ‘lifecycle’. This applies to any form of AI, from deep learning, reasoning and planning, to knowledge graphs.
To deliver on safe and responsible AI, we need to consider the data infrastructure of existing and future AI applications. We need to look more closely at the links between data and models, draw on approaches from multiple disciplines, and engage those directly affected by AI as well as the civil society organisations that represent them. The latest wave of large language models (LLMs) and other foundation models has disrupted how we think about many components of our data infrastructure: from the value of data we publish openly and the rights we hold on data, to the quality and governance of critical datasets.
The ODI's data-centric AI programme aims to advance our collective thinking to build and shape the socio-technical data infrastructure of AI, including data assets, tools, standards, practices, and communities.
In this series, we will share our own work and invite world experts to share key insights and recommendations, drawn from their own research and experience, for building the data infrastructure needed to make AI safer and trusted by everyone.
Further reading
- The ODI's work on the promise and challenges of data discovery with LLMs
- Our fourth policy intervention from our data-centric AI programme on the need to ensure broad access to well-governed data for training AI models