Data Centric AI #11: Why open data matters in the age of AI

This webinar will bring together insights from ODI projects and combine them with input from experts in the field of open data and AI. We will explore the AI readiness of open data, unpack key projects that use open data for AI, and investigate the opportunities open data can bring to the AI ecosystem and how to realise them.

Background

Data is the foundation of AI. Poor quality data drives up costs and can lead to hidden problems for AI, especially in complex fields like healthcare. Meanwhile, biased data negatively affects the performance of AI models, and uninspected evaluation datasets can lead to false positives or overestimates of model accuracy.

At the ODI we’ve been thinking about what it means for data to be AI ready. Earlier in 2025 we developed a framework to define the distinct aspects of what makes data ready to be used for AI. The framework sets out specific criteria for data publishers to follow, aiming to offer actionable recommendations for data, metadata, and infrastructure design. Building on recent work to test the framework in a public sector setting, we have expanded it to consider governance design as well.

Open data and AI

The relationship between open data and AI is full of potential, but involves significant complexities. Open data has played a key role in the recent explosion of AI innovation, and enables more transparent and accountable governance. However, there are also significant concerns about the use of open data for training AI, and the lack of reference or attribution to open datasets.

Despite significant amounts of open data being published online, it is not all necessarily AI ready. Just as during the first wave of open data, simply making open datasets available online is not enough for them to be used to support AI development. Ensuring datasets have high-quality metadata and data infrastructure, as well as robust data governance, is essential for AI innovation.

In 2025 we explored the landscape of open data use, developing the Open Data Use Case Observatory as part of the final year of our partnership with Microsoft. The observatory includes case studies from a broad range of subject matters, modalities and geographies. The case studies focus on the intricacies of each dataset and how these enabled the dataset to be used for impact, whether through the training of AI models or through other technological, data-led interventions. In a second phase of work, we have applied our framework of AI readiness to a subset of these case studies, providing some initial insight into the landscape of open data and AI.

Speakers

Professor Elena Simperl, Director of Research, ODI

Elena Simperl is one of the UK’s leading advocates for AI that is transparent, trustworthy, and truly human-centred.

She is the ODI’s Director of Research and co-Director of King’s College London’s Institute for Artificial Intelligence. She is in the AMiner top 2000 most influential scholars in AI in the world, is on Stanford University's list of "World's Top 2%" scientists, and is the UK’s expert on building AI that works ‘for’ people, rather than just ‘on’ them.

Gretchen Deo is Director of IP Policy Outreach, Open Innovation in the Intellectual Property Group at Microsoft, based in Redmond, WA, where she leads the team’s initiatives at the intersection of open data and AI. This includes building partnerships with knowledge institutions and research organizations, and spans initiative design, policy outreach, and community engagement. Previously, Gretchen served as Chief of Staff for Microsoft’s EU Government Affairs office based in Brussels, Belgium. Before joining Microsoft, Gretchen spent over a decade at the U.S. Chamber of Commerce in Washington, DC, managing the organization’s legislative grassroots campaigns.

Andrew Myers, Lead Architect at NWG (Northumbrian Water Group), has 20 years’ experience in the water industry covering a variety of operational, technical and data roles, including large digital transformation projects. In his current role he is responsible for setting the strategic direction for technology and also leads the Stream Open Data project. Stream is a group of UK water companies and partner organisations with a vision to “unlock the potential of water data to benefit customers, society, and the environment.”

Joe Massey is a Senior Researcher at the ODI. He is currently focused on the sustainable data access project, alongside other projects. He is also spending time thinking about how the ODI can use more quantitative research methods to better understand the data-driven world.

Before working at the ODI, Joe worked at the responsible technology think tank Doteveryone, and completed an MSc in Global Development at the University of Copenhagen.

Joe is a strong advocate for interdisciplinary, mixed methods research and is fascinated by the intersection of technology, data and society.
