Despite broad agreement on the critical importance of data to AI development, attempts to understand and query this data are hindered by poor transparency practices. Fears of a ‘growing data transparency crisis’ in AI demand an investigation into how current transparency practices fail to meet the diverse needs of the responsible AI ecosystem. Developers frequently fail to disclose details of their training datasets, and transparency practices vary significantly, from widespread opacity to clear and detailed disclosures. This inconsistency hampers efforts to ensure fairness, identify biases and comply with regulations, leaving researchers, policymakers and the public unable to make informed decisions about AI systems or conduct large-scale comparative research.
As part of the ODI’s Data-centric AI programme, we developed a new framework to assess how mature model providers are in supplying the information needed for meaningful transparency.
In assessing 22 models, we found:
- High maturity: Demonstrated by five model providers, characterised by detailed, accessible documentation, consistent use of transparency tools, and a proactive approach to explaining decisions made in the development process.
- Medium maturity: Six model providers met some transparency criteria but lacked consistency across all dimensions.
- Low maturity: Eleven model providers shared limited or poor-quality information, suggesting a general reluctance to be open.
The AI Data Transparency Index (AIDTI) represents a significant step towards more meaningful and user-centric AI data transparency across the ecosystem: it not only evaluates practices as they are, but also provides a framework for the direction of travel towards more meaningful data transparency. Future work will focus on expanding the AIDTI to include more models and stakeholder perspectives, integrating machine-readable data for greater accessibility, and exploring interactive systems to visualise AI supply chains. This work also aims to establish a new direction of research into meaningful transparency, supporting all those working to ensure AI ecosystems are responsible and trustworthy.