
On 2 August 2025, the EU brought into application its Code of Practice for providers of general-purpose AI models. Organisations are not legally required to sign up to the code, but it is a tool to help them comply with their obligations under the EU AI Act.

Much-anticipated and written with input from around 1,400 experts across different sectors, the General-Purpose AI Code of Practice includes three chapters, one of which is dedicated to transparency. The ODI has contributed three rounds of feedback to the drafting of this section of the code. This builds on our work examining data transparency practices and their importance in addressing the many systemic issues in AI systems that can emerge from the use of data, and on our existing work to improve standards for these practices.

A regulatory initiative that increases the prevalence and quality of model documentation is hugely important, and we believe there would be great benefit in similar initiatives appearing in regulation internationally.

The publication of the code has been met with a mixed reception. It immediately led some organisations, including Hugging Face, to publish public transparency reports in the EU’s format. Many in the industry elected to sign the code (the European Commission has published a list of signatories), while others, including Meta, have announced that they will not, citing concerns that the requirements stifle innovation.

Meanwhile, we join other civil society organisations in highlighting how the standard may have been weakened by the limited diversity of stakeholder engagement in the final round of deliberation, which emphasised the companies that provide GPAI models. The code also misses opportunities to build on a wealth of substantive evidence, including our own research, about how to guide meaningful transparency practices. Drawing on that evidence would help ensure that transparency genuinely supports positive societal impact and trust in AI systems, rather than providing a false sense of security. Here we summarise our core reflections for future iterations of the code, and for similar regulation internationally, and why we think this matters.

What the Transparency chapter of the Code of Practice achieves

The transparency chapter of the Code of Practice sets out a range of commitments, including the documentation of many aspects of general-purpose AI models. In a blog earlier this year, we outlined why we believe these transparency guidelines can support trustworthiness in AI systems.

Our perspectives on what remains missing

To make transparency initiatives meaningful, they need to deliver the right information for stakeholders to be able to evaluate AI systems. They should therefore cover the aspects of AI development most salient to real-world impact, many of which centre on data practices. The EU AI Act requires providers of general-purpose AI models to do three things to ensure transparency:

  • Draw up and keep up to date model documentation
  • Provide relevant information
  • Ensure the quality, integrity, and security of that information

We believe there are limitations in how the code currently encourages these aspects, and we make the following proposals:

Requirements to focus on non-policy stakeholders, including public transparency reporting

As it stands, signatories of the code are not required to share model documentation publicly; they are required only to share this information with the EU’s AI Office and with ‘downstream providers’ of the model. As outlined in our recent European Data and AI Manifesto, a core reason for transparency is to enable diverse stakeholders to participate meaningfully in how data is used, and to provide a foundation for informed decision-making about AI systems.

Many more stakeholders need easy access to model documentation. Currently, the guidance states that “signatories are encouraged to consider whether the documented information can be disclosed, in whole or in part, to the public to promote public transparency.” We would like to see future initiatives go further and require public documentation.

Further, the ‘relevant information’ shared within this model documentation should be better targeted to help key system stakeholders, such as researchers, civil society organisations and journalists, understand the social, ethical and environmental impacts of systems. This means producing transparency information that is understandable to this wider audience. We also propose including criteria focused on ethical assessment (the types of risks identified for the model and the related mitigations), and criteria for delineating which models are high risk.

Clearer reporting requirements to ensure information is meaningful

The model documentation guidance currently contains open text boxes for general information about the model’s development. For example, one box requests: “A general description of the scope and main characteristics of the training, testing and validation data [...]”. We have highlighted that such an approach leaves room for transparency information to be sparse, failing to meet the needs of the stakeholders who rely on it.

We propose the inclusion of specific criteria and pre-defined domains for these descriptions, building on existing research. More specific criteria for describing aspects such as data curation processes can help to prevent “transparency washing”, where organisations appear to offer transparency without any meaningful change in trust or accountability.
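As a purely illustrative sketch, a documentation template could replace a single free-text box with pre-defined fields for each salient aspect of data practice. The field names below are hypothetical, not drawn from the code itself; they simply show the kind of structure we mean.

```python
from dataclasses import dataclass, asdict

# Hypothetical pre-defined fields for describing training data, replacing a
# single free-text "general description" box. All field names are illustrative.
@dataclass
class TrainingDataDescription:
    data_sources: str            # e.g. web crawl, licensed corpora, user data
    collection_period: str       # time window the data covers
    curation_and_filtering: str  # how records were selected, cleaned, deduplicated
    annotation_process: str      # who labelled the data and how
    known_gaps_and_biases: str   # documented limitations of the dataset

example = TrainingDataDescription(
    data_sources="Publicly available web pages and two licensed news corpora",
    collection_period="2019-2024",
    curation_and_filtering="Language filtering, deduplication, quality scoring",
    annotation_process="Toxicity labels applied by contracted annotators",
    known_gaps_and_biases="English-dominant; limited coverage of low-resource languages",
)

print(asdict(example))  # a structured record, ready to publish or aggregate
```

Because each field must be filled in explicitly, a template like this makes sparse or evasive disclosures easier to spot than a single open text box would.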

Aware of the constraints on reporting for smaller organisations, we also recommend dedicated guidance on how small and resource-constrained organisations can document their models while keeping the information useful.

An emphasis on interoperability and machine-readability

We highlight the importance of providing documentation in a machine-readable format, so that it can be easily collated and analysed by stakeholders, including regulators such as the AI Office. This would also facilitate analysis and comparison across models, helping to prevent transparency washing.

We would also recommend ensuring that the datasets used have interoperable documentation themselves, using machine-readable standards such as Croissant, which the ODI has been involved in stewarding. Such initiatives are important for ensuring transparency information is not only standardised but also findable and searchable.
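As an illustration, Croissant describes datasets as JSON-LD records built on schema.org vocabulary. The minimal sketch below shows the general shape of such a record; the exact context and property names should be checked against the current Croissant specification, and all dataset details here are invented.

```python
import json

# A minimal, illustrative Croissant-style record (JSON-LD on schema.org terms).
# Property names follow published Croissant examples but should be verified
# against the current specification. The dataset itself is fictional.
croissant_record = {
    "@context": {
        "@vocab": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "Dataset",
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    "name": "example-training-corpus",
    "description": "Illustrative web-text corpus used to train a GPAI model.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "url": "https://example.org/datasets/example-training-corpus",
    "distribution": [
        {
            "@type": "cr:FileObject",
            "name": "corpus.jsonl",
            "contentUrl": "https://example.org/datasets/corpus.jsonl",
            "encodingFormat": "application/jsonlines",
        }
    ],
}

print(json.dumps(croissant_record, indent=2))
```

Because records like this are plain JSON-LD, regulators and researchers can harvest and query them at scale rather than reading documentation one PDF at a time.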

Focus on traceable data supply chains, including the human labour involved

The final version of the Code of Practice included more detail about how organisations should describe the ‘methods used to obtain and select training, testing and validation data’, and specified the need to discuss data annotation and synthetic data practices. We welcome these changes, although we would propose also including information about the labour involved in the process. To achieve this, documentation should include dataset digital object identifiers (DOIs), the names of organisations involved in dataset curation and annotation, and details of the human involvement in this work.
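To make this concrete, the sketch below shows hypothetical supply-chain and labour fields that model documentation could carry for each dataset used in training. None of these field names come from the code or any existing standard; they only illustrate the kind of information we have in mind.

```python
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical supply-chain and labour provenance fields for one training
# dataset. Field names are illustrative, not part of any published standard.
@dataclass
class DatasetProvenance:
    doi: str                           # persistent identifier for the dataset
    curating_organisations: List[str]  # who assembled and maintains the data
    annotation_organisations: List[str]
    annotation_workforce: str          # e.g. in-house staff, crowd workers, contractors
    annotation_instructions_url: str   # where the labelling guidelines are published
    synthetic_data_share: float        # fraction of records that are synthetic

example = DatasetProvenance(
    doi="10.1234/example-doi",
    curating_organisations=["Example Data Co-op"],
    annotation_organisations=["Example Annotation Ltd"],
    annotation_workforce="Contracted annotators, paid per task",
    annotation_instructions_url="https://example.org/annotation-guidelines",
    synthetic_data_share=0.15,
)

print(asdict(example))
```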

Conclusions

In all, it is exciting to see the Code of Practice in place. We believe the improvements outlined in our response should inform future transparency initiatives, including regulatory mechanisms. While the current code has limitations, it is an important step towards building the infrastructure for more trustworthy data and AI ecosystems.