There are lots of different terms used to describe organisations in our data ecosystems. At the Open Data Institute (ODI), we’ve been focusing on ‘data institutions’. In this article, our Senior Researcher Joe Massey discusses ‘data intermediaries’, and unpacks the similarities and differences between this term and the concept of data institutions.
Data institutions
At the Open Data Institute (ODI), we’re interested in data institutions – organisations that steward (collect, maintain and share) data on behalf of others, often towards public, educational or charitable aims.
Data institutions make important decisions about who has access to data, for what purposes and to whose benefit. This ultimately affects what types of products, services and insights that data can be used to create, what decisions it can inform and which activities it can support.
In practice, data institutions steward data in different ways. Some protect sensitive data and grant access under restricted conditions – for example, UK Biobank was set up in 2006 to steward genetic data and samples from around half a million people, and continues to support their use for vital health research. Others combine data from multiple sources and provide insights and other services back to those that have contributed – HiLo takes data generated by around 3,500 ships globally to generate vital risk analyses related to lifeboat accidents, engine room fires and other incidents.
Although the term is fairly new, data institutions exist across the private, public and third sectors. Public organisations like national mapping agencies, statistics agencies and archives are perhaps our oldest data institutions – in some cases they’ve played these roles on behalf of the public for hundreds of years.
New forms of data institution are also being imagined and built – such as INSIGHT, a hub that brings together eye scans and other data to support new research in disease detection, diagnosis, and treatment.
What’s in a name?
There are a lot of other terms being used in this space, too:
- ‘Data intermediaries’ is used by the UK government in its National Data Strategy and in reports from the Centre for Data Ethics and Innovation; by the European Commission in its Data Governance Act; and we’ve used the term ourselves in our work with Refinitiv
- ‘Data spaces’ and ‘data altruism organisations’ are also terms used by the European Commission
- ‘Data collaboratives’ is used by The GovLab to describe initiatives that enable data to flow from the private to the public sector
- The Oxford Martin Programme speaks of ‘ethical web and data architectures’ that ‘could allow data to be shared in more flexible and innovative ways, respecting individual autonomy while generating wider societal benefits’
We generally prefer not to get caught up in differences over terminology, and instead focus on the specific roles and functions that organisations play within data ecosystems.
However, we recognise that the use of different, related terms can be confusing, so we think it’s important to unpack some of the differences between our chosen term (data institutions) and ‘data intermediaries’.
What is a data intermediary?
The Centre for Data Ethics and Innovation defines ‘data intermediary’ as:
‘a broad term that covers a range of different activities and governance models for organisations that facilitate greater access to or sharing of data.’
Its report Unlocking the value of data: Exploring the role of data intermediaries describes seven different types of data intermediary, including data trusts, data exchanges and industrial data platforms.
We’re using this definition of data intermediaries as it’s the most comprehensive we’ve seen, and the one used by the UK government.
The similarities
Before getting into the differences between data intermediaries and data institutions, it’s worth considering what the terms have in common.
In the first place, both concepts are broad and significantly overlapping. We would identify many of the examples of data intermediaries mentioned in the Centre for Data Ethics and Innovation report as data institutions, such as Genomics England. Both terms recognise organisations that empower individuals to exert control over data about them (eg digi.me), as well as more industrial, organisation-to-organisation approaches (eg MK Data Hub).
Secondly, neither concept refers to a particular type of technology, architecture or legal form. At the ODI, we’re largely agnostic to these design choices and instead approach and define data institutions in terms of their function. The Centre for Data Ethics and Innovation takes a similarly tech- and structure-agnostic approach, discussing a range of technological and organisational models that data intermediaries can adopt.
Finally, the two concepts aren’t the result of distinct agendas, and work on them is often interoperable. We think our work on measuring the impact of data institutions and investigating mechanisms for the government to support them is relevant to data intermediaries too. We draw on outputs from the Centre for Data Ethics and Innovation and others working on data intermediaries. Rather than work in silos, we think it’s important that organisations working on data intermediaries, data institutions and other related terms engage with each other’s work.
Despite these similarities, there are some key differences in emphasis in the way the terms are used.
The differences
Data stewardship
Firstly, ‘data institution’ can be interpreted as a broader concept than ‘data intermediary’, as it encompasses the practice of data stewardship – collecting, maintaining and sharing data – rather than simply the facilitation of data sharing.
The term ‘data intermediary’, as it’s used by the Centre for Data Ethics and Innovation and others, covers a range of different types of activities, governance models and technologies. But what they all have in common is that they directly facilitate greater access to or sharing of data, typically by holding that data on behalf of others and enabling access in some form.
Some data institutions also do this – but not all do. Data institutions also take on wider data stewardship roles, such as developing and maintaining identifiers, standards and other infrastructure for their sector or field (eg Open Banking Limited), or collecting open data for anyone to access, use and share (eg OpenCorporates, Open Apparel Registry and 360Giving).
Focusing solely on the facilitation of data sharing, rather than the wider set of functions implied by data stewardship, may mean missing some of the important roles that are needed to support open and trustworthy data ecosystems.
We’ve also noticed that work on data intermediaries tends to propagate the idea that ‘greater access to or sharing of data’ is always a good thing – it’s baked into the definition. In contrast, stewardship, as we describe it, can involve limiting and applying friction to the collection, use or sharing of data. For example, Te Hiku Media, a non-profit radio station, is working to revitalise the Māori language while keeping control of its community’s data. As a data institution, it has created a licence based on the Māori principle of kaitiakitanga (or guardianship) to limit the flow of Māori data so it won’t be used without the community’s consent, or worse, in ways that harm its people.
Public, educational and charitable purposes
We are deliberate in our use of the term ‘institution’ in this context, which tends to refer to an organisation with a public, educational or charitable purpose.
It also conveys the responsibility that data institutions have in stewarding data on behalf of a particular sector or community, rather than just for their own, internal benefit.
This type of purpose and responsibility is clear in data institutions like Workerbird (which enables workers to collect and analyse data on their working pattern), the Frozen Ark (which safeguards genetic data on endangered animals for the benefit of future generations) and others in our Data Institutions Register. While there will always be a degree of subjectivity in applying these criteria (see the wider debate about the concept of ‘data for good’), it helps us to home in on the types of organisation that we think are having a positive impact on the world.
This doesn’t mean that data institutions can’t also be commercial. We’ve worked with data institutions such as OpenCorporates, HiLo and Farmbench, which pair commitments to the public good with the need to generate their own revenue, and in some cases, profit. Our report Measuring the impact of data institutions quantifies some of the economic benefits that data institutions can produce in terms of revenue, cost savings and efficiency gains.
Work on data intermediaries, on the other hand, does not tend to foreground these purposes. In fact, in the way the term has been defined, data intermediaries may not have any purpose outside of generating profit for shareholders. That means a data broker that facilitates the transfer of data about users between social media platforms, advertisers and other brokers might qualify as a data intermediary, but we’re unlikely to see it as a data institution.
Old institutions, new data duties
Finally, we use the term data institution to refer to a wide range of existing organisations, whereas work on data intermediaries seems to emphasise newer organisations and the need for new types not yet in existence.
In our view, organisations like national mapping agencies, statistics agencies and archives are perhaps our oldest data institutions, while biobanks, company registers and meteorological offices all predate the data revolution of the past decade or two. In addition to this, many more established organisations are now taking on new data duties, including regulators, trade associations, consumer advocacy organisations and trade unions.
At the ODI, we’re helping some of these existing organisations recognise their potential as data institutions – such as Smart DCC. We think there’s value in using a lens that includes these established organisations – they are already embedded in their ecosystems, and have a legitimacy which newer organisations can struggle to achieve. They can and should have an important role to play in stewarding data going forward.
New approaches – such as data cooperatives, data spaces, data unions and data trusts – are exciting and deserve attention, and they are often included in the discourse around data intermediaries. But we think focusing heavily on these new models that haven’t yet achieved scale can be a distraction from existing institutions that have borne fruit, or could be further scaled or adapted into new domains.
Data intermediary or data institution?
Our preference is to focus our work on the roles and functions that organisations play within data ecosystems. ‘Data institution’ is a deliberately broad term, allowing us to engage with a wide variety of organisations that are stewarding data on behalf of others.
The term you use to describe organisations in this space will ultimately depend on what you’re interested in emphasising about those organisations. We choose to highlight:
- organisations that carry out a wide set of important stewardship activities, in addition to those which directly intermediate data-sharing;
- organisations stewarding data for public, educational or charitable purposes, rather than solely for commercial benefit;
- established organisations taking on new data roles, as well as new organisations set up to steward data.
For these reasons, we’ll continue to use the term data institutions – while continuing to engage with data intermediaries and the vibrant landscape of work on everything in between.