What are data institutions and why are they important?

Fri Jan 29, 2021

Data institutions are organisations that steward data on behalf of others, often towards public, educational or charitable aims. In order to explore data institutions, it’s important to first understand the concept of data stewardship and how it fits into our theory of change. In this blog, we explain the concept of data stewardship, what data institutions are and the vital roles they play.

We have created the illustration below to show what data institutions are and why they’re important in achieving a future where data has a positive impact. We refer back to it throughout this blog to explain what it represents.

How data institutions support our theory of change

The challenge

The right kind of access to data is vital in tackling the big challenges we face – from the earlier detection and treatment of disease to reducing pollution in urban spaces.

Data also has an important role to play in driving economic growth and recovery by supporting the creation of new technologies, products and services. But those technologies can also cause harm, through automating decisions that need a human touch, or embedding existing biases, inequalities or competitive advantages.

At the ODI, we talk about three competing futures for data and the type of impact each of these could have on our society, environment and economy.

In one future, data will not be accessible to those who need it due to organisations hoarding it. We think of this scenario as ‘data hoarding’, where organisations think of and treat data they’ve collected like oil. The benefits of data would only be enjoyed by a few organisations while, like oil, the negative impacts of its use would affect society as a whole.

In our illustration, we show this future using an oil field and the pull towards it using the blue and purple arrows.

There’s another future where data might not be collected or used to the extent it could be, because concerns about the harm it can cause people are left unaddressed. We call this scenario ‘data fearing’, where a lack of consideration of ethics and equity, and a lack of engagement with those affected by data’s use, undermine trust. People might avoid using services, or withdraw consent for data to be collected, which means that we end up missing data and the uses of it that could support human flourishing.

The pull towards this future is represented in our illustration by a wasteland, and arrows in red and yellow.

Stewarding data is the foundational activity in the lifecycle or value chain of data – collecting, maintaining and sharing it. Organisations that steward data make important decisions about who has access to it, for what purposes and to whose benefit. How data is stewarded ultimately affects what types of products, services and insights it can be used to create, what decisions it can inform and which activities it can support. Stewarding data involves realising the value and limiting the harm that data can bring.

In our illustration, the green circle and the white figures within it depict the world, about which data is being collected, maintained and shared. The thin arrow in the middle represents the data value chain, with the act of stewarding located at its bottom-most point.

Data needs to be stewarded responsibly so that we can guard against these ‘data hoarding’ and ‘data fearing’ futures, and instead plot a course towards data being used to drive positive societal, environmental and economic outcomes – a ‘positive impact’ scenario.

This course is represented by the green arrow running to the sunny farmlands at the top of the illustration.

The role of data institutions

Data institutions are organisations that steward data on behalf of others, often towards public, educational or charitable aims.

We show data institutions in the illustration as a set of 3D shapes within the central green circle. The white arrow wrapped around this circle illustrates how, by stewarding data responsibly, data institutions push back against the data hoarding and data fearing futures.

In practice, data institutions steward data in different ways, including:

  • Protecting sensitive data and granting access under restricted conditions. UK Biobank, for example, was set up in 2006 to steward genetic data and samples from around half a million people, and continues to support their use for vital health research.
  • Combining or linking data from multiple sources, and providing insights and other services back to those that have contributed data. In the maritime sector, HiLo uses data from around 3,500 ships globally to produce vital risk and safety analyses related to lifeboat accidents, engine room fires and other incidents.
  • Creating open datasets that anyone can access, use and share to further a particular mission or cause. OpenCorporates, Open Apparel Registry and 360Giving each collate and make accessible important open data, about companies, factories and grant-making respectively.
  • Acting as a gatekeeper for data held by other organisations. ADR UK ensures important administrative data held by the UK Government can be accessed safely and securely, and in the US, Social Science One plays a similar role in order to make Facebook data available for new research.
  • Developing and maintaining identifiers, standards and other infrastructure for a sector or field, such as by registering identifiers or publishing open standards. In the UK, Open Banking Limited was established in 2016 to develop standards and guidelines to drive competition and innovation in the retail banking sector.
  • Enabling people to take a more active role in stewarding data about themselves and their communities. Open Humans, for example, enables people to explore, analyse and share data about their health, and make it available for citizen science projects.

We’ve represented the various roles that data institutions play using different shapes. The 2D shapes around the outside of the illustration include short descriptions of these roles. 

Although the term is new, data institutions exist across the private, public and third sectors. Public organisations like national mapping agencies, statistics agencies and archives are perhaps our oldest data institutions; in some cases they’ve played these roles on behalf of the public for hundreds of years.

There are also new types of data institutions emerging, such as data cooperatives, data unions and data trusts. Many are being designed to shake up the status quo – by redressing power imbalances between people and corporations, enabling collective action informed by data or supporting data altruism.

The concept and terminology

We are deliberate in our use of the terms ‘data institution’ and ‘stewardship’ to describe these organisations and the role they play. 

To us, stewarding data involves making important decisions about who has access to it, for what purposes and to whose benefit. As the Ada Lovelace Institute writes in a soon-to-be-published report, “stewardship is often described as a concept that embodies the responsible planning and management of resources… Though data isn’t a traditional ‘resource’, data stewardship can help think through aspects of trustworthy and responsible data use. It’s key to protecting the data rights of individuals and communities and unlocking the benefits of data in a way that’s fair, equitable and focused on social benefit”.

The term ‘institution’ can mean different things to different people – we have taken it to mean ‘an organisation, establishment, foundation, society, or the like… especially one of a public, educational, or charitable character’. We’ve therefore carried forward the organisational footprint and a public, educational or charitable purpose into our definition of a ‘data institution’. 

We recognise that the term might be used in other contexts to mean different things. For example:

  • To describe organisations that are essential components of our data ecosystem but tend not to actively steward data themselves, such as advocacy organisations, funders, auditors and so on. We think of these as simply ‘institutions’.
  • To convey the need for a richer institutional environment around data to ensure it works for everyone, which, as well as organisations, would include laws, codes of conduct, norms and informal behaviours. Using this wider lens, some might see things like data protection or data portability as institutions related to data.

Other organisations use different language to describe similar or closely related concepts. For example, the Oxford Martin School’s new programme will ‘explore new institutional and legal constructs that could allow data to be shared in more flexible and innovative ways, respecting individual autonomy while generating wider societal benefits’. The European Commission focuses on data spaces, data intermediaries and data altruism organisations, and The GovLab supports data collaboratives between the private and public sectors. While we emphasise the importance of data stewardship, other organisations like the Royal Society use the term data governance. The terminology may be slightly different, but we will continue to learn from and collaborate with these and other organisations around the world.

You can read more about our data institutions programme and the work it’s doing here.