What is a data trust?

Tue Jul 10, 2018

How ‘data trust’ is used, and how it can be used to increase data access while protecting trust

ODI Policy Adviser Jack Hardinges unpicks how the term ‘data trust’ is used in the UK and beyond, and how it might be used in the context of increasing data access while protecting trust

Data trusts: why we are interested

At the ODI our vision is for people, organisations and communities to use data to make better decisions and be protected from any harmful impacts.

As part of our data innovation programme, we are undertaking a project to investigate ways to increase access to data for new technologies while retaining trust. We want this work to help organisations that hold data decide whether and how to provide access to it, and support governments in understanding how they can create the right environment for this to happen.

We are currently conducting research into both existing and new models for increasing access to data; exploring how trust is built, maintained and lost; and researching the needs and thoughts of various stakeholders. This post is about one of the new models for increasing access to data that we’ve come across: ‘data trusts’.

There are different interpretations of what a data trust is, or should be

There is no widely used definition of ‘a data trust’, or even consensus on what one is. Much of the recent interest in data trusts in the UK has been fuelled by a UK government-commissioned independent review into Artificial Intelligence (AI) in 2017, which recommended them as a way to ‘share data in a fair, safe and equitable way’. However, there has been wider international interest in the concept for some time.

At a very high level, the aim of data trusts appears to be to give people and organisations confidence when enabling access to data in ways that provide them with some value (either directly or indirectly) in return. Beyond that high-level goal, there are a variety of thoughts about what form they should take. In our work so far, we’ve found different interpretations of the term ‘data trust’:

  • A data trust as a repeatable framework of terms and mechanisms.
  • A data trust as a mutual organisation.
  • A data trust as a legal structure.
  • A data trust as a store of data.
  • A data trust as public oversight of data access.

Interpretations of the term ‘data trust’

The UK AI review defined a data trust as a repeatable framework of terms and mechanisms. It stated that data trusts ‘are not a legal entity or institution, but rather a set of relationships underpinned by a repeatable framework, compliant with parties’ obligations’. The development of this repeatable framework was a recommendation made by the review to grow the AI industry in the UK.

This form of data trust seems designed to tackle the challenge of data stewards and prospective data users having to negotiate and establish data sharing agreements on a case-by-case basis. For example, the law firm Fieldfisher suggests that a repeatable framework of terms and mechanisms could be used when an organisation requires access to a dataset to complete processing or analysis on behalf of a client who holds the data.

Capgemini France (the French arm of the consulting, IT services and digital transformation company) has suggested that a data trust, which would ‘not be an entity or a legal institution, but rather a contractual model’, could be established between organisations that hold data and organisations seeking to use it to develop AI. It has suggested that this contractual model would give data holders a share of intellectual property (IP) derived from the data’s use, as well as a share of any subsequent profits generated and a right to perpetual use of the developed technology.

The Organisation for Economic Co-operation and Development (OECD) has outlined ‘community-based data sharing agreements’ as a method of increasing access to data. It states that ‘these arrangements are crucial for maximising the value of data by keeping the range of opportunities [for data use] as wide as possible, while limiting the risks of violating the interests of data subjects’. Although not referred to explicitly as ‘data trusts’, the concept of community-based data sharing agreements appears to be related. However, while the OECD refers to individual uses of data sharing agreements, this interpretation of a data trust would seem to also describe a framework that can be used to create them in a consistent way across different communities.

In practice, this form of data trust could specify legal terms as well as governance processes and technical mechanisms of data access. The interpretation, however, remains vague.

A different form of data trust is a mutual organisation formed to manage data on behalf of its members. Typically, a mutual organisation is owned by members who have democratic control of it and a share in its profits. Cooperative societies, mutual insurance companies and building societies are common types of mutual organisation.

Neil Lawrence (Director of Machine Learning, Amazon Research Cambridge and Professor of Machine Learning, University of Sheffield) has described how a data trust in the form of a mutual organisation could be created to manage personal data. He has suggested that “data subjects would pool their data forming a trust, stipulating conditions under which data could be shared… large enough to be effective partners in controlling how [the] data is used”. This form of data trust would have a founding constitution and an appointed set of representatives, perhaps partly consisting of data subjects themselves. The representatives would consider the wishes of the broad group of people as well as the organisation’s constitution to make decisions about how the data should be used and shared.

A motivation behind this form of data trust is to increase the control that people have over data about them. Other models for increasing access to data share this objective, such as through data portability initiatives or the creation of personal information management systems. What seems to be different in this proposed model is that the control of the data would be at an aggregate level and would be delegated to a set of representatives.

On the basis that a group of people would collectively define the parameters within which this type of data trust would operate, some have envisaged there being different data trusts with varying attitudes, objectives and rules related to the use of data. Lord Jim Knight has suggested that he “might choose [a data trust] that is more permissive … Others, with higher distrust, could choose a different trust with different attitudes to sharing”.

In this context, it’s important to remember that the total pool of potential data subjects, and of people affected, might be larger than the original group of data contributors. This is likely, as data is often about multiple people.

A related, but slightly different, interpretation is a data trust as a legal structure.

In law, a trust is a structure that enables one party, the trustor, to give another party, the trustee, the right to hold an asset for the benefit of a third party, the beneficiary. Historically, trusts have been used to hold assets such as property or investments. They’ve provided legal protection for a trustor’s assets, distributed assets according to the trustor’s wishes, and in some cases, been used to avoid or reduce liabilities (such as taxes) against those assets.

A number of people have suggested that this legal structure could be adapted to control personal data. In a 2004 paper Lilian Edwards suggested that ‘using [a] trust as the legal model may take us away from the problems dogging data protection, as to what formally constitutes ‘consent’ is in data collection and processing. If we turn to a trust-like model, we can imagine a regime in which data is given away freely and yet data collectors and processors still owe duties of care and trust to data subjects’.

In this interpretation, a set of data subjects are the trustors and also the beneficiaries of a data trust. A third-party trustee has a duty to make decisions in their best interests (often described as a fiduciary duty), for example about who has access to the data and for what purposes.

Sean McDonald and Keith Porcaro have described this form of data trust (‘civic trusts’) as creating ‘independent, fiduciary governance of third-party data sharing’. This governance, essentially the trustee’s deliberative process, could take different forms in practice. McDonald has suggested that “like academia’s Internal Review Boards (IRBs), Data Review Boards [built using a data trust] would be independent, fiduciary governance structures that review, monitor, and enforce the ways that collectors share data with third parties… they could broker conditional licences to third-party users – like academic researchers – based on an ethical review that includes both industry and data subject.”

The Social Economy Data Lab is seeking to establish the first civic trust for data in the UK, through ‘shared partnership agreements, data sharing agreements and representative governance’.

One motive behind this form of data trust appears to be to distribute the benefits of data use more equitably. In some cases the proposed benefits to data subjects are monetary: for example, income from licensing access to the data to a commercial organisation would be distributed between them. Presumably some benefits would be indirect and difficult (or impossible) to distribute, such as the societal benefit of providing access to the data for research purposes.

The law firm Walkers, by contrast, has proposed Guernsey-based data trusts not as an attempt to introduce fiduciary governance or a more equitable distribution of the benefits of data use, but as an approach to reducing potential data protection (and tax) liabilities.

A different interpretation of a data trust is one as a store of data that provides restricted access back to the original data steward and/or others.

Mastercard and IBM are working together to create a data trust of this type called Trūata. As an independent third party, Trūata will use proprietary tools to ‘manage, anonymise and analyse vast amounts of personal information held by companies such as travel agents and insurers… to allow [businesses] to extract valuable information from the data’. Trūata appears to be a commercial proposition designed for data holders seeking to make use of data in line with the EU’s General Data Protection Regulation (GDPR).

The data managed by the Trūata data trust isn’t accessible to, or pooled with data committed by, other organisations. This differs from the Silicon Valley Regional Data Trust (SVRDT) in the US, which is an attempt to combine and make use of data from different educational organisations in California. The SVRDT is ‘developing a Secure Information Sharing Environment that is based upon an integrated policy, architecture and technology framework’. The data trust seeks to enable the use of data currently siloed in different organisations for purposes including policy, research and case management.

The SVRDT is similar to the Kent Integrated Dataset (KID). The KID gathers data from more than 200 local health and care organisations across the UK county, including primary care providers, mental health services and acute hospitals. It links patient-level records from across these services and provides pseudonymised access to the data to inform health and care decisions.

Data trusts have also been discussed at a city level. Andrew Collinge of the Greater London Authority has proposed ‘city data trusts’ controlled by city governments as a way of better stewarding data, using it for innovative purposes and building trust among citizens. He’s suggested that “given that cities are places of increasingly intense experimentation with data, it seems to make sense to consider a Data Trust or body that can attempt to bridge the gap between the business of government… and the leaps and bounds of technology”.

The sharing of data by private operators with city governments, such as Vodafone and Uber with Barcelona and New York respectively, can be seen as related developments. In these examples, the cities have become stewards of data generated through citizens’ use of private services. Similarly, the DECODE project being trialled in Barcelona and Amsterdam is an attempt to enable citizens to contribute data about them – held by public and private organisations – for use in city policymaking and planning.

In general, this interpretation of a data trust (a store of data) also has similarities with the concept of ‘data commons’, such as Sage Bionetworks’ attempt ‘to create an open, massive, mineable database of anonymized data about health and genomics’.

The term data trust has also been used to describe public oversight of data access.

In a recent interview, Rana Foroohar, associate editor for The Financial Times and CNN’s global economic analyst, described “data trusts where there could be some public oversight of personal data in such a way that corporations that want to access that would then be monitored, they would be supervised”.

Although the description is ambiguous, this public oversight could take the form of a certification scheme similar to that due to be launched by the Singapore Personal Data Protection Commission (PDPC) by the end of 2018. In an attempt to promote best practices in data sharing, the PDPC has already launched guidance to help organisations identify appropriate approaches for sharing personal data. It has said that ‘as part of [PDPC’s] efforts to develop a trusted data ecosystem in Singapore… [it is also] developing a Data Protection (DP) Trustmark certification scheme and a series of resources aimed at encouraging organisations to be transparent and accountable’.

Although a number of national regulators currently produce data protection guidance, the PDPC certification scheme appears to be designed to support organisations to provide access to data while retaining trust.

Hetan Shah, executive director of the Royal Statistical Society (RSS), has proposed a different type of oversight in the form of a public body. He’s asked “what if… technology companies were allowed to use the data that they gather only for a limited period, say, five years? The data could then revert to a national charitable corporation that could provide access to certified researchers, who would both be held to account and be subject to scrutiny that ensure the data are used for the common good”. The economist Mariana Mazzucato has written about something similar. Although not described as a data trust, this proposed model of increasing access to data feels like it is part of the same picture.

Data trusts: next steps in our project

A useful and consistent definition of a data trust might ultimately be a combination of parts from the different interpretations. For example, a data trust could be a repeatable framework of terms and mechanisms that is mandated for use (or subject to scrutiny, or certification) in particular contexts to provide oversight of data access. Or it could be a store of data underpinned by a legal trust (in fact, this is what Trūata already is). In general, the term seems to be used to describe a bundle of choices related to different aspects of data access, such as purpose, governance and technology.

A data trust as a repeatable framework of terms and mechanisms, however, does seem different in nature to the other interpretations. It appears to describe the process of using the data trust as much as the data trust itself. We think of there being three elements to this interpretation: 1) the repeatable framework, 2) the process of applying or adopting it in a particular context, and 3) the resultant relationship.

Although some of the interpretations involve the provision of access to closed data currently held by private organisations (often for ‘public good’), this isn’t a dynamic at the heart of them all. Some of the proposed forms of data trust could feasibly involve only private or public sector organisations, or a combination of the two, with the data being accessed in either direction.

This work on data trusts will feed into our broader research into models for increasing access to data. As well as existing models, we are exploring other new and proposed models such as the one represented by the open banking ecosystem in the UK and the use of privacy enhancing technologies. We want to unpick the bundle of choices that we think these models represent to compare them and understand where they might be used effectively.

Over the coming months we will be exploring options within selected sectors, and producing reports, guides and prototypes to help increase the adoption of data access models that retain trust.

What have we missed? What do you think of these examples of data trusts and the way that we’ve interpreted them? Are you working on building any of them? Let us know: tweet us at @ODIHQ or email us at [email protected].