With questions about the use and misuse of personal data high in the public's mind, Jeni Tennison shares five takeaways on the implications for access to data for researchers
Over the last few days there has been a maelstrom of stories about Cambridge Analytica and Facebook. The stories have been building for some time; the latest broke after an ex-Cambridge Analytica employee told their story to The Guardian and The New York Times, and TV journalists at Channel 4 News secured undercover footage of Cambridge Analytica executives discussing their controversial techniques and services.
There is a growing backlash against Cambridge Analytica and Facebook. Questions about the use and misuse of personal data are high in the public's mind.
This is a wide-ranging and evolving story with multiple angles, from its impact on democratic elections in countries around the world to the insight it gives into the emerging data ecosystems that affect our lives. We are interested in all of those angles, but particularly in what this might mean for data. In this post we focus on the potential implications for researchers' access to data. We will share more thoughts sparked by other parts of this story over the coming days and weeks.
What happened to the data
Part of Facebook’s service is that external organisations and developers can build apps and other services that access data about Facebook users. For example, many people play games with their friends on Facebook. In doing so, these users provide the developers behind games and other apps with access to data that Facebook holds about them. This can range from access to basic profile information through to chat histories and all of the photos and videos they have uploaded.
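To make that mechanism concrete, here is a minimal, illustrative sketch of the kind of call an app can make once a user has authorised it. This is not the SDK or permission model used by the app in this story: the API version, field names and token handling below are simplified assumptions for the sketch.

```python
import requests

# Illustrative Graph API version; the versions, permissions and fields
# on offer have changed repeatedly over the years.
GRAPH_API = "https://graph.facebook.com/v2.12"

def fetch_authorised_user_data(access_token: str) -> dict:
    """Fetch the profile fields a user has agreed to share with an app.

    The fields requested here (likes and posts alongside basic profile
    information) stand in for the kinds of data an app could ask
    permission for; the access token is the OAuth credential granted
    when the user authorises the app.
    """
    response = requests.get(
        f"{GRAPH_API}/me",
        params={
            "fields": "id,name,likes,posts",
            "access_token": access_token,
        },
    )
    response.raise_for_status()
    return response.json()
```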
In 2015, around 270,000 Facebook users were paid to use an app called thisisyourdigitallife that connected to their Facebook accounts. The app included a personality test that people could take to learn more about themselves. Users were required to give it permission to access data about them already held by Facebook, such as their likes, posts and other activity. The app also gained access to data about all of those users’ friends, unless the friends had changed their default privacy settings. As a result, it collected data about more than 50 million Facebook users.
The app was built by Global Science Research, which claimed the data would be used for research purposes, but is reported to have sold the data it collected to Cambridge Analytica. Cambridge Analytica is suspected of then combining that data with other sources.
It’s unclear whether the data it amassed was used by Cambridge Analytica and their customers to develop models to psychologically profile or categorise people, or to actively target people with adverts and content on behalf of different political campaigns, or both. If it was used, we don't know how effective it was. As Ellen Broad has said, we know that other organisations are using similar approaches to make decisions about us, for example whether we should get a job or a home loan. We don't know how effective those techniques are, either. That may be even more concerning than their use in political advertising.
Since the app was built, Facebook's API, privacy controls and terms of service have changed multiple times. It is hard to follow these changes but this blogpost summarises many of them.
Access to data for researchers: implications and lessons
At the ODI we think data should be as open as possible.
With personal data, a balance must be struck between the benefits of providing access to data and individuals’ rights to privacy and protection from harm. Only in rare cases – such as politicians’ expenses – should personal data be open for everyone to access, use and share.
A sweet spot, where we feel there are good arguments for providing secure access to personal data, is for research purposes.
In the ODI’s recent YouGov poll of British adults, 37% of people said they would share data about their background and health preferences if it helped advance academic understanding of areas such as medicine or psychology. Researchers can help us to understand society better, ensure that policies are informed by evidence, and ensure we invest in things that work rather than those that don't.
There are several troubling aspects to the role of research in the Cambridge Analytica story, some of which can also be found in other places where access is granted to data for research purposes.
Looking like a researcher isn’t enough
When organisations provide researchers with access to data, how do they know that those people really are researchers? Research is carried out by many different kinds of people and organisations: academics, think tanks, journalists, the R&D groups in commercial companies. But looking like a researcher isn't enough: even genuine researchers might sometimes access data for non-research purposes. We need good ways of assessing whether someone who claims to be a researcher really is accessing data for the purpose of research.
Researchers’ ethical codes of practice must extend to social media
What is the relationship between public data and data researchers can use? Some people, including researchers, act as though information we make available publicly – such as profile pictures, likes or tweets – is by definition available for others to use. We need to learn from older research sectors, like medicine and psychology, and develop better, clearer ethical codes of practice for researchers using social media data.
Due diligence has failed
How effectively do data stewards control what the people they share data with do with that data? In the Cambridge Analytica story, blame is falling on Cambridge Analytica, on the researcher who passed on data, and on Facebook. No matter what level of due diligence a data steward carries out, researchers can still act unethically and in breach of the agreements they have made. Legally, liability for those bad acts may be passed on to the researcher; reputationally, it may not be. If we want researchers to have access to data, data stewards need to feel safe in granting it. We will need to work out how to decide the appropriate level of due diligence data stewards should be expected to apply in different contexts.
Regulators must monitor stewards and researchers
How effectively do regulators monitor the behaviour of data stewards and researchers? We cannot solely rely on data stewards to determine this governance; we need regulators both to support them and to check that they are performing well. Our regulators need to monitor both data stewards and researchers, performing risk-based inspections and using their legal powers to punish bad actors as appropriate. We will need to work out what regulatory powers they need, how to do risk-based inspections, and ensure that we fund regulators so that they can perform this role.
We need more transparency
Why has this story emerged in this way? In this episode – as in other cases like DeepMind and the Royal Free Hospital in London – the facts about the extent of data sharing have emerged through shocking headlines in the media. As we have argued before, part of the reason for the outrage is the fact that this data sharing is hidden. Having transparency about how personal data is being used, which researchers have been granted access, what they are researching, what they have been given access to, and so on, would not only dampen the fury we feel about not having known but also potentially moderate the access that researchers request.
Heading towards the ‘data wasteland’
At the ODI, we think a lot about the kind of data future we might be heading towards. We worry about a future where data is treated like oil, the preserve of monopolies who benefit from exclusive access to all kinds of data, including about us as individuals.
We're equally worried about a backlash that takes us into a ‘data wasteland’. Many will respond to stories like this by withdrawing permission for data about them to be used, even for public good. Some individuals may restrict their privacy settings further, adding bias to what might otherwise have been useful datasets. In some countries, controls may be put in place that limit researchers’ access even further, or that unnecessarily inhibit the potential of data portability to create more effective and efficient services for us all.
We need to find a balance. Data is a tool. It can be used for good as well as for bad. We need to learn lessons from Facebook and Cambridge Analytica and put in place proportional controls and data governance so that we can benefit from our decisions being informed by data while being protected from any harmful impacts.
You can read all of our thought pieces and interviews on the Facebook and Cambridge Analytica story on our ODI View page. If you have comments or experience that you’d like to share, pitch us a blog or tweet us at @ODIHQ.