Collaborative data projects: what we learned by contributing to them

Wed Nov 20, 2019

By Rachel Wilson, Senior Software Developer at the ODI

You probably use a collaboratively maintained dataset every day: Wikipedia, and its sister project Wikidata – which creates the data-box on the right side of Wikipedia pages – attract more than 18 billion page views per month. But there are many other data projects that use the power of people, organisations and communities to gather disparate data into one database. In fact, collaborative data collection has a history that stretches back to the early days of what is now called citizen science.

Much has changed since the early days of collecting observation data by post. These days projects encourage many types of contribution: there are projects that create catalogues, pool knowledge and make connections. Inventions such as databases and the internet provide tools so that a community can both contribute data and maintain it: keeping it up to date and improving the quality.

During our research into the collaborative maintenance of data we compiled a list of community-data projects that on the surface looked very different because of the topics they covered and their user interface design. We wondered how the designers of these varied projects manage the interests and experiences of individuals to create good quality data. We decided the best way to find out was to become contributors ourselves, and have some fun in the process.

The list of projects included OpenStreetMap, Wikidata, Discogs, MusicBrainz, OpenCorporates, Zooniverse, Democracy Club, the Internet Speculative Fiction Database (ISFDB), Mozilla Voice, Open Food Facts, Encyclopedia of Life, Open Plaques, eBird and Open Library, among others.

Our intention in contributing to these projects was to look for common approaches – or patterns – that contribute to good data maintenance. Through our experiences we made observations about the stakeholders’ interests; project scope and range; the role of the contributor as a creator of data; common user-experience patterns; and how it felt to interact with a new domain for the first time.

Different stakeholders have different interests

During our research we identified three main groups of stakeholders, each with their own aims and needs:

Data stewards including service designers, community managers and data engineers who want to help the community collect and maintain a high-quality dataset
Data users who want to know whether the data is comprehensive and of good enough quality to be useful
Contributors who interact with the data and experience firsthand the design decisions taken by the data stewards

In our interactions with the projects we were mostly acting as contributors, but we also needed to understand how the design of projects supports both the data stewards and data users. Read more about the aims of collaboratively maintained dataset stewards here.

Two axes of project types

In our early analysis of the projects we identified two axes for the openness of contributions:

The range of contributors. Some projects only permit specific people to become contributors: for example Edubase, a register of schools, is maintained only by the schools and local authorities. Other projects allow contributions from any interested party: for example, Democracy Club permits anyone to collect information such as lists of elections, candidates and polling locations.
The scope of the project. Some projects are very narrow in scope, such as Open Plaques (about commemorative plaques) or Edubase (about schools), whereas for others the scope is completely open-ended, such as Wikidata.

Project types at different positions along either axis might require different approaches to support their contributors, and as such might contribute to the context in which a design pattern will be appropriate.

Contributors are people with specialised skills

We noticed that as contributors we played different roles in a project, offering unique human skills (currently) beyond the reach of computers:

People as holders of knowledge

Documenting and pooling knowledge, seen in Wikipedia, MySociety
Creating and linking expert databases, seen in Edubase, OpenCorporates

People as sensors

Discovering where something is – for example Open Plaques, FixMyStreet
Taking part in discovery campaigns – for example Democracy Club, HOT OSM
Participating in identification projects on Zooniverse

People as pattern recognisers

Performing classification tasks, seen in Zooniverse, EveryPolitician and Mozilla Voice
Labelling items – adding further detail to a shared ‘canvas’ like a map or shared spreadsheet – for example OSM, Colouring London, Democracy Club
Making connections between things, such as connecting landmarks on OpenStreetMap and Wikidata

People as witnesses

Verifying data provided by others, for example ISFDB, Discogs
Providing observations and witness statements, for example eBird
Cataloguing artifacts, for example MusicBrainz, Open Food Facts

People as data sources

Contributing data they have collated/created for others to re-use in projects such as Mozilla Voice, Freesound, Rebrickable

We noted common approaches for accepting contributions and managing interactions, which varied according to the role of the human in the project:

User input: wiki-style ‘always editable’ format; web forms; shared spreadsheets; wizard-like microtasks; bulk-upload APIs; editing maps to add detail and labels

Review systems: performed by an expert moderator or made retrospectively by the community

Human-to-human interaction: wiki ‘talk pages’; forums; web forms; email and mailing lists; instant messaging

Because these approaches aligned with the aim of the project and the experience of the contributors, we thought they would be a good place to start identifying ‘design patterns’.

Making the first contribution

Besides observations about the nature of the projects themselves, we also experienced what it felt like to participate in an unfamiliar domain of experts.

As a contributor I found some projects easier to pick up than others. Many thanks to Andy Mabett who provided a fascinating day’s training on the history, functionality and governance of OpenStreetMap and Wikidata – two of the most (necessarily) complicated projects.

Some projects, such as Zooniverse, were designed to be easy to use by people with no prior experience at all. Others needed specific knowledge, or an artifact to hand – usually projects creating catalogues such as Rebrickable, Discogs, or the games, books and comics databases.

Regardless of my experience with the domain I noticed I had the following questions as I started:

Where do I start?
How do I know my contribution is valuable?
How do I learn I’m doing it right?
How does the format ensure I make a good-quality contribution?
What do I do when I’m uncertain or have a problem?
Who is the community? What is their purpose and how do I know we’re aligned?
How easy is it to make a useful contribution to an unfamiliar domain?

In this way we noted many different approaches to support newcomers and guide them to make useful contributions.

The human in the loop

Technology is a valuable tool for connecting people across many spectrums: of experience, timezone and capacity. Often contributors are not merely harvesting and collecting facts. In collaborative projects – especially the more open-ended projects – there is often an element of perception and interpretation.

In some cases skilled people are needed to mediate and share their expert experience to find resolutions: resolutions that may require the project itself to evolve. As a result, many of the interface and interaction patterns we experienced exist to get the best out of everyone. Observing the effort of everyone involved was humbling and inspiring.

Collaborative data projects could be the next Pokemon Go

Looking back on this discovery phase I’ve experienced many positive things about contributing that I had never anticipated:

It feels good to improve something.
The delight of connecting things, such as linking a public artwork in OpenStreetMap to a public web page via Wikidata.
A window on new domains. I learned more about: the democratic process in Democracy Club; the nuances of publishing books, music and comics; the many layers of mapping in OpenStreetMap and how useful they are to environmental modelling in Colouring London; and topics ranging from microbial resistance to the history of slavery in Zooniverse.
Discovering useful resources for friends’ interests such as Freesound, Open Food Facts and Rebrickable.
There are so many ways to contribute. Beyond adding and classifying data you can verify, comment, rate, moderate, and answer questions in forums.
Gaining serendipitous knowledge. By chance I viewed George Africanus’s record in Open Plaques and realised his connection to one of the anti-slavery manuscripts I’d transcribed in a Zooniverse project.
It’s contagious. I was telling a friend how I’d sent eBird a sighting of an unusual bird I’d seen in our local park. He exclaimed that he’d just taken a picture of a strange bird in his garden, so right away we submitted his sighting to eBird and he was delighted.

We all might just have found a new hobby. If you’re looking to while away a few minutes at a bus stop, perhaps contribute to one of the projects above. Your contribution might just help someone make a breakthrough.

Get in touch

Please explore our Collaborative Data Guide, and get in touch if you’d like to discuss this further, or if you are interested in building a collaborative data-maintenance project.
