Collaborative data maintenance: how can we enable shared curation of high-quality data?

Wed May 22, 2019

Community-powered projects like OpenStreetMap or Wikidata use ‘collaborative maintenance’ processes – and in our new R&D project we’re trying to understand how these can be applied to enable and inspire similar projects that are fuelled by the shared curation of high-quality data.

We want to make our findings easy to use by projects of all kinds within the public or private sector, across the data spectrum, with infinite (eg Wikidata) and finite (eg Wild Gabon) project scopes.

First, what is a collaborative data maintenance project?

Although their themes vary widely, collaborative data maintenance projects all rely on the same idea: using the power of people, organisations and communities to gather disparate data into one database.

They can rely on vast numbers of people for very narrow datasets, small numbers of people for small datasets, or even huge numbers of people for infinite databases. They can stem from the private or the public sector. Projects can be:

100% open
One-way conversations (a chosen group of collaborators, with a database searchable by all)
Closed (open only to certain collaborators and researchers)
Or anything in between

Projects can range from cataloguing all of the UK’s legislation to gathering all possible data about birds that enthusiasts around the world can capture. They can also involve mapping out disaster areas from satellite pictures or creating an open bibliographic database.

A project’s purpose is what will establish its community; the tools you give the community will keep it alive.

Our research

As this project is quite short and a lot of research has already been done on the systems surrounding collaborative data, we are focusing primarily on identifying concrete, reusable tools and patterns that make a project successful.

We kicked off the project with a day of learning about how Wikidata and OpenStreetMap actually work and how to make expert contributions (which was much harder than it seemed at first).

We then tried different collaborative projects for ourselves: classifying galaxies on Zooniverse, adding our vinyl records to Discogs, donating our voices to Mozilla’s Common Voice, downloading Open Food Facts on our phones and scanning our lunches. It was all pretty exciting and rewarding, making us feel like we were taking part in something big, with a higher purpose.

The workshop and our participation in these projects motivated us to look in greater detail at certain aspects of collaboration and generated more relevant questions, which we’re answering through desk research and user interviews.

Our aim is to understand what works for contributors, which participation drivers are effective, what is seen as easy and what is more challenging when it comes to inputting data and keeping the infrastructure sustainable and, more generally, what keeps a project afloat.

Categorisation

We have chosen to categorise projects according to their level of openness to different participant types and their scope of work.

For example, Wikidata is open to anyone who wishes to participate and its scope of work is infinitely broad, as it aims to catalogue all of the knowledge in the world. Other projects that could fall into the 'open to any participant type/broad scope' category are OpenStreetMap or MusicBrainz.

At the other end of the spectrum, there is the Cuneiform Commentaries Project: only a few hundred people worldwide can read and translate the language and only a small number of archaeological tablets still exist. This project and others like ROR.org or Edubase can be categorised as 'closed type of participants/narrow scope'.

In the middle of this spectrum we have projects like Zooniverse or Humanitarian OpenStreetMap that allow a lot of users to contribute but have a narrower scope of work. They are mostly task-based and may have only a fixed lifespan.

These categories are a work in progress and may change, but this scale of scope / participant types seems to fit the projects well. Hopefully the patterns from the open / broadly scoped projects will be able to influence the more closed / narrower scope projects.

To uncover the relevant patterns, we’re asking questions about each project:

How is the collection and maintenance of data supported?
How is quality assessed and improved?
What are the potential barriers to the adoption of shared curation?
How are communities talking to each other?
How are conflicts resolved?
Why are communities engaged in this particular project?

The research will produce a list of tangible patterns that can be used, reused, adapted and applied to projects. We think these will be useful to service managers and others building systems and services to support collaborative data maintenance.

What do we mean by patterns?

Patterns are solutions to specific problems that arise when building systems. You can see a very thorough working example of patterns in the Government Design System.

Here are some good examples of patterns we have come across so far:

A step-by-step onboarding process, like OpenStreetMap or Zooniverse have, to make sure new users understand how to take part in a meaningful, empowered way.
Or a Slack channel that enables contributors to talk to project founders and get help with very specific questions about data schemas, like Open Food Facts has.
Even an app that allows offline participation, in collaborators' free time, like Humanitarian OpenStreetMap's MapSwipe does.

Based on user journeys we’ve defined through our initial research, we are attaching patterns to the different facets of a project that seem the most relevant:

Training and support to help someone progress as a contributor, from beginner to expert
Recruitment and graduation processes for collaborators
Community support and communications
Issues and conflict resolution
Tools to support data entry and upload
How the data will be used
Technical infrastructure that will allow the project to be sustainable
General sustainability issues, such as income and licensing
Governance over time

We’re planning to document our patterns by showing examples of how different services approach these issues.

Get involved!

We would love to hear from you about your experience of contributing to and/or developing collaboratively maintained projects. Tell us your happy and your sad, your wishes, what you think is useful and what can be improved. We work in the open so you’ll be able to see the results as we publish them.
