Creators of tools for open data? Steal this

Fri Sep 28, 2018

By Olivier Thereaux in collaboration with Marina Peneva

Over the past 12 months, we have focused on understanding the needs and issues of people and organisations publishing open data, and taken some steps to address them. We invite you to steal our ideas and collaborate with us – and each other.

From user needs to a healthier open data ecosystem

One of the goals of our data innovation programme is to support the creation of a healthier open data ecosystem. Our hypothesis was that the first step towards this healthier ecosystem is to look at the barriers people face when publishing data: technical, cultural and organisational.

Our report, What data publishers need: synthesis of user-research, formed the basis of what we know about the needs and problems of data publishers. Our team listened and talked to dozens of people to learn how we can help open data publishers do their job better, faster and more competitively.

We uncovered a range of open data publishers’ needs, and the issues keeping those needs unmet. For tools, these needs are:

tools that are easier to use, especially for novices
better tools for checking and ensuring quality of the published data
better integration between tools in the publishing workflow.

We then used these insights to help develop resources to address some of those needs:

a register of tools for people starting their open data publishing journey,
and a range of new or improved tools for open data publishers (in collaboration with Open Knowledge International and Lintol).

As we do not want to compete with other tool makers, we made sure that every output of this work is openly licensed, in the hope that tool makers would swoop in, adopt our good ideas, learn from our mistakes and integrate some of our insights – and some of our open source code – into their offerings.

No strings attached: free and openly licensed

So here are some of the things we would like to invite you to ‘steal’.

The user needs we identified ought to be useful to toolmakers in and of themselves, but we think that some of the ideas and open source code in the tools we (co-)developed can also be used, adopted and adapted:

Octopub, a simple pre-publishing tool

Our tech team released a new version of Octopub, a tool developed previously as an easy way to publish data onto the GitHub platform. We used it as an experimentation platform to demonstrate how we think a good, usable publishing tool could look and feel.

Some ideas worth stealing:

Putting emphasis on the pre-publishing flow (based on our interviews with practitioners); making sure teams are comfortable with the data quality level before publishing; and giving them a safe space to reach that level.
Having a wizard-like interface to make sure the users are led through all the key activities of pre-publishing (describe, license, etc) and taught (via tooltips) why each activity matters in the process.
Providing a free, reliable space to publish the data once prepared, and all the necessary metadata – Octopub creates and publishes a data package, which makes it easy to add the published data to common data portals.
Bonus: the free, reliable space happens to be GitHub, which offers a wealth of versioning and collaboration tools (diffs, commits, branching, issues, discussions, roadmaps etc), all of which we believe may be particularly useful to the post-publication lifecycle of the data.
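To make the idea of a data package more concrete, here is a minimal sketch in Python of the kind of `datapackage.json` descriptor that sits alongside a published CSV file, following the Frictionless Data Package specification. This is not Octopub's own code: the `build_descriptor` helper, the dataset name, the file path and the field names are all hypothetical and only there for illustration.

```python
import json


def build_descriptor(name, title, csv_path, fields, licence_name, licence_url):
    """Return a minimal Data Package descriptor as a plain dict.

    `fields` is a list of (column name, type) pairs describing the CSV.
    """
    return {
        "name": name,
        "title": title,
        "licenses": [{"name": licence_name, "path": licence_url}],
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "format": "csv",
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical dataset, used only to show the shape of the descriptor.
descriptor = build_descriptor(
    name="library-opening-hours",
    title="Library opening hours",
    csv_path="data/library-opening-hours.csv",
    fields=[("library", "string"), ("day", "string"), ("opens", "time")],
    licence_name="CC-BY-4.0",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
)

descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Because the descriptor carries the title, licence and column descriptions in one machine-readable file, a data portal can pick them up without anyone retyping them.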

Frictionless Data Tools

We aimed to foster the development of tools which would not only be very useful on their own but also integrate well with others through well-documented application programming interfaces (APIs) or standards. This is why we supported the development and improvement of the frictionless data tools from OKI.

The frictionless data tool catalogue is broad and, as it is all published as open source, there are many things to adopt or learn from. Specific ideas worth adopting include:

The improvements to the data portal CKAN, especially with a new integration of data validation and reporting, will be of interest to creators of data quality checking technology.
Other outputs of this project, including a new tool to create and describe data packages and a new iteration of the tabular data checker Goodtables (and its API), all work towards the creation and management of data as ‘Frictionless data’, one of the standards for data publishing.
The Frictionless Data Field Guide wrapping it all together is an interesting example of integrating software through guidance and documentation, not only through APIs and libraries.
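To give a flavour of the structural checks a tabular data checker performs, here is a small sketch using only the Python standard library. It is not Goodtables and does not use its API: the `check_table` function, the error codes and the sample data are made up for illustration.

```python
import csv
import io


def check_table(text):
    """Return a list of (row_number, code, message) problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [(0, "empty-table", "no rows found")]

    problems = []
    header = rows[0]

    # Header checks: every column needs a name, and names must be unique.
    seen_names = set()
    for position, name in enumerate(header, start=1):
        if not name.strip():
            problems.append((1, "blank-header", f"column {position} has no name"))
        elif name in seen_names:
            problems.append((1, "duplicate-header", f"column name {name!r} is repeated"))
        seen_names.add(name)

    # Row checks: no blank rows, no ragged rows, no exact duplicates.
    seen_rows = set()
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append((number, "blank-row", "row is empty"))
            continue
        if len(row) != len(header):
            problems.append(
                (number, "ragged-row", f"expected {len(header)} cells, found {len(row)}")
            )
        key = tuple(row)
        if key in seen_rows:
            problems.append((number, "duplicate-row", "row repeats an earlier row"))
        seen_rows.add(key)

    return problems


# Made-up sample with a repeated column name, a short row and a repeated row.
sample = "id,name,name\n1,Ann,x\n2,Bob\n1,Ann,x\n"
for problem in check_table(sample):
    print(problem)
```

A report of this shape (row number, short code, human-readable message) is easy to show in a user interface or to pass on to another tool in the publishing workflow.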

Lintol, a one-stop data quality checker

Unsurprisingly, one of the key features worth integrating between tools is quality checking. Our project provided support and guidance to the team behind the new Lintol tool – which adopts a ‘plug and play’ approach.

Some ideas worth stealing:

Some of the concepts behind Lintol are worth looking at if you are developing a data management tool with a quality-checking component. For instance, Lintol allows its users to define a quality threshold by adopting standard quality-checking components (eg a geographical boundary checker) and giving the tool user the ability to define bespoke constraints (eg the boundaries a given local authority expects all geospatial data to be within).
Because of its plug-and-play nature, Lintol should be of interest to makers of specialised data checking technology. If you are creating a library to check a very specific new data standard, it may be worth looking into plugging into Lintol rather than, or in addition to, building your own user interface. Lintol is reaching maturity now, and the team has been running code challenges to encourage developers to flesh out a growing library of quality-checking plugins.
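The combination of standard checks and bespoke constraints described above can be sketched roughly as follows. This is not Lintol's actual plugin interface: the function names, the record format and the bounding-box coordinates are all hypothetical, and a real boundary check would use the authority's true geometry rather than a simple box.

```python
def has_coordinates(record):
    """Standard check: every record needs both a longitude and a latitude."""
    return [f"missing field {key!r}" for key in ("lon", "lat") if key not in record]


def within_boundary(min_lon, min_lat, max_lon, max_lat):
    """Build a bespoke check from the bounding box a publisher expects."""

    def check(record):
        if "lon" not in record or "lat" not in record:
            return []  # missing coordinates are reported by has_coordinates
        lon, lat = record["lon"], record["lat"]
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            return []
        return [f"point ({lon}, {lat}) is outside the expected boundary"]

    return check


def run_checks(records, checks):
    """Run every plugged-in check over every record.

    Returns {check name: [(record index, message), ...]}.
    """
    return {
        name: [
            (index, message)
            for index, record in enumerate(records)
            for message in check(record)
        ]
        for name, check in checks.items()
    }


# Plugging checks in is just adding entries to this mapping.
# The bounding box values are invented for the example.
checks = {
    "has-coordinates": has_coordinates,
    "within-boundary": within_boundary(-6.1, 54.5, -5.8, 54.7),
}

records = [
    {"lon": -5.93, "lat": 54.60},  # inside the box
    {"lon": -0.12, "lat": 51.50},  # outside the box
    {"lon": -5.90},                # latitude missing
]

report = run_checks(records, checks)
for name, problems in report.items():
    print(name, problems)
```

Keeping each check as a small, independent function is what makes the approach plug and play: a specialist can contribute a new check without touching the runner or the user interface.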

A publishing toolmakers community and event

We now hope to keep engaging with publishing toolmakers around these user needs – in particular by supporting you to collaborate with each other in mutually beneficial ways.

To this end, we would like to start a community of toolmakers with the initial aim of co-designing an event with you in early 2019. We have some initial ideas about the focus and format of the event, but we invite you to actively shape the discussion and outcome.

For more details and to join the online community, take a look at the Google Group's introductory post.

Whether or not you decide to join this community, we’d love to hear your feedback and find out if you have adopted or been inspired by this work, so please feel free to get in touch at [email protected].

Image credit: CC0 by PXHERE
