How to publish open data from Google spreadsheets
Many organisations, including the ODI, use web based document management services to create, share and publish content such as spreadsheets. While saving these spreadsheets as CSV and making them available as open data seems straight forward there are a number of common problems people face:
- Regular publication requires human effort and as a result becomes inconsistent over time
- Columns containing sensitive or personal data should be removed before publication
- Column titles may need renaming to make more sense to the general public
- When exported as CSV, identifiers for each row (e.g. row number) are lost
As part of a recent ODI project to publish data contained in Google spreadsheets we developed the ODI Labs doc2data tool. The tool allows complete automation of the publishing of data from Google spreadsheets. The tool can be easily configured to address all four of the problems outlined above and integrated with your data publication workflow.
Once the data has been processed, the resulting output is available in both CSV and JSON. The CSV contains just the rows of the spreadsheet. The JSON provides an additional section containing machine readable metadata relating to the title, publisher and license of the dataset. Further metadata can be included in the JSON version by simply editing the doc2data config file.
The ODI deployed this tool to publish ODI Summit 2014 Award nominations data. Nominations are submitted using a Google form backed by a Google spreadsheet and data is extracted and published every fifteen minutes. This practice has quickly allowed us to publish a dataset with a pilot level open data certificate.
To download the tool and for help on installation please refer to our GitHub page.