ODI Fridays: Bad data (and how to fix it)

Details

Fri May 4, 2018 13:00
Open Data Institute, 65 Clifton Street, London, EC2A 4JE


Friday lunchtime lectures are for everyone and are free to attend. You bring your lunch, we provide tea and coffee, an interesting talk, and enough time to get back to your desk.

Bad data is everywhere: a CSV that doesn’t load, a badly formatted spreadsheet, a date column with mixed formats, and so on. A lot of time is spent fixing these issues instead of actually analysing the data. In this talk, you’ll hear about Goodtables, a tabular data validator that can check that:

  • All rows have the same number of columns
  • There are no duplicate rows
  • The data types are correct (e.g. a numeric column has only numbers, a date column has only dates in a specific format, etc.)

It also allows writing custom checks using Python.
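
To make the checks listed above concrete, here is a minimal sketch in plain Python using only the standard library. It is not Goodtables' code or API: the function name `check_table`, its parameters, and the sample data are hypothetical and only illustrate what "same number of columns", "no duplicate rows" and "dates in a specific format" can mean in practice.

```python
import csv
import io
from datetime import datetime


def check_table(text, date_column=None, date_format="%Y-%m-%d"):
    """Return a list of (row_number, message) problems found in CSV text.

    Hypothetical helper for illustration; row numbers are 1-based and
    count the header as row 1.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [(0, "empty table")]
    header = rows[0]
    problems = []
    seen = {}  # maps row contents to the first row number they appeared on
    for number, row in enumerate(rows[1:], start=2):
        # Check 1: every row has as many cells as the header.
        if len(row) != len(header):
            problems.append(
                (number, "expected %d columns, got %d" % (len(header), len(row)))
            )
            continue
        # Check 2: no row repeats an earlier one.
        key = tuple(row)
        if key in seen:
            problems.append((number, "duplicate of row %d" % seen[key]))
        else:
            seen[key] = number
        # Check 3: values in the date column match the expected format.
        if date_column is not None:
            value = row[header.index(date_column)]
            try:
                datetime.strptime(value, date_format)
            except ValueError:
                problems.append((number, "bad date %r" % value))
    return problems


# Made-up sample: row 3 has a differently formatted date, row 4 repeats
# row 2, and row 5 is missing a column.
SAMPLE = "id,date\n1,2018-05-04\n2,04/05/2018\n1,2018-05-04\n3\n"

for number, message in check_table(SAMPLE, date_column="date"):
    print("row %d: %s" % (number, message))
```

A dedicated validator covers many more cases (encodings, headers, schemas, remote files), which is the reason to use one rather than hand-rolled checks like these.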

Goodtables is useful both to data publishers, by helping them improve data quality and make their data easier to reuse, and to data users, by giving them a quick way to check data for errors. It can be run locally or via https://goodtables.io, a continuous tabular data validation service.

You’ll also learn how Frictionless Data’s Data Package and Table Schema specifications can help you describe and load datasets.

About the speaker

Vitor Baptista is the engineering lead at Open Knowledge International. Since joining in 2012, he has worked on a range of open data projects, including building data portals with CKAN, improving fiscal transparency with OpenSpending, and aggregating and releasing clinical trial data with OpenTrials. His main interests are in how we can use data and data visualization to make better decisions and improve the world. He is currently based in Birmingham, UK.

Live stream

There will be a live stream of the talk on this page from 1pm on 4 May 2018.
