Friday lunchtime lecture: Can't find the right code? Explain yourself!

Friday 01 December 2017, 1:00pm - 1:45pm

Open Data Institute, 65 Clifton Street, London EC2A 4JE

Friday lunchtime lectures are for everyone and are free to attend. You bring your lunch, we provide tea and coffee, an interesting talk, and enough time to get back to your desk.

Data standards almost always have two fundamental components: syntax and semantics.

The syntax is established by a schema, which specifies how the data should be structured. Should it be saved as XML, JSON, CSV? Should dates be written YYYY-MM-DD? What order should the columns be in? These questions are essential, and most well-established data standards cater for them.
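To make the idea of syntax concrete, here is a minimal sketch of the kind of check a schema implies. The column names, their order and the sample data are assumptions invented for illustration; they are not taken from any of the standards mentioned in this talk.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical schema: these column names and their order are invented for illustration.
EXPECTED_COLUMNS = ["id", "date", "amount"]
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_syntax(csv_text):
    """Return a list of syntax problems found in the CSV text (empty if none)."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        # Wrong or reordered columns: report and stop, since rows cannot be trusted.
        problems.append(f"expected columns {EXPECTED_COLUMNS}, got {header}")
        return problems
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(f"row {row_number}: wrong number of values")
            continue
        date_text = row[1]
        # The pattern enforces the YYYY-MM-DD shape; strptime rejects impossible dates.
        if not DATE_PATTERN.fullmatch(date_text):
            problems.append(f"row {row_number}: date {date_text!r} is not YYYY-MM-DD")
            continue
        try:
            datetime.strptime(date_text, "%Y-%m-%d")
        except ValueError:
            problems.append(f"row {row_number}: date {date_text!r} is not a real date")
    return problems
```

Note that a check like this says nothing about what the values mean. A file can pass every syntax test and still be impossible to interpret.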

Semantics are more abstract. Questions arise, like: what does this data tell me about the world? What does this number mean? What is the purpose of that transaction? The answers are more variable and often less formal than the syntax. The most common and trusted technique is to create a codelist – a dictionary of terms which establishes the semantics for a given data point. Codelists range from simple sets of terms, such as the transaction types which can be published in a standard, all the way to rich hierarchical taxonomies. They all associate a code with a meaning, so that people can use the codes to communicate.
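As a rough sketch, a codelist can be thought of as a lookup table from code to meaning. The two codelists below are hypothetical: the codes and labels are invented for illustration and do not come from any real standard. The hierarchy assumes, purely for this example, that a broader term's code is a prefix of the narrower term's code.

```python
# Hypothetical flat codelist: a simple set of transaction types.
TRANSACTION_TYPES = {
    "C": "Commitment",
    "D": "Disbursement",
    "E": "Expenditure",
}

# Hypothetical hierarchical codelist: a broader term's code is a prefix
# of each narrower term's code (an assumption made for this sketch).
SECTORS = {
    "1": "Education",
    "11": "Primary education",
    "12": "Secondary education",
    "2": "Health",
    "21": "Basic health care",
}


def meaning(codelist, code):
    """Return the meaning of a code, or raise ValueError if it is not on the list."""
    try:
        return codelist[code]
    except KeyError:
        raise ValueError(f"{code!r} is not on the codelist") from None


def broader_terms(codelist, code):
    """Return the meanings of every broader term for a code, broadest first."""
    return [
        codelist[code[:length]]
        for length in range(1, len(code))
        if code[:length] in codelist
    ]
```

For example, `meaning(TRANSACTION_TYPES, "D")` gives "Disbursement", and `broader_terms(SECTORS, "11")` places "Primary education" under "Education".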

This talk explores the changing role of codelists in a time when textual data analysis is advancing rapidly. The root question is this: is it still best to start giving meaning to your data by looking for a codelist and a code, or can we use description text itself, trusting modern data analysis to do the heavy lifting?

First, we'll consider and contextualise the use of codelists within standards like Open Contracting, IATI, and 360Giving. Then we'll look at how meaning can be established in text where no standard exists – online reviews for restaurants, for example – by applying machine learning (live demo alert!). Finally, we'll apply the same approach to description text found in established open data standards and compare our results.
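For a flavour of how meaning can be drawn from text alone, here is a minimal sketch of a naive Bayes text classifier written with only the Python standard library. This is a stand-in chosen for illustration, not necessarily the technique used in the talk's demo, and the four restaurant reviews and their labels are invented.

```python
import math
import re
from collections import Counter, defaultdict

# Invented training data: (review text, label) pairs, for illustration only.
REVIEWS = [
    ("the food was delicious and the staff were friendly", "positive"),
    ("great service and a lovely meal", "positive"),
    ("cold food and rude staff", "negative"),
    ("terrible service, we waited an hour", "negative"),
]


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the most probable label for the text under the trained model."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        total_words = sum(word_counts[label].values())
        # Start from the log prior, then add a log likelihood for each known word.
        score = math.log(count / total_examples)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(REVIEWS)
```

With this toy model, `classify(model, "friendly staff and delicious food")` returns "positive" and `classify(model, "rude staff and cold food")` returns "negative". No code was ever assigned to these reviews; the labels are inferred from the words themselves.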

Rory Scott is a member of the Open Data Services Cooperative, working with national governments, multilateral organisations, and civil society to better share, understand, and use open data about international development and humanitarian financing. He does this primarily by talking to people about their data, writing Python and R code, and wrangling spreadsheets.

Book your place

Live stream

If you’re unable to attend this lunchtime lecture you don’t need to miss out. You can watch the talk via a live stream here.

About our Friday lunchtime lectures

With a broad range of topics in open data such as tracking government expenditure, British landscape mapping and creating art, there’s bound to be something that interests you.

The sessions run from 1pm to 1.45pm weekly during UK school term-times, with informal networking until 2pm. Each lecture lasts for around 20 minutes, leaving time for questions afterwards. The lectures don't require any specialist knowledge, and focus on communicating the meaning and impact of open data in all areas of life.

Each week the lectures are streamed live on YouTube and then uploaded to our YouTube and Vimeo channels. You can also download the Friday lunchtime lectures as a podcast on iTunes.

You can follow the lectures and contribute to the discussion using #ODIFridays on Twitter.