An open letter to Santa and a vision of open data for 2014

This was my contribution to the Open Data Christmas Friday Lunchtime Lecture.


Dear Santa,

Thank you for giving me some of the things that I asked for last year. Like the great technical team here! Oh, and that lovely price paid data from Land Registry. And obviously the Lego. I'm very grateful.

I've tried very hard, but I know that I haven't been entirely good this year.

There was that time that I used some data that I shouldn't have. The terms and conditions said things like 'personal use only' and 'must not be changed in any way' and 'you must not download anything from this website'. But the page I got it from said 'open data' and it was just a link on the page and ... honestly, the click was accidental and then, since it was on my computer anyway, and it was really useful... I felt bad afterwards. And I don't think they really mind.

Oh, and I think I've been a bit ratty with those people Owen mentioned in his post about adaptation of the Open Government Licence. You know, where they say 'you can do anything with this data because it's available under the Open Government Licence... as long as you comply with the following conditions', and then they require you to not use it for anything actually useful. I do think they mean to be open. They just accidentally listened to lawyers who don't understand what open means. So I'm sorry for getting angry at them.

And I admit that I have used some bad language. Yes, it was about PAF. Yes, when this amazingly important national information asset was sold off along with Royal Mail. I mean, I know from what I've read that the government screwed up pretty much every other part of that particular privatisation, but it's the PAF part that's important to me. So yes, I did swear. But it was really fucking stupid.

Anyway, I'm hoping, Santa, that you can overlook the bad things I've done. I'm only putting three things on my list. And I promise I'll be really good and share them if you give them to me.

First, I want a good data portal. I know, I know, everyone wants a data portal. You probably have loads of them stacked up at the North Pole just waiting for deployment. But I really really want a good one.

I don't want one that teases me by listing datasets that sound really really interesting from the title, and then I go through to the page, and I read the description and it sounds really juicy, and then, then I look closer and I find that they're not open. And it's 'for commercial reasons'.

I don't want one where it feels like half the data I try to get hold of gives a 404 page. Or where I get bumped to the home page of the organisation that's supposed to be publishing it, and I have to remember the name of the data that I was looking for in the first place so I can search for it there. And I really don't want one where the other half of the data on it is years out of date. I mean, I know that people get tired of publishing data; I just hate feeling like I have to wade through piles of unwanted trash every time I use open data.

So please can you give me a data portal that helps me find good, up-to-date data when I need it? I don't care if it only has 20 datasets on it, so long as they're all, you know, not rubbish.

The second thing I'd really like is some open data from a big private company. I really don't care what it is. It could be lists of products, or statistics about the business, or a classification scheme that they use. Honestly I don't care. But I want it to be properly open, not open-wash open. Not open where you say it's open and then only let people use it for the duration of a hackathon. Not open where it's free to use so long as you don't use it in a business.

I just want one company that publishes open data properly. One company that thinks that open data is for life, not just for Christmas.

And my third thing, well... can you destroy Excel? Is that possible? Actually, Excel is a really useful tool, so perhaps not destroy that, just all the open data spreadsheets that it's been used to create? Actually no, not destroy them because then we'd lose the data too, so I guess, just change them somehow.

I'm so fed up with spreadsheets that look beautiful but are really really hard to get data out of. You know those ones with 50 different sheets each of which gives data about one measure? Or ones where there are five tables on one sheet, each one below the next, separated by headers? Or ones where people use tabs within cells, or italics, or bold, to indicate some kind of hierarchy of values?

I don't actually know what's worse. Is it the thought that these spreadsheets are being created automatically using some code that could easily be used to create usable data instead? Or is it the thought that each one of these spreadsheets has been lovingly created and crafted by someone?

Anyway, that's my final request: no more Excel spreadsheets for data.

I guess these are quite big things to ask for with only five days to go. And now I think of it, I have a sneaking suspicion that I asked for these last year... Perhaps you need some help? If we work together, perhaps we can aim for next Christmas?

Lots of love,

Jeni