Data infrastructure technology: are blockchains the answer?

How do we make sure data infrastructure is always available, and always trustworthy? Are blockchains the answer?

We believe that data infrastructure is fundamental to our future. What we normally mean by this is "data as infrastructure" – data is becoming part of the infrastructure of society. In ODI Labs, we have a slightly different take on the issue: if data is to be infrastructure for society, what does that mean about the technology underlying the "infrastructure for data"?

Photo source: Flickr - r2hox (CC BY-SA 2.0)

If data is becoming essential to society, then it must be:

  • resilient – always available when needed, with access able to route around damage. It can't drop offline because of a datacentre outage, or a forgotten domain renewal.
  • robust – data must be verifiable and reliable, resistant to tampering. The concepts of maldata and data spam aren't in wide circulation yet, but at some point they will be.
  • scalable – having vital data hosted on a single server will not scale up when that dataset is suddenly in high demand.

A new class of technologies is appearing that copes with many of these problems. In particular, distributed data storage – where the data doesn't reside in one place but across the network itself – is on the rise. And a lot of people are talking about one specific implementation of that idea: blockchains.

By the way, if you don't know what I'm talking about, blockchains are basically a way of storing information (transactions, in the case of Bitcoin) in a distributed fashion across the Internet without needing a trusted central server. This quick primer from the BBC is a good introduction to the idea.

There is a lot of hype about "putting things into the blockchain" at the moment. While the technology is fascinating and has huge potential, there are a few things we need to be aware of.

A blockchain, or THE blockchain?

There’s a confusing tendency for people to talk about THE blockchain, as opposed to A blockchain, and that implies Bitcoin. Storing data in the Bitcoin blockchain is possible, and has been done since day one, but it isn't really advisable.

So, instead of using the Bitcoin blockchain, most blockchain data storage systems are using their own chains – for instance, Namecoin, Ethereum, and Factom (who are putting Honduras’ Land Registry into a blockchain).

However, all of these still have a cryptocurrency involved. The work to verify the blockchain is done by many people, and they need to be paid for the compute time they contribute. Therefore, these systems all have their own currencies internally, like “Ether” or “Factoids”.

Is it realistic to run a distributed data store based on a pseudocurrency model? Does the very concept of financial return introduce the wrong incentives into the system? At the end of the day, who pays to maintain an effective yet radically distributed system?

Immutability

Blockchains are designed to be immutable, to have data written into them and be available forevermore. Technically, that seems a desirable quality as it means you can't go back and rewrite history: nobody can deny that a transaction took place, even if it was revoked later on.

"A digital register may supersede or expire your permission to do something, but it shouldn’t be able to later refute that permission was ever issued to you." – Paul Downey
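
To make the "can't rewrite history" property concrete, here is a minimal sketch of the hash-chaining idea only. It is not how Bitcoin or any particular system is implemented, and it leaves out the distributed consensus that real blockchains rely on. The function names (`block_hash`, `append_block`, `is_valid`) and the block layout are assumptions made up for illustration.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that commits to the block before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, "permission issued to Alice")
append_block(chain, "permission revoked for Alice")
untouched_ok = is_valid(chain)

# Quietly rewriting the first entry breaks verification.
chain[0]["data"] = "no permission was ever issued"
tampered_ok = is_valid(chain)
```

Because every block commits to the hash of the one before it, changing an old entry means recomputing every later block as well. In a real network the other participants would hold copies of the original chain and reject the rewritten one.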

However, technology, meet society. In the sphere of human life, immutability can be a major problem.

What about the recent EU "right to be forgotten" ruling? What's your legal recourse when the data you want removed from public view is stored in an immutable data store? Is there any truly immutable data?

Here's an example: in the UK, if you change your gender, you of course have the right to have your new gender reflected in all official records. That, though, includes rewriting history and backdating your new gender, so that the gender you were assigned at birth doesn't appear even in old records. If, say, your driving licence is stored in a blockchain, the old version can't be modified. It can be revoked and replaced with an updated one, but the original record is still there.
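
The driving licence case can be sketched as an append-only register. This is a toy model under stated assumptions, not a description of any real licensing system: the `LicenceRecord` fields, the `AppendOnlyRegister` class and the identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenceRecord:
    record_id: str
    holder: str
    gender: str
    supersedes: Optional[str] = None  # record_id of the record this one replaces


class AppendOnlyRegister:
    """A register that can add records but never edit or delete them."""

    def __init__(self):
        self._log = []

    def issue(self, record):
        self._log.append(record)

    def current(self, holder):
        """Return the holder's records that have not been superseded."""
        superseded = {r.supersedes for r in self._log if r.supersedes}
        return [
            r for r in self._log
            if r.holder == holder and r.record_id not in superseded
        ]

    def history(self, holder):
        """Return every record ever written for the holder."""
        return [r for r in self._log if r.holder == holder]


register = AppendOnlyRegister()
register.issue(LicenceRecord("lic-1", "holder-42", "M"))
register.issue(LicenceRecord("lic-2", "holder-42", "F", supersedes="lic-1"))

current_view = register.current("holder-42")
full_history = register.history("holder-42")
```

The current view shows only the replacement licence, but anyone who can read the full log still sees the original record, which is exactly the problem described above.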

Sure, you could solve those problems by storing only pointers to data in a blockchain, and having the data somewhere else, somewhere mutable, but then you've lost the resilience aspect of the technology; the data is still centralised, even though the index is distributed.
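
A rough sketch of that pointer approach, assuming the pointer is simply a SHA-256 digest of the data. The list and dictionary below are stand-ins for a real chain and a real data store, and `publish`, `fetch` and `erase` are hypothetical names.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Stand-in for the immutable chain: an append-only list of digests.
on_chain_index = []

# Stand-in for a conventional, mutable data store keyed by digest.
off_chain_store = {}


def publish(data: bytes) -> str:
    """Store the data off-chain and record only its digest on-chain."""
    pointer = digest(data)
    on_chain_index.append(pointer)
    off_chain_store[pointer] = data
    return pointer


def fetch(pointer: str):
    """Return the data if the store still has it and it matches the pointer."""
    data = off_chain_store.get(pointer)
    if data is None or digest(data) != pointer:
        return None
    return data


def erase(pointer: str) -> None:
    """Honour a removal request by deleting only the off-chain copy."""
    off_chain_store.pop(pointer, None)


pointer = publish(b"record a person later asks to have removed")
before = fetch(pointer)
erase(pointer)
after = fetch(pointer)
```

After erasure the digest remains on the chain but the data is gone. The same property cuts the other way: if the off-chain store goes offline, the distributed index cannot bring the data back.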

How, then, do we design data storage in blockchains so that immutability is limited to the things that need to be immutable?

Beyond blockchains

Nowadays, when most people say "blockchain", and even when I say it myself, it is shorthand for "undefined radically distributed storage technology". There are many other options out there, from the non-Bitcoin blockchains like Ethereum and MaidSafe, to other systems like Tahoe-LAFS, and even older technologies like BitTorrent. (For great in-depth discussion of these and many others, watch the Redecentralize interviews.)

Make no mistake, there is huge (and radical) potential in this technology area for data, and for society as a whole, but we need to understand how these technologies are best applied.

Standards

And whether it's blockchains or something else, there are plenty of questions. How do we standardise storage in such a system so that we get a single network of data, as opposed to having to use a different storage system every time we want a new type of information? What are the data protocols for distributed storage? How do we talk about, and perhaps enforce, ownership and licensing?

What are we doing?

We are exploring the potential applications of these technologies in the context of data infrastructure. This applies at different scales: global, national and city data infrastructure. It also applies across sectors: finance, agriculture, nutrition and global development.

ODI Labs are exploring these issues and, as with everything we do at the ODI, we will be collaborating with our network of Partners, Supporters, Nodes, and Startups. If you would like to get involved through collaboration or sponsorship, please get in touch.

We want to experiment with the technologies, work out some of the tricky social questions, and help guide the future of distributed data storage in the right direction.


James Smith is the Head of Labs at the Open Data Institute. You can find him on Twitter as @floppy.


All images feature the artwork of Ryoji Ikeda. More information about this artwork can be found here.

The Open Data Institute connect, equip and inspire people around the world to innovate with data.