Guest post: Linked data and the future of the web
By Dave Weller
Today’s world would have been almost unimaginable 25 years ago, and the phenomenal growth of the web has been intrinsically linked with many of the technology advances of that period. It will be no surprise if that continues for the next 25.
We’re used to hearing about big data, but open data (“data or content … anyone is free to use, reuse, and redistribute”) is also on the rise. People are predicting that, in addition to bringing societal benefits, this will have a positive economic impact too. Some have predicted that impact to be worth trillions.
Data on its own can’t deliver that impact. What you actually need is linked data – not one data point in isolation, but connections to other data that let you infer meaning and make decisions. For open data, the current trajectory of growth – in particular the shortage of linked open data – is putting that multi-trillion economic potential at risk. At Thomson Reuters, we have been working with the Open Data Institute on recommendations for creating value with open data through the use of identifiers.
Identifiers: the key to linked data
When you, as an individual, are given a fact – a specific data point – you intuitively start to make links to other facts that give you a complete picture. If I were to tell you that I work at Thomson Reuters, your brain would automatically link to other facts that you know about Thomson Reuters – for example that the logo is orange, there are offices in London, and it’s the same company behind Reuters news. But if you’re a machine, and one of those facts is labeled ‘Thomson Reuters’, another ‘Reuters’ and another ‘TR’, then those connections are much more difficult to make. Using different labels is manageable when you have a human to interpret and spot the differences, but as data volumes grow, machines must do this work to cope with the scale.
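The matching problem above can be sketched in a few lines of Python. The facts come from the paragraph (orange logo, London offices, Reuters news); the identifier URI is invented for illustration and is not a real Thomson Reuters identifier:

```python
# Facts about the same organisation, published under three different labels.
facts = [
    ("Thomson Reuters", "logo_colour", "orange"),
    ("Reuters", "produces", "news"),
    ("TR", "has_office_in", "London"),
]

# Grouping by label, as a naive string match would, keeps the facts apart.
by_label = {}
for subject, prop, value in facts:
    by_label.setdefault(subject, []).append((prop, value))
print(len(by_label))  # 3 apparently distinct entities

# Mapping each label to one shared (hypothetical) identifier URI
# connects the facts into a single picture of the organisation.
SAME_AS = {
    "Thomson Reuters": "https://example.org/id/thomson-reuters",
    "Reuters": "https://example.org/id/thomson-reuters",
    "TR": "https://example.org/id/thomson-reuters",
}
by_id = {}
for subject, prop, value in facts:
    by_id.setdefault(SAME_AS[subject], []).append((prop, value))
print(len(by_id))  # 1 entity carrying all three facts
```

The work a human does for free – recognising that three labels name one thing – is exactly the lookup table the machine needs spelled out.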
Open data is great because it is fundamentally there for others to reuse, much like the infrastructure of the web itself. But this freedom leads to imprecision, which restricts the full potential of the open data commons because data is not easily discoverable by others. For this to work, you need open identifiers – labels – to explain what the data refers to. It is in everyone’s interest – data consumers, data publishers and identifier publishers – to connect open data to suitable open identifiers.
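To sketch why shared open identifiers matter, imagine two hypothetical publishers describing the same companies. Because both datasets carry the same open identifier URI, a consumer can join them without any label matching (the URIs, company names and field names here are all invented for illustration):

```python
# Open dataset from publisher A: basic company descriptions.
publisher_a = [
    {"id": "https://example.org/id/acme", "name": "ACME Ltd"},
    {"id": "https://example.org/id/globex", "name": "Globex"},
]

# Open dataset from publisher B: office locations, keyed by the same
# open identifier rather than by a locally chosen name string.
publisher_b = [
    {"id": "https://example.org/id/acme", "office": "London"},
]

# Joining on the shared identifier needs no fuzzy string matching.
names = {rec["id"]: rec["name"] for rec in publisher_a}
joined = [
    {"name": names[rec["id"]], "office": rec["office"]}
    for rec in publisher_b
    if rec["id"] in names
]
print(joined)  # [{'name': 'ACME Ltd', 'office': 'London'}]
```

Neither publisher needs to know about the other; agreeing on the identifier scheme is enough to make their data connectable.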
Information identity as a new discipline
At Thomson Reuters we’ve been thinking about how to connect our data since before the web was even created. With the ODI we have captured this knowledge in a white paper for the open data community, and for anyone who wants to create value by connecting their own data. The web is orders of magnitude larger, more complex and more distributed than Thomson Reuters. This makes the recommendations even more pertinent if we are all to benefit from the open data that is out there.
We believe that information identity is a new discipline in its own right, something that all organisations using data (which is pretty much everyone!) should be thinking about. Information identity is vital for linking your own data with other data sources, whether that data comes from Thomson Reuters, from open data, or elsewhere.
Will the future web be completely connected, with every organisation (and its data) in the web? Or will it be one of ever-increasing data silos, where connection and meaning are lost?
The future depends on the value all of us place on information identity and the ability to link data. If we each think about the identifier schemes we use, we help ourselves, and others, to discover and connect to other data sources – and that is where the real value lies.
This blog also appears on the Thomson Reuters website.