What is open data?
Open data is information that is available for anyone to use, for any purpose, at no cost.
Open data has to have a licence that says it is open data. Without a licence, the data can’t be reused. The licence might also say:
- that people who use the data must credit whoever is publishing it (this is called attribution)
- that people who mix the data with other data have to also release the results as open data (this is called share-alike)
For example, the Department for Education publishes open data about the performance of schools in England. The data is provided as CSV under the Open Government Licence, which only requires reusers to say that they got the data from the Department for Education.
These principles for open data are described in detail in the Open Definition.
Good open data:
- can be linked to, so that it can be easily shared and talked about
- is available in a standard, structured format, so that it can be easily processed
- has guaranteed availability and consistency over time, so that others can rely on it
- is traceable, through any processing, right back to where it originates, so others can work out whether to trust it
The Open Data Institute aims to help organisations that use data to understand which data they can and can’t use, and to help organisations that want to publish data to make that data sharable, structured, reliable and traceable.
What’s the benefit of publishing open data?
We traditionally think of data as being relevant only to a “data industry”. Open data broadens this horizon to encompass everyone.
Organisations in the data industry
The data industry makes money from selling access to data. For these organisations, open data serves as an easy on-ramp for potential customers who might go on to pay for access to that or similar data.
These providers can encourage usage of their paid-for data by:
- providing only a subset of their data as open data
- providing access to open data in a way that makes it hard to get all the data at once (e.g. only through an API and not as a bulk download)
- only providing guarantees about availability (through Service Level Agreements) to paying customers
- limiting the frequency of access to the open data (e.g. rate limiting on APIs)
- only providing open access to stale information (e.g. having a lag on availability or only updating every six months)
- using share-alike licensing and thus limiting which other data it can be mixed with
- requiring that people register to access the data
The data industry can also make money by selling services around the data that they make available for free; because they have a deep understanding of the data they hold, they can offer a unique level of insight into it.
Evidence from similar industries shows that moving to a service model rather than a product model will help the data industry survive the growing trend of commoditised, open data. The Open Data Institute can help these organisations make that shift.
Organisations in the public and private sectors, and everyone else!
Organisations that aren’t in the data industry often don’t recognise the potential of the data that they hold, but can frequently get direct benefits from publishing it as open data.
For example, publishing open data can:
- increase custom for an organisation’s main (paid-for, non-data) product or service
- ease information sharing within the organisation and with partners and customers
- help achieve other organisational goals (e.g. by providing a wider audience for published information)
- provide incentives for others to help maintain datasets, reducing the maintenance cost for the organisation
- encourage innovation around the organisation’s data (e.g. new analysis techniques they can take advantage of)
- help third parties to develop skills and services which the organisation can contract rather than developing in house
The Open Data Institute can help these organisations to identify data they own that can be opened up to realise these benefits.
How does open data relate to…
Data comes in all shapes and sizes. The “open” in “open data” is predominantly about the licensing terms that are applied to data, and hence how it can be used by others. Here are some other phrases that often come up when talking about open data.
Open standards are standards that are developed through a fair, transparent, collaborative process and are available under a royalty-free licence. Open standards may apply to data formats, to the protocols and APIs that are used to pass around information, and to tool configuration.
To be most useful, open data should be made available using a format defined in an open standard, for example as an XML, JSON or RDF format, and should be delivered over a protocol defined in an open standard, such as HTTP, as well as being licensed with an open licence.
Big data involves very large datasets, such as billions of tweets or terabytes of sensor data. Big datasets have particular challenges for open data:
- they may make API access (to access partial datasets) more attractive than dumps for some developers
- they may mean that data dumps are hard to create
- they may require publication of the dataset through special means (e.g. as a torrent)
- they make feeds of changes to data (enabling people to patch a dump they originally downloaded) particularly useful
Open data can include small, medium and big data.
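To illustrate why feeds of changes are useful for big datasets, here is a minimal Python sketch of patching a previously downloaded dump with a list of changes. The change format (an "op" of "upsert" or "delete", an "id" and a "record") is a hypothetical one invented for this example, not a published standard; real publishers define their own feed formats.

```python
def apply_changes(dump, changes):
    """Return a copy of `dump` with `changes` applied in order.

    `dump` maps record ids to records. Each change is a dict in a
    hypothetical format: {"op": "upsert" | "delete", "id": ..., "record": ...}.
    """
    patched = dict(dump)  # leave the original dump untouched
    for change in changes:
        if change["op"] == "delete":
            patched.pop(change["id"], None)
        elif change["op"] == "upsert":
            patched[change["id"]] = change["record"]
        else:
            raise ValueError(f"unknown operation: {change['op']}")
    return patched


# A small dump downloaded earlier, and the changes published since then.
dump = {"a": {"value": 1}, "b": {"value": 2}}
changes = [
    {"op": "upsert", "id": "b", "record": {"value": 20}},
    {"op": "upsert", "id": "c", "record": {"value": 3}},
    {"op": "delete", "id": "a"},
]
patched = apply_changes(dump, changes)
```

The re-user only has to fetch the changes, which are typically far smaller than the full dump.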
Linked data is a way of integrating data with the web. It uses URLs to identify things, and more data about those things is published at those URLs.
For example, with linked data, each school would have a web address that was used in data about the school, and putting that web address in a browser would give you more data about the school.
Publishing data as linked open data has the advantages of:
- being able to link up and merge together data from different sources that refer to the same identified thing
- being able to publish data in a distributed way, with intermediaries collecting and aggregating data
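The first advantage can be sketched in a few lines of Python. The example below merges records from two sources because they use the same URL to identify each school. The URLs, field names and figures are all made up for illustration, and plain dictionaries stand in for a real linked data format such as RDF.

```python
from collections import defaultdict

# Two hypothetical sources that identify schools by the same URLs.
performance = [
    {"school": "http://example.org/id/school/1001", "pass_rate": 0.82},
    {"school": "http://example.org/id/school/1002", "pass_rate": 0.67},
]
spending = [
    {"school": "http://example.org/id/school/1001", "spend_per_pupil": 5200},
]


def merge_by_url(*sources, key="school"):
    """Combine records from several sources that share an identifying URL."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            url = record[key]
            for field, value in record.items():
                if field != key:
                    merged[url][field] = value
    return dict(merged)


merged = merge_by_url(performance, spending)
```

Because both sources agree on the identifier, no name matching or manual reconciliation is needed to bring the data together.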
The midata initiative encourages organisations to give individuals access to the personal data that those organisations hold about them. Open data should not include personal data, but combining open data with personal data makes the personal data more useful.
Do we all have to use the same standard data format?
If many different data owners are publishing data about the same kind of thing, such as how much they spend on different services, it helps re-users of that data if they all publish the data using the same format. In general, reusing existing formats (such as using JSON or CSV rather than creating a custom text-based data format) helps re-users to take advantage of existing tooling.
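As a small illustration of the tooling benefit, the Python sketch below reads spending data published as CSV using only the standard library. The column names ("service", "amount_gbp") and the figures are assumptions made up for this example; a real shared format would define its own columns.

```python
import csv
import io

# Hypothetical spending data, as it might be published in a shared CSV format.
SAMPLE = "service,amount_gbp\nLibraries,1200.50\nRoads,8300.00\n"


def total_spend(csv_text):
    """Sum the amount_gbp column of a CSV document held in a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["amount_gbp"]) for row in reader)


total = total_spend(SAMPLE)
```

If every publisher uses the same columns, the same few lines work for all of their files; a custom text format would need a custom parser for each one.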
Do we all have to create APIs?
Open data can be made available through an API or through a bulk download. The appropriate way of making the data available will depend on the type of data, how frequently it changes, and the kinds of things that people want to do with it. For example, data that changes rapidly is often best served through an API, but data that could provide insights when analysed in bulk should be made available as a bulk download.