By Rachel Wilson
Geospatial data has a lot of potential for government and businesses, but there are challenges to publishing and using it, including a lack of standards and tools.
The ODI’s R&D team have been exploring the challenges to publication and use of open geospatial data by government and small businesses. Our report exploring the needs of geospatial data users highlighted that the lack of common standards for data attributes across UK geospatial datasets is a source of friction.
GeoJSON is a web-friendly file structure designed for representing simple geographical features, along with their non-spatial attributes. These attributes, or “properties”, are user defined and are usually specific to a domain of interest.
One of the challenges we have encountered is lack of support in data publishing tools for people working with GeoJSON to define simple schemas for their data files.
A GeoJSON file of toilet ‘features’ looks like this. Highlighted are the user-defined properties specific to the domain of public toilets, including BabyChange – a yes/no value that indicates whether a baby change facility exists – and ChargeAmount – which is a number, presumably a decimal number, here without a pound symbol.
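As a rough illustration of that structure, a single feature might look something like the sketch below, built as a Python dictionary and serialised to GeoJSON. Only the property names BabyChange and ChargeAmount come from the example above; the coordinates, the "Yes" value and the 0.20 charge are invented for illustration.

```python
import json

# One illustrative GeoJSON feature; the coordinates and values are invented.
toilet_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON positions are ordered [longitude, latitude]
        "coordinates": [-0.1276, 51.5072],
    },
    # User-defined, domain-specific attributes live under "properties"
    "properties": {
        "BabyChange": "Yes",    # assumed yes/no spelling
        "ChargeAmount": 0.20,   # decimal number, no pound symbol
    },
}

# Features are usually published together in a FeatureCollection
collection = {"type": "FeatureCollection", "features": [toilet_feature]}

print(json.dumps(collection, indent=2))
```

Everything outside "properties" is fixed by the GeoJSON format; everything inside it is up to the publisher.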
In this way, the GeoJSON structure permits us to associate useful information with the pure geospatial data. You can see the geospatial data, together with its properties, displayed on this map of Great British Toilets.
Geospatial tools to promote collaboration
Someone who wants to collect geospatial data on a topic might want to work collaboratively. Imagine a nationwide campaign to observe wildlife in back gardens – whoever creates the campaign would like contributors to provide data in a GeoJSON file, using the same properties.
Thinking back to the toilet example above: should data about the charge amount be in pounds or in pence? Should we specify the currency? What if this were an international project – should you specify a currency symbol or another property for currency code? Should BabyChange be a yes/no value or a true/false value, and, if the answer is no, is it permissible to leave it blank? Imagine if everyone who contributed data made a different choice.
One of the fundamental ways data publishers can increase quality and consistency, and therefore reusability, of their data is to use a schema. A schema is a document that describes the structure of your data including rules and constraints on the values. Datasets that use the same schema will have the same structure, property names and datatypes and are therefore easier to combine.
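To give a feel for what such rules and constraints could mean in practice, here is a minimal, hand-rolled sketch in Python. It does not follow any particular schema standard: the rule format, the function name check_properties, and the decisions to use "Yes"/"No" for BabyChange and a non-negative number for ChargeAmount are all assumptions made for the example.

```python
# Hypothetical rules: property name -> (allowed type(s), required?, constraint check)
TOILET_RULES = {
    "BabyChange": (str, True, lambda v: v in ("Yes", "No")),
    "ChargeAmount": ((int, float), False, lambda v: v >= 0),
}

def check_properties(properties, rules):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, (types, required, check) in rules.items():
        if name not in properties or properties[name] is None:
            if required:
                problems.append(f"{name}: missing required value")
            continue
        value = properties[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"{name}: unexpected type {type(value).__name__}")
        elif not check(value):
            problems.append(f"{name}: value {value!r} breaks a constraint")
    # Flag properties the schema does not know about
    for name in properties:
        if name not in rules:
            problems.append(f"{name}: not defined in the schema")
    return problems

print(check_properties({"BabyChange": "Yes", "ChargeAmount": 0.2}, TOILET_RULES))
print(check_properties({"BabyChange": True, "ChargeAmount": "£0.20"}, TOILET_RULES))
```

Two contributors who run the same check before publishing would end up with files that share property names and datatypes, which is the point of agreeing a schema up front.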
The creator of the wildlife campaign would like to create a schema for their geospatial data. But how would they create such a schema and how would they share it for others to find? Once someone has found the schema, how might they use it to structure and validate their data for consistency before publishing?
We have found there is a gap in the tools available for working with GeoJSON. It is neither common practice nor simple to define a schema for the user-defined properties of the features. We have not yet found any support for defining or applying schemas in commonly used geospatial tools.
In response to this need, we have made some sketches showing how geospatial tools might provide the means to define a schema for GeoJSON features, and publish the schemas on the web. Our demonstration tool Octopub already publishes data to GitHub, so could we extend it to publish schemas?
Models vs Schemas
Early on in our thinking we realised that a schema applies to an entire dataset and GeoJSON already has a standard structure. The domain expert doesn't need to know about the GeoJSON structure itself, but they will best understand what properties we need to capture for a feature, and they will understand what constraints there are on the values (e.g. a decimal number with no currency symbol). Perhaps what we need to create first is a model, or outline, of what should be specified in the properties.
Based on this “model” we can then generate a complete schema for validating that a dataset is both in GeoJSON format and also contains the required properties, with values that comply with any constraints.
Here is a sketch showing what a model would look like in Octopub, the ODI's demonstrator tool for data publishing.
We imagine you would build up a model by adding a series of properties. These properties could have many constraints.
Here is how we might add a property. In the image below we have already defined several constraints, ready to complete the property name and description before adding the complete property to the model.
After a model has been created we could click a button to publish it. This would generate a schema for the complete GeoJSON file in a standard format and publish to GitHub with an accompanying homepage of documentation that would look much like the model page above.
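The generation step described above could be sketched roughly as follows. This assumes JSON Schema as the "standard format", which this post does not commit to, and the Property class, its fields and the build_schema function are hypothetical names rather than anything that exists in Octopub. The example constraints (a "Yes"/"No" enumeration and a charge in pounds) are also assumptions, and the generated schema covers only a FeatureCollection skeleton, not every rule in the GeoJSON specification.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Property:
    """One entry in the model (hypothetical structure)."""
    name: str
    description: str
    json_type: str                 # e.g. "string", "number", "boolean"
    required: bool = True
    constraints: dict = field(default_factory=dict)  # extra JSON Schema keywords

def build_schema(title, model):
    """Wrap the model's properties in a schema for a GeoJSON FeatureCollection."""
    props = {
        p.name: {"type": p.json_type, "description": p.description, **p.constraints}
        for p in model
    }
    feature = {
        "type": "object",
        "required": ["type", "geometry", "properties"],
        "properties": {
            "type": {"const": "Feature"},
            "geometry": {"type": ["object", "null"]},
            "properties": {
                "type": "object",
                "properties": props,
                "required": [p.name for p in model if p.required],
            },
        },
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": title,
        "type": "object",
        "required": ["type", "features"],
        "properties": {
            "type": {"const": "FeatureCollection"},
            "features": {"type": "array", "items": feature},
        },
    }

# The domain expert only describes the properties, not the GeoJSON wrapper
model = [
    Property("BabyChange", "Whether a baby change facility exists", "string",
             constraints={"enum": ["Yes", "No"]}),
    Property("ChargeAmount", "Charge in pounds, no currency symbol", "number",
             required=False, constraints={"minimum": 0}),
]

print(json.dumps(build_schema("Public toilets", model), indent=2))
```

The design choice here mirrors the model vs schema split: the person writing the model never touches the GeoJSON wrapper, and the tool adds it when publishing.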
We welcome feedback on this idea. Does the language make sense: model vs schema, properties and constraints? Does the workflow make sense? Is there demand for such a workflow in the tools you use?
Please get in touch with your thoughts and suggestions at [email protected]. And if you are the developer of a tool for data management or publishing, we would love for you to share how you support, or would support, GeoJSON schema publishing in the data toolmakers forum.