Geospatial data in a world of generative AI

Wed Mar 20, 2024

The geospatial industry has evolved over the last several years alongside developments in AI. Global map-makers, including big tech and national mapping agencies, have long used image recognition techniques to automate the inference of features from aerial and satellite imagery, enabling rapid mapping of the world, and have invested in improving these capabilities as data and technology have developed. Ecologists are using pattern matching techniques to identify habitats from complex datasets and then using advanced analytics to predict where habitats could be successfully created. And law enforcement is using advanced analytics to understand patterns of criminality and target its activities more effectively.

We set out to test the hypothesis that the explosion of interest in generative AI is prompting new waves of thought and regulation that could change what is needed to deliver established use cases for AI with geospatial data. We asked questions including:

Could new regulation around data and AI make existing applications less viable?
Is growing awareness of how geospatial data is used in AI causing organisations to close data due to intellectual property and liability concerns? What might be the consequences?
How far do the AI applications for geospatial data change what kind of data is needed to ensure these technologies work well? (for example, an increasing need for data that is high quality, open, and with clear governance approaches)

Overall, we saw cautious optimism about the benefits of generative AI systems in their use of geospatial data, for purposes such as:

Capacity - Making it easier to access certain datasets or analytical capacity without specialist skills, including mapping places that haven’t recently collected geospatial data, through techniques such as feature detection

Speed - Using synthetic data to test models, or to generate official statistical data, can increase the speed of design and roll-out

Deployment - More ‘natural’/user-friendly ways of interacting with geospatial data through ‘services’ based on generative AI

Scale - Monitoring over larger areas (e.g., nature reserves)

Experimentation - Designing research/initiatives that use geospatial data in the early stages, or using digital twins to assess impacts of scenarios in planning

Participation - Using synthetic data generated by AI models for interrogating models, which may in turn help more people to build models due to privacy issues

But there is a lot that needs to be done to realise these benefits.

Five themes emerged in terms of the challenges and needs coming from these developments, closely reflecting our broader goals for our data-centric AI programme.

Better geospatial data for AI

As one participant said: “We have forgotten how important data is to the geospatial industry”. We heard concerns about the quality, suitability and quantity of existing geospatial data for use in AI systems to drive effective machine learning and particularly generative AI models.

The discussion focused on the need for higher quality data with better provenance and better representation of different societal groups within this data. For instance, 92.4% of the population in South Sudan does not have internet access, which will reduce their representation within geospatial datasets. It was also recognised that improved standards are needed to help understand and use data appropriately.

We need to improve the understanding of data, and its provenance, to ensure AI implementations are robust. Further, there is a need for clearer licensing and IP infrastructure to ensure high-quality data can be used in line with permissions granted. Meanwhile, AI may, participants told us, help organisations to gather and integrate data. In our data-centric AI programme at the ODI, we are taking an interest across these topics, and thinking about what the ‘data needs’ are to support a shift towards AI applications.

Ensuring openness remains a success for geospatial data in an evolving data/AI landscape

One particular concern was that there are various examples of data closing due to concerns about the use of publicly available data within AI models. This is happening both within the geospatial industry (for example, some countries are implementing controls to ensure data can only be used within their territories) and beyond it, where some corporations are increasingly controlling how their data is used (for example, the Guardian closing its data to scraping in the wake of the release of ChatGPT).

There is a need to ensure that AI will help and not hinder openness, and there are questions about what the nature of open data should be in a world of AI. We also heard about the need to nuance the discussion on what ‘openness’ should look like, in a context where ‘open washing’ is very much a reality. This includes the question of whether ‘openness’ remains the right term to use. Among other things, we heard a refrain we often hear at the ODI: “try to make data as open as it can be, but as closed as it needs to be”.

Ensuring representation and inclusion in how geospatial data is used in AI

Significant swathes of the global population are not represented in data and online. If models are trained on data that does not apply to the rest of the world, bias will emerge. We have seen a loss of trust in the Global South due to the perceived poor behaviour of governments, commercial organisations and NGOs. The rapid development of technology and infrastructure requirements for AI may create barriers to entry for SMEs and industry innovators.

Commercial organisations need to ensure public perception is maintained and trust is not broken by the use of algorithms and datasets that people are uncomfortable with, while resources are needed to support the equitable distribution of access to computing.

Better shared understanding and literacy about how AI is changing the (geospatial) data ecosystem, and the opportunities, challenges and threats

There are concerns about geodata and AI tech being used ‘badly and quickly’, creating issues such as loss of trust and worsening perception of data and AI services. A lot of information can be derived from geographic data using AI, which means that there is a need to develop an understanding of best practice, and mechanisms for addressing problematic uses. Senior stakeholders within organisations need a better understanding of the potential and challenges that can emerge for industries from new AI technologies and their impacts on things like regulatory environments.

There is a need for better mutual understanding about how to use geospatial data well in the context of AI. Generative AI tools should be seen as just that: tools, for example for testing models, that can then be subject to scrutiny.

There also needs to be societal and policy readiness for AI. This means that maturing the sector must include understanding changing perspectives on data and AI use, and a better understanding of the legislative landscape.

Bringing geospatial data and the broader data landscape together

Participants spoke about the tendency to view the geospatial data landscape as separate from others. However, the challenges remain to a large extent the same across different industries and data forms, and in many ways ‘geospatial is not special’ (a commonly debated issue). There is the common issue of a no-man’s land, with some ‘murkiness’ about developments such as the use of web scraping and huge data sources to train new AI tools. With copyright cases raging, concerns about harms emerging from the use of these tools continue.

It was agreed that there is a need for the geospatial industry to avoid creating a new category of ‘GeoSpatial AI’ and instead build a culture of talking about ‘GeoSpatial for AI’. Any future conversation must enable geospatial, AI and data specialists to collaborate toward common goals.

The next steps

We hope this is only the beginning of coalescing stakeholders to address all of the needs we have outlined. We are looking to develop partnerships across the landscape of data’s role in AI as a part of our data-centric AI programme. If you would like to work with us on any of these issues, we would love to hear from you.
