Cridland (CC by 2.0)

TomTom – Data assurance is ‘critical for us, and it’s so difficult to do’

Mon Oct 25, 2021

We talk to TomTom about how it ensures that the data it uses is correct, ethical and safe for road-users

TomTom sells data, software and services which are used in many cars, trucks and other vehicles around the world, and by other customers to build their own applications. TomTom uses data from tens of thousands of source files a year – from car sensors to local government open data – but how does it ensure that this data is correct, ethical and safe for road-users?

ODI’s Head of Marketing and Production Hannah Foulds talks to Fay Simcock, Senior Product Owner at TomTom

 

Hi Fay. Could you start by telling us a bit about your role at TomTom, and what it involves?

I work for the division of TomTom that makes the base map itself, which includes aggregating relevant data, organising it in the right way, and then presenting it in a standard format. I don’t work on the devices themselves, just the data that feeds them.

We have lots of data sources coming in, and my role is to make sure it’s discoverable and used correctly.

 

Does TomTom use data collected by other organisations or sources? And if so, could you share examples?

Yes, we collect data from a number of different sources. For example, to collect basic geometry data (the lines that make up the map), we use a mixture of sources – including driver data and even paper maps! We also aggregate address and ‘places of interest’ data – such as petrol stations. For this we use any open data we can legally use, alongside data we collect ourselves and also data that we buy (eg UK postcodes).

We’re also increasingly using data coming from car sensors. Here, we use a data stream rather than the images themselves, and this can provide us with lots of different information, including things like speed limits. This is important to get right – what if the car gets it wrong? To account for this, we may need to compare sources to test for validity or even go out to check.

As mentioned, we do use open data. Lots is provided by government organisations, but it doesn’t always align (sometimes the government doesn’t know the speed limit on its own roads). A lot of open data doesn’t align because the collection systems aren’t good enough. Much of the UK’s traffic data could be weeks out of date because it currently still relies on local authorities publishing their data as text.

 

What are the biggest risk factors with using these other data sources? What could go wrong?

It’s all about the various regulations around data that’s used in the cars. If there’s inconsistent data and we choose the wrong source, then we could be liable.

A lot of our concern is where our legal responsibility falls as a data aggregator. We spend a lot of money on the provenance – and recording the provenance – of data. Much of our concern is around safety.

Sensor data is a growing area, but there’s a real lack of standardisation around it.

 

How do you ensure these risks don’t turn into actual issues? What checks or processes do you put in place?

It all starts with a legal check – the basis of everything we do starts with ‘can we use this data?’, especially if it’s free or open.

All our contracts or terms are carefully stored, but terms and conditions can sometimes change once we’ve already started using it.

We also collect data via web scraping – where data is extracted from a website. But there are few standards around being absolutely sure you can use data on a website – every website is different.

 

How do you check the data you’re using is up to date? 

It’s very difficult. We do a thorough investigation at the beginning – comparing the data against what we already have (eg data we’ve created ourselves). We may also go out and externally verify that data.

After this, we document that the supplier and the dataset are trusted. However, there’s no standard for checking data – we are determining the trustworthiness ourselves. We have a vast suite of checks we run on the data, such as seeking anomalies in the data patterns (as well as missing data).

We’ve developed these checks in-house over a number of years, as there is no agreed method for checking data or an industry body for map making. There are some individual standards, but nothing about what’s acceptable metadata.
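To make the idea of checking for missing data and anomalies concrete, here is a minimal sketch. It is not TomTom's method: the records, field names and the median-based threshold are all hypothetical, chosen only to illustrate the kind of check described above.

```python
from statistics import median

# Hypothetical speed-limit records (km/h) for road segments.
# None marks a value the supplier did not provide.
records = [
    {"segment": "A1", "speed_limit": 50},
    {"segment": "A2", "speed_limit": None},
    {"segment": "A3", "speed_limit": 50},
    {"segment": "A4", "speed_limit": 500},
    {"segment": "A5", "speed_limit": 60},
]


def find_missing(rows, field):
    """Return the segments whose value for `field` is absent."""
    return [r["segment"] for r in rows if r.get(field) is None]


def find_anomalies(rows, field, threshold=3.0):
    """Flag values far from the median, scaled by the median absolute deviation.

    The threshold of 3.0 is an illustrative assumption, not an industry standard.
    """
    present = [r for r in rows if r.get(field) is not None]
    if not present:
        return []
    values = [r[field] for r in present]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All typical values are identical, so anything different stands out.
        return [r["segment"] for r in present if r[field] != med]
    return [r["segment"] for r in present if abs(r[field] - med) / mad > threshold]


missing = find_missing(records, "speed_limit")
anomalies = find_anomalies(records, "speed_limit")
print("Missing:", missing)
print("Anomalous:", anomalies)
```

A flagged segment is only a prompt for a person to investigate; as the answer above notes, deciding whether a source is trustworthy still comes down to judgement.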

 

Who’s responsible for checking the data is trustworthy? Is it one or two people or is it more widespread? 

We have over 100 people looking for data sources. All of these people seeking data sources are responsible for verifying that the data is trustworthy (and that it meets a certain standard – and defining that standard).

TomTom’s reputation is dependent on the data.

 

How do you make sure that everyone’s standards are consistent? 

Through training – we’ve defined a competency matrix for the people that acquire the data, to quite a detailed level. We also have course material backing that up.

For the checks, there are checklists and templates that they need to fill in.

It’s still not perfect – data is not homogeneous. What someone has to do to verify data in some African countries would be very different to verifying data from the US.

 

How do you keep abreast of all the tools and processes available to you, and how other organisations are doing it?

On a really detailed level, it’s quite difficult. There’s quite a lot of secrecy in map making. Customers (such as big automotive companies) help us – they will be abreast of what everyone’s doing, and will inform TomTom if they think we’re lacking in some way.

Automotive companies take this very seriously – they understand that a car is now essentially a gigantic computer (eg with technology like self-driving cars etc).

 

What challenges have you faced when trying to do this?

The biggest challenge we face is a lack of metadata. When we get a dataset with no context, we don’t know where it’s from or if it’s had any quality checks.

If the data is important enough, we will go back and ask about the context and record the metadata ourselves. If there was more of a standard around it, everyone would benefit.

 

How well do you think TomTom manages data assurance? Is this something you think your customers already expect from you? 

Customers are 100% aware that they want data assurance, and they will tell TomTom – or the car manufacturers – as soon as something goes wrong. For example, if speed limits outside of schools are wrong, customers will hold you accountable.

For us, data is what we make our money on. It’s critical for us, and it’s so difficult to do.

 

You pass aggregated data onto other organisations, such as game developers. Do you feel a sense of responsibility for how the data then gets used? Is there any process for how that data gets passed on?

We may need to limit what data a company like a games company would get, and we always create contracts that state what someone can and can’t use – we don’t sell the data without asking how it will be used first.

We will always get inadvertent uses – data has a meaning, and if people don’t understand the context, they get wrong results.

 

What advice would you give to other organisations trying to manage the data coming from other sources?

The variation in data is massive – even in reasonably structured data (or non-textual data). A lot of road data comes from people who have built it up over years (such as local authorities), and there may be nothing in common with how the neighbouring authority collects it.

Learn your data. Learn what’s in there and what’s missing, then translate it into a common format to compare it in some way.

For example, we’re finding that as Eastern Europe’s road data is generally collected more recently, its quality is much better than in the UK – where it’s been developed over years and has no common platform.
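As an illustration of "translate it into a common format to compare it", the sketch below normalises two sources that describe the same roads differently and then lists where they disagree. Everything in it is an assumption made for the example: the two sources, their field names, the road references and the 3 km/h tolerance are hypothetical, not a description of TomTom's pipeline.

```python
MPH_TO_KMH = 1.609344

# Hypothetical source 1: a local authority publishing limits in mph.
authority_rows = [
    {"road_ref": "B1234", "limit_mph": 30},
    {"road_ref": "B5678", "limit_mph": 60},
]

# Hypothetical source 2: sensor-derived limits already in km/h.
sensor_rows = [
    {"id": "B1234", "max_speed_kmh": 50},
    {"id": "B5678", "max_speed_kmh": 60},
    {"id": "B9999", "max_speed_kmh": 30},
]


def normalise_authority(rows):
    """Map road reference -> speed limit in whole km/h."""
    return {r["road_ref"]: round(r["limit_mph"] * MPH_TO_KMH) for r in rows}


def normalise_sensor(rows):
    """Map road reference -> speed limit in km/h (already in the common unit)."""
    return {r["id"]: r["max_speed_kmh"] for r in rows}


def compare(a, b, tolerance_kmh=3):
    """Return roads where the sources disagree, and roads present in only one source."""
    shared = sorted(a.keys() & b.keys())
    conflicts = {k: (a[k], b[k]) for k in shared if abs(a[k] - b[k]) > tolerance_kmh}
    only_in_one = sorted(a.keys() ^ b.keys())
    return conflicts, only_in_one


conflicts, only_in_one = compare(
    normalise_authority(authority_rows), normalise_sensor(sensor_rows)
)
print("Conflicting limits:", conflicts)
print("Covered by one source only:", only_in_one)
```

Once both sources share keys and units, gaps and conflicts become visible; which source to trust for each conflict is a separate decision that may need an external check.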

Get data assurance advice, tools and guides

The ODI is looking at how we can make it easier for you to ensure your data is being collected, accessed, used, and shared in trustworthy ways. In the coming year we’ll be releasing a range of outputs to help you in your role. Sign up below to be among the first to find out when these outputs are released.