Using artificial intelligence and open data for innovation and accountability

In the light of the UK’s new industrial strategy and budget, as well as the ODI’s recent participation in a House of Lords evidence session around how AI and personal data should be owned, managed, valued and used for the benefit of society, the ODI’s Head of Technology Olivier Thereaux examines our work in this area.

Image: Ford in Cologne by GillyBerlin (CC BY 2.0)

Artificial intelligence (AI) is currently enjoying a renaissance in industry, in the popular imagination and in the most recent UK government budget. AI's popularity can be partly explained by the fact that, for the first time, we have enough large-scale data to train AI systems. There are public datasets for computer vision, natural language and speech, and many more non-public datasets held by businesses and governments. Recent improvements in hardware are also making it more cost-effective to train and run machine learning models.

This renaissance is both exciting and worrying. AI is already being used to save time, money and even lives (doctors use it to help with medical diagnosis). On the other hand, the hype may increase the risk of AI being misused and could lead to monopolies that stifle innovation.

The UK government’s recent budget included a £75 million investment in both developing AI and managing the challenges around it. This investment will establish a new Centre for Data Ethics and Innovation to support safe and ethical innovation with AI, and facilitate data access through ‘data trusts’. Similar initiatives are being implemented in other countries, including Australia, France, China and the United States.

At the ODI, we believe that fostering AI innovation requires an open approach that includes open data, open source code and open culture. This is essential because algorithms in autonomous and machine learning systems need large quantities of high-quality data to perform well. We must focus on making data openly available where possible, and in formats that are machine-readable. Increasing access to data is key to creating a competitive and equitable AI market, where innovation can flourish, as our Head of Policy Peter Wells wrote earlier this year.

Currently, most AI-centred innovation is based on a business model where training data is treated as protected intellectual property, and AI systems are generally provided as inscrutable 'black boxes' whose internal workings are not disclosed. This is problematic: such uniformity of business models can have a chilling effect on innovation and hold back the growth of a thriving AI sector.

As Peter says in his blog post:

This brings with it the risk that this exciting new sector will be dominated by the large digital platforms who currently dominate the internet. This will reduce competition and innovation. It will reduce the diversity of the problems to which AI solutions are applied. It will stifle the energy and fresh ideas that startups and SMEs bring. It will reduce the overall size of the AI market and the benefits that AI could bring to all of us.

Better access to data maximises its use and value; however, we must also implement controls that protect the privacy, interests and security of people, organisations and states.

Another concern is that AI could be misused when making decisions that affect people's lives. Most AI systems resist inquiry: even with access to the source code of a machine learning algorithm and the ability to audit the architecture and weights of a neural network, it is almost impossible to determine how that network detects patterns or reaches a decision. This makes it very hard to understand any bias that may exist, its impact on different groups of people, and how it could be mitigated. There are interesting efforts to design AI systems that can "explain" their inner workings, in the same way that humans can explain their decisions without full knowledge of the brain's processes, but these efforts will take years to mature, if they succeed at all. New legislation such as the GDPR and the proposed UK Data Protection Bill may help by pushing for a stronger "right to explanation". For now, however, their impact on this issue will remain limited.

The key to an AI system's inner workings lies in its training data, which encodes most of its bias. Bias in what is included (and what is not) is being translated into racist image recognition systems, as well-meaning engineers encode historic and current data into inscrutable algorithms that entrench an unquestionable status quo reflecting our current economies and societies. A high crime rate in a given postcode may end up condemning the people who live there to an endless string of automated rejections: both an egregious misunderstanding of statistics and a terrifying case of "computer says no".

This is what technologist Maciej Ceglowski calls “money laundering for bias”: the risk that blind faith in the superiority and efficiency of AI will end up crystallising data about the past and the present into future systemic unfairness by way of blatant logical fallacies.

This is not a hypothetical scenario: some US police departments have been enthusiastic about AI systems that promise to detect crime or make sentencing more efficient. Without careful design, this can lead to the adoption of flawed, often ineffective and sometimes unfair systems.

So, how can we capture the benefits of AI while mitigating the risks? We have two suggestions.

1. Better access to data will unlock the potential of data-hungry machine learning systems, and it is also a way to ensure that the systems we create are safe.

It is important that this focus on opening data is not just limited to government and scientific research data. At the ODI, we have been ramping up our efforts to make data held by the private sector more broadly available, making it as open as possible while protecting people’s privacy, commercial confidentiality and national security.

As we wrote in our joint response to the UK Government’s green paper on industrial strategy earlier this year:

We would encourage government and the review to engage with the AI community to determine useful areas for service development; identify where access to public or private datasets can help to create a fair and equitable AI market and provide incentives to help organisations publish the data. This data might be published openly so that anyone can use it, or shared under controlled conditions to manage commercial and privacy risks.

The recent review for the UK government on "Growing the artificial intelligence industry in the UK" touched upon this question with the idea of "data trusts". This idea needs further exploration, and we will be working on it next year. In the meantime, much more can be done to ensure that data held by the private sector is safely and equitably available.

This is why we have recently started to conduct research on AI business models. This research will explore whether the "black-box model" is the best way to build an AI service, and whether there are viable alternatives which are more compatible with our hopes for an "as open as possible" future.

2. We want to create a data economy where rights and responsibilities are adequately distributed, and where more control over the usage and sharing of data is given to the individual.

This is particularly important for fuelling the uptake of AI, because these algorithms often use personal data for training. The ability of AI algorithms to spot patterns also makes them very effective at re-identifying people in "anonymised" datasets, which raises significant concerns about individual and group privacy.

With this in mind, and building on our existing work on personal data APIs in sectors like banking and retail, we are now collaborating with Projects by IF to understand what kind of design patterns could enable safe and effective access to personal data in the utility sector. We also plan to focus some of our upcoming research and development activities on data trusts and on the important topics of anonymisation, de-identification and re-identification of datasets. This will help both public and private sector organisations make potentially personal and re-identifiable data more accessible in a way that protects privacy while creating a fair market.
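To make the re-identification risk more concrete, here is a minimal sketch of a linkage attack, written in Python. It uses a deliberately simple technique (exact matching on quasi-identifiers such as postcode, birth year and sex) rather than machine learning; AI systems can make this kind of matching far more powerful by exploiting fuzzier patterns. All records, field names and postcodes below are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Invented "anonymised" dataset: names removed, quasi-identifiers kept.
anonymised = [
    {"postcode": "AB1 2CD", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "AB1 2CD", "birth_year": 1984, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "XY9 8ZW", "birth_year": 1971, "sex": "M", "diagnosis": "hypertension"},
]

# Invented public dataset that still carries names (e.g. a register or social profile).
public_register = [
    {"name": "Alice", "postcode": "AB1 2CD", "birth_year": 1984, "sex": "F"},
    {"name": "Beth", "postcode": "AB1 2CD", "birth_year": 1984, "sex": "F"},
    {"name": "Carl", "postcode": "XY9 8ZW", "birth_year": 1971, "sex": "M"},
]

# Assumed quasi-identifiers: fields that are not names but can still single people out.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def qi_key(record):
    """Tuple of a record's quasi-identifier values, used as a join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def k_anonymity(records):
    """Size of the smallest group of records sharing identical quasi-identifiers."""
    return min(Counter(qi_key(r) for r in records).values())


def link(anon_records, register):
    """Return (name, diagnosis) pairs where exactly one register entry matches."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(qi_key(person), []).append(person["name"])
    matches = []
    for record in anon_records:
        candidates = names_by_key.get(qi_key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], record["diagnosis"]))
    return matches


print(k_anonymity(anonymised))
print(link(anonymised, public_register))
```

Running this prints `1` and `[('Carl', 'hypertension')]`: removing names was not enough, because one person is unique on their postcode, birth year and sex. Generalising quasi-identifiers (for example, keeping only the postcode district) raises the smallest group size and makes this simple attack harder, though it does not rule out more sophisticated ones.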

The future for AI is not pre-determined: it is up to us to create and shape the future we want. We hope our efforts will help create a future which is as open as possible and benefits everyone.

If you have ideas or experience in open data that you'd like to share, pitch us a blog or tweet us at @ODIHQ.