The recent and spectacular advances in generative artificial intelligence have been made possible by shared data. ChatGPT and other large language models were trained on gigabytes of text freely available from the Internet. The strong link between advances in AI and ample IA (information availability) is reflected in the UK government’s National AI Strategy, which highlights data sharing as a critical catalyst.
Central banks and financial regulators can play constructive roles here. Since the financial crisis of 2007-09, regulators have acquired significant new sources of microdata to remedy the gaps in their market intelligence that the crisis exposed. For example, many financial regulators now have access to transactional data on over-the-counter derivatives markets. Many also collect granular data on key credit products such as mortgages. These data are important informational ingredients in policymaking today.
However, regulators should regard these data not only as inputs into policy but also as instruments for its implementation. Increasing access to such data is an additional policy lever regulators can pull to achieve their objectives. These include promoting healthy competition, safeguarding system stability, and tackling financial crime.
Promoting healthy competition
If, as is often said, data is the ‘oil’ powering the ‘Fourth Industrial Revolution,’ then those who have access to ample supply have a competitive advantage over those for whom it is scarce. In banking, for example, this means that large firms, with longer loan books and deeper pockets to hire data scientists, may be best positioned to make the most of advances in machine learning. By more effectively analysing their in-house information using AI, large firms might win market share from small firms.
If this happened, it could have consequences contrary to regulatory objectives. Although consumers might benefit initially from innovations delivered by incumbents, this might come at the cost of poorer quality services over time as the sector became less competitive. Additionally, this outcome would strengthen the hand of incumbents to extract public subsidies because they are perceived to be ‘too big to fail.’
Regulators could ward off this development by removing barriers to entry stemming from information asymmetries. Indeed, some regulators are already doing so. The UK’s Financial Conduct Authority, for example, has made hundreds of datasets available to startups through its permanent Digital Sandbox launched this year. The Sandbox includes AI-generated synthetic data, which is statistically representative of real microdata but stripped of sensitive attributes, making it suitable for sharing.
Safeguarding system stability
Although small firms and startups might benefit the most from regulators sharing data, all firms could benefit from having a more holistic view of the markets they operate in. For example, large incumbents might forecast defaults better if they could train their credit risk models on the entire distribution of borrowers instead of just their subset, particularly when launching a new product or entering a segment of the market where they have no prior experience. More data could improve the risk models of all firms, reinforcing the stability of the entire financial system.
This desired outcome must be weighed against potential unintended consequences. For example, some consumers might find it harder to access credit if firms converged on a set of best-performing models. Additionally, if firms developed similar models because they were trained on the same data, then firms might behave alike during times of stress, amplifying systemic risk. The relative costs and benefits of sharing data for different use cases could be assessed experimentally by regulators and researchers.
Initiatives to increase access to data must also comply with data protection and other relevant regulation. Fortunately, there is a growing portfolio of privacy-enhancing technologies that can enable firms to train models on regulatory microdata without regulators actually transferring the data to them, as detailed in a recent report published by the UK’s Information Commissioner’s Office. In some circumstances, microdata might even be shared publicly. For example, the US Federal Housing Finance Agency, via Fannie Mae and Freddie Mac, already publishes loan-level data from which firms can build more accurate credit models.
Tackling financial crime
Financial institutions, regulators, and the public share a common interest in combating financial crime. Yet today financial institutions still attempt to predict and prevent these crimes working largely in isolation from one another. This is a costly and suboptimal solution.
Thankfully, there are signs that a more collaborative approach is gaining ground. For example, the Monetary Authority of Singapore is collaborating with banks in its jurisdiction to build a platform for sharing information on financial crime. The BIS has also recently completed a pilot study of technologies to support data sharing across borders to bolster anti-money laundering checks.
A particularly promising approach is federated learning. With federated learning, each firm trains a copy of a common model on its own data. Firms then share these calibrated models, but not the underlying data, with a regulator who combines them. The aggregated model is then sent back to firms. Firms thereby benefit from running a financial crime detection model trained on more data than any one of them has alone.
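To make the mechanics concrete, here is a minimal sketch of one common way to do this, known as federated averaging. It is illustrative only and rests on several assumptions that are not drawn from any regulator's actual system: the shared model is a simple logistic regression, the regulator combines firms' models by averaging their weights in proportion to each firm's sample size, and the firms' data are synthetic. All function names and parameters are hypothetical.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """A firm refines the shared logistic-regression weights on its own data.

    Only the updated weights leave the firm; X and y never do.
    """
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
        grad = X.T @ (preds - y) / len(y)        # gradient of the log-loss
        w -= lr * grad
    return w


def aggregate(local_weights, sample_counts):
    """The regulator combines firm models, weighting each by its sample count."""
    counts = np.asarray(sample_counts, dtype=float)
    return np.average(np.stack(local_weights), axis=0, weights=counts)


def federated_training(firm_data, n_features, rounds=20):
    """Repeat: send model to firms, collect local updates, average, send back."""
    global_w = np.zeros(n_features)
    for _ in range(rounds):
        local_ws = [local_update(global_w, X, y) for X, y in firm_data]
        counts = [len(y) for _, y in firm_data]
        global_w = aggregate(local_ws, counts)
    return global_w


def make_firm_data(rng, n, true_w):
    """Synthetic stand-in for one firm's labelled transactions (assumption)."""
    X = rng.normal(size=(n, len(true_w)))
    p = 1.0 / (1.0 + np.exp(-(X @ true_w)))
    y = (rng.random(n) < p).astype(float)
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])  # hypothetical "true" risk pattern
    firms = [make_firm_data(rng, n, true_w) for n in (200, 500, 300)]
    print(federated_training(firms, n_features=3))
```

In this sketch the smallest firm, with 200 records, ends up with a model shaped by all 1,000 records across the three firms, even though no firm ever sees another's data. A real deployment would need further safeguards, since model updates themselves can leak information about the data they were trained on.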
The opportunity ahead
One of the striking incongruities in financial services today is that while many machine learning algorithms used by firms are open and publicly sourced, the data to which they are applied are closed and proprietary. Regulators are uniquely placed to resolve this mismatch. They can increase access to their own data and promote safe sharing between firms, as the UK’s Competition and Markets Authority did through Open Banking regulation.
As researchers at the Open Data Institute have pointed out, data is different from other factors of production like land in being ‘non-rival’: the same data can be used by multiple parties in parallel without being depleted. Increasing access to data can therefore stimulate output and innovation, as recent advances in AI attest.
In sum, regulators today should view data not merely as evidence for informing policy but as a policy tool itself. In so doing, regulators can better achieve their objectives, including ambitions to support economic growth and international competitiveness.
This blog was originally published on www.centralbanking.com.