Understanding the common technical infrastructure of shared and open data

Wed Nov 25, 2020
As part of our research on data institutions, the ODI is embarking on a new project exploring their technical infrastructure. In the coming months we’re going to be documenting and designing improved tech for shared and open data initiatives. We need your help to make it happen.

Hadoop. NoSQL. Blockchain. BigTable. Data Warehouses. Data Lakes. Azure. Redshift. ELT. ETL. For at least as long as ‘big data’ and ‘AI’ have been buzzwords, the tech world has witnessed an ongoing explosion of technologies and approaches for processing, managing, and publishing data.

At the same time, a growing understanding of the benefits of opening up and sharing across data silos has been steadily expanding the range of data-handling requirements and use cases – along, often, with their complexity. Meanwhile, older and more mature technologies haven’t gone away: tools like relational databases, RESTful APIs, and even CSV files are often pressed into service for these novel use cases, and often fit the bill.
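To illustrate the point that mature, simple tools often suffice: here is a minimal sketch, using only the Python standard library, of how a plain CSV file can back a basic data-access capability. The dataset, field names, and helper functions below are entirely hypothetical examples, not part of any ODI toolset.

```python
import csv
import io

# Hypothetical sample dataset: a small CSV, the kind of flat file many
# open data initiatives still publish and consume.
SAMPLE_CSV = """id,city,air_quality_index
1,London,42
2,Manchester,38
3,Bristol,45
"""

def load_records(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def filter_records(records, **criteria):
    """Return the rows matching all of the given field=value criteria -
    the kind of simple query a lightweight data API might expose."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

records = load_records(SAMPLE_CSV)
print(filter_records(records, city="London"))
```

For many sharing scenarios, a filter like this behind a simple RESTful endpoint is all the "infrastructure" that is needed; the harder question, explored below, is knowing when it is not.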

Which tool for what?

Part of the ODI’s mission is to help ensure that data remains as open and shareable as possible – and we’re concerned that this proliferation of tools for data-handling can sometimes act as an obstacle for organisations hoping to spread the benefits of their data as widely as they can.

It’s true that, whatever your data management needs, there’s probably already a solution (or, more likely, combination of solutions) that will work for you. But it can be harder than ever to determine which solutions those actually are. Does your organisation need a distributed ledger? Continual model updating? Differential privacy guarantees? The range of options and permutations is enormous.

The risk of picking the wrong solutions – or of developing an expensive and difficult-to-maintain one – is correspondingly high. From the ODI’s ongoing work on data institutions, we know that data-access initiatives can be costly to set up and keep viable – and that maintaining critical data infrastructure forms a large part of that cost. In an ideal world, there would be a few basic technical patterns, with a limited number of variants, that would help data initiatives share their data sustainably.

Data infrastructure exploration

In the coming months, the ODI’s aim is to explore how far we can make basic technical patterns a reality. That goal may sound ambitious, and it is. We have a lot of research to do, and big questions to ask. But the ODI is building on its substantial expertise in this area. Last year we released our data trusts tech report, surveying the technical underpinnings of data initiatives across a range of sectors. Our technology and R&D teams have always been happy to get their hands dirty designing and creating open data publishing tools, such as Octopub, CSVLint, and the Frictionless Data Toolkit. Most ambitiously, with the OpenActive initiative and others, we’ve helped to nurture and foster complete open data tech stacks and ecosystems into maturity.

The ODI will be kick-starting this latest data infrastructure exploration with a short project to delve deeper into what constitutes a core or shared technical infrastructure for data institutions – or whether there are, in fact, multiple different scenarios that each require their own widely varying technical stack. For example, how different is infrastructure supporting open data initiatives from that of organisations needing to share sensitive data? Where are their requirements different, and where do they converge? In addition to building on our own expertise in this area, we want to work with existing initiatives to learn from their experience. Our work here will also go hand-in-hand with other work the ODI is undertaking in this area – such as sustainable data institutions and data infrastructure for common challenges.

Get involved: new role and tech architecture fund

To help us succeed, we are looking to bolster our technical capacity – both by increasing the range of tools data institutions can use to maintain, curate and govern their data, and by growing our technology team.

With regard to the toolset: the ODI is keen to hear from organisations who already build and maintain core building blocks for data access and sharing, and who would welcome help to make them better and more interoperable. With this end in view, we are launching a new tech architecture fund, offering four initial awards of £15,000 each.

For our team: to support our infrastructure work, we are looking to enhance our technology team with a new member who has a thoroughly architectural mind and experience of working for or with data access programmes or data institutions. We recently advertised this position as being for a ‘Data Architect’ – and while on reflection that term was too narrow, we’re definitely looking for somebody who can analyse data systems and design potential solutions at an architectural level.

If this sounds like you, and you’re comfortable working at pattern-levels of abstraction, please see our job description and apply.

And, of course, if you have any other questions, concerns, or comments in this area to contribute, or if you’re hoping to collaborate with us in a way we haven’t already mentioned above, please don’t hesitate to email us. The ODI knows the questions we’re asking are large ones. We welcome any reflections or further feedback you might have.