If you’ve ever asked a technical specialist a question, you’ve probably been met with the frustrating reply: ‘well, it depends’. This is how we felt in 2018 after assessing technologies used by data trusts in three pilot organisations.
By data trust, we mean a legal structure that provides independent stewardship of data. A data steward is a person or organisation who collects, maintains and shares data.
The three pilot organisations were very different in terms of organisational maturity, domain, budgets and industry norms. These differences made them interesting to study, but affected the technology options so greatly that we could draw only two conclusions: (a) there is no universal technology solution for a data trust – no specialist hardware or software; (b) a data trust is a data steward at heart, but with additional functional and non-functional requirements. For example, it could use technology to facilitate decision-making, control data access or keep an audit trail. However, those ‘operational’ technologies would interact with the same core technologies as used by any other data steward.
So, what technologies might data stewards use for data hosting, access or – in the case of data trusts – specialist operational functions? What budget would they need? What skills? Well… that depends. And we found little commonality among our three pilots.
We wondered whether, if we looked wider, we would find commonalities or patterns. Design patterns are a useful way to communicate successful solutions to common problems, so we started a short research project to investigate the common technology used for data stewardship and access.
The right tool for the right job
Technologies are tools that enable. To match the right tool to the right job, we first need to understand two things: what is the job to be done, and what are the properties of the tool.
To better understand the ‘job’ of data stewardship, the ODI has explored some specific models such as data institutions and data access initiatives. We’ve identified some important roles data stewards play in a data ecosystem, and proposed big-picture planning activities for initiatives whose role is strengthening data infrastructure to address specific challenges.
We are also fortunate to be connected to a diverse set of organisations via the R&D stimulus fund and Microsoft Peer Learning Network. We interviewed some of them about their technology infrastructure and how it enables their operation and goals.
When all you have is a… spreadsheet?
There are many tools that can do the same job. The materials of technology are hardware and software, so we can create almost any tool imaginable! It helps to relate the function of the tool to its purpose. There might be dozens of chisels to choose from, but to fit a joint you’ll need one of them rather than sandpaper.
The diagram below is an attempt to generalise the technologies of data access into elemental building blocks so that we can have conversations relating technologies to purpose.
We’d love feedback on this sketch in case our experiences have limited our imagination (when all you have is a hammer etc). Is it missing anything? For those without a technical background, does this framing clarify anything for you, or is it too general?
Browsing the store
Even knowing the function of a tool we face a confusing array of options. To narrow the field we might ask questions like: What’s your budget? Would you like a hand-crafted tool that will last a lifetime or will a mass-produced one suffice? How serious will it be if you make a mistake? How handy are you? Do you have to own it or can you borrow it?
These questions explore the characteristics that constrain your options. We found two categories that may influence technical choices.
Dataset archetypes influence the form, scope and scale of data and how it’s managed. For example:
- One-off studies
- Sensor feeds
- Statistical indexes
- Personal records
- Social graphs
Organisational characteristics influence decision making. For example:
- Budget and funding models
- Size of organisation relative to stakeholder communities
- Placement on the data-spectrum
- Business model, eg co-operative, non-profit, corporate enterprise
- Organisational maturity
- Tech-‘gusto’ eg openness to cutting-edge or deeply embedded technologies
- Interaction style eg B2B, B2C
- Governance model and regulatory context
- Temporal eg asynchronous, archive or real time access
All these aspects will be significant to some extent. As an exercise we explored some divergences for different characteristics. For example:
Similarly for each of the six roles, eg:
Might we discover patterns for each characteristic? For example, do ‘co-operatives with large budgets that combine sensitive sensor feeds’ always build ‘decentrally published data downloads restricted by biometric authentication’?
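One way to frame that question is as a grouping exercise: collect interview records, group them by a combination of characteristics, and see whether every organisation in a group reports the same solution. The sketch below is a minimal illustration under stated assumptions. The records, the field names (`model`, `archetype`, `solution`) and the function `candidate_patterns` are all made up for the example and are not findings from our interviews.

```python
from collections import defaultdict

# Entirely made-up interview records, for illustration only.
interviews = [
    {"model": "co-operative", "archetype": "sensor feeds", "solution": "data downloads"},
    {"model": "co-operative", "archetype": "sensor feeds", "solution": "data downloads"},
    {"model": "non-profit", "archetype": "personal records", "solution": "query API"},
    {"model": "non-profit", "archetype": "personal records", "solution": "data downloads"},
]


def candidate_patterns(records, keys, min_cases=2):
    """Group records by the chosen characteristics and keep only the groups
    where every record reports the same solution (hypothetical helper)."""
    solutions = defaultdict(set)
    sizes = defaultdict(int)
    for record in records:
        profile = tuple(record[key] for key in keys)
        solutions[profile].add(record["solution"])
        sizes[profile] += 1
    return {
        profile: next(iter(found))
        for profile, found in solutions.items()
        if len(found) == 1 and sizes[profile] >= min_cases
    }


patterns = candidate_patterns(interviews, keys=("model", "archetype"))
for profile, solution in patterns.items():
    print(profile, "->", solution)
```

With only a handful of records, a unanimous group is a prompt for further interviews rather than evidence of a pattern, which is why the sketch includes a `min_cases` threshold.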
Patterns emerge when there are common solutions to common problems. To look for evidence we’re experimenting with an interview structure that uses the layers and elemental building blocks as a framework for investigating how organisations’ technical infrastructures enable their operation.
We’ve found the framing useful in early interviews. Grounding technologies in their purpose helps clarify the boundaries and relationships between them at a helpful level of detail.
Complementing strategic planning activities
One pattern or dimension under consideration is the systems architecture or ‘topology’ ie whether any technical components such as databases are centralised or decentralised, and how many connections there are to other components.
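As a rough illustration of what ‘topology’ means here, the components and their connections can be written down as a simple list of links and counted. This is a minimal sketch, not a method we used: the component names and the `hub_share` measure are hypothetical, chosen only to show how centralisation might be described in numbers.

```python
from collections import Counter

# Hypothetical components: each pair is one connection between two of them.
connections = [
    ("data_store", "access_api"),
    ("data_store", "audit_log"),
    ("data_store", "publisher_a"),
    ("data_store", "publisher_b"),
]


def connection_counts(edges):
    """Count how many connections each component takes part in."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def hub_share(edges):
    """Return the most connected component and the share of all
    connections that touch it (an illustrative measure only)."""
    if not edges:
        raise ValueError("at least one connection is needed")
    hub, degree = connection_counts(edges).most_common(1)[0]
    return hub, degree / len(edges)


hub, share = hub_share(connections)
print(f"{hub} takes part in {share:.0%} of connections")
```

A share close to 1 suggests a hub-and-spoke (centralised) arrangement, while a lower share suggests connections are spread across components. A real assessment would need a richer description than a single number.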
Even from a small sample of interviews we’ve noticed the topology of the technology seems to mirror the data ecosystem. Melvin Conway would not be surprised.
Because an organisation’s stewardship role influences how it interacts with others, we wonder if we will find patterns of systems architecture common to each role. If so, strategic activities like ecosystem mapping could flow naturally into technology planning.
Which raises questions like: In what ways is data strategy related to technical strategy?
As an organisation evolves, so should the technology that enables it. How can it move away from technical choices that are no longer appropriate and towards ones that are? This is important because getting it wrong can affect an organisation’s ability to operate and lead to loss of trust.
Deeper and DAMA
‘Non-functional requirements’ are crucial factors revealed by questions starting ‘How important is it if…’. For hand-tools we might ask how important it is that the tool can be dropped from a height, that it is reusable, or that it fits in your pocket. For technology we might talk about reliability, quality, compatibility or portability; topics that are discussed in over 650 pages of the Data Management Body of Knowledge (DMBOK) reference book.
Our colleague Tim Hill has valiantly distilled DMBOK down to 20 crucial areas of data management requiring technical support, such as governance, legalities, functionality and data quality. These are framed as a question guide, where the answers would suggest different technical solutions. For example, the answer to ‘How resilient to failure does the system need to be?’ may influence backup and data distribution strategy.
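To show how a question guide might connect answers to technical decisions, here is a minimal sketch. It assumes a structure of our own invention: the `GuideQuestion` type and the `decisions_to_discuss` function are hypothetical, and only the resilience question comes from the guide described above. The second entry is a placeholder added purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideQuestion:
    text: str
    influences: tuple  # technical decisions the answer may shape


GUIDE = [
    # The example given in the text above.
    GuideQuestion(
        "How resilient to failure does the system need to be?",
        ("backup strategy", "data distribution strategy"),
    ),
    # Hypothetical entry, not taken from the actual question guide.
    GuideQuestion(
        "How important is it if the data cannot be moved to another system?",
        ("data formats",),
    ),
]


def decisions_to_discuss(answers):
    """List the decisions influenced by each question that has an answer,
    in guide order and without duplicates."""
    decisions = []
    for question in GUIDE:
        if answers.get(question.text):
            for decision in question.influences:
                if decision not in decisions:
                    decisions.append(decision)
    return decisions


answers = {"How resilient to failure does the system need to be?": "very"}
print(decisions_to_discuss(answers))
```

The sketch only records which decisions an answer touches. Choosing between specific solutions still needs the specialist knowledge discussed next.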
Matching all these requirements to specific technical solutions requires a breadth and depth of specialist knowledge about features, appropriateness and tradeoffs. For example, what queries are best answered by graph databases; when might you use third-party platform services; what constraints does a security measure introduce? We could spend years developing this knowledge at the ODI, but perhaps we could partner with technical specialists.
Further research is needed to understand what data stewards need from technical guidance at what stage.
What might we produce?
We think technology advice will need to be more three-dimensional than simple flowcharts, since each aspect will matter to some extent. There might be potential for a ‘technical planning toolkit’ which would dovetail (ahem) nicely with data landscaping and other activities that examine the organisation’s context.
A technical planning activity might explore:
- Role in the ecosystem and interactions with stakeholders
Ecosystem mapping could inform subsequent technical considerations appropriate to the organisation’s stewardship role and organisational characteristics
- Requirements of the technology for each layer
Use the question guide and the ‘visual thinking tool’ as a framework to surface non-functional requirements. Functional requirements would further determine the specifics of software solutions.
Based on some very early feedback we’ve identified three potential audiences:
- Founders of organisations who may be non-technical, who want foundational knowledge and a framework to (a) explore technical options and (b) have conversations with technical solution providers
- Organisations that are producing data strategies who want to choose technical infrastructure that enables their data strategy, rather than being led by the technology
- Organisations that are reviewing their technology infrastructure who want to make strategic decisions to support their evolution
We started this project with a daunting awareness that ‘it depends’. Now we have:
- clarified the high-level building blocks that relate to data access and stewardship
- developed a framework to discuss what it depends upon: an organisation will need to make technical choices for each layer of technical infrastructure appropriate to its stewardship role and organisational characteristics.
There’s more to discover about how these organisational and dataset characteristics might influence the technical options and to what extent. Thinking about how to take this forward:
- We still think we will find patterns, particularly within stewardship roles, but we need to build an evidence base; using the frameworks above we can explore organisations’ technical infrastructure and how it enables their operation.
- Audiences with different levels of technical understanding will need appropriate guidance. We need to carry out user needs research to discover what forms of advice organisations need and when.
- We need to coordinate with planning workshops and landscaping activities to iterate our ideas and guidance.
This was a short, ambitious research project spanning depth and breadth; more than we can cover alone. We think this could be the start of some fruitful partnerships to develop what we think we’ve found, whether as investigators or technical advisors.
We’re curious to know, in what ways could these simple resources be helpful? Is there any nuance they’re missing? What do you think the pivotal characteristics or questions are? Have you found technical solutions that always crop up in similar circumstances?
If you’d like to experiment with early resources, offer yourselves as a case study, or partner with us as fellow investigators, we’d also love to hear from you.