Blockchain’s potential role in the future of archiving

Thu Apr 19, 2018

Blockchains and other distributed ledger technologies have generated significant interest in recent years. This interest, as with many emerging technologies, seems to conform to the Gartner Hype Cycle. In the face of this hype, the ODI has tried to provide some reasoned input about the many promises of these new technologies.

The practice of archiving – collecting and storing documents and other objects for posterity – has a long and varied history. Modern archives are tasked with storing today’s documents so they can be used by future generations to understand contemporary society. In the case of official government archives, they are uniquely tasked with preserving the official documents generated by the internal workings of government.

Records that are potentially sensitive now – eg records that could compromise government operations if published contemporaneously but would be useful for historical scrutiny in the future – are securely stored until a predetermined amount of time passes, at which point they become available to the wider public. While they are being stored, only archivists with appropriate permission can access them, so that the objects can be properly preserved.

Today, a whole field of academic study and numerous institutions are dedicated to examining the functioning of archives, and solving the many challenges of modern archival practice. One of the key challenges for archives is the shift from primarily physical objects to primarily digital objects. Organisational practices, in government and beyond, are increasingly shifting from physical to digital – from paper memos to emails, printed reports to PDFs, overhead projector transparencies to digital presentation slides.

This presents many problems in terms of management and storage which archivists and academics are working hard to solve. Of these problems, we are specifically focused on guaranteeing the integrity of these born-digital objects – ie that they remain unaltered while stored in the archive. While changing physical objects without obvious evidence of tampering is difficult, digital objects by their nature are relatively easy to change. This raises an important question: how can archives guarantee for citizens that the document they receive is the same document that was originally archived?

ARCHANGEL: archiving and blockchain

The goal of the ARCHANGEL project is to understand how a system based on distributed ledger technology (DLT) might be used as a mechanism to verify that documents have not been altered or adapted while stored in archives. The basic premise involves applying an algorithm to a digital object to create a hash (a reproducible fingerprint of the object, produced by a one-way process), which can then be stored in a distributed ledger. The hash will not reveal the content of the document but will allow anyone who has access to the original object to verify that it contains the same content as when it was first hashed.

Figure 1. Simplified diagram of the ARCHANGEL project proposed process
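The hash-then-verify premise described above can be sketched in a few lines of Python. This is a minimal illustration rather than the ARCHANGEL design: SHA-256 is assumed as the hashing algorithm, and `ToyLedger` is a hypothetical in-memory stand-in for a distributed ledger, with made-up object identifiers.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return a reproducible, one-way fingerprint (SHA-256 hex digest) of the bytes."""
    return hashlib.sha256(data).hexdigest()


class ToyLedger:
    """Hypothetical stand-in for a distributed ledger: an append-only record of hashes."""

    def __init__(self) -> None:
        self._records = {}  # object identifier -> hex digest

    def register(self, object_id: str, digest: str) -> None:
        # Entries can be added but never overwritten.
        if object_id in self._records:
            raise ValueError(f"{object_id} is already registered")
        self._records[object_id] = digest

    def verify(self, object_id: str, data: bytes) -> bool:
        # Re-hash the object being checked and compare with the recorded digest.
        recorded = self._records[object_id]
        return hmac.compare_digest(recorded, fingerprint(data))


if __name__ == "__main__":
    ledger = ToyLedger()
    original = b"Minutes of the meeting held on 19 April 2018."

    # At archiving time only the hash is recorded, never the content itself.
    ledger.register("doc-001", fingerprint(original))

    # Later, anyone holding the object can check it against the ledger.
    print(ledger.verify("doc-001", original))
    print(ledger.verify("doc-001", original + b" (amended)"))
```

The ledger holds only the digest, so the record does not disclose a sensitive document, yet any change to the stored bytes causes verification to fail.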

Building and delivering this system is complicated by another challenge in digital archiving – changing digital formats. Digital formats shift over time – new ones are created, old ones are retired – and new software might no longer support the same formats supported by previous versions. For example, some modern video players will not open older video files, which presents a fundamental challenge for digital archivists striving to preserve documents into the future.

For this project we need to ensure that the hashing algorithms we use respect how archives choose to tackle this problem – in short, hashing the content outside of the format. While this might prove easier for some objects – eg text files – it presents a much more acute challenge for formats like video. Our approach for ARCHANGEL will be to explore and prototype the creation of hashes using machine-learning methods, particularly for image and video content, rather than ‘traditional’ byte-level hashes.
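To illustrate what hashing the content outside of the format can mean for the easier case of text, here is a small sketch. It is not the machine-learning approach the project intends to explore for images and video; it assumes a simple normalisation (decode, Unicode NFC, uniform line endings) chosen purely for illustration, and the function names are hypothetical.

```python
import hashlib
import unicodedata


def byte_hash(raw: bytes) -> str:
    """'Traditional' byte-level hash: changes whenever the stored bytes change."""
    return hashlib.sha256(raw).hexdigest()


def content_hash(raw: bytes, encoding: str) -> str:
    """Hash the text content rather than its on-disk representation."""
    text = raw.decode(encoding)
    # Assumed normalisation steps: canonical Unicode form and Unix line endings.
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    text = "Café minutes\nItem 1: archiving\n"

    # The same document stored in two different representations.
    as_utf8 = text.encode("utf-8")
    as_utf16 = text.replace("\n", "\r\n").encode("utf-16")

    print(byte_hash(as_utf8) == byte_hash(as_utf16))
    print(content_hash(as_utf8, "utf-8") == content_hash(as_utf16, "utf-16"))
```

The byte-level hashes differ because the encoding and line endings differ, while the content hashes match because both files carry the same text. Doing the equivalent for video is much harder, since re-encoding changes the data itself and not just its wrapper.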

In the project we will not limit ourselves to exploring the feasibility of the technology itself; we will also explore the wider context, for example the user needs and existing processes of both archivists and end-users. This work will include looking at how to create novel sustainable and open digital business models around archiving integrity and examining potential freemium models implemented through smart contracts.

On the project, we intend to work as openly as possible, for example by publishing all the code for the prototype under an open licence, as it is developed. We are keen to engage with archivists and others interested in how the project might be useful for their own work.

If you wish to get involved or find out more, email [email protected].