Might blockchain have a role to play in the future of archiving?



Blockchains and other distributed ledger technologies (DLTs) have generated significant interest in recent years. This interest, as with many emerging technologies, seems to conform to the Gartner Hype Cycle (a graphic representation of the maturity and adoption of technologies). In the face of this hype, the Open Data Institute has tried to provide some reasoned input about the many promises of these new technologies. As well as providing commentary, we carried out research and development in 2017 to ask questions of blockchain as a technology and its applicability to certain types of problems.

While we believe that applying a critical lens to emerging technologies is always useful, we are equally keen not to dismiss them out of hand. For us, the most important facet of a new technology is that its application is useful for solving a tangible problem. Emerging technologies that are intellectually interesting but do not solve people’s or organisations’ problems better than existing solutions are not inherently valuable to society. When it comes to blockchain and other DLTs, we have always been open to the possibility that they might solve tangible problems and that these solutions should be explored.

Modern archival challenges

We believe we have found one problem for which a blockchain-based solution is worth exploring: guaranteeing the integrity of digital archives. For context, the practice of archiving – collecting and storing documents and other objects for posterity – has a long and varied history. Modern archives are tasked with storing today’s documents so they can be used by future generations to understand contemporary society. Official government archives have the unique task of preserving the official documents generated by the internal workings of government.

Records which are potentially sensitive now – that could compromise government operations if published contemporaneously but would be useful for historical scrutiny in the future – are securely stored until a predetermined amount of time passes, when they become available to the wider public. While they are being stored, only archivists with appropriate permission can access them to ensure the objects are properly preserved.

Today, a whole field of academic study and numerous institutions are dedicated to the functioning of archives, and solving the many challenges of modern archival practice. One of the key challenges for archives is the shift from primarily physical objects to primarily digital objects. Organisational practices, in government and beyond, are increasingly shifting from physical to digital – from paper memos to emails, printed reports to PDFs, overhead projector transparencies to digital presentation slides. This presents many problems in terms of management and storage which archivists and academics are working hard to solve. Of these problems, we are specifically focused on guaranteeing the integrity of these born-digital objects – that they remain unaltered while in the archive. While changing physical objects without obvious evidence of tampering is very difficult, digital objects by their nature are relatively easy to change. How can archives provide citizens with the guarantee that the document they receive is the same document that was originally archived?

ARCHANGEL: archiving and blockchain

We have partnered with the National Archives and the University of Surrey on ARCHANGEL, a research project funded by the Engineering and Physical Sciences Research Council (EPSRC), which will run until December 2018. The goal of ARCHANGEL is to understand how a DLT-based system might be used to verify that documents have not been altered or adapted while stored in archives. The basic premise is to apply an algorithm to each document to create a hash (a one-way process that produces a reproducible fingerprint of a digital object), which can then be stored in a distributed ledger. The hash will not reveal the content of the document, but it will allow anyone who has access to the original object to verify that it contains the same content as when it was first hashed.

Figure 1. Simplified diagram of the ARCHANGEL project proposed process
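A minimal sketch of the hash-and-ledger premise described above: fingerprint a document with SHA-256 and record only that fingerprint in an append-only, hash-chained log. SHA-256 is an assumed choice of hash, and `fingerprint`, `ToyLedger` and the record identifiers are hypothetical names. The log is a single in-memory list, not a distributed ledger, so it illustrates tamper evidence but none of the replication or consensus a real DLT provides.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of a document's bytes as hex: one-way and reproducible."""
    return hashlib.sha256(data).hexdigest()


def _entry_hash(record_id: str, doc_hash: str, prev: str) -> str:
    """Hash an entry's fields using a canonical (sorted-key) JSON encoding."""
    body = json.dumps({"record_id": record_id, "doc_hash": doc_hash, "prev": prev},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ToyLedger:
    """Hypothetical stand-in for a distributed ledger: one process, no consensus.

    Only fingerprints are stored, never document content. Each entry commits
    to the previous entry's hash, so editing an old entry breaks the chain.
    """

    def __init__(self) -> None:
        self.entries = []

    def append(self, record_id: str, doc_hash: str) -> None:
        prev = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        self.entries.append({
            "record_id": record_id,
            "doc_hash": doc_hash,
            "prev": prev,
            "entry_hash": _entry_hash(record_id, doc_hash, prev),
        })

    def chain_is_intact(self) -> bool:
        """Recompute every link; False if any entry was altered after the fact."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev:
                return False
            if e["entry_hash"] != _entry_hash(e["record_id"], e["doc_hash"], e["prev"]):
                return False
            prev = e["entry_hash"]
        return True

    def verify(self, record_id: str, data: bytes) -> bool:
        """True if `data` matches the fingerprint first recorded for `record_id`."""
        for e in self.entries:
            if e["record_id"] == record_id:
                return e["doc_hash"] == fingerprint(data)
        return False  # no such record


ledger = ToyLedger()
memo = b"Minutes of the meeting held on 3 May."
ledger.append("record-001", fingerprint(memo))
print(ledger.verify("record-001", memo))         # True
print(ledger.verify("record-001", memo + b"!"))  # False
```

Anyone holding a copy of the memo can recompute its fingerprint and compare it with the ledger entry without the ledger ever exposing the memo itself.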

Delivering this system is complicated by another challenge in digital archiving: changing digital formats. Digital formats shift over time as new ones are created and old ones are retired, and new software might no longer support older formats, even when it comes from the same developer. For example, some older video files cannot be opened in modern video players, which presents a fundamental challenge for digital archivists wishing to preserve documents into the future.
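Why format migration matters for integrity checks can be seen even with plain text. In this sketch, the same memo is stored once in a "legacy" cp1252 encoding with Windows line endings and once as a "migrated" UTF-16 copy with Unix line endings and decomposed accents; these formats and the function names are illustrative assumptions. The byte-level SHA-256 hashes differ, while hashing the decoded, normalised text gives a single fingerprint for both. Which normalisations an archive should apply is a policy decision; the ones here are examples only.

```python
import hashlib
import unicodedata


def byte_hash(data: bytes) -> str:
    """Traditional byte-level SHA-256: any change to the file's bytes changes it."""
    return hashlib.sha256(data).hexdigest()


def text_content_hash(data: bytes, encoding: str) -> str:
    """Hash the text rather than its encoding (illustrative normalisation only).

    Decodes the bytes, normalises Unicode to NFC and line endings to '\n',
    then hashes a canonical UTF-8 encoding of the result.
    """
    text = data.decode(encoding)
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


memo = "Caf\u00e9 budget memo\r\nApproved."
# Hypothetical original format: cp1252 bytes with Windows line endings.
as_legacy = memo.encode("cp1252")
# Hypothetical migrated copy: UTF-16 (with BOM), Unix line endings, decomposed accents.
as_migrated = unicodedata.normalize("NFD", memo.replace("\r\n", "\n")).encode("utf-16")

print(byte_hash(as_legacy) == byte_hash(as_migrated))  # False
print(text_content_hash(as_legacy, "cp1252")
      == text_content_hash(as_migrated, "utf-16"))     # True
```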

For this project, we need to ensure that the hashing algorithms we use are compatible with how archives choose to tackle this problem, which in short means hashing the content independently of its format. While this might prove easier for some objects, such as text, it presents a much more acute challenge for formats such as video. ARCHANGEL’s approach is to explore and prototype the creation of hashes using machine-learning methods, particularly for image and video content, rather than ‘traditional’ byte-level hashes.
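ARCHANGEL explores machine-learned hashes for image and video. As a much simpler, non-learned stand-in, the sketch below computes an average hash (aHash), a classic perceptual hash, from an image's greyscale pixel values rather than its file bytes, so the same picture stored in two formats can yield the same hash. It assumes the image has already been decoded into a grid of 0 to 255 values by some format-specific library; the function names and synthetic test images are hypothetical, and this is not the project's method.

```python
from typing import List

Grid = List[List[float]]  # greyscale pixel values, rows of equal length


def downscale(pixels: Grid, size: int = 8) -> Grid:
    """Box-average a greyscale image down to a size x size grid of cells."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for i in range(size):
        r0 = i * h // size
        r1 = max((i + 1) * h // size, r0 + 1)  # at least one row per cell
        row = []
        for j in range(size):
            c0 = j * w // size
            c1 = max((j + 1) * w // size, c0 + 1)  # at least one column per cell
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """aHash: one bit per cell, set when the cell is brighter than the mean cell."""
    cells = [v for row in downscale(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:  # row-major; the first cell ends up as the most significant bit
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 64x64 test images: a horizontal gradient, a copy with small
# deterministic noise (as lossy re-encoding might add), and a copy with the
# top-right quadrant blacked out (a substantive edit).
original = [[4 * c for c in range(64)] for r in range(64)]
reencoded = [[max(0, min(255, 4 * c + (r * 7 + c * 13) % 5 - 2)) for c in range(64)]
             for r in range(64)]
edited = [[0 if (r < 32 and c >= 32) else 4 * c for c in range(64)] for r in range(64)]

h0 = average_hash(original)
print(hamming(h0, average_hash(reencoded)))  # 0
print(hamming(h0, average_hash(edited)))     # 24
```

Unlike a cryptographic hash, a perceptual hash is compared by Hamming distance against a threshold. It tolerates small re-encoding noise but cannot prove bit-for-bit identity, so choosing the threshold is a trade-off between false alarms and missed edits.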

The project will not be limited to the feasibility of the technology itself; it will also consider the wider context, for example by exploring the user needs and existing processes of both archivists and end-users. This work will include looking at how to create novel, sustainable and open digital business models around archival integrity, including potential freemium models implemented through smart contracts.

We intend to work as openly as possible on the project, for example by publishing all the code for the prototype under an open licence as it is developed. We are keen to engage with archivists and others interested in how the project might be useful for their own work.

If you wish to get involved or find out more, email [email protected].