P6 – Fundamental Tools for web Archive research (FUTARC)

Corpus Creation, Completeness, and Versions

The aim of this project is to develop a number of fundamental tools for studying web archives. Studying an entire web archive is probably the exception; most research projects are likely to delimit a certain part of a web archive as their object of study (delimited in time or space, or by file type, content, HTML code, or other criteria). Tools for corpus creation are therefore needed.
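As an illustration only, delimiting a corpus by time, domain, and file type might be sketched as below. This is not one of the project's tools: the `Capture` record, its field names, the `select_corpus` function, and the example URLs are all hypothetical, and a real archive would supply its own capture index.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Capture:
    """One archived capture. The field names are hypothetical."""
    url: str
    captured: datetime
    mime: str


def select_corpus(captures, start, end, domain_suffix, mime_types):
    """Keep captures taken in [start, end) whose host ends with
    domain_suffix and whose MIME type is in mime_types."""
    selected = []
    for c in captures:
        host = urlsplit(c.url).hostname or ""
        if (start <= c.captured < end
                and host.endswith(domain_suffix)
                and c.mime in mime_types):
            selected.append(c)
    return selected


# Made-up sample data, for illustration only.
captures = [
    Capture("http://example.dk/", datetime(2005, 3, 1), "text/html"),
    Capture("http://example.dk/logo.gif", datetime(2005, 3, 1), "image/gif"),
    Capture("http://example.com/", datetime(2005, 6, 1), "text/html"),
    Capture("http://example.dk/", datetime(2009, 1, 1), "text/html"),
]

# Corpus: HTML pages under .dk captured during 2005.
corpus = select_corpus(
    captures,
    start=datetime(2005, 1, 1),
    end=datetime(2006, 1, 1),
    domain_suffix=".dk",
    mime_types={"text/html"},
)
```

Each criterion is a separate condition, so a delimitation by content or HTML code could be added as a further test in the same place.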

In addition, web archives are in many cases incomplete: things are missing (files, elements on a web page, etc.). They are also in many cases too complete, in that there may be several versions of ‘the same’. Therefore, tools are needed for handling incompleteness as well as versions.
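A minimal sketch of both problems, under stated assumptions: each capture is a hypothetical `(url, timestamp, digest)` tuple, two captures with the same digest are treated as the same version, and a resource counts as missing when a page references it but the archive holds no capture of it. The function names and sample data are invented for illustration and do not describe the project's actual tools.

```python
from collections import defaultdict


def distinct_versions(captures):
    """For each URL, return the timestamps at which a new distinct
    digest (i.e. a new version) was first captured."""
    first_seen = defaultdict(dict)
    # Process captures in timestamp order so the earliest capture wins.
    for url, timestamp, digest in sorted(captures, key=lambda c: c[1]):
        first_seen[url].setdefault(digest, timestamp)
    return {url: sorted(by_digest.values())
            for url, by_digest in first_seen.items()}


def missing_resources(referenced_urls, archived_urls):
    """Return referenced URLs for which the archive has no capture."""
    return sorted(set(referenced_urls) - set(archived_urls))


# Made-up sample data: the front page was captured three times,
# but its content changed only once.
captures = [
    ("http://example.dk/", "20050301", "aaa"),
    ("http://example.dk/", "20050308", "aaa"),
    ("http://example.dk/", "20050315", "bbb"),
    ("http://example.dk/logo.gif", "20050301", "ccc"),
]

versions = distinct_versions(captures)

# Resources the front page is assumed to reference.
referenced = ["http://example.dk/logo.gif", "http://example.dk/style.css"]
missing = missing_resources(referenced, [url for url, _, _ in captures])
```

Comparing digests only detects byte-identical captures; deciding when two differing captures still count as ‘the same’ is a research choice that this sketch does not address.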

Thus, the use of fundamental tools for corpus creation, completeness, and version handling can be considered a first step for any subsequent study of material in a web archive and a prerequisite for making informed and critical choices as to what to study.

Project team: Niels Brügger & Ulrich Karstoft Have