The Historical Development of Tracking and e-Commerce on the Danish Web

Week 1: Help identify traces of web trackers and “shopping baskets” in archived websites (what has been archived, where in the source code can we find traces) and assess the usefulness of existing tools. Help extract corpora and prepare them for analysis (cleaning, preprocessing etc.).

Week 2: Pilot/test, focusing on social media plugins. Help with scripting and processing of the data. Also test whether the same or similar methods can be applied to the analysis of e-commerce.

Week 3: Large-scale analysis: mapping trackers and e-commerce functionalities across the entire corpus. This might require new or modified scripts.

Week 4: Analysing the development over time. IT developer support for the comparative analysis and for visualising the results.

Support Type: Project

Description: The purpose of the project is to map and analyse the historical development of two different (but related) technologies/functionalities on the Danish web: 1) tracking technologies (e.g. HTTP and Flash cookies, beacons, fingerprinting, HTML web storage etc.), and 2) shopping baskets (e-commerce). Studies have shown widespread use of tracking technologies and e-commerce on the live web, but a historical study of the development of these technologies on the Danish web has, to our knowledge, not been done. The data for the project will be the historical Danish web as it is preserved in Netarkivet.

The main research questions are: 1) When and how have different tracking technologies been used on the Danish web, and how pervasive is the reach of companies like Facebook and Google/Alphabet? 2) When and how have different technologies for e-commerce been used on the Danish web, and which are the main companies involved in e-commerce over time?

E-commerce and tracking are related because targeted advertising is one of the main reasons for tracking users online. A study that searches for technologies for both tracking and e-commerce is also advantageous because we expect a similar methodology to be useful for the two types of web functionality. The project will build upon experiences gained in the ongoing project “Probing a nation’s web domain – the Historical Development of the Danish Web”. Where the Probing project has focused on questions like the size of the web, the number of file types and word frequencies, this project will aim to develop a methodology for searching for specific parts of the source code of websites. The aim is to identify the specific traces of these technologies (e.g. lines of code, names or similar) and the data sources in which they can be found (the crawl.logs, the Solr index, the WARC files etc.), and then to analyse the rise, spread and possibly decline of different technologies for tracking and shopping online.
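As a rough illustration of what searching for such traces could look like, the following Python sketch matches a handful of signature patterns against archived HTML and counts hits per crawl year. It is a minimal sketch and not the project's actual method: the signature list, the function names (`find_traces`, `count_by_year`) and the input format of (year, HTML) pairs are all assumptions made for the example, and a real study would need a curated, dated list of traces and a reader for the archive's own data sources.

```python
import re
from collections import Counter

# Illustrative signatures only (assumed for this example); a real study
# would need a much larger, historically informed list of traces.
SIGNATURES = {
    # Script URLs associated with Google Analytics, including the older urchin.js
    "google_analytics": re.compile(
        r"google-analytics\.com/(?:ga|analytics|urchin)\.js", re.I
    ),
    # Host serving Facebook's social plugin scripts
    "facebook_plugin": re.compile(r"connect\.facebook\.net", re.I),
    # id/class attributes hinting at a shopping basket ("kurv" is Danish for basket)
    "shopping_basket": re.compile(
        r"(?:id|class)\s*=\s*[\"'][^\"']*(?:basket|cart|kurv)[^\"']*[\"']", re.I
    ),
}


def find_traces(html):
    """Return the names of all signatures that occur in one HTML document."""
    return {name for name, pattern in SIGNATURES.items() if pattern.search(html)}


def count_by_year(pages):
    """Count pages per year in which each signature occurs.

    `pages` is an iterable of (year, html) pairs; this input shape is an
    assumption, standing in for records extracted from the archive.
    """
    counts = {}
    for year, html in pages:
        counts.setdefault(year, Counter()).update(find_traces(html))
    return counts


if __name__ == "__main__":
    sample = [
        (2006, '<script src="http://www.google-analytics.com/urchin.js"></script>'),
        (2012, '<script src="//connect.facebook.net/da_DK/all.js"></script>'
               '<div class="mini-cart"></div>'),
        (2012, "<p>No trackers here</p>"),
    ]
    for year, counter in sorted(count_by_year(sample).items()):
        print(year, dict(counter))
```

Simple pattern matching of this kind will miss obfuscated or dynamically loaded scripts and can produce false positives (for example, an unrelated element whose class name happens to contain "cart"), so the results would need manual validation against a sample of pages.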

Support period: Autumn 2018, Spring 2019, and Autumn 2019

Project Team:

Principal investigator: Janne Nielsen, Assistant Professor
Department of Media and Journalism Studies, Aarhus University

Niels Brügger, Professor with Special Responsibilities
Department of Media and Journalism Studies, Aarhus University