It would probably be fair to describe the amount of data moving and changing on the web as “staggering”.
- More than one hour of video is uploaded to YouTube every second. Anyone wishing to review the uploads from just 2.5 hours would need a full year, without pause or sleep, to watch it all. And that is just one major service.
- More than 10 million photos are uploaded to Facebook every hour.
- Google processes over 24 petabytes of data per day (one petabyte corresponds to 10¹⁵ characters; for comparison, 10¹⁵ seconds amounts to roughly 31.7 million years). That is, roughly, over a thousand times the amount of data stored in the world’s largest library, the Library of Congress in Washington, DC.
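
The arithmetic behind these comparisons is easy to check. The following is a minimal sketch in Python; the function names are invented for illustration, and the year length of 365.25 days is an assumption.

```python
# Back-of-the-envelope check of the comparisons in the list above.
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap days


def viewing_days(upload_window_hours, video_hours_per_second=1.0):
    """Days of non-stop viewing needed to watch everything uploaded
    during a window of the given length (in hours)."""
    video_hours = upload_window_hours * SECONDS_PER_HOUR * video_hours_per_second
    return video_hours / HOURS_PER_DAY


def seconds_to_million_years(seconds):
    """Convert a number of seconds to millions of years."""
    seconds_per_year = SECONDS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_YEAR
    return seconds / seconds_per_year / 1e6


# 2.5 hours of uploads at one hour of video per second
print(viewing_days(2.5))  # 375.0 days, i.e. slightly more than a year

# 10^15 seconds expressed in millions of years
print(round(seconds_to_million_years(10**15), 1))  # 31.7
```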
Add to such examples the fact that content changes as constantly and abundantly as it is uploaded. Websites are born and die, and pages, comments and other content are constantly revised, moved around, duplicated or deleted by millions of people online.
Preserving as much as possible of what can be found on the web, and of how it has been developed, changed and revised, is indeed a difficult and ambitious task. It is also an important one for anyone who wishes to trace the development and dissemination of thoughts, ideas and beliefs, or public reactions to major events, in the place where such things primarily occur in the 21st century: the World Wide Web.
Web archives store not only millions of web pages, but also a multitude of copies of the same pages, preserving minor and major changes…
…or at least an overview of tendencies because, naturally, if a website changes fourteen times over a year and is archived only four times, not all changes are documented. Still, an overview of the changes over the year in question is preserved.
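
The effect of archiving less often than a site changes can be illustrated with a small sketch. The day numbers below are invented purely for illustration, and `captured_versions` is a hypothetical helper, not part of any archive's software. It assumes each snapshot records whichever version of the page is live on that day.

```python
from bisect import bisect_right


def captured_versions(change_days, snapshot_days):
    """Return the set of version numbers seen by at least one snapshot.

    Version 0 is the page before any change; version k is the page
    as it stands after the k-th change.
    """
    changes = sorted(change_days)
    # The version live on a given day equals the number of changes
    # that happened on or before that day.
    return {bisect_right(changes, day) for day in snapshot_days}


# Hypothetical year: fourteen changes and four roughly quarterly snapshots.
change_days = [10, 25, 40, 70, 95, 120, 150, 180, 200, 230, 260, 290, 320, 350]
snapshot_days = [90, 181, 273, 364]

seen = captured_versions(change_days, snapshot_days)
total_versions = len(change_days) + 1  # the original page plus one per change

print(sorted(seen))  # [4, 8, 11, 14]
print(f"{len(seen)} of {total_versions} versions captured")  # 4 of 15 versions captured
```

In this made-up schedule only four of the fifteen versions survive in the archive; the others were overwritten before the next snapshot was taken.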