Michael L. Nelson, writing for Motherboard:

What Reid’s team did not appear to anticipate is that copies of her blog would appear in other web archives, one of which was the Library of Congress’s web archive, which does not honor robots.txt exclusion. In fact, three of the example blog posts her lawyers claim were fraudulent…

In this case, there are copies in two geographically and administratively distinct systems, but they are not independent observations. The important point is that while robots.txt redacted the Internet Archive's version of the page, it did not redact the version in the Library of Congress.
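For a concrete sense of the mechanism, a robots-honoring archive effectively runs a check like the following before crawling or serving a page. This is a minimal sketch in Python using the standard library; the URLs are illustrative, and `ia_archiver` is the user-agent token the Internet Archive has historically honored in robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Parse the site's robots.txt the way a robots-honoring archive would.
rp = RobotFileParser()
rp.set_url("http://blog.reidreport.com/robots.txt")
rp.read()

# "ia_archiver" is the user agent the Internet Archive has used;
# a Disallow rule for it is what redacts Wayback Machine copies.
# An archive that ignores robots.txt (like the Library of Congress)
# never runs this check, so its copies stay visible either way.
print(rp.can_fetch("ia_archiver", "http://blog.reidreport.com/"))
```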

How can you use multiple web archives? The services archive.is and perma.cc are on-demand public web archives that allow submission of individual pages (similar to the “save page now” feature at the Internet Archive). webrecorder.io allows for the creation of personal web archives, and the Los Alamos National Laboratory Time Travel service allows querying of multiple web archives at once (for example, blog.reidreport.com is held in five web archives other than the Internet Archive).
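As an illustration, the Time Travel service can also be queried programmatically. This is a minimal sketch, assuming the service's public JSON endpoint pattern (timetravel.mementoweb.org/api/json/&lt;datetime&gt;/&lt;uri&gt;) and a response shape with a list of mementos; neither detail comes from the post itself:

```python
import json
import urllib.request

# Ask the LANL Time Travel service (a Memento aggregator) which
# archives hold copies of a page near a given datetime.
# Endpoint pattern assumed: /api/json/<YYYYMMDDhhmmss>/<uri>
uri = "http://blog.reidreport.com/"
endpoint = f"http://timetravel.mementoweb.org/api/json/20060101/{uri}"

with urllib.request.urlopen(endpoint) as resp:
    data = json.load(resp)

# The response (assumed shape) lists snapshots across archives,
# each with a capture datetime and the archive URL holding it.
for memento in data.get("mementos", {}).get("list", []):
    print(memento["datetime"], memento["uri"])
```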

Even though Reid’s story has likely ended, it is only a matter of time before a similar story unfolds. For those who seek to hold public figures accountable, a more rigorous interaction with and presentation of archived pages will limit uncertainty. For those on the receiving end of such scrutiny, a more careful consideration of the scope (as well as the limitations) of not just the Internet Archive but of all public web archives will better inform their response.

Paul Ciano
