WayBackMachine Sued Over…?

    July 14, 2005

Apparently, once you put a web site out on the public Internet, there’s still some contention over who “owns” the right to display its content, even after the site in question has been removed.

This seems to be the angle pursued by Healthcare Advocates, a company that is suing one of the more popular sites on the web, the Internet Archive. For those who may not be familiar, the Internet Archive is a non-profit that runs the Wayback Machine, a digital library that offers “permanent access for researchers, historians, and scholars to historical collections that exist in digital format.”

Essentially, the Wayback Machine keeps a copy of almost every site or page that appears on the open Internet. If you’d like, you can check out what WebProNews looked like in its infancy. Needless to say, the site is very useful and informative. However, it appears that not everyone is happy with the way it collects and stores this information, especially when it’s used against them in a court of law.

Because of this, the Wayback Machine is being sued by Healthcare Advocates for storing copies of the company’s pages, copies that were used against Healthcare Advocates in a separate lawsuit. The company’s lawyers state that by storing the pages in question, the archive violated the site’s robots.txt file, which attempted to block its old pages from being crawled. The legal team representing Healthcare Advocates seems to believe a robots.txt file is legally binding, a belief the New York Times correctly pointed out as false.

As Danny Sullivan points out, robots.txt is a voluntary method sites use to dissuade crawlers from indexing their pages, not a legally binding contract. This is important because Wayback is not being sued for copyright infringement; it is being sued for violating a mythical agreement that the Healthcare Advocates developers believed their robots.txt file gave them with other crawlers.

However, even though robots.txt files are not legally binding, Wayback still respects them and will remove pages if the robots.txt file tells it to. This is made apparent on its site exclusion page, which explains how to use a robots.txt file to prevent its crawlers from indexing:

To exclude the Internet Archive’s crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

User-agent: ia_archiver
Disallow: /

Perhaps Healthcare Advocates wrote their robots.txt incorrectly?
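
One way to check whether a robots.txt file says what its author intended is to run it through a parser. The sketch below uses Python's standard urllib.robotparser module on the two-line rule quoted above. The helper name is_blocked and the example.com URL are made up for illustration, and Python's parser may not interpret rules exactly as the Internet Archive's crawler does.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The rule from the Internet Archive's exclusion page, quoted above.
RULES = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical page URL, used only for illustration.
PAGE = "http://example.com/old-page.html"

print(is_blocked(RULES, "ia_archiver", PAGE))   # True: the archive's crawler is excluded
print(is_blocked(RULES, "SomeOtherBot", PAGE))  # False: other robots may still crawl
```

A check like this only shows what the file asks for. Whether a given crawler honors the request is still up to the crawler.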

Chris Richardson is a search engine writer and editor for WebProNews. Visit WebProNews for the latest search news.