Thursday, March 30, 2006

GoogleBot the "Spider of Doom"

A funny story is circulating in tech circles about how Googlebot inadvertently wiped the database of a site built on a content management system (CMS), destroying months of work.

As the story goes, a web development firm was given a contract to rebuild an existing site using a CMS. Because the client already had a site with a significant amount of content, the developers took their time and fully populated the new site with all the content from the previous one. When they had finally uploaded everything, they took the site live.

"Things went pretty well for a few days after going live. But, on day six, things went not-so-well: all of the content on the website had completely vanished and all pages led to the default "please enter content" page. Whoops."

After painstaking investigation, Googlebot, the spider Google uses to find information on the web, was found to be the cause.

When one of the users entered information into the CMS (using copy and paste), he or she included an EDIT hyperlink that had been left in a multi-user document. This kind of human error wouldn't normally be a problem, because users are required to log in with a password before they can make changes.

"But, the CMS authentication subsystem didn't take into account the sophisticated hacking techniques of Google's spider. As it turns out, Google's spider doesn't use cookies, which means that it can easily bypass a check for the "isLoggedOn" cookie to be "false". It also doesn't pay attention to Javascript, which would normally prompt and redirect users who are not logged on. It does, however, follow every hyperlink on every page it finds, including those with "Delete Page" in the title."

In short, Googlebot muscled its way into the CMS and followed the edit link. The rest was history, or at least that's what became of months of work. Fortunately, a recent backup of the full site was available for uploading.
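
The flaw described in the quote can be sketched in a few lines. The Python below is only an illustration under stated assumptions, not the CMS's actual code: the function names, the in-memory `pages` dictionary, and the `session_user` parameter are all hypothetical. The first handler mirrors the reported behaviour, where access is refused only if an "isLoggedOn" cookie is explicitly "false" and a plain link is enough to delete a page. The second shows one common way to fail closed: require a server-side login and refuse to change anything on a GET request, which is what a crawler sends when it follows a link.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the CMS database: page name -> content.
Pages = Dict[str, str]


def flawed_handler(method: str, action: str, page: str,
                   cookies: Dict[str, str], pages: Pages) -> int:
    """Mimics the reported flaw: only an explicit isLoggedOn=false is blocked,
    and a delete is carried out regardless of the HTTP method."""
    if cookies.get("isLoggedOn") == "false":
        return 403  # a client that sends no cookie at all never reaches this
    if action == "delete":
        pages.pop(page, None)  # runs for a simple GET from a followed link
    return 200


def safer_handler(method: str, action: str, page: str,
                  session_user: Optional[str], pages: Pages) -> int:
    """Fails closed: a delete needs both a POST and a server-side login."""
    if action == "delete":
        if method != "POST":
            return 405  # following a hyperlink issues GET, so nothing changes
        if session_user is None:
            return 401  # no authenticated session on the server side
        pages.pop(page, None)
    return 200


if __name__ == "__main__":
    site = {"home": "Welcome", "about": "About us"}
    # A cookie-less crawler following a "Delete Page" link.
    print(flawed_handler("GET", "delete", "home", {}, site), site)

    site = {"home": "Welcome", "about": "About us"}
    print(safer_handler("GET", "delete", "home", None, site), site)
```

The design point is that the check happens on the server and denies by default, so a client that ignores cookies and JavaScript is treated as logged out rather than logged in.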


Jim Hedger is the SEO Manager of StepForth Search Engine Placement Inc. Based in Victoria, BC, Canada, StepForth is the result of the consolidation of BraveArt Website Management, Promotion Experts, and Phoenix Creative Works, and has provided professional search engine placement and management services since 1997. http://www.stepforth.com/ Tel - 250-385-1190 Toll Free - 877-385-5526 Fax - 250-385-1198

News Tags: Spider, Web, Googlebot
About the author:
Jim Hedger works with Metamend Search Engine Marketing as a SEO Consultant, lead copywriter and head blog writer. Jim has been involved in the SEO field since the days of the dinosaurs and felt he had lost a personal friend when Disney went "ol' Yeller" on Infoseek. Over the course of his career, Jim has gotten drunk with Jeeves the Butler, tossed sticks to that sock-puppet dog from Pets.com and come out of a staring contest with Googlebot confidently declaring a tie. When not traveling between conferences, Jim lives with a perpetually annoyed cat named Hypertext in the Pacific techno-outport of Victoria British Columbia.
