Webmasters: Googlebot Caught in Spider Trap, Ignoring Robots.txt
Sometimes webmasters set up a spider trap or crawler trap to catch spambots or other crawlers that waste their bandwidth. If some webmasters are right, Googlebot (Google’s crawler) has been falling into these traps even when robots.txt tells it to stay out.
In the WebmasterWorld forum, member Starchild started a thread by saying, “I saw today that Googlebot got caught in a spider trap that it shouldn’t have as that dir is blocked via robots.txt. I know of at least one other person recently who this has also happened to. Why is GB ignoring robots?”
Another member suggested that Starchild was mistaken, noting that similar claims made in the past turned out to be caused by other issues.
Starchild responded, however, that the setup had been in place for “many months” with no changes. “Then I got a notification it was blocked (via the spidertrap notifier). Sure enough, it was. Upon double checking, Google webmaster tools reported a 403 forbidden error. IP was google. I whitelisted it, and Google webmaster tools then gave a success.”
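For readers wondering how a claim like “IP was google” can be checked, here is a minimal Python sketch of the reverse-then-forward DNS check that Google documents for verifying Googlebot. The function name `is_verified_googlebot` and the injectable lookup parameters are our own illustration, not anything the forum members describe using, and the forward lookup shown only covers IPv4 addresses.

```python
import socket

# Hostname suffixes Google documents for genuine Googlebot addresses.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
):
    """Return True if ip reverse-resolves to a Google hostname
    and that hostname resolves back to the same ip.

    The lookup functions are parameters (an assumption of this sketch)
    so the check can be exercised without touching the network.
    """
    try:
        # Step 1: reverse DNS (PTR) lookup on the visiting address.
        hostname = reverse_lookup(ip)[0].rstrip(".").lower()
    except OSError:
        return False

    # Step 2: the hostname must belong to one of Google's crawler domains.
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # Step 3: forward lookup must map the hostname back to the same IP,
        # otherwise the PTR record could simply be spoofed.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    return ip in addresses
```

Calling `is_verified_googlebot(ip)` with the default arguments performs live DNS queries, so results depend on the resolver in use.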
Another member, nippi, said they were also hit by it four months after setting up a spider trap, which had been “working fine” until now.
“The link to the spider trap is rel=Nofollowed, the folder is banned in robot.txt. The spider trap works by banning by ip address, not user agent so its not caused by a faker – and of course robots.txt was setup up correctly and prior, it was in place days before the spider trap was turned on, and it’s run with no problems for months,” nippi added. “My logs show, it was the real google, from a real google ip address that ignored my robots.txt, ignored rel-nofollow and basically killed my site.”
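To make the setup nippi describes more concrete, the following Python sketch shows a robots.txt rule that disallows a trap directory and a trap that bans by IP address rather than user agent. The `/trap/` path, the `SpiderTrap` class, and the example addresses are hypothetical; neither webmaster published their actual configuration. The robots.txt check uses the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tells every crawler to stay out of /trap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class SpiderTrap:
    """Toy trap: any IP that requests a path under trap_prefix is banned."""

    def __init__(self, trap_prefix, whitelist=()):
        self.trap_prefix = trap_prefix
        self.whitelist = set(whitelist)
        self.banned = set()

    def handle(self, ip, path):
        """Return the HTTP status this request would receive."""
        if ip in self.whitelist:
            return 200  # whitelisted addresses are never banned or blocked
        if path.startswith(self.trap_prefix):
            self.banned.add(ip)  # ban by IP, regardless of user agent
        return 403 if ip in self.banned else 200
```

A crawler that honors the rule never requests anything under `/trap/`, so it is never banned. One that does request it starts receiving 403 responses for the whole site until its address is whitelisted, which matches the sequence Starchild reported.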
We’ve reached out to Google for comment, and will update this article if and when we receive a response.
Meanwhile, Barry Schwartz is reporting that one site lost 60% of its traffic instantly, due to a bug in Google’s algorithm. He points to a Google Webmaster Help forum thread where Google’s Pierre Far said:
“I reached out to a team internally and they identified an algorithm that is inadvertently negatively impacting your site and causing the traffic drop. They’re working on a fix which hopefully will be deployed soon.”
Google’s Kaspar Szymanski commented on Schwartz’s post, “While we can not guarantee crawling, indexing or ranking of sites, I believe this case shows once again that our Google Help Forum is a great communication channel for webmasters.”