
Unvalidated Robots.Txt Risks Google Banishment


The web crawling Googlebot may find a forgotten line in robots.txt that causes it to de-index a site from the search engine.


Webmasters welcome being dropped from Google about as much as they enjoy flossing with barbed wire. Making it easier for Google to do that runs against everything a webmaster works for. Why willingly exclude one’s site from Google?

That could happen with an unvalidated robots.txt file. Robots.txt allows webmasters to provide standing instructions to visiting spiders, which contributes to having a site indexed faster and more accurately.

Google has been considering new syntax to recognize within robots.txt. The Sebastians-Pamphlets blog said Google confirmed recognizing experimental syntax like Noindex in the robots.txt file.

This poses a danger to webmasters who have not validated their robots.txt. A line reading Noindex: / could lead to one’s site being completely de-indexed.

The surname-less Sebastian recommended checking the file with Google’s robots.txt analyzer, part of Google’s Webmaster Tools, and using only the Disallow, Allow, and Sitemap crawler directives in the Googlebot section of robots.txt.
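As a rough illustration of that advice, the sketch below scans a robots.txt file and flags any line whose directive falls outside a short allowlist. It is not Google’s analyzer and does not reproduce how Googlebot parses the file. The function name `find_unknown_directives`, the `KNOWN_DIRECTIVES` set, and the sample file are assumptions made for this example; `User-agent` is added to the allowlist because every crawler section needs it.

```python
# Hypothetical helper, not part of any Google tool: flag robots.txt lines
# whose directive is not in a small allowlist.
# Assumption: the allowlist is the Disallow/Allow/Sitemap set recommended
# above, plus User-agent, which opens each crawler section.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}


def find_unknown_directives(robots_txt):
    """Return (line_number, line) pairs for lines outside KNOWN_DIRECTIVES."""
    flagged = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # The directive name is everything before the first colon.
        field, sep, _value = line.partition(":")
        if not sep or field.strip().lower() not in KNOWN_DIRECTIVES:
            flagged.append((number, raw.strip()))
    return flagged


# Made-up sample file containing a forgotten experimental line.
sample = """\
User-agent: Googlebot
Disallow: /private/
Noindex: /
Sitemap: http://example.com/sitemap.xml
"""

print(find_unknown_directives(sample))  # [(3, 'Noindex: /')]
```

A check like this only catches leftover or unfamiliar directives. It says nothing about whether the remaining rules do what the webmaster intends, so it complements the analyzer rather than replacing it.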

  • http://ez-onlinemoney.com/blog/ Joshua Spaulding

    Not to be a smart a**, but why would it be surprising if Google deindexed a site/page that includes Noindex: in its robots.txt?

    Why would you have it in there in the first place if you didn’t want to be excluded?

    Maybe I missed something?

  • http://sebastians-pamphlets.com/ Sebastian

    Thanks for the coverage Dave. :)

    Joshua, folks not familiar with the REP syntax often leave garbage or experimental statements in robots.txt. As long as the crawlers ignore those, such forgotten stuff is not a big deal. It becomes risky when a search engine experiments itself, and the engine’s interpretation doesn’t match the webmaster’s thoughts.

    • http://urwrong.org UR Wrong

      When you put in Noindex, it is to be excluded, so this presents no threat. In fact it accomplishes your goal. It’s those spiders which ignore your wishes that you need worry about.
