Google reminds webmasters about robot invasion
Users of Google’s Webmaster Central tools have access to an effective module for creating robots.txt files for their sites.
Robots.txt holds the key for site publishers to gain proper indexing of their content by search engines. Scrupulous crawlers obey the directives in robots.txt, spidering the paths the file permits and avoiding those the webmaster has disallowed.
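As a minimal sketch of what such a file looks like (the paths here are hypothetical examples, not recommendations):

```
# Applies to all crawlers
User-agent: *
# Keep spiders out of this directory
Disallow: /private/
# Everything else remains crawlable by default
```

A bare `Disallow:` with no path, by contrast, permits everything.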
Google posted a note about their robots.txt generator, housed in their Webmaster Tools. It permits the creation of blanket robots.txt files, or ones with more granular designations about robots and where certain ones can and cannot go.
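A more granular file of the kind the generator can produce might look like this (again, the robot names and paths are illustrative assumptions):

```
# Default rule for all crawlers: block the archive
User-agent: *
Disallow: /archive/

# A narrower rule for one specific crawler
User-agent: Googlebot
Disallow: /archive/drafts/
```

Most crawlers apply the most specific `User-agent` block that matches them, so per-robot sections override the wildcard default.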
Of course, a knowledge of robots.txt syntax and a few minutes in vi does the same thing, but there may be webmasters who prefer the comfort of a clean graphical interface to a text editor.
Google also has the advantage of a robots.txt analyzer in Webmaster Tools. This allows site publishers to test out the file and see if any of its contents could be problematic for arriving spiders.
They also noted a couple of caveats about robots.txt. First, not every search engine supports all of the possible extensions to the robots.txt standard. Second, unscrupulous crawlers exist that will ignore the file and grab whatever they can. Sensitive content should be password protected if it needs to be online at all.