Google On Robots Exclusion Protocol

    February 23, 2007

A post on the official Google blog discusses the Robots Exclusion Protocol. Some time back, we told you about an earlier post on the robots.txt file.

That earlier post gave web publishers important details about how they can control the way search engines, Google included, access and index their sites. The main tool for this purpose is the robots.txt file, which gives site owners powerful control over how their sites are crawled.
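As a rough illustration of how a robots.txt file expresses these rules, the sketch below parses a small example file with Python's standard `urllib.robotparser` module and asks whether a given crawler may fetch a URL. The rules, the `example.com` URLs, and the `OtherBot` name are made up for illustration; they do not come from the Google post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot is kept out of /private/ but may fetch everything else.
print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))

# Crawlers without their own section fall back to the "*" rules.
print(parser.can_fetch("OtherBot", "https://example.com/tmp/cache.html"))
```

Note that this only models how a well-behaved crawler would read the file. How Googlebot itself interprets edge cases is defined by Google, not by this module.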

The more recent post on the Robots Exclusion Protocol provides more details and examples of the mechanisms for controlling how Google accesses and indexes your website.

The post explains how to prevent Googlebot from following links. “Usually when the Googlebot finds a page, it reads all the links on that page and then fetches those pages and indexes them. This is the basic process by which Googlebot "crawls" the web. This is useful as it allows Google to include all the pages on your site, as long as they are linked together.” It goes on to say that you can add the NOFOLLOW meta tag to a page, which tells Googlebot not to follow any links it finds on that page.
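To make the NOFOLLOW mechanism concrete, here is a minimal sketch of how a crawler might read a page's robots meta tag and decide whether to follow its links. The `RobotsMetaParser` class, the `may_follow_links` helper, and the sample HTML are hypothetical names and data written for this example; they say nothing about how Googlebot is actually implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> or <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() not in ("robots", "googlebot"):
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def may_follow_links(html: str) -> bool:
    """Return False when the page asks crawlers not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nofollow" not in parser.directives


# Hypothetical page that carries the NOFOLLOW meta tag.
PAGE = """
<html><head>
  <meta name="robots" content="NOFOLLOW">
</head><body><a href="/other.html">Other page</a></body></html>
"""

print(may_follow_links(PAGE))
print(may_follow_links("<html><head></head><body></body></html>"))
```

The tag applies to the whole page, so a single check per page is enough in this sketch.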

Further on, the post explains in detail how to control caching and snippets. “Usually you want Google to display both the snippet and the cached link. However, there are some cases where you might want to disable one or both of these. For example, say you were a newspaper publisher, and you have a page whose content changes several times a day. It may take longer than a day for us to reindex a page, so users may have access to a cached copy of the page that is not the same as the one currently on your site. In this case, you probably don’t want the cached link appearing in our results.”
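The article does not name the tags involved, but the robots meta values commonly used for this are NOARCHIVE (hide the cached link) and NOSNIPPET (hide the snippet). Assuming those two values, a small helper for generating the tag might look like the sketch below. The `robots_meta_tag` function and its parameters are hypothetical and exist only for this example.

```python
from html import escape


def robots_meta_tag(show_cached_link: bool = True,
                    show_snippet: bool = True,
                    user_agent: str = "robots") -> str:
    """Build a robots meta tag that disables the cached link and/or snippet.

    Returns an empty string when nothing needs to be disabled, since the
    default behaviour is to show both.
    """
    directives = []
    if not show_cached_link:
        directives.append("noarchive")  # assumed value for hiding the cached link
    if not show_snippet:
        directives.append("nosnippet")  # assumed value for hiding the snippet
    if not directives:
        return ""
    content = ", ".join(directives)
    return f'<meta name="{escape(user_agent, quote=True)}" content="{content}">'


# A newspaper page that changes several times a day: hide the cached link only.
print(robots_meta_tag(show_cached_link=False))

# Hide both, and address the tag to Googlebot alone.
print(robots_meta_tag(show_cached_link=False, show_snippet=False,
                      user_agent="googlebot"))
```

The generated tag belongs in the `<head>` section of the page it should apply to.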

To learn more about how the Robots Exclusion Protocol can help, read the complete post.

