Controlling How Your Site Is Indexed
Over at the Google Blog, Dan Crow is publishing a series of how-tos on controlling what the Googlebot says about your site through the Robots Exclusion Protocol. This is the cheap-and-easy version.
Don’t Index That Link!
Crow says if you don’t want a specific page indexed, you can use the NOINDEX tag on that page. If, however, that means continually adding NOINDEX and removing it again, say for a constantly updated and redirected news page, you may want to go another way and save yourself some grief. If that news page is reached through a gateway page, add a NOFOLLOW tag to the gateway page instead, so the Googlebot stays put and does not follow the links on it.
The code should look like this: <META NAME="ROBOTS" CONTENT="NOFOLLOW">
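As a rough sketch of where these tags go, here is a minimal page carrying the NOINDEX tag Crow describes. The title and body text are placeholders invented for this example; the part that matters is that the tag sits inside the page's head.

```html
<!-- Hypothetical page you want kept out of the index -->
<html>
<head>
<title>Example News Page</title>
<!-- Ask robots not to index this page -->
<META NAME="ROBOTS" CONTENT="NOINDEX">
</head>
<body>
<p>Placeholder content.</p>
</body>
</html>
```

On the gateway page, the NOFOLLOW tag would sit in the same spot.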
Don’t Say That!
Sometimes webmasters don’t want cached versions of web pages showing up in the search results, especially when the cached copy may be out of date. Sometimes they don’t want the usually important "snippet" displayed in the results either.
To keep the Googlebot from creating a cached version on Google’s servers, Crow says to use the NOARCHIVE tag.
It looks like this: <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
For snippets, a NOSNIPPET tag is useful, and kills two birds with one stone: it automatically prevents archiving as well.
That code is: <META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">
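In context, a page that should appear in results without a cached copy might look something like this sketch. The title and content are placeholders, not anything from Crow's post; swap NOARCHIVE for NOSNIPPET to drop the snippet too.

```html
<!-- Hypothetical page that should not be cached by Google -->
<html>
<head>
<title>Example Price List</title>
<!-- Addressed to the Googlebot only: do not offer a cached copy -->
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
</head>
<body>
<p>Placeholder content that changes often.</p>
</body>
</html>
```

Because the NAME here is GOOGLEBOT rather than ROBOTS, the tag is addressed to Google's crawler alone, not to other search engines.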