Google Flexes Robots Exclusion Protocol
Two new features added to the protocol will help webmasters govern when a page should stop showing up in Google’s web search, and give them some control over the indexing of file types beyond HTML pages.
One of the features, support for the unavailable_after tag, has been mentioned previously. Google’s Dan Crow made that initial disclosure.
He has followed that up with a full-fledged post on the official Google blog about the new tag. The unavailable_after META tag informs the Googlebot when a page should be removed from Google’s search results:
This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results. We currently only support unavailable_after for Google web search results.
After the removal, the page stops showing in Google search results but it is not removed from our system.
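Based on Google’s description, the tag sits in a page’s HTML head alongside other META tags, with a removal date in the content attribute. A sketch of what that might look like (the date and time here are purely illustrative):

```html
<!-- Asks Googlebot to drop this page from web search results
     after the given date (values shown are hypothetical) -->
<meta name="googlebot" content="unavailable_after: 25-Aug-2007 15:00:00 EST">
```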
Fully removing something from Google still requires the URL removal tool, available in Google’s Webmaster Central.
Google has also extended webmasters’ control to assets beyond web pages. Publishers of PDF, audio, video, or other file types can now tell the crawler how Google should manage those files in its index.
“We’ve extended our support for META tags so they can now be associated with any file,” said Crow. “Simply add any supported META tag to a new X-Robots-Tag directive in the HTTP Header used to serve the file.”
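In practice, this means the server adds the directive to the HTTP headers it sends when serving the file, rather than embedding a tag in the content itself. A sketch of what such a response might look like for a PDF (header values are illustrative, not prescribed):

```http
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nosnippet
```

On Apache, for example, a header like this could be attached to matching file types via mod_headers with a `Header set X-Robots-Tag "noindex"` directive.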
Supported META tags include noarchive, nosnippet, noindex, and unavailable_after. Google sees these as offering enough flexibility to satisfy site publishers; we imagine it has organizations like AFP and Copiepresse in mind here.