Love Letters from Googlebot
By: Chris Crum - December 3, 2008
In a mildly weird fashion, Google answered some questions regarding HTTP status codes and "if-modified-since" from users who were either made up or had their names changed to protect their anonymity. Names like Little Jimmy, Temp O’Rary, Janet Crinklenose, and Frankie O’Fore (my apologies to any Janet Crinklenoses who may be out there, but I think I’m safe in my assumption that the name is fictitious).
"Sorry, guys – the fluffy distraction of the dating theme stopped me reading it. I’ll find another, concise article to read about this," says one comment on Google’s post. I guess this isn’t the one he’s looking for since I’m already in my second paragraph about it. But point taken. Let’s try to pick through this nonsense.
One letter talks about cleaning up a site, deleting some old pages, and whether or not 404 pages are ok. "404s are the standard way of telling me that a page no longer exists," says….umm, the GoogleBot. "I won’t be upset—it’s normal that old pages are pruned from websites, or updated to fresher content. Most websites will show a handful of 404s in the Crawl Diagnostics over at Webmaster Tools. It’s really not a big deal. As long as you have good site architecture with links to all your indexable content, I’ll be happy, because it means I can find everything I need."
The post then goes on to address similar items, like 301 and 302 redirects for pages that others link to but that no longer exist or have been moved, and dynamic pages with changing content. The GoogleBot says, for example:
"Once you’re indexed, it’s the polite way to tell your visitors that your address is still the right one, but that the content can temporarily be found elsewhere. In these situations, a 302 (or the rarer ‘307 Temporary Redirect’) would be better. For example, orkut redirects from http://orkut.com to http://google.com/accounts/login?service=orkut, which isn’t a page that humans would find particularly useful when searching for Orkut.
"It’s on a different domain, for starters. So, a 302 has been used to tell me that all the content and linking properties of the URL shouldn’t be updated to the target – it’s just a temporary page."
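For readers who would rather see the status codes than the love letters, here is a minimal Python sketch of how a site might choose between them. The URL tables and the `respond` function are hypothetical names made up for illustration; nothing here comes from Google's post beyond the meaning of the codes themselves (301 for a permanent move, 302 for a temporary one, 404 for a page that is gone).

```python
from http import HTTPStatus

# Hypothetical URL tables, invented for this example.
MOVED_PERMANENTLY = {"/old-page": "/new-page"}    # content has a new home for good
MOVED_TEMPORARILY = {"/": "/accounts/login"}      # original address stays the "right" one
LIVE_PAGES = {"/new-page", "/accounts/login"}


def respond(path):
    """Return (status, redirect_target) for a requested path."""
    if path in MOVED_PERMANENTLY:
        # 301: crawlers should treat the target as the page's new address.
        return HTTPStatus.MOVED_PERMANENTLY, MOVED_PERMANENTLY[path]
    if path in MOVED_TEMPORARILY:
        # 302: crawlers should keep the original URL; the target is temporary.
        return HTTPStatus.FOUND, MOVED_TEMPORARILY[path]
    if path in LIVE_PAGES:
        return HTTPStatus.OK, None
    # 404: the standard way of saying the page no longer exists.
    return HTTPStatus.NOT_FOUND, None
```

Calling `respond("/old-page")` yields a 301 pointing at `/new-page`, while a pruned page such as `/deleted` simply gets a 404, which, per the GoogleBot, is "really not a big deal."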
You can find more info in the post, such as how supporting the "If-Modified-Since" header and returning a 304 can save bandwidth, and you may find the answers to redirect-related questions you have. That is, if you can stomach the presentation of the information.
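As a rough illustration of the "If-Modified-Since" idea, here is a small Python sketch using only the standard library. The function name `conditional_get` and its arguments are assumptions made for this example, not anything from Google's post. The idea is that if the page has not changed since the date the crawler sends, the server can answer 304 with no body instead of resending the whole page.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` must be a timezone-aware UTC datetime;
    `if_modified_since` is the raw request header value, if any.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without a usable zone as UTC.
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers  # Not Modified: no body needs to be sent
    return 200, headers
```

For a page last changed on December 1, 2008, a request carrying `If-Modified-Since: Wed, 03 Dec 2008 00:00:00 GMT` would get a 304, while a request with an older date (or no header at all) would get the full 200 response.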