As major sites like Wikipedia and reddit prepare for a Wednesday blackout to protest the Stop Online Piracy Act and its Senate cousin the Protect IP Act, many smaller sites are also debating whether or not they want to participate in what is being called “SOPA Blackout Day.”
While there is still a great deal of debate surrounding the possible efficacy vs. consequences of going dark for an entire day to protest domestic legislation, plenty of sites will turn off on January 18th.
If your site is joining in, check out the guidance below on how to do it safely, and remember, the internet must remain free.
Website outages and blackouts the right way
tl;dr: Use a 503 HTTP status code but read on for important details.
Sometimes webmasters want to take their site offline for a day or so, perhaps for server maintenance or as political protest. We’re currently seeing some recommendations being made about how to do this that have a high chance of hurting how Google sees these websites and so we wanted to give you a quick how-to guide based on our current recommendations.
The most common scenario we’re seeing webmasters talk about implementing is to replace the contents on all or some of their pages with an error message (“site offline”) or a protest message. The following applies to this scenario (replacing the contents of your pages) and so please ask (details below) if you’re thinking of doing something else.
1. The most important point: Webmasters should return a 503 HTTP header for all the URLs participating in the blackout (parts of a site or the whole site). This helps in two ways:
a. It tells us that what we see is not the “real” content of the site, so it won’t be indexed.
b. Because of (a), even if we see the same content (e.g. the “site offline” message) on all the URLs, it won’t cause duplicate content issues.
2. Googlebot’s crawling rate will drop when it sees a spike in 503 headers. This is unavoidable, but as long as the blackout is only a transient event, it shouldn’t cause any long-term problems and the crawl rate will recover fairly quickly to the pre-blackout rate. How fast depends on the site, but it should be on the order of a few days.
3. Two important notes about robots.txt:
a. As Googlebot is currently configured, it will halt all crawling of the site if the site’s robots.txt file returns a 503 status code. This crawling block will continue until Googlebot sees an acceptable status code for robots.txt fetches (currently 200 or 404). This is a built-in safety mechanism so that Googlebot doesn’t end up crawling content it’s usually blocked from reaching. So if you’re blacking out only a portion of the site, be sure the robots.txt file’s status code is not changed to a 503.
b. Some webmasters may be tempted to change the robots.txt file to have a “Disallow: /” in an attempt to block crawling during the blackout. Don’t block Googlebot’s crawling like this as this has a high chance of causing crawling issues for much longer than the few days expected for the crawl rate recovery.
4. Webmasters will see these errors in Webmaster Tools: it will report that we saw the blackout. Be sure to monitor the Crawl Errors section particularly closely for a couple of weeks after the blackout to ensure there aren’t any unexpected lingering issues.
5. General advice: Keep it simple and don’t change too many things, especially changes that take different times to take effect. Don’t change the DNS settings. As mentioned above, don’t change the robots.txt file contents. Also, don’t alter the crawl rate setting in Webmaster Tools (WMT). Keeping as many settings constant as possible before, during, and after the blackout will minimize the chances of something odd happening.
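As a rough illustration of points 1 and 3a, here is a minimal sketch in Python using only the standard library. It is not part of the original guidance, and most real sites would do this in their web server or application configuration instead. The names blackout_response, BlackoutHandler, serve, BLACKOUT_BODY and NORMAL_ROBOTS_TXT are hypothetical, the robots.txt contents are a stand-in for whatever the site already serves, and the Retry-After header is an optional extra that the guidance above does not mention.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical placeholder values, chosen only for this sketch.
BLACKOUT_BODY = b"<html><body><h1>Site offline</h1></body></html>"
# Stand-in for the site's existing robots.txt, served unchanged.
NORMAL_ROBOTS_TXT = b"User-agent: *\nDisallow:\n"
# Assumption: hint that clients may retry in one day (not from the guidance).
RETRY_AFTER_SECONDS = 86400


def blackout_response(path):
    """Return (status, headers, body) for a request path during the blackout."""
    # Ignore any query string when checking for robots.txt.
    if path.split("?", 1)[0] == "/robots.txt":
        # Point 3a: robots.txt keeps its normal 200 status and contents.
        return 200, {"Content-Type": "text/plain; charset=utf-8"}, NORMAL_ROBOTS_TXT
    # Point 1: every other URL returns 503 with the protest message.
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Retry-After": str(RETRY_AFTER_SECONDS),
    }
    return 503, headers, BLACKOUT_BODY


class BlackoutHandler(BaseHTTPRequestHandler):
    def _respond(self, include_body):
        status, headers, body = blackout_response(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._respond(include_body=True)

    def do_HEAD(self):
        # HEAD gets the same status and headers, without the body.
        self._respond(include_body=False)


def serve(port=8000):
    """Run the sketch locally; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), BlackoutHandler).serve_forever()
```

Calling serve() starts the sketch on localhost, where a request for any page should come back as a 503 while /robots.txt still returns 200. Keeping the decision in one small function makes it easy to check the status codes before the blackout starts.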
You can then head on over to webmaster central to continue the discussion.