
Why Your Robots.txt Blocked URLs May Show up in Google

Matt Cutts Talks Uncrawled URLs in Search Results

Tuesday, October 6, 2009

Matt Cutts has appeared in yet another Google Webmaster Video, and this time he has a whiteboard to illustrate his points. His topic is uncrawled URLs in search results.

Cutts says Google gets a lot of complaints from webmasters who believe the search engine is violating their robots.txt files, which they use to keep Google from crawling certain pages, because some of those blocked URLs still show up in search results.

According to Matt, in most of these cases someone says "I blocked example.com/go in robots.txt," and it turns out that the search result for that page is just the bare URL, with no text snippet. The reason is that Google never actually crawled the page.

"It did abide by robots.txt. You told us this page is blocked, so we did not fetch this page," says Matt. What appears in the results is only a URL reference. "We saw a link to it, but we didn't fetch the page itself," he explains.
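The difference between fetching a page and merely knowing its URL can be sketched with Python's standard urllib.robotparser module. This is a minimal toy model under stated assumptions, not Google's crawler: the robots.txt contents, the process_link helper, and the dictionary used as an index are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mirroring the "example.com/go" case from the video.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def process_link(parser: RobotFileParser, url: str, anchor_text: str,
                 index: dict) -> dict:
    """Record a discovered link (hypothetical helper).

    The URL and its anchor text are always remembered, but the page
    itself is only fetched when robots.txt allows it.
    """
    entry = index.setdefault(url, {"anchors": [], "fetched": False})
    entry["anchors"].append(anchor_text)
    if parser.can_fetch("Googlebot", url):
        # A real crawler would download and parse the page here.
        entry["fetched"] = True
    return entry
```

Feeding it a link to http://example.com/go leaves an index entry that has anchor text but `fetched` set to False. That is the URL-only reference Cutts describes: the crawler honored robots.txt, yet it still knows the URL exists.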

Because Google never fetched the page, it has no text to show as a snippet. As for why Google shows such URLs at all, Cutts illustrates with the California DMV, whose site is www.dmv.ca.gov.

Cutts notes that at one point the California Department of Motor Vehicles had a robots.txt that blocked all search engines. "Now these days pretty much every site is savvy enough, you know, at one point the New York Times and eBay and a whole bunch of different sites would use robots.txt," he says.

If someone searches Google for "California DMV," there is essentially only one right answer, he says, and that is the result Google wants to return. Fortunately for Google, many people linked to that page with the anchor text "California DMV," which let Google return the result without crawling the page.

Google can also get descriptions from a directory such as the Open Directory Project (DMOZ). Cutts cites Nissan and Metallica.com as examples of sites that used to block Google with robots.txt. Because both were listed in the Open Directory Project, however, Google could take their directory descriptions and use them as snippets.
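The DMV and Open Directory examples suggest a simple fallback for URLs that were never fetched. The sketch below is a hypothetical simplification, assuming the crawler has already collected the anchor texts of inbound links and, optionally, a directory description; the function name and result fields are illustrative only and do not reflect how Google actually ranks pages or builds snippets.

```python
from collections import Counter
from typing import List, Optional


def result_for_uncrawled(url: str, anchors: List[str],
                         directory_description: Optional[str] = None) -> dict:
    """Build a search result for a URL whose page was never fetched.

    Hypothetical rules for illustration:
    - title: the most common anchor text pointing at the URL,
      falling back to the bare URL when no anchors are known;
    - snippet: a directory description if one exists, otherwise empty.
    """
    title = Counter(anchors).most_common(1)[0][0] if anchors else url
    snippet = directory_description or ""
    return {"url": url, "title": title, "snippet": snippet}
```

With no directory description, the result carries a title derived from links but an empty snippet, which matches the bare results webmasters complained about; with one, the result looks as though the page had been crawled even though it never was.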

When this type of thing happens, it looks like the page was crawled, when in fact it wasn't. "So we are able to return something that can be very helpful to users without violating robots.txt by not crawling that page," says Cutts.

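He also notes that when you don't want pages to show up at all, you can put a "noindex" meta tag in the page's head section. When Google sees this tag, it drops the page from its search results completely. Keep in mind that Google can only see the tag if it is allowed to crawl the page, so a page carrying "noindex" should not also be blocked in robots.txt. Another option is the URL removal tool.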
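For reference, the tag looks like `<meta name="robots" content="noindex">`. A minimal sketch of how a crawler might detect it in a page it has fetched, using Python's standard html.parser module; the RobotsMetaParser class and has_noindex function are hypothetical names for illustration, not part of any search engine's published code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # "googlebot" targets Google's crawler specifically.
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```

A page whose head contains `<meta name="robots" content="noindex, nofollow">` makes `has_noindex` return True; the check only runs on HTML the crawler was actually permitted to download.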

About the author:
Chris Crum has been a part of the WebProNews team and the iEntry Network of B2B Publications since 2003. Follow WebProNews on Facebook or Twitter. Twitter: @CCrum237
