Disabling Google and Other Search Engines From Crawling a Site

Reader question: I have an online database of horror movies, and my site ranks well in Google. Over the last month my traffic logs showed a sharp increase in bandwidth use. One of the most frequent user agents in my server logs is Googlebot, so much of this traffic was generated by Google’s spider. I have a 20 GB bandwidth limit and I don’t want to pay for overages, so I blocked Google from my Web site. My question is:

If I block Google from my Web site, is it possible that Google will remove my Web site from its index or lower its ranking?

Many thanks for your time and keep up the good work.

Answer: Many thanks for posting this question, because Web server issues and robot exclusion are a very important aspect of search engine marketing (SEM). The reader did not state specifically how he kept Googlebot from spidering his site, so I am assuming that he used the Robots Exclusion Protocol.

Robots Exclusion Protocol

The Robots Exclusion Protocol is a means of instructing robots (or spiders) not to crawl all or part of a site. With the Robots Exclusion Protocol, Web site owners can instruct search engine spiders not to index individual Web pages, subdirectories, or even an entire site. Instructions can also be tailored for individual search engines.

There are two ways to exclude robots: a meta tag or a text file.

To tell Google that you do not want a page indexed or its links followed, you can place the following meta tag in the page’s <HEAD> section:

<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

To tell all search engine spiders that you do not want a page indexed or its links followed, you can use the following meta tag:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
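To make the difference between the two tags concrete, here is a minimal Python sketch (standard library only) of how a crawler might read them. It is an illustration, not Google’s actual code: the names MetaRobotsParser and robots_directives are hypothetical, and the sketch assumes a crawler obeys both the generic ROBOTS tag and any tag addressed to it by name.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Hypothetical helper: collects the CONTENT directives of every <meta name=...> tag."""

    def __init__(self):
        super().__init__()
        self.by_name = {}  # lowercase meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names,
        # but attribute values keep their original case.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").strip().lower()
        if not name:
            return
        content = attrs.get("content") or ""
        directives = {d.strip().lower() for d in content.split(",") if d.strip()}
        self.by_name.setdefault(name, set()).update(directives)


def robots_directives(html, crawler):
    """Hypothetical helper: directives that apply to `crawler` (e.g. "googlebot")."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    # Assumption: the generic ROBOTS tag applies to every crawler, while a tag
    # named after a crawler (e.g. GOOGLEBOT) applies only to that crawler.
    return parser.by_name.get("robots", set()) | parser.by_name.get(crawler.lower(), set())


page = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(sorted(robots_directives(page, "googlebot")))
print(sorted(robots_directives(page, "bingbot")))
```

A spider can only read a meta tag on a page it has already downloaded, so unlike robots.txt, these tags do not reduce the bandwidth a crawler uses.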

For a meta tag to be effective across a whole site, you have to place it on every page of the site, which can be tedious and time consuming. For that reason, I prefer to use the robots exclusion text file, commonly called robots.txt, because it can easily be applied to an entire site.

The robots.txt file is a text file that you place on your server to instruct search engine spiders not to crawl (request and record) specified areas of your Web site. In other words, the text file lets the search engine spiders know which sections of your site are off limits.

I usually create my robots.txt files in Notepad (PC) or SimpleText (Mac), but you can also create simple text files in HTML software such as Dreamweaver.

Google requests the robots.txt file from the top-level (root) directory of your site, at an address such as http://www.example.com/robots.txt, before trying to index any page within your site. So if you do not want Google to crawl any of the site, type the following text into a text editor:

User-agent: Googlebot
Disallow: /

Be sure to save the file as robots.txt. Do not use any other file extension. If you save the file as a Word document and call it robots.doc, Google will ignore that file.
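Before you upload the file, you can check what it tells each crawler. Here is a minimal sketch using Python’s standard urllib.robotparser module, which reads robots.txt rules; the page path /movies/index.html and the second crawler name, Bingbot, are illustrative assumptions, and Google’s own parser may differ from Python’s in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the example above: keep Googlebot out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no web server is needed for this check.
parser.parse(ROBOTS_TXT.splitlines())

# Assumed crawler names and paths, for illustration only.
for agent in ("Googlebot", "Bingbot"):
    for path in ("/", "/movies/index.html"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:10} {path:20} {verdict}")
```

Only Googlebot should come out as blocked; crawlers that the file does not name are unaffected, which is how instructions can be tailored for individual search engines.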

When search engines crawl too frequently

I understand the reader’s concern about bandwidth. If Google or any search engine crawls a site too frequently, it takes up bandwidth. All of us pay for bandwidth.

However, when you instruct Google (or any search engine) to not crawl your site, you are essentially communicating, “Don’t show my Web pages in your search results.”

I do not believe the reader intended to exclude all of his Web pages from Google’s search engine results pages (SERPs). He just wants Google to request pages from his server less often.

Google actually has a Web page with this information and an email address. This is a direct quote from Google’s Webmaster FAQs page:

“Please send an email to googlebot@google.com with the name of your site and a detailed description of the problem. Please also include a portion of the weblog that shows Google accesses, so we can track down the problem more quickly on our end.”

This information is available at http://www.google.com/webmasters/faq.html.
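Google’s FAQ asks for the portion of your weblog that shows Google accesses. As a rough sketch, assuming your server writes Apache-style Combined Log Format lines, the following Python pulls out the requests whose user agent claims to be Googlebot and adds up the bytes served to them, which you can compare against a bandwidth limit like the reader’s 20 GB. The function name googlebot_usage and the sample log lines are hypothetical, and because a user agent string can be faked, the count covers requests that claim to be Googlebot.

```python
import re

# Assumption: the server writes Apache/NCSA "Combined Log Format" lines, e.g.
# 66.249.66.1 - - [10/Oct/2003:13:55:36 -0700] "GET /movies/alien.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: the 20 GB allowance is counted in binary gigabytes; hosts differ.
MONTHLY_LIMIT_BYTES = 20 * 1024 ** 3


def googlebot_usage(lines):
    """Hypothetical helper: return (log lines claiming to be Googlebot, total bytes sent to them)."""
    hits, total_bytes = [], 0
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip lines in another format and requests from other user agents.
        if match is None or "googlebot" not in match.group("agent").lower():
            continue
        hits.append(line)
        # A "-" size means no response body bytes were logged for that request.
        if match.group("size") != "-":
            total_bytes += int(match.group("size"))
    return hits, total_bytes


# Made-up sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [10/Oct/2003:13:55:36 -0700] "GET /movies/alien.html HTTP/1.1" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [10/Oct/2003:13:56:01 -0700] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
hits, used = googlebot_usage(sample)
print(f"{len(hits)} Googlebot request(s), {used} bytes, "
      f"{used / MONTHLY_LIMIT_BYTES:.6%} of the monthly allowance")
```

The matching lines in hits are the kind of excerpt Google asks you to include in your email.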

When to use the Robots Exclusion Protocol

Some content is not useful to site visitors or search engines, such as the items in a CGI-BIN directory. When your target audience searches for information, they are not interested in the programs that generate your site’s forms or drop-down menus. They are not interested in a section of a Web site that is under construction, or in redundant content, either. The Robots Exclusion Protocol helps keep this unnecessary information out of search results pages.
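Here is a sketch of a robots.txt file that keeps all robots out of those areas, checked with Python’s standard urllib.robotparser module. The directory names /cgi-bin/ and /under-construction/ and the sample page paths are assumptions; substitute the actual directories on your server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler ("*") out of the CGI scripts
# and out of a section that is still under construction.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /under-construction/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Assumed sample paths, for illustration only.
for path in ("/cgi-bin/search.cgi", "/under-construction/new.html", "/movies/alien.html"):
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(path, verdict)
```

Each Disallow value is matched as a prefix of the requested path, so the trailing slash in /cgi-bin/ covers every file beneath that directory while leaving the rest of the site crawlable.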

For more details about the Robots Exclusion Protocol, please visit: http://www.robotstxt.org/wc/faq.html.

Shari Thurow is Marketing Director at Grantastic Designs, Inc., a full-service search engine marketing, web and graphic design firm. This article is excerpted from her book, Search Engine Visibility (http://www.searchenginesbook.com) published in January 2003 by New Riders Publishing Co. Shari can be reached at shari@grantasticdesigns.com.

  • http://www.updatemate.com Tony

    Hi,

    While most discussion on the web is about how to get a website IN to Google, I recently had a client who did NOT want to be listed in Google (for whatever reason!).

    So, he asked me, "Does Google have the legal right to crawl my website and list information in its search engine?"

    My response was to tell him that once a website is created online it becomes part of the public domain and therefore search engines are entitled to (and do) visit the site with their crawlers, spiders, robots, etc.

    Is this the correct legal answer?

    Tony

  • http://web-designer-s.com/a/web-design/ web site design articles

    I also don’t understand why someone would want Google not to index his web site, when Google is one of the best sources of traffic.

    • http://www.magnetmarketing.in Jaininder

      Is it possible to block the spider of only one specific search engine, like Google?

  • http://truelovewanted.com Get Back With my Ex

    There are lots of reasons a site owner might want to keep a search engine out of part of his site. Thanks for the great tutorial.

  • http://seethrureviews.com/Rocket-Spanish-Download-Reviews.html Rocket Spanish Download

    I block Google from accessing pages on my site such as “about us” etc. It doesn’t really help me in any way for those pages to get indexed.

    Also, webmasters who post duplicate content can benefit from not allowing Googlebot to index those pages. You can have thousands of pages of duplicate content on your site with AdSense on them, so when people arrive through the pages that are indexed, they can reach the other pages with duplicate content.

  • http://www.dubaifurnishedapartments.com Dubai Hotels and Apartments

    Nice way to disallow robots. Thanks for sharing, ma’am.

  • http://www.tradehandles.co.uk Door Handles, Knobs UK

    Robots.txt is for folders, and the NoFollow tag is for outgoing links… It’s pretty clear now. Thanks.

  • http://www.azhariqbal.com SEO Consultant Pakistan Karachi

    I like the NoFollow tags. I already knew about them, but the article explained them and removed some doubts. Great.

  • Mark

    Great post… it’s helpful for new guys.

  • http://www.bhomes.com/uae/short_term_rental/index.xhtml Dubai furnished apartments

    In the article you discuss bandwidth. Do you mean that Google uses our site’s bandwidth?

  • http://www.gomasterkey.com/ Real Estate Software

    Thanks for sharing; there are some good points here that should be helpful for all of us. Keep writing.

  • http://www.pizzaliefer.de Pizza Lieferservice Duetchland

    What do I have to do if I want to stop Google from following links? I mean, I add some links and I want to prevent Google from going there.

  • http://www.etifone.com Cheap International Calls

    Please advise: I have read that a nofollow link is used to disallow the Google robot.

    Is it rel="nofollow"?

  • Kamran

    Help required. My site is http://www.cashforgoldleprechaun.com/ and no search engine is crawling it. I have made many changes, including adding a site map and using Webmaster Tools. Please check whether there is any development mistake, or anything in robots.txt, that is keeping search engines (Google, Yahoo, Bing) from crawling my site.

  • http://www.elysianrealestate.com Dubai Real Estate

    I think the best way to stop search engines from crawling a website is to place a robots.txt file on it.

  • http://www.heightgainer.com jeff007

    There are ways to prevent them from crawling your website, but you need to know some basic HTML. Do a Google search for the keyword “No Follow” followed by “HTML TAG” and it should help you.

    But why would you not want Google to crawl your website? I have a website called Grow Taller Formula and I want it crawled as much as possible.

    Good luck, and email me if you need some HTML advice; I am a computer programmer. Thank you.

    Jeff