Reader question: I have an online database of horror movies, and I have a good Google rank. In my traffic logs I noticed that bandwidth use grew sharply over the last month. One of the most frequent user agents in the server logs is Googlebot, so this traffic was generated by Google's spidering engine. I have a 20 GB bandwidth limit and I don't want to pay for the excess, so I disabled Google on my Web site. My question is:
If I disable Google on my Web site, is it possible that Google.com will erase or drop my Web site from its directory?
Many thanks for your time and keep up the good work.
Answer: Many thanks for posting this question, because Web server issues and robot exclusion are very important aspects of search engine marketing (SEM). The reader did not specifically state how he kept Googlebot from spidering his site. I am assuming that the reader used the Robots Exclusion Protocol.
Robots Exclusion Protocol
The Robots Exclusion Protocol is a means of instructing robots (or spiders) not to crawl a site. With the Robots Exclusion Protocol, Web site owners can instruct search engine spiders not to crawl or index individual Web pages, subdirectories, or even an entire site. Instructions can also be tailored for individual search engines.
There are two types of robots exclusion: a meta tag and a text file.
To tell Google not to index a page or follow its links, you can add the following meta tag to the page's <head> section:
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
To tell all search engine spiders not to index a page or follow its links, you can add the following meta tag:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
For this tag to be effective on a whole site, you will have to place it on every page of your site. This process can be quite tedious and time consuming. Note also that a spider must still request a page in order to read its meta tag, so this method does not reduce bandwidth use. For those reasons, I prefer to use the robots exclusion text file, commonly referred to as robots.txt, because it can easily be applied to an entire site.
The robots.txt file is a plain text file that you place in the root directory of your Web server. It instructs search engine spiders NOT to request pages in specified areas of your Web site. In other words, the text file lets the search engine spiders know which sections of your site are off limits.
I usually create my robots.txt files in NotePad (PC) or SimpleText (Mac). You can also create plain text files in HTML software such as Dreamweaver.
Google will request the robots.txt file before trying to crawl any page within your site. For example, if you do not want Google to record any of the information on the site, type the following text into a text editor:
User-agent: Googlebot
Disallow: /
Be sure to save the file as robots.txt. Do not use any other file extension. If you save the file as a Word document and call it robots.doc, Google will ignore that file.
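Before uploading the file, it can help to check what the rules actually permit. The following is a minimal sketch, not part of the original article, that uses Python's standard urllib.robotparser module to test the two-line file above. The domain www.example.com, the page path, the helper name is_allowed, and the user agent name SomeOtherBot are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt shown in the article.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# www.example.com and the page path are placeholders, not the reader's site.
PAGE = "http://www.example.com/movies/index.html"
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", PAGE)
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", PAGE)

print("Googlebot allowed:", googlebot_ok)
print("Other spiders allowed:", other_ok)
```

Under these rules Googlebot is shut out of the whole site, while spiders that are not named in the file remain free to crawl it.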
When search engines crawl too frequently
I understand the reader's concern about bandwidth. If Google or any search engine crawls a site too frequently, it takes up bandwidth. All of us pay for bandwidth.
However, when you instruct Google (or any search engine) to not crawl your site, you are essentially communicating, "Don't show my Web pages in your search results."
I do not believe the reader's intention was to exclude all of his Web pages from Google search engine results pages (SERPs). He just wants Google not to request pages from his server so often.
Google actually has a Web page that explains what to do in this situation, along with an email address. This is a direct quote from Google's Webmaster FAQs page:
"Please send an email to googlebot@google.com with the name of your site and a detailed description of the problem. Please also include a portion of the weblog that shows Google accesses, so we can track down the problem more quickly on our end."
The URL for this page is http://www.google.com/webmasters/faq.html.
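To gather the portion of the log that shows Google accesses, and to see how much bandwidth they consume, a short script can do the filtering. The sketch below is only an illustration: it assumes the server writes the Apache "combined" log format, and the sample log entries, IP addresses, and the function name googlebot_usage are made up for the example. Keep in mind that a user agent string can be forged, so a match on "Googlebot" is a hint rather than proof.

```python
import re

# Pattern for the Apache "combined" log format (an assumption about the server setup).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def googlebot_usage(log_lines):
    """Return (request_count, total_bytes, matching_lines) for Googlebot entries."""
    count, total, matches = 0, 0, []
    for line in log_lines:
        line = line.strip()
        m = LOG_PATTERN.match(line)
        if m is None or "googlebot" not in m.group("agent").lower():
            continue
        count += 1
        size = m.group("size")
        # A size of "-" means no response body was sent (for example, a 304 reply).
        total += 0 if size == "-" else int(size)
        matches.append(line)
    return count, total, matches

# Made-up sample entries for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Mar/2003:10:00:01 +0000] "GET /movies/1.html HTTP/1.0" 200 15000 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '198.51.100.7 - - [10/Mar/2003:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 8000 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.10 - - [10/Mar/2003:10:00:09 +0000] "GET /movies/2.html HTTP/1.0" 304 - "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

requests, total_bytes, lines = googlebot_usage(SAMPLE_LOG)
print(f"Googlebot made {requests} requests totalling {total_bytes} bytes")
```

The matching lines returned by the function are the kind of log excerpt the quoted FAQ asks for.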
When to use the Robots Exclusion Protocol
Some content is not important to site visitors and search engines, such as items in a CGI-BIN directory. When your target audience searches for information, they are not interested in your site's programs that generate your forms or your drop-down menus. They are not interested in a section of a Web site that is under construction. They are not interested in redundant content, either. Using the Robots Exclusion Protocol ensures that unnecessary information is not shown in search results pages.
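As a rough illustration of this kind of selective exclusion, the sketch below embeds a robots.txt that closes off only two directories and then checks a handful of paths against it with Python's standard urllib.robotparser module. The /cgi-bin/ directory comes from the example above; the /under-construction/ directory, the sample paths, and the function name blocked_paths are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only two directories are off limits to all spiders.
PARTIAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /under-construction/
"""

def blocked_paths(robots_txt, user_agent, paths):
    """Return the subset of paths that user_agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]

# Sample paths are invented for illustration.
paths = [
    "/index.html",
    "/cgi-bin/form.pl",
    "/under-construction/new.html",
    "/movies/list.html",
]
blocked = blocked_paths(PARTIAL_ROBOTS_TXT, "Googlebot", paths)
print("Blocked:", blocked)
```

Unlike a site-wide "Disallow: /", rules like these leave the main content available to search engines while keeping scripts and unfinished sections out of the crawl.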
For more details about the Robots Exclusion Protocol, please visit: http://www.robotstxt.org/wc/faq.html.
Shari Thurow is Marketing Director at Grantastic Designs, Inc., a full-service search engine marketing, web and graphic design firm. This article is excerpted from her book, Search Engine Visibility (http://www.searchenginesbook.com) published in January 2003 by New Riders Publishing Co. Shari can be reached at shari@grantasticdesigns.com.
15 Comments
Thanks Shari
Thanks for sharing. Some good points that must be helpful for all of us. Keep writing.
I have a Question
In the article you discuss bandwidth. Does Google use our site's bandwidth? Is that what you mean?
Great post ..... it's helpful
Great post ..... it's helpful for new guys …
________________
Mark
Moving pictures
I like the article
I like the NoFollow tags. I knew about them, but the article explained them and removed some doubts. Great.
Robots.txt and NoFollow
Robots.txt is for folders, and the NoFollow tag is for outgoing links... It's pretty clear now. Thanks.
(WPN reader)
Nice way to disallow robots. Thanks for sharing, ma'am.
I block google from
I block google from accessing pages on my site such as "about us" etc. It doesn't really help me in any way for those pages to get indexed.
Also, some webmasters who post duplicate content can also benefit from not allowing googlebot to index their page. You can have thousands of pages on your site with dup content but place adsense on them. So when people get on your site through pages that are indexed, they can access the other pages with dup content on them.
Good for test sites.
It's very wise to block Google and other (or all) search engines from crawling your 'test' site to avoid duplicate content, which may hurt your search engine rankings.
Using robots.txt helps you exclude sections of your website that may not be of interest to readers, and it also helps with bandwidth.
For a test site, put the following in a robots.txt file and place it in your site's root directory:
User-agent: *
Disallow: /
Then go to any free online sitemap generator such as xml-sitemaps.com and generate a sitemap for your site. Check and see if it works. The sitemap should not return any links from your site.
Beaded Table Placemats, Beaded Purse, Beaded Pencil Case and Borneo Crafts & Gifts
Privacy
There are lots of reasons a site owner might want to keep a search engine out of part of his site. Thanks for the great tutorial.
Disabling Google and Other Search Engines From Crawling a Site
I also don't understand why someone would want Google not to index his web site, when Google is one of the best sources of traffic.
Crawling of a site
Is it possible to allow crawling by a spider from one specific search engine only, like Google?
Robots.txt
Robots.txt is the preferred way of disabling search engines from crawling a site. But for blogs, where access to the root directory is usually not provided, the job becomes tedious. This is especially true if you have to hide some of your pages and show others.
Get more information on Organic search Engine optimization and how to get more hits on a blog through this ?
Sandy
How to increase Adsense income?
Great topic
I was a bit confused by the topic (I am still a bit confused). Why on earth would a webmaster choose not to have his pages crawled when Google is one of the best sources of traffic? I read somewhere that if you have several forms of the same content (such as an HTML version and a print version) then you apply it, but applying robots.txt to the whole site is absurd, I think.
Delhi india & Delhi travel guide
What legal right does Google have to crawl my website?
Hi,
While most discussion on the web is about how to get a website IN to Google, I recently had a client who did NOT want to be listed in Google (for whatever reason!).
So, he asked me, "Does Google have the legal right to crawl my website and list information in its search engine?"
My response was to tell him that once a website is created online it becomes part of the public domain and therefore search engines are entitled to (and do) visit the site with their crawlers, spiders, robots, etc.
Is this the correct legal answer?
Tony