Bing Details How To Control BingBot Crawling Behavior

    May 4, 2012
    Zach Walton
    Comments are off for this post.

BingBot is the cute name that Microsoft gives to the bot that crawls the Web, indexing the content that is returned to users of its search function. BingBot might index unwarranted content, or it might not index enough content. Microsoft has some tips to make your Web site BingBot friendly so that the right amount of content is always crawled.

I’m sure that any Webmaster worth their domain is aware of robots.txt files. It’s the file that one uses to control the behavior of Web crawlers. It essentially tells the bot what to crawl and what not to crawl.
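As a quick sketch, a minimal robots.txt (the directory names here are hypothetical) tells every crawler which paths to skip:

```
# Applies to all crawlers, including BingBot
User-agent: *
# Keep crawlers out of this (hypothetical) directory
Disallow: /private/
# Everything else may be crawled
Allow: /
```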

It’s important to note that BingBot will look for the robots.txt file at each subdomain. Say that you have a Web site like http://us.contoso.com/products.htm; BingBot will look for the file at http://us.contoso.com/robots.txt. Note that this URL is for the U.S. subdomain. If you find yourself hosting multiple subdomains for multiple countries, you’re going to need a robots.txt file for each subdomain. If you don’t have one, BingBot will just assume it’s invited in and start crawling all those cyberskeletons in your subdomain’s closet.

Even though it might start to dig around your domain looking for anything and everything, BingBot is not a rude guest. It will find what it came for and not be back for a few hours. After it’s been gone for a while, it will come back and see if the robots.txt file has changed since it last visited.

Of course, BingBot proves that it’s not the great neighbor you previously thought it was, because it will fall back to the default set of directives if there is no set specifically for it. That’s why the Bing team suggests that Webmasters tell BingBot the “URL paths where you want them to crawl, and the URL paths you want them not to crawl.” They also suggest that you keep only one default set of directives to reduce the number of errors.
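One way to follow that advice is to keep a single default section and add a BingBot-specific section only when its rules actually differ (the paths below are made up for illustration):

```
# Default directives – used by any crawler without its own section
User-agent: *
Disallow: /admin/

# Directives BingBot will use instead of the default set
User-agent: bingbot
Disallow: /admin/
Disallow: /drafts/
```

Note that a crawler with its own section ignores the default set entirely, which is why the BingBot section repeats the /admin/ rule.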

If all the neighborly advances of BingBot have won you over, you have the option of allowing only BingBot into your Web site. You only have to change a few lines in the robots.txt file to let BingBot in while keeping everybody else out. BingBot will still only follow one set of directives, though, so be sure to guide it to what you want it to see or else it will revert to its default actions.
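Assuming BingBot’s user-agent token of “bingbot,” a robots.txt that lets only BingBot in while shutting everyone else out could be sketched like this:

```
# BingBot may crawl everything (an empty Disallow means no restrictions)
User-agent: bingbot
Disallow:

# All other crawlers are blocked from the entire site
User-agent: *
Disallow: /
```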

If you don’t want BingBot digging too deep into your Web site, it does honor the Crawl-delay directive, which limits the rate at which it crawls pages. It’s important to note that Crawl-delay only sets the length of the time windows in which BingBot is allowed to crawl your pages. Set the delay to five seconds and BingBot will crawl your Web site in small five-second windows. Subdomains play a role again in that each robots.txt file is going to need its own Crawl-delay directive, or BingBot will just go off acting on its own again.
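A five-second Crawl-delay for BingBot would be set like so (remember, each subdomain’s robots.txt needs its own copy):

```
User-agent: bingbot
# BingBot crawls in small five-second windows
Crawl-delay: 5
```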

As a final note, Bing Webmaster Tools has a feature that allows you to control when BingBot is allowed to visit your site. It’s best to have BingBot visit more frequently during off hours to prevent any interference with high traffic. This is all determined by a positive or negative factor that BingBot will apply to itself automatically. Be warned, though – if you set a Crawl-delay in the robots.txt file, it will override the Bing Webmaster Tools setting.

As you can see, BingBot is a good neighbor, but only if you train it to be. The tools presented here should help you give BingBot the directives it needs to crawl only the important parts of your Web site. If not, it will just follow the default directive and stumble around your Web site crawling all over the place. You don’t want that, do you?

  • http://www.ssrichardmontgomery.com ron

    (1) If you don’t want people/robots to see it, don’t have it on the site.
    (2) As in show business, all publicity is good publicity; for websites, there might be a term the robot picked up that somebody is looking for, which it would not have seen had you put a restriction on it.
    (3) Take this post: the mere fact of it being here will lead somebody, or a robot that did not know my site existed, to look at it.
    (4) Prove me right – go have a little look, or a big one! (grin)

  • http://www.epstweb.com EP System Technologies

    Controlling the aspects of the robots.txt file is a formula all website development companies apply as second nature… However, if there are materials or useless pages that you do not want crawled, remove them if they are not essential. This will not only improve your content files indexed, but your overall SEO formula…

    David McGuigan – CEO
    EP System Technologies

  • http://www.cyrustechnoedge.com/ Sachin Kumar

    If you want to grow your PR, then make a search-engine-friendly website; content must be fresh.

  • http://www.webtuti.com Camisetas futbol baratas

    I am having a hard time indexing my website on Bing and Yahoo. I already signed in to Bing Webmaster Tools, submitted a sitemap, and also added some URLs to be fetched, but still no progress. Webmaster Tools shows that my website search results are increasing, but when I search for the keyword I don’t see my website anywhere.

    Please help!!