Bing Details How To Control BingBot Crawling Behavior
BingBot is the cute name that Microsoft gives the bot that crawls the Web, indexing the content that is returned to users of its search function. Sometimes BingBot indexes content you never wanted indexed, and sometimes it doesn’t index enough. Microsoft has some tips to make your Web site BingBot friendly so that the right amount of content is always crawled.
I’m sure that any Webmaster worth their domain is aware of robots.txt files. It’s the file that one uses to control the behavior of Web crawlers. It essentially tells the bot what to crawl and what not to crawl.
It’s important to note that BingBot will look for the robots.txt file at each subdomain. Say that you have a Web page at http://us.contoso.com/products.htm; BingBot will look for the file at http://us.contoso.com/robots.txt. Note that this URL is for the U.S. subdomain. If you find yourself hosting multiple subdomains for multiple countries, you’re going to need a robots.txt file for each subdomain. If a subdomain doesn’t have one, BingBot will just assume it’s invited in and start crawling all those cyberskeletons in your subdomain’s closet.
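To make the per-subdomain rule concrete, here is a minimal Python sketch that works out where a crawler would look for robots.txt given any page URL. The helper name robots_txt_url is made up for this illustration, and the uk.contoso.com address is a hypothetical second subdomain; the code only mirrors the lookup described above, it is not BingBot's own logic.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location for the host that serves page_url.

    Hypothetical helper: keeps the scheme and host (including the
    subdomain) and swaps the path for /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The example page from the article.
print(robots_txt_url("http://us.contoso.com/products.htm"))
# A hypothetical second subdomain gets its own, separate file.
print(robots_txt_url("http://uk.contoso.com/products.htm"))
```

Each distinct host maps to a distinct robots.txt URL, which is why every subdomain needs its own file.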
Even though it might start to dig around your domain looking for anything and everything, BingBot is not a rude guest. It will find what it came for and not be back for a few hours. After it’s been gone for a while, it will come back and see if the robots.txt file has changed since it last visited.
Of course, BingBot proves that it’s not the great neighbor that you previously thought, because if your robots.txt file has no set of directives addressed to it specifically, it will assume it can use the default set. That’s why the Bing team suggests that Webmasters tell BingBot the “URL paths where you want them to crawl, and the URL paths you want them not to crawl.” They also suggest that you keep only one default set of directives to reduce the number of errors.
If BingBot’s neighborly advances have won you over, you have the option of allowing only BingBot into your Web site. You only have to change a few lines in the robots.txt file to let BingBot in while keeping everybody else out. BingBot will still follow only one set of directives, though, so be sure to guide it to what you want it to see, or else BingBot will revert to its default actions.
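A BingBot-only setup might look something like the robots.txt text embedded in the sketch below: one set of directives addressed to bingbot and one default set that shuts everybody else out. The /private/ path and the SomeOtherBot name are invented for the example, and Python's standard urllib.robotparser is used purely as a stand-in to check the rules. It is not BingBot's parser, so treat the results as a sanity check rather than a guarantee of how Bing will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: BingBot may crawl everything except /private/,
# while the default set (User-agent: *) keeps every other bot out.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# BingBot follows its own set of directives, not the default set.
bing_products = parser.can_fetch("bingbot", "http://us.contoso.com/products.htm")
bing_private = parser.can_fetch("bingbot", "http://us.contoso.com/private/notes.htm")
# Any bot without its own set falls back to the default set.
other_products = parser.can_fetch("SomeOtherBot", "http://us.contoso.com/products.htm")

print(bing_products, bing_private, other_products)
```

Note that the bingbot set has to carry every rule you want BingBot to obey; nothing from the default set carries over to it.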
If you don’t want BingBot hitting your Web site too hard, it does honor the Crawl-delay directive, which limits how quickly it can crawl your pages. It’s important to note that Crawl-delay is not a crawl rate; it sets the size of a time window in which BingBot will crawl your site only once. Set the delay to five seconds and BingBot will fetch at most one page in each five-second window. Subdomains play a role again in that each robots.txt file is going to need its own Crawl-delay directive or BingBot will just go off acting on its own again.
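For a feel of what a five-second delay means in practice, the sketch below reads a Crawl-delay value with Python's urllib.robotparser (again only a stand-in for BingBot's own parser) and turns it into a rough daily ceiling. The arithmetic assumes at most one fetch per window, which is an interpretation of Bing's guidance rather than a published formula, and the real number of pages crawled will usually be lower.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for one subdomain with a five-second delay.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Crawl-delay as an integer number of seconds for the bingbot set.
delay = parser.crawl_delay("bingbot")

# Assumption: one fetch per window, so the number of windows in a day
# is an upper bound on the pages BingBot could fetch from this host.
SECONDS_PER_DAY = 24 * 60 * 60
max_fetches_per_day = SECONDS_PER_DAY // delay

print(delay, max_fetches_per_day)  # prints "5 17280"
```

A bigger delay shrinks that ceiling quickly, so a large site that sets the value too high may find that BingBot never gets around to all of its pages.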
As a final note, Bing Webmaster Tools has a feature that allows you to control when BingBot is allowed to visit your site. It’s best to have BingBot visit more frequently during off hours to prevent any interference with high traffic. This is all determined by a positive or negative factor that BingBot will apply to itself automatically. Be warned, though: if you set a Crawl-delay in the robots.txt file, it will override the Bing Webmaster Tools setting.
As you can see, BingBot is a good neighbor, but only if you train it to be. The tools presented here should help you give BingBot the directives it needs to crawl only the important parts of your Web site. If not, it will just follow the default directive and stumble around your Web site crawling all over the place. You don’t want that, do you?