Google Starts Controversial Form Crawling Program

Small number of sites in initial rollout

Googlebot has received an update that lets it complete certain forms and so learn more about the sites hosting them.

Websites place content behind forms in order to collect information from visitors requesting access to it. The site publisher might want that information as demographic data to improve marketing campaigns, for example.

Google thinks it can present better results to searchers by having access to the URLs behind forms, improving the site’s exposure in the process. The Google Webmaster Central blog promised that its crawls will be well-behaved:

Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won’t crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information.
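To illustrate the behavior Google describes, here is a minimal Python sketch of how a site owner might check whether the URL generated by a GET form would be off limits under a given robots.txt. It uses only the standard library. The site address, form paths, field names, and robots.txt contents are all hypothetical, and the check models only the rules quoted above, not Google's actual crawler logic.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the /search path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def form_url(action, fields):
    """Build the URL that submitting a GET form would request."""
    return f"{action}?{urlencode(fields)}"


def may_crawl(method, url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True only for GET forms whose URL robots.txt does not forbid."""
    if method.upper() != "GET":
        # The quoted policy says only GET forms are retrieved.
        return False
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A search form whose path is disallowed, and a catalog form that is not.
blocked = form_url("http://example.com/search", {"q": "rings"})
allowed = form_url("http://example.com/catalog", {"category": "rings"})

print(blocked, may_crawl("GET", blocked))
print(allowed, may_crawl("GET", allowed))
print(allowed, may_crawl("POST", allowed))
```

Under these assumptions, a single Disallow line covering the form's action path is enough to keep every URL that form could generate out of the crawl.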

However, concerns have been raised about Google crawling forms not marked as forbidden. Kevin Heisler complained at Search Engine Watch that the practice could violate the privacy of corporate data.

Though confident in Google’s intentions, Heisler thinks potential backlash from corporate interests could be a problem. “The costs to CEOs, CIOs and CTOs at corporations far outweigh the benefits to consumers,” he said.

  • http://www.netndx.com sean

    before too long gbot will break through captchas and start signing up as a user. All in the interest of helping the consumer ;)   

    • jojo

      Those dang captchas.  I could hardly read half of them ‘specially  them warpy funky funhouse mirror ones.  Anyway Google’s intentions are good. Google knows where to draw the line and is fair about it.   It’s all for and about the benefit for users of the internet.

      • http://datawebpro.com/news VIP2020

        "It’s all for and about the benefit for users of the internet."


        For Google it’s really all about control of information and money.

  • http://www.heaven4affiliates.com Josip Barbaric

    I really wonder where this will lead!?


    I’m sure Google has the best intentions, and that is to get more coverage of the web, but I don’t like it!!!

  • http://www.liselibozkurtlar.com serhat yalma

    Hi, my name is Serhat. My website isn’t in Google :( Please help me.

  • http://www.acneskinguide.com/ Acne

    Sounds like something Google would want to do. Why on Earth would they need information about the website by running and utilizing a form?

    Sounds pretty ridiculous.

  • http://www.2bjewelled.com bjewelled

    to "robots.txt, nofollow, and noindex directives."?

    If so, could somebody explain why I have many pages with internal links from every other page, all but one marked "nofollow", yet Google lists the number of internal links as every page?

    Not particularly bothered, just curious.

    Otherwise, this sounds like more "personalized search" nonsense. Personally, if I am searching the web, I like to make up my own mind as to what I consider relevant, not have search engines trying to guess. Just because I search one topic in one way does not mean I am searching the next topic in the same way.

  • http://www.directline-holidays.co.uk directline Holidays & Flights

    Although shrewd SEOptimisers have moved dynamic content into spider-friendly pages, some of the most detailed and rich content on the web is still ‘hidden behind’ forms.

    I fail to see any value to Google in ‘hacking’ areas of websites that the website owner wishes to keep private. I suspect that they would only follow forms that have specific drop-down options (not free text fields).

  • http://www.logodesignnext.com/ logodesigner

    This kind of sucks as now we’d have to noindex, nofollow even those pages which are behind forms.

  • http://www.acneskinguide.com/ Acne

    I think that the move to try and understand every little aspect of a website in the index is a little ridiculous for Google. What is the purpose of it? Do they not have enough data from the 1000 other data avenues to evaluate websites?

  • http://www.paraibainternational.com Tanzanite

    What is the point of this move on Google’s part? I think this type of technology will ultimately put at risk the very privacy we have.

  • http://www.mothersfamilyrings.com Mothers Rings

    When doing a search in Yahoo for something, I received one listing that had a warning that the site sends unsolicited emails. Possibly Yahoo found this out by spidering the form on their site?
