Google Starts Controversial Form Crawling Program
Googlebot received an update that permits it to complete certain forms, and learn more about the site hosting them.
Websites place content behind forms to collect information from visitors requesting access to it. The site publisher might want demographic details to improve marketing campaigns, for example.
Google thinks it can present better results to searchers by having access to the URLs behind forms, improving the site’s exposure in the process. The Google Webmaster Central blog promised that its crawls would be well-behaved:
Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won’t crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information.
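In practice, that means Googlebot will only submit forms like the simple GET search form below, and a publisher who wants to opt out can block the form’s handler in robots.txt. The paths and names here are illustrative, not taken from any specific site:

```
<!-- A GET form Googlebot could crawl; POST forms are ignored -->
<form action="/search" method="get">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
```

Blocking the handler keeps any generated URLs (such as /search?q=widgets) out of the crawl:

```
# robots.txt — forbid crawling of the search handler
User-agent: Googlebot
Disallow: /search
```

A noindex meta tag on the result pages, or nofollow on links to the form, would have the same opt-out effect per Google’s stated policy.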
However, concerns have been raised about Google crawling forms that are not explicitly marked as forbidden. At Search Engine Watch, Kevin Heisler complained that the practice could violate the privacy of corporate data.
Though confident in Google’s intentions, Heisler thinks a backlash from corporate interests could be a problem. “The costs to CEOs, CIOs and CTOs at corporations far outweigh the benefits to consumers,” he said.