Is Googlebot Getting More Human-Like?


Google may be getting better at crawling JavaScript and Ajax.

In a Tumblr post, developer Alex Pankratov wrote this week about spotting an “ajax request issued from document.ready() callback of one website’s pages.”

“This means that the bot now executes the Javascript on the pages it crawls,” Pankratov wrote. “The IP of 66.249.67.106 is crawl-66-249-67-106.googlebot.com and the A record is a match, so this is in fact a Google Bot.”
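The check Pankratov describes, a reverse lookup on the IP followed by a forward lookup that must return the same address, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the injectable lookup parameters, and the decision to accept only hostnames ending in .googlebot.com are choices made for this example, not something taken from his post.

```python
import socket

# Assumption for this sketch: only hostnames under googlebot.com count,
# matching the hostname quoted in the article.
CRAWLER_SUFFIXES = (".googlebot.com",)


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a forward-confirmed reverse DNS check.

    The lookup functions are injectable (hypothetical parameters) so the
    logic can be exercised without touching the network.
    """
    if reverse_lookup is None:
        # PTR lookup: IP address -> primary hostname
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # A lookup: hostname -> list of IPv4 addresses
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record, or the lookup failed

    # The leading dot in the suffix rejects names like "evilgooglebot.com".
    if not hostname.lower().rstrip(".").endswith(CRAWLER_SUFFIXES):
        return False

    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False

    # The A record must point back at the IP that made the request.
    return ip in addresses
```

Calling is_verified_googlebot("66.249.67.106") with the default lookups performs live DNS queries, so the result depends on the current state of DNS. The reverse lookup alone is not enough, because whoever controls an IP block can set its PTR record to any name; the forward lookup is what ties the hostname back to the address.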

He then shows a line, which he says “is fetched via Ajax by a Javascript function in response to the menu item click,” and adds, “Also, note the x argument – it is dynamically added and only by that specific function. This means that the bot now emulates a user clicking around the site and then seeing which actionable items lead to which additional pages.”
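Pankratov's inference rests on the point that a crawler which only scanned script text for URL strings could not have produced a request carrying a parameter that exists only at run time. The Python sketch below illustrates that gap. The regular expression, the sample script, and every name in it (loadItem, /items, the id parameter) are hypothetical stand-ins; the actual line from his post is not reproduced here, and this is not a description of how Googlebot works.

```python
import re

# Matches quoted string literals that look like absolute URLs or
# root-relative paths. Deliberately simple; real JavaScript needs a parser.
URL_LITERAL = re.compile(r"""["']((?:https?://|/)[^"'\s]+)["']""")


def scan_for_urls(js_source):
    """Return URL-like string literals found by a purely static scan."""
    return URL_LITERAL.findall(js_source)


# Hypothetical page script: the final URL is assembled only when it runs.
EXAMPLE_JS = '''
var menu = { home: "/index.html", docs: "https://www.example.com/docs" };
function loadItem(id) {
  var url = "/items" + "?id=" + id + "&x=" + Date.now();
  return url;
}
'''

found = scan_for_urls(EXAMPLE_JS)
# The static scan sees the literal "/items" but never the full request URL
# with the x argument, because that value is added at run time.
```

A request that arrives with the x argument attached therefore suggests the function was actually executed rather than read as text.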

Sean Gallagher at Ars Technica equates this to Googlebot learning to read interactive pages more like humans. “It appears Google’s bots have been trained to act more like humans to mine interactive site content, running the JavaScript on pages they crawl to see what gets coughed up,” he writes.

Google has indicated that it is getting better at handling JavaScript and Ajax. Google’s Matt Cutts put out a video a while back about how Google handles Ajax.

Cutts was asked, “How effective is Google now at handling content supplied via Ajax, is this likely to improve in the future?”

He responded, “Well, let me take Ajax, which is Asynchronous Javascript, and make it just Javascript for the time being. Google is getting more effective over time, so we actually have the ability not just to scan in strings of Javascript to look for URLs, but to actually process some of the Javascript. And so that can help us improve our crawl coverage quite a bit, especially if people use Javascript to help with navigation or drop-downs or those kinds of things. So Asynchronous Javascript is a little bit more complicated, and that’s maybe further down the road, but the common case is Javascript.”

“And we’re getting better, and we’re continuing to improve how well we’re able to process Javascript,” he continues. “In fact, let me just take a little bit of time and mention, if you block Javascript or CSS in your robots.txt, where Googlebot can’t crawl it, I would change that. I would recommend making it so that Googlebot can crawl the Javascript and can crawl the CSS, because that makes it a lot easier for us to figure out what’s going on if we’re processing the Javascript or if we’re seeing and able to process and get a better idea of what the page is like.”
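One way to act on that advice is to test a robots.txt file against the script and stylesheet URLs a site serves. The sketch below uses Python's standard urllib.robotparser module; the helper name blocked_assets, the sample rules, and the example.com URLs are made up for illustration. Python's parser does plain prefix matching, so treat the result as an approximation of how any particular crawler reads the same file.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the URLs in `asset_urls` that `robots_txt` disallows."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so nothing is fetched over the network.
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that hides scripts and stylesheets from crawlers.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /js/
Disallow: /css/
"""

# Hypothetical asset URLs for the same site.
EXAMPLE_ASSETS = [
    "https://www.example.com/js/menu.js",
    "https://www.example.com/css/site.css",
    "https://www.example.com/index.html",
]

blocked = blocked_assets(EXAMPLE_ROBOTS, EXAMPLE_ASSETS)
# With these rules, the script and the stylesheet are reported as blocked,
# which is the situation Cutts recommends changing.
```

Removing the two Disallow lines, or adding a more specific group for Googlebot that permits those paths, would empty the list.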

Speaking of Googlebot, Google also put out a new video about the hardware and software that run it.

Update: Barry Schwartz at Search Engine Roundtable says, “Google has been doing this for a while. Back in 2009 GoogleBot was executing JavaScript and in November 2011 Google began doing so with AJAX.”
