
Is Googlebot Getting More Human-Like?

Google appears to be getting better at JavaScript and AJAX


Google may be getting better at crawling JavaScript and AJAX.

In a Tumblr post, developer Alex Pankratov wrote this week about spotting an “ajax request issued from document.ready() callback of one website’s pages.”

“This means that the bot now executes the Javascript on the pages it crawls,” Pankratov wrote. “The IP of 66.249.67.106 is crawl-66-249-67-106.googlebot.com and the A record is a match, so this is in fact a Google Bot.”
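Pankratov's check is a reverse DNS lookup on the IP followed by a forward lookup confirming that the hostname's A record points back to the same IP. This two-step test is the usual way to tell a genuine Googlebot from a client that merely sends a Googlebot user-agent string. What follows is a minimal Python sketch, assuming genuine crawler hostnames end in googlebot.com or google.com. The function name `is_verified_googlebot` is hypothetical. The resolvers are injectable so the demo can run offline against stub records.

```python
import socket
from typing import Callable, Iterable

# Assumption for this sketch: genuine Google crawler PTR names end in one of these.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def _reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP (first element of gethostbyaddr)."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> Iterable[str]:
    """Return the IPv4 addresses (A records) the hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse_lookup,
    forward_lookup: Callable[[str], Iterable[str]] = _forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm the A record."""
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so the IP cannot be verified
    if not hostname.endswith(GOOGLE_CRAWLER_DOMAINS):
        return False  # PTR name is outside Google's crawler domains
    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False
    # The A record must map back to the same IP; otherwise anyone who controls
    # their own PTR record could claim a googlebot.com name.
    return ip in addresses


if __name__ == "__main__":
    # Offline demo with stub resolvers mimicking the records Pankratov describes.
    ptr = {"66.249.67.106": "crawl-66-249-67-106.googlebot.com"}
    a_records = {"crawl-66-249-67-106.googlebot.com": ["66.249.67.106"]}
    print(is_verified_googlebot("66.249.67.106", ptr.__getitem__, a_records.__getitem__))
```

In production the default `socket` resolvers would be used. Caching the result per IP is worth considering, since every check costs two DNS round trips.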

He then shows a line, which he says “is fetched via Ajax by a Javascript function in response to the menu item click,” and adds, “Also, note the x argument – it is dynamically added and only by that specific function. This means that the bot now emulates a user clicking around the site and then seeing which actionable items lead to which additional pages.”
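Pankratov's inference rests on a simple property: if only one JavaScript click handler ever appends the `x` argument, any request carrying it must have come from a client that ran that handler. The Python sketch below applies that test to logged request targets. The URL paths and the `x` value are hypothetical stand-ins, and `request_came_from_js_handler` is an illustrative name, not part of any tool.

```python
from urllib.parse import parse_qs, urlsplit

# Marker parameter that, per Pankratov, only the site's menu-click handler adds.
JS_MARKER_PARAM = "x"


def request_came_from_js_handler(request_target: str, marker: str = JS_MARKER_PARAM) -> bool:
    """Return True if the logged request target carries the JS-only marker parameter.

    No static link on the site contains this parameter, so seeing it in a
    request implies the client executed the JavaScript that builds the URL.
    """
    query = urlsplit(request_target).query
    # keep_blank_values so that "?x=" still counts as the marker being present
    return marker in parse_qs(query, keep_blank_values=True)


# Hypothetical log entries: (client IP, request target).
log = [
    ("66.249.67.106", "/menu/products"),
    ("66.249.67.106", "/menu/products?x=1338412345"),
]
for ip, target in log:
    print(ip, target, request_came_from_js_handler(target))
```

The inference holds only while the parameter never appears in a static or external link, since a crawler following such a link would produce the same log entry.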

Sean Gallagher at Ars Technica equates this to Googlebot learning to read interactive pages more like humans. “It appears Google’s bots have been trained to act more like humans to mine interactive site content, running the JavaScript on pages they crawl to see what gets coughed up,” he writes.

Google has indicated that it is getting better at handling JavaScript and AJAX. A while back, Google’s Matt Cutts put out a video about how Google handles AJAX.

Cutts was asked, “How effective is Google now at handling content supplied via Ajax, is this likely to improve in the future?”

He responded, “Well, let me take Ajax, which is Asynchronous Javascript, and make it just Javascript for the time being. Google is getting more effective over time, so we actually have the ability not just to scan in strings of Javascript to look for URLs, but to actually process some of the Javascript. And so that can help us improve our crawl coverage quite a bit, especially if people use Javascript to help with navigation or drop-downs or those kinds of things. So Asynchronous Javascript is a little bit more complicated, and that’s maybe further down the road, but the common case is Javascript.”
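Cutts contrasts two techniques: scanning JavaScript source for strings that look like URLs, and actually processing the code. The toy Python scanner below is not Google's implementation. Its regex and the sample jQuery snippet are illustrative assumptions. It shows why string scanning misses URLs that exist only once the code runs, such as a menu URL with a dynamically appended `x` argument.

```python
import re

# Naive pattern: quoted string literals that look like absolute or site-relative URLs.
URL_LITERAL = re.compile(r"""["'](https?://[^"'\s]+|/[^"'\s]*)["']""")


def scan_js_for_urls(js_source: str) -> list[str]:
    """Return URL-like string literals that appear verbatim in JavaScript source."""
    return URL_LITERAL.findall(js_source)


# Hypothetical page script: one static link, one URL built at click time.
js = """
$(document).ready(function () {
    $('#nav a.static').attr('href', '/about');
    $('#menu li').click(function () {
        var page = $(this).data('page');
        $('#content').load('/menu/' + page + '?x=' + Date.now());
    });
});
"""
print(scan_js_for_urls(js))
```

The scanner recovers `/about` and only the fragment `/menu/`. The full `/menu/<page>?x=...` address is assembled at run time, so a crawler has to execute the click handler to discover it.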

“And we’re getting better, and we’re continuing to improve how well we’re able to process Javascript,” he continues. “In fact, let me just take a little bit of time and mention, if you block Javascript or CSS in your robots.txt, where Googlebot can’t crawl it, I would change that. I would recommend making it so that Googlebot can crawl the Javascript and can crawl the CSS, because that makes it a lot easier for us to figure out what’s going on if we’re processing the Javascript or if we’re seeing and able to process and get a better idea of what the page is like.”
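To see whether a robots.txt file blocks the JavaScript and CSS that Cutts wants Googlebot to reach, you can evaluate it with Python's standard `urllib.robotparser`. In the sketch below, the two robots.txt bodies, the `example.com` asset URLs, and the helper name `googlebot_can_fetch` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks script and style directories,
# the setup Cutts advises against.
BLOCKING = """\
User-agent: *
Disallow: /js/
Disallow: /css/
"""

# The same site with those rules removed, so crawlers can fetch the assets.
ALLOWING = """\
User-agent: *
Disallow: /admin/
"""


def googlebot_can_fetch(robots_txt: str, url: str) -> bool:
    """Parse a robots.txt body and ask whether the Googlebot user-agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


for name, rules in (("blocking", BLOCKING), ("allowing", ALLOWING)):
    for asset in ("https://example.com/js/menu.js", "https://example.com/css/site.css"):
        print(name, asset, googlebot_can_fetch(rules, asset))
```

This is only an approximation of Googlebot's behavior. Python's parser applies rules in file order, whereas Google resolves conflicting `Allow`/`Disallow` rules by the most specific match, so files that mix both directives can evaluate differently.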

Speaking of Googlebot, Google also put out a new video about the hardware and software that run it.

Update: Barry Schwartz at Search Engine Roundtable says, “Google has been doing this for a while. Back in 2009 GoogleBot was executing JavaScript and in November 2011 Google began doing so with AJAX.”

  • http://dess.me Denis

    That is exactly what I am led to believe! I am even afraid to search with keywords to check my site’s rank (position). I use another browser to check my search engine position because I know search engines and other websites follow what you are doing through the cookies they install in your browser. That’s just what I think. I am afraid because my sites have lost traffic twice.

  • http://seosultan.co.cc SEO Sultan

    It’s funny because I experienced this recently as well. I was pruning one of my clients’ sites and noticed that it had a lot of weird links indexed in Google. Upon investigating this with the client, we found that Google had been accessing the functions behind the site’s AJAX calls.

    Now, there are two possible ways this could have happened: 1) Google is getting better at tracking human behavior, or 2) someone linked to the AJAX calls, and Google found the link and spidered it.

    I want to believe it’s #1, but it could just as well be #2, although I was unable to find any trace or referrer information to prove that #2 was the case.

  • http://www.davidgriebel.com/ David Griebel

    Googlebot is indeed executing JavaScript.

    Even worse, it’s locating “Page Methods,” which are .NET’s simplified form of a web service. My page has JavaScript AJAX calls to the page method. Inside the page method, I have tracking which proves that IP 66.249.67.177 (.googlebot.com) called my page and then, 17 seconds later, found my page method and executed it.