Google put out a pretty interesting Webmaster Help video today with Matt Cutts answering a question about a topic a lot of people would like to understand better: how Facebook and Twitter affect Google rankings.
"Facebook and Twitter pages are treated like any other pages in our web index, and so if something occurs on Twitter or occurs on Facebook, and we're able to crawl it, then we can return that in our search results," he says. "But as far as doing special, specific work to sort of say, 'Oh, you have this many followers on Twitter or this many likes on Facebook,' to the best of my knowledge, we don't currently have any signals like that in our web search ranking algorithms."
"Now let me talk a little bit about why not," he continues. "We have to crawl the web in order to find pages on those two web properties, and we've had at least one experience where we were blocked from crawling for about a month and a half, and so the idea of doing a lot of special engineering work to try and extract some data from web pages, when we might get blocked from being able to crawl those web pages in the future is something where the engineers would be a little bit leery about doing that."
"It's also tricky because Google crawls the web, and as we crawl the web, we are sampling the web at finite periods of time. We're crawling and fetching a particular web page," he says. "And so if we're fetching that particular web page, we know what it said at one point in time, but something on that page could change. Someone could change the relationship status or someone could block a follower, and so it would be a little unfortunate if we tried to extract some data from the pages that we crawled, and we later on found out that, for example, a wife had blocked an abusive husband or something like that, and just because we happened to crawl at the exact moment when those two profiles were linked, we started to return pages that we had crawled."
Cutts says they worry a lot about identity because they're "sampling an imperfect web," and identity is simply hard.
"And so unless we were able to get some way to solve that impact, that's where we had better information, that's another reason why the engineers would be a little bit wary or a little bit leery of trying to extract data when that data might change, and we wouldn't know it because we were only crawling the web."
Funny, because they don't seem to be that leery about crawling Wikipedia content, which powers much of Google's Knowledge Graph, and from time to time leads to erroneous or otherwise unhelpful information being presented as the most appropriate answer to your query. Google has, in the past, presented bad Wikipedia info for hours after it was corrected on Wikipedia itself.
Cutts goes on to say that he's not discouraging the use of Twitter and Facebook, and that a lot of people get "a ton of value" from both Facebook and Twitter. He also notes that both are a "fantastic avenue" for driving visitors and traffic to your site, letting people know about news and building up your personal brand. Just don't assume that Google is able to access any signals from them.
He also says that over a "multi-year, ten-year kind of span," it's clear that people are going to know more about who is writing on the web. Google will be more likely to understand identity and social connections better over that time, he says.