Last year, we saw the emergence of PubSubHubbub, a technology that notifies subscribers in real time when content is published or updated. There has recently been talk of Google developing a system that would use this technology in its indexing process.
Do you want your content indexed instantly? Share your thoughts.
In fact, Google’s Matt Cutts spoke with WebProNews about this, among other things:
"Maybe some small site, you might only find a chance to crawl its pages once a week, but if that site is blogging like every 20 minutes, boom, you hit the submit button, and the search engines can find out about it," explained Cutts.
"Now the tension is that more spammers would use this as well, so you can’t just say, ‘I’m gonna index everything that everybody pushes to me.’ So finding the right balance there is tricky, but the potential is really, really exciting," he said.
"You can definitely imagine the reputable blogs getting very fast updates – the ones that we think are trustworthy, and then over time, maybe ramping that up, so that more and more people have the ability to do…just like, instant indexing," he says.
And here we see another way Google may end up looking at the trust factor with regard to ranking.
Can We Learn from How Google Does Real-Time Search?
Liz Gannes at GigaOm recapped a few things Google senior product manager Dylan Casey said at SMX last month:
Casey said perhaps the most complex project in real time is to determine when to trigger the appearance of real-time results in search results. "We have huge internal debates on: Is this a good answer to this question, or are we just creating a tool for low-quality content?" he said.
Casey spent some effort justifying Google paying to include Twitter’s real-time firehose of tweets, saying it was an intensive technical integration on both sides, and that tweets are a fundamentally different form of communication due to the restrictions of their form. For example, Google has developed a ‘complex system’ for removing users’ public tweets that are later deleted or marked private.
Earlier this year, Amit Singhal, who has led development of real-time search at Google, talked about how Google ranks tweets. According to him, Google ranks tweets by followers to an extent, but it’s not just about how many followers you have. It’s about how reputable those followers are.
Singhal likens the system to the well-known Google system of link popularity. Getting good links from reputable sources helps your content in Google, so having followers with that same kind of authority theoretically helps your tweets rank in Google’s real-time search.
"One user following another in social media is analogous to one page linking to another on the Web. Both are a form of recommendation," Singhal says. "As high-quality pages link to another page on the Web, the quality of the linked-to page goes up. Likewise, in social media, as established users follow another user, the quality of the followed user goes up as well."
Google’s current real-time search product is separate from the PubSubHubbub-based system, which isn’t in place yet. Still, Matt’s comments about trustworthy blogs indicate to me that trust is going to be key in being able to push content to Google’s index in real time. So, I wonder if a strategy similar to how Google ranks its current real-time and Twitter results will be employed in determining this kind of trust.
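To make Singhal’s analogy concrete, here is a minimal sketch of a PageRank-style score computed over a follower graph instead of a link graph. This is not Google’s actual ranking method, which has not been published in detail. The function name follower_rank, the damping value, and the sample accounts are all assumptions made purely for illustration.

```python
def follower_rank(follows, damping=0.85, iterations=50):
    """Score users PageRank-style: a follow counts like a link.

    `follows` maps each user to the set of users they follow.
    Hypothetical illustration only, not Google's real-time ranking.
    """
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    n = len(users)
    rank = {user: 1.0 / n for user in users}

    for _ in range(iterations):
        # Every user starts each round with the same small baseline.
        new_rank = {user: (1.0 - damping) / n for user in users}
        for user in users:
            followed = follows.get(user, ())
            if followed:
                # A user passes reputation on to the accounts they follow.
                share = damping * rank[user] / len(followed)
                for target in followed:
                    new_rank[target] += share
            else:
                # A user who follows nobody spreads reputation evenly.
                share = damping * rank[user] / n
                for target in users:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up accounts: dave and frank each have exactly one follower,
# but dave's follower (carol) is herself followed by two people.
EXAMPLE_FOLLOWS = {
    "alice": {"carol"},
    "bob": {"carol"},
    "carol": {"dave"},
    "erin": {"frank"},
}
ranks = follower_rank(EXAMPLE_FOLLOWS)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(user, round(ranks[user], 3))
```

In this toy graph, dave and frank have the same follower count, yet dave ends up with the higher score because his one follower is better established. That is the "it’s about how reputable those followers are" idea in miniature.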
Does This Mean If You’re Not Trusted You Won’t Get Indexed?
"PuSH wouldn’t likely replace crawling, in fact a crawl would be needed to discover PuSH feeds to subscribe to, but the real-time format would be used to augment Google’s existing index," says Marshall Kirkpatrick, who spoke in a session on the real-time web at SXSW, which also included Google’s Brett Slatkin, one of the guys responsible for PuSH (he’s in the following video explaining the technology in simple terms).
Lots of sites out there already have PuSH technology in place. For example, WordPress and Typepad blogs have the ability to "PuSH" their content. That’s a lot of content in itself, much of it user-generated, which means the potential for spam is huge. That is why the trust factor is so important.
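For readers curious about what "PuSHing" looks like on the wire, here is a rough sketch based on the PubSubHubbub 0.3 conventions as I understand them: a feed advertises its hub with a link element whose rel is "hub", and a publisher notifies that hub with a form-encoded POST carrying hub.mode=publish and hub.url. The feed, the example.com addresses, and the helper names find_links and build_publish_ping are made up for illustration. Blog platforms that support PuSH handle this step for you.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import Request

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical Atom feed; both URLs are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <link rel="self" href="https://blog.example.com/feed"/>
  <link rel="hub" href="https://hub.example.com/"/>
</feed>"""


def find_links(feed_xml, rel):
    """Return the href of every top-level Atom link with the given rel."""
    root = ET.fromstring(feed_xml)
    return [
        link.get("href")
        for link in root.findall(ATOM + "link")
        if link.get("rel") == rel and link.get("href")
    ]


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) the 'new content' notification for a hub."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url})
    return Request(
        hub_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# A crawler (or a subscriber) would discover the hub like this...
hubs = find_links(SAMPLE_FEED, "hub")
topic = find_links(SAMPLE_FEED, "self")[0]

# ...and the publisher would ping it whenever a new post goes live.
ping = build_publish_ping(hubs[0], topic)
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
# Passing `ping` to urllib.request.urlopen would send it; that call is
# left out so the sketch runs without touching the network.
```

Note that the ping itself carries no content, only the address of the feed that changed. The hub still has to fetch the feed and decide what to do with it, which is where the trust question comes in.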
If PuSH is to be heavily utilized by the search engines, and you want your content indexed as quickly as possible, you’re going to want to do what you can to build community trust and a solid reputation. That is one more reason to engage in meticulous online reputation management, put out great content, and engage with the community.
Do you want to see Google index the web in real-time? Discuss here.