Google Blog Search Indexing Content Differently

    December 4, 2008
    Chris Crum

Google Blog Search has changed how it indexes content. Previously it indexed the content of RSS feeds; now it indexes the full content of each blog page.


The new approach is not without bugs, though. One user reported in a Google Groups post that searches for her name were returning results from blogs that merely listed her in their blogrolls. This drew a response from Jeremy Hylton of the Google Blog Search Team:

We have changed the way we index blog posts to include the full content of the page.  We’ve had occasional complaints about the use of the feed content, particularly the problem with partial feeds that you mentioned.  The indexing change has improved the results for a lot of queries, both because we have the full content of the page and because we extract links that are missing from the feeds.  The downside of this change is that we see more results that match only the blogroll and other parts of the page that are common to all of a blog’s posts.

We expected some problems from blogroll matches, but may have underestimated the impact on searches using the link: operator or where the query matches a blog or blogger’s name.  We do expect to fix the problem you’re seeing.  We’ll use the full page content, but exclude the content that isn’t really part of the post.  I’m not sure if we’ll be able to make the change before the end of the year, but we are working on it and are pretty confident that it can be solved.
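The fix Hylton describes, keeping the full page content but excluding what is not really part of the post, can be illustrated with a small sketch. This is not Google's method, which the post does not detail. It is a minimal illustration that assumes each page has already been split into text blocks, and that any block appearing on every post of a blog (such as a blogroll) is shared page furniture. The function name `strip_shared_blocks` and the input layout are hypothetical.

```python
from collections import Counter


def strip_shared_blocks(posts):
    """Drop text blocks that appear on every post of a single blog.

    `posts` maps a post identifier to the list of text blocks extracted
    from that post's page (hypothetical input format). Blocks present on
    all posts are assumed to be blogroll or sidebar content, not post text.
    """
    total = len(posts)
    if total < 2:
        # With a single post there is nothing to compare against,
        # so keep everything rather than discard the whole page.
        return {post_id: list(blocks) for post_id, blocks in posts.items()}

    # Count each distinct block once per post.
    seen_in = Counter()
    for blocks in posts.values():
        seen_in.update(set(blocks))

    shared = {block for block, count in seen_in.items() if count == total}

    return {
        post_id: [block for block in blocks if block not in shared]
        for post_id, blocks in posts.items()
    }
```

With two posts that both carry a blogroll entry for a blogger, only the post that actually mentions her in its body would still match a search for her name after the shared block is removed. A real system would need something more tolerant than exact matching, since sidebars can change between posts.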

Barry Schwartz mentions in a comment on Vanessa Fox’s blog that he has been seeing these changes for about a month, so they have evidently been in place for a while without attracting much attention.