Google has been making a big deal about wanting to be more transparent about its search algorithm lately (without revealing the secret sauce too much of course). And so far, I have to say they’re making good on that promise fairly well.
Is Google being transparent enough for your liking? Let us know in the comments.
We’ve seen plenty of algorithm announcements from the company over the course of the year. In November, they discussed ten recent changes they had made. Here’s a recap of those:
- Cross-language information retrieval updates: For queries in languages where limited web content is available (Afrikaans, Malay, Slovak, Swahili, Hindi, Norwegian, Serbian, Catalan, Maltese, Macedonian, Albanian, Slovenian, Welsh, Icelandic), we will now translate relevant English web pages and display the translated titles directly below the English titles in the search results. This feature was available previously in Korean, but only at the bottom of the page. Clicking on the translated titles will take you to pages translated from English into the query language.
- Snippets with more page content and less header/menu content: This change helps us choose more relevant text to use in snippets. As we improve our understanding of web page structure, we are now more likely to pick text from the actual page content, and less likely to use text that is part of a header or menu.
- Better page titles in search results by de-duplicating boilerplate anchors: We look at a number of signals when generating a page’s title. One signal is the anchor text in links pointing to the page. We found that boilerplate links with duplicated anchor text are not as relevant, so we are putting less emphasis on these. The result is more relevant titles that are specific to the page’s content.
- Length-based autocomplete predictions in Russian: This improvement reduces the number of long, sometimes arbitrary query predictions in Russian. We will not make predictions that are very long in comparison either to the partial query or to the other predictions for that partial query. This is already our practice in English.
- Extending application rich snippets: We recently announced rich snippets for applications. This enables people who are searching for software applications to see details, like cost and user reviews, within their search results. This change extends the coverage of application rich snippets, so they will be available more often.
- Retiring a signal in Image search: As the web evolves, we often revisit signals that we launched in the past that no longer appear to have a significant impact. In this case, we decided to retire a signal in Image Search related to images that had references from multiple documents on the web.
- Fresher, more recent results: As we announced just over a week ago, we’ve made a significant improvement to how we rank fresh content. This change impacts roughly 35 percent of total searches (around 6-10% of search results to a noticeable degree) and better determines the appropriate level of freshness for a given query.
- Refining official page detection: We try hard to give our users the most relevant and authoritative results. With this change, we adjusted how we attempt to determine which pages are official. This will tend to rank official websites even higher in our ranking.
- Improvements to date-restricted queries: We changed how we handle result freshness for queries where a user has chosen a specific date range. This helps ensure that users get the results that are most relevant for the date range that they specify.
- Prediction fix for IME queries: This change improves how Autocomplete handles IME queries (queries which contain non-Latin characters). Autocomplete was previously storing the intermediate keystrokes needed to type each character, which would sometimes result in gibberish predictions for Hebrew, Russian and Arabic.
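Of the changes above, the length-based autocomplete one is easy to picture in code. Here is a minimal Python sketch of the idea as Google describes it: drop predictions that are very long compared to either the partial query or the other predictions. Google has not published how it actually does this, so the function name `filter_long_predictions` and both thresholds are assumptions made up for illustration.

```python
from statistics import median


def filter_long_predictions(partial, predictions,
                            max_vs_partial=6.0, max_vs_peers=2.0):
    """Drop predictions that are very long relative to the partial
    query or to the other predictions.

    Both thresholds are illustrative assumptions, not published values.
    """
    if not predictions:
        return []

    # Typical prediction length for this partial query (guarded against 0).
    typical_len = max(median(len(p) for p in predictions), 1)
    # Guard against dividing by zero on an empty partial query.
    partial_len = max(len(partial), 1)

    kept = []
    for p in predictions:
        too_long_for_partial = len(p) / partial_len > max_vs_partial
        too_long_for_peers = len(p) / typical_len > max_vs_peers
        if not (too_long_for_partial or too_long_for_peers):
            kept.append(p)
    return kept


suggestions = filter_long_predictions(
    "red w",
    [
        "red widgets",
        "red wine",
        "red wagon",
        "red widgets for sale in my local area this weekend cheap",
    ],
)
print(suggestions)
```

With these made-up thresholds, the three short predictions survive and the long, rambling one is dropped. A real system would presumably weigh length against popularity and other signals rather than apply a hard cutoff.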
Now, they’ve put out a similar post on the Inside Search Blog, revealing ten more that have been made since that post.
Google lists them as follows:
- Related query results refinements: Sometimes we fetch results for queries that are similar to the actual search you type. This change makes it less likely that these results will rank highly if the original query had a rare word that was dropped in the alternate query. For example, if you are searching for [rare red widgets], you might not be as interested in a page that only mentions “red widgets.”
- More comprehensive indexing: This change makes more long-tail documents available in our index, so they are more likely to rank for relevant queries.
- New “parked domain” classifier: This is a new algorithm for automatically detecting parked domains. Parked domains are placeholder sites that are seldom useful and often filled with ads. They typically don’t have valuable content for our users, so in most cases we prefer not to show them.
- More autocomplete predictions: With autocomplete, we try to strike a balance between coming up with flexible predictions and remaining true to your intentions. This change makes our prediction algorithm a little more flexible for certain queries, without losing your original intention.
- Fresher and more complete blog search results: We made a change to our blog search index to get coverage that is both fresher and more comprehensive.
- Original content: We added new signals to help us make better predictions about which of two similar web pages is the original one.
- Live results for Major League Soccer and the Canadian Football League: This change displays the latest scores & schedules from these leagues along with quick access to game recaps and box scores.
- Image result freshness: We made a change to how we determine image freshness for news queries. This will help us find the freshest images more often.
- Layout on tablets: We made some minor color and layout changes to improve usability on tablet devices.
- Top result selection code rewrite: This code handles extra processing on the top set of results. For example, it ensures that we don’t show too many results from one site (“host crowding”). We rewrote the code to make it easier to understand, simpler to maintain and more flexible for future extensions.
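The “host crowding” check mentioned in the last item is the kind of post-processing that is simple to sketch. The Python below is only an illustration of the general idea (cap how many results any one site contributes while preserving ranked order). It is not Google’s code, and the function name `limit_results_per_host` and the default cap of two results per host are assumptions.

```python
from urllib.parse import urlparse


def limit_results_per_host(ranked_urls, max_per_host=2):
    """Keep ranked order, but allow at most max_per_host results per host.

    max_per_host is an illustrative assumption, not a published figure.
    """
    seen = {}   # host -> number of results already kept
    kept = []
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        count = seen.get(host, 0)
        if count < max_per_host:
            kept.append(url)
            seen[host] = count + 1
    return kept


results = limit_results_per_host([
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://a.example/3",
    "http://c.example/1",
])
print(results)
```

Here the third result from `a.example` is skipped, so results from `b.example` and `c.example` move up. Grouping by full hostname is a simplification; treating subdomains of one site as the same host would need extra logic.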
Seeing just these 20 tweaks listed all together, as changes that have just been made in the past month or so, really puts it into perspective just how much Google is adjusting the algorithm. That doesn’t even include the integration of Flight Search results announced after these updates.
Google says all the time that it makes over 500 changes to its algorithm each year, and that it has over 200 signals it uses to rank results. There is always a possibility that one of these changes or signals can have a major impact on your site, as many have found out this past year with the Panda update.
Even a huge company like Yahoo is at the mercy of Google’s algorithm when it comes to search visibility, and it has finally made some big adjustments with Associated Content, not unlike what Demand Media has done this year.
Last month, Google also indicated that it is testing algorithm changes that will look more at what appears above the fold of a webpage.
We’re getting close to a new year, and there’s no reason to expect Google’s changes to slow down. Google has been clear, however, that it aims to be more transparent about when these changes occur, and what those changes are. Granted, this transparency will only go so far, because Google will not make all of its signals known, since that would leave its results too open to gaming. That wouldn’t be good for anybody (except maybe Google’s competitors).
Google does say that these lists of algorithm changes are now a monthly series.
What do you think about the latest changes? Good or bad? Let us know in the comments.