In the latest "Webmaster Help" video from Google, Matt Cutts talks about what goes into evaluating the quality of an algorithm change before it is pushed out.
Here's the question he was answering:
What are some of the metrics that Google uses to evaluate whether one iteration of the ranking algorithm is delivering better quality results to users than another?
"Whenever an engineer is evaluating a new search quality change, and they want to know whether it's an improvement, one thing that's useful is we have hundreds of quality raters, who have previously rated URLs as good or bad, spam, all these sorts of different things," says Cutts. "So when you make a change, you can see the flux. You can say, 'What moves up? What moves down?' And you can look at example searches where the results changed a lot, for example, and you can say, 'Okay, given the changed search results, take the URLs that moved up - were those URLs typically higher rated than the URLs that moved down by the search quality raters?'"
"And sometimes, since these are pre-computed numbers (as far as the ratings - we've already got a stored data bank of all those ratings from all the raters that we have), sometimes you'll have question marks or these empty areas where things haven't been rated," he continues. "So you can also send that out to the raters, get the results of either a side-by-side, or you could look at the individual URLs, and in a side-by-side they say, 'This set of search results is better,' or 'This set is better,' or they might say, 'This URL is good,' or 'This URL is spam,' and you use all that to assess whether you're making good progress."
"If you make further along, and you're getting close to wanting to launch something, often times you'll launch what's called a 'live experiment,' where you actually take two different algorithms (say the old algorithm and the new algorithm), and you actually take results that would be generated by one, and then the other, and for example, you might interweave them," Cutts continues. "Then, if there are more clicks on the newer set of search results, then you tend to say, 'You know what? This newer set of search results generated by this algorithm might be a little bit better than this other algorithm). And that's great, except for example, in webspam, people love to click on spam, and so sometimes our metrics look a little bit worse in webpspam, because people click on the spam, and we're like, 'Well, we got less spam, and therefore it looks like people don't like the new algorithm as much.' So you have to take all of those ratings with a little bit of a grain of salt because nothing replaces your judgment, and the judgment of the quality launch committee, but we do have a lot of different metrics.'"
Here's a video Google put out a few years ago showing how it makes "improvements" to its algorithm.