Computerworld has a new report out about an old patent of Google's that is drawing some attention. It looks at ways Google might be ranking content in Google News, which is not only helpful for all publications trying to gain eyeballs from the aggregator, but interesting in light of how Google is dealing with unhappy publishers around the world.
Does Google News do a good job of ranking content as it should be ranked? Does it favor certain publications too much? Tell us what you think.
Computerworld reports that the application was filed a year ago and published in July, but patent analyst Bill Slawski points out that the updated version of the 2003 patent gets rid of some of the old media ideals.
Note: We've updated this article to reflect Slawski's point.
The patent describes a number of metrics, listed as: the number of articles produced by the news source during a given time period, an average length of an article from the news source, the importance of coverage from the news source, a breaking news score, usage pattern, human opinion, circulation statistics, the size of the staff associated with the news source, the number of news bureaus associated with the news source, the number of original named entities the news source produces within a cluster of articles, the breadth of coverage, international diversity, writing style, and the like.
In February of 2012, a new version of the Google patent was published as a pending application. (A second version was granted in 2012). The third version has the same name as the first version, and it has substantially the same description section as the first version. What’s different is the “claims” section. The claims section of the new version of the patent starts off with:
Gone are things like the “circulation statistics of the news source,” the “number of bureaus associated with the news source,” and other things associated with the kind of journalism that’s done in print.
Either way, the signals listed are worth taking a look at.
Now, it's important to note that just because these are listed as such in the patent, it does not mean that this is the exact recipe to Google's secret News sauce (which is separate from Google web search). Still, it does tell you some of the stuff Google might be thinking about when it comes to news stories. And of course, ranking in Google News can also put you in regular Google web results for hot news items, via Universal Search.
It's hard to say how much weight any one signal would be given, and that would likely fluctuate anyway, based on the strength of the remaining signals and on an article-by-article basis.
The article length metric is painted as a valid one in the Computerworld article, but I wonder how much weight that really should be given. Certainly it depends on the content of any particular article. Additional length does not always make a story better. Sometimes it's simply added fluff. More text from one source may not be as relevant as less text from the right source.
Here's what the patent says about that particular metric (labeled as the "second metric"):
The average length may be measured, for example, in words or sentences. In one implementation consistent with the principles of the invention, the second metric may be determined by determining the average length of non-duplicate articles produced by the news source. For example, it may be determined that the average length of an article from CNN is 300 words, while the average article length from Amateur News Network is 150 words. Therefore, the value of the second metric for CNN may be 300 and for Amateur News Network may be 150.
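To make that description concrete, here is a minimal Python sketch of how such an average might be computed. It is only an illustration of the quoted passage, not Google's implementation. The function name, the word-count measure, and the duplicate rule (an exact match after lowercasing and collapsing whitespace) are all assumptions, since the patent does not say how duplicates are detected.

```python
from collections import defaultdict

def average_article_length(articles):
    """Return {source: mean word count} over non-duplicate articles.

    `articles` is a list of (source, text) pairs. Duplicate detection here
    is an assumption: exact match after lowercasing and whitespace cleanup.
    """
    seen = set()
    totals = defaultdict(lambda: [0, 0])  # source -> [word_total, article_count]
    for source, text in articles:
        key = (source, " ".join(text.lower().split()))
        if key in seen:
            continue  # skip duplicates from the same source
        seen.add(key)
        totals[source][0] += len(text.split())
        totals[source][1] += 1
    return {s: words / count for s, (words, count) in totals.items()}

# Toy data mirroring the patent's CNN / Amateur News Network example.
articles = [
    ("CNN", "word " * 300),
    ("CNN", "word " * 300),  # exact duplicate, ignored
    ("Amateur News Network", "word " * 150),
]
scores = average_article_length(articles)
print(scores)
```

With this toy data, CNN ends up with a value of 300 and Amateur News Network with 150, matching the figures in the patent's example.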
So, based on that description, it would seem that regularly adding text to articles, even when it is not needed, would help one better compete with CNN for rankings. Of course, even assuming Google's secret sauce is made up of these metrics alone (and remember, "and the like" is listed with the metrics, leading one to believe there are more things Google is looking at), there's always the chance other signals will be used more strongly in some cases.
Google does like stories to have substance though. If nothing else, the Panda update taught the web as much. Still, as I discussed with ChaCha CEO (and Panda victim) Scott Jones recently, sometimes a quick answer is really better for the user. It really just depends on the case.
I wonder how valid the "number of articles produced by the news source during a given time period" metric is too. If given too much weight, one could see this signal easily burying an original source, which could come from anywhere. It wouldn't serve the niche blog (which might have a great deal more authority on a subject than a big news outlet like CNN) very well when it covers something first (because it is focused on said niche), if the story is later picked up by said big news outlet.
As Slawski points out, he pretty much made this case about the patent years ago.
This could, however, be offset by the "importance of coverage from the news source" metric, which appears to basically be how many articles a publication produces on a particular subject. For example, a publication writing 500 articles on the crash of the Columbia Shuttle (the example given in the patent) should rank better for this particular metric than a publication that only put out 10 articles about it. Depending on the story, this could actually benefit the industry-focused niche blog. Again, it comes down to how Google is weighing these signals against one another.
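As a rough illustration of that reading of the metric, the Python sketch below simply counts how many articles each source published on one topic. The function name, the topic labels, and the outlet names are hypothetical, and the patent may well define the metric differently in detail.

```python
from collections import Counter

def coverage_counts(articles, topic):
    """Count articles per source for a single topic.

    `articles` is a list of (source, topic) pairs; the topic labels are
    assumed to be assigned elsewhere (for example, by clustering).
    """
    return Counter(source for source, t in articles if t == topic)

# Toy data based on the 500-versus-10 example described above.
articles = (
    [("Big Outlet", "columbia")] * 500
    + [("Small Outlet", "columbia")] * 10
    + [("Small Outlet", "sports")] * 40  # other topics are not counted
)
counts = coverage_counts(articles, "columbia")
print(counts.most_common())
```

Here the larger outlet comes out ahead on the Columbia story, but a niche blog that writes heavily about its own subject would top the count for that subject in the same way.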
Google's Matt Cutts recently put out a video discussing news stories, specifically whether it's better to use one article or multiple articles for developing stories. I'm not sure you could call his take on the subject the definitive answer to such a question, but he seems to prefer the one-page route. Interestingly, this seems almost contrary to the signal described above. Of course, one could see how such a metric could be ripe for abuse, but that all depends on how Google is able to fight this kind of spam. I'd still recommend only writing relevant articles, and not just blasting out a bunch of useless stories about a subject.
You can see further description of each of the metrics by reading the patent here. Scroll down to the "Exemplary Processing" section.
Google recently launched a new News Keywords meta tag to give it an additional signal for ranking news content. This simply allows publications to include keywords they want their stories to be associated with, making it a little less important to have such keywords in a title. At least that's how Google portrayed the addition.
Do you think Google is currently doing a good job of getting the right stories in front of users? Do you often see examples of where Google is getting it wrong? Let us know in the comments.