Freshness Random At Major Search Engines
Google, Yahoo, and Microsoft varied so much over a three-year study period that researchers found none of them consistently rated best at updating their indexes.
Who’s the tops at keeping their search index smelling fresh? One might think Google is the answer these days, based on anecdotal evidence.
But a report cited by Resource Shelf found the best search engine for index freshness depended on when one asked the question.
Dirk Lewandowski of Hamburg University of Applied Sciences and other researchers considered the question in a forthcoming paper from the UK-based Journal of Information Science. They collected data over six-week spans in 2005, 2006, and 2007, tracking 40 pages that are updated daily and 30 that are updated irregularly, and tested how those updates showed up in each engine's index.
“A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another,” the paper said. “We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness.”
The reason search engines need to keep their indexes as fresh as possible is illustrated by a study Lewandowski cites on how regularly web pages change:
Estimating the results of the study extrapolated over the entire Web, the authors find that there are about 320 million new pages every week. About 20 percent of the Web pages of today will disappear within a year. About 50 percent of all contents will be changed within the same period. The link structure will change even faster: About 80 percent of all links will have changed or will be new within a year.
Issues of crawl frequency and indexing patterns confront the engineers behind the algorithms. In the study, Google fared best at returning fresh results: 83 percent of the pages it delivered were one day old or less.
MSN did a little better than Yahoo, 48 percent to 42 percent. Google did best with a mean up-to-date age for pages of 3.1 days. MSN rated 3.5 days, while Yahoo lagged with 9.8 days on average.
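The two figures quoted for each engine, the share of pages aged one day or less and the mean age in days, are simple summary statistics. Here is a minimal sketch of how they could be computed, assuming each observation is the age in days of an engine's indexed copy of a page. The function name and the sample ages are hypothetical and are not taken from the study.

```python
from statistics import mean


def freshness_summary(ages_in_days):
    """Return (share of copies aged one day or less, mean age in days)."""
    if not ages_in_days:
        raise ValueError("need at least one observation")
    # Count the copies that are at most one day old.
    fresh = sum(1 for age in ages_in_days if age <= 1)
    return fresh / len(ages_in_days), mean(ages_in_days)


# Hypothetical ages (in days) for one engine; NOT data from the study.
sample_ages = [0, 1, 1, 0, 3, 12, 1, 0, 2, 0]
share, avg = freshness_summary(sample_ages)
print(f"{share:.0%} aged one day or less, mean age {avg:.1f} days")
```

A single very stale copy, like the 12-day-old one in the sample, pulls the mean up sharply while barely moving the share, which is one reason the two numbers can rank engines differently.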
Also, MSN managed to refresh its index in fewer than 20 days, at least as far as the pages being studied were concerned. Lewandowski et al. think Google and Yahoo will perform even worse for pages that are not updated regularly.
Google proved fastest at refreshing its pages, while MSN updated pages with a clear frequency; MSN appeared to update constantly. Yahoo came across as the worst updater in the study.