Danny’s fed up with the search engine index size wars, and proposes that the biggies duke it out on a more important front: relevancy (or relevance, as we like to call it over here). He proposes that they all agree on some sort of standard test, and have an independent institute or consortium run the tests.
Do Quality And Relevance Go Hand In Hand?
I don’t entirely agree with the idea of a common definition of relevance. Search personalization, for example, can take on a variety of forms. In theory, every user would have a personalized set of results sitting in front of them.
Language itself shifts over time. Definitions of what is true often depend on what scientific camp you’re in.
But yes, in an enlightened world, scientists do need to accept at least basic overlapping truths.
So it should be possible to start with baby steps. SE’s probably won’t agree to anything, internally or amongst themselves, that highlights things like SE index spam.
But it would be pretty easy to (a) take a broad basket of keywords, (b) agree on some common benchmarks for what counts as a “spammy page” (even a scoring system), and (c) have qualified reviewers run the assessments, to come up with a determination of how contaminated the major SE’s top 20 listings are with spam, across a diverse basket of keywords.
RustyBrick over on Search Engine Watch Forums tried something like this, but IMHO it was too open-ended. I propose a slightly different approach that wouldn’t ask raters to determine which engine was most relevant, but rather merely count how many pages in the top 20 on the sample keyword queries exceed a certain “spamminess score.” We’re talking about scraped pages, redirects, machine-generated gibberish pages… the real nasty stuff, which appears on a great many queries where it shouldn’t. It wouldn’t necessarily penalize sites for using spammy techniques, though.
If someone’s participating in a link farm, or cloaking, or keyword stuffing in the title tag, but the page the user sees is relevant to their query and likely to lead to a desired result (gaining real insight from original content, making a purchase from the type of vendor they were probably looking for, joining a forum, etc.), then the page shouldn’t be counted as spammy. Actually, SE’s have been thinking along those lines, too. How often have you seen someone using outdated optimization techniques like keyword stuffing in titles and tags, and yet the page itself would have ranked OK anyway, and the SE’s actually do rank it well without penalizing it? SE’s rightly look past a lot of the stuff we might consider “spammy,” as long as the page is relevant.
This type of exercise wouldn’t require us to split hairs in defining relevance. It would give us a base to work from that virtually any sentient, rational being would agree on. If snippets of gibberish content are stolen from multiple sources to create a junk page whose only purpose is to generate a bit of AdSense revenue, etc., then that’s obvious spam. Users aren’t seeking pages of stolen gibberish content… ever. Nor are they wanting a redirect to a casino site when they type “fantasy football statistics 2004.”
In other words… in the parlance of applied social sciences, we need to “operationalize” relevance so coders can actually do their jobs consistently.
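To make the idea concrete, here’s a minimal sketch of what an operationalized spamminess index might look like in code. Every signal name, weight, and threshold below is a hypothetical placeholder for whatever the working group would actually agree on; note that, per the argument above, only the “real nasty stuff” is scored, not merely aggressive optimization tactics.

```python
# Hypothetical rubric: raters flag signals on each page, and a page
# counts as spam only if its weighted score meets a threshold.
# All names, weights, and the threshold are illustrative assumptions.

SPAM_SIGNALS = {
    "scraped_content": 4,      # snippets stolen from other sites
    "gibberish_text": 4,       # machine-generated nonsense
    "deceptive_redirect": 5,   # e.g. redirect to an unrelated casino site
    "doorway_page": 3,
}

SPAMMY_THRESHOLD = 4  # a page at or above this score counts as spam


def spamminess_score(page_signals):
    """Sum the weights of the spam signals a rater flagged on one page."""
    return sum(SPAM_SIGNALS[s] for s in page_signals)


def spam_index(top20_ratings):
    """Fraction of the top-20 results that meet the spammy threshold."""
    spammy = sum(
        1 for signals in top20_ratings
        if spamminess_score(signals) >= SPAMMY_THRESHOLD
    )
    return spammy / len(top20_ratings)


# Example: rater flags for each of the top 20 results on one query.
ratings = [[] for _ in range(17)] + [
    ["scraped_content", "gibberish_text"],  # score 8 -> spam
    ["deceptive_redirect"],                 # score 5 -> spam
    ["doorway_page"],                       # score 3 -> below threshold
]
print(spam_index(ratings))  # 2 of 20 pages -> 0.1
```

Averaging that per-query fraction over the whole (confidential) keyword basket would give each engine a single contamination number, which is far easier to defend than a subjective relevancy ranking.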
Danny, Rusty, if you like this idea, count me into the working group. We could hash out a scoring system on what counts as a “spammy page,” choose a broad (but confidential, to avoid gaming by the SE’s) keyword basket, and round up coders to assess the major engines. This would give us a real “SE spamminess index” as opposed to a highly subjective “relevancy score.” It would get published in the Wall Street Journal before you know it, alongside some of those other famous SEM indexes they’ve been kicking around lately.
In 1999 Andrew co-founded Traffick.com, an acclaimed “guide to portals” which foresaw the rise of trends such as paid search and semantic analysis.