Bing revealed in a blog post this week that a few months ago it rolled out an algorithm update targeting URL keyword stuffing. The company had alluded to such an update in another recent post.
Igor Rondel, Principal Development Manager for Bing Index Quality, writes:
"Like any other black hat technique, the goal of URL KWS, at a high level, is to manipulate search engines to give the page a higher rank than it truly deserves. The underlying idea unique to URL KWS relies on two assumptions about ranking algorithms: a) keyword matching is used and b) matching against the URL is especially valuable. While this is somewhat simplistic considering search engines employ thousands of signals to determine page ranking, these signals do indeed play a role (albeit significantly less than even a few years ago.) Having identified these perceived ‘vulnerabilities’, the spammer attempts to take advantage by creating keyword rich domains names. And since spammers’ strategy includes maximizing impressions, they tend to go after high value/ frequency/ monetizable keywords (e.g. viagra, loan, payday, outlet, free, etc…)"
Approaches commonly used by spammers, as Rondel lists them, include: multiple hosts with keyword-rich hostnames; host/domain names with repeating keywords; URL clusters across the same domain with varied hostnames made up of keyword permutations; and URL squatting.
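To make the "repeating keywords" pattern concrete, here is a minimal Python sketch that counts words appearing more than once in a URL's hostname. It is an illustration only, not Bing's method: the function name `repeated_host_words` and the rule of splitting on non-alphanumeric characters are assumptions, and the sketch cannot separate keywords that are run together without hyphens or dots.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def repeated_host_words(url):
    """Return words that occur more than once in the URL's hostname.

    Hypothetical helper for illustration; not part of any Bing API.
    """
    # urlsplit() only finds a hostname when the URL has a scheme or "//".
    host = urlsplit(url).hostname or ""
    # Assumption: hyphens and dots are the only word separators we can see.
    counts = Counter(w for w in re.split(r"[^a-z0-9]+", host.lower()) if w)
    return {word: n for word, n in counts.items() if n > 1}


print(repeated_host_words("http://payday-loans-fast-payday-loans.example.com/"))
print(repeated_host_words("https://www.bing.com/"))
```

A repeated word on its own proves nothing, which is consistent with Rondel's point that most multi-keyword URLs are not spam.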
Rondel notes that not all URLs containing multiple keywords are spam, and that the majority actually aren't. For this reason, Bing is using its new detection technique in combination with other signals.
"Addressing this type of spam is important because a) it is a widely used technique (i.e. significant SERP presence) and b) URLs appear to be good matches to the query, enticing users to click on them," he says.
To prevent abuse, Bing isn't disclosing all the details of its detection algorithms, but it does note that they take into account things like: site size; number of hosts; number of words in host/domain names/path; host/domain/path keyword co-occurrence; percentage of the site cluster comprised of top frequency host/domain name keywords; host/domain names containing certain lexicons/pattern combinations; and site/page content quality and popularity signals.
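A few of the signals listed above can be approximated from URLs alone. The Python sketch below computes some of them for a group of URLs; it is a rough illustration under stated assumptions, not Bing's algorithm. The `LEXICON` set (built from the example keywords in Rondel's quote, plus the plural "loans"), the function names, and the feature names are all hypothetical, and content quality and popularity signals are out of scope.

```python
import re
from urllib.parse import urlsplit

# Hypothetical lexicon based on the example keywords quoted in the article.
LEXICON = {"viagra", "loan", "loans", "payday", "outlet", "free"}


def words(text):
    """Split text into lowercase alphanumeric words."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def cluster_features(urls, lexicon=LEXICON):
    """Compute simple URL-based features for a group of related URLs."""
    parts = [urlsplit(u) for u in urls]
    hosts = {p.hostname for p in parts if p.hostname}
    host_words = {h: words(h) for h in hosts}
    # Hosts whose name contains at least one lexicon keyword.
    flagged = [h for h, ws in host_words.items() if lexicon.intersection(ws)]
    return {
        "num_urls": len(urls),
        "num_hosts": len(hosts),
        "max_host_words": max((len(ws) for ws in host_words.values()), default=0),
        "max_path_words": max((len(words(p.path)) for p in parts), default=0),
        "lexicon_host_share": len(flagged) / len(hosts) if hosts else 0.0,
    }


features = cluster_features([
    "http://cheap-payday-loans.example.com/apply",
    "http://payday-loans-fast.example.com/loans/payday",
    "http://blog.example.com/",
])
print(features)
```

In a real system, features like these would only be inputs to a classifier that also weighs the other signals; no single number here marks a site as spam.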