Bing Search Quality Insights: New Blog Series Sheds Light On Bing’s Inner-Workings
Google recently started a series of blog posts highlighting various points of progress it has made in its efforts to increase search quality. The results have essentially been lists of monthly changes Google has made to its algorithm and interface. This has been extremely interesting to watchers of the search industry and webmasters who are concerned with search engine optimization.
Bing is now starting a similar series, though if the first entry is any indication, we’re going to see a much more in-depth explanation about things in general, as opposed to lists of specific tweaks.
“Quality improvements in Bing are often subtle but often those little changes are the result of years of research,” says Dr. Harry Shum, Corporate Vice President of Bing R&D. “In the coming weeks and months, you will hear from members of my team on a range of topics, from the complexities of social search and disambiguating spelling errors to whole page relevance and making search more personal. We will also highlight the ideas and projects we have collaborated with colleagues from Microsoft Research and academia to advance the state of the art for our industry. We hope this will not only be useful information for our blog readers, but that they will spark conversations that help us all move the search industry forward.”
The first entry comes from Jan Pedersen, Chief Scientist for Core Search at Bing, who explains how Bing computes “whole page relevance,” which it uses to decide not just where to rank a result on the search results page, but also whether to return a plain link or an “answer”.
“As with any relevance problem we start with the question of how to measure if Bing has done a good job,” explains Pedersen. “We could do this by simply asking human judges to compare the output of competing blending algorithms and assess which is better. This turns out to be a difficult judgment task that produces quite noisy and unreliable results. Instead we look at how people behave on Bing in the real world. Based on how they respond to changes we make an assumption that a better blending algorithm will move people’s clicks towards the top of the page. This turns out to be the same as saying that a block of content, or answer, is well placed if it receives at least as many clicks as the equivalently sized block of content below it — or, as we say internally, if its win rate is greater than 0.5. So a good blending algorithm will promote an answer on the page upward as long as its win rate is greater than 0.5. Armed with this metric, we can run online experiments and compare the results of competing blending algorithms giving us a realistic data set.”
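To make the win-rate idea concrete, here is a minimal sketch in Python. It rests on assumptions: Pedersen does not give a formula, so the code treats win rate as the share of clicks a block receives out of the combined clicks on it and on the equivalently sized block below it. The function names (`win_rate`, `promote`) and the input format are hypothetical and are not Bing's.

```python
from typing import List, Optional


def win_rate(clicks_on_block: int, clicks_on_block_below: int) -> Optional[float]:
    """Share of the two blocks' combined clicks that went to the upper block.

    Assumed definition; the post only says a block is well placed when its
    win rate is greater than 0.5.
    """
    total = clicks_on_block + clicks_on_block_below
    if total == 0:
        return None  # no clicks on either block, so no evidence either way
    return clicks_on_block / total


def promote(position: int, win_rate_above: List[float], threshold: float = 0.5) -> int:
    """Move an answer upward while its estimated win rate stays above the threshold.

    win_rate_above[i] is the (hypothetical) estimated win rate of the answer
    if it were placed directly above the block currently in slot i.
    Returns the final slot index, where 0 is the top of the page.
    """
    while position > 0 and win_rate_above[position - 1] > threshold:
        position -= 1
    return position


if __name__ == "__main__":
    print(win_rate(60, 40))
    print(promote(3, [0.4, 0.7, 0.55]))
```

In this toy version the answer keeps climbing until it meets a block it would not beat, which mirrors the rule Pedersen describes of promoting an answer as long as its win rate is greater than 0.5.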
Shum did note in a blog post announcing the new blog series that Bing does measure search quality with a mix of offline human judges (presumably similar to Google’s raters) and online user engagement.
“Next we investigate the available inputs into an online blending function that improves this metric,” continues Pedersen. “We can, and do, use historical anonymous click data, but this is not sufficient because it does not generalize to rare queries, or to new content with no history. So, we add in three kinds of additional inputs: confidence scores from the answer provider, query characterizations, and features extracted from other answers and web pages that will be shown on the page. For example, to learn where to place an image answer in the search results for a given query, we consider the confidence score returned from the image search provider, the ranking scores of nearby Web pages, and whether the query is marked as referring to the sort of entities that are well described by images (people, places, etc.).”
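The image-answer example Pedersen gives could be represented roughly as follows. This is an illustrative sketch only: the class name, field names, and the way the values are flattened into a numeric vector are all assumptions made for the example, not details from Bing's post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlendingFeatures:
    """Hypothetical inputs for deciding where to place one image answer."""

    provider_confidence: float      # confidence score from the answer provider
    nearby_web_scores: List[float]  # ranking scores of nearby Web pages
    query_is_image_entity: bool     # query characterization (people, places, etc.)
    historical_win_rate: Optional[float] = None  # None for rare queries or new content

    def to_vector(self) -> List[float]:
        """Flatten the inputs into numbers a learned blending function could consume."""
        has_history = self.historical_win_rate is not None
        if self.nearby_web_scores:
            mean_web = sum(self.nearby_web_scores) / len(self.nearby_web_scores)
        else:
            mean_web = 0.0
        return [
            self.provider_confidence,
            mean_web,
            1.0 if self.query_is_image_entity else 0.0,
            self.historical_win_rate if has_history else 0.0,
            1.0 if has_history else 0.0,  # lets a model tell "no history" from "zero"
        ]


if __name__ == "__main__":
    features = BlendingFeatures(0.9, [0.4, 0.6], True)
    print(features.to_vector())
```

The explicit "has history" flag reflects the point in the quote that click history alone does not generalize to rare queries or new content, so the other inputs have to carry the decision in those cases.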
Bing actually uses over a thousand signals in its blending functions, according to Pedersen. That is in line with the figure of roughly a thousand signals that Bing’s Duane Forrester has cited in the past.
“Finally, we consider the offline and online infrastructure that will be used to create and run a blending function,” adds Pedersen. “We use a very robust, but high-performance learning method, called boosted regression trees, to automatically produce a ranking function given training data. This allows us to use many signals with the confidence that each additional signal will incrementally improve our blending function. Our training sets are fairly large, since they are mined from our billions of anonymous query session logs, so we use our large-scale data mining infrastructure, called Cosmos, to prepare the data and run offline experiments. Once a new blending function has been generated by our offline learning method, it is deployed to a serving component internally called APlus that puts all that data into action and runs after all candidate content blocks that have been generated, where it can be tested via online experimentation and finally placed into production.”
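For readers unfamiliar with boosted regression trees, the sketch below shows the general technique in plain Python. It is a toy version under stated assumptions: it uses depth-one trees (stumps), squared-error loss, and a fixed learning rate, and every name in it is hypothetical. Bing's actual learner, features, and training data are not described in enough detail to reproduce, and nothing here involves Cosmos or APlus.

```python
from typing import List, Sequence, Tuple

# A stump is (feature index, threshold, value if x <= threshold, value otherwise).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, List[Stump], float]


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Find the single split that minimizes squared error on the residuals."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: no split at all
    best_err = sum((r - mean) ** 2 for r in residuals)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between two distinct observed values
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, t, lm, rm)
    return best


def stump_predict(stump: Stump, row: Sequence[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosted(X, y, n_rounds: int = 50, learning_rate: float = 0.1) -> Model:
    """Each round fits a stump to the current residuals and adds a shrunken copy."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model: Model, row: Sequence[float]) -> float:
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


if __name__ == "__main__":
    # Made-up rows: [provider confidence, query-is-image-entity flag] -> observed win rate.
    X = [[0.9, 1.0], [0.8, 1.0], [0.2, 0.0], [0.1, 0.0]]
    y = [0.8, 0.7, 0.3, 0.2]
    model = fit_boosted(X, y)
    print(predict(model, [0.85, 1.0]), predict(model, [0.15, 0.0]))
```

Because each tree only corrects what the previous trees got wrong, adding another signal gives later rounds one more place to split without disturbing what has already been learned, which is the property Pedersen points to when he says each additional signal can improve the blending function incrementally.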
Pedersen says Bing has been focusing on applying all of this to new inputs for “temporarily relevant” answers. Think news stories that die down after a while.
If you’re really into learning about the inner workings of search engines, it’s a pretty interesting read, and it is frankly not the kind of thing we see from Bing very often. It looks like that’s changing now.
There’s not much here in terms of SEO guidance, at least in the first post, but SEO enthusiasts will no doubt want to keep an eye on the series and stay on the lookout for info and tips that could be applied to SEO strategies. Google’s change lists are a bit more useful in this regard.