About a month ago, Facebook finally added the ability to find status updates and other posts in Graph Search, though the feature is still slowly rolling out (like pretty much everything else Facebook announces – “new” News Feed anyone?).
Since then, Facebook has killed the privacy setting that let you hide from other users’ searches.
Now, the company has put out a new blog post discussing how Graph Search collects data, how it builds, updates, and serves its index, and how it ranks results.
In discussing the schema challenges the company faces, Facebook reveals that it sorts and indexes 70 different kinds of data, many of which are specific to certain types of posts.
Ashoat Tevosyan, an engineer on Facebook’s search quality and ranking team, says, “With a trillion posts in the index, most queries return many more results than anyone could ever read. This leads us to the results ranking step. To surface content that is valuable and relevant to the user, we use two primary techniques: query rewriting and dynamic result scoring. Query rewriting happens before the execution of the query, and involves tacking on optional clauses to search queries that bias the posts we retrieve towards results that we think will be more valuable to the user. Result scoring involves sorting and selecting documents based on a number of ranking ‘features,’ each of which is based on the information available in the document data.”
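To make the idea of query rewriting concrete, here is a minimal sketch in Python. It is not Facebook’s code. Posts are plain dicts, and the field names, the `rewrite` and `retrieve` helpers, and the two optional clauses (author is a friend, post has 10 or more likes) are all hypothetical. The point is the mechanism Tevosyan describes: optional clauses never filter anything out, but when retrieval is capped they bias which matching posts make the cut.

```python
from dataclasses import dataclass, field
from typing import Callable

# Posts are modeled as plain dicts; the field names are hypothetical.
Post = dict
Predicate = Callable[[Post], bool]


@dataclass
class Query:
    required: list[Predicate]  # every required clause must match
    optional: list[Predicate] = field(default_factory=list)  # only bias retrieval


def rewrite(query: Query, viewer_friends: set[str]) -> Query:
    """Tack on optional clauses favoring posts we guess are more valuable.

    Both clauses are illustrative assumptions, not Facebook's real rules.
    """
    extra = [
        lambda p: p["author"] in viewer_friends,  # hypothetical: friend's post
        lambda p: p["likes"] >= 10,               # hypothetical: popular post
    ]
    return Query(query.required, query.optional + extra)


def retrieve(query: Query, index: list[Post], limit: int) -> list[Post]:
    """Return up to `limit` posts matching all required clauses,
    preferring posts that satisfy more optional clauses."""
    matches = [p for p in index if all(c(p) for c in query.required)]
    # Python's sort is stable, so ties keep their original index order.
    matches.sort(key=lambda p: sum(c(p) for c in query.optional), reverse=True)
    return matches[:limit]


# Usage: a tiny in-memory "index" with three posts.
index = [
    {"id": 1, "author": "stranger", "text": "graph search", "likes": 2},
    {"id": 2, "author": "alice", "text": "graph search tips", "likes": 15},
    {"id": 3, "author": "bob", "text": "lunch", "likes": 50},
]
q = Query(required=[lambda p: "graph search" in p["text"]])
top = retrieve(rewrite(q, {"alice"}), index, limit=1)
```

Without the rewrite, both matching posts tie and the first one in the index wins; with it, the friend’s popular post (id 2) is retrieved instead, even though post 1 still matches the required clause.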
“In total, we currently calculate well over a hundred distinct ranking features that are combined with a ranking model to find the best results,” Tevosyan adds. “We will continue to work on refining these models as we roll out to more users and listen to feedback.”
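The post does not say what kind of ranking model Facebook uses, so the following Python sketch simply assumes the most basic option: a linear model that takes a weighted sum of a few features computed from each post’s own data. The four features and their weights are hypothetical stand-ins for the hundred-plus features Tevosyan mentions; a real system would learn the weights rather than pick them by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    id: int
    likes: int
    age_hours: float
    author_is_friend: bool
    term_matches: int  # how many query terms appear in the post


def features(post: Post) -> list[float]:
    """Compute a small, hypothetical feature vector from the post's own data."""
    return [
        math.log1p(post.likes),          # popularity, with diminishing returns
        math.exp(-post.age_hours / 24),  # recency, decaying over roughly a day
        1.0 if post.author_is_friend else 0.0,
        float(post.term_matches),
    ]


# Hypothetical hand-picked weights, one per feature above.
WEIGHTS = [0.5, 1.0, 2.0, 0.8]


def score(post: Post) -> float:
    # Linear ranking model: weighted sum of the feature values.
    return sum(w * f for w, f in zip(WEIGHTS, features(post)))


def rank(posts: list[Post], k: int) -> list[Post]:
    """Sort candidate posts by model score and keep the top k."""
    return sorted(posts, key=score, reverse=True)[:k]


candidates = [
    Post(id=1, likes=500, age_hours=240, author_is_friend=False, term_matches=1),
    Post(id=2, likes=3, age_hours=2, author_is_friend=True, term_matches=2),
    Post(id=3, likes=40, age_hours=48, author_is_friend=False, term_matches=2),
]
best = rank(candidates, k=2)
```

With these made-up weights, a fresh post from a friend (id 2) outranks a heavily liked but ten-day-old post from a stranger (id 1). Changing a single weight can reorder the results, which is one reason such models need ongoing tuning as they reach more users.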
You can read the post for further explanation of the indexing process.
According to Tevosyan, most of the work on Graph Search’s infrastructure, ranking, and product was done by a few dozen engineers over the past year.
I guess that means Graph Search will keep getting more useful. It obviously still has a long way to go, and it will certainly be more helpful once it hits mobile devices.