Twitter Takes A Hard Look At Real-Time Search Queries
Google, which has essentially ruled the search world for years, once recognized the value of real-time search, and Twitter search in particular. The two companies had a deal that gave Google users quick, real-time results from Twitter for timely queries. Last year, that feature went away when the two companies failed to agree on terms to extend the deal.
Since then, the missing feature has left a glaring hole in Google’s mission to organize the world’s information and make it universally accessible. The gap was particularly evident around the time of Muammar Gaddafi’s death.
While Twitter has said that the two companies talk frequently, there has been little to indicate that the deal and feature will return in the near future. Meanwhile, people are searching the hell out of Twitter. And just think: every time someone uses Twitter to search, that’s a search they’re not doing on Google.
There’s more to Twitter search, however, than just sifting through the most recent things people have tweeted.
“News breaks on Twitter, whether local or global, of narrow or broad interest,” says Jimmy Lin, a research scientist on Twitter’s analytics team. “When news breaks, Twitter users flock to the service to find out what’s happening. Our goal is to instantly connect people everywhere to what’s most meaningful to them; the speed at which our content (and the relevance signals stemming from it) evolves make this more technically challenging, and we are hard at work continuously refining our relevance algorithms to address this. Just to give one example: search, boiled down to its basics, is about computing term statistics such as term frequency and inverse document frequency. Most algorithms assume some static notion of underlying distributions, which surely isn’t the case here!”
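To make the point about static distributions concrete, here is a minimal sketch (not Twitter's implementation) that computes document frequency and inverse document frequency over a sliding window of recent tweets instead of an all-time collection. The class name, the whitespace tokenizer, the count-based window, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter, deque


class SlidingWindowIDF:
    """Hypothetical sketch: document frequencies over only the most recent
    `window_size` tweets, each tweet treated as one document. Old tweets are
    evicted, so IDF follows the current distribution rather than a static one.
    """

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.docs = deque()   # frozensets of terms, oldest first
        self.df = Counter()   # term -> number of windowed docs containing it

    def add(self, text: str) -> None:
        terms = frozenset(text.lower().split())  # naive whitespace tokenizer (assumption)
        self.docs.append(terms)
        self.df.update(terms)
        if len(self.docs) > self.window_size:
            # Evict the oldest tweet and decrement the counts of its terms.
            old = self.docs.popleft()
            self.df.subtract(old)
            for term in old:
                if self.df[term] <= 0:
                    del self.df[term]

    def idf(self, term: str) -> float:
        # Smoothed IDF, log((N + 1) / (df + 1)) + 1, so unseen terms stay finite.
        n = len(self.docs)
        return math.log((n + 1) / (self.df[term] + 1)) + 1.0
```

For example, with `window_size=2`, adding "earthquake japan", "earthquake tsunami" and then "steve jobs" evicts the first tweet, so `df["earthquake"]` drops from 2 to 1 and "japan" disappears from the statistics entirely. A production system would more likely window by time than by tweet count; the count-based window just keeps the sketch short.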
Twitter has put out a new study about “churn” in tweets and real-time search queries. The abstract is as follows:
The real-time nature of Twitter means that term distributions in tweets and in search queries change rapidly: the most frequent terms in one hour may look very different from those in the next. Informally, we call this phenomenon “churn”. Our interest in analyzing churn stems from the perspective of real-time search. Nearly all ranking functions, machine-learned or otherwise, depend on term statistics such as term frequency, document frequency, as well as query frequencies. In the real-time context, how do we compute these statistics, considering that the underlying distributions change rapidly? In this paper, we present an analysis of tweet and query churn on Twitter, as a first step to answering this question. Analyses reveal interesting insights on the temporal dynamics of term distributions on Twitter and hold implications for the design of search systems.
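One simple way to put a number on churn, offered here as an illustrative assumption rather than the paper's exact definition, is to ask what fraction of one hour's top-k queries fail to appear in the next hour's top-k. The Python sketch below treats each query as a whole string; the function names and the choice of k are hypothetical.

```python
from collections import Counter


def top_k_terms(queries: list[str], k: int) -> set[str]:
    """Return the k most frequent queries issued in one time interval."""
    return {q for q, _ in Counter(queries).most_common(k)}


def top_k_churn(prev: list[str], curr: list[str], k: int) -> float:
    """Fraction of the previous interval's top-k that is missing from the next one's.

    0.0 means the head of the distribution is unchanged; 1.0 means it was
    replaced completely. Illustrative measure only, not necessarily the
    one used in the study.
    """
    before = top_k_terms(prev, k)
    after = top_k_terms(curr, k)
    if not before:
        return 0.0
    return len(before - after) / len(before)
```

With k = 2, an hour whose top queries are "weather" and "news" followed by an hour whose top queries are "steve jobs" and "weather" gives a churn of 0.5, because "news" fell out of the head of the distribution.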
The study reaches the following five conclusions:
- Churn. Term distributions change rapidly, significantly faster than in web search for the head of the distribution. Even after discounting trending terms promoted by the platform, churn rates of top real-time queries are up to four times higher than those of web searches. For the tail of the distribution, churn drops quickly, and appears to be lower than that observed in web queries.
- Unobserved terms. Similarly, rates of out-of-vocabulary words are higher for top Twitter queries, but lower at the tail of the distribution. This translates to rapid changes in the top user interests, but relative stability in the topics for which users seek real-time results.
- Update frequency. Although query churn is consistently high, during major events it can further increase dramatically, as queries change minute by minute. In fact, to maintain accurate collection statistics requires frequent term count updates—in intervals of 5 minutes or less, according to our data.
- Churn patterns. The time period in which a query remains a top one varies, as does its decay pattern; naïve approaches such as fixed term frequency decays may not be able to correctly model frequency changes over time.
- Predictability. Anecdotal evidence suggests that some query churn may be predicted from past observations, providing a potential source for addressing this issue.
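The update-frequency and unobserved-terms findings suggest two simple building blocks. Below is a hedged Python sketch, not code from the paper: a counter that keeps query counts in 5-minute buckets over a rolling window, and a token-weighted out-of-vocabulary rate between two consecutive intervals. The 5-minute interval comes from the study; the one-hour window, the class and function names, the token weighting, and the assumption that timestamps arrive in order are all illustrative choices.

```python
from collections import Counter, deque

BUCKET_SECONDS = 5 * 60  # 5-minute buckets, the update interval the study points to


class BucketedQueryCounter:
    """Hypothetical sketch: query counts kept in 5-minute buckets over a
    rolling window, so totals are never more than one bucket stale.
    Assumes timestamps (in seconds) arrive in non-decreasing order.
    """

    def __init__(self, window_buckets: int = 12) -> None:  # 12 x 5 min = 1 hour
        self.window_buckets = window_buckets
        self.buckets = deque()  # (bucket_id, Counter) pairs, oldest first
        self.totals = Counter()

    def add(self, query: str, timestamp: float) -> None:
        bucket_id = int(timestamp // BUCKET_SECONDS)
        if not self.buckets or self.buckets[-1][0] != bucket_id:
            self.buckets.append((bucket_id, Counter()))
            # Drop buckets that have fallen out of the rolling window.
            while self.buckets[0][0] <= bucket_id - self.window_buckets:
                _, old = self.buckets.popleft()
                self.totals.subtract(old)
            self.totals = +self.totals  # unary plus keeps only positive counts
        self.buckets[-1][1][query] += 1
        self.totals[query] += 1

    def count(self, query: str) -> int:
        return self.totals[query]


def oov_rate(prev_queries: list[str], curr_queries: list[str]) -> float:
    """Share of query-term occurrences in the current interval that never
    appeared in the previous interval (token-weighted, an assumption)."""
    seen = {t for q in prev_queries for t in q.split()}
    curr_terms = [t for q in curr_queries for t in q.split()]
    if not curr_terms:
        return 0.0
    return sum(t not in seen for t in curr_terms) / len(curr_terms)
```

The rolling window means a query that stops being issued drops out of the totals once its buckets expire, instead of lingering indefinitely in all-time counts. Out-of-order events would need extra handling, such as a small reorder buffer, which the sketch omits.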
“During major events, the frequency of queries spike dramatically,” says Lin. “For example, on October 5, immediately following news of the death of Apple co-founder and CEO Steve Jobs, the query ‘steve jobs’ spiked from a negligible fraction of query volume to 15% of the query stream: almost one in six of all queries issued!”
Lin also points to the Japan earthquake as an example of Twitter’s power: yet another event in which today’s Google would play second fiddle to Twitter search when it comes to rapid, real-time information.
Twitter is presenting findings at the International Conference on Weblogs and Social Media (ICWSM 2012), a workshop on Social Media Visualization, and a workshop on Real-Time Analysis and Mining of Social Streams (RAMSS).