KDD 2014 on personalized search result diversification online

Our KDD 2014 paper “Personalized search result diversification via structure learning” by Shangsong Liang, Zhaochun Ren and Maarten de Rijke is available online now.

This paper is concerned with the problem of personalized diversification of search results, with the goal of enhancing the performance of both plain diversification and plain personalization algorithms. In previous work, the problem has mainly been tackled by means of unsupervised learning. To further enhance performance, we propose a supervised learning strategy. Specifically, we set up a structured learning framework for conducting supervised personalized diversification, in which we combine features extracted directly from the tokens of documents, features utilized by unsupervised personalized diversification algorithms, and, importantly, features generated from our proposed user-interest latent Dirichlet topic model. Based on this topic model, our learning strategy can estimate whether a document caters to a user's interest. We also define two constraints in our structured learning framework to ensure that search results are both diversified and consistent with a user's interest. We conduct experiments on an open personalized diversification dataset and find that our supervised learning strategy outperforms unsupervised personalized diversification methods as well as other plain personalization and plain diversification methods.

Coling 2014 paper on temporal evidence classification online

Our Coling 2014 paper “Prior-informed distant supervision for temporal evidence classification” by Ridho Reinanda and Maarten de Rijke is available online now.

Temporal evidence classification, i.e., finding associations between temporal expressions and relations expressed in text, is an important part of temporal relation extraction. To capture the variations found in this setting, we employ a distant supervision approach, modeling the task as multi-class text classification. There are two main challenges with distant supervision: (1) noise generated by incorrect heuristic labeling, and (2) distribution mismatch between the target and distant supervision examples. We are particularly interested in addressing the second problem and propose a sampling approach to handle the distribution mismatch. Our prior-informed distant supervision approach improves over basic distant supervision and outperforms a purely supervised approach when evaluated on TAC-KBP data, both on classification and end-to-end metrics.

Now playing: The Durutti Column -- At First Sight

ECAI 2014 paper on detecting the reputation polarity of microblog posts online

Our ECAI paper “Detecting the reputation polarity of microblog posts” by Cristina Garbacea, Manos Tsagkias and Maarten de Rijke is available online now.

We address the task of detecting the reputation polarity of social media updates, that is, deciding whether the content of an update has positive or negative implications for the reputation of a given entity. Typical approaches to this task rely on sentiment lexicons and linguistic features. However, they fall short in the social media domain because of its unedited and noisy nature, and, more importantly, because reputation polarity is not only encoded in sentiment-bearing words but is also embedded in other word usage. To this end, automatic methods for extracting discriminative features for reputation polarity detection can play a role. We propose a data-driven, supervised approach for extracting textual features, which we use to train a reputation polarity classifier. Experiments on the RepLab 2013 collection show that our model outperforms the state-of-the-art method based on sentiment analysis by 20% in accuracy.

Now playing: Nils Petter Molvaer -- Blue Fandango

ICML 2014 paper on K-armed dueling bandit problem online

Our ICML 2014 paper “Relative upper confidence bound for the K-armed dueling bandit problem” by Masrour Zoghi, Shimon Whiteson, Remi Munos and Maarten de Rijke is available online now.

This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a sharp finite-time regret bound of order O(K log T) on a very general class of dueling bandit problems that matches a lower bound proven by Yue et al. (2012). In addition, our empirical results using real data from an information retrieval application show that our method greatly outperforms the state of the art.
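The selection step can be sketched roughly as follows. This is an illustrative simplification of the relative upper confidence bound idea, not the paper's exact algorithm: the `wins` matrix, the exploration constant `alpha`, and the tie-breaking are all assumptions made for the sketch.

```python
import math
import random

def rucb_select(wins, t, alpha=0.51):
    """One round of RUCB-style selection: build optimistic estimates of the
    pairwise win probabilities, pick a champion that no arm confidently beats,
    then pair it with the challenger most likely to beat it."""
    K = len(wins)
    ucb = [[0.5] * K for _ in range(K)]
    for i in range(K):
        for j in range(K):
            if i == j:
                continue
            n = wins[i][j] + wins[j][i]          # number of i-vs-j duels so far
            mean = wins[i][j] / n if n else 0.5  # empirical win rate of i over j
            bonus = math.sqrt(alpha * math.log(t) / n) if n else 1.0
            ucb[i][j] = min(1.0, mean + bonus)   # optimistic win probability
    # candidate champions: arms whose optimistic win rate is >= 1/2 against all arms
    champs = [i for i in range(K) if all(ucb[i][j] >= 0.5 for j in range(K))]
    c = random.choice(champs) if champs else random.randrange(K)
    # challenger: the arm with the highest optimistic chance of beating the champion
    d = max((j for j in range(K) if j != c), key=lambda j: ucb[j][c])
    return c, d
```

The returned pair (champion, challenger) would then be compared via an interleaved comparison, and the `wins` matrix updated with the outcome.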

Universiteitsdag: The robot thinks along with you

On May 24, during the Universiteitsdag, I will take part in a debate on the role of intelligent systems in our lives. The other participants are Pieter Adriaans and Ben Kröse.


Date: May 24, 2014, 13:45-14:45, during the Universiteitsdag at the Oudemanhuispoort. See

SIGIR 2014 paper on hierarchical multi-label classification of social text streams online

Our SIGIR 2014 paper “Hierarchical multi-label classification of social text streams” by Zhaochun Ren, Maria-Hendrike Peetz, Shangsong Liang, Willemijn van Dolen and Maarten de Rijke is available online now.

Hierarchical multi-label classification assigns a document to multiple hierarchical classes. In this paper we focus on hierarchical multi-label classification of social text streams. Concept drift, complicated relations among classes, and the limited length of documents in social text streams make this a challenging problem. Our approach includes three core ingredients: short document expansion, time-aware topic tracking, and chunk-based structural learning. We extend each short document in social text streams to a more comprehensive representation via state-of-the-art entity linking and sentence ranking strategies. From documents extended in this manner, we infer dynamic probabilistic distributions over topics by dividing topics into dynamic “global” topics and “local” topics. For the third and final phase we propose a chunk-based structural optimization strategy to classify each document into multiple classes. Extensive experiments conducted on a large real-world dataset show the effectiveness of our proposed method for hierarchical multi-label classification of social text streams.

SIGIR 2014 paper on the use of fusion for diversification online

Our SIGIR 2014 paper “Fusion helps diversification” by Shangsong Liang, Zhaochun Ren and Maarten de Rijke is available online now.

A popular strategy for search result diversification is to first retrieve a set of documents using a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We take the output of a set of rankers, optimized for diversity or not, and find that data fusion can significantly improve state-of-the-art diversification methods. We also introduce a new data fusion method, called diversified data fusion (DDF), which infers latent topics of a query using topic modeling, without leveraging outside information. Our experiments show that data fusion methods can enhance the performance of diversification and that DDF significantly outperforms existing data fusion methods in terms of diversity metrics.
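As a concrete example of the kind of standard fusion method examined, here is a minimal CombSUM sketch. The reciprocal-rank scoring and min-max normalization are common choices in the fusion literature, not necessarily the exact variants used in the paper:

```python
def combsum(ranked_lists):
    """Standard CombSUM fusion: for each document, sum its normalized
    per-list scores, then rank documents by the fused score."""
    fused = {}
    for ranking in ranked_lists:
        # turn ranks into scores (reciprocal rank: 1, 1/2, 1/3, ...)
        scores = {doc: 1.0 / rank for rank, doc in enumerate(ranking, start=1)}
        lo, hi = min(scores.values()), max(scores.values())
        for doc, s in scores.items():
            norm = (s - lo) / (hi - lo) if hi > lo else 1.0  # min-max normalization
            fused[doc] = fused.get(doc, 0.0) + norm
    return sorted(fused, key=fused.get, reverse=True)
```

For example, fusing the lists [d1, d2, d3] and [d2, d1] promotes d2 to the top, since it scores well in both input rankings.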

SIGIR 2014 paper on microblog retrieval online

Our SIGIR 2014 paper “A syntax-aware re-ranker for microblog retrieval” by Aliaksei Severyn, Alessandro Moschitti, Manos Tsagkias, Richard Berendsen and Maarten de Rijke is online now.

We tackle the problem of improving microblog retrieval algorithms by proposing a robust structural representation of (query, tweet) pairs. We employ these structures in a principled kernel learning framework that automatically extracts and learns highly discriminative features. We test the generalization power of our approach on the TREC Microblog 2011 and 2012 tasks. We find that relational syntactic features generated by structural kernels are effective for learning to rank (L2R) and can easily be combined with those of other existing systems to boost their accuracy. In particular, the results show that our L2R approach improves on almost all the participating systems at TREC, using only their raw scores as a single feature. Our method yields an average increase of 5% in retrieval effectiveness and 7 positions in system ranks.

How does a search engine work?

On May 18 I will give a Wakker Worden lecture at NEMO. The topic is “How does Google work?” and the lecture will be given at two times: 11:00 and 13:00.


Reservations can be made via the NEMO site:

SIGIR 2014 paper on evaluating intuitiveness of vertical-aware click models online

Our SIGIR paper “Evaluating Intuitiveness of Vertical-Aware Click Models” by Aleksandr Chuklin, Ke Zhou, Anne Schuth, Floor Sietsma and Maarten de Rijke is available online now.

Modeling user behavior on a search engine result page is important for understanding the users and supporting simulation experiments. As result pages become more complex, click models evolve as well in order to capture additional aspects of user behavior in response to new forms of result presentation.

We propose a method for evaluating the intuitiveness of vertical-aware click models, namely the ability of a click model to capture key aspects of aggregated result pages, such as vertical selection, item selection, result presentation and vertical diversity. This method allows us to isolate model components and therefore gives a multi-faceted view on a model's performance. We argue that our method can be used in conjunction with traditional click model evaluation metrics such as log-likelihood or perplexity. In order to demonstrate the power of our method in situations where result pages can contain more than one type of vertical (e.g., Image and News) we extend the previously studied Federated Click Model such that it models user clicks on such pages. Our evaluation method yields non-trivial yet interpretable conclusions about the intuitiveness of click models, highlighting their strengths and weaknesses.

SIGIR 2014 paper on personalized document re-ranking online

Our SIGIR 2014 paper “Personalized Document Re-ranking Based on Bayesian Probabilistic Matrix Factorization” by Fei Cai, Shangsong Liang and Maarten de Rijke is available online now.

A query considered in isolation provides limited information about the searcher’s interest. Previous work has considered various types of user behavior, e.g., clicks and dwell time, to obtain a better understanding of the user’s intent. We consider the searcher’s search and page view history. Using search logs from a commercial search engine, we (i) investigate the impact of features derived from user behavior on reranking a generic ranked list; (ii) optimally integrate the contributions of user behavior and candidate documents by learning their relative importance per query based on similar users. We use dwell time on clicked URLs when estimating the relevance of documents for a query, and perform Bayesian Probabilistic Matrix Factorization as smoothing to predict the relevance. Considering user behavior achieves better rankings than non-personalized rankings. Aggregation of user behavior and query-document features with a user-dependent adaptive weight outperforms combinations with a fixed uniform value.

SIGIR 2014 paper on email recipient recommendation online

Our SIGIR 2014 paper “Recipient Recommendation in Enterprises using Communication Graphs and Email Content” by David Graus, David van Dijk, Manos Tsagkias, Wouter Weerkamp and Maarten de Rijke is available online now.

We address the task of recipient recommendation for emailing in enterprises. We propose an intuitive and elegant way of modeling the task of recipient recommendation, which uses both the communication graph (i.e., who are most closely connected to the sender) and the content of the email. Additionally, the model can incorporate evidence as prior probabilities. Experiments on two enterprise email collections show that our model achieves very high scores, and that it outperforms two variants that use either the communication graph or the content in isolation.
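A rough sketch of how graph and content evidence might be combined for this task. The equal mixture weight, the add-one smoothing, and all identifiers here are illustrative assumptions; the paper's actual model and its prior-based formulation are not reproduced:

```python
def score_recipients(sender, email_terms, comm_counts, recipient_term_counts, prior=None):
    """Rank candidate recipients by mixing communication-graph strength
    (how often the sender emailed them) with a smoothed content match
    (how well the email's terms fit their past mail), optionally weighted
    by a prior over candidates."""
    scores = {}
    total_comm = sum(comm_counts.get(sender, {}).values()) or 1
    for cand, terms in recipient_term_counts.items():
        # graph evidence: fraction of the sender's traffic going to this candidate
        graph = comm_counts.get(sender, {}).get(cand, 0) / total_comm
        # content evidence: add-one-smoothed likelihood of the email's terms
        n_terms = sum(terms.values()) or 1
        content = 1.0
        for t in email_terms:
            content *= (terms.get(t, 0) + 1) / (n_terms + len(terms))
        p = prior.get(cand, 1.0) if prior else 1.0
        scores[cand] = p * (0.5 * graph + 0.5 * content)  # simple equal-weight mixture
    return sorted(scores, key=scores.get, reverse=True)
```

With identical content matches, the candidate the sender emails most often ranks first, which is the intuition behind combining the two signals.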

Twitter analysis in De Correspondent

De Correspondent has an interesting piece on Twitter analysis, with a section in which I make some statements about using Twitter to make predictions about offline events.


Vrij Nederland on social media analysis

Vrij Nederland has an extensive piece on the use of social media and social media analysis by police and intelligence services. In it, I make some statements about the relative ease with which all kinds of personal characteristics can be derived from social media, both from text and from other signals such as “likes.”


ECIR 2014 proceedings online

With 10 days to go to ECIR 2014, the ECIR 2014 proceedings are online now. You can find information about them at the Springer web site or access the online version directly at this page.

Now playing: Portico Quartet -- Rubidium

HPC grant

My colleague Lars Buitinck and I received a grant from the HPC fund to support the development of a more scalable version of xTAS, our extensible text analysis service. It’s the pipeline that we (and others) use for our text mining work. This is great news as it allows us to port modules to the new architecture that Lars has been developing for the past six months.

Now playing: Bowerbirds -- Silver Clouds

The autonomous search engine

After Tuesday’s talk on personal data mining, I gave another talk to non-experts on Thursday. This time the topic was “The Autonomous Search Engine”. The backbone of the story is the move from supervised to weakly supervised development of one of the core components of search engines: rankers. Weak supervision in this context means that we’re not using explicit, manually created labels provided by experts but that we’re making inferences about our ranker technology from naturally occurring interactions with the search engine.

Weakly supervised ways of evaluating rankers and weakly supervised ways of learning to combine the outputs of multiple rankers are being studied a lot. And solutions are being used outside the lab. The next holy grail here is to create new rankers in a weakly supervised way, again based on the implicit signals that we get from user interactions. The audience, consisting of information professionals for whom expert judgments are the norm, was obviously critical but interested, with a number of great questions and a range of suggestions. Thank you!

Now playing: The Durutti Column -- Stuki

Life mining talk

I gave a talk aimed at the general public on personal data mining last night, in Maastricht. The talk is about explaining what type of information can be mined from the content of open sources (news, social media, etc) using state-of-the-art search and text mining technology. And the focus is on extracting personal information, whether this is location or music listening behavior or health or personality traits. It’s not the first time I’ve given this talk and follow-ups are planned for later this Spring. The message at the end of the talk is very simple: the content of open sources can be incredibly revealing and therefore incredibly sensitive.

Because of the ongoing NSA revelations, I decided to add some material on what can be mined from metadata. The message is the same: like the content of open sources, the metadata can be incredibly revealing and therefore sensitive too. There were good and interesting questions, both on the reach of search and text mining technology and also on the balance between sharing digital trails as many people stand to benefit on the one hand and ownership of such trails on the other hand. Thanks to everyone who attended!

Now playing: Dakota Suite & Quentin Sirjacq -- The Side of Her Inexhaustible Heart (Part III)

Microsoft PhD Fellowship

My colleague Shimon Whiteson and I received funding for a proposal entitled “Leveraging Data Reuse for Efficient Ranker Evaluation in Information Retrieval”, submitted to the Microsoft Research PhD Scholarship Programme. The project is a collaboration with Filip Radlinski and will run for three years, with a start planned in the fall. We’ll be recruiting soon.

ECIR 2014 paper on blending vertical and web results online

“Blending Vertical and Web results: A Case Study using Video Intent” by Damien Lefortier, Pavel Serdyukov, Fedor Romanenko and Maarten de Rijke is available online now.

Modern search engines aggregate results from specialized verticals into the Web search results. We study a setting where vertical and Web results are blended into a single result list, a setting that has not been studied before. We focus on video intent and present a detailed observational study of Yandex's two video content sources (i.e., the specialized vertical and a subset of the general web index) thus providing insights into their complementary character. By investigating how to blend results from these sources, we contrast traditional federated search and fusion-based approaches with newly proposed approaches that significantly outperform the baseline methods.

ECIR 2014 paper on query-dependent contextualization of streaming data online

“Query-dependent contextualization of streaming data” by Nikos Voskarides, Daan Odijk, Manos Tsagkias, Wouter Weerkamp and Maarten de Rijke is available online.

We propose a method for linking entities in a stream of short textual documents that takes into account context both inside a document and inside the history of documents seen so far. Our method uses a generic optimization framework for combining several entity ranking functions, and we introduce a global control function to control optimization. Our results demonstrate the effectiveness of combining entity ranking functions that take into account context, which is further boosted by 6% when we use an informed global control function.

ECIR 2014 paper on predicting new concepts in social streams online

"Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams” by David Graus, Manos Tsagkias, Lars Buitinck and Maarten de Rijke is available online now.

The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.

ECIR 2014 paper on cluster-based fusion for microblog search online

“The impact of semantic document expansion on cluster-based fusion for microblog search” by Shangsong Liang, Zhaochun Ren and Maarten de Rijke is available online now.

Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based fusion method on the task of retrieving microblog posts. We find that in the optimal setting the contribution of the clustering information is very limited, which we hypothesize to be due to the limited length of microblog posts. To increase the contribution of the clustering information in cluster-based fusion, we integrate semantic document expansion as a preprocessing step. We enrich the content of microblog posts appearing in the lists to be fused with Wikipedia articles, based on which clusters are created. We verify the effectiveness of our combined document expansion plus fusion method by making comparisons with microblog search algorithms and other fusion methods.

ECIR 2014 paper on click-based recommender evaluation online

“Effects of Position Bias on Click-Based Recommender Evaluation” by Katja Hofmann, Anne Schuth, Alejandro Bellogin and Maarten de Rijke is available online now.

Measuring the quality of recommendations produced by a recommender system (RS) is challenging. Labels used for evaluation are typically obtained from users of a RS, by asking for explicit feedback, or inferring labels from implicit feedback. Both approaches can introduce significant biases in the evaluation process. We investigate biases that may affect labels inferred from implicit feedback. Implicit feedback is easy to collect but can be prone to biases, such as position bias. We examine this bias using click models, and show how bias following these models would affect the outcomes of RS evaluation. We find that evaluation based on implicit and explicit feedback can agree well, but only when the evaluation metrics are designed to take user behavior and preferences into account, stressing the importance of understanding user behavior in deployed RSs.
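To make position bias concrete, here is a toy simulation in the spirit of a position-based click model. The 1/rank examination probability and all parameters are assumptions for illustration, not the specific click models studied in the paper:

```python
import random

def simulate_clicks(ranking, relevance, n_users=10000, seed=7):
    """Simulate clicks under a position-based model: a user examines the
    result at rank r with probability 1/r, and clicks an examined result
    with probability equal to its relevance."""
    rng = random.Random(seed)
    clicks = {doc: 0 for doc in ranking}
    for _ in range(n_users):
        for rank, doc in enumerate(ranking, start=1):
            examined = rng.random() < 1.0 / rank  # position bias: lower ranks seen less
            if examined and rng.random() < relevance[doc]:
                clicks[doc] += 1
    return clicks
```

Running this with two equally relevant documents shows the top-ranked one collecting roughly twice as many clicks, which is exactly the kind of bias that naive click-based evaluation would mistake for a relevance difference.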

Going out with Streamwatchr

A few weeks ago I visited a local high school as part of a series of efforts to get more high school kids to maintain an interest in computer science and possibly study the subject in university. I gave a sneak preview of a new version of a demo that we’ve been working on with Manos Tsagkias and Wouter Weerkamp, Streamwatchr.

Streamwatchr offers a new way to discover and enjoy music through an innovative interface. We show, in real-time, what music people around the world are listening to. Each time Streamwatchr identifies a tweet in which someone reports about the song that he or she is listening to, it shows a tile with a photo of the artist and a play button (on mouse over) that does, indeed, play the song (from Youtube). Streamwatchr collects about 500,000 music tweets per day, which is about 6 tweets per second.


I visited a class of 12 and 13 year olds at a local high school here in Amsterdam. As my laptop and the beamer refused to talk to each other I walked around the class room with my laptop to demo Streamwatchr, with Streamwatchr running in full screen. While walking around I talked about some of the technology behind it (entity linking, data integration, open data, etc). Occasionally, I put the laptop down on a table so that the students could interact with Streamwatchr.

I was amazed to see how addictive the interface was … a screen full of tiles, 6 of which flip and change at random every second, kept every kid glued to the laptop. Later (private) demos of Streamwatchr to friends and colleagues led to similar scenes. As I observed the high school kids interact with Streamwatchr, some interesting questions came up. What is the appeal of random changes to visual elements at a pace that seems to be slightly higher than one can actively track? Is it that there is always something on the screen that you have not seen yet? But that you think you don’t want to miss? At which speed should those random changes occur to be optimally captivating? Should the changes really be random or should they provide maximal coverage of the screen (in the obvious spatial sense) to be optimally captivating? There’s a great set of experiments to be run there --- an unexpected side product of an outreach activity.

ECIR 2014 paper on optimizing base rankers using clicks online

“Optimizing Base Rankers Using Clicks: A Case Study Using BM25” by Anne Schuth, Floor Sietsma, Shimon Whiteson and Maarten de Rijke is available online now.

We study the problem of optimizing an individual base ranker using clicks. Surprisingly, while there has been considerable attention for using clicks to optimize linear combinations of base rankers, the problem of optimizing an individual base ranker using clicks has been ignored. The problem is different from the problem of optimizing linear combinations of base rankers as the scoring function of a base ranker may be highly non-linear. For the sake of concreteness, we focus on the optimization of a specific base ranker, viz. BM25. We start by showing that significant improvements in performance can be obtained when optimizing the parameters of BM25 for individual datasets. We also show that it is possible to optimize these parameters from clicks, i.e., without the use of manually annotated data, reaching or even beating manually tuned parameters.
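For reference, a minimal sketch of the BM25 scoring function with its free parameters k1 and b exposed; these are the parameters the optimization targets (here set by hand, in the paper learned from clicks). The function signature and variable names are my own, not from the paper:

```python
import math

def bm25_score(query_terms, doc_terms, df, N, avgdl, k1=1.2, b=0.75):
    """BM25 score of one document for a query.
    df: document frequency per term, N: collection size, avgdl: average
    document length; k1 controls term-frequency saturation, b controls
    document-length normalization."""
    dl = len(doc_terms)
    tf = {}
    for t in doc_terms:
        tf[t] = tf.get(t, 0) + 1
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        idf = math.log((N - df.get(t, 0) + 0.5) / (df.get(t, 0) + 0.5) + 1.0)
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
        score += idf * norm
    return score
```

Because the score is highly non-linear in k1 and b, tuning them is a different problem from learning a linear combination of rankers, which is the distinction the paper draws.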

WSDM 2014 paper on efficient on-line ranker evaluation online

“Relative Confidence Sampling for Efficient On-Line Ranker Evaluation” by Masrour Zoghi, Shimon Whiteson, Maarten de Rijke and Remi Munos is available online now.

A key challenge in information retrieval is that of on-line ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandits problem. In this paper, we propose a new method called relative confidence sampling (RCS) that aims to reduce cumulative regret by being less conservative than existing methods in eliminating rankers from contention. In addition, we present an empirical comparison between RCS and two state-of-the-art methods, relative upper confidence bound and SAVAGE. The results demonstrate that RCS can substantially outperform these alternatives on several large learning to rank datasets.