This paper is concerned with the problem of personalized diversification of search results, with the goal of enhancing the performance of both plain diversification and plain personalization algorithms. In previous work, the problem has mainly been tackled by means of unsupervised learning. To further enhance the performance, we propose a supervised learning strategy. Specifically, we set up a structured learning framework for conducting supervised personalized diversification, in which we add features extracted directly from the tokens of documents and those utilized by unsupervised personalized diversification algorithms, and, importantly, those generated from our proposed user-interest latent Dirichlet topic model. Based on our proposed topic model whether a document can cater to a user's interest can be estimated in our learning strategy. We also define two constraints in our structured learning framework to ensure that search results are both diversified and consistent with a user's interest. We conduct experiments on an open personalized diversification dataset and find that our supervised learning strategy outperforms unsupervised personalized diversification methods as well as other plain personalization and plain diversification methods.
Temporal evidence classification, i.e., finding associations between temporal expressions and relations expressed in text, is an important part of temporal relation extraction. To capture the variations found in this setting, we employ a distant supervision approach, modeling the task as multi-class text classification. There are two main challenges with distant supervision: (1) noise generated by incorrect heuristic labeling, and (2) distribution mismatch between the target and distant supervision examples. We are particularly interested in addressing the second problem and propose a sampling approach to handle the distribution mismatch. Our prior-informed distant supervision approach improves over basic distant supervision and outperforms a purely supervised approach when evaluated on TAC-KBP data, both on classification and end-to-end metrics.
Now playing: The Durutti Column -- At First Sight
We address the task of detecting the reputation polarity of social media updates, that is, deciding whether the content of an update has positive or negative implications for the reputation of a given entity. Typical approaches to this task include sentiment lexicons and linguistic features. However, they fall short in the social media domain because of its unedited and noisy nature, and, more importantly, because reputation polarity is not only encoded in sentiment-bearing words but it is also embedded in other word usage. To this end, automatic methods for extracting discriminative features for reputation polarity detection can play a role. We propose a data-driven, supervised approach for extracting textual features, which we use to train a reputation polarity classifier. Experiments on the RepLab 2013 collection show that our model outperforms the state-of-the-art method based on sentiment analysis by 20\% accuracy.
Now playing: Nils Petter Molvaer -- Blue Fandango
This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a sharp finite-time regret bound of order O(K log T) on a very general class of dueling bandit problems that matches a lower bound proven in (Yue et al., 2012). In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art.
Datum: 24 mei 2014, 13:45-14:45, tijdens de Universiteitsdag in de Oudemanhuispoort. Zie http://alumni.uva.nl/agenda/content/evenementen/2014/05/universiteitsdag-2014.html
Hierarchical multi-label classification assigns a document to multiple hierarchical classes. In this paper we focus on hierarchical multi-label classification of social text streams. Concept drift, complicated relations among classes, and the limited length of documents in social text streams make this a challenging problem. Our approach includes three core ingredients: short document expansion, time-aware topic tracking, and chunk-based structural learning. We extend each short document in social text streams to a more comprehensive representation via state-of-the-art entity linking and sentence ranking strategies. From documents extended in this manner, we infer dynamic probabilistic distributions over topics by dividing topics into dynamic ``global'' topics and ``local'' topics. For the third and final phase we propose a chunk-based structural optimization strategy to classify each document into multiple classes. Extensive experiments conducted on a large real-world dataset show the effectiveness of our proposed method for hierarchical multi-label classification of social text streams.
A popular strategy for search result diversification is to first retrieve a set of documents utilizing a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We take the output of a set of rankers, optimized for diversity or not, and find that data fusion can significantly improve state-of-the art diversification methods. We also introduce a new data fusion method, called diversified data fusion, which infers latent topics of a query using topic modeling, without leveraging outside information. Our experiments show that data fusion methods can enhance the performance of diversification and DDF significantly outperforms existing data fusion methods in terms of diversity metrics.
We tackle the problem of improving microblog retrieval algorithms by proposing a robust structural representation of (query, tweet) pairs. We employ these structures in a principled kernel learning framework that automatically extracts and learns highly discriminative features. We test the generalization power of our approach on the TREC Microblog 2011 and 2012 tasks. We find that relational syntactic features generated by structural kernels are effec- tive for learning to rank (L2R) and can easily be combined with those of other existing systems to boost their accuracy. In particular, the results show that our L2R approach improves on almost all the participating systems at TREC, only using their raw scores as a single feature. Our method yields an average increase of 5% in retrieval effectiveness and 7 positions in system ranks.
Reserveren kan via de NEMO site: http://www.e-nemo.nl/nl/bezoek/agenda/wakker-worden-lezingen/
Modeling user behavior on a search engine result page is important for understanding the users and supporting simulation experiments. As result pages become more complex, click models evolve as well in order to capture additional aspects of user behavior in response to new forms of result presentation.
We propose a method for evaluating the intuitiveness of vertical-aware click models, namely the ability of a click model to capture key aspects of aggregated result pages, such as vertical selection, item selection, result presentation and vertical diversity. This method allows us to isolate model components and therefore gives a multi-faceted view on a model's performance. We argue that our method can be used in conjunction with traditional click model evaluation metrics such as log-likelihood or perplexity. In order to demonstrate the power of our method in situations where result pages can contain more than one type of vertical (e.g., Image and News) we extend the previously studied Federated Click Model such that it models user clicks on such pages. Our evaluation method yields non-trivial yet interpretable conclusions about the intuitiveness of click models, highlighting their strengths and weaknesses.
A query considered in isolation provides limited information about the searcher’s interest. Previous work has considered various types of user behavior, e.g., clicks and dwell time, to obtain a better understanding of the user’s intent. We consider the searcher’s search and page view history. Using search logs from a commercial search engine, we (i) investigate the impact of features derived from user behavior on reranking a generic ranked list; (ii) optimally integrate the contributions of user behavior and candidate documents by learning their relative importance per query based on similar users. We use dwell time on clicked URLs when estimating the relevance of documents for a query, and perform Bayesian Probabilistic Matrix Factorization as smoothing to predict the relevance. Considering user behavior achieves better rankings than non-personalized rankings. Aggregation of user behavior and query-document features with a user-dependent adaptive weight outperforms combinations with a fixed uniform value.
We address the task of recipient recommendation for emailing in enterprises. We propose an intuitive and elegant way of modeling the task of recipient recommendation, which uses both the communication graph (i.e., who are most closely connected to the sender) and the content of the email. Additionally, the model can incorporate evidence as prior probabilities. Experiments on two enterprise email collections show that our model achieves very high scores, and that it outperforms two variants that use either the communication graph or the content in isolation.
Now playing: Bowerbirds -- Silver Clouds
Weakly supervised ways of evaluating rankers and weakly supervised ways of learning to combined the outputs of multiple rankers are being studied a lot. And solutions are being used outside the lab. The next holy grail here is to create new rankers in a weakly supervised way, again based on the implicit signals that we get from user interactions. The audience, consisting of information professionals for whom expert judgments are the norm, was obviously critical but interested, with a number of great questions and a range of suggestions. Thank you!
Now playing: The Durutti Column -- Stuki
Because of the ongoing NSA revelations, I decided to add some material on what can be mined from metadata. The message is the same: like the content of open sources, the metadata can be incredibly revealing and therefore sensitive too. There were good and interesting questions, both on the reach of search and text mining technology and also on the balance between sharing digital trails as many people stand to benefit on the one hand and ownership of such trails on the other hand. Thanks to everyone who attended!
Now playing: Dakota Suite & Quentin Sirjacq -- The Side of Her Inexhaustible Heart (Part III)
Modern search engines aggregate results from specialized verticals into the Web search results. We study a setting where vertical and Web results are blended into a single result list, a setting that has not been studied before. We focus on video intent and present a detailed observational study of Yandex's two video content sources (i.e., the specialized vertical and a subset of the general web index) thus providing insights into their complementary character. By investigating how to blend results from these sources, we contrast traditional federated search and fusion-based approaches with newly proposed approaches that significantly outperform the baseline methods.
We propose a method for linking entities in a stream of short textual documents that takes into account context both inside a document and inside the history of documents seen so far. Our method uses a generic optimization framework for combining several entity ranking functions, and we introduce a global control function to control optimization. Our results demonstrate the effectiveness of combining entity ranking functions that take into account context, which is further boosted by 6% when we use an informed global control function.
The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.
Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based fusion method on the task of retrieving microblog posts. We find that in the optimal setting the contribution of the clustering information is very limited, which we hypothesize to be due to the limited length of microblog posts. To increase the contribution of the clustering information in cluster-based fusion, we integrate semantic document expansion as a preprocessing step. We enrich the content of microblog posts appearing in the lists to be fused by Wikipedia articles, based on which clusters are created. We verify the effectiveness of our combined document expansion plus fusion method by making comparisons with microblog search algorithms and other fusion methods.
Measuring the quality of recommendations produced by a recommender system (RS) is challenging. Labels used for evaluation are typically obtained from users of a RS, by asking for explicit feedback, or inferring labels from implicit feedback. Both approaches can introduce significant biases in the evaluation process. We investigate biases that may affect labels inferred from implicit feedback. Implicit feedback is easy to collect but can be prone to biases, such as position bias. We examine this bias using click models, and show how bias following these models would affect the outcomes of RS evaluation. We find that evaluation based on implicit and explicit feedback can agree well, but only when the evaluation metrics are designed to take user behavior and preferences into account, stressing the importance of understanding user behavior in deployed RSs.
Streamwatchr offers a new way to discover and enjoy music through an innovative interface. We show, in real-time, what music people around the world are listening to. Each time Streamwatchr identifies a tweet in which someone reports about the song that he or she is listening to, it shows a tile with a photo of the artist and a play button (on mouse over) that does, indeed, play the song (from Youtube). Streamwatchr collects about 500,000 music tweets per day, which is about 6 tweets per second.
visited a class of 12 and 13 year olds at a local high
school here in Amsterdam. As my laptop and the beamer
refused to talk to each other I walked around the class
room with my laptop to demo Streamwatchr, with
Streamwatchr running in full screen. While walking
around I talked about some of the technology behind it
(entity linking, data integration, open data, etc).
Occasionally, I put the laptop down on a table so that
the students could interact with Streamwatchr.
I was amazed to see how addictive the interface was … a screen full of tiles, 6 of which flip and change at random every second, kept every kid glued to the laptop. Later (private) demos of Streamwatchr to friends and colleagues led to similar scenes. As I observed the high school kids interact with Streamwatchr, some interesting questions came up. What is the appeal of random changes to visual elements at a pace that seems to be slightly higher than one can actively track? Is it that there is always something on the screen that you have not seen yet? But that you think you don’t want to miss? At which speed should those random changes occur to be optimally captivating? Should the changes really be random or should they provide maximal coverage of the screen (in the obvious spatial sense) to be optimally captivating? There’s a great set of experiments to be run there --- an unexpected side product of an outreach activity.
We study the problem of optimizing an individual base ranker using clicks. Surprisingly, while there has been considerable attention for using clicks to optimize linear combinations of base rankers, the problem of optimizing an individual base ranker using clicks has been ignored. The problem is different from the problem of optimizing linear combinations of base rankers as the scoring function of a base ranker may be highly non-linear. For the sake of concreteness, we focus on the optimization of a specific base ranker, viz. BM25. We start by showing that significant improvements in performance can be obtained when optimizing the parameters of BM25 for individual datasets. We also show that it is possible to optimize these parameters from clicks, i.e., without the use of manually annotated data, reaching or even beating manually tuned parameters.
A key challenge in information retrieval is that of on-line ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandits problem. In this paper, we propose a new method called relative confidence sampling (RCS) that aims to reduce cumulative regret by being less conservative than existing methods in eliminating rankers from contention. In addition, we present an empirical comparison between RCS and two state-of-the-art methods, relative upper confidence bound and SAVAGE. The results demonstrate that RCS can substantially outperform these alternatives on several large learning to rank datasets.