Now playing: Bowerbirds -- Silver Clouds
Weakly supervised ways of evaluating rankers and weakly supervised ways of learning to combined the outputs of multiple rankers are being studied a lot. And solutions are being used outside the lab. The next holy grail here is to create new rankers in a weakly supervised way, again based on the implicit signals that we get from user interactions. The audience, consisting of information professionals for whom expert judgments are the norm, was obviously critical but interested, with a number of great questions and a range of suggestions. Thank you!
Now playing: The Durutti Column -- Stuki
Because of the ongoing NSA revelations, I decided to add some material on what can be mined from metadata. The message is the same: like the content of open sources, the metadata can be incredibly revealing and therefore sensitive too. There were good and interesting questions, both on the reach of search and text mining technology and also on the balance between sharing digital trails as many people stand to benefit on the one hand and ownership of such trails on the other hand. Thanks to everyone who attended!
Now playing: Dakota Suite & Quentin Sirjacq -- The Side of Her Inexhaustible Heart (Part III)
Modern search engines aggregate results from specialized verticals into the Web search results. We study a setting where vertical and Web results are blended into a single result list, a setting that has not been studied before. We focus on video intent and present a detailed observational study of Yandex's two video content sources (i.e., the specialized vertical and a subset of the general web index) thus providing insights into their complementary character. By investigating how to blend results from these sources, we contrast traditional federated search and fusion-based approaches with newly proposed approaches that significantly outperform the baseline methods.
We propose a method for linking entities in a stream of short textual documents that takes into account context both inside a document and inside the history of documents seen so far. Our method uses a generic optimization framework for combining several entity ranking functions, and we introduce a global control function to control optimization. Our results demonstrate the effectiveness of combining entity ranking functions that take into account context, which is further boosted by 6% when we use an informed global control function.
The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.
Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based fusion method on the task of retrieving microblog posts. We find that in the optimal setting the contribution of the clustering information is very limited, which we hypothesize to be due to the limited length of microblog posts. To increase the contribution of the clustering information in cluster-based fusion, we integrate semantic document expansion as a preprocessing step. We enrich the content of microblog posts appearing in the lists to be fused by Wikipedia articles, based on which clusters are created. We verify the effectiveness of our combined document expansion plus fusion method by making comparisons with microblog search algorithms and other fusion methods.
Measuring the quality of recommendations produced by a recommender system (RS) is challenging. Labels used for evaluation are typically obtained from users of a RS, by asking for explicit feedback, or inferring labels from implicit feedback. Both approaches can introduce significant biases in the evaluation process. We investigate biases that may affect labels inferred from implicit feedback. Implicit feedback is easy to collect but can be prone to biases, such as position bias. We examine this bias using click models, and show how bias following these models would affect the outcomes of RS evaluation. We find that evaluation based on implicit and explicit feedback can agree well, but only when the evaluation metrics are designed to take user behavior and preferences into account, stressing the importance of understanding user behavior in deployed RSs.
Streamwatchr offers a new way to discover and enjoy music through an innovative interface. We show, in real-time, what music people around the world are listening to. Each time Streamwatchr identifies a tweet in which someone reports about the song that he or she is listening to, it shows a tile with a photo of the artist and a play button (on mouse over) that does, indeed, play the song (from Youtube). Streamwatchr collects about 500,000 music tweets per day, which is about 6 tweets per second.
visited a class of 12 and 13 year olds at a local high
school here in Amsterdam. As my laptop and the beamer
refused to talk to each other I walked around the class
room with my laptop to demo Streamwatchr, with
Streamwatchr running in full screen. While walking
around I talked about some of the technology behind it
(entity linking, data integration, open data, etc).
Occasionally, I put the laptop down on a table so that
the students could interact with Streamwatchr.
I was amazed to see how addictive the interface was … a screen full of tiles, 6 of which flip and change at random every second, kept every kid glued to the laptop. Later (private) demos of Streamwatchr to friends and colleagues led to similar scenes. As I observed the high school kids interact with Streamwatchr, some interesting questions came up. What is the appeal of random changes to visual elements at a pace that seems to be slightly higher than one can actively track? Is it that there is always something on the screen that you have not seen yet? But that you think you don’t want to miss? At which speed should those random changes occur to be optimally captivating? Should the changes really be random or should they provide maximal coverage of the screen (in the obvious spatial sense) to be optimally captivating? There’s a great set of experiments to be run there --- an unexpected side product of an outreach activity.
We study the problem of optimizing an individual base ranker using clicks. Surprisingly, while there has been considerable attention for using clicks to optimize linear combinations of base rankers, the problem of optimizing an individual base ranker using clicks has been ignored. The problem is different from the problem of optimizing linear combinations of base rankers as the scoring function of a base ranker may be highly non-linear. For the sake of concreteness, we focus on the optimization of a specific base ranker, viz. BM25. We start by showing that significant improvements in performance can be obtained when optimizing the parameters of BM25 for individual datasets. We also show that it is possible to optimize these parameters from clicks, i.e., without the use of manually annotated data, reaching or even beating manually tuned parameters.
A key challenge in information retrieval is that of on-line ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandits problem. In this paper, we propose a new method called relative confidence sampling (RCS) that aims to reduce cumulative regret by being less conservative than existing methods in eliminating rankers from contention. In addition, we present an empirical comparison between RCS and two state-of-the-art methods, relative upper confidence bound and SAVAGE. The results demonstrate that RCS can substantially outperform these alternatives on several large learning to rank datasets.