SIGIR 2008 poster online (6)

Measuring Concept Relatedness Using Language Models by Dolf Trieschnigg, Edgar Meij, Maarten de Rijke and Wessel Kraaij is available online now. Over the years, the notion of concept relatedness has attracted considerable attention. A variety of approaches, based on ontology structure, information content, association, or context, have been proposed to indicate the relatedness of abstract ideas. We propose a method based on the cross entropy reduction between language models of concepts, which are estimated from document-concept assignments. The approach shows improved or competitive results compared to state-of-the-art methods on two test sets in the biomedical domain.
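For readers who want a feel for the idea, here is a minimal sketch of one possible reading of "cross entropy reduction": how much better concept B's language model predicts concept A's term distribution than a background model does. The function names, the linear smoothing, and the weight `lam` are assumptions made for illustration; the exact estimation used in the poster may differ.

```python
import math
from collections import Counter

def mle(docs):
    """Maximum-likelihood unigram model over a list of whitespace-tokenized documents."""
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def smooth(model, background, lam=0.5):
    """Linear interpolation with the background model (lam is an assumed setting)."""
    return {w: lam * model.get(w, 0.0) + (1 - lam) * background[w] for w in background}

def cross_entropy(p, q):
    """H(p, q) = -sum_w p(w) log q(w); q must be non-zero wherever p is."""
    return -sum(pw * math.log(q[w]) for w, pw in p.items() if pw > 0)

def cross_entropy_reduction(p_a, p_b, background):
    """Reduction in cross entropy when modeling concept A with concept B instead of the background."""
    return cross_entropy(p_a, background) - cross_entropy(p_a, p_b)
```

Here the documents assigned to a concept are simply pooled to estimate that concept's model; a higher score means the two concepts are treated as more related.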

Listening to ''Nolita'', by Keren Ann (Play Count: 19)

SIGIR 2008 paper online

A Few Examples Go A Long Way: Constructing Query Models from Elaborate Query Formulations by Krisztian Balog, Wouter Weerkamp and Maarten de Rijke is available online now. In the paper we address a specific enterprise document search scenario, where the information need is expressed in an elaborate manner. In our scenario, information needs are expressed using a short query (of a few keywords) together with examples of key reference pages. Given this setup, we investigate how the examples can be utilized to improve the end-to-end performance on the document retrieval task. Our approach is based on a language modeling framework, where the query model is modified to resemble the example pages. We compare several methods for sampling expansion terms from the example pages to support query-dependent and query-independent query expansion; the latter is motivated by the wish to increase "aspect recall," and attempts to uncover aspects of the information need not captured by the query.

For evaluation purposes we use the CSIRO data set created for the TREC 2007 Enterprise track. The best performance is achieved by query models based on query-independent sampling of expansion terms from the example documents.
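As a rough illustration of what "modifying the query model to resemble the example pages" can look like, the sketch below interpolates the original query's term distribution with a distribution over the most frequent terms in the example documents. Picking terms by raw frequency, the cut-off `k`, and the weight `lam` are all assumptions for this sketch; the paper compares several sampling methods that are not reproduced here.

```python
from collections import Counter

def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def expanded_query_model(query, example_docs, lam=0.5, k=10):
    """Mix the keyword query with the top-k example-document terms (query-independent selection)."""
    p_query = normalize(Counter(query.split()))
    example_counts = Counter(w for d in example_docs for w in d.split())
    p_examples = normalize(dict(example_counts.most_common(k)))
    terms = set(p_query) | set(p_examples)
    return {t: (1 - lam) * p_query.get(t, 0.0) + lam * p_examples.get(t, 0.0) for t in terms}
```

Because the expansion terms are chosen without looking at the query, terms that describe other aspects of the information need can enter the model.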

Listening to ''A Serious Version'', by King Tubby & The Aggrovators (Play Count: 6)

SIGIR 2008 poster online (5)

Term Clouds as Surrogates for User Generated Speech by Manos Tsagkias, Martha Larson and Maarten de Rijke is available online. User generated spoken audio remains a challenge for Automatic Speech Recognition (ASR) technology, and content-based audio surrogates derived from ASR transcripts must be robust to recognition errors. An investigation of the use of term clouds as surrogates for podcasts demonstrates that ASR term clouds closely approximate term clouds derived from human-generated transcripts across a range of cloud sizes. A user study confirms the conclusion that ASR clouds are viable surrogates for depicting the content of podcasts.

Listening to ''Allegro Blues'', by Dave Brubeck (Play Count: 2)

SIGIR 2008 poster online (4)

Parsimonious Concept Modeling by Edgar Meij, Dolf Trieschnigg, Maarten de Rijke, and Wessel Kraaij is available online now. We introduce a parsimonious conceptual query model whose retrieval performance matches that of relevance models, while it is also able to generate high quality navigation suggestions in the form of concepts.

Listening to ''The Paris Match'', by The Style Council (Play Count: 12)

SIGIR 2008 poster online (3)

Parsimonious Relevance Models by Edgar Meij, Wouter Weerkamp, Krisztian Balog and Maarten de Rijke is available online. We describe a method for applying parsimonious language models to re-estimate the term probabilities assigned by relevance models. We apply our method to six topic sets from test collections in five different genres. Our parsimonious relevance models (i) improve retrieval effectiveness in terms of MAP on all collections, (ii) significantly outperform their non-parsimonious counterparts on most measures, and (iii) have a precision enhancing effect, unlike other blind relevance feedback methods.
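For the curious, here is a minimal sketch of the standard parsimonious language model estimation (an EM procedure that moves probability mass away from terms already well explained by the background collection), applied to an arbitrary set of term weights. Treating the relevance model's probabilities as the input weights, and the values of `lam`, `iterations` and `threshold`, are assumptions for illustration, not the settings reported in the poster.

```python
def parsimonize(term_weights, background, lam=0.1, iterations=20, threshold=1e-4):
    """Re-estimate a term distribution so that it keeps only terms the background explains poorly.

    term_weights: positive weights per term (e.g. counts or relevance-model probabilities).
    background:   collection probability P(t|C) for every term in term_weights.
    """
    total = sum(term_weights.values())
    p = {t: w / total for t, w in term_weights.items()}
    for _ in range(iterations):
        # E-step: share of each term's weight attributed to the specific model rather than the background.
        e = {t: term_weights[t] * lam * p[t] / ((1 - lam) * background[t] + lam * p[t]) for t in p}
        # M-step: renormalize the expected weights.
        norm = sum(e.values())
        p = {t: v / norm for t, v in e.items()}
        # Prune terms that fall below the threshold, then renormalize again.
        p = {t: v for t, v in p.items() if v >= threshold}
        norm = sum(p.values())
        p = {t: v / norm for t, v in p.items()}
    return p
```

Common words with a high background probability lose their mass within a few iterations, which is one way to picture the precision-enhancing effect mentioned above.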

Listening to ''The Paris Match'', by The Style Council (Play Count: 12)

SIGIR 2008 poster online (2)

Personal vs Non-Personal Blogs: Initial Classification Experiments by Erik Elgersma and Maarten de Rijke is available online now. In the poster we address the task of separating personal from non-personal blogs, and report on a set of baseline experiments where we compare the performance on a small set of features across a set of five classifiers. We show that with a limited set of features a performance of up to 90% can be obtained.

Listening to ''Barrio Viejo'', by Ry Cooder (Play Count: 4)

SIGIR 2008 poster online

Bloggers as Experts by Krisztian Balog, Maarten de Rijke and Wouter Weerkamp is available online now. We address the task of (blog) feed distillation: to find blogs that are principally devoted to a given topic. The task may be viewed as an association finding task, between topics and bloggers; it resembles the expert finding task, for which a range of models have been proposed. We adopt two language modeling-based approaches to expert finding, and determine their effectiveness as feed distillation strategies. The two models capture the idea that a human will often search for key blogs by spotting highly relevant posts (the Posting model) or by taking global aspects of the blog into account (the Blogger model). The Blogger model outperforms the Posting model and delivers state-of-the-art performance, out-of-the-box.
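To make the contrast between the two models concrete, here is a small sketch under simplifying assumptions: unigram models, uniform post-to-blog association weights, and linear smoothing with an assumed weight `lam`. The Blogger-style score first builds one language model for the whole blog and then scores the query; the Posting-style score computes the query likelihood per post and then averages over posts. The function names are hypothetical and the estimation details in the poster may differ.

```python
from collections import Counter

def term_probs(text):
    """Maximum-likelihood unigram model of a whitespace-tokenized text."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def blogger_score(query, posts, background, lam=0.5):
    """Aggregate posts into one blog-level model, then compute the query likelihood."""
    post_models = [term_probs(p) for p in posts]
    score = 1.0
    for t in query.split():
        p_blog = sum(m.get(t, 0.0) for m in post_models) / len(post_models)
        score *= (1 - lam) * p_blog + lam * background.get(t, 0.0)
    return score

def posting_score(query, posts, background, lam=0.5):
    """Compute the query likelihood per post, then average over the blog's posts."""
    total = 0.0
    for p in posts:
        model = term_probs(p)
        likelihood = 1.0
        for t in query.split():
            likelihood *= (1 - lam) * model.get(t, 0.0) + lam * background.get(t, 0.0)
        total += likelihood / len(posts)
    return total
```

With a multi-term query the two scores can diverge: the Posting-style score rewards blogs with individual posts that match the whole query, while the Blogger-style score rewards blogs whose posts cover the query terms collectively.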

Listening to ''Barrio Viejo'', by Ry Cooder (Play Count: 4)

ACL 2008 paper online

Credibility Improves Topical Blog Post Retrieval by Wouter Weerkamp and Maarten de Rijke is available online now. Topical blog post retrieval is the task of ranking blog posts with respect to their relevance for a given topic. To improve topical blog post retrieval we incorporate textual credibility indicators in the retrieval process. We consider two groups of indicators: post level (determined using information about individual blog posts only) and blog level (determined using information from the underlying blogs). We describe how to estimate these indicators and how to integrate them into a retrieval approach based on language models. Experiments on the TREC Blog track test set show that both groups of credibility indicators significantly improve retrieval effectiveness; the best performance is achieved when combining them.

Listening to ''Lullaby 4 Nina'', by The Durutti Column (Play Count: 7)