SIGIR 2008 poster online (6)
April 24, 2008 06:03 Filed in: Papers
Measuring Concept Relatedness Using Language
Models by Dolf Trieschnigg, Edgar Meij, Maarten de
Rijke and Wessel Kraaij is available online now. Over
the years, the notion of concept relatedness has
attracted considerable attention. A variety of
approaches, based on ontology structure,
information content, association, or context,
have been proposed to indicate the relatedness
of abstract ideas. We propose a method based on
the cross entropy reduction between language
models of concepts which are estimated based on
document-concept assignments. The approach shows
improved or competitive results compared to
state-of-the-art methods on two test sets in the
biomedical domain.
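To make the idea a bit more concrete, here is a minimal sketch of one way a cross entropy reduction score could be computed. It is an illustration and not the method from the poster: the toy documents, the concept names, the interpolation weight `lam`, and the reading of "reduction" as H(c1, background) minus H(c1, c2) are all assumptions made for this example.

```python
import math
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram model over whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def smooth(model, background, lam=0.5):
    """Interpolate with the background model so every term has non-zero mass."""
    return {t: lam * model.get(t, 0.0) + (1 - lam) * p
            for t, p in background.items()}


def cross_entropy(p, q):
    """H(p, q) = -sum_t p(t) * log q(t)."""
    return -sum(pt * math.log(q[t]) for t, pt in p.items() if pt > 0)


def cross_entropy_reduction(p_c1, p_c2, background):
    """Assumed reading: how much better c2 encodes c1 than the background does."""
    return cross_entropy(p_c1, background) - cross_entropy(p_c1, p_c2)


# Hypothetical document-concept assignments (toy data, not from the poster).
assignments = {
    "heart": ["heart attack chest pain", "cardiac arrest heart"],
    "cardio": ["cardiac muscle heart disease"],
    "skin": ["skin rash itch", "skin cream"],
}

background = unigram_model([d for docs in assignments.values() for d in docs])
models = {c: smooth(unigram_model(docs), background)
          for c, docs in assignments.items()}

related = cross_entropy_reduction(models["heart"], models["cardio"], background)
unrelated = cross_entropy_reduction(models["heart"], models["skin"], background)
print(related, unrelated)
```

On this toy data the "heart" and "cardio" concepts score as more related than "heart" and "skin", which is the kind of ordering such a measure is meant to produce.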
Listening to ''Nolita'', by Keren Ann (Play Count: 19)
SIGIR 2008 paper online
April 23, 2008 22:02 Filed in: Papers
A Few Examples Go A Long Way: Constructing Query
Models from Elaborate Query Formulations by
Krisztian Balog, Wouter Weerkamp and Maarten de Rijke
is available online now. In the
paper we address a specific enterprise document
search scenario, where the information need is
expressed in an elaborate manner. In our
scenario, information needs are expressed using
a short query (of a few keywords) together with
examples of key reference pages. Given this
setup, we investigate how the examples can be
utilized to improve the end-to-end performance
on the document retrieval task. Our approach is
based on a language modeling framework, where
the query model is modified to resemble the
example pages. We compare several methods for
sampling expansion terms from the example pages
to support query-dependent and query-independent
query expansion; the latter is motivated by the
wish to increase "aspect recall," and attempts
to uncover aspects of the information need not
captured by the query.
For evaluation purposes we use the CSIRO data set created for the TREC 2007 Enterprise track. The best performance is achieved by query models based on query-independent sampling of expansion terms from the example documents.
Listening to ''A Serious Version'', by King Tubby & The Aggrovators (Play Count: 6)
SIGIR 2008 poster online (5)
April 23, 2008 12:34 Filed in: Papers
Term Clouds as Surrogates for User Generated
Speech by Manos Tsagkias, Martha Larson and
Maarten de Rijke is available online. User
generated spoken audio remains a challenge for
Automatic Speech Recognition (ASR) technology,
and content-based audio surrogates derived from
ASR transcripts must be error-robust. An
investigation of the use of term clouds as
surrogates for podcasts demonstrates that ASR
term clouds closely approximate term clouds
derived from human-generated transcripts across
a range of cloud sizes. A user study confirms
the conclusion that ASR-clouds are viable
surrogates for depicting the content of
podcasts.
Listening to ''Allegro Blues'', by Dave Brubeck (Play Count: 2)
SIGIR 2008 poster online (4)
April 23, 2008 09:14 Filed in: Papers
Parsimonious Concept Modeling by Edgar Meij,
Dolf Trieschnigg, Maarten de Rijke, and Wessel Kraaij
is available online now. We
introduce a parsimonious conceptual query model
whose retrieval performance matches that of
relevance models, while it is also able to
generate high quality navigation suggestions in
the form of concepts.
Listening to ''The Paris Match'', by The Style Council (Play Count: 12)
SIGIR 2008 poster online (3)
April 23, 2008 09:07 Filed in: Papers
Parsimonious Relevance Models by Edgar Meij,
Wouter Weerkamp, Krisztian Balog and Maarten de Rijke
is available online. We describe
a method for applying parsimonious language
models to re-estimate the term probabilities
assigned by relevance models. We apply our
method to six topic sets from test collections
in five different genres. Our parsimonious
relevance models (i) improve retrieval
effectiveness in terms of MAP on all
collections, (ii) significantly outperform their
non-parsimonious counterparts on most measures,
and (iii) have a precision enhancing effect,
unlike other blind relevance feedback methods.
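As a rough illustration of what re-estimating term probabilities with a parsimonious language model can look like, here is a minimal sketch in the style of the usual EM formulation of parsimonious models. It is an assumption-laden example and not the exact procedure from the poster: the function name `parsimonize`, the toy distributions, and the values of `lam`, `iterations` and `threshold` are all hypothetical.

```python
def parsimonize(p_rel, p_bg, lam=0.1, iterations=50, threshold=1e-4):
    """Re-estimate p_rel so that mass explained by the background is removed.

    p_rel: term -> probability under the (non-parsimonious) relevance model
    p_bg:  term -> probability under the background collection model
    lam:   assumed weight of the parsimonious model in the mixture
    """
    p = dict(p_rel)
    for _ in range(iterations):
        # E-step: share of each term's mass attributed to the parsimonious model.
        e = {t: p_rel[t] * lam * p[t] / ((1 - lam) * p_bg[t] + lam * p[t])
             for t in p}
        # M-step: renormalize into a probability distribution.
        total = sum(e.values())
        p = {t: v / total for t, v in e.items()}
    # Drop terms whose probability fell below the threshold, then renormalize.
    p = {t: v for t, v in p.items() if v >= threshold}
    total = sum(p.values())
    return {t: v / total for t, v in p.items()}


# Toy distributions (hypothetical): two common words, two topical words.
p_rel = {"the": 0.4, "of": 0.3, "retrieval": 0.2, "parsimonious": 0.1}
p_bg = {"the": 0.5, "of": 0.4, "retrieval": 0.06, "parsimonious": 0.04}

p_pars = parsimonize(p_rel, p_bg)
print(p_pars)
```

In this toy run, probability mass moves away from terms the background model already explains well ("the", "of") and towards the topical terms, which is the intended effect of making the model parsimonious.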
Listening to ''The Paris Match'', by The Style Council (Play Count: 12)
SIGIR 2008 poster online (2)
April 19, 2008 08:24 Filed in: Papers
Personal vs Non-Personal Blogs: Initial
Classification Experiments by Erik Elgersma and
Maarten de Rijke is available online now. In the
poster we address the task of separating
personal from non-personal blogs, and report on
a set of baseline experiments where we compare
the performance on a small set of features
across a set of five classifiers. We show that
with a limited set of features a performance of
up to 90% can be obtained.
Listening to ''Barrio Viejo'', by Ry Cooder (Play Count: 4)
SIGIR 2008 poster online
April 19, 2008 08:17 Filed in: Papers
Bloggers as Experts, by Krisztian Balog,
Maarten de Rijke and Wouter Weerkamp is available online now. We
address the task of (blog) feed distillation: to
find blogs that are principally devoted to a
given topic. The task may be viewed as an
association finding task, between topics and
bloggers; it resembles the expert finding task,
for which a range of models have been proposed.
We adopt two language modeling-based approaches
to expert finding, and determine their
effectiveness as feed distillation strategies.
The two models capture the idea that a human
will often search for key blogs by spotting
highly relevant posts (the Posting model) or by
taking global aspects of the blog into account
(the Blogger model). The Blogger model
outperforms the Posting model and delivers
state-of-the-art performance out of the box.
Listening to ''Barrio Viejo'', by Ry Cooder (Play Count: 4)
ACL 2008 paper online
April 19, 2008 06:54 Filed in: Papers
Credibility Improves Topical Blog Post
Retrieval by Wouter Weerkamp and Maarten de
Rijke is available online now. Topical
blog post retrieval is the task of ranking blog
posts with respect to their relevance for a
given topic. To improve topical blog post
retrieval we incorporate textual credibility
indicators in the retrieval process. We consider
two groups of indicators: post-level
(determined using information about individual
blog posts only) and blog-level
(determined using information from the
underlying blogs). We describe how to estimate
these indicators and how to integrate them into
a retrieval approach based on language models.
Experiments on the TREC Blog track test set show
that both groups of credibility indicators
significantly improve retrieval effectiveness;
the best performance is achieved when combining
them.
Listening to ''Lullaby 4 Nina'', by The Durutti Column (Play Count: 7)



