TREC 2007 Working Notes papers published
November 08, 2007 03:11
Two more papers have just been published in the TREC
2007 Working Notes: "The University of Amsterdam at
the TREC 2007 Blog Track", by Breyten Ernsting, Wouter
Weerkamp, and Maarten de Rijke, and "The University of
Amsterdam at the TREC 2007 Enterprise Track", by
Krisztian Balog, Katja Hofmann, Wouter Weerkamp, and
Maarten de Rijke.
In the first paper, we describe our participation in the TREC 2007 Blog track. For the opinion task we looked at the difference in performance between Indri and our mixture model, and at the use of external expansion and document priors to improve opinion finding. Results show that an out-of-the-box Indri implementation outperforms our mixture model, and that external expansion on a news corpus is very beneficial. Opinion finding can be improved by using either lexicons or the number of comments as document priors. In our approach to the feed distillation task we integrated time-based and frequency aspects into the retrieval model; time-based retrieval improves results slightly, while frequency-based retrieval yields substantial improvements under the right circumstances.
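To make the idea of a document prior concrete, here is a minimal sketch, not the code we used for the track: a query-likelihood ranker with Dirichlet smoothing to which a log prior is added. The prior below, which grows with the logarithm of a post's comment count, is a hypothetical choice made for illustration, as are the function names and the smoothing parameter `mu`.

```python
import math
from collections import Counter


def comment_log_prior(num_comments):
    # Hypothetical prior: P(d) proportional to 1 + log(1 + comments),
    # so heavily discussed posts get a boost that grows only slowly.
    return math.log(1.0 + math.log1p(num_comments))


def dirichlet_log_likelihood(query_terms, doc_terms, collection_tf, collection_len, mu):
    # log P(q|d) under a Dirichlet-smoothed document language model.
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection; skip it
        score += math.log((tf[term] + mu * p_coll) / (doc_len + mu))
    return score


def rank_with_prior(query_terms, docs, mu=2000.0):
    # docs maps a document id to (list of terms, number of comments).
    collection_tf = Counter()
    for terms, _ in docs.values():
        collection_tf.update(terms)
    collection_len = sum(collection_tf.values())
    scores = {
        doc_id: comment_log_prior(comments)
        + dirichlet_log_likelihood(query_terms, terms, collection_tf, collection_len, mu)
        for doc_id, (terms, comments) in docs.items()
    }
    # Highest combined score (log prior + log likelihood) first.
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "a": (["great", "phone"], 0),
    "b": (["great", "phone"], 10),
    "c": (["bad", "weather"], 50),
}
print(rank_with_prior(["great", "phone"], docs, mu=10.0))  # prints ['b', 'c', 'a']
```

With this toy data the heavily commented but off-topic post `c` even outranks `a`, which shows why the strength of such a prior needs to be tuned with care.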
In the second paper, we describe our participation in the TREC 2007 Enterprise track and detail our language modeling-based approaches. For document search, our focus was on estimating a mixture model using a standard web collection, and on constructing query models by employing blind relevance feedback and using the example documents provided with the topics. We found that settings that perform well on a web collection do not carry over to the CSIRO collection, but that the use of advanced query models resulted in significant improvements. In expert search, our experiments concerned document representation, identification of candidate experts, and combinations of expert search strategies. We found no significant difference in average precision, but did observe small overall positive effects of the advanced models, with large differences between individual topics.
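For readers who have not met query models before, the sketch below shows one common way to build an expanded query model, not necessarily the estimator used in the paper: estimate a term distribution from a set of feedback documents (the top-ranked documents for blind relevance feedback, or the example documents supplied with a topic) and interpolate it with the original query. The interpolation weight `lam`, the cut-off `num_terms`, and the function names are illustrative assumptions.

```python
from collections import Counter


def ml_model(terms):
    # Maximum-likelihood term distribution of a bag of words.
    counts = Counter(terms)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def expand_query_model(query_terms, feedback_docs, lam=0.5, num_terms=10):
    # P(t|q') = lam * P(t|q) + (1 - lam) * P(t|feedback)
    query_model = ml_model(query_terms)
    # Average the per-document models so long documents do not dominate.
    feedback = Counter()
    for doc in feedback_docs:
        for term, p in ml_model(doc).items():
            feedback[term] += p / len(feedback_docs)
    # Keep only the strongest expansion terms and renormalise them.
    top = dict(feedback.most_common(num_terms))
    norm = sum(top.values())
    if norm == 0:
        return query_model  # no usable feedback: keep the original query
    feedback_model = {t: p / norm for t, p in top.items()}
    vocab = set(query_model) | set(feedback_model)
    return {
        t: lam * query_model.get(t, 0.0) + (1 - lam) * feedback_model.get(t, 0.0)
        for t in vocab
    }


model = expand_query_model(["solar", "energy"],
                           [["solar", "panel", "energy"], ["solar", "roof"]])
for term, p in sorted(model.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{p:.3f}")
```

The same function covers both sources of feedback mentioned above: only the list passed as `feedback_docs` changes.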
CIKM 2007 paper published
November 08, 2007 00:55
"More Like These": Growing Entity Classes from
Seeds by Luis Sarmento, Valentin Jijkoun, Maarten
de Rijke, and Eugénio Oliveira has now been published
in the proceedings of CIKM 2007. One of the important
lexical acquisition tasks is creating sets of entities
of a specific class from a handful of seed examples.
In this paper we present a corpus-based approach to
the class expansion task. Given a text collection,
for a given set of seed entities we use co-occurrence
statistics to define a class membership function that
is used to rank candidate entities for inclusion in
the class. We describe a novel evaluation framework
for this class expansion problem, using data from
Wikipedia. Analysis of the results indicates that the
method improves as the size of the collection
increases, which makes it well suited to the
constant growth of available text data. The paper is
available here.
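As an illustration of ranking candidates with a co-occurrence-based membership function, here is a simplified sketch, not the function defined in the paper. It assumes that entity mentions have already been detected in each sentence, and it scores a candidate by its average Dice coefficient with the seeds. Sentence-level co-occurrence, the Dice measure, and the names below are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def expand_class(sentences, seeds, top_k=5):
    # sentences: list of lists of entity mentions (assumed pre-extracted).
    seeds = set(seeds)
    if not seeds:
        return []
    freq = Counter()   # number of sentences mentioning each entity
    cooc = Counter()   # number of sentences mentioning both entities of a pair
    for sent in sentences:
        ents = set(sent)
        freq.update(ents)
        for a, b in combinations(sorted(ents), 2):
            cooc[(a, b)] += 1

    def dice(a, b):
        # Dice coefficient: 2 * cooc(a, b) / (freq(a) + freq(b)).
        key = (a, b) if a < b else (b, a)
        denom = freq[a] + freq[b]
        return 2 * cooc[key] / denom if denom else 0.0

    # Membership score: average association with the seed entities.
    candidates = [e for e in freq if e not in seeds]
    scores = {c: sum(dice(c, s) for s in seeds) / len(seeds) for c in candidates}
    # Sort by score, breaking ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return [(c, scores[c]) for c in ranked[:top_k]]


sentences = [
    ["Lisbon", "Porto", "Madrid"],
    ["Porto", "Madrid"],
    ["Lisbon", "Madrid"],
    ["Lisbon", "Portugal"],
    ["Porto", "FC Porto"],
]
print(expand_class(sentences, ["Lisbon", "Porto"], top_k=3))
```

Here "Madrid" comes out on top because it co-occurs with both seeds, while "Portugal" and "FC Porto" are each associated with only one of them.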
WI 2007 paper published
November 03, 2007 23:29
Fact Discovery in Wikipedia, by Sisay Fissaha
Adafre, Valentin Jijkoun, and myself, has now been
published in the proceedings of Web Intelligence
2007. In it, we address the task of extracting
focused salient information items, relevant and
important for a given topic, from a large
encyclopedic resource. Specifically, for a given
topic (a Wikipedia article) we identify snippets from
other articles in Wikipedia that contain important
information for the topic of the original article,
without duplicates. We compare several methods for
addressing the task, and find that a mixture of
content-based, link-based, and layout-based features
outperforms other methods, especially in combination
with the use of so-called reference corpora that
capture the key properties of entities of a common
type. These reference corpora will also play a big
role in Sisay's forthcoming PhD thesis. A PDF version
of the paper is available here.



