CIKM 2007 paper published
November 08, 2007 00:55 Filed in: Teaching
"More Like These": Growing Entity Classes from
Seeds by Luis Sarmento, Valentin Jijkoun, Maarten
de Rijke and Eugenio Oliveira has now been published
in the proceedings of CIKM 2007. One of the important
lexical acquisition tasks is creating sets of entities
of a specific class from a handful of seed examples.
In this paper we present a corpus-based approach to
the class expansion task. Given a text collection,
for a given set of seed entities we use co-occurrence
statistics to define a class membership function that
is used to rank candidate entities for inclusion in
the class. We describe a novel evaluation framework
for this class expansion problem, using data from
Wikipedia. Analysis of the results indicates that the
method improves as the size of the collection
increases, which makes it very appropriate given the
constant growth of available text data. The paper is
available here.
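
To make the idea of ranking candidates by co-occurrence with the seeds concrete, here is a minimal Python sketch. It is not the membership function from the paper, which this post does not spell out. It assumes that each context (say, a sentence or a list) has already been reduced to the entities it mentions, and it scores a candidate by the share of its occurrences that fall in the same context as a seed, averaged over the seeds. The names `cooccurrence_counts` and `rank_candidates` and the toy data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count entity occurrences and pairwise co-occurrences per context."""
    pair_counts = Counter()
    entity_counts = Counter()
    for context in contexts:
        entities = set(context)  # count each entity once per context
        entity_counts.update(entities)
        for a, b in combinations(sorted(entities), 2):
            pair_counts[(a, b)] += 1
    return pair_counts, entity_counts


def rank_candidates(contexts, seeds):
    """Rank non-seed entities by an assumed, simplified membership score.

    score(c) = sum over seeds s of cooc(c, s) / (freq(c) * number of seeds)
    This is an illustrative choice, not the function used in the paper.
    """
    seeds = set(seeds)
    if not seeds:
        raise ValueError("at least one seed entity is required")
    pair_counts, entity_counts = cooccurrence_counts(contexts)
    scores = {}
    for entity, freq in entity_counts.items():
        if entity in seeds:
            continue
        shared = sum(pair_counts[tuple(sorted((entity, s)))] for s in seeds)
        scores[entity] = shared / (freq * len(seeds))
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: each inner list holds the entities found in one context.
contexts = [
    ["Paris", "London", "Berlin"],
    ["Paris", "Madrid"],
    ["London", "Madrid", "Berlin"],
    ["Berlin", "guitar"],
    ["guitar", "piano"],
]
for entity, score in rank_candidates(contexts, ["Paris", "London"]):
    print(f"{entity}: {score:.2f}")
```

On this toy data the two cities outrank the two instruments, which never share a context with a seed. With a larger collection the counts become less sparse, which fits the observation above that the method improves as the collection grows.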



