CIKM 2007 paper published

"More Like These": Growing Entity Classes from Seeds by Luis Sarmento, Valentin Jijkoun, Maarten de Rijke and Eugenio Oliveira has now been published in the proceedings of CIKM 2007. One of the important lexical acquisition tasks is creating sets of entities of a specific class from a handful of seed examples. In this paper we present a corpus-based approach to this class expansion task. Given a text collection and a set of seed entities, we use co-occurrence statistics to define a class membership function that is used to rank candidate entities for inclusion in the class. We describe a novel evaluation framework for the class expansion problem, using data from Wikipedia. Analysis of the results indicates that the method improves as the size of the collection increases, which makes it well suited to the constant growth of available text data. The paper is available here.
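As a rough illustration of the general idea of ranking candidates by their co-occurrence statistics relative to a seed set, here is a minimal Python sketch. It is not the membership function defined in the paper. It assumes single-token entities, sentence-level co-occurrence, and a score equal to the mean cosine similarity between a candidate's context-word vector and each seed's vector; `context_vectors`, `cosine` and `rank_candidates` are hypothetical names.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, entities):
    """Map each entity to a Counter of the words it shares a sentence with.

    Assumption: entities are single whitespace-separated tokens.
    """
    entities = list(dict.fromkeys(entities))  # drop duplicates, keep order
    vectors = {e: Counter() for e in entities}
    for sentence in sentences:
        tokens = sentence.split()
        for e in entities:
            if e in tokens:
                vectors[e].update(t for t in tokens if t != e)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(sentences, seeds, candidates):
    """Rank candidates by mean context similarity to the seed entities."""
    vectors = context_vectors(sentences, list(seeds) + list(candidates))
    scores = {
        c: sum(cosine(vectors[c], vectors[s]) for s in seeds) / len(seeds)
        for c in candidates
    }
    return sorted(candidates, key=lambda c: scores[c], reverse=True)
```

For example, with the seeds `Lisbon` and `Porto` and a toy corpus in which `Madrid` appears in "city" contexts while `Haskell` appears in "programming language" contexts, `Madrid` is ranked above `Haskell`. Averaging over all seeds rewards candidates that resemble the class as a whole rather than a single seed.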

iTunes is not playing.