WI 2007 paper published

Fact Discovery in Wikipedia, by Sisay Fissaha Adafre, Valentin Jijkoun, and myself, has now been published in the proceedings of Web Intelligence 2007. In it, we address the task of extracting focused, salient information items, that is, items that are both relevant and important for a given topic, from a large encyclopedic resource. Specifically, given a topic (a Wikipedia article), we identify snippets from other Wikipedia articles that contain important information about that topic, while avoiding duplicates. We compare several methods for this task and find that a mixture of content-based, link-based, and layout-based features outperforms the other methods, especially when combined with so-called reference corpora, which capture the key properties of entities of a common type. These reference corpora will also play a big role in Sisay's forthcoming PhD thesis. A PDF version of the paper is available here.
