WI 2007 paper published
November 03, 2007 23:29 Filed in: Teaching
"Fact Discovery in Wikipedia", by Sisay Fissaha
Adafre, Valentin Jijkoun, and me, has now been
published in the proceedings of Web Intelligence
2007. In it, we address the task of extracting
focused salient information items, relevant and
important for a given topic, from a large
encyclopedic resource. Specifically, for a given
topic (a Wikipedia article) we identify snippets from
other articles in Wikipedia that contain important
information for the topic of the original article,
without duplicates. We compare several methods for
addressing the task, and find that a mixture of
content-based, link-based, and layout-based features
outperforms other methods, especially in combination
with the use of so-called reference corpora that
capture the key properties of entities of a common
type. These reference corpora will also play a big
role in Sisay's forthcoming PhD thesis. A PDF version
of the paper is available here (opens in a new window).
Listening to ''14 Grass Quit Glade Dub'', by Scientist (Play Count: 23)



