One more CIKM 2011 paper online

A second CIKM 2011 paper, Automatic Link Generation with Wikipedia: A Case Study in Annotating Radiology Reports by Jiyin He, Maarten de Rijke, Merlijn Sevenster, Rob van Ommering and Yuechen Qian is now also available online.

Automatically annotating texts with background information has recently received much attention. We conduct a case study in automatically generating links from narrative radiology reports to Wikipedia. Such links help users understand the medical terminology and thereby increase the value of the reports.

Direct applications of existing automatic link generation systems trained on Wikipedia to our radiology data do not yield satisfactory results. Our analysis reveals that medical phrases are often syntactically regular but semantically complicated, e.g., containing multiple concepts or concepts with multiple modifiers. The latter property is the main reason for the failure of existing systems. Based on this observation, we propose an automatic link generation approach that takes into account these properties. We use a sequential labeling approach with syntactic features for anchor text identification in order to exploit syntactic regularities in medical terminology. We combine this with a sub-anchor based approach to target finding, which is aimed at coping with the complex semantic structure of medical phrases. Empirical results show that the proposed system effectively improves the performance over existing systems.

CIKM 2011 paper online

One of our papers for this year’s CIKM is now online: A Probabilistic Method for Inferring Preferences from Clicks by Katja Hofmann, Shimon Whiteson and Maarten de Rijke.

Evaluating rankers using implicit feedback, such as clicks on documents in a result list, is an increasingly popular alternative to traditional evaluation methods based on explicit relevance judgments. Previous work has shown that so-called interleaved comparison methods can utilize click data to detect small differences between rankers and can be applied to learn ranking functions online.

In this paper, we analyze three existing interleaved comparison methods and find that they are all either biased or insensitive to some differences between rankers. To address these problems, we present a new method based on a probabilistic interleaving process. We derive an unbiased estimator of comparison outcomes and show how marginalizing over possible comparison outcomes given the observed click data can make this estimator even more effective.

We validate our approach using a recently developed simulation framework based on a learning to rank dataset and a model of click behavior. Our experiments confirm the results of our analysis and show that our method is both more accurate and more robust to noise than existing methods.