Semantic Search workshop paper online

Our paper for the Semantic Search Workshop at WWW 2010, Entity Search: Building Bridges Between Two Worlds, by Krisztian Balog, Edgar Meij and Maarten de Rijke, is available online now.

We consider the task of entity search and examine to what extent state-of-the-art information retrieval (IR) and semantic web (SW) technologies are capable of answering information needs that focus on entities. We also explore the potential of combining IR with SW technologies to improve the end-to-end performance on a specific entity search task. We arrive at and motivate a proposal to combine text-based entity models with semantic information from the Linked Open Data cloud.

Another INEX 2009 paper online

A second INEX 2009 paper, Combining term-based and category-based representations for entity search, by Krisztian Balog, Marc Bron, Maarten de Rijke and Wouter Weerkamp, is also online now.

In the paper we describe our participation in the INEX 2009 Entity Ranking track. We employ a probabilistic retrieval model for entity search in which term-based and category-based representations of queries and entities are effectively integrated. We demonstrate that our approach achieves state-of-the-art performance on both the entity ranking and list completion tasks.
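To give a rough feel for what integrating term-based and category-based representations can look like, here is a minimal sketch. It is not the model from the paper: the overlap-based scores, the linear mixture, the weight `lam`, and all function names and example data are assumptions made purely for illustration, standing in for the probabilistic components the abstract mentions.

```python
def term_score(query_terms, entity_terms):
    """Fraction of query terms that occur in the entity's text (illustrative stand-in)."""
    if not query_terms:
        return 0.0
    entity_vocab = set(entity_terms)
    return sum(t in entity_vocab for t in query_terms) / len(query_terms)


def category_score(query_cats, entity_cats):
    """Jaccard overlap between query and entity category sets (illustrative stand-in)."""
    q, e = set(query_cats), set(entity_cats)
    if not q or not e:
        return 0.0
    return len(q & e) / len(q | e)


def combined_score(query, entity, lam=0.5):
    """Linear mixture of the two scores; lam is a hypothetical weight on categories."""
    t = term_score(query["terms"], entity["terms"])
    c = category_score(query["categories"], entity["categories"])
    return (1 - lam) * t + lam * c


def rank(query, entities, lam=0.5):
    """Return entity names ordered by decreasing combined score."""
    return sorted(
        entities,
        key=lambda name: combined_score(query, entities[name], lam),
        reverse=True,
    )


# Toy data, invented for this example only.
query = {"terms": ["dutch", "painter"], "categories": ["Dutch painters"]}
entities = {
    "Amsterdam": {
        "terms": ["dutch", "city"],
        "categories": ["Cities in the Netherlands"],
    },
    "Rembrandt": {
        "terms": ["dutch", "painter", "baroque"],
        "categories": ["Dutch painters", "Baroque painters"],
    },
}

print(rank(query, entities))
```

In this toy setup the category evidence reinforces the term evidence for the entity that matches on both, which is the intuition behind combining the two representations.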

INEX 2009 paper online

One of our INEX 2009 papers, An exploration of learning to link with Wikipedia: Features, methods and training collection, by Jiyin He and Maarten de Rijke, is online now.

We describe our participation in the Link-the-Wiki track at INEX 2009. We apply machine learning methods to the anchor-to-best-entry-point task and explore the impact of the following aspects of our approaches: features, learning methods, and the collection used for training the models. We find that a learning-to-rank approach and a binary classification approach do not differ much. The new Wikipedia collection, which is larger and has more links than the collection previously used, provides better training material for learning our models. In addition, a heuristic run that combines the two intuitively most useful features outperforms the machine learning-based runs, which suggests that further analysis and selection of features is necessary.

CIVR 2010 paper online

Our CIVR 2010 paper Today's and Tomorrow's Retrieval Practice in the Audiovisual Archive by Bouke Huurnink, Cees Snoek, Maarten de Rijke and Arnold Smeulders is online now.

Content-based video retrieval is maturing to the point where it can be used in real-world retrieval practices. One such practice is the audiovisual archive, whose users increasingly require fine-grained access to broadcast television content. We investigate to what extent content-based video retrieval methods can improve search in the audiovisual archive. In particular, we propose an evaluation methodology tailored to the specific needs and circumstances of the audiovisual archive, which are typically missed by existing evaluation initiatives. We utilize logged searches and content purchases from an existing audiovisual archive to create realistic query sets and relevance judgments. To reflect the retrieval practice of both the archive and the video retrieval community as closely as possible, our experiments with three video search engines incorporate archive-created catalog entries as well as state-of-the-art multimedia content analysis results. We find that incorporating content-based video retrieval into the archive's practice results in significant performance increases for shot retrieval and for retrieving entire television programs. Our experiments also indicate that individual content-based retrieval methods yield approximately equal performance gains. We conclude that the time has come for audiovisual archives to start accommodating content-based video retrieval methods into their daily practice.

NAACL Social Media workshop paper online

Our NAACL 2010 Social Media workshop paper Mining User Experiences from Online Forums: An Exploration by Valentin Jijkoun, Maarten de Rijke, Wouter Weerkamp, Paul Ackermans and Gijs Geleijnse is available online now.

We introduce the task of experience mining. Here, the goal is to gain insights into criteria that people formulate to judge or rate a product or its usage. These criteria can be formulated as the expectations that people have of the product in advance (i.e., the reasons to buy), but can also be expressed as reports of experiences while using the product and comparisons with other products. We focus on the latter: reports of experiences with products. In this paper, we define the task, describe guidelines for manual annotation and analyze linguistic features that can be used in an automatic experience mining system.