Semantic Search workshop paper online
April 24, 2010 06:28 Filed in: Papers
Our paper for the Semantic Search Workshop at WWW 2010,
Entity Search: Building Bridges Between Two
Worlds, by Krisztian Balog, Edgar Meij and
Maarten de Rijke, is available online now.
We consider the task of entity search and examine to what extent state-of-the-art information retrieval (IR) and semantic web (SW) technologies are capable of answering information needs that focus on entities. We also explore the potential of combining IR with SW technologies to improve the end-to-end performance on a specific entity search task. We arrive at and motivate a proposal to combine text-based entity models with semantic information from the Linked Open Data cloud.
Another INEX 2009 paper online
April 20, 2010 22:45 Filed in: Papers
A second INEX 2009 paper, Combining term-based and
category-based representations for entity search,
by Krisztian Balog, Marc Bron, Maarten de Rijke and
Wouter Weerkamp, is also online now.
In the paper we describe our participation in the INEX 2009 Entity Ranking track. We employ a probabilistic retrieval model for entity search in which term-based and category-based representations of queries and entities are effectively integrated. We demonstrate that our approach achieves state-of-the-art performance on both the entity ranking and list completion tasks.
INEX 2009 paper online
April 20, 2010 22:43 Filed in: Papers
One of our INEX 2009 papers, An exploration of
learning to link with Wikipedia: Features, methods
and training collection, by Jiyin He and Maarten
de Rijke, is online now.
We describe our participation in the Link-the-Wiki track at INEX 2009. We apply machine learning methods to the anchor-to-best-entry-point task and explore the impact of the following aspects of our approaches: features, learning methods, and the collection used for training the models. We find that a learning-to-rank-based approach and a binary classification approach do not differ much. The new Wikipedia collection, which is larger and has more links than the collection previously used, provides better training material for learning our models. In addition, a heuristic run that combines the two intuitively most useful features outperforms machine-learning-based runs, which suggests that further analysis and selection of features is necessary.
CIVR 2010 paper online
April 16, 2010 07:13 Filed in: Papers
Our CIVR 2010 paper, Today's and Tomorrow's
Retrieval Practice in the Audiovisual Archive, by
Bouke Huurnink, Cees Snoek, Maarten de Rijke and
Arnold Smeulders, is online now.
Content-based video retrieval is maturing to the point where it can be used in real-world retrieval practices. One such practice is the audiovisual archive, whose users increasingly require fine-grained access to broadcast television content. We investigate to what extent content-based video retrieval methods can improve search in the audiovisual archive. In particular, we propose an evaluation methodology tailored to the specific needs and circumstances of the audiovisual archive, which are typically missed by existing evaluation initiatives. We utilize logged searches and content purchases from an existing audiovisual archive to create realistic query sets and relevance judgments. To reflect the retrieval practice of both the archive and the video retrieval community as closely as possible, our experiments with three video search engines incorporate archive-created catalog entries as well as state-of-the-art multimedia content analysis results. We find that incorporating content-based video retrieval into the archive's practice results in significant performance increases for shot retrieval and for retrieving entire television programs. Our experiments also indicate that individual content-based retrieval methods yield approximately equal performance gains. We conclude that the time has come for audiovisual archives to start accommodating content-based video retrieval methods into their daily practice.
NAACL Social Media workshop paper online
April 13, 2010 09:04 Filed in: Papers
Our NAACL 2010 Social Media workshop paper, Mining
User Experiences from Online Forums: An
Exploration, by Valentin Jijkoun, Maarten de
Rijke, Wouter Weerkamp, Paul Ackermans and Gijs
Geleijnse, is available online now.
We introduce the task of experience mining. Here, the goal is to gain insights into criteria that people formulate to judge or rate a product or its usage. These criteria can be formulated as the expectations that people have of the product in advance (i.e., the reasons to buy), but can also be expressed as reports of experiences while using the product and comparisons with other products. We focus on the latter: reports of experiences with products. In this paper, we define the task, describe guidelines for manual annotation and analyze linguistic features that can be used in an automatic experience mining system.



