QASSIR: Question Answering as Semistructured Information Retrieval

Project Description

People have access to unprecedented amounts of information over the Internet. Making sense of this information, most of which is free text, requires advanced methods for focused information retrieval. This proposal brings together two approaches to focused information retrieval: question answering (QA) and semistructured information retrieval. Specifically, we aim to address challenges faced by today's QA systems by recasting the QA task as a semistructured information retrieval task.

XML is the de facto standard for capturing metadata and semantically rich information. XML retrieval holds the promise of providing more focused, and semantically better informed, information access than traditional document retrieval. How can this help QA? By performing the tagging and extraction work required for QA off-line, at indexing time, we create text (and data) collections with multiple annotation layers, against which QA is performed as an "answer retrieval" task. Documents marked up with multiple annotations may have tag spans that overlap without being nested; such documents are not legal XML but semistructured documents, and retrieving information from them is called semistructured information retrieval. We devise collection annotation schemes, query languages for semistructured retrieval, and data-driven mappings from questions to queries, and we build on recent advances in semistructured retrieval to identify contexts that can serve as candidate answers. By tackling QA as a semistructured information retrieval task, we can incorporate multiple data sources and annotation layers in the QA process, all tightly integrated within a single framework. This provides a theoretically transparent model for QA and addresses a number of practical challenges faced by traditional QA systems.
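
To make the idea of overlapping annotation layers concrete, the following is a minimal sketch in Python, not the project's annotation scheme or query language. It assumes stand-off annotation: each span records a layer, a label, and character offsets into the text. The sample sentence, the layer and label names, and the helper functions `mark`, `contains`, `crosses`, and `retrieve` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Toy text; the sentence and all annotations below are invented for illustration.
TEXT = "Alice moved to Paris in 2006."


@dataclass(frozen=True)
class Span:
    layer: str   # annotation layer, e.g. "entity" or "syntax"
    label: str   # tag within that layer
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive


def mark(layer, label, phrase):
    """Create a span over the first occurrence of phrase in TEXT."""
    start = TEXT.index(phrase)
    return Span(layer, label, start, start + len(phrase))


# Several independent annotation layers over the same text.
SPANS = [
    mark("sentence", "S", TEXT),
    mark("entity", "PERSON", "Alice"),
    mark("entity", "LOCATION", "Paris"),
    mark("entity", "DATE", "2006"),
    mark("syntax", "VP", "moved to Paris"),
    mark("event", "WHERE_WHEN", "Paris in 2006"),
]


def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return outer.start <= inner.start and inner.end <= outer.end


def crosses(a, b):
    """True if a and b overlap but neither contains the other."""
    overlap = a.start < b.end and b.start < a.end
    return overlap and not contains(a, b) and not contains(b, a)


def retrieve(label, within):
    """Return the text of each span tagged label inside a span tagged within."""
    return [
        TEXT[s.start:s.end]
        for s in SPANS if s.label == label
        for c in SPANS if c.label == within and contains(c, s)
    ]
```

In this sketch the "VP" and "WHERE_WHEN" spans share the word "Paris" but neither contains the other, so they could not both be encoded as elements of one well-formed XML tree. A question such as "When did Alice move?" could then be mapped to a containment query like `retrieve("DATE", "S")`, with the returned spans serving as candidate answers.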

People

  • Valentin Jijkoun
  • Maarten de Rijke
  • Vacancy for a PhD student
  • Vacancy for a part-time research programmer

Starting Date

  • 2006

Results