ACL-IJCNLP 2009 paper online
May 12, 2009 22:37 Filed in: Papers
"A Generative Blog Post Retrieval Model that Uses
Query Expansion based on External Collections" by
Wouter Weerkamp, Krisztian Balog and Maarten de Rijke
is available online now. User generated content is
characterized by short, noisy documents, with many
spelling errors and unexpected language usage. To
bridge the vocabulary gap between the user's
information need and documents in a specific user
generated content environment, the blogosphere, we
apply a form of query expansion, i.e., adding and
reweighting query terms. Since the blogosphere is
noisy, query expansion on the collection itself is
rarely effective, but external, edited collections are
more suitable. In the paper we propose a generative
model for expanding queries using external
collections in which dependencies between queries,
documents, and expansion documents are explicitly
modeled. We discuss different instantiations of our
model, which make different (in)dependence
assumptions. Results using two external collections
(news and Wikipedia) show that external expansion for
retrieval of user generated content is effective;
besides, conditioning the external collection on the
query is very beneficial, and making candidate
expansion terms dependent on just the document seems
sufficient.
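
To give a rough feel for the idea, here is a minimal sketch in Python of
query expansion over several external collections. It is not the model
from the paper: the function names (doc_model, query_likelihood,
expand_query), the crude probability floor used in place of proper
smoothing, and the choice to estimate the collection weight from the
summed query likelihoods of its documents are all assumptions made for
illustration. The sketch weights each candidate term by the collection
given the query, the document given the query and collection, and the
term given just the document.

```python
from collections import Counter
import math


def doc_model(tokens):
    """Maximum-likelihood unigram model P(t|D) for one non-empty document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def query_likelihood(query, model, floor=1e-6):
    """P(Q|D) as a product over query terms.

    The fixed floor for unseen terms is a stand-in for real smoothing
    (an assumption of this sketch).
    """
    return math.prod(model.get(term, floor) for term in query)


def expand_query(query, collections, n_terms=5):
    """Return the n_terms highest-weighted expansion terms.

    collections maps a collection name to a list of tokenized documents.
    Each term weight is sum_c P(c|Q) * sum_D P(D|Q,c) * P(t|D), where
    both P(c|Q) and P(D|Q,c) are derived from query likelihoods.
    """
    scored = {}
    for name, docs in collections.items():
        models = [doc_model(d) for d in docs]
        scored[name] = [(m, query_likelihood(query, m)) for m in models]

    # Collection evidence: summed query likelihood of its documents.
    mass = {name: sum(s for _, s in docs) for name, docs in scored.items()}
    total_mass = sum(mass.values())

    weights = Counter()
    for name, docs in scored.items():
        p_collection = mass[name] / total_mass   # P(c|Q), query-dependent
        for model, score in docs:
            p_doc = score / mass[name]           # P(D|Q,c)
            for term, p_term in model.items():   # P(t|D), document only
                weights[term] += p_collection * p_doc * p_term
    return weights.most_common(n_terms)


# Toy usage with two hypothetical external collections.
external = {
    "news": [["blog", "search", "engine"], ["football", "match"]],
    "wikipedia": [["blog", "web", "log"]],
}
expanded = expand_query(["blog"], external, n_terms=3)
```

Because every factor is a normalized distribution, the weights over all
candidate terms sum to one, so they can be used directly to reweight the
expanded query.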
Listening to ''Tears for Affairs'', by Camera Obscura (Play Count: 34)



