Maarten de Rijke

Information retrieval

IR positions at the University of Amsterdam

We have several positions open at the interface of information retrieval, language technology and artificial intelligence, at different levels at the University of Amsterdam.

Feel free to contact me with any questions.

SIGIR 2018 papers online

The SIGIR 2018 papers that I contributed to are online now:

  • Alexey Borisov, Martijn Wardenaar, Ilya Markov, and Maarten de Rijke. A Click Sequence Model for Web Search. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. ACM, July 2018. Bibtex, PDF
    @inproceedings{borisov-click-2018,
    Author = {Borisov, Alexey and Wardenaar, Martijn and Markov, Ilya and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-12 05:45:13 +0000},
    Date-Modified = {2018-04-12 05:46:05 +0000},
    Month = {July},
    Publisher = {ACM},
    Title = {A Click Sequence Model for Web Search},
    Year = {2018}}
  • Wanyu Chen, Fei Cai, Honghui Chen, and Maarten de Rijke. Attention-based Hierarchical Neural Query Suggestion. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. ACM, July 2018. Bibtex, PDF
    @inproceedings{chen-attention-based-2018,
    Author = {Chen, Wanyu and Cai, Fei and Chen, Honghui and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-11 23:31:34 +0000},
    Date-Modified = {2018-04-11 23:32:43 +0000},
    Month = {July},
    Publisher = {ACM},
    Title = {Attention-based Hierarchical Neural Query Suggestion},
    Year = {2018}}
  • Paul Groth, Laura Koesten, Philipp Mayr, Maarten de Rijke, and Elena Simperl. DATA:SEARCH’18 – Searching Data on the Web. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 2018. Bibtex, PDF
    @inproceedings{groth-data-2018,
    Author = {Groth, Paul and Koesten, Laura and Mayr, Philipp and de Rijke, Maarten and Simperl, Elena},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-05-05 10:58:53 +0000},
    Date-Modified = {2018-05-05 11:00:44 +0000},
    Publisher = {ACM},
    Title = {DATA:SEARCH'18 -- Searching Data on the Web},
    Year = {2018}}
  • Harrie Oosterhuis and Maarten de Rijke. Ranking for Relevance and Display Preferences in Complex Presentation Layouts. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. ACM, July 2018. Bibtex, PDF
    @inproceedings{oosterhuis-ranking-2018,
    Author = {Oosterhuis, Harrie and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-12 05:41:46 +0000},
    Date-Modified = {2018-04-12 05:42:28 +0000},
    Month = {July},
    Publisher = {ACM},
    Title = {Ranking for Relevance and Display Preferences in Complex Presentation Layouts},
    Year = {2018}}
  • Zhaochun Ren, Xiangnan He, Dawei Yin, and Maarten de Rijke. Information Discovery in E-commerce. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 2018. Bibtex, PDF
    @inproceedings{ren-information-2018,
    Author = {Ren, Zhaochun and He, Xiangnan and Yin, Dawei and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-05-05 10:53:55 +0000},
    Date-Modified = {2018-05-05 10:55:19 +0000},
    Publisher = {ACM},
    Title = {Information Discovery in E-commerce},
    Year = {2018}}
  • Christophe Van Gysel and Maarten de Rijke. Pytrec_eval: An Extremely Fast Python Interface to trec_eval. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. ACM, July 2018. Bibtex, PDF
    @inproceedings{vangysel-pytrec-2018,
    Author = {Van Gysel, Christophe and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-11 22:02:31 +0000},
    Date-Modified = {2018-04-12 01:12:13 +0000},
    Month = {July},
    Publisher = {ACM},
    Title = {Pytrec\_eval: An Extremely Fast Python Interface to trec\_eval},
    Year = {2018}}
  • Nikos Voskarides, Edgar Meij, Ridho Reinanda, Abhinav Khaitan, Miles Osborne, Giorgio Stefanoni, Prabhanjan Kambadur, and Maarten de Rijke. Weakly-supervised Contextualization of Knowledge Graph Facts. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. ACM, July 2018. Bibtex, PDF
    @inproceedings{voskarides-weakly-supervised-2018,
    Author = {Voskarides, Nikos and Meij, Edgar and Reinanda, Ridho and Khaitan, Abhinav and Osborne, Miles and Stefanoni, Giorgio and Kambadur, Prabhanjan and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-12 05:42:50 +0000},
    Date-Modified = {2018-04-12 05:44:27 +0000},
    Month = {July},
    Publisher = {ACM},
    Title = {Weakly-supervised Contextualization of Knowledge Graph Facts},
    Year = {2018}}
  • Xiaohui Xie, Jiaxin Mao, Maarten de Rijke, Ruizhe Zhang, Min Zhang, and Shaoping Ma. Constructing an Interaction Behavior Model for Web Image Search. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. ACM, July 2018. Bibtex, PDF
    @inproceedings{xie-constructing-2018,
    Author = {Xie, Xiaohui and Mao, Jiaxin and de Rijke, Maarten and Zhang, Ruizhe and Zhang, Min and Ma, Shaoping},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-11 22:25:25 +0000},
    Date-Modified = {2018-04-11 23:00:01 +0000},
    Month = {July},
    Publisher = {ACM},
    Title = {Constructing an Interaction Behavior Model for Web Image Search},
    Year = {2018}}

Hello World: Innovation Center for Artificial Intelligence

Yesterday, ICAI, the national Innovation Center for Artificial Intelligence, was launched. ICAI is a national initiative focused on joint technology development between academia and industry in the area of artificial intelligence.

Artificial intelligence (AI) has become a key technology that is rapidly becoming a disruptor for all economic sectors. Given the impact, AI also generates many societal challenges. It is a proven attractor of investments in countries around the globe and potentially in the Netherlands. And it is likely to be a major change maker for work, today and tomorrow.

The Netherlands needs to better help drive innovation through AI, most importantly by increasing its ability to attract, train and retain top artificial intelligence scientists, connecting them to the business world. Without Dutch business and Dutch data, Dutch AI knowledge cannot be developed. Vice versa, without Dutch AI knowledge, Dutch business faces a serious competitive disadvantage.

The Netherlands has all the required assets to occupy a prominent place in the international AI arena. We have talent, we have world-class research, we have a longstanding tradition in AI education at all levels, and we are one of the world’s top ranked countries in terms of innovation power. ICAI brings these positive forces together in a unique national initiative. Focused on AI innovation through public-private collaborations, ICAI is an open national consortium of academic partners that is based at Amsterdam Science Park and launched by the University of Amsterdam and the VU University Amsterdam.

ICAI’s innovation strategy is organized around industry labs: multi-year strategic collaborations between academic and industrial partners with a focus on technology and talent development. Our mantra is that it takes AI innovation talent to make data actionable. By establishing a research lab under the ICAI umbrella, participating companies invest in AI research and innovation, custom-made AI training programs, and an ambitious talent pipeline that builds on educational strengths in AI.

ICAI builds on the success of a long-standing tradition of public-private cooperation in research. For companies it is important to absorb knowledge and know-how of AI as it is close to the essential values of business processes and the future perspective of the company. Internationally, this need to cooperate between public and private partners has been recognized and put into action for the Netherlands to follow.

Visit http://icai.ai or contact me for more information.

Now on arXiv: Finding influential training samples for gradient boosted decision trees

Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, and I have released a new pre-print on “finding influential training samples for gradient boosted decision trees” on arXiv. In the paper we address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency. You can find the paper here.
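The leave-one-out formalization mentioned above can be made concrete with a toy example. The sketch below is not the method from the paper: it uses brute-force retraining of a one-dimensional least-squares line instead of a tree ensemble, and the data and function names are hypothetical. It only shows what "influence of a training sample on a prediction" means under leave-one-out retraining.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b for one-dimensional inputs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def loo_influence(xs, ys, x_test):
    """Change in the prediction at x_test when each training sample is left out."""
    a, b = fit_line(xs, ys)
    full_pred = a * x_test + b
    influences = []
    for i in range(len(xs)):
        # Retrain from scratch without sample i (the expensive baseline).
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        influences.append(full_pred - (a_i * x_test + b_i))
    return influences


# Hypothetical toy data: y = 2x, except for one corrupted label at index 2.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 2.0, 10.0, 6.0, 8.0]

influences = loo_influence(train_x, train_y, x_test=2.0)
most_influential = max(range(len(influences)), key=lambda i: abs(influences[i]))
print(most_influential, influences)
```

Retraining once per training sample is what makes this baseline expensive for large ensembles, which is the cost the approximations in the paper aim to avoid.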

Now on arXiv: Optimizing interactive systems with data-driven objectives

Ziming Li, Artem Grotov, Julia Kiseleva, Harrie Oosterhuis and I have just released a new preprint on “optimizing interactive systems with data-driven objectives” on arXiv. Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. In contrast, we propose an approach that infers the objective directly from observed user interactions. These inferences can be made regardless of prior knowledge and across different types of user behavior. We then introduce Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization. Our main contribution is a new general principled approach to optimizing interactive systems using data-driven objectives. We demonstrate the high effectiveness of ISO over several GridWorld simulations. Rush over to arXiv to download the paper.

ICLR 2018 paper on Deep Learning with Logged Bandit Feedback online now

“Deep Learning with Logged Bandit Feedback” by Thorsten Joachims, Adith Swaminathan and Maarten de Rijke, to be published at ICLR 2018, is available online.

In the paper we propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training. Such contextual bandit feedback can be available in huge quantities (e.g., logs of search engines, recommender systems) at little cost, opening up a path for training deep networks on orders of magnitude more data. To this effect, we propose a counterfactual risk minimization approach for training deep networks using an equivariant empirical risk estimator with variance regularization, BanditNet, and show how the resulting objective can be decomposed in a way that allows stochastic gradient descent training. We empirically demonstrate the effectiveness of the method by showing how deep networks – ResNets in particular – can be trained for object recognition without conventionally labeled images.
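To illustrate what an equivariant risk estimator buys you on logged bandit feedback, here is a minimal sketch. It assumes that the self-normalized inverse propensity estimator is the kind of estimator meant; it leaves out variance regularization and the network itself, and the logged data and function names are hypothetical. It compares how the plain and self-normalized estimates react when a constant is added to every logged loss.

```python
def ips_risk(losses, new_probs, logged_probs):
    """Plain inverse propensity scoring estimate of the new policy's risk."""
    weights = [q / p for q, p in zip(new_probs, logged_probs)]
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


def snips_risk(losses, new_probs, logged_probs):
    """Self-normalized variant: divide by the sum of the importance weights."""
    weights = [q / p for q, p in zip(new_probs, logged_probs)]
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


# Hypothetical log: observed loss, probability under the logging policy,
# and probability the new policy assigns to the same logged action.
losses = [1.0, 0.0, 1.0, 0.0]
logged_probs = [0.5, 0.25, 0.5, 0.25]
new_probs = [0.9, 0.1, 0.2, 0.8]

shift = 10.0
shifted = [l + shift for l in losses]

# How much does each estimate move when every loss is shifted by a constant?
ips_delta = ips_risk(shifted, new_probs, logged_probs) - ips_risk(losses, new_probs, logged_probs)
snips_delta = snips_risk(shifted, new_probs, logged_probs) - snips_risk(losses, new_probs, logged_probs)
print(ips_delta, snips_delta)
```

The self-normalized estimate moves by exactly the constant, while the plain estimate moves by the constant times the average importance weight, so its preferred policy can depend on an arbitrary choice of loss offset.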

WWW 2018 paper on Manifold Learning for Rank Aggregation online

“Manifold Learning for Rank Aggregation” by Shangsong Liang, Ilya Markov, Zhaochun Ren, and Maarten de Rijke, which will be published at WWW 2018, is available online now.

In the paper we address the task of fusing ranked lists of documents that are retrieved in response to a query. Past work on this task of rank aggregation often assumes that documents in the lists being fused are independent and that only the documents that are ranked high in many lists are likely to be relevant to a given topic. We propose manifold learning aggregation approaches, ManX and v-ManX, that build on the cluster hypothesis and exploit inter-document similarity information. ManX regularizes document fusion scores so that documents that appear to be similar within a manifold receive similar scores, whereas v-ManX first generates virtual adversarial documents and then regularizes the fusion scores of both original and virtual adversarial documents. Since aggregation methods built on the cluster hypothesis are computationally expensive, we adopt an optimization method that uses the top-k documents as anchors and considerably reduces the computational complexity of manifold-based methods, resulting in two efficient aggregation approaches, a-ManX and a-v-ManX. We assess the proposed approaches experimentally and show that they significantly outperform the state-of-the-art aggregation approaches, while a-ManX and a-v-ManX run faster than ManX and v-ManX, respectively.
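As a rough illustration of the two ideas in this abstract, the sketch below first fuses ranked lists with reciprocal rank fusion, a standard baseline that rewards documents ranked high in many lists, and then pulls each fusion score toward the scores of similar documents. This is not ManX or v-ManX: the smoothing rule, the similarity values, the parameter alpha and all names are assumptions made purely for illustration.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Sum 1 / (k + rank) over every list in which a document appears."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores


def smooth_scores(scores, similarity, alpha=0.3):
    """Pull each score toward the similarity-weighted mean of its neighbours."""
    smoothed = {}
    for doc, score in scores.items():
        weighted, total = 0.0, 0.0
        for (a, b), sim in similarity.items():
            if doc in (a, b) and a != b:
                other = b if doc == a else a
                if other in scores:
                    weighted += sim * scores[other]
                    total += sim
        if total > 0:
            smoothed[doc] = (1 - alpha) * score + alpha * weighted / total
        else:
            # Documents without similar neighbours keep their fusion score.
            smoothed[doc] = score
    return smoothed


# Hypothetical input: three ranked lists and one pair of similar documents.
ranked_lists = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d1", "d4", "d2"]]
similarity = {("d1", "d3"): 0.9}

fused = reciprocal_rank_fusion(ranked_lists)
smoothed = smooth_scores(fused, similarity)
print(sorted(smoothed, key=smoothed.get, reverse=True))
```

In this toy run, d3 appears in only one list but gains score because it is similar to the well-supported d1, which is the intuition behind using the cluster hypothesis for fusion.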

JASIST paper “The birth of collective memories: Analyzing emerging entities in text streams” online

“The birth of collective memories: Analyzing emerging entities in text streams” by David Graus, Daan Odijk and Maarten de Rijke, to be published in the Journal of the Association for Information Science and Technology, is online now at this location.

In the paper we study how collective memories are formed online. We do so by tracking entities that emerge in public discourse, that is, in online text streams such as social media and news streams, before they are incorporated into Wikipedia, which, we argue, can be viewed as an online place for collective memory. By tracking how entities emerge in public discourse, that is, the temporal patterns between their first mention in online text streams and subsequent incorporation into collective memory, we gain insights into how the collective remembrance process happens online. Specifically, we analyze nearly 80,000 entities as they emerge in online text streams before they are incorporated into Wikipedia. The online text streams we use for our analysis comprise social media and news streams, and span over 579 million documents in a time span of 18 months. We discover two main emergence patterns. Some entities emerge in a “bursty” fashion, that is, they appear in public discourse without a precedent, blast into activity and transition into collective memory. Other entities display a “delayed” pattern, where they appear in public discourse, experience a period of inactivity, and then resurface before transitioning into our cultural collective memory.

WSDM 2018 paper on Why People Search for Images using Web Search Engines online

“Why People Search for Images using Web Search Engines” by Xiaohui Xie, Yiqun Liu, Maarten de Rijke, Jiyin He, Min Zhang and Shaoping Ma is online now at this location. It will be published at WSDM 2018.

What are the intents or goals behind human interactions with image search engines? Knowing why people search for images is of major concern to Web image search engines because user satisfaction may vary as intent varies. Previous analyses of image search behavior have mostly been query-based, focusing on what images people search for, rather than intent-based, that is, why people search for images. To date, there is no thorough investigation of how different image search intents affect users’ search behavior.

In this paper, we address the following questions: (1) Why do people search for images in text-based Web image search systems? (2) How does image search behavior change with user intent? (3) Can we predict user intent effectively from interactions during the early stages of a search session? To this end, we conduct both a lab-based user study and a commercial search log analysis.

We show that user intents in image search can be grouped into three classes: Explore/Learn, Entertain, and Locate/Acquire. Our lab-based user study reveals different user behavior patterns under these three intents, such as first click time, query reformulation, dwell time and mouse movement on the result page. Based on user interaction features during the early stages of an image search session, that is, before mouse scroll, we develop an intent classifier that is able to achieve promising results for classifying intents into our three intent classes. Given that all features can be obtained online and unobtrusively, the predicted intents can provide guidance for choosing ranking methods immediately after scrolling.

IRJ paper “Neural information retrieval: at the end of the early years” online

Our Information Retrieval Journal paper “Neural information retrieval: at the end of the early years” by Kezban Dilek Onal, Ye Zhang, Ismail Sengor Altingovde, Md Mustafizur Rahman, Pinar Karagoz, Alex Braylan, Brandon Dang, Heng-Lu Chang, Henna Kim, Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, Tyler McDonnell, An Thanh Nguyen, Dan Xu, Byron C. Wallace, Maarten de Rijke, and Matthew Lease is available online now at this location.

A recent “third wave” of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.


© 2018 Maarten de Rijke
