Video Search Engines

Cees G.M. Snoek and Arnold W.M. Smeulders
University of Amsterdam
Science Park 107
1098 XG Amsterdam, The Netherlands
{cgmsnoek, ArnoldSmeulders}@uva.nl

Course Description

In this half-day CVPR 2010 course we discuss the problems of video search, present methods that achieve state-of-the-art performance, and indicate how improvements may be obtained in the near future. We give an overview of the developments and future trends in the field on the basis of the TRECVID competition (the leading competition for video search engines, run by NIST), in which we have consistently achieved a top-three performance over the last five years.

The scientific topic of video search is dominated by five major challenges:

The semantic gap is bridged by forming a dictionary of visual concept detectors. The largest dictionaries to date consist of hundreds of concepts, which rules out concept-tailored algorithms: designing one by hand for every concept would simply take too long. Instead, we come closer to the ideal of a single computer vision algorithm that is tailored automatically to the purpose at hand by learning from example data. We discuss the advantages and limitations of machine learning from examples, and show for which types of concept the approach is likely to succeed or fail. To compensate for the absence of concept-specific (geometric or appearance) models, we emphasize the importance of good feature sets. They form the basis of the observational model: invariant color, shape, texture, and structure features all help to characterize the concept at hand. Apart from good features, the other essential component is state-of-the-art machine learning, in order to get the most out of the learning data.

We integrate the feature and machine learning aspects into a complete concept-based video search engine, which has successfully competed in TRECVID. The system combines computer vision, machine learning, information retrieval, and human-computer interaction. We follow the video data as they flow through the computational processes, starting from fundamental visual features that cover local shape, texture, color, and motion, and from the crucial need for invariance. Then, we explain how invariant features can be used in concert with kernel-based supervised learning methods to arrive at a concept detector. We discuss the important role of fusion at the feature, classifier, and semantic level in improving the robustness and general applicability of detectors. We end our component-wise decomposition of video search engines by explaining the complexities involved in delivering a limited set of uncertain concept detectors to an impatient user. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits.
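To make the step from features to a concept detector concrete, here is a minimal sketch in Python. It is not the system presented in the course: it assumes toy three-bin color histograms as features, a histogram intersection kernel, and a kernel perceptron as a simple stand-in for the kernel-based supervised learners mentioned above. All function names and data values are hypothetical.

```python
from typing import List, Sequence

Histogram = Sequence[float]


def intersection_kernel(x: Histogram, y: Histogram) -> float:
    """Histogram intersection kernel: the sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def concept_score(query: Histogram, examples: List[Histogram],
                  labels: List[int], alphas: List[float]) -> float:
    """Signed detector score; a positive value suggests the concept is present."""
    return sum(a * t * intersection_kernel(s, query)
               for a, t, s in zip(alphas, labels, examples))


def train_kernel_perceptron(examples: List[Histogram], labels: List[int],
                            epochs: int = 10) -> List[float]:
    """Learn one dual weight per training example (labels are +1 or -1)."""
    alphas = [0.0] * len(examples)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(examples, labels)):
            # A mistake (or a zero score) increases the weight of example i.
            if y * concept_score(x, examples, labels, alphas) <= 0:
                alphas[i] += 1.0
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return alphas


# Hypothetical training data: the first two shots show the concept, the last two do not.
train_x = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
train_y = [1, 1, -1, -1]
alphas = train_kernel_perceptron(train_x, train_y)

positive_like = concept_score([0.75, 0.15, 0.1], train_x, train_y, alphas)
negative_like = concept_score([0.1, 0.1, 0.8], train_x, train_y, alphas)
```

In this sketch the detector only ever sees the data through the kernel, so swapping in a different invariant feature or kernel does not change the learning code. A real detector would use far richer features and a stronger learner.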

Comparative evaluation of methods and systems is imperative to appreciate progress. We discuss the data, tasks, and results of TRECVID, the leading benchmark. In addition, we discuss the many derived community initiatives in creating annotations, baselines, and software for repeatable experiments. We conclude the course with our perspective on the many challenges and opportunities ahead for the computer vision and pattern recognition community.
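As an illustration of how ranked search results can be compared, the sketch below computes average precision, a ranking measure widely used in retrieval benchmarks. The function name, its parameters, and the example ranking are assumptions made for illustration; they are not taken from the benchmark's own tooling.

```python
from typing import Optional, Sequence


def average_precision(ranked_relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Average precision of a ranked list.

    ranked_relevance[i] is True when the item at rank i + 1 is relevant.
    total_relevant is the number of relevant items in the whole collection;
    when omitted, the number of relevant items in the list is used instead.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denominator = hits if total_relevant is None else total_relevant
    return precision_sum / denominator if denominator else 0.0


# Hypothetical ranking: relevant shots are returned at ranks 1 and 3.
ap = average_precision([True, False, True, False])
```

Averaging this value over a set of concepts or queries gives a single number per system, which is what makes results from different methods directly comparable.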

Lecture Topics

The technical content of our short course on video search engines is organized as follows:

Lecture Material

The lecture slides, including pointers to data sets, software, and videos, as well as several general references, are available here.

Several relevant papers are listed on our publication server.

Instructor Bios

Cees G.M. Snoek received the M.Sc. degree in business information systems (2000) and the Ph.D. degree in computer science (2005) both from the University of Amsterdam, The Netherlands, where he is currently a senior researcher at the Intelligent Systems Lab Amsterdam. He was a Visiting Scientist at Informedia, Carnegie Mellon University, USA in 2003. His research interests focus on multimedia signal processing and analysis, statistical pattern recognition, content-based information retrieval, social media retrieval, and large-scale benchmark evaluations, especially when applied in combination for video retrieval. He has published over 70 refereed book chapters, journal and conference papers in these fields, and serves on the program committee of several conferences. Dr. Snoek is a lead researcher of the award-winning MediaMill Semantic Video Search Engine, which is a consistent top performer in the yearly NIST TRECVID evaluations. He is initiator and co-organizer of the annual VideOlympics, and was the local chair of the 2007 ACM International Conference on Image and Video Retrieval. He is a lecturer of post-doctoral courses given at international conferences and European summer schools. He is a member of ACM and IEEE. Dr. Snoek received a young talent (VENI) grant from the Netherlands Organization for Scientific Research in 2008.

Arnold W.M. Smeulders graduated from the Technical University of Delft in physics in 1977 (M.Sc.) and in 1982 from Leyden University in medicine (Ph.D.) on the topic of visual pattern analysis. In 1994, he became full professor in multimedia information analysis at the University of Amsterdam. He has an interest in cognitive vision, content-based image retrieval, the picture-language question, as well as in systems for the analysis of video. He has written over 250 papers in refereed journals and conferences. He received a Fulbright grant at Yale University in 1987, and a visiting professorship at the City University Hong Kong in 1996, and again at Tsukuba, Japan in 1998. In 2000, he was elected fellow of the International Association for Pattern Recognition. He was associate editor of the IEEE Transactions on PAMI. Currently he is associate editor of the International Journal of Computer Vision as well as the IEEE Transactions on Multimedia. He is a member of the steering committee of the IEEE's International Conference on Multimedia and Expo series. He participates in the DELOS and MUSCLE networks of excellence of the EU. He was keynote speaker and chairman of the program committee of conferences including the IEEE Multimedia conference in Florence in 1999, ICIP 2000, CVPR in 2001, and CIVR in 2004 in Dublin. He was general chair of ICME 2005 in Amsterdam. In 1996, he was treasurer of the Faculty and director of the Informatics Institute at the University of Amsterdam. Currently, he is scientific director of the Intelligent Systems Lab Amsterdam of 65 staff members, the MultimediaN national public-private partnership of 30 institutions and companies, and of the national research school ASCI. He has graduated 32 Ph.D. students.