Course Description
In this ICCV 2009 tutorial we focus on the challenges in visual search using color, present methods for achieving state-of-the-art performance, and indicate how to obtain improvements in the near future. Moreover, we give an overview of the latest developments and future trends in the field of visual search based on the Pascal VOC and TRECVID benchmarks, the leading benchmarks for image and video retrieval.
The scientific topic of coloring visual search is dominated by four major challenges:
- the semantic gap between a visual concept and its lingual representation;
- the sensory gap between an object and its many appearances due to the accidental sensing conditions;
- the model gap between the number of notions in the world and the capacity to learn them;
- the interface gap between the tiny window the screen offers and the amount of data.
We integrate the features and machine learning aspects into a complete concept-based video search engine, which has successfully competed in TRECVID. The system includes computer vision, machine learning, information retrieval, and human-computer interaction. We follow the video data as they flow through the computational processes. We start from fundamental visual features, covering local shape, texture, color, and motion, and the crucial need for invariance. Then, we explain how (color) invariant features can be used in concert with kernel-based supervised learning methods to arrive at a concept detector. We discuss the important role of fusion on a feature, classifier, and semantic level to improve the robustness and general applicability of detectors. We end our component-wise decomposition of video search engines by explaining the complexities involved in delivering a limited set of uncertain concept detectors to an impatient user. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits.
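As a rough illustration of how a color invariant feature can feed a kernel-based learner, the sketch below computes a histogram of normalized rg chromaticity, which is unchanged when all pixel intensities are scaled uniformly, and compares two histograms with a chi-square kernel. This is only a minimal sketch under stated assumptions: the function names `rg_histogram` and `chi2_kernel`, the bin count, the `gamma` value, and the toy pixel lists are hypothetical, and rg chromaticity with a chi-square kernel is one simple choice for illustration, not necessarily the representation or kernel covered in the tutorial.

```python
import math


def rg_histogram(pixels, bins=4):
    """Normalized histogram over rg chromaticity, r = R/(R+G+B), g = G/(R+G+B).

    Dividing by R+G+B cancels any uniform scaling of intensity.
    (Illustrative helper; name and bin count are assumptions.)
    """
    hist = [0.0] * (bins * bins)
    counted = 0
    for R, G, B in pixels:
        total = R + G + B
        if total == 0:
            continue  # chromaticity is undefined for black pixels
        r, g = R / total, G / total
        i = min(int(r * bins), bins - 1)
        j = min(int(g * bins), bins - 1)
        hist[i * bins + j] += 1.0
        counted += 1
    return [h / counted for h in hist] if counted else hist


def chi2_kernel(x, y, gamma=1.0):
    """Chi-square kernel exp(-gamma * sum((x_i - y_i)^2 / (x_i + y_i)))."""
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * d)


# Toy "images" as lists of (R, G, B) pixels; the values are made up.
image = [(200, 40, 40), (180, 60, 30), (20, 30, 200), (90, 90, 90)]
darker = [(R // 2, G // 2, B // 2) for R, G, B in image]  # same scene, half intensity
other = [(40, 200, 40)] * 4                               # a greenish image

h_image = rg_histogram(image)
h_darker = rg_histogram(darker)
h_other = rg_histogram(other)

print(chi2_kernel(h_image, h_darker))  # identical histograms give the maximum similarity
print(chi2_kernel(h_image, h_other))   # differently colored image gives a lower value
```

In a full concept detector, kernel values like these would be computed between all pairs of training images and handed to a kernel-based classifier such as a support vector machine; that training step is omitted here.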
Comparative evaluation of methods and systems is imperative to appreciate progress. We discuss the data, tasks, and results of Pascal VOC and TRECVID, the leading benchmarks. In addition, we discuss the many derived community initiatives in creating annotations, baselines, and software for repeatable experiments. We conclude the course with our perspective on the many challenges and opportunities ahead for the computer vision community.
Lecture Topics
The technical content of our tutorial on coloring visual search is organized as follows:
- Introduction
- Tutorial objectives,
- Problem statement: social, business, and scientific,
- Course organization: color fundamentals, visual concept detection, image and video retrieval, evaluation.
- Color interest point and region detection
- Invariance and color constancy: the sensory and semantic gap,
- Color models and fusion: representation,
- Interest points: color vs intensity,
- Region descriptors: SIFT and color SIFT.
- Visual Concept Detection
- Concept detection: compact feature representations, kernel-based supervised learning, the model gap,
- Feature fusion: synchronization, normalization, transformation, and concatenation,
- Classifier fusion: supervised and unsupervised methods,
- Semantic fusion: graphical models, data mining, and ontologies,
- Search engine architectures: component optimization, process optimization.
- Image and Video Retrieval
- Large-scale concept detection: annotation efforts, detector performance,
- Translating queries to detectors: textual, visual, semantic, and their combination,
- Interacting with the user through the interface gap: browsing and learning.
- Evaluation
- Benchmarks: data, tasks, and results,
- Benchmark criticism: broad-domain applicability, repeatability, VideOlympics showcase,
- Resources: annotations, baselines, and software,
- Demonstration of the MediaMill Semantic video search engine.
- Conclusion
- Concluding remarks: achievements and discussion,
- Future work: challenges and opportunities for the computer vision community.
Lecture Material
The lecture slides, including pointers to data sets, software, and videos, as well as several general references, are available here. Many related publications by the tutorial instructors can be found on the publication server of the Intelligent Systems Lab Amsterdam.
Instructor Bios
Cees G.M. Snoek received the M.Sc. degree in business information systems (2000) and the Ph.D. degree in computer science (2005) both from the University of Amsterdam, The Netherlands, where he is currently a senior researcher at the Intelligent Systems Lab Amsterdam. He was a Visiting Scientist at Informedia, Carnegie Mellon University, USA in 2003. His research interests focus on multimedia signal processing and analysis, statistical pattern recognition, content-based information retrieval, social media retrieval, and large-scale benchmark evaluations, especially when applied in combination for video retrieval. He has published over 60 refereed book chapters, journal and conference papers in these fields, and serves on the program committee of several conferences. Dr. Snoek is a lead researcher of the award-winning MediaMill Semantic Video Search Engine, which is a consistent top performer in the yearly NIST TRECVID evaluations. He is initiator and co-organizer of the annual VideOlympics, and was the local chair of the 2007 ACM International Conference on Image and Video Retrieval. He is a lecturer of post-doctoral courses given at international conferences and European summer schools. He is a member of ACM and IEEE. Dr. Snoek received a young talent (VENI) grant from the Netherlands Organization for Scientific Research in 2008.
Theo Gevers is an Associate Professor of Computer Science at the University of Amsterdam, The Netherlands and an ICREA Research Professor at the Computer Vision Center (UAB), Barcelona, Spain. At the University of Amsterdam he is a teaching director of the MSc of Artificial Intelligence. He currently holds a VICI-award (for excellent researchers) from the Dutch Organisation for Scientific Research. His main research interests are in the fundamentals of content-based image retrieval, colour image processing and computer vision, specifically in the theoretical foundation of geometric and photometric invariants. He is co-chair of the Internet Imaging Conference (SPIE 2005, 2006), co-organizer of the First International Workshop on Image Databases and Multi Media Search (1996), the International Conference on Visual Information Systems (1999, 2005), the Conference on Multimedia & Expo (ICME, 2005), and the European Conference on Colour in Graphics, Imaging, and Vision (CGIV, 2012). He is guest editor of the special issue on content-based image retrieval for the International Journal of Computer Vision (IJCV 2004) and the special issue on Colour for Image Indexing and Retrieval for the journal of Computer Vision and Image Understanding (CVIU 2004). He has published over 100 papers on colour image processing, image retrieval and computer vision. He is a program committee member of various conferences, and an invited speaker at major conferences. He is a lecturer of post-doctoral courses given at various major conferences (CVPR, ICPR, SPIE, CGIV). He is a member of the IEEE.
Arnold W.M. Smeulders graduated from the Technical University of Delft in physics in 1977 (M.Sc.) and in 1982 from Leyden University in medicine (Ph.D.) on the topic of visual pattern analysis. In 1994, he became full professor in multimedia information analysis at the University of Amsterdam. He has an interest in cognitive vision, content-based image retrieval, the picture-language question as well as in systems for the analysis of video. He has written over 250 papers in refereed journals and conferences. He received a Fulbright grant at Yale University in 1987, and a visiting professorship at the City University Hong Kong in 1996, and again at Tsukuba Japan in 1998. In 2000, he was elected fellow of the International Association for Pattern Recognition. He was associate editor of IEEE Transactions on PAMI. Currently he is associate editor of the International Journal for Computer Vision as well as the IEEE Transactions on Multimedia. He is a member of the steering committee of the IEEE's International Conference on Multimedia and Expo series. He participates in the DELOS and MUSCLE networks of excellence of the EU. He was keynote speaker and chairman of the program committee of conferences including the IEEE Multimedia conference in Florence in 1999, ICIP 2000, CVPR in 2001 and CIVR in 2004 in Dublin. He was general chair of ICME2005 in Amsterdam. In 1996, he was treasurer of the Faculty and director of the Informatics Institute at the University of Amsterdam. Currently, he is scientific director of the Intelligent Systems Lab Amsterdam of 65 staff members, the MultimediaN national public-private partnership of 30 institutions and companies, and of the national research school ASCI. He has graduated 32 PhD students.



