Visual recognition and reconstruction in the three-dimensional world
The ability to interpret the semantics of objects and actions, their individual geometric attributes, and their spatial and temporal relationships within the environment is essential for an intelligent visual system and extremely valuable in numerous applications. In visual recognition, categorizing generic objects is a highly challenging problem. Individual objects vary in appearance and shape under photometric (e.g. illumination) and geometric (e.g. scale, viewpoint, occlusion) transformations. Largely because of this difficulty, most current research in object categorization has focused on modeling object classes in single (or nearly single) views. But our world is fundamentally 3D, and it is crucial that we design models and algorithms that can handle such appearance and pose variability.
In the first part of the talk, I introduce a novel framework for learning and recognizing 3D object categories and their poses. Our approach is to capture a compact model of an object category by linking together diagnostic parts of the objects seen from different viewpoints. The resulting model summarizes both the appearance and the geometry of the object class. Unlike earlier attempts at 3D object categorization, our framework requires minimal supervision and can synthesize unseen views of an object category. Our categorization results show performance superior to state-of-the-art algorithms on the largest dataset to date.
In the second part, I present a new framework for modeling the overall geometrical and temporal organization of scenes. This is done by learning the typical distribution of spatial and temporal relationships among elements in scenes. Our model is extremely compact and can be learned in an unsupervised fashion. Experiments demonstrate that the added ability to model such spatial and temporal relationships is useful in several recognition tasks, such as scene/object categorization and human action classification. I will conclude the talk with final remarks on the relevance of the proposed research to a number of applications in mobile vision.
Silvio Savarese is an Assistant Professor of Electrical Engineering at the University of Michigan, Ann Arbor. He earned his PhD in Electrical Engineering from the California Institute of Technology in 2005. From 2005 to 2008 he was a Beckman Institute Fellow at the University of Illinois at Urbana-Champaign. In 2002 he received the Walker von Brimer Award for outstanding research initiative. His research interests include computer vision, object and scene recognition, shape representation and reconstruction, human visual perception, and visual psychophysics.