Computer Vision and Multimodal AI

The Computer Vision and Multimodal AI group does fundamental research on extracting information from visual data, such as images and videos, and from their combination with other modalities such as audio, speech, text, and touch. Our research ranges from the development of novel vision sensors, through the extraction of knowledge from sensed data, to the understanding of multimodal signals for human-human and human-machine communication. These tools support applications in robotics, medicine, security, search, augmented reality, human-computer interaction, and emerging forms of embodied AI; our research program is enriched by collaboration with these areas and beyond.

Overlapping Application Domains