Dissertation Defense

Connecting Sight and Sound through Space, Time and Language

Ziyang Chen
WHERE:
3316 EECS Building
PASSCODE: 6wy3Z8


Sight and sound are interconnected modalities that shape our perception of the world. While semantic and temporal correspondences between vision and audio have been widely studied, many other audio-visual correlations remain underexplored. In this talk, we examine these underexplored correspondences through space, time, and language, demonstrating how they can be leveraged in a self-supervised manner. We first investigate the geometric consistency between visual and spatial audio to jointly learn sound localization and camera rotation, and explore how ambient sounds can be used to predict 3D scene structure. Next, we introduce a video-guided sound generation framework that learns semantic and temporal associations across audio, video, and text. Finally, we study the visual correspondence between images and spectrograms, using diffusion models to create visual spectrograms that resemble images yet can also be played as sound.


CHAIR: Professor Andrew Owens