Faculty Candidate Seminar

Understanding and Enhancing Deep Neural Networks with Automated Interpretability

Tamar Rott Shaham, Postdoc, Massachusetts Institute of Technology
WHERE:
3725 Beyster Building
Zoom link for remote attendees

Meeting ID: 999 4631 3276 Passcode: 123123

Abstract: Deep neural networks are becoming incredibly sophisticated; they can generate realistic images, engage in complex dialogues, analyze intricate data, and execute tasks that appear almost human-like. But how do such models achieve these abilities?

In this talk, I will present a line of work that aims to explain behaviors of deep neural networks. This includes a new approach for evaluating cross-domain knowledge encoded in generative models, tools for uncovering core mechanisms in large language models, and their behavior under fine-tuning. I will show how to automate and scale the scientific process of interpreting neural networks with the Automated Interpretability Agent, a system that autonomously designs experiments on models’ internal representations to explain their behaviors. I will demonstrate how such understanding enables mitigating biases and enhancing models’ performance. The talk will conclude with a discussion of future directions, including developing universal interpretability tools and extending interpretability methods to automate scientific discovery.

Bio: Tamar Rott Shaham is a postdoctoral researcher at MIT CSAIL in Prof. Antonio Torralba’s lab. She earned her PhD from the ECE faculty at the Technion, supervised by Prof. Tomer Michaeli. Tamar has received several awards, including the ICCV 2019 Best Paper Award (Marr Prize), the Google WTM Scholarship, the Adobe Research Fellowship, the Rothschild Postdoctoral Fellowship, the Vatat-Zuckerman Postdoctoral Scholarship, and the Schmidt Postdoctoral Award.

Organizer

Stephanie Jones

Faculty Host

Lu Wang