Dissertation Defense
Understanding and Identifying Challenges in Design of Safety-Critical AI Systems
This event is free and open to the public.
As AI systems proliferate throughout society, ensuring their safe and reliable deployment has become exceedingly important. To that end, current regulatory frameworks have grounded themselves in risk regulation, assuming that potential harms from AI systems can be predicted and mitigated. The goal of this dissertation is to challenge this design choice. We first demonstrate that the unpredictable nature of model capabilities, where unexpected behaviors can suddenly emerge, renders preemptive risk assessments inadequate. Second, we discuss the limitations of fine-tuning protocols, the current de facto strategy for mitigating vulnerabilities identified in a model. We show that such protocols learn minimal transformations of base capabilities, which are insufficient to guarantee safety beyond the data distribution used for fine-tuning. Lastly, we explore how minor input modifications can drastically alter a model’s output, relating this behavior to Bayesian hypothesis selection and arguing that establishing safe-use standards for modern, exceedingly open-ended models may therefore be difficult. Overall, the contributions of this dissertation suggest that regulating AI systems requires exploring more nuanced paradigms that go beyond risk regulation alone.
Chair: Professor Robert Dick