Faculty Candidate Seminar
Machine Learning Powered Query Optimization
This event is free and open to the public
Zoom link, passcode: 955781
Abstract: Database management systems (DBMSes) depend on query optimizers to transform a user’s declarative query into an efficient execution plan. Query optimizers are critical because a bad query plan can be orders of magnitude slower than the optimal plan. Modern query optimizers are complex and expensive to maintain, as they integrate a wide range of hand-tuned heuristics and manually engineered cost models which must be updated for every new capability added to the DBMS. I will present two recent approaches to query optimization that leverage deep reinforcement learning to simultaneously improve query performance and decrease maintenance burden. The first approach, Neo (VLDB 19), combines tree convolution neural networks with a novel value iteration technique to fully replace a traditional query optimizer, yielding improvements of as much as 2x after just 36 hours of training on stable workloads. The second approach, Bao (SIGMOD 21), targets dynamic workloads and learns to “steer” an existing query optimizer by training an agent via a contextual multi-armed bandit framework. More broadly, both Neo and Bao highlight the huge potential impact of applying machine learning to systems problems, giving us a glimpse of what a fully learned system could do, as well as highlighting several potential pitfalls along the way.
Bio: Ryan Marcus is a postdoc at MIT, where he researches learned systems. Ryan focuses on the potential of machine learning to underpin the next generation of data management systems, especially query optimization, data storage, and indexing. Before MIT, Ryan received his PhD from Brandeis University, where he studied machine learning techniques for automating cloud data management systems. Ryan is also a scientist at Intel Labs, an avid World of Warcraft player, and generally amenable to every kind of snack you could imagine.