Faculty Candidate Seminar
Co-designing Distributed Systems and Storage Stacks for Improved Reliability
This event is free and open to the public.
Zoom link, passcode: 350349
Abstract: Distributed storage systems form the core of modern cloud services. Like much systems software, these systems are built using layering: designers layer distributed protocols (e.g., Paxos, 2PC) upon local storage stacks. Such layering abstracts away the details of the local storage stack from the layers above, easing development. I will show that such black-box layering, unfortunately, masks vital information, resulting in poor reliability. I will then demonstrate that reliability can be significantly improved by co-designing these layers. In the first half of the talk, I will show how local storage-layer faults in one node can lead to serious vulnerabilities such as global data loss, corruption, and unavailability in many widely used systems. I will then present CTRL, a new foundation that uses the co-design approach to avoid such problems and improve reliability. I have implemented CTRL in two practical systems and will show that it greatly improves resiliency to storage faults while incurring little performance overhead.
Bio: Ram Alagappan is a postdoctoral researcher at the VMware Research Group. He earned his Ph.D. at the University of Wisconsin – Madison, working with Professors Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau. His work has been published at top systems venues and has won three best paper awards (FAST '17, '18, and '20). His dissertation also received an honorable mention for the UW CS Best Dissertation award. His open-source frameworks have had a practical impact: these tools have exposed more than 80 severe vulnerabilities across 20 widely used systems. Ideas from his work have been adopted by a financial database to make it more robust.