#### Communications and Signal Processing Seminar

# Leveraging Coding for Distributed Computing

**Abstract**

Large scale computing clusters that process huge amounts of data are ubiquitous in both industry and

academia. The widespread usage of these clusters presents several advantages over traditional computing paradigms. However, they also present newer challenges where coding-theoretic methods are poised to make a major impact. For example, such clusters are well known to suffer from the problem of “stragglers” (slow or failed nodes in the system). A qualitatively different problem arises in distributed computing frameworks such as MapReduce which intertwine node computation and network communication. For several classes of jobs, the network communication time required for the shuffle-phase of MapReduce can be a substantial part of the overall job execution time.

This talk will consist of two parts. In the first part I will provide a brief overview of my group’s work on using innovative code constructions for addressing both these issues. Both problems require examining coding theoretic ideas from different angles. In the second part, I will discuss in detail our recent work on straggler mitigation for distributed matrix computations. Matrix computations are a key component of various algorithms within machine learning and scientific computing. Prior work in this area has largely treated stragglers as equivalent to erasures in coding theory. However, in the distributed computation context, one often has slow (rather than failed) workers and efficient techniques for exploiting partial stragglers are needed. Another equally important issue is that matrix computations occur over the reals which introduce crucial considerations such as the numerical precision of the recovered result. For example, Reed-Solomon codes (which enjoy the best node failure tolerance threshold) face severe numerical stability issues in this context, owing to the high condition number of the associated Vandermonde matrices that need to be inverted for decoding. I will discuss our recent work on using universally decodable matrices (UDMs) and random convolutional codes for these problems. UDMs are a systematic way to leverage partial computations by the stragglers. Random convolutional codes allow for fast decoding while enjoying an optimal tolerance threshold. In both cases we obtain numerically robust solutions.

**Biography**

Aditya Ramamoorthy is a Professor of Electrical and Computer Engineering and (by courtesy) of Mathematics at Iowa State University. He received his B. Tech. degree in Electrical Engineering from the Indian Institute of Technology, Delhi in 1999 and the M.S. and Ph.D. degrees from the University of California, Los Angeles (UCLA) in 2002 and 2005 respectively. He was a systems engineer at Biomorphic VLSI Inc. till 2001. From 2005 to 2006 he was with the data storage signal processing group at Marvell Semiconductor Inc. His research interests are in the areas of network information theory, channel coding and signal processing for bioinformatics and nanotechnology. Dr. Ramamoorthy served as an editor for the IEEE Transactions on Communications from 2011 — 2015. He is currently serving as an associate editor for the IEEE Transactions on Information Theory. He is the recipient of the 2012 Iowa State University’s Early Career Engineering Faculty Research Award, the 2012 NSF CAREER award, and the Harpole-Pentair professorship in 2009 and 2010.