Communications and Signal Processing Seminar
Optimally Weighted PCA for High-Dimensional Heteroscedastic Data
Modern applications increasingly involve high-dimensional and heterogeneous data, e.g., datasets formed by combining numerous measurements from myriad sources. Principal Component Analysis (PCA) is a classical method for reducing dimensionality by projecting data onto a low-dimensional subspace capturing most of their variation, but it does not robustly recover underlying subspaces in the presence of heteroscedastic noise. Specifically, PCA suffers from treating all data samples as if they were equally informative. We will discuss the consequences of this for performance, which leads us naturally to consider weighting PCA so that samples with larger noise variance receive less influence. Doing so better recovers underlying principal components, but precisely how to choose the weights turns out to be an interesting problem. Surprisingly, we show that whitening the noise by using inverse noise variance weights is sub-optimal. Our analysis provides expressions for the asymptotic recovery of underlying low-dimensional components from samples with heteroscedastic noise in the high-dimensional regime. We derive optimal weights and characterize the performance of optimally weighted PCA. Joint work with David Hong and Jeff Fessler.
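To make the setting concrete, here is a minimal numerical sketch of weighted PCA on heteroscedastic data: each sample's contribution to the sample covariance is scaled by a per-sample weight, and inverse-variance weights are compared against unweighted (standard) PCA. The data model, dimensions, and weights below are illustrative assumptions, not the talk's actual experiments, and the optimal weights derived in the work are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 500                      # ambient dimension, number of samples
u = rng.standard_normal(d)
u /= np.linalg.norm(u)               # true principal component (rank-1 subspace)

# Two groups of samples with different noise variances (heteroscedastic noise).
sigma2 = np.r_[np.full(n // 2, 0.5), np.full(n - n // 2, 4.0)]
scores = rng.standard_normal(n)      # latent coefficients along u
Y = np.outer(u, scores) + rng.standard_normal((d, n)) * np.sqrt(sigma2)

def weighted_pca(Y, w):
    """Top eigenvector of the weighted sample covariance (1/n) sum_i w_i y_i y_i^T."""
    C = (Y * w) @ Y.T / Y.shape[1]
    vals, vecs = np.linalg.eigh(C)
    return vecs[:, -1]               # eigenvector for the largest eigenvalue

u_unif = weighted_pca(Y, np.ones(n))     # standard PCA: all samples weighted equally
u_inv = weighted_pca(Y, 1.0 / sigma2)    # inverse-variance ("whitening") weights

print("unweighted recovery   |<u, u_hat>|:", abs(u @ u_unif))
print("inverse-variance recovery         :", abs(u @ u_inv))
```

The recovery metric |⟨u, û⟩| is 1 for perfect recovery and near 0 when the estimate is uninformative; the talk's result is that the best weights lie strictly between uniform and inverse-variance weighting.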
Laura Balzano is an assistant professor in Electrical Engineering and Computer Science at the University of Michigan. She is an Intel Early Career Faculty Fellow, a 3M Non-Tenured Faculty Awardee, and an Army Research Office Young Investigator. She received a BS from Rice University in Electrical and Computer Engineering, an MS from the University of California, Los Angeles in Electrical Engineering, and a PhD from the University of Wisconsin in Electrical and Computer Engineering. She received the Outstanding MS Degree of the Year award from the UCLA EE Department and the Best Dissertation award from the University of Wisconsin ECE Department. Her PhD was supported by a 3M fellowship. Her main research focus is on modeling with big, messy data (highly incomplete or corrupted data, uncalibrated data, and highly heterogeneous data) and its applications in networks, environmental monitoring, and computer vision. Her expertise is in statistical signal processing, matrix factorization, and optimization.