Finding Low-Rank Structure in Messy Data
To draw inferences from large, high-dimensional datasets, we often seek simple structure that models the phenomena those data represent. Low-rank linear structure is one of the most flexible and efficient of these models, enabling fast prediction, inference, and anomaly detection. In this talk we will give a high-level overview of results from the last several years in optimization and high-dimensional probability. These results show how to identify low-rank structure in high-dimensional data even when entries are missing or corrupted. We will also discuss two new directions for finding low-rank structure in messy real-world data. In the first, every entry of the matrix is observed through a single unknown monotonic transformation, a situation common in calibration and quantization problems. We show that matrix completion is still possible in this setting and present a simple algorithm with guarantees. In the second, our vector observations are heteroscedastic, i.e., each is corrupted by noise with one of several variances. This is common in problems such as sensor networks and medical imaging, where different measurements of the same phenomenon are taken with sensing of different quality (e.g., high or low radiation dose). We prove recovery results for principal component analysis (PCA) in this setting. We show that, for a fixed average noise variance, recovery is maximized when the noise variances are all equal. So although the average noise variance is often a convenient measure of overall data quality, it gives an overly optimistic estimate of PCA performance.
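The claim that equal noise variances are best for a fixed average can be explored with a small Monte Carlo experiment. The sketch below is illustrative only and is not the analysis presented in the talk. It assumes a hypothetical rank-one model y_i = theta * z_i * u + sigma_i * eps_i with Gaussian z_i and eps_i. It also assumes illustrative sizes (d = 300, n = 1500), a noise split in which 80% of samples have variance 0.25 and 20% have variance 4.0, and a helper name, `pca_recovery`, chosen only for this example. Both settings compared below have an average noise variance of 1.0. For the equal-variance case, standard spiked-covariance asymptotics predict a squared alignment of about 2/3 at these parameters (theta^2 = 1, sigma^2 = 1, n/d = 5).

```python
import numpy as np


def pca_recovery(noise_vars, props, d=300, n=1500, theta=1.0, trials=5, seed=0):
    """Average squared alignment |<u_hat, u>|^2 of plain (unweighted) PCA.

    Hypothetical illustration: each sample is y_i = theta * z_i * u + sigma_i * eps_i,
    where a fraction props[l] of the n samples has noise variance noise_vars[l].
    """
    rng = np.random.default_rng(seed)
    counts = np.round(np.asarray(props) * n).astype(int)
    # Per-sample noise standard deviation, grouped by noise level.
    sigmas = np.repeat(np.sqrt(noise_vars), counts)
    scores = []
    for _ in range(trials):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                       # true unit-norm component
        z = rng.standard_normal(sigmas.size)         # latent coefficient per sample
        # Column i of the noise matrix is scaled by sigma_i (heteroscedastic across samples).
        noise = rng.standard_normal((d, sigmas.size)) * sigmas
        Y = theta * np.outer(u, z) + noise           # d x n data matrix
        # Top left singular vector of Y equals the top eigenvector of Y @ Y.T.
        u_hat = np.linalg.svd(Y, full_matrices=False)[0][:, 0]
        scores.append(float(u_hat @ u) ** 2)
    return float(np.mean(scores))


if __name__ == "__main__":
    equal = pca_recovery([1.0], [1.0])                # every sample has variance 1.0
    unequal = pca_recovery([0.25, 4.0], [0.8, 0.2])   # average variance is still 1.0
    print(f"equal variances:   {equal:.3f}")
    print(f"unequal variances: {unequal:.3f}")
```

In runs of this kind, the unequal-variance setting should score noticeably lower even though its average noise variance matches. The exact values depend on the seed, the dimensions, and the assumed noise split.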
Professor Balzano's research is in statistical signal processing, matrix factorization, and optimization, particularly for large and messy data. She has worked on online algorithms, non-convex formulations for matrix factorization, compressed sensing and matrix completion, network inference, and sensor networks. Her interests are theoretical, but her favorite mathematical problems are motivated by fascinating and important engineering problems.