Leveraging Information Theory to Practical Machine Learning: Minimum Description Length Regularization for Online Learning
Classical online learning techniques enforce a prior distribution on the objective to be optimized in order to induce model sparsity. Such prior distributions are chosen with mathematical convenience in mind, but not necessarily for being the best priors. The Minimum Description Length (MDL) principle is usually used with two pass strategies, one for feature selection, and a second one for optimization with the selected features.
An approach inspired by the Minimum Description Length principle is proposed for adaptively selecting and regularizing features during online learning based on their usefulness in improving the objective. The approach eliminates noisy or useless features from the optimization process, leading to improved loss. By utilizing the MDL principle, this approach enables an optimizer to reduce the problem dimensionality to the subspace of the feature space for which the smallest loss is obtained. The approach can be tuned for trading off between model sparsity and accuracy. Empirical results on large scale practical real-world systems demonstrate how it improves such tradeoffs. Huge model size reductions can be achieved with no loss in performance relative to standard techniques, while moderate loss improvements (which can translate to large regret improvements) are achieved with moderate size reductions. The results also demonstrate that overfitting is mitigated by this approach. Analysis shows that the approach can achieve the loss of optimizing with the best feature subset.
Gil Shamir received the B.Sc. (Cum Laude), and M.Sc. degrees from the Technion, Israel "“ Institute of Technology, Haifa, Israel in 1990 and 1997, respectively, and the Ph.D. degree from the University of Notre Dame, Notre Dame, IN, U.S.A. in 2000, all in electrical engineering.
From 1990 to 1995 he participated in research and development of signal processing and communication systems. From 1995 to 1997 he was with the Electrical Engineering Department at the Technion "“ Israel Institute of Technology, as a graduate student and teaching assistant. From 1997 to 2000 he was a Ph.D. student and a research assistant in the Electrical Engineering Department at the University of Notre Dame, and then a post-doctoral fellow until 2001. During his tenure at Notre Dame he was a fellow of the Center for Applied Mathematics of the university. Between 2001 and 2008 he was with the Electrical and Computer Engineering Department at the University of Utah, and between 2008 and 2009 with Seagate Research. Since 2009 he has been with Google. His main research interests include information theory, machine learning, coding and communication theory. Dr. Shamir received an NSF CAREER award in 2003.