BoostCluster: Boosting Clustering by Pairwise Constraints
Data clustering is an important task in many disciplines. A large number of studies have attempted to improve data clustering by using the side information that is often encoded as pairwise constraints.
However, these studies focus on designing specialized clustering algorithms that can effectively exploit the pairwise constraints. We present a boosting framework for data clustering, termed BoostCluster, that iteratively improves the accuracy of any given clustering algorithm by exploiting the pairwise constraints. The key challenge in designing a boosting framework for data clustering is how to influence an arbitrary clustering algorithm with the side information, since clustering algorithms are by definition unsupervised. The proposed framework addresses this problem by dynamically generating a new data representation at each iteration that is, on the one hand, adapted to the clustering results the given algorithm produced at previous iterations and, on the other hand, consistent with the given side information. Our empirical study shows that the proposed boosting framework is effective in improving the performance of a number of popular clustering algorithms (K-means, partitional SingleLink, spectral clustering), and that its performance is comparable to that of state-of-the-art algorithms for data clustering with side information.
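To make the notion of side information concrete: pairwise constraints are commonly written as must-link pairs (two points belong in the same cluster) and cannot-link pairs (two points belong in different clusters). The sketch below is only an illustration under that assumed encoding; the function name `constraint_agreement` and the toy labels are hypothetical and do not come from the talk. It shows one simple way to check how consistent a clustering result is with the given constraints.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two data points (assumed encoding)


def constraint_agreement(labels: Sequence[int],
                         must_link: List[Pair],
                         cannot_link: List[Pair]) -> float:
    """Return the fraction of pairwise constraints a clustering satisfies."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0  # no side information, so nothing can be violated
    # A must-link pair is satisfied when both points share a cluster label.
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    # A cannot-link pair is satisfied when the labels differ.
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


# Toy example: five points, two must-link and two cannot-link constraints.
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 4)]

labels_before = [0, 0, 1, 1, 1]  # hypothetical output of an unsupervised run
labels_after = [0, 0, 0, 1, 1]   # hypothetical output after using constraints

print(constraint_agreement(labels_before, must_link, cannot_link))
print(constraint_agreement(labels_after, must_link, cannot_link))
```

A score of this kind only measures agreement with the constraints after the fact; it does not reproduce how the framework generates new data representations at each iteration.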
Dr. Rong Jin has been an Assistant Professor in the Department of Computer Science and Engineering at Michigan State University since 2003. His research focuses mainly on statistical machine learning and its application to information retrieval. He has published over eighty conference and journal articles on related topics. Dr. Jin holds a B.A. in Engineering from Tianjin University, an M.S. in Physics from Beijing University, and an M.S. and Ph.D. in Computer Science from Carnegie Mellon University. He received the NSF CAREER Award in 2006.