Locating High-density Clusters with Noisy Queries

Chen Cao, Shifeng Chen, Changqing Zou, Jianzhuang Liu


Semi-supervised learning (SSL) relies on a few labeled samples to explore the intrinsic structure of data through pairwise smooth transduction. Its performance depends mainly on two factors: (1) the accuracy of the labeled queries and (2) the integrity of the manifolds in the data distribution. Both are often poor in real applications, because data frequently contain several irrelevant clusters and discrete noise. In this paper we propose a novel framework that simultaneously removes discrete noise and locates the high-density clusters. Experiments demonstrate that our algorithm is effective on problems such as non-feedback image re-ranking and image co-segmentation.



Figure 1. Experiments on two-circle toy data. (a) LGC [13] using the initial noisy queries. (b) Label diagnosis [10]. (c) Spectral filter [7]. (d) Our method. The goal is to extract the inner circle (112 points) from noisy queries. Blue markers in (a) indicate the initial queries, green markers in (b), (c), and (d) indicate the queries purified by each method, red markers indicate each method's result (its top 112 ranked points), and the “cross” markers in (d) indicate the noise removed globally by our method.
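To make the setup in Figure 1(a) concrete, the following is a minimal sketch of the LGC baseline only, not of the proposed method. It uses the standard closed form F = (I - alpha*S)^(-1) y with a symmetrically normalized Gaussian affinity S. The data generator, the function names (two_circles, lgc_scores, top_k), and all parameter values (radii, noise level, sigma, alpha, the 200 outer points) are assumptions chosen for illustration; only the 112 inner points and the top-112 ranking come from the caption.

```python
import numpy as np

def two_circles(n_inner=112, n_outer=200, r_inner=1.0, r_outer=3.0,
                noise=0.05, seed=0):
    """Hypothetical toy data: two noisy concentric rings in 2-D."""
    rng = np.random.default_rng(seed)

    def ring(n, r):
        theta = rng.uniform(0.0, 2.0 * np.pi, n)
        pts = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
        return pts + rng.normal(scale=noise, size=(n, 2))

    X = np.vstack([ring(n_inner, r_inner), ring(n_outer, r_outer)])
    is_inner = np.arange(len(X)) < n_inner  # first n_inner rows are the target
    return X, is_inner

def lgc_scores(X, query_idx, sigma=0.3, alpha=0.99):
    """LGC ranking scores: solve (I - alpha * S) F = y."""
    # Gaussian affinity between all pairs of points, no self-loops.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    # Query indicator vector: 1 for queried points, 0 elsewhere.
    y = np.zeros(len(X))
    y[query_idx] = 1.0
    return np.linalg.solve(np.eye(len(X)) - alpha * S, y)

def top_k(scores, k):
    """Indices of the k highest-scoring points."""
    return np.argsort(-scores)[:k]

if __name__ == "__main__":
    X, is_inner = two_circles()
    clean = np.flatnonzero(is_inner)[:5]
    # Noisy queries: the clean ones plus a few points from the outer ring.
    noisy = np.concatenate([clean, np.flatnonzero(~is_inner)[:5]])
    for name, q in [("clean", clean), ("noisy", noisy)]:
        ranked = top_k(lgc_scores(X, q), 112)
        print(name, "fraction of top 112 on inner circle:",
              is_inner[ranked].mean())
```

Running the script compares the fraction of the top 112 ranked points that lie on the inner circle for clean and for noisy queries, which is the kind of degradation the query purification in (b), (c), and (d) is meant to address.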



Figure 2. Three re-ranking results on the INRIA database. Top 10 images from (1) the search engine and (2) our re-ranking approach. Non-class images are marked with red boxes.



Figure 3. Four image pairs from the MSRC database showing our co-segmentation performance.




References:

C. Cao, S. Chen, C. Zou, and J. Liu, “Locating High-density Clusters with Noisy Queries,” Proc. Int'l Conf. Pattern Recognition (ICPR), 2012. [pdf]