Robust Continuous Clustering


This is a MATLAB implementation of the RCC and RCC-DR algorithms presented in the following paper:

Sohil Atul Shah and Vladlen Koltun. Robust Continuous Clustering. Proceedings of the National Academy of Sciences (PNAS), 2017.

If you use this code in your research, please cite our paper.

@article{shah2017robust,
  title={Robust continuous clustering},
  author={Shah, Sohil Atul and Koltun, Vladlen},
  journal={Proceedings of the National Academy of Sciences},
  year={2017},
  publisher={National Acad Sciences}
}

We have since improved upon this work and extended it with dimensionality reduction based on deep autoencoders, as described in the following paper:

Sohil Atul Shah and Vladlen Koltun. Deep Continuous Clustering.

The source code and dataset are published under the MIT license. See LICENSE for details. In general, you can use the code for any purpose with proper attribution. If you do something interesting with the code, we'll be happy to know. Feel free to contact us.

We include two external packages in the codebase (CMG and Geometry Processing Toolbox). These packages are under a BSD-style license. See External/README.txt for details.

The MATLAB code provided in this repository can be used to reproduce the accuracy results reported in the paper. The runtime reported in the paper was based on a faster C++ implementation.


The MEX files of the CMG package must be on the MATLAB path before you run the RCC and RCC-DR algorithms. To set this up, run the following commands in the MATLAB console:

> cd External/CMG/
> MakeCMG

Running Robust Continuous Clustering

The RCC and RCC-DR programs each take three parameters: a file storing the features of the data samples and their edge set, the maximum total number of iterations, and the maximum number of iterations for each graduated non-convexity level.

We have provided an MNIST dataset file in the Data folder. For example, you can run RCC and RCC-DR from the MATLAB console as follows:

> [clustAssign,numcomponents,optTime,gtlabels,nCluster] = RCC('Data/MNIST.mat', 100, 4);
> [clustAssign,numcomponents,optTime,gtlabels,nCluster] = RCCDR('Data/MNIST.mat', 100, 4);
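To illustrate how the two iteration limits (100 and 4 in the calls above) can interact, here is a minimal Python sketch of a graduated schedule. It is not taken from RCC.m: the function name `gnc_schedule`, the parameter `mu`, its start value, its floor, and the halving rule are all assumptions made for illustration, and the real code may leave a level earlier or stop before the total limit is reached.

```python
def gnc_schedule(max_iter, inner_iter, mu_start=8.0, mu_floor=0.5):
    """Return a list of (iteration, mu) pairs for a simple graduated schedule.

    Assumptions for illustration only: mu is halved after every inner_iter
    iterations and is never reduced below mu_floor.
    """
    schedule = []
    mu = mu_start
    for iteration in range(1, max_iter + 1):
        schedule.append((iteration, mu))
        if iteration % inner_iter == 0:
            # The per-level budget is used up: move to the next level.
            mu = max(mu / 2.0, mu_floor)
    return schedule


if __name__ == "__main__":
    for iteration, mu in gnc_schedule(100, 4)[:10]:
        print(iteration, mu)
```

In this sketch the first limit bounds the length of the whole schedule, while the second bounds how long any single level lasts.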

Other preprocessed datasets are available in the gdrive folder.


To evaluate the cluster assignment with several measures, use evaluate.m from the Toolbox folder. In the MATLAB console, run:

> [ARI,AMI,NMI,ACC] = evaluate(clustAssign,numcomponents,gtlabels,nCluster);
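For readers unfamiliar with these measures, the sketch below shows how one of them, the adjusted Rand index (ARI), is commonly defined. It is written in Python for brevity and is not the code in evaluate.m; the function name `adjusted_rand_index` is a hypothetical one chosen for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)

    # Contingency counts: how many samples share each (true, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs grouped together in cells, rows and columns.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    total_pairs = comb(n, 2)
    expected = sum_true * sum_pred / total_pairs if total_pairs else 0.0
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case (e.g. a single cluster in both labelings).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Because the index only compares how samples are grouped, relabeling the clusters (for example swapping cluster ids 0 and 1) does not change the score.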

Creating input

The input file is a .mat file that stores the features of the N data samples as an N x D matrix. In the MNIST data provided in the repository, N=70000 and D=784. The file should also contain the edge set, stored under the variable 'w' as a numpairs x 2 matrix, and a vector of ground truth labels to be used for evaluation.
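The layout described above can be sanity-checked before the .mat file is written. The following Python sketch is one hypothetical way to do so with plain lists. Only the variable name 'w' comes from this README; the function name `check_rcc_input` and the other argument names are assumptions, and whether sample indices in 'w' start at 0 or 1 has to be supplied by the caller because it is not stated here.

```python
def check_rcc_input(features, w, labels, index_base):
    """Validate an N x D feature matrix, a numpairs x 2 edge set and N labels.

    index_base is 0 or 1, depending on how sample indices in w are numbered
    (an assumption the caller has to supply).
    """
    n = len(features)
    if n == 0:
        raise ValueError("features must contain at least one sample")
    d = len(features[0])
    if any(len(row) != d for row in features):
        raise ValueError("every sample must have the same number of features")
    if len(labels) != n:
        raise ValueError("expected one ground truth label per sample")
    for pair in w:
        if len(pair) != 2:
            raise ValueError("each row of w must hold exactly two sample indices")
        if not all(index_base <= idx < index_base + n for idx in pair):
            raise ValueError("edge %r refers to a sample outside the data" % (pair,))
    # Return (N, D, numpairs) so the caller can log or compare the sizes.
    return n, d, len(w)
```

For the MNIST file described above, a check of this kind would be expected to report N=70000 and D=784.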

To construct the edge set and create a preprocessed input file from a raw feature file, use the Python script in the Toolbox folder. Run it from the console:

python --dataset MNIST.pkl --samples 70000 --prep 'minmax' --k 10 --algo 'mknn'

Note that the .pkl file should be placed in the Data folder.
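To make the `--prep 'minmax'`, `--k` and `--algo 'mknn'` options more concrete, here is a minimal pure-Python sketch of min-max scaling followed by mutual k-nearest-neighbour edge construction. It assumes Euclidean distance and 0-based sample indices, and the function names `minmax_scale` and `mutual_knn_edges` are hypothetical. The script in the Toolbox folder may use a different metric or add further steps.

```python
import math


def minmax_scale(features):
    """Rescale every feature column to the range [0, 1]."""
    columns = list(zip(*features))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in features:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def mutual_knn_edges(features, k):
    """Return sorted (i, j) pairs, i < j, that are in each other's k nearest neighbours."""
    n = len(features)
    neighbours = []
    for i in range(n):
        # Brute-force distances from sample i to every other sample.
        others = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        neighbours.append({j for _, j in others[:k]})
    edges = []
    for i in range(n):
        for j in neighbours[i]:
            # Keep the edge only if the neighbour relation holds both ways.
            if i < j and i in neighbours[j]:
                edges.append((i, j))
    return sorted(edges)
```

The brute-force search is quadratic in the number of samples, so it is only meant to show the idea on small inputs, not to process a dataset of MNIST size.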

Other Implementation

  1. Python Implementation by Yann Henon