Robust Continuous Clustering
This is a MATLAB implementation of the RCC and RCC-DR algorithms presented in the following paper:
Sohil Atul Shah and Vladlen Koltun. Robust Continuous Clustering. Proceedings of the National Academy of Sciences (PNAS), 2017.
If you use this code in your research, please cite our paper.
The source code and dataset are published under the MIT license. See LICENSE for details. In general, you can use the code for any purpose with proper attribution. If you do something interesting with the code, we'll be happy to know. Feel free to contact us.
The MATLAB code provided in this repository can be used to reproduce the accuracy results reported in the paper. The runtime reported in the paper was based on a faster C++ implementation.
Before running the RCC and RCC-DR algorithms, add the MEX files of the CMG package to the MATLAB path. To do so, run the following commands in the MATLAB console:
> cd External/CMG/
> MakeCMG
Running Robust Continuous Clustering
The RCC and RCC-DR programs each take three parameters: a file storing the features of the data samples and their edge set, the maximum total number of iterations, and the maximum number of iterations for each graduated non-convexity level.
We have provided an MNIST dataset file in the Data folder. For example, you can run RCC and RCC-DR from the MATLAB console as follows:
> [clustAssign,numcomponents,optTime,gtlabels,nCluster] = RCC('Data/MNIST.mat', 100, 4);
> [clustAssign,numcomponents,optTime,gtlabels,nCluster] = RCCDR('Data/MNIST.mat', 100, 4);
The other preprocessed datasets can be found in the gdrive folder.
To evaluate the cluster assignment using various measures, use evaluate.m from the Toolbox folder. In the MATLAB console, run:
[ARI,AMI,NMI,ACC] = evaluate(clustAssign,numcomponents,gtlabels,nCluster);
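To give a feel for what one of these measures computes, here is a minimal Python sketch of normalized mutual information (NMI) between a predicted and a ground-truth labeling. It is an illustration only, not the code in evaluate.m: the function names are hypothetical, and the normalization used here (geometric mean of the two entropies) is an assumption, since several NMI variants exist and the one used by evaluate.m is not stated here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(pred, truth):
    """NMI normalized by the geometric mean of the entropies (assumed variant)."""
    n = len(pred)
    joint = Counter(zip(pred, truth))
    p_cnt, t_cnt = Counter(pred), Counter(truth)
    # Mutual information from the contingency counts.
    mi = sum(
        (c / n) * math.log(c * n / (p_cnt[a] * t_cnt[b]))
        for (a, b), c in joint.items()
    )
    h_pred, h_truth = entropy(pred), entropy(truth)
    if h_pred == 0.0 and h_truth == 0.0:
        return 1.0  # both labelings are a single cluster, hence identical
    denom = math.sqrt(h_pred * h_truth)
    return mi / denom if denom > 0 else 0.0


# Same partition with permuted label names versus an independent partition.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 1, 0, 1], [0, 0, 1, 1])
```

NMI depends only on how samples are grouped, not on the label names, which is why the permuted labeling above still counts as a perfect match.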
The input file is a .mat file that stores the features of the 'N' data samples as an N x D matrix. In the MNIST data provided in the repository, N=70000 and D=784. The file should also contain the edge set, stored under the variable 'w' as a numpairs x 2 matrix, and a vector of ground-truth labels to be used for evaluation.
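The layout described above can be sanity-checked before a file is written. The following Python sketch works on plain nested lists and only verifies shapes. The names `check_rcc_layout`, `features` and `labels` are hypothetical (only 'w' is named in this README), and whether the indices in 'w' are 0-based or 1-based is not specified here, so the toy values are an assumption.

```python
def check_rcc_layout(features, w, labels):
    """Return (N, D, numpairs) if the inputs match the described layout."""
    n = len(features)
    if n == 0:
        raise ValueError("features must contain at least one sample")
    d = len(features[0])
    if any(len(row) != d for row in features):
        raise ValueError("features must be a rectangular N x D matrix")
    if any(len(pair) != 2 for pair in w):
        raise ValueError("w must have exactly two columns (numpairs x 2)")
    if len(labels) != n:
        raise ValueError("need exactly one ground-truth label per sample")
    return n, d, len(w)


# Toy data: N=4 samples, D=3 features, 2 edges, one label per sample.
features = [[0.1, 0.2, 0.3], [0.0, 0.2, 0.4], [0.9, 0.8, 0.7], [1.0, 0.9, 0.6]]
w = [[0, 1], [2, 3]]  # index base is an assumption
labels = [0, 0, 1, 1]
shape = check_rcc_layout(features, w, labels)
```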
To construct the edge set and create a preprocessed input file from a raw feature file, use edgeConstruction.py from the Toolbox folder. Run the Python program from the console:
python edgeConstruction.py --dataset MNIST.pkl --samples 70000 --prep 'minmax' --k 10 --algo 'mknn'
Note that the .pkl file should be placed in the Data folder.
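As a rough illustration of what the `--prep 'minmax'`, `--k` and `--algo 'mknn'` options presumably refer to, here is a small Python sketch of min-max scaling followed by mutual k-nearest-neighbor edge construction. It is not the code in edgeConstruction.py, which may take additional steps; all function names are hypothetical, the brute-force search is only suitable for tiny inputs, and the 0-based indices are an assumption.

```python
import math


def minmax_scale(X):
    """Rescale each feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in X
    ]


def knn_indices(X, k):
    """Index set of the k nearest neighbors of each sample (Euclidean, brute force)."""
    neighbors = []
    for i, xi in enumerate(X):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def mutual_knn_edges(X, k):
    """Keep pair (i, j), i < j, only if each sample is among the other's k nearest."""
    nbrs = knn_indices(X, k)
    return sorted(
        (i, j) for i in range(len(X)) for j in nbrs[i] if i < j and i in nbrs[j]
    )


# Two well-separated groups: three points near the origin, two far away.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [11.0, 10.0]]
edges = mutual_knn_edges(minmax_scale(X), k=2)
```

In this toy example the far-away pair lists a point from the first group among its two nearest neighbors, but that choice is not reciprocated, so no edge crosses between the groups.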
- Python Implementation by Yann Henon