This wiki is intended to introduce you to understading, obtaining and running our implementation of canopy clustering algorithm.
Canopy clustering introduction
Our variation of canopy clustering focuses on efficient clustering of points in multi-dimensional pearson correlation space. The basic notion of the heuristic is choosing a point(seed point) at random, and upon establishing it's distance to all the other points, single out those that are within a specified canopy distance. A median profile of those points is then calculated creating a canopy centroid. The canopy creation process is then repeated perpetually(canopy walk) from the previously created centroid until the distance between previous centroid is small enough. All points from the last canopy are then marked, and will not become seed points again. The above will be repeated for all possible seed points.