Note: We are aware of problems running the Windows version. Some users report that it works on Windows 7, but this appears to be hardware-dependent. We have not yet identified the cause.
pClust is a fast and accurate protein clustering software suite. The current GUI version available here is v1.01.
The GUI version has been used to cluster slightly more than 1 million protein sequences, but that run required a 4-core machine with lots of memory and took more than 10 hours. It is intended for more routine use on projects involving up to 500 thousand sequences. Clustering 120 thousand sequences takes less than 10 minutes on an 8 GB Windows desktop.
On the Downloads page (the small cloud icon on the left), you can find the following files:
- pClust installer for Windows
- pClust manual
- pClust tutorial slides
- Copyright notices for pClust and related software
- pClust v1.01 source code for Windows
- pClust installer for Mac OS X
- R program for creating membership matrices from the pClust output file
An R program is available for download that creates membership matrices (full and 0-1) from the cluster membership file, which is the final output of pClust. An example is included with the code. It contains two input files: a sample cluster membership file for 28 organisms and a file with the cumulative lengths of the 28 organisms in the order in which they were processed. The order matters because it is the only way to match sequences with organisms. The membership matrices created by the R program are written as .csv files. The 0-1 matrix can be used to create a Manhattan distance matrix, which can then be loaded into a network software program such as visone, the one we used.
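For readers who want to see the idea behind the membership matrices, here is a minimal Python sketch. It is not the R program distributed with pClust, and it does not read the actual pClust output format. It assumes, purely for illustration, that clusters are given as lists of 0-based sequence indices and that the cumulative lengths are running totals of sequences per organism in processing order. The function names (`organism_of`, `membership_matrices`, `manhattan`) and the choice of organisms as rows are hypothetical.

```python
from bisect import bisect_right

def organism_of(seq_index, cumulative):
    """Map a 0-based sequence index to an organism index.

    `cumulative` holds running totals of sequences per organism, in the
    order the organisms were processed (assumed layout).
    """
    return bisect_right(cumulative, seq_index)

def membership_matrices(clusters, cumulative):
    """Return (full, binary) matrices with organisms as rows, clusters as columns."""
    n_org = len(cumulative)
    full = [[0] * len(clusters) for _ in range(n_org)]
    for c, members in enumerate(clusters):
        for seq in members:
            full[organism_of(seq, cumulative)][c] += 1
    # The 0-1 matrix records only presence or absence in each cluster.
    binary = [[1 if count else 0 for count in row] for row in full]
    return full, binary

def manhattan(matrix):
    """Pairwise Manhattan (L1) distances between the rows of `matrix`."""
    n = len(matrix)
    return [
        [sum(abs(a - b) for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
        for i in range(n)
    ]

# Toy data: three organisms with 3, 2, and 4 sequences.
cumulative = [3, 5, 9]
clusters = [[0, 3, 5], [1, 2], [4, 6, 7, 8]]

full, binary = membership_matrices(clusters, cumulative)
dist = manhattan(binary)
```

In this toy example the second and third organisms appear in exactly the same clusters, so their Manhattan distance on the 0-1 matrix is zero, while the first organism differs from each of them in two clusters.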
pClust is released under the BSD-3-Clause license.
When using pClust, please cite the following:
- Lockwood, S., Brayton, K. A., and Broschat, S. L. (2016). Comparative genomics reveals multiple pathways to mutualism for tick-borne pathogens. BMC Genomics.
- Daily, J., Kalyanaraman, A., Krishnamoorthy, S., and Vishnu, A. (2015). A work stealing based approach for enabling scalable optimal sequence homology detection. Journal of Parallel and Distributed Computing, 79:132-142.
- Lu, H., Halappanavar, M., and Kalyanaraman, A. (2015). Parallel heuristics for scalable community detection. Parallel Computing, 47:19-37.