Note: We are aware of issues with running the Windows version. Some users report that it works on Windows 7, but this appears to be hardware-dependent. We're hoping to figure out the problem at some point.

pClust is a fast and accurate protein clustering software suite. The current GUI version available here is v1.01.

Performance

The GUI version has been used to cluster a little over 1 million protein sequences, but that run required a 4-core machine with lots of memory and took more than 10 hours. It is meant for more routine use on projects involving up to 500 thousand sequences. Clustering 120 thousand sequences takes less than 10 minutes on an 8 GB Windows desktop.

Downloads

On the Downloads page (little cloud to the left) you can find the following files:

Postprocessing

An R program can be downloaded that creates membership matrices (full and 0-1) from the cluster membership file (the final output from pClust). An example is included with the code. It contains two input files: a sample cluster membership file for 28 organisms, and a file with the cumulative lengths of the 28 organisms in the order in which they were processed. Order is important because otherwise there is no way to match sequences with organisms. The membership files created by the R program are .csv files. Use the 0-1 matrix to create a Manhattan distance matrix, which can then be used in network software such as visone, the program we used.
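The postprocessing idea can be illustrated with a minimal Python sketch. It is not the R program itself, and it makes several assumptions: sequences are numbered from 0 in processing order, the cumulative lengths are running totals of sequence counts per organism, the "full" matrix holds per-organism sequence counts for each cluster, and rows are organisms while columns are clusters. The function names and the toy data are hypothetical, and reading the actual pClust output file is left out.

```python
from bisect import bisect_right


def organism_of(seq_index, cumulative_lengths):
    """Return the position of the organism that owns a 0-based sequence index.

    cumulative_lengths[k] is assumed to be the total number of sequences in
    organisms 0..k, listed in the order the organisms were processed.
    """
    return bisect_right(cumulative_lengths, seq_index)


def membership_matrices(clusters, cumulative_lengths):
    """Build the count ("full") and 0-1 matrices: rows are organisms, columns are clusters."""
    n_org = len(cumulative_lengths)
    full = [[0] * len(clusters) for _ in range(n_org)]
    for c, members in enumerate(clusters):
        for seq in members:
            full[organism_of(seq, cumulative_lengths)][c] += 1
    binary = [[1 if v else 0 for v in row] for row in full]
    return full, binary


def manhattan_matrix(rows):
    """Pairwise Manhattan (L1) distances between the rows of a matrix."""
    return [[sum(abs(a - b) for a, b in zip(r1, r2)) for r2 in rows]
            for r1 in rows]


# Toy data (hypothetical): 3 organisms with 3, 2, and 4 sequences.
cumulative_lengths = [3, 5, 9]
# Each cluster is a list of sequence indices (hypothetical layout).
clusters = [[0, 3, 5], [1, 2], [4, 6, 7, 8]]

full, binary = membership_matrices(clusters, cumulative_lengths)
dist = manhattan_matrix(binary)
```

On the 0-1 matrix, the Manhattan distance between two organisms is simply the number of clusters present in one organism but not the other. The sketch also shows why the processing order matters: if the cumulative lengths were listed in a different order, sequence indices would be assigned to the wrong organisms.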

Tutorials

Video tutorials are available on the YouTube channel BCB@WSU. Step-by-step screenshots are in the pClust tutorial slides, and the pClust manual contains more details about the pClust software.

License

pClust is released under the BSD-3-Clause license.

Citation

When using pClust, please cite the following:

  • Lockwood, S., Brayton, K. A., and Broschat, S. L. (2016). Comparative genomics reveals multiple pathways to mutualism for tick-borne pathogens. BMC Genomics.
  • Daily, J., Kalyanaraman, A., Krishnamoorthy, S., and Vishnu, A. (2015). A work stealing based approach for enabling scalable optimal sequence homology detection. Journal of Parallel and Distributed Computing, 79:132-142.
  • Lu, H., Halappanavar, M., and Kalyanaraman, A. (2015). Parallel heuristics for scalable community detection. Parallel Computing, 47:19-37.