The more sequences I input, the fewer of them cluster with RefSeq207
Issue #64
new
Dear authors:
I tested vContact2 with either a subset of my sequences or the whole set. I expected small differences between the results, but they differed greatly: the more sequences I input, the fewer of my sequences clustered with RefSeq207.
Is there something wrong with my command?
time vcontact2 --raw-proteins cat_virome.faa \
--rel-mode 'Diamond' \
--proteins-fp cat_virome_map.csv \
--db 'ProkaryoticViralRefSeq207-Merged' \
--pcs-mode MCL --vcs-mode ClusterONE \
-t 60 \
--c1-bin /data/db/MAVERICLab-vcontact2-34ae9c466982/bin/cluster_one-1.0.jar \
--output-dir est90_only.vContact2-refseq207
Thank you!
Comments (1)
reporter
For example, with 1000 input sequences, 30 sequences in a subset of 700 clustered with at least one RefSeq genome. With 10000 input sequences, only 15 sequences in the same subset of 700 did. With 250000 input sequences, only 2 did. What is more, the 2 sequences are not all among the 15, and the 15 are not all among the 30!
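
To make the comparison between runs reproducible, here is a minimal Python sketch of how the overlap could be computed. It rests on assumptions: that each run's output directory contains a per-genome overview CSV (for vContact2 this would typically be genome_by_genome_overview.csv) with `Genome` and `VC` columns, that an empty or "nan" `VC` means the genome was not placed in a cluster, and that any genome ID not in my own ID list is a reference genome. The function name and the contig/reference IDs are hypothetical, and the two tables are synthetic stand-ins, not my real results.

```python
import csv
import io


def clustered_with_reference(overview_file, query_ids, user_ids):
    """Return the IDs in query_ids that share a viral cluster (VC) with
    at least one genome that is not in user_ids (assumed to be a reference)."""
    rows = list(csv.DictReader(overview_file))

    def vc_of(row):
        # Treat empty or "nan" VC values as "not assigned to any cluster".
        vc = (row.get("VC") or "").strip()
        return "" if vc.lower() == "nan" else vc

    # Clusters that contain at least one non-user (reference) genome.
    reference_vcs = {vc_of(r) for r in rows
                     if vc_of(r) and r["Genome"] not in user_ids}
    return {r["Genome"] for r in rows
            if r["Genome"] in query_ids and vc_of(r) in reference_vcs}


# Synthetic stand-ins for two runs (hypothetical IDs, not real results).
small_run = io.StringIO(
    "Genome,VC\n"
    "ref_A,VC_1_0\n"
    "contig_1,VC_1_0\n"
    "contig_2,VC_2_0\n"
)
large_run = io.StringIO(
    "Genome,VC\n"
    "ref_A,VC_5_0\n"
    "contig_1,VC_9_0\n"
    "contig_2,VC_5_0\n"
    "contig_3,VC_9_0\n"
)

subset = {"contig_1", "contig_2"}  # the fixed subset tracked across runs
hits_small = clustered_with_reference(small_run, subset,
                                      {"contig_1", "contig_2"})
hits_large = clustered_with_reference(large_run, subset,
                                      {"contig_1", "contig_2", "contig_3"})

# Set differences show which sequences gained or lost a reference cluster.
print("only in small run:", sorted(hits_small - hits_large))
print("only in large run:", sorted(hits_large - hits_small))
print("in both runs:", sorted(hits_small & hits_large))
```

With real data, each `io.StringIO(...)` would be replaced by an opened overview file from the corresponding run, and `subset` by the 700 sequence IDs.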