DIAMOND # of reported alignments

Issue #62 new
Stephen Nayfach created an issue

I was testing vConTACT2 and noticed that DIAMOND is run with its default parameters, so it reports at most 25 targets per query protein sequence. Is there a way to change this? This value is too low when clustering a large number of closely related genomes. I’d suggest reporting all alignments that pass specified e-value and coverage thresholds.

#CPU threads: 8
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: test
#Target sequences to report alignments for: 25
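
For reference, the limit shown in the log above corresponds to DIAMOND's `--max-target-seqs` option (default 25). A minimal sketch of how a direct DIAMOND call could raise it is below. The file names, the helper `diamond_blastp_cmd`, and the default values are assumptions for illustration only. This is not how vConTACT2 invokes DIAMOND internally.

```python
import shlex


def diamond_blastp_cmd(query, db, out, max_targets=10000, evalue=1e-5, threads=8):
    """Build an argument list for `diamond blastp` (hypothetical helper).

    max_targets maps to --max-target-seqs, the option that defaults to 25.
    """
    return [
        "diamond", "blastp",
        "--query", query,
        "--db", db,
        "--out", out,
        "--outfmt", "6",                        # BLAST tabular output
        "--threads", str(threads),
        "--evalue", str(evalue),                # e-value cutoff
        "--max-target-seqs", str(max_targets),  # raise the per-query target cap
    ]


# Placeholder file names; replace with real paths before running.
cmd = diamond_blastp_cmd("proteins.faa", "proteins.dmnd", "hits.tsv")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The command is only printed here, so the sketch can be inspected without DIAMOND installed.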

Comments (2)

  1. Ben Bolduc

    Thank you for mentioning this. We’ll be incorporating your feedback in the next update with two (well, one) new parameters: “--reported-alignments” and “--pc-evalue”. “--pc-evalue” is replacing “--blast-evalue”, as it was only used for blastp.

    vConTACT2 was thoroughly benchmarked using BLASTP, as mentioned in its publication. We switched to DIAMOND later, as it was shown to be much faster and to give nearly identical (but not exact) results in the final VC network. If you run the updated vConTACT2 with these new parameters, please do let us know how it turns out.

    -Ben

  2. Stephen Nayfach reporter

    Sure! I might also suggest changing the default to a high value like 10000. Otherwise, you may be missing alignments for genes shared between genomes, especially when the dataset contains many similar viruses. Applied to a gut virome dataset, changing the default from 25 to 10000 increased the number of genome-to-genome connections and significantly reduced the overall number of clusters identified.
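
The effect described above can be illustrated with a toy sketch: when one protein family is shared by more genomes than the per-query cap, keeping only the top-k targets per query drops some genome-to-genome connections. The data, scores, and the function `genome_edges` below are invented for illustration and are not taken from the gut virome dataset or from vConTACT2.

```python
from collections import defaultdict


def genome_edges(hits, genome_of, k):
    """Keep the k best-scoring targets per query and return the set of
    genome pairs connected by at least one retained hit."""
    by_query = defaultdict(list)
    for query, target, score in hits:
        by_query[query].append((score, target))
    edges = set()
    for query, scored in by_query.items():
        # Sort best-first and truncate, mimicking a per-query target cap.
        for score, target in sorted(scored, reverse=True)[:k]:
            gq, gt = genome_of[query], genome_of[target]
            if gq != gt:
                edges.add(frozenset((gq, gt)))
    return edges


# Toy data: 40 genomes, each carrying one member of the same protein family.
n = 40
genome_of = {f"p{i}": f"g{i}" for i in range(n)}
# Synthetic all-vs-all hits; the score falls off with index distance.
hits = [(f"p{i}", f"p{j}", 100 - abs(i - j))
        for i in range(n) for j in range(n) if i != j]

capped = genome_edges(hits, genome_of, k=25)
uncapped = genome_edges(hits, genome_of, k=10000)
print(len(capped), len(uncapped))
```

In this toy setup the uncapped run recovers every genome pair, while the capped run loses pairs whose proteins fall outside each other's top 25.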
