Non-square matrix of Szymkiewicz-Simpson dissimilarity with KMA

Issue #76 new
Former user created an issue

Dear developers,

Thanks for developing this tool!

I was trying to use the Szymkiewicz-Simpson dissimilarity with KMA to compare 83 different sequences. However, I got a non-square distance matrix in which the first sequence is not compared with any of the others. I can work around this by adding a pseudo-sequence as the first entry. Is this behaviour expected, or have I misunderstood something?

Below are the commands I used:

kma index -i sequence.fa -o kma_db/seq
kma dist -t_db kma_db/seq -o kma_dist/seq -f 4 -d 2048 -t 20 

Thanks, Yu

Comments (2)

  1. ptlcc

    Dear Yu

    For symmetric dissimilarity measures (i.e. when d(x,y) = d(y,x)), KMA outputs a lower triangular distance matrix in PHYLIP format. This saves space and avoids redundancy, as the upper and lower parts would be identical.

    Depending on the sequences in “sequence.fa”, you might want to include “-Sparse -” in the indexing step. This will include the reverse complement of the sequences when computing the distance matrix, although it does not really matter if the sequences are genes with the same reading direction.

    Best,
    Philip

  2. Yu Yang

    Dear Philip,

    Thanks for your super fast response and thanks for the clarification and suggestion!

    Much appreciated!

    Yu
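
For readers whose downstream tools expect a square matrix, the lower triangular output described above can be expanded by mirroring each value across the diagonal. The following is a minimal sketch, not part of KMA: the function name and the example data are hypothetical, and it assumes the first line holds the number of sequences, that each later line is a name without spaces followed by the distances to all earlier sequences, and that self-distances are 0. The exact layout of the file written by "kma dist" should be checked before relying on it.

```python
def read_lower_triangular_phylip(text):
    """Expand a lower triangular PHYLIP matrix into a full symmetric one.

    Assumed layout (hypothetical helper, not a KMA function):
    line 0 holds the number of sequences n; row i (0-based) holds a name
    followed by at least i distances, one per earlier sequence.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])
    rows = lines[1:n + 1]
    if len(rows) != n:
        raise ValueError(f"expected {n} rows, found {len(rows)}")

    names = []
    full = [[0.0] * n for _ in range(n)]  # diagonal stays 0.0
    for i, line in enumerate(rows):
        fields = line.split()
        names.append(fields[0])
        values = [float(v) for v in fields[1:]]
        if len(values) < i:
            raise ValueError(f"row {i} has too few distances")
        # Only the first i values are off-diagonal entries; mirror them.
        for j in range(i):
            full[i][j] = values[j]
            full[j][i] = values[j]
    return names, full


# Made-up example with three sequences; the first row has no distances,
# which is why the first sequence can look as if it was never compared.
example = """3
seqA
seqB 0.25
seqC 0.5 0.125
"""

names, full = read_lower_triangular_phylip(example)
for name, row in zip(names, full):
    print(name, *row)
```

The first row of a lower triangular matrix is empty by construction, so its distances appear in the first column of the later rows; no pseudo-sequence is needed.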
