KMA only outputs a single alignment

Issue #3 closed
Adam Rivers created an issue

I have a test fastq file of 125,000 interleaved metagenomics reads that generates alignments to 42 reference genomes when I run it on the CCmetagen server: https://cge.cbs.dtu.dk//cgi-bin/webface.fcgi?jobid=5E3DB53E00005FFB8AD4C041

When I run the same file locally with KMA-1.2.21 and the ncbi_nt_no_env_11jun2019 database, only one alignment appears. My command was:

kma -int ../rqc_data/reads/FRCS-D1-R1-238._sample.rqc.fq -o test1 -t_db ../../gbru_fy20_rice_methane/reference_db/ncbi_nt_no_env_11jun2019/ncbi_nt_no_env_11jun2019 -t 76 -1t1 -and -apm f -mem_mode

The file used to generate the issue is attached.

Thanks for looking into it.

Comments (5)

  1. ptlcc

    Hi Adam

    The CCMetagen webserver reads your fastq sample as single end reads instead of interleaved paired end reads. I will add a note about this to the webserver.

    When you analyse the sample using the paired end information, KMA will split the input reads over several templates, which means that the individual templates are no longer significantly overrepresented. Seemingly there are too few fragments to use the paired end information properly. You can adjust this by lowering the threshold for including templates, e.g. by setting the option “-mrs” to 0.01.

    Best,

    Philip
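
As a rough sketch of the suggestion above, the command from the original post can be rerun with “-mrs 0.01” appended. The paths and other options are copied from the post; the output prefix test_mrs is a made-up name so the earlier results are not overwritten. The snippet only prints the command for inspection. To run it, paste the printed line into a shell.

```shell
#!/bin/sh
# Sketch: the command from the original post with the suggested -mrs 0.01 added.
# Paths come from the post; the output prefix "test_mrs" is a hypothetical name.
reads=../rqc_data/reads/FRCS-D1-R1-238._sample.rqc.fq
db=../../gbru_fy20_rice_methane/reference_db/ncbi_nt_no_env_11jun2019/ncbi_nt_no_env_11jun2019

cmd="kma -int $reads -o test_mrs -t_db $db -t 76 -1t1 -and -apm f -mem_mode -mrs 0.01"

# Print the full command line so it can be checked before running it.
echo "$cmd"
```

Whether 0.01 is the right value for a given sample is a judgment call; the thread only offers it as an example.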

  2. Adam Rivers reporter

    Okay, thanks. Does this mean that only 42 reads are being aligned out of 125,000? If so, this seems quite low relative to Diamond, etc. Can you suggest parameter changes to increase recall?

  3. ptlcc

    Each alignment in the *.aln file contains the consensus sequence of each template reaching the thresholds. This means that each of these alignments contains all the reads that aligned to that template, where the bases are determined using majority voting.
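
Because each reported alignment is a per-template consensus and not a single read, the number of alignments is a count of templates, not of aligned reads. A minimal sketch for counting the reported templates follows. It assumes the KMA .res file is tab-separated with one row per template and header lines starting with '#'; that layout is an assumption to check against your own output. count_templates is a hypothetical helper name, and test1.res assumes the “-o test1” prefix from the original command.

```shell
#!/bin/sh
# Count the templates listed in a KMA .res file.
# Assumption: header lines start with '#', every other non-empty line is one template.
count_templates() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Example with the output prefix from the original post; skipped if the file is absent.
if [ -f test1.res ]; then
    count_templates test1.res
fi
```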
