Add some documentation on the various options

Issue #23 resolved
Anders Goncalves da Silva created an issue

Hello.

Thank you for KMA.

I am struggling to get a good sense of the effect of the different k-mer lengths when indexing the database, and of whether some options are mutually exclusive. For instance, what happens when indexing with a k-mer of 11 but setting k_t to 21? It works, but does it have an effect?

Also, about the pe_mode options: I am guessing force means that only reads that map with their pairs are accepted, reward gives paired reads a boost (not sure by how much), and u means pairing is ignored. Is that correct?

Finally, -t (or threads). The default says 1, but when running on my local machine there are three threads that are spawned. I assume one is for reading/decompressing the FASTQ, one for analysis, and one for writing. So, should this be additional threads, in a similar fashion to samtools?

One last thing: in the paper you mention trimming as a first step, but the documentation does not say which parameters are used for trimming or what effect they may have.

Thanks again.

Anders.

Comments (3)

  1. ptlcc

    Hi Anders

    I have updated the “KMAspecification.pdf” for the options you mention.

    To answer your questions:

    Using -k 11 -k_t 21 when indexing will set the k-mer size to 21 for identifying templates, and 11 when aligning.
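
    As a rough illustration of the command described above, here is a minimal Python sketch that assembles an indexing call with separate k-mer sizes. The flags -k, -k_t and -k_i are the ones named in this thread; the "kma index -i ... -o ..." layout and the file names templates.fsa and my_db are assumptions made for the example, so check them against "KMAspecification.pdf" before relying on them.

```python
import shlex
from typing import List, Optional


def kma_index_command(
    templates: str,
    database: str,
    k: Optional[int] = None,
    k_t: Optional[int] = None,
    k_i: Optional[int] = None,
) -> List[str]:
    """Build an argument list for indexing (sketch, not an official wrapper).

    Assumption: the subcommand is "kma index" with -i (input) and -o (output).
    -k is emitted before -k_t and -k_i, mirroring the "-k 11 -k_t 21" order
    used in the reply above.
    """
    cmd = ["kma", "index", "-i", templates, "-o", database]
    for flag, value in (("-k", k), ("-k_t", k_t), ("-k_i", k_i)):
        if value is None:
            continue  # leave the option unset so the tool uses its own default
        if value < 1:
            raise ValueError(f"{flag} must be a positive integer, got {value}")
        cmd += [flag, str(value)]
    return cmd


# Hypothetical file names, for illustration only.
cmd = kma_index_command("templates.fsa", "my_db", k=11, k_t=21)
print(shlex.join(cmd))
```

    The resulting list can be passed to subprocess.run(cmd, check=True) once the paths point at real files.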

    You are right about force pairing. The reward for pairing is specified by the option “-per” (default 7). Unite prefers keeping the reads together, but gives neither a penalty nor a reward for it.

    The number of threads depends on the version and the options used. With the latest version (1.3.9) and the flag “-status”, it more or less follows what you describe.

    The trimming parameters are specified with: -ml (minimum length), -mp (minimum phred score at leading and trailing bases), -eq (minimum read quality), -5p (constant trimming of bases from the 5' end) and -3p (constant trimming of bases from the 3' end). The effect of trimming depends on the type of data you have and how samples have been treated in the lab.
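
    To make the roles of these parameters concrete, here is a small Python sketch of one way such trimming could work. It is not KMA's implementation: the order of the steps, the use of the mean phred score for the -eq check, and the zero defaults (placeholders, not KMA's defaults) are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def trim_read(
    seq: str,
    quals: Sequence[int],
    ml: int = 0,           # minimum length after trimming (cf. -ml)
    mp: int = 0,           # minimum phred at leading/trailing bases (cf. -mp)
    eq: int = 0,           # minimum read quality, here the mean phred (cf. -eq)
    five_prime: int = 0,   # constant number of bases cut from the 5' end (cf. -5p)
    three_prime: int = 0,  # constant number of bases cut from the 3' end (cf. -3p)
) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed (sequence, qualities), or None if the read is dropped."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")

    # Step 1 (assumed first): constant trimming from both ends.
    start, end = five_prime, len(seq) - three_prime
    if start >= end:
        return None
    seq, q = seq[start:end], list(quals[start:end])

    # Step 2: strip leading and trailing bases whose phred score is below mp.
    left, right = 0, len(q)
    while left < right and q[left] < mp:
        left += 1
    while right > left and q[right - 1] < mp:
        right -= 1
    seq, q = seq[left:right], q[left:right]

    # Step 3: drop reads that are empty, too short, or of too low mean quality.
    if not seq or len(seq) < ml:
        return None
    if sum(q) / len(q) < eq:
        return None
    return seq, q


# A read with one low-quality base at each end.
result = trim_read("ACGTACGT", [2, 30, 30, 30, 30, 30, 30, 2], mp=20)
print(result)
```

    With mp=20 the two low-quality end bases are removed and the six inner bases are kept; raising ml above 6 would discard the same read entirely.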

    Best,
    Philip

  2. Anders Goncalves da Silva reporter

    Thank you for the response Philip. Much appreciated! I’ll check out version 1.3.9.

    I did find one combination of k, k_i, and k_t for a specific sample/DB that consistently causes a seg fault (only for this specific sample). I’ll put a package together so you can reproduce it.
