Add some documentation on the various options
Hello.
Thank you for KMA.
I am struggling to get a good sense of the effect of the different k-mer lengths when indexing the database, and whether some options might be mutually exclusive (for instance, indexing with a k-mer size of 11 but setting k_t to 21). It works, but does it have an effect?
Also, the pe_mode options: I am guessing that force means only reads that map with their pairs are accepted, that reward gives paired reads a boost (not sure by how much), and that u means ignore. Is that correct?
Finally, -t (or threads). The default says 1, but when running on my local machine three threads are spawned. I assume one is for reading/decompressing the FASTQ, one for analysis, and one for writing. So, should this be read as the number of additional threads, in a similar fashion to samtools?
And one last point: in the paper you mention trimming as a first step, but the documentation does not mention which parameters are used for trimming or what effect they may have.
Thanks again.
Anders.
Comments (3)
reporter Thank you for the response Philip. Much appreciated! I’ll check out version 1.3.9.
I did find one combination of k, k_i, and k_t for a specific sample/DB that consistently causes a seg fault (only for this specific sample). I’ll put a package together so you can reproduce it.
- changed status to resolved
Hi Anders
I have updated the “KMAspecification.pdf” for the options you mention.
To answer your questions:
Using -k 11 -k_t 21 when indexing will set the k-mer size to 21 for identifying templates, and 11 when aligning.
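To give a feel for why the two sizes behave differently, here is a minimal Python sketch. It is not KMA's code, just general k-mer reasoning: it counts how many k-mers of a read still match the template exactly, position by position, after a single substitution. The function name `intact_kmers` and the 30-base sequence are made up for illustration.

```python
def intact_kmers(template: str, read: str, k: int) -> int:
    """Count start positions where the read's k-mer equals the template's k-mer."""
    n = min(len(template), len(read))
    return sum(template[i:i + k] == read[i:i + k] for i in range(n - k + 1))


# Made-up 30-base template; the read is identical except for one substitution.
template = "ACGTTGCAAGCTTAGGCTAACGTTCAGGAT"
pos = 15
substitute = "A" if template[pos] != "A" else "C"
read = template[:pos] + substitute + template[pos + 1:]

for k in (11, 21):
    # Every k-mer that spans the substituted base no longer matches.
    print(k, intact_kmers(template, read, k))
```

For this toy read, 9 of the 20 possible 11-mers survive the substitution, while none of the ten 21-mers do. Longer k-mers are more specific but less tolerant of mismatches, which is one plausible reason to use different sizes for different steps.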
You are right about force pairing. The reward for pairing is specified by the option “-per” (default 7). Unite prefers keeping the reads together, but gives neither a penalty nor a reward for it.
The number of threads depends on the version and the options used. With the latest version (1.3.9) and the flag “-status”, it more or less follows what you describe.
The trimming parameters are specified with: -ml (minimum length), -mp (minimum phred score at leading and trailing bases), -eq (minimum read quality), -5p (constant trimming of bases from the 5' end) and -3p (constant trimming of bases from the 3' end). The effect of trimming depends on the type of data you have and how the samples have been treated in the lab.
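As a rough illustration of how these five parameters could interact, here is a minimal Python sketch. It is not KMA's implementation. The order of the steps, the reading of "minimum read quality" as the mean phred score, and the function name `trim_read` are all assumptions made for this example. The parameter names only mirror the flags above, and the values in the usage lines are arbitrary, not KMA defaults.

```python
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], *, ml: int, mp: int, eq: float,
              p5: int, p3: int) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed (seq, quals), or None if the read is discarded."""
    # 1. Constant trimming of p5 bases from the 5' end and p3 from the 3' end.
    end = max(len(seq) - p3, p5)
    seq, quals = seq[p5:end], quals[p5:end]

    # 2. Strip leading and trailing bases whose phred score is below mp.
    start, stop = 0, len(quals)
    while start < stop and quals[start] < mp:
        start += 1
    while stop > start and quals[stop - 1] < mp:
        stop -= 1
    seq, quals = seq[start:stop], quals[start:stop]

    # 3. Discard reads that end up empty or shorter than ml.
    if not seq or len(seq) < ml:
        return None

    # 4. Discard reads whose mean phred score is below eq (assumed meaning).
    if sum(quals) / len(quals) < eq:
        return None
    return seq, quals


quals = [2, 2, 30, 30, 30, 30, 30, 30, 2, 30]
print(trim_read("ACGTACGTAC", quals, ml=5, mp=20, eq=20, p5=0, p3=1))
print(trim_read("ACGTACGTAC", quals, ml=7, mp=20, eq=20, p5=0, p3=1))
```

In this toy case the first call keeps the six high-quality bases in the middle of the read, and the second call discards the read because the trimmed length falls below the minimum length.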
Best,
Philip