kma index error

morteneneberg reporter

edited description

2021-09-29T07:13:30+00:00

morteneneberg reporter

changed status to resolved

It seems that an underscore ("_") is needed, so that the command is:

kma_index -i path/to .......

2021-09-29T07:15:57+00:00

ptlcc

You are running an old version of KMA, which does not include most of the updates made in recent years.
I would recommend updating it.

Best,
Philip

2021-09-29T08:57:41+00:00

morteneneberg reporter

Dear Philip,

We had memory issues with the installed version and updated to the newest, running with

-NI -Sparse TG

Finalizing the build, the following messages were printed:

# Templates key-value pairs:    1711887037.

# Total time used for DB indexing: 14693.47 s.

# Compressing templates
# Preparing compressed DB.
# Calculating relative indexes.
# Compression overflow.
# Finalizing indexes.
# Dumping compressed DB
# Template database created.

# Total time used for DB compression: 20089.82 s.

Does the ‘compression overflow’ message mean that the database build was not finalized?

Sincerely,

Morten

2021-10-01T05:44:58+00:00

ptlcc

Hi Morten

It means that the number of inferred taxis exceeded what could be stored in an unsigned integer, which causes KMA to store them in a long unsigned integer.
So the build is complete and valid. If something unexpected or potentially unintended happens, it is printed to stderr without a preceding '#'. If an error occurs, the exit code will also be different from 0, and an exit code above 1 usually means that the index is invalid.
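
The check described above can be sketched in Python. This is only an illustration, not part of KMA: the helper names `classify_run` and `run_and_check` are hypothetical, and it assumes the '#' status lines are written to stderr along with any warnings.

```python
import subprocess


def classify_run(returncode, stderr_text):
    """Separate '#' status lines from other stderr lines and judge the exit code."""
    lines = [ln for ln in stderr_text.splitlines() if ln.strip()]
    status = [ln for ln in lines if ln.startswith("#")]
    # Anything without a leading '#' is treated as unexpected output.
    warnings = [ln for ln in lines if not ln.startswith("#")]
    if returncode == 0:
        verdict = "ok"
    elif returncode == 1:
        verdict = "error"
    else:
        # Exit codes above 1 usually mean the index is invalid.
        verdict = "error, index probably invalid"
    return verdict, status, warnings


def run_and_check(cmd):
    """Run an indexing command given as a list of arguments and classify it."""
    proc = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    return classify_run(proc.returncode, proc.stderr)


# Status lines taken from the build log quoted earlier in this thread.
log = "# Compression overflow.\n# Finalizing indexes.\n# Template database created.\n"
verdict, status, warnings = classify_run(0, log)
print(verdict, len(status), warnings)
```

With the log above and an exit code of 0, every line starts with '#', so nothing is flagged as a warning.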

Best,
Philip

2021-10-01T06:47:24+00:00

morteneneberg reporter

Thanks a lot, Philip. I am testing different mappers for performance on short, erroneous Nanopore reads (cell-free DNA). Which settings for KMA would you suggest for such a task?

The reads are generated with Badread (rrwick) and consist of 99.5% human reads (~170 bp) and 0.5% bacterial reads (~70 bp) at a mean quality of 93%. Other than KMA, I am testing Centrifuge, minimap2, Bowtie2 and Kraken2 with the most recent RefSeq complete genomes release (bacterial, viral, fungal, human). If you have suggestions for classifiers that I did not include but should consider, please don't hesitate to post them. I should also mention that the bacterial coverage will be very low.
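
To put a rough number on "very low", here is a small back-of-the-envelope calculation in Python. It assumes the quoted read lengths are exact means and ignores length variation; the function name is only for illustration.

```python
def bacterial_base_fraction(human_frac, human_len, bact_frac, bact_len):
    """Fraction of all sequenced bases that come from bacterial reads."""
    human_bases = human_frac * human_len
    bact_bases = bact_frac * bact_len
    return bact_bases / (human_bases + bact_bases)


# 99.5% human reads of ~170 bp, 0.5% bacterial reads of ~70 bp
frac = bacterial_base_fraction(0.995, 170, 0.005, 70)
print(f"{frac:.4%}")
```

Because the bacterial reads are also shorter, they make up roughly 0.2% of the bases, which is less than their 0.5% share of the reads.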

I hope that my question is OK.

Kind regards,

Morten

2021-10-01T14:08:32+00:00

ptlcc

Hi Morten

For that I would use something like:
-mem_mode -bc 0.7 -bcNano -mrs 0 -ID 0 -ef -1t1 -ca

For the testing, KrakenUniq would be a good addition. You might have to lower the k-mer size for Kraken2 and KrakenUniq. For Centrifuge, Minimap2 and Bowtie2, there might be mapping quality and other scoring thresholds that you have to lower.

Best,
Philip

2021-10-04T04:53:37+00:00

morteneneberg reporter

Hi Philip,

Thank you for the help!

Kind regards,

Morten

2021-10-04T06:58:06+00:00
