kma index error

Issue #44 resolved
morteneneberg created an issue

Hello!

I am trying to use KMA for metagenomic classification and want to index a database consisting of bacterial, viral and fungal RefSeq complete genomes as well as the human genome. However, I get an error when running the index command:

Invalid option:        index 

This is my code:

module load KMA/2018-Nov-12-foss-2018a

kma index -i path/to/inputfile/file.fna -k 20 -o kmadb208/kmadb208

module purge

Comments (8)

  1. ptlcc

    You are running an old version of KMA, which does not include most of the updates made in recent years.
    I would recommend updating it.

    Best,
    Philip
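
After updating, it can help to confirm which binary the shell actually resolves once the module is loaded. The following is a minimal sketch, not part of the original reply; the helper name `report_kma` is hypothetical, and it assumes the installed build prints its version with `-v`.

```shell
# Sketch: show which kma is picked up and what version it reports.
# report_kma is a hypothetical helper name.
# Assumption: the installed kma prints its version when called with -v.
report_kma() {
    kma_path=$(command -v kma) || {
        echo "kma not found on PATH" >&2
        return 1
    }
    echo "resolved as: $kma_path"
    kma -v
}
```

Calling `report_kma` right after `module load` and before `kma index` shows whether the old or the updated build is in use.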

  2. morteneneberg reporter

    Dear Philip,

    We had memory issues with the installed version and updated to the newest, running with

    -NI -Sparse TG
    

    Finalizing the build, the following messages were printed:

    # Templates key-value pairs:    1711887037.
    
    # Total time used for DB indexing: 14693.47 s.
    
    # Compressing templates
    # Preparing compressed DB.
    # Calculating relative indexes.
    # Compression overflow.
    # Finalizing indexes.
    # Dumping compressed DB
    # Template database created.
    
    # Total time used for DB compression: 20089.82 s.
    

    Does the ‘Compression overflow’ message mean that the database build was not finalized?

    Sincerely,

    Morten

  3. ptlcc

    Hi Morten

    It means that the number of inferred taxis exceeded what could be stored in an unsigned integer, which causes KMA to store them in a long unsigned integer.
    So the build is complete and valid. If something unexpected or potentially unintended happens, it is printed to stderr without a preceding '#'. If an error occurs, the exit code will also be different from 0, and an exit code above 1 usually means the index is invalid.

    Best,
    Philip
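
The two checks described above (stderr lines without a leading '#', and the exit code) can be automated with a small wrapper. This is a sketch only; `run_checked` and the log file name are hypothetical, and it assumes the '#' progress lines go to stderr together with any warnings.

```shell
# run_checked LOGFILE CMD [ARGS...]
# Runs CMD, saves its stderr to LOGFILE, then prints any non-empty stderr
# lines that do not start with '#', followed by the exit code.
# run_checked is a hypothetical helper, not part of KMA.
run_checked() {
    log=$1
    shift
    "$@" 2>"$log"
    status=$?
    # Lines without a leading '#' are the ones to look at.
    unexpected=$(grep -v '^#' "$log" | grep .)
    if [ -n "$unexpected" ]; then
        echo "unexpected stderr output:"
        printf '%s\n' "$unexpected"
    fi
    echo "exit code: $status"
    return "$status"
}
```

For example, `run_checked index.log kma index -i path/to/inputfile/file.fna -k 20 -o kmadb208/kmadb208` would leave the full stderr in `index.log` and print only the lines worth a second look.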

  4. morteneneberg reporter

    Thanks a lot, Philip. I am testing different mappers for performance on short, erroneous Nanopore reads (cell-free DNA). Which settings for KMA would you suggest for such a task?

    The reads are generated with badread (rrwick) and consist of 99.5% human reads (~170bp) and 0.5% bacterial reads (~70bp) at a mean quality of 93%. Besides KMA, I am testing Centrifuge, minimap2, bowtie2 and kraken2 with the most recent RefSeq complete genomes release (bacterial, viral, fungal, human). If you have suggestions for classifiers that I did not include but should consider, please don't hesitate to post them. I should also mention that the bacterial coverage will be very low.

    I hope that my question is OK.

    Kind regards,

    Morten

  5. ptlcc

    Hi Morten

    For that I would use something like:
    -mem_mode -bc 0.7 -bcNano -mrs 0 -ID 0 -ef -1t1 -ca

    For the testing, KrakenUniq would be a good addition. You might have to lower the k-mer size for Kraken2 and KrakenUniq. For Centrifuge, Minimap2 and Bowtie2, there may be mapping quality and other scoring thresholds that you have to lower.

    Best,
    Philip
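
Put together with KMA's usual input, output and database options (`-i`, `-o`, `-t_db`), the suggested settings could be assembled as sketched below. The helper name `kma_map_cmd` and the file names in the usage note are hypothetical; only the mapping flags come from the reply above.

```shell
# kma_map_cmd READS DB_PREFIX OUT_PREFIX
# Prints a kma mapping command that uses the settings suggested in the thread.
# kma_map_cmd is a hypothetical helper; it only prints the command so it can
# be reviewed before running. Paths containing spaces would need quoting.
kma_map_cmd() {
    echo kma -i "$1" -o "$3" -t_db "$2" \
        -mem_mode -bc 0.7 -bcNano -mrs 0 -ID 0 -ef -1t1 -ca
}
```

For example, `kma_map_cmd reads.fastq kmadb208/kmadb208 results/sample` prints the full command line for the index built earlier in the thread; `reads.fastq` and `results/sample` are placeholder names.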
