Using CCMetagen with custom database

Issue #46 resolved
Former user created an issue

I am trying to use a custom database consisting of all bacterial full-length 16S sequences from SILVA 138, clustered at 97% and assigned de novo taxonomy where needed. I have the sample_out_kma.res file from KMA, with the template looking like this: #Template ASV43690.1459 (the .1459 suffix corresponds to the length of the sequence). When running CCMetagen.py I get the following:

Traceback (most recent call last):
  File "/shared-nfs/TBNJ/BIOWIDE_2.0/scripts/envs/bin/CCMetagen.py", line 274, in <module>
    df = fParseKMA.populate_w_tax(df, ref_database, st, gt, ft, ot, ct, pt)
  File "/shared-nfs/TBNJ/BIOWIDE_2.0/scripts/envs/lib/python3.6/site-packages/ccmetagen/fParseKMA.py", line 90, in populate_w_tax
    match_info.Lineage = split_match[2]
IndexError: list index out of range

Can I fix this by adding the taxonomy to the sequence headers, as they only contain the taxid?

Best regards, Thomas

Comments (3)

  1. ptlcc

    Hi Thomas

    The taxids have to follow the same format as in the NCBI taxonomy browser.
    For CCMetagen to recognise it, the header should look like this, where "43690" is the taxid:

    43690|ASV43690.1459 taxonomy description

    You can change this in the *.name file, without having to reindex the whole database.

    Best,
    Philip
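
The header change suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the *.name file holds one template header per line, in index order, and that you already have a lookup from each current header (e.g. ASV43690.1459) to its taxid and a taxonomy description. The function names `reformat_header` and `rewrite_name_file` and the `lookup` dictionary are hypothetical and not part of CCMetagen or KMA. Back up the original *.name file before overwriting it.

```python
def reformat_header(header, taxid, description):
    """Build a header of the form 'taxid|header description'."""
    return f"{taxid}|{header.strip()} {description}"


def rewrite_name_file(lines, lookup):
    """Return reformatted headers, preserving the original line order.

    `lines` is an iterable of header lines as read from the *.name file.
    `lookup` (hypothetical) maps each current header to a
    (taxid, description) tuple; a missing header raises KeyError so that
    no line is silently dropped or reordered.
    """
    new_lines = []
    for line in lines:
        header = line.strip()
        taxid, description = lookup[header]
        new_lines.append(reformat_header(header, taxid, description))
    return new_lines


# Example with the header from this thread (the description is a placeholder)
lookup = {"ASV43690.1459": ("43690", "taxonomy description")}
print(rewrite_name_file(["ASV43690.1459\n"], lookup)[0])
```

To apply it to a real index, read the *.name file into a list of lines, pass it through `rewrite_name_file`, and write the result back one header per line. Keeping the number and order of lines unchanged matters under the assumption that each line corresponds to one indexed template.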