Using CCMetagen with a custom database
I am trying to use a custom database consisting of all bacterial full-length 16S sequences from SILVA 138, clustered at 97% and assigned de novo taxonomy where needed. I have the sample_out_kma.res from KMA, with the template looking like this: #Template ASV43690.1459 (.1459 corresponding to the length of the sequence). When running CCMetagen.py I get the following:
Traceback (most recent call last):
  File "/shared-nfs/TBNJ/BIOWIDE_2.0/scripts/envs/bin/CCMetagen.py", line 274, in <module>
    df = fParseKMA.populate_w_tax(df, ref_database, st, gt, ft, ot, ct, pt)
  File "/shared-nfs/TBNJ/BIOWIDE_2.0/scripts/envs/lib/python3.6/site-packages/ccmetagen/fParseKMA.py", line 90, in populate_w_tax
    match_info.Lineage = split_match[2]
IndexError: list index out of range
Can I fix this by adding the taxonomy to the sequence headers, as they currently contain only the taxid?
Best regards, Thomas
Comments (3)
Hi Philip,
Thanks for the quick response.
Will do.
-Thomas
- changed status to resolved
Hi Thomas
The taxids have to follow the same format as in the NCBI Taxonomy Browser.
The format of the header should look like this:
For CCMetagen to recognise it, where “43690” is the taxid.
You can alter this in the *.name file, without having to reindex the entire database.
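As a rough illustration of editing the *.name file, here is a minimal Python sketch. It assumes the `taxid|sequence_description` header layout that, as far as I know, the CCMetagen README describes for custom databases. It also assumes that the digits after "ASV" in names such as ASV43690.1459 are the intended taxid. The function names and the regular expression are hypothetical and not part of CCMetagen or KMA. Write to a new file and keep the original *.name as a backup.

```python
import re
from pathlib import Path

# Assumed name pattern (from this thread): "ASV<taxid>.<length>", e.g. "ASV43690.1459".
ASV_NAME = re.compile(r"^ASV(\d+)\.\d+")


def add_taxid_prefix(name: str) -> str:
    """Return 'taxid|name' for names matching the assumed ASV pattern.

    Names that already contain '|' or do not match are returned unchanged.
    """
    match = ASV_NAME.match(name)
    if match is None or "|" in name:
        return name
    return f"{match.group(1)}|{name}"


def rewrite_name_file(src: Path, dst: Path) -> int:
    """Copy a KMA .name file (one template name per line) to dst,
    prefixing taxids where possible. Returns the number of changed lines."""
    changed = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            name = line.rstrip("\n")
            new_name = add_taxid_prefix(name)
            changed += new_name != name
            fout.write(new_name + "\n")
    return changed
```

For example, `rewrite_name_file(Path("db.name"), Path("db.name.new"))` would write the prefixed names to a new file (both paths are placeholders). After checking the result, the new file can replace the original. This only helps if the numbers really are valid NCBI taxids.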
Best,
Philip