different results when using DBs with identical sequences
Hi all
I've been using KMA for a few weeks now and I am extremely happy with its speed and performance. However, there is a small inconsistency that is bothering me.
Problem:
I am mapping the fastq reads of a metagenome (name: N11) against a virulence database called VFDB (http://www.mgc.ac.cn/VFs/main.htm). The sequences whose presence I would like to check are the “stx” genes from Escherichia pathogens. Stx sequences exist both in VFDB and in a custom database that I made by downloading Escherichia genomes available on NCBI that carry stx genes.
When I map the N11 sample against these 2 databases, I get inconsistent results even though at least 3 of the stx genes are identical (100%) between them: only VFDB positively identifies stx sequences in the N11 sample.
When I then pulled the stx sequences out of both databases and built a new database containing only them, KMA gave 0 hits.
Therefore my question is: why can KMA positively identify the stx sequences from VFDB but not from my db or from the merged custom db that only has the few stx genes? Could this be a text format problem? Are there characters that are not allowed?
For the record, I do get some hits with my custom db and no errors, so it does not seem to be a general problem with the db. Also, all databases were created using the same command $kma_index -in *.fas -db .db (where “.fas” was each time the corresponding fasta file)
I would be happy to send a link to the files if somebody wants to check for themselves (I just don't want to do it publicly because these are unpublished data)
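The format question above (are the sequences really identical, and do they contain unusual characters?) can be checked independently of KMA. Below is a minimal Python sketch, not part of KMA, that normalizes two FASTA files (case, line wrapping, Windows line endings), lists the sequences they share, and reports characters outside an assumed alphabet. The function names and the allowed alphabet "ACGTN" are my own assumptions for illustration; which characters KMA itself accepts is not something this sketch knows.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle.

    Sequences are upper-cased and all whitespace (including '\r') is removed,
    so differences in case, wrapping or line endings do not matter.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        else:
            chunks.append("".join(line.split()))
    if header is not None:
        yield header, "".join(chunks).upper()


def unexpected_chars(seq, allowed="ACGTN"):
    """Return the sorted characters in seq outside the assumed alphabet."""
    return sorted(set(seq) - set(allowed))


def shared_sequences(handle_a, handle_b):
    """Return (header_a, header_b) pairs whose normalized sequences are identical."""
    by_seq = {}
    for header, seq in read_fasta(handle_a):
        by_seq.setdefault(seq, []).append(header)
    pairs = []
    for header_b, seq in read_fasta(handle_b):
        for header_a in by_seq.get(seq, []):
            pairs.append((header_a, header_b))
    return pairs
```

To use it, open the two fasta files with `open(path)` and pass the handles to `shared_sequences`; running `unexpected_chars` on every sequence from `read_fasta` shows whether any entry contains characters outside the assumed alphabet.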
Thanks,
P
Comments (6)
-
-
reporter hey Philip
Thanks for the quick reply
- All databases are genes - I never use entire genomes
- here is the exact command that I use (against any db) $kma -ipe *fq.gz -o /output/sth_sq -t_db ~/VFDB -mem_mode -ef -1t1 -cge -nf -t 8
KMA-1.3.11
Again, thanks for looking into this
P
-
Hi Panos
In that case I will need some data to reproduce the error.
Best,
Philip
-
reporter Thanks Philip
Is it OK if I send you a link to some files in an email?
P
-
Sure.
-
- changed status to resolved
Fixed invalid frag_raw output, and fixed Issue #29 → <<cset 96f5b4e667a4>>
Hi Panos
I am glad to hear that you are generally happy with KMA.
Before you transfer the data I have three questions:
Best,
Philip