The output files

Issue #46 resolved
Kaixin h created an issue

Hi, I would very much appreciate it if you could help me clear up some confusion!

What is the difference between pheno_table.txt and pheno_table_[a_specific_species].txt?

It seems that, for E. coli, the output "PointFinder_prediction.txt" always has a "1" under "unknown colistin" for every sample, even when no resistance is indicated in "pheno_table.txt". Have you noticed this?

Does the output PointFinder_results.txt contain all mutations, including those with no known association with AMR? If not, is there a way for a user to get them?

For ResFinder (i.e. with --acquired), is -s a required parameter? It seems to be, because the following two commands produce different outcomes:

python3 run_resfinder.py -ifa tests/data/test_isolate_05.fa -o running_test/test111 --min_cov 0.6 -t 0.8 --acquired --db_path home_to/resfinder/db_resfinder --blastPath /usr/bin/blastn

python3 run_resfinder.py -ifa tests/data/test_isolate_05.fa -o running_test/test1 -s 'Escherichia coli' --min_cov 0.6 -t 0.8 --acquired --db_path home_to/resfinder/db_resfinder --blastPath /usr/bin/blastn

Thanks!

Comments (12)

  1. CGE Helpdesk

    Dear Kaixin H.,

    Thank you for your interest in ResFinder.

    The difference between pheno_table.txt and pheno_table_species.txt is that the latter includes the AMR phenotypes specific to a given species, while pheno_table.txt is the general file.

    Regarding the “1” mark beneath "unknown colistin": when I ran 

    python3 run_resfinder.py -ifa tests/data/test_isolate_05.fa -o test3 --min_cov 0.6 -t 0.8 --acquired -s "escherichia coli" --point

    it gives "0". Were the samples you tested taken from tests/data/?

    PointFinder_results.txt contains the chromosomal point mutations known to cause antimicrobial resistance in the chosen species. Mutations with an unknown phenotype can also be found in PointFinder_results.txt, where the PMID is given as "unpublished".

    Running ResFinder with the --acquired option does not depend on the --species argument. The only difference is that specifying a species additionally produces pheno_table_species.txt, alongside the same files you get when running without a species. The --species argument is, however, a prerequisite for running PointFinder with --point.

    Best,

    Karen, CGE Helpdesk

  2. Kaixin h reporter

    Dear Karen,

    Thanks for your reply!

    By "pheno_table is the general file", do you mean the prediction is based on prior knowledge of that SNP/gene across all species? That is, if it is associated with AMR in any species, will it be marked as 1?

    I just noticed that the sample provided in tests/data/ works well, while samples such as 1328438.3.fna, 1328434.3.fna and 562.22700.fna do not.

    I downloaded them from ftp://ftp.patricbrc.org/genomes/

    BTW, if I use the following command, it uses BLAST instead of KMA:

    python3 run_resfinder.py -ifa tests/data/test_isolate_05.fa -o running_test/new --min_cov 0.6 -t 0.8 --acquired -s "escherichia coli" --point --kmaPath

    I am wondering why it is designed to use BLAST for -ifa files and KMA for -ifq files.

  3. CGE Helpdesk

    Dear Kaixin H.,

    What I meant was that pheno_table.txt covers all AMR findings and is not species-specific.

    If a point mutation is detected in your sample, the antibiotic it confers resistance to in the given species is flagged with a "1" in the PointFinder_prediction.txt file. I checked that the output for test_isolate_05.fa is consistent with PointFinder_results.txt.

    When I use the command you wrote, the error message "run_resfinder.py: error: argument -k/--kmaPath: expected one argument" is printed, because --kmaPath requires a value (the path to KMA). If you put an arbitrary word after this option, you will find that the program runs BLAST, as it should.

    I hope this answers your questions.

    Best regards,
    Karen, CGE Helpdesk
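The "expected one argument" message is standard Python argparse behaviour for an option that takes a single value but is given none. The sketch below reproduces it with a hypothetical, trimmed-down parser; it assumes (without having checked ResFinder's source) that run_resfinder.py declares -k/--kmaPath as an ordinary single-value argparse option. The helper names are made up for illustration.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser with only the options needed for the demo;
    # it is NOT ResFinder's actual argument parser.
    parser = argparse.ArgumentParser(prog="run_resfinder.py")
    parser.add_argument("-ifa", "--inputfasta")
    parser.add_argument("-k", "--kmaPath")  # expects exactly one value
    return parser


def parse_error(argv: list[str]) -> str:
    """Return whatever argparse writes to stderr for argv ("" on success)."""
    stderr = io.StringIO()
    with contextlib.redirect_stderr(stderr):
        try:
            build_parser().parse_args(argv)
        except SystemExit:
            # On a parse error argparse prints usage + message, then exits.
            pass
    return stderr.getvalue()


# --kmaPath with no value: argparse reports the missing argument.
print(parse_error(["-ifa", "test_isolate_05.fa", "--kmaPath"]).splitlines()[-1])
# --kmaPath with a value: parsing succeeds and nothing is written to stderr.
print(repr(parse_error(["-ifa", "test_isolate_05.fa", "--kmaPath", "/path/to/kma"])))
```

Because argparse treats the option as needing a value, the last token on the command line matters: any word after --kmaPath satisfies the parser, which is why adding an arbitrary word made the error go away.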

  4. Kaixin h reporter

    Dear Karen,

    Thanks for your quick reply!

    1. I still don't understand what "is not species-specific" means. AMR means resistance to a specific antibiotic, so it is of course not species-specific. Am I understanding this correctly?

    By "not generally", I understand that you first align the target sequence to a relevant (species-specific) reference database and then use prior knowledge to decide the phenotype. Could you please specify the difference if "generally" is used? Thanks!

    2. Sorry for the unclear description in my first comment. Let me describe it again.

    With test_isolate_05.fa, the outputs are consistent. With some other genomic data from PATRIC, they are not. I uploaded some example results here: https://github.com/augustkx/ResFinder_results. I would very much appreciate it if you could have a look.

    3. Sorry for the incomplete command; I didn't paste all of it. The exact command I used was:

    python3 run_resfinder.py -ifa tests/data/test_isolate_05.fa -o running_test/new --min_cov 0.6 -t 0.8 --acquired -s "escherichia coli" --point --kmaPath /home_to_the_path/resfinder/cge/kma/kma

    It works, but it uses BLAST, not KMA.

    So my question is: why is it designed to use BLAST for -ifa files and KMA for -ifq files?

    4. What does the following warning mean? Is it related to unknown mutations?

    WARNING: Missing features from phenotype database:

    Best regards,

    Kaixin

  5. CGE Helpdesk

    Dear Kaixin,

    1. Some species are intrinsically resistant to certain antimicrobials, while other species can become resistant to the same antimicrobial if they carry a gene that is not intrinsic to that species.

    Sorry, I’m not sure I understand what you are asking here: “Could you please specify the difference if “generally“ is used”?
    2. I suggest you submit the jobs you are referring to on the CGE webserver https://cge.cbs.dtu.dk/services/ResFinder/ and email the link to your job to food-cgehelp@dtu.dk, tagging it with this issue number (46). We will then check whether the two files are consistent. The problem is the 0s and 1s for E. coli, right?
    3. ResFinder is designed to use KMA to search for resistance genes in raw read data, as it is an efficient method for aligning large datasets.
    4. Is this warning related to the PATRIC data you mentioned in point 2?

    Best,
    Karen, CGE Helpdesk

  6. Kaixin h reporter

    Dear Karen,

    1. So, regarding PointFinder, does this refer to mutations in mobilizable genes?
    2. Yes, the 0s and 1s for E. coli in PointFinder_results.txt. Email sent.

    4. Yes. It also occurs for other species, e.g. MT (M. tuberculosis). Something like:

    # WARNING: Missing features from phenotype database:
    # Feature_ID    Region  Database        Hit
    embC_981_l      embC
    gyrA_21_q       gyrA
    gyrA_95_t       gyrA
    gyrA_668_d      gyrA
    rpoC_594_e      rpoC

    Best,

    Kaixin
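To handle such warnings programmatically, a small parser can turn the block into records. This is a sketch that assumes the layout pasted above: "#"-prefixed title/header lines, whitespace-separated fields, and Database/Hit columns that may be empty (as in every row here). If real rows can have an empty middle column followed by a filled one, splitting on tabs (line.split("\t")) would be safer than splitting on whitespace. The function and variable names are hypothetical.

```python
WARNING_TEXT = """\
# WARNING: Missing features from phenotype database:
# Feature_ID    Region  Database        Hit
embC_981_l      embC
gyrA_21_q       gyrA
gyrA_95_t       gyrA
gyrA_668_d      gyrA
rpoC_594_e      rpoC
"""

COLUMNS = ["Feature_ID", "Region", "Database", "Hit"]


def parse_missing_features(text: str) -> list[dict[str, str]]:
    """Parse a 'Missing features' warning block into one dict per row.

    Assumes '#'-prefixed title/header lines and whitespace-separated
    fields; columns absent from a row are filled with empty strings.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, the warning title and the column header
        fields = line.split()
        fields += [""] * (len(COLUMNS) - len(fields))  # pad missing columns
        records.append(dict(zip(COLUMNS, fields)))
    return records


# Group the missing feature IDs by gene (Region) for a quick overview.
missing_by_region: dict[str, list[str]] = {}
for record in parse_missing_features(WARNING_TEXT):
    missing_by_region.setdefault(record["Region"], []).append(record["Feature_ID"])
print(missing_by_region)
```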

  7. Alfred Ferrer Florensa

    Dear Kaixin,

    Karen has asked me to answer some of the questions of this conversation.

    • I think there was some confusion when you asked whether all mutations, including those with no known link to AMR, are reported and whether a user can get them. Maybe you were asking about the command-line option -u (-u, --unknown_mut: "Show all mutations found even if in unknown to the resistance database").
    • As you said, AMR is not species-specific. However, starting with ResFinder 4 (https://pubmed.ncbi.nlm.nih.gov/32780112/), we are incorporating resistance profiles: pheno_table.txt is a resistance profile of the AMR genes found in your sample, while pheno_table_[species] also includes the known species-specific antimicrobial resistances.
    • About the use of KMA on assembled data: KMA (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2336-6) is an aligner designed for raw reads, not for aligning assembled data. For that reason, since your data are already assembled (.fa), ResFinder switches to BLAST, which is designed for aligning assembled sequences.
    • We will look into the issue with PointFinder_prediction.txt. It might be a bug, and we are sorry about that. In the meantime, I recommend using the other output files, as they also contain the information in the "_prediction.txt" file.
    • The missing-features warning is something we are improving. It concerns mutations for which we lack phenotype information, but those mutations are still reported in all the results.

    I hope this answers your questions.

    Best,

    Alfred
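The rule described here (KMA for raw reads, BLAST for assembled contigs) can be written as a small dispatch function. This is a hypothetical sketch of the decision only, based on the -ifa/-ifq distinction discussed in this thread; it is not ResFinder's code, the function name is made up, and rejecting both inputs at once is a choice of the sketch rather than documented ResFinder behaviour.

```python
from typing import Optional


def choose_aligner(ifa: Optional[str] = None, ifq: Optional[list[str]] = None) -> str:
    """Pick an aligner from the kind of input supplied (hypothetical helper).

    ifa: path to an assembled FASTA file (the -ifa option in this thread)
    ifq: one or more raw-read FASTQ files (the -ifq option)
    """
    if ifa and ifq:
        # Sketch-only choice: refuse ambiguous input instead of guessing.
        raise ValueError("give either assembled FASTA (-ifa) or raw reads (-ifq), not both")
    if ifq:
        return "kma"  # KMA: efficient alignment of raw reads
    if ifa:
        return "blastn"  # BLAST: alignment of assembled sequences
    raise ValueError("no input given: supply -ifa or -ifq")


print(choose_aligner(ifa="tests/data/test_isolate_05.fa"))
print(choose_aligner(ifq=["reads_R1.fastq", "reads_R2.fastq"]))
```

Under this rule a KMA path is simply never consulted for FASTA input, which matches the observation that adding --kmaPath to an -ifa run still produced a BLAST run.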

  8. Kaixin h reporter

    Dear Alfred,

    Yes, I was asking about the -u (--unknown_mut) option; I see now. Thanks for the clear explanation!

    Best,

    Kaixin

  9. Peter Cock

    I was also briefly confused by seeing "unknown colistin" in PointFinder_prediction.txt, but that is in fact not a space but a tab: you are seeing a 1 under "unknown":

    $ cut -f 1-4 PointFinder_prediction.txt
    nalidixic acid  ciprofloxacin  unknown  colistin
    0               0              1        0
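The same check can be done without cut by reading the tab-separated header and the first data row and pairing them up. A minimal sketch, assuming PointFinder_prediction.txt has one tab-separated header line of antimicrobial names followed by rows of 0/1 flags, as in the cut output above; only those first four columns are used as sample text, and the function name is hypothetical.

```python
import csv
import io

# First four columns of PointFinder_prediction.txt as shown by `cut -f 1-4`;
# the fields are separated by tabs, not spaces.
SAMPLE = "nalidixic acid\tciprofloxacin\tunknown\tcolistin\n0\t0\t1\t0\n"


def read_prediction_flags(handle) -> dict[str, int]:
    """Map each column name to its 0/1 flag for the first data row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    first_row = next(reader)
    return {name: int(value) for name, value in zip(header, first_row)}


flags = read_prediction_flags(io.StringIO(SAMPLE))
print(flags)
print([name for name, flag in flags.items() if flag == 1])
```

With a real file, pass open("PointFinder_prediction.txt", newline="") instead of the StringIO; the flagged column then shows up on its own as "unknown", separate from "colistin".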
    

  10. Peter Cock

    The above discussion initially led me to expect pheno_table_escherichia_coli.txt would have more entries than pheno_table.txt but the opposite seems to be the case:

    $ wc -l pheno_table*.txt
      71 pheno_table_escherichia_coli.txt
     150 pheno_table.txt
     221 total
    

    Looking just at the data lines (those with a tab):

    $ grep -P "\t" -c pheno_table*.txt
    pheno_table_escherichia_coli.txt:54
    pheno_table.txt:132
    

    Does this mean the E. coli-specific file omits things typically found in the species, like mdf(A)?
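Both the grep count and the mdf(A) question can be checked with a short script. This sketch assumes only what the commands above show: data lines are the lines containing a tab. Whether a gene appears is tested with a plain substring search, since we make no assumption about which column holds gene names. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def data_lines(lines: Iterable[str]) -> list[str]:
    """Keep lines containing a tab: the lines `grep -P "\\t" -c` counts."""
    return [line.rstrip("\n") for line in lines if "\t" in line]


def mentioning(lines: Iterable[str], feature: str) -> list[str]:
    """Data lines that mention `feature` anywhere (plain substring match)."""
    return [line for line in data_lines(lines) if feature in line]


def summarize(path: Path, feature: str) -> str:
    """One-line summary: number of data lines and how many mention `feature`."""
    with path.open(encoding="utf-8") as handle:
        lines = handle.readlines()
    n_data = len(data_lines(lines))
    n_hits = len(mentioning(lines, feature))
    return f"{path.name}: {n_data} data lines, {n_hits} mentioning {feature}"


if __name__ == "__main__":
    # Run inside the ResFinder output directory to compare every pheno_table*.txt.
    for table in sorted(Path(".").glob("pheno_table*.txt")):
        print(summarize(table, "mdf(A)"))
```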

  11. CGE Helpdesk

    Dear Peter,

    Please have a look at Issue #69; I hope you will find the answer there regarding the size and content of the pheno_table files.

    Best,
    Karen, CGE Helpdesk
