Inconsistency in "match" or my misunderstanding of it?
Hi ResFinder team!
I think I don’t understand the term “Match” in pheno_table.txt. My understanding is that Match=2 and Match=3 both mean 100% identity:
# The 'Match' column stores one of the integers 0, 1, 2, 3.
# 0: No match found
# 1: Match < 100% ID AND match length < ref length
# 2: Match = 100% ID AND match length < ref length
# 3: Match = 100% ID AND match length = ref length
# If several hits causing the same resistance are found,
# the highest number will be stored in the 'Match' column.
But in some of my results, I get Match=2 in pheno_table.txt and <100% identity in Resfinder_results_tab.txt. Here is an example, focusing on tetracyclines:
pheno_table.txt results
# Antimicrobial Class WGS-predicted phenotype Match Genetic background
tetracycline tetracycline Resistant 2 tetA(46) (tetA(46)_HQ652506), tetB(46) (tetB(46)_HQ652506), tetA(60) (tetA(60)_KX000272), tet(M) (tet(M)_X75073)
doxycycline tetracycline Resistant 2 tetA(46) (tetA(46)_HQ652506), tetB(46) (tetB(46)_HQ652506), tetA(60) (tetA(60)_KX000272), tet(M) (tet(M)_X75073)
minocycline tetracycline Resistant 1 tet(M) (tet(M)_X75073)
tigecycline tetracycline Resistant 2 tetA(46) (tetA(46)_HQ652506), tetB(46) (tetB(46)_HQ652506), tetA(60) (tetA(60)_KX000272)
All tet genes in my Resfinder_results_tab.txt results
Resistance gene Identity Alignment Length/Gene Length Coverage Position in reference Contig Position in contig Phenotype Accession no.
tetA(46) 92.70 1725/1725 97.33 1..1726 NA NA..NA Warning: gene is missing from Notes file. Please inform curator. HQ652506
tetB(46) 81.35 1637/1737 86.36 1..1638 NA NA..NA Warning: gene is missing from Notes file. Please inform curator. HQ652506
tetA(60) 88.62 1704/1740 93.68 1..1705 NA NA..NA Warning: gene is missing from Notes file. Please inform curator. KX000272
tet(M) 92.66 1922/1920 92.86 1..1923 NA NA..NA Tetracycline resistance X75073
(All identities for tet genes are <100% - hence why I must be misunderstanding “Match” in pheno_table.txt.)
A related question: for tetA(46) and tetB(46), these have the same accession ID (HQ652506) but different reference gene lengths (1725, 1737 bp, respectively). Why?
And what does the “Warning: gene is missing from Notes file. Please inform curator,” mean exactly? Is there a form somewhere where I should input these when I find them? Does it mean there is less confidence about these genes?
Thanks for your help!
Comments (3)
reporter: OK, thank you so much! My confusion regarding tetA(46) and tetB(46) was more that they had the same accession ID (HQ652506). But after reading the associated paper, I see that this is a heterodimeric ABC transporter, tetAB(46), where both genes are required for tetracycline resistance. So it makes sense now. Thanks again!
reporter - changed status to resolved
Help text for defining "Match" in pheno_table.txt was incorrect. ResFinder team has indicated that it has been fixed.
Dear Jessica,
Thank you very much for your interest in ResFinder.
To answer your questions:
1) Match types: I understand the confusion; it is due to the help text being wrong. Match = 2 means that the identity is below 100% and the match length = ref length. The text is now fixed.
2) Length of tetA and tetB: an accession number can contain multiple genes, as is the case here. tetA and tetB are two different genes, and therefore they can have different lengths.
3) The warning: ResFinder contains a notes.txt file, which should include all genes in ResFinder, but some are missing. We therefore ask users to report these genes when they come across them so that we can fill them in. Their absence does not affect the results.
If you find additional genes, please send the output to food-cgehelp@dtu.dk and we will ensure they are added.
I hope that clears everything up. Please do not hesitate to get in touch if you have further questions.
Best regards,
Maja, CGE Helpdesk
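For anyone else reading this thread, here is a minimal Python sketch of the corrected "Match" rule as I understand it from the reply above. It is not ResFinder's actual code: the function names match_category and best_match are hypothetical, and the thread does not say how a 100% identity hit shorter than the reference is scored, so treating it as 1 is purely an assumption. The hit values are copied from the Resfinder_results_tab.txt excerpt in the question.

```python
def match_category(identity, aln_len, ref_len):
    """Return a Match value (1-3) for a single hit.

    Hypothetical helper, not part of ResFinder.
    3: identity = 100% and match length = ref length
    2: identity < 100% and match length = ref length (corrected definition)
    1: any other hit (ASSUMPTION: the thread only confirms this for
       identity < 100% with match length != ref length)
    """
    if aln_len == ref_len:
        return 3 if identity == 100.0 else 2
    return 1


# (identity %, alignment length, gene length), copied from the question.
hits = {
    "tetA(46)": (92.70, 1725, 1725),
    "tetB(46)": (81.35, 1637, 1737),
    "tetA(60)": (88.62, 1704, 1740),
    "tet(M)": (92.66, 1922, 1920),
}


def best_match(genes):
    """Highest Match value among the listed genes; 0 means no match found."""
    return max((match_category(*hits[g]) for g in genes), default=0)


print(best_match(["tetA(46)", "tetB(46)", "tetA(60)", "tet(M)"]))  # tetracycline
print(best_match(["tet(M)"]))                                      # minocycline
```

Under these assumptions the sketch gives 2 for tetracycline (driven by tetA(46), which covers the full reference at below 100% identity) and 1 for minocycline (tet(M) only), which agrees with the pheno_table.txt excerpt in the question.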