'Repr_taxonomy' in db_infos/Host_Genomes.tsv
Dear Simon,
I found that the taxonomic classifications of host genomes are annotated in GTDB format.
Could you please provide NCBI-format taxonomic annotations for these host genomes?
Thanks a lot!
Comments (5)
-
repo owner -
reporter Or could you provide the nucleotide sequences of these genomes?
I cannot access the host genomes whose IDs do not begin with ‘GCA’ or ‘GCF’. These genomes seem to come from IMG/M. However, they have been reclustered, and I cannot retrieve them from IMG/M by their IDs (for example, both ‘3300026280_17’ and ‘3300026280’ retrieve only the sample they occur in, not the MAGs themselves).
-
repo owner This is because the genome you mentioned (“3300026280_17”) is a MAG from the GEM dataset. In the file “Host_Genomes.tsv”, column 2 gives the origin of each genome. GTDB genomes come with an NCBI ID. IMG genomes come with an IMG Taxon_oid. GEM genomes come with a MAG ID and are available at https://portal.nersc.gov/GEM/ (see https://www.nature.com/articles/s41587-020-0718-6).
Hope this helps!
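For anyone who wants to split the genome IDs by origin before downloading, here is a minimal sketch in Python. It assumes the file is tab-separated with a header row, that the genome ID is in column 1, and that the origin is in column 2; the function name `group_by_origin`, the header names, and the sample rows (including the placeholder accession) are made up for illustration and are not taken from the real file.

```python
import csv
import io
from collections import defaultdict


def group_by_origin(tsv_text, id_col=0, origin_col=1, has_header=True):
    """Group genome IDs by the value found in the origin column.

    Column positions are assumptions: ID in column 1, origin in column 2.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    groups = defaultdict(list)
    for row in reader:
        # Ignore blank or truncated lines instead of failing on them.
        if len(row) <= max(id_col, origin_col):
            continue
        groups[row[origin_col]].append(row[id_col])
    return dict(groups)


# Hypothetical sample; the real header names and origin labels may differ.
sample = (
    "Genome\tOrigin\n"
    "GCA_000000000.1\tGTDB\n"  # placeholder accession, not a real genome
    "3300026280_17\tGEM\n"
)
groups = group_by_origin(sample)
```

With the real file, you would pass the contents of “Host_Genomes.tsv” instead of `sample`, then fetch each group from the matching resource (NCBI, IMG, or the GEM portal).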
-
reporter It really helps! I just found that, too.
-
reporter - changed status to resolved
Hi,
To limit the complexity (and size) of the “Host_Genomes.tsv” file, we do not plan to include the NCBI annotation. However, you can use the GTDB website to link GTDB taxa to NCBI taxa (https://gtdb.ecogenomic.org).
Best,
Simon
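As a rough illustration of linking GTDB taxa to NCBI taxa, the sketch below builds a lookup from genome accession to NCBI taxonomy string. It assumes you have downloaded a GTDB metadata table from the GTDB website and that it contains `accession` and `ncbi_taxonomy` columns, with accessions prefixed by `RS_` or `GB_`; please check these against the release you use. The function name `ncbi_taxonomy_by_accession` and the sample rows are hypothetical placeholders, not real data.

```python
import csv
import io


def ncbi_taxonomy_by_accession(metadata_text):
    """Map genome accession -> NCBI taxonomy string from a GTDB metadata table.

    Assumes tab-separated text with 'accession' and 'ncbi_taxonomy' columns.
    """
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    mapping = {}
    for row in reader:
        accession = row["accession"]
        # Drop the assumed 'RS_' / 'GB_' prefix so keys look like GCF_... / GCA_...
        if accession.startswith(("RS_", "GB_")):
            accession = accession[3:]
        mapping[accession] = row["ncbi_taxonomy"]
    return mapping


# Hypothetical sample rows; the accession and taxonomy strings are placeholders.
sample_metadata = (
    "accession\tgtdb_taxonomy\tncbi_taxonomy\n"
    "RS_GCF_000000000.1\td__Bacteria;p__ExamplePhylumA\td__Bacteria;p__ExamplePhylumB\n"
)
lookup = ncbi_taxonomy_by_accession(sample_metadata)
```

This only covers genomes that carry an NCBI ID. IMG and GEM genomes have no entry in such a table, so they would need a different route.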