Can pointfinder results contain the location of the mutation in the assembly?

Issue #104 new
Jonathan Abrahams created an issue

Hi,

I can see from the BLAST outputs in the pointfinder results directory that the location of the resistance gene in the assembly is available. I have written my own hack-y script to extract this data and report the resistance mutation locus in the input assembly, rather than just the position of the resistance mutation within the gene.
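A minimal sketch of the coordinate mapping described above, for illustration only. It assumes 1-based inclusive coordinates, a gapless alignment, and a coding-region position (promoter positions and indels are not handled). The function name and parameter names are hypothetical and are not taken from pointfinder.

```python
def gene_pos_to_contig_pos(gene_pos, gene_start, contig_start, contig_end, strand="+"):
    """Map a 1-based position within the reference gene to a 1-based
    position on the assembly contig.

    gene_start:   first aligned position in the reference gene
    contig_start: first aligned position on the contig (forward coordinates)
    contig_end:   last aligned position on the contig (forward coordinates)
    strand:       "+" if the gene lies on the forward strand of the contig,
                  "-" if it lies on the reverse strand

    Assumption: the alignment contains no gaps, so the offset within the
    gene equals the offset within the contig.
    """
    offset = gene_pos - gene_start
    if strand == "+":
        return contig_start + offset
    # On the reverse strand the start of the gene aligns with contig_end.
    return contig_end - offset


# Hypothetical hit: gene positions 1..591 aligned to contig positions 1001..1591.
print(gene_pos_to_contig_pos(87, 1, 1001, 1591, "+"))
print(gene_pos_to_contig_pos(87, 1, 1001, 1591, "-"))
```

An alignment with gaps would need the offset to be counted along the aligned strings instead, which is one place where a simple script like this can get tripped up.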

Are there plans to integrate this as a feature in the near future? My own script is likely to get easily tripped up, especially for mutations in promoter regions.

Thanks for your very quick responses so far and continued help,

Jonathan

Comments (2)

  1. Edison Alain von Matt

    Dear Jonathan,

    Sorry for the late reply.

    I tried to implement it, but there seems to be a problem with the reported query position of BLAST when a gene is found on multiple contigs.

    An example is the gene “16S-rrsC” that BLAST finds in the file below:

    https://drive.google.com/file/d/1VwUx6QJnOodj7audKA1PxUwU75KR0Tfr/view?usp=sharing

    pdm run resfinder -ifa 1438712.3.fna -o . -s ecoli --acquired --point
    

    Here, BLAST reports the following hit, among others:

    ID: JMWL01000006 Klebsiella pneumoniae CHS 06 aeebk-supercont1.1.C6, whole genome shotgun sequence. [Klebsiella pneumoniae CHS 06 | 1438712.3]:1..591:16S-rrsC_1_CP053603.1:36.651644

    {….'query_string': 'AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGG….., 'query_start': 1, 'query_end': 591, …………….}

    So, as I understand it, the reported query sequence should correspond to positions 1-591 of the contig JMWL01000006 Klebsiella pneumoniae CHS 06 aeebk-supercont1.1.C6.

    But the actual contig sequence starts with the following:

    ACAGACCGCCTGCGTGCGCTTTACGCCCAGTAATTCCGATTAAC…..

    The start of the contig sequence clearly differs from the reported query string for that range. This happens frequently for gene hits that are found on multiple contigs.

    Thus, I cannot accurately report the resistance mutation locus in the input assembly, as the BLAST position results seem to be inconsistent. Maybe I am missing something - do you have any suggestions on how to handle such cases?
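One way to investigate a mismatch like the one described above is to search the contig for the reported query string on both strands and compare the location found with the reported range. The sketch below is only a diagnostic aid under stated assumptions: the helper names are hypothetical, alignment gaps are assumed to be written as "-", and it does not establish the cause of the inconsistency.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_hit(contig_seq, query_string, query_start, query_end):
    """Find where the ungapped query string occurs in the contig.

    Returns a list of (strand, start, end, matches_reported) tuples with
    1-based inclusive coordinates on the forward strand of the contig.
    Only the first occurrence per strand is reported.
    """
    fragment = query_string.replace("-", "").upper()  # drop alignment gaps
    contig = contig_seq.upper()
    found = []
    for strand, target in (("+", fragment), ("-", reverse_complement(fragment))):
        idx = contig.find(target)
        if idx != -1:
            start, end = idx + 1, idx + len(target)
            found.append((strand, start, end, (start, end) == (query_start, query_end)))
    return found


# Toy data, not taken from the assembly discussed in this issue.
contig = "TTTTACGGATCCAAAA"
print(locate_hit(contig, "ACGGAT", 5, 10))
print(locate_hit(contig, "ATCCGT", 1, 6))
```

An empty result would suggest the query string does not come from that contig at all, while a hit at a different range or on the reverse strand would suggest the reported coordinates use another reference frame.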

    Best

    Eddy
