PointFinder loses seq_variations and phenotypes when contig name contains a ':'

Issue #103 resolved
Marco van Zwetselaar created an issue

When running PointFinder against an assembly with ':' in a contig name, the generated JSON has an empty seq_variations element and lacks the related phenotypes entries.

The classical tabular output shows the mutations correctly. Also, when turning on --unknown_mut, the mutations end up in the JSON. This suggests that the issue is in discard_unknown_muts() called at run_resfinder:469:

if not conf.unknown_mut:
    results_pnt = PointFinder.discard_unknown_muts(
        results_pnt=results_pnt, phenodb=res_pheno_db, method=method)

Drilling down into discard_unknown_muts() we get to _get_known_mis_matches() (pointfinder.py:413), which starts with this ominous bit of code 🕵️‍♂️:

@staticmethod
def _get_known_mis_matches(entry_key, mis_matches, phenodb):
    try:
        gene_ref_id = entry_key.split(":")[2]

Imagine now having entry_key = "NODE105 (length: 24589, cov: 14.366465):14139..16766:gyrA_1_CP073768.1:99.200913" and you see what goes wrong: the colons inside the contig name shift the fields, so index 2 no longer holds the gene reference.

A quick (and correct) fix is to instead count from the right, so the code becomes:

        gene_ref_id = entry_key.split(":")[-2]
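To illustrate the difference, here is a minimal standalone sketch. The two helper functions are hypothetical stand-ins for the key parsing, not PointFinder's actual code, and clean_key is an assumed example of a key whose contig name has no colon; colon_key is the key from this report.

```python
# Hypothetical helpers mimicking the two indexing strategies; not PointFinder code.
def gene_ref_from_left(entry_key: str) -> str:
    # Original behaviour: assumes the contig name contains no ':'.
    return entry_key.split(":")[2]


def gene_ref_from_right(entry_key: str) -> str:
    # Proposed quick fix: count from the right, so colons in the
    # contig name (the leftmost field) no longer shift the index.
    return entry_key.split(":")[-2]


# Assumed example of a key with a colon-free contig name.
clean_key = "NODE105:14139..16766:gyrA_1_CP073768.1:99.200913"
# The key from this report, with colons inside the contig name.
colon_key = ("NODE105 (length: 24589, cov: 14.366465)"
             ":14139..16766:gyrA_1_CP073768.1:99.200913")

for key in (clean_key, colon_key):
    print(repr(gene_ref_from_left(key)), repr(gene_ref_from_right(key)))
```

With colon_key, indexing from the left returns the fragment ' 14.366465)' instead of the gene reference, while indexing from the right returns 'gyrA_1_CP073768.1' for both keys. Counting from the right still assumes that the last two fields never contain a ':'.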

But the proper way to fix this is to grab the gene_ref_id from the appropriate field of the 'hit dict', rather than pry it out of its key. I'm just not sure which field has that gene ref.


Comments (2)

  1. Maja Weiss

    Dear Marco,

    Thank you for finding this error and for the investigation. I have fixed it as you proposed: the gene_ref_id can be grabbed from entry['subject_header'] in discard_unknown_muts.
    The fix will be added to the Staging branch and included in the next release.
