Umap processes chromosomes in a hard-coded way

Issue #2 resolved
Stefan created an issue

When using FASTA files such as this one:

>scaffold-1
CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGA
TCGGTTAGTTACAGGGAGAAATGACGAACGTACAGCCGATGAGCTACGAACAATCAACGA

The chromosome name (the whole header line, including the leading ">") ends up inside the generated k-mers:

>scaffold-1CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGATCGGTTAGTTACAGGGAGAAATGACGAAC
scaffold-1CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGATCGGTTAGTTACAGGGAGAAATGACGAACG
caffold-1CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGATCGGTTAGTTACAGGGAGAAATGACGAACGT
affold-1CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGATCGGTTAGTTACAGGGAGAAATGACGAACGTA
ffold-1CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGATCGGTTAGTTACAGGGAGAAATGACGAACGTAC
fold-1CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGATCGGTTAGTTACAGGGAGAAATGACGAACGTACA
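To make the symptom concrete, here is a minimal Python sketch that reproduces the output above under one assumption: header lines are recognised only when they contain the substring "chr". The names `is_header_chr_only`, `joined_sequence` and `kmers` are hypothetical and do not come from Umap's source. With k = 100, the header that slips through is glued onto the sequence and leaks into every early k-mer.

```python
from __future__ import annotations

# Hypothetical illustration, not Umap's code: header detection that relies
# on the substring "chr" instead of the leading ">".

FASTA_LINES = [
    ">scaffold-1",
    "CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGA",
    "TCGGTTAGTTACAGGGAGAAATGACGAACGTACAGCCGATGAGCTACGAACAATCAACGA",
]


def is_header_chr_only(line: str) -> bool:
    """Assumed faulty check: a line counts as a header only if it contains "chr"."""
    return "chr" in line


def joined_sequence(lines: list[str], is_header) -> str:
    """Concatenate every line that the check does NOT classify as a header."""
    return "".join(line.strip() for line in lines if not is_header(line))


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    # ">scaffold-1" lacks "chr", so it is treated as sequence data.
    seq = joined_sequence(FASTA_LINES, is_header_chr_only)
    for kmer in kmers(seq, 100)[:3]:
        print(kmer)
```

The three printed lines are identical to the first three k-mers quoted above.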

This minor issue happens because Umap is hard-coded to check whether the string "chr" is present in the chromosome ID, which means the ID needs to include "chr" to be handled correctly, for example:

>chr-scaffold-1
CGAATAACTATAGGATTTTCGGGGAGAACGAAGTAACTTCTTTATTAACGACGAACGAGA
TCGGTTAGTTACAGGGAGAAATGACGAACGTACAGCCGATGAGCTACGAACAATCAACGA
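A more general approach is to identify header lines by their leading ">", which FASTA requires, instead of by their content. The sketch below is a hypothetical parser (`parse_fasta` is an illustrative name, not a Umap function). It also assumes the record ID is the first whitespace-delimited token after ">", a common convention. With this check, both `>scaffold-1` and `>chr-scaffold-1` are parsed the same way, so the "chr" workaround is unnecessary.

```python
from __future__ import annotations


def parse_fasta(lines: list[str]) -> dict[str, str]:
    """Group sequence lines under the most recent header.

    A line is a header when it starts with ">", so the record ID may be any
    string ("scaffold-1", "chr-scaffold-1", ...). Assumption: the ID is the
    first whitespace-delimited token after ">".
    """
    records: dict[str, list[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    # Join each record's lines into one continuous sequence string.
    return {rid: "".join(parts) for rid, parts in records.items()}


if __name__ == "__main__":
    for header in (">scaffold-1", ">chr-scaffold-1"):
        print(parse_fasta([header, "CGAATAACTA", "TCGGTTAGTT"]))
```

Because the header is stripped before any sequence is joined, k-mers built from the returned strings can no longer contain the record name.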

