Overlapping ranges in unified BED file?

Issue #5 new
Stefan created an issue

Hi,

after running Int8Handler to combine several uint8 files into a complete BED file, I got a very odd result: for the same genome, mappability was lower for 150 bp than for 100 bp. This is what the first lines of one of the two files look like:

chr-1   1       194     k100    1       +
chr-1   344     1505    k100    1       +
chr-1   1484    4739    k100    1       +
chr-1   4642    4936    k100    1       +
chr-1   4858    5199    k100    1       +
chr-1   5115    5604    k100    1       +
chr-1   5627    5887    k100    1       +

As you can see, there are overlapping ranges of uniquely mappable regions: the range in line 2 ends at position 1505 and the range in line 3 starts at position 1484. This means that when I summed the lengths of the ranges, I got a higher number for 100 bp simply because that file has more, shorter ranges with more overlaps. I'm not sure whether I did something wrong or there is an issue with the code; if it helps, I can provide files to reproduce the problem.
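To make the double counting concrete, here is a minimal Python sketch that compares the naive sum of range lengths with the number of bases covered once overlapping ranges are merged. It is not part of Int8Handler; the function names (`parse_bed`, `naive_total`, `merge_intervals`, `covered_bases`) are hypothetical, and it assumes the usual BED convention of half-open ranges, so each range has length end minus start.

```python
# Illustrative only: none of these helpers come from Int8Handler.
# Assumption: BED ranges are half-open, so length = end - start.

BED_TEXT = """\
chr-1   1       194     k100    1       +
chr-1   344     1505    k100    1       +
chr-1   1484    4739    k100    1       +
chr-1   4642    4936    k100    1       +
chr-1   4858    5199    k100    1       +
chr-1   5115    5604    k100    1       +
chr-1   5627    5887    k100    1       +
"""


def parse_bed(text):
    """Return (start, end) pairs from whitespace-separated BED lines."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            intervals.append((int(fields[1]), int(fields[2])))
    return intervals


def naive_total(intervals):
    """Sum of range lengths; overlapping bases are counted more than once."""
    return sum(end - start for start, end in intervals)


def merge_intervals(intervals):
    """Merge overlapping or touching ranges on a single chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def covered_bases(intervals):
    """Number of distinct bases covered by at least one range."""
    return sum(end - start for start, end in merge_intervals(intervals))


intervals = parse_bed(BED_TEXT)
print(naive_total(intervals))      # 5993
print(merge_intervals(intervals))  # [(1, 194), (344, 5604), (5627, 5887)]
print(covered_bases(intervals))    # 5713
```

For the seven lines shown above, the naive sum exceeds the merged coverage by 280 bases, which is exactly the total length of the four overlaps. The sketch handles a single chromosome only; a real file would need the ranges grouped by chromosome first.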

Comments (2)

  1. Mehran Karimzadeh

    Hi Stefan,

    BED output of Int8Handler, also known as single-read mappability, is binary.

    Overlaps in this BED file do not indicate higher or lower mappability.

    Its purpose is just to indicate any region that is uniquely mappable by at least one k-mer.

    For quantitative mappability, please use the wiggle or bedGraph output (also known as multi-read mappability).

    I am sorry that I haven't made this point clear in the documentation.

    I will emphasize this in the README file as this can be very confusing.

    Thanks a lot for reporting this.

    Best,

    Mehran