Example: Bzip2 with indexed MAFs

Starting from a bzip2-compressed MAF file:

henduck% ls -al chrY.maf.bz2
-rw-r--r--   1 james  james  12646982 Mar 22 23:03 chrY.maf.bz2

First generate the table file:

henduck% time ~/work/seek-bzip2/bzip-table < chrY.maf.bz2 > chrY.maf.bz2t
~/work/seek-bzip2/bzip-table < chrY.maf.bz2 > chrY.maf.bz2t  4.74s user 0.03s system 99% cpu 4.774 total

Now generate the index (just on hg18 here):

henduck% time maf_build_index.py -s hg18 chrY.maf.bz2
maf_build_index.py -s hg18 chrY.maf.bz2  7.37s user 0.25s system 95% cpu 7.985 total

We now have three files (compressed MAF, translation table, and index):

henduck% ls -al chrY*
-rw-r--r--   1 james  james  12646982 Mar 22 23:03 chrY.maf.bz2
-rw-r--r--   1 james  james       965 Mar 23 18:08 chrY.maf.bz2t
-rw-r--r--   1 james  james    224185 Mar 23 18:09 chrY.maf.index
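To give a feel for why a small offset table makes random access into compressed data possible, here is a toy analogue in Python. It is only a sketch under stated assumptions: it compresses each chunk as an independent bzip2 stream and records where each one starts, which is not the mechanism bzip-table uses on a single existing .bz2 file, and it does not reproduce the .bz2t format. The names build_chunked and read_chunk are hypothetical.

```python
import bz2


def build_chunked(chunks):
    """Compress each chunk independently and record where it lands.

    Returns (blob, table), where table[i] is a tuple of
    (compressed_offset, compressed_length, uncompressed_offset).
    Hypothetical layout, for illustration only.
    """
    blob = bytearray()
    table = []
    uncompressed_pos = 0
    for chunk in chunks:
        compressed = bz2.compress(chunk)
        table.append((len(blob), len(compressed), uncompressed_pos))
        blob += compressed
        uncompressed_pos += len(chunk)
    return bytes(blob), table


def read_chunk(blob, table, i):
    """Decompress only chunk i, without touching the chunks before it."""
    offset, length, _ = table[i]
    return bz2.decompress(blob[offset:offset + length])
```

With a table like this, reading the middle of the data costs one small decompression instead of a pass over everything that precedes it, which is the same trade the translation table file makes here.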

Now extract some blocks from the compressed file using random access:

henduck% time echo "hg18.chrY 10000000 15000000" | maf_extract_ranges_indexed.py chrY.maf.bz2 > out.maf
echo "hg18.chrY 10000000 15000000"  0.00s user 0.00s system 47% cpu 0.003 total
maf_extract_ranges_indexed.py chrY.maf.bz2 > out.maf  1.37s user 0.29s system 99% cpu 1.659 total
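The query line above has three whitespace-separated fields: source name, start, and end. The following Python sketch shows how such a line could be parsed and matched against a list of indexed blocks. It is an illustration only: the in-memory index layout, the half-open interval convention, and the names parse_range and overlapping are assumptions, not the actual format of chrY.maf.index or the code in maf_extract_ranges_indexed.py.

```python
def parse_range(line):
    """Split a query line like 'hg18.chrY 10000000 15000000' into typed fields."""
    src, start, end = line.split()
    return src, int(start), int(end)


def overlapping(index, src, start, end):
    """Return index entries on `src` that overlap [start, end).

    `index` is a hypothetical list of (src, block_start, block_end, offset)
    tuples; intervals are assumed to be half-open.
    """
    return [
        entry for entry in index
        if entry[0] == src and entry[1] < end and entry[2] > start
    ]


# Made-up index entries, for illustration only.
toy_index = [
    ("hg18.chrY", 9999000, 10000500, 0),
    ("hg18.chrY", 12000000, 12000800, 1),
    ("hg18.chrY", 15000000, 15000900, 2),
    ("hg18.chrX", 11000000, 11000100, 3),
]
hits = overlapping(toy_index, *parse_range("hg18.chrY 10000000 15000000"))
```

Here hits contains the first two entries: the third starts exactly at the end of the query range and the fourth is on a different source.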

Count how many alignment blocks came back:

henduck% maf_count.py < out.maf
1964
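As a cross-check that does not depend on bx-python, the count can be approximated in a few lines of Python, relying on the fact that each alignment block in a MAF file begins with a line whose first field is "a". This is a minimal sketch, not the implementation of maf_count.py, and count_maf_blocks is a hypothetical name.

```python
def count_maf_blocks(lines):
    """Count alignment blocks by counting lines whose first field is 'a'."""
    count = 0
    for line in lines:
        # split() handles both "a score=..." and a bare "a" line,
        # and yields an empty list for blank lines.
        if line.split()[:1] == ["a"]:
            count += 1
    return count


# Tiny made-up MAF fragment, for illustration only.
sample = """##maf version=1
a score=10.0
s hg18.chrY 100 5 + 57772954 ACGTA
s panTro2.chrY 200 5 + 23952694 ACGTA

a score=20.0
s hg18.chrY 300 4 + 57772954 ACGT
"""
n_blocks = count_maf_blocks(sample.splitlines())
```

For a real file the same function can be applied to an open file handle, for example count_maf_blocks(open("out.maf")).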

Wow, only about 1.7 seconds to grab 1964 alignment blocks out of the middle of a compressed MAF file!

See IO/SeekingInBzip2Files for more technical details.
