Dear authors of bx_python,
Thank you for sharing this very useful piece of software! I attempted to
use the Galaxy scripts from the command line to index the MAF blocks of
the 100-way mammal alignments from UCSC (hg19_100way).
I succeeded with fly (dm3_15way), but with the big alignment I ran into a
size restriction. The MAFs for the largest chromosome are very big
(chr1.maf is ~22 GB with LZO compression). Trying to build an index with
maf_build_index.py fails with:
  File "/home/marjens/galaxy/cluster_env_for_galaxy/bin/maf_build_index.py", line 83, in <module>
    if __name__ == "__main__": main()
  File "/home/marjens/galaxy/cluster_env_for_galaxy/bin/maf_build_index.py", line 80, in main
    indexes.write( out )
  File "/home/marjens/galaxy/cluster_env_for_galaxy/lib/python2.7/site-packages/bx/interval_index_file.py", line 332, in write
    write_packed( f, ">I", base )
  File "/home/marjens/galaxy/cluster_env_for_galaxy/lib/python2.7/site-packages/bx/interval_index_file.py", line 463, in write_packed
    f.write( pack( pattern, *vals ) )
struct.error: 'I' format requires 0 <= number <= 4294967295
I need this to create stitched FASTA sequences later, using
Again, this works for fly, but fails for the much larger 100way. The
indices for the smaller chromosomes can be built, and they are all below
(but getting close to) 4 GB in size. I believe the culprit is the base
offset of each bin index inside the index file, which is packed as an
unsigned 32-bit integer ('>I') and therefore cannot exceed 4294967295
(about 4 GiB). Unfortunately, replacing 'I' with 'Q' in open() and
write() breaks the binary format. My insight into this is limited (as,
of course, is my time).
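To illustrate the limit quoted in the error message, here is a minimal
sketch using only Python's standard struct module. fits_uint32 is a
hypothetical helper for illustration, not part of bx_python. The last
line hints at why swapping 'I' for 'Q' is not a drop-in fix: each packed
field grows from 4 to 8 bytes, so presumably every place that reads those
fields (and any index files already written) would have to change
consistently.

```python
import struct

# The traceback shows offsets written via write_packed( f, ">I", base ),
# i.e. as big-endian unsigned 32-bit integers.
MAX_UINT32 = 2**32 - 1  # 4294967295, the bound quoted in the struct.error


def fits_uint32(offset):
    """Hypothetical helper: True if `offset` can be packed with ">I"."""
    try:
        struct.pack(">I", offset)
        return True
    except struct.error:
        return False


# An offset at the 32-bit maximum packs fine; one byte past it overflows.
print(fits_uint32(MAX_UINT32))      # True
print(fits_uint32(MAX_UINT32 + 1))  # False

# ">Q" accepts larger offsets, but each field is 8 bytes instead of 4,
# so a reader still unpacking 4-byte fields would misread the file.
print(struct.calcsize(">I"), struct.calcsize(">Q"))  # 4 8
```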
However, since such huge alignments are used in publications (and work,
for instance, in the UCSC Genome Browser's multiz view), I assume this
problem has already been solved one way or another. I would be very
happy if you could give me a hint on how to solve or circumvent it.
Thank you very much and best regards.