read a bigwig file is slow

fidel_ramirez avatarfidel_ramirez created an issue

Hi,

I noticed a large execution time difference between a perl script reading a bigwig file and a python script using bx BigWigFile.

The following code shows the problem:

"""
usage: %prog bigwig_file.bw  < bed_file.bed 
"""
from bx.intervals.io import GenomicIntervalReader
from bx.bbi.bigwig_file import BigWigFile
import numpy as np
import time
import sys

bw = BigWigFile( open( sys.argv[1] ) )
ll = []
for interval in GenomicIntervalReader( sys.stdin ):
    start = time.time()
    bw.query(interval.chrom, interval.start, interval.end, 20 ) 
    total = time.time() - start
    ll.append(total)

print np.mean(ll)

In my case the mean values are around 0.02 seconds, while the perl counterpart takes less than 0.001 sec. In practical terms this means to wait for several minutes using python compared to a couple of seconds in perl

I profiled the above code and most of the running time is spend on the method read_and_unpack() of bx/misc/binary_file.py. For each call to the BigWigFile query() method there are 10.000 calls to read_and_unpack() which seems excessive to me; however, I could not explain why so many calls are needed by browsing the source code.

Otherwise thanks for the great code,

Fidel

Comments (2)

  1. Anonymous

    Hi Brent,

    Yes, the Perl code I am comparing to also provides summary information. Here is an example copied from the perl library:

    use Bio::DB::BigWig 'binMean';
    
    my $wig  = Bio::DB::BigWig->new(-bigwig=>'ExampleData/dpy-27-variable.bw',
                                        -fasta=>'/tmp/elegans.fa');
    
    # Statistical summaries using "bin" feature type
    # Fetch 10 intervals across region 5M=>6M on chromosome I
    my @bins = $wig->features(-seq_id=>'I',
                               -start  => 5_000_000,
                               -end    => 6_000_000,
                                 -type=>'bin:10');
    for my $b (@bins) {
       my $start = $b->start;
       my $end   = $b->end;
       print "$start..$end, mean = ",$b->mean,"\n";
    }
    

    I haven't compiled or created cython code, therefore I don't know whether I am using a cythonized version or not. However, I have the impression that the current algorithm could be optimized.

  2. Log in to comment
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.