Issue #8 resolved

NumPy MemoryError when working with small bin sizes

Anonymous created an issue

I found that when working with small bin sizes (e.g. a resolution of 50,000 bp), the following error appears:

Code:

import matplotlib.pyplot as plt
import numpy as np

from mirnylib import genome
from mirnylib import h5dict
from mirnylib import plotting
from hiclib import binnedData
from hiclib import fragmentHiC

genome_name='mm10'
genome_db = genome.Genome('../../fasta/'+genome_name, readChrms=['#', 'X'])

domain_res = 50000

fragment_dataset_filename='../../data/sample/Serov/fragment_dataset_Fib.hdf5'
heatmap_filepath='../../data/sample/Serov/heatmap-res-'+str(domain_res//1000)+'KB_Fib.hdf5'

###################Step 1 - create 50kb-binned heatmap################

# Create a HiCdataset object.
fragments = fragmentHiC.HiCdataset(
    filename=fragment_dataset_filename,
    genome=genome_db,
    maximumMoleculeLength=500,
    mode='r')

# Load the parsed reads into the HiCdataset. The dangling-end filter is applied
# at this stage, using the maximumMoleculeLength specified at the initialization
# of the object.

#fragments.parseInputData(
#    dictLike='../../data/sample/Serov/mapped_reads_fib.hdf5')

#fragments.filterRsiteStart(offset=5)
#fragments.filterDuplicates()
#fragments.filterLarge()
#fragments.filterExtreme(cutH=0.005, cutL=0)

fragments.saveHeatmap(heatmap_filepath, domain_res)

Resulting log:

----> New dataset opened, genome mm10, filename = ../../data/sample/Serov/fragment_dataset_Fib.hdf5
________________________________________________________________________________
[Memory] Calling mirnylib.genome.run_func...
run_func(set(['#', 'X']), 'gap.txt', 'chr%s.fa', '_getGCBin', 50000)
________________________________________________________run_func - 78.6s, 1.3min
________________________________________________________________________________
[Memory] Calling mirnylib.genome.run_func...
run_func(set(['#', 'X']), 'gap.txt', 'chr%s.fa', '_getUnmappedBasesBin', 50000)
_________________________________________________________run_func - 5.0s, 0.1min
Traceback (most recent call last):
  File "042_domains_search_Dixon.py", line 39, in <module>
    fragments.saveHeatmap(heatmap_filepath, domain_res)
  File "/home/minja/HiC/mirnlab-hiclib/src/hiclib/fragmentHiC.py", line 1329, in saveHeatmap
    heatmap = self.buildAllHeatmap(resolution, countDiagonalReads, useWeights)
  File "/home/minja/HiC/mirnlab-hiclib/src/hiclib/fragmentHiC.py", line 784, in buildAllHeatmap
    counts = np.bincount(label, minlength=numBins ** 2)
MemoryError
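The failing call allocates a dense vector of numBins ** 2 int64 counts. For scale, that allocation can be estimated with a small sketch (the genome size below is an assumed approximation for mm10, not a value read from hiclib):

```python
# Rough estimate of the dense allocation made by
# np.bincount(label, minlength=numBins ** 2) in buildAllHeatmap.
# np.bincount returns int64 counts, i.e. 8 bytes per bin pair.
def heatmap_bytes(genome_size_bp, resolution_bp, dtype_bytes=8):
    num_bins = -(-genome_size_bp // resolution_bp)  # ceiling division
    return num_bins ** 2 * dtype_bytes

# mm10 is roughly 2.7 Gb; at 50 kb resolution that is ~54,000 bins,
# so the intermediate int64 array alone needs tens of gigabytes.
print(heatmap_bytes(2700000000, 50000) / 1e9, "GB")
```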

Do you think it is possible to fix this error?

Best, Minja

Comments (3)

  1. mimakaev

    Hi Minja,

    Working with high-resolution heatmaps is tricky: a 50 kb-binned heatmap takes at minimum (2.7*10^9 / 50 kb)^2 * 4 bytes ≈ 12 GB for mm10, plus some supplementary data. I never work with heatmaps finer than 100 kb.

    Your issue is in the Genome class, though; I have never had problems creating a 10 kb-binned genome. Do you have a 32-bit or a 64-bit system? How much RAM do you have?
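One way around the dense allocation is to accumulate contacts sparsely, so only observed bin pairs consume memory. A minimal sketch of the idea (this is not the hiclib API; the function and its arguments are illustrative):

```python
# Count contacts per bin pair in a dict instead of a dense
# numBins x numBins array; memory scales with distinct pairs observed.
def sparse_bin_counts(pos1, pos2, resolution):
    counts = {}
    for p1, p2 in zip(pos1, pos2):
        b1, b2 = p1 // resolution, p2 // resolution
        key = (min(b1, b2), max(b1, b2))  # store the upper triangle only
        counts[key] = counts.get(key, 0) + 1
    return counts

# Example: three read pairs, 50 bp bins.
print(sparse_bin_counts([10, 60, 120], [70, 10, 130], 50))
# {(0, 1): 2, (2, 2): 1}
```

In practice the same effect can be had by building per-chromosome heatmaps, or by assembling a scipy.sparse matrix from the bin-pair keys.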
