Memory Error

Issue #4 on hold
Catherine Risley created an issue

When it starts the step of loading the clusters it gives me this:
Traceback (most recent call last):
File "/sw/eb/software/vContact2/2019.07.31-intel-2018b-Python-3.6.6/bin/vcontact", line 4, in <module>
import('pkg_resources').run_script('vcontact2==0.9.10', 'vcontact')
and then finally ends with 'MemoryError'.
I have updated all of my Python packages and used -v in my command to get more information, but I can't seem to solve the issue.

Comments (6)

  1. Ben Bolduc

    Hi Catherine,

    Could you post the full output, either here or with a file attachment? More often than not, memory errors are caused by honest-to-goodness insufficient memory. If you’re running vContact2 on a machine with “sufficient” memory (maybe 48 GB or more) or you have a small dataset (~500 contigs), you can try re-running vContact2 using the legacy inputs. These are vestiges from the conversion of vContact1 → vContact2.

    vcontact --contigs vConTACT_contigs.csv --pcs vConTACT_pcs.csv --pc-profiles vConTACT_profiles.csv <rest-of-arguments>
    

    These three files are automatically generated as intermediate files. They should have been generated immediately preceding the “Loading the clusters, please be patient” step.
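
    Before re-running, it's worth a quick sanity check that all three intermediates exist and aren't empty. A minimal sketch (assuming you run it from the directory vConTACT2 wrote its outputs to; adjust the paths if yours live elsewhere):

    from pathlib import Path

    # The three legacy-input intermediates written before the clustering step.
    for name in ("vConTACT_contigs.csv", "vConTACT_pcs.csv", "vConTACT_profiles.csv"):
        path = Path(name)
        ok = path.is_file() and path.stat().st_size > 0
        print(f"{name}: {'OK' if ok else 'missing or empty'}")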

    I suspect this is a bug whose byproduct is this memory error: during this step vConTACT re-loads some data and uses ~50% more memory than it should, so machines with just under the required memory get hit. A future version of the tool won't have this problem, but it'll take some time to re-code the offending area.
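
    If you want to see how close a run gets to your machine's limit, one quick check is the peak resident memory the kernel records. A minimal standard-library sketch (note that ru_maxrss is reported in KiB on Linux, bytes on macOS):

    import resource

    # Peak resident set size for this process and its finished children.
    self_peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    child_peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    print(f"peak RSS (self):     {self_peak} KiB on Linux")
    print(f"peak RSS (children): {child_peak} KiB on Linux")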

    Give the legacy inputs a shot and let me know how it goes!

    Cheers,

    Ben

  2. Catherine Risley reporter

    Hey Ben!

    Thanks for getting back to me. I'll be sure to try the legacy inputs and let you know how it goes. I have attached the full error message file. Let me know if you need to know anything else.

    Thanks, Catherine

  3. Bubba Brooks

    I ran into the same memory error @Catherine Risley posted on 2019-08-30. I then switched this job to a machine with 96 CPUs and 768 GB of RAM and was met with the error below (for the full log, please see here). Is there a recommended upper limit on input proteins? This dataset is fairly large: ~1,500 assembled human fecal metagenomes with viral features enriched using VIBRANT, resulting in ~9M input viral proteins for vContact2.

    Files produced:

    total 36G
    -rw-rw-r-- 1 ubuntu ubuntu 2.4G Jan 26 13:19 merged.faa
    -rw-rw-r-- 1 ubuntu ubuntu 2.6G Jan 26 13:20 merged.dmnd
    -rw-rw-r-- 1 ubuntu ubuntu  16G Jan 26 22:26 merged.self-diamond.tab
    -rw-rw-r-- 1 ubuntu ubuntu 8.9G Jan 26 22:29 merged.self-diamond.tab.abc
    -rw-rw-r-- 1 ubuntu ubuntu 5.1G Jan 26 23:29 merged.self-diamond.tab.mci
    -rw-rw-r-- 1 ubuntu ubuntu 232M Jan 26 23:29 merged.self-diamond.tab_mcxload.tab
    -rw-rw-r-- 1 ubuntu ubuntu 166M Jan 27 01:14 merged.self-diamond.tab_mcl20.clusters
    -rw-rw-r-- 1 ubuntu ubuntu 738M Jan 31 15:01 vConTACT_proteins.csv
    -rw-rw-r-- 1 ubuntu ubuntu  13M Jan 31 15:01 vConTACT_contigs.csv
    -rw-rw-r-- 1 ubuntu ubuntu  29M Jan 31 15:01 vConTACT_pcs.csv
    -rw-rw-r-- 1 ubuntu ubuntu 236M Jan 31 15:01 vConTACT_profiles.csv
    -rw-rw-r-- 1 ubuntu ubuntu  25M Jan 31 15:02 merged_df.csv
    

    Error:

    ------------------------Calculating Similarity Networks-------------------------
    Traceback (most recent call last):
      File "/home/ubuntu/miniconda3/envs/vContact2/bin/vcontact", line 739, in <module>
        main(options)
      File "/home/ubuntu/miniconda3/envs/vContact2/bin/vcontact", line 585, in main
        args.mod_sig, args.mod_shared_min)
      File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/vcontact/pcprofiles.py", line 71, in __init__
        self.ntw = self.network(self.matrix, self.singletons, thres=sig, max_sig=max_sig, threads=self.threads)
      File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/vcontact/pcprofiles.py", line 150, in network
        final_results = list(chain.from_iterable([r.get() for r in results]))
      File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/site-packages/vcontact/pcprofiles.py", line 150, in <listcomp>
        final_results = list(chain.from_iterable([r.get() for r in results]))
      File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/multiprocessing/pool.py", line 657, in get
        raise self._value
      File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
        put(task)
      File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/multiprocessing/connection.py", line 206, in send
        self._send_bytes(_ForkingPickler.dumps(obj))
      File "/home/ubuntu/miniconda3/envs/vContact2/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
        header = struct.pack("!i", n)
    struct.error: 'i' format requires -2147483648 <= number <= 2147483647
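
    From what I can tell, this struct.error isn't a RAM limit at all: multiprocessing prefixes each pickled task with a signed 32-bit length header, so any single payload of 2 GiB or more overflows the header no matter how much memory the machine has (newer Python releases lift this limit, if I remember right). A minimal reproduction of just the failing pack call:

    import struct

    # multiprocessing.connection writes each pickled task as a signed 32-bit
    # big-endian length header followed by the payload; 2 GiB overflows it.
    payload_size = 2**31  # one byte past the header's maximum
    try:
        struct.pack("!i", payload_size)
    except struct.error as err:
        print(err)  # 'i' format requires -2147483648 <= number <= 2147483647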
    

  4. Bubba Brooks

    This thread solved my issue. I also subset my data to about half its size, though I think the overcommit mode was the main culprit for me (Ubuntu 18.04.2 LTS, GNU/Linux 4.15.0-1057-aws x86_64).

  5. Ben Bolduc

    Thanks for letting me know - especially the link. I haven't had an opportunity to look deeply into this lately. I suspected a hard memory limit, but overcommit could be a factor here. I won't ask you to re-run your data with overcommit enabled but without subsetting it, but I'll make a note to try reproducing this error once I get my hands on a dataset this large.

    The thread mentions that, if memory is truly insufficient, enabling always-overcommit will still result in a true memory error being thrown.

    If overcommit is truly the issue here, enabling it will be left to a superuser on each machine.
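
    For reference, checking the current policy doesn't need root (changing it does, e.g. sysctl -w vm.overcommit_memory=1). A minimal check, Linux only:

    # Linux overcommit policy: 0 = heuristic (default), 1 = always
    # overcommit, 2 = never overcommit.
    with open("/proc/sys/vm/overcommit_memory") as fh:
        print("vm.overcommit_memory =", fh.read().strip())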

    Thanks again for the info. I'll close this shortly, once Catherine has been able to solve the issue.

  6. Ben Bolduc

    Placing this on hold until the reporter has the issue resolved via enabling overcommit or subsampling.

    Will make a note to include this under "Known Bugs."
