I am trying to use CHiCAGO to analyse C-HiC data, but I have run into a problem that I cannot solve on my own: when processing .chinput files totalling more than 8 GB, the run exhausts memory before runChicago.R finishes.
As a sketch of my pipeline, I can refer to Issue #12 ("correct workflow for weight recalibration", by Elisabetta Manduchi), which is:
I have 2 biological replicates and would like to use them to recalibrate the weights. I use the runChicago.R wrapper and would like to confirm that the workflow below is correct (and the most efficient):
1. Run runChicago.R separately on each of the 2 replicates, i.e. two runs: in one run the input is only rep1.chinput, and in the other only rep2.chinput
2. Take the 2 resulting rds objects, rep1.rds and rep2.rds, and run fitDistCurve.R with --input rep1.rds,rep2.rds
3. Update the settings file with the new weights from step 2
4. Run runChicago.R once more with the updated settings file, this time providing the comma-separated list rep1.chinput,rep2.chinput
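To make the four steps above concrete, here is a minimal command-line sketch of what I run. The commands are echoed rather than executed so the sketch stays self-contained; all file names, the design directory and the settings-file name are placeholders for my actual inputs, and the option names follow the chicagoTools scripts as far as I understand them.

```shell
#!/bin/sh
# Steps 1-2: one CHiCAGO run per replicate (placeholder design dir and files)
run_rep1="Rscript runChicago.R --design-dir ./designDir rep1.chinput rep1"
run_rep2="Rscript runChicago.R --design-dir ./designDir rep2.chinput rep2"

# Step 2: refit the weights from the two per-replicate rds objects
fit_weights="Rscript fitDistCurve.R --input rep1.rds,rep2.rds"

# Steps 3-4: after copying the new weights into an updated settings file,
# run once more on both replicates together (this is the run that fails
# for me when the combined .chinput size exceeds 8 GB)
run_both="Rscript runChicago.R --design-dir ./designDir \
  --settings-file updated.settingsFile \
  rep1.chinput,rep2.chinput rep1_rep2"

echo "$run_rep1"
echo "$run_rep2"
echo "$fit_weights"
echo "$run_both"
```

Dropping the quoting and echoing gives the actual invocations; the exact command line I use is also in the attached script.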
In most cases this works fine, but when the total size of the .chinput files in step 4 exceeds 8 GB, the job fails. I have attached some of the logs with the typical error message. Note that I have up to 120 GB of RAM (24 CPUs) available when I launch runChicago.R (a plot of RAM usage is attached). I have also attached the script with the exact command line I use to run CHiCAGO.
I have tried to provide all the information you need to help me, but please don't hesitate to contact me if anything is missing or unclear. It would be great if we could use CHiCAGO for C-HiC data at our institute.
Thank you in advance.

Best,
Marco