runChicago.R error

Issue #22
Rola Dali created an issue

Hello,

I am using Chicago to analyze 5 capture Hi-C samples and have gotten the following error across all of them:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

use 'source("https://bioconductor.org/biocLite.R")' or 'source("http://bioconductor.org/biocLite.R")' to update 'BiocInstaller' after library("utils")

***runChicago.R

Loading required package: methods
Warning message:
package 'argparser' was built under R version 3.3.0

Loading the Chicago package and dependencies...

Loading required package: data.table

Welcome to CHiCAGO - version 1.1.8
If you are new to CHiCAGO, please consider reading the vignette through the command: vignette("Chicago").
NOTE: Default values of tlb.minProxOEPerBin and tlb.minProxB2BPerBin changed as of Version 1.1.5. No action is required unless you specified non-default values, or wish to re-run the pipeline on old chicagoData objects. See news(package="Chicago")
Warning message:
package 'Chicago' was built under R version 3.3.0
Warning: neither --en-feat-files nor --en-feat-list provided. Feature enrichments will not be computed

Setting the experiment...

Locating <baitmapfile>.baitmap in input_files... Found capture_targets_DpnII.baitmap
Locating <rmapfile>.rmap in input_files... Found DpnII.sorted.rmap
Locating <nperbinfile>.npb in input_files... Found capture_targets_DpnII.npb
Locating <nbaitsperbinfile>.nbpb in input_files... Found capture_targets_DpnII.nbpb
Locating <proxOEfile>.poe in input_files... Found capture_targets_DpnII.poe
Checking the design files...
Read 7227576 rows and 4 (of 4) columns from 0.214 GB file in 00:00:03

Reading input_files/Hi-C_capture.chinput
Processing input...
minFragLen = 150 maxFragLen = 40000
Filtered out 212087 interactions involving other ends < minFragLen or > maxFragLen.
minNPerBait = 250
Filtered out 183586 baits with < minNPerBait reads.

Removed interactions with fragments adjacent to baits. Filtered out 0 baits without proximal non-Bait2bait interactions

Warning: directory chicago/Hi-C_capture exists and will be reused.

Starting chicagoPipeline...

*** Running normaliseBaits...

Normalising baits... Reading NPerBin file... Computing binwise means...

*** Running normaliseOtherEnds...

Preprocessing input...
Computing trans-counts...
Filtering out 1 other ends with top 0.01% number of trans-interactions
Binning...
Computing total bait counts...
Reading NBaitsPerBin file...
Read 7227576 rows and 76 (of 76) columns from 1.116 GB file in 00:00:23
Computing scaling factors...
Computing binwise means...
Computing normalised counts...
Post-processing...

*** Running estimateTechnicalNoise...

Estimating technical noise based on trans-counts...
Binning baits based on observed trans-counts...
Defining interaction pools and gathering the observed numbers of trans-counts per pool...
Computing the total number of possible interactions per pool...
Preparing the data.....
Processing fragment pools..
Plotting...
Post-processing the results...

*** Running estimateDistFun...

*** Running estimateBrownianComponent...

s_i factors found - estimating Brownian component...
Reading ProxOE file...
Read 968660192 rows and 3 (of 3) columns from 20.670 GB file in 00:03:12
Sampling the dispersion...
Getting consensus dispersion estimate...

*** Running getPvals...

Calculating p-values...

*** Running getScores...

Read 7227576 rows and 4 (of 4) columns from 0.214 GB file in 00:00:03
Calculating p-value weights...
Calculating scores...

Saving the Chicago object...

Plotting examples...

Error in sample.int(length(x), size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE'
Calls: plotBaits -> sample -> sample.int
In addition: Warning messages:
1: In min(diff(x.unique)) : no non-missing arguments to min; returning Inf
2: In min(diff(x.unique)) : no non-missing arguments to min; returning Inf
3: In estimateBrownianComponent(cd) : subset > number of baits in data, so used the full dataset.
4: In estimateBrownianComponent(cd) : We're using the whole data set to calculate dispersion. There's no reason to sample repeatedly in this case, so overriding brownianNoise.samples to 1.
Execution halted
MUGQICexitStatus:1

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I have some questions:

1- I read in the other issues that this might be because there are fewer than 16 baits, but my capture file contains 245931 captured regions. What exactly is the problem, and how can I fix it?

2- It says "minFragLen = 150 maxFragLen = 40000": is that the inter-tag distance?

If so, why not extend maxFragLen to the full length of chromosome 1 and increase minFragLen to over 1000 bp to avoid contiguous regions?

3- "Filtered out 212087 interactions involving other ends < minFragLen or > maxFragLen." Is that not a lot to filter out? What proportion of interactions do we expect to filter out?

4- It says: "Filtered out 183586 baits with < minNPerBait reads." I assume this means my minNPerBait = 250 is too high for the coverage of my experiments? Even with all that filtering, I should still have more than 16 baits left, so why am I getting that error?

Thank you!

Comments (6)

  1. Mikhail Spivakov

    Hi!

    The good news is that the Chicago interaction calling has completed; you're only getting an error when trying to plot examples. It may simply be that it's trying to choose more random baits to plot than you have available.
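
    If that is what's happening, one workaround is to re-run just the plotting step on the saved Chicago object and request fewer baits. A minimal sketch, assuming the pipeline saved the chicagoData object as an .Rds file (the path below is hypothetical) and that plotBaits() accepts an n argument for the number of randomly chosen baits:

    ```r
    library(Chicago)

    # Hypothetical path - point this at the .Rds saved by runChicago.R
    cd <- readRDS("chicago/Hi-C_capture/Hi-C_capture.Rds")

    # Request fewer randomly chosen baits than the default of 16
    plotBaits(cd, n = 4)
    ```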

    I'd be intrigued to see how Chicago fares with so few baits - we've never tried this.

    Re your specific questions:

    1) I'm not sure what you mean by the difference between baits and captured regions. In Chicago's terminology, baits and captured fragments are synonyms.

    2) minFragLen and maxFragLen refer to restriction fragment size, not inter-fragment distance. See ?defaultSettings and Table S1 in Cairns et al. (https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-016-0992-2/MediaObjects/13059_2016_992_MOESM2_ESM.pdf) for details.
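
    For completeness: the full list of settings (including minFragLen and maxFragLen) can be inspected and, if needed, overridden when the experiment is set up. A rough sketch, assuming your design files live in input_files/ and that setExperiment() accepts a settings list (the values below are placeholders, not recommendations):

    ```r
    library(Chicago)

    # Inspect the defaults, including minFragLen and maxFragLen
    str(defaultSettings())

    # Override selected settings when creating the chicagoData object
    cd <- setExperiment(designDir = "input_files",
                        settings = list(minFragLen = 100, maxFragLen = 1500))
    ```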

    3, 4) This is rather a large number of interactions and baits to filter out, and I suspect it may have to do with the .baitmap and/or .rmap files being misspecified. Perhaps you could share those files with me so I can have a look?
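
    In the meantime, a quick sanity check on your side would be to confirm that every bait in the .baitmap matches an entry in the .rmap by fragment ID and coordinates. A rough sketch, assuming the standard tab-separated formats with no header - 4 columns (chr, start, end, fragID) for the .rmap and 5 columns (chr, start, end, fragID, annotation) for the .baitmap:

    ```r
    library(data.table)

    rmap    <- fread("input_files/DpnII.sorted.rmap", header = FALSE,
                     col.names = c("chr", "start", "end", "fragID"))
    baitmap <- fread("input_files/capture_targets_DpnII.baitmap", header = FALSE,
                     col.names = c("chr", "start", "end", "fragID", "annotation"))

    # Bait fragment IDs missing from the rmap (should be 0)
    sum(!baitmap$fragID %in% rmap$fragID)

    # Baits whose coordinates disagree with the rmap entry (should also be 0)
    m <- merge(baitmap, rmap, by = "fragID", suffixes = c(".bait", ".rmap"))
    m[, sum(chr.bait != chr.rmap | start.bait != start.rmap | end.bait != end.rmap)]
    ```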

  2. Mikhail Spivakov

    Both links seem to be identical and point to the .baitmap file. Could you please re-post the link to the rmap?

  3. Mikhail Spivakov

    The files look good.

    However, I've now had a closer look at the Chicago output that you've provided in your original post.

    *minFragLen = 150 maxFragLen = 40000 Filtered out 212087 interactions involving other ends < minFragLen or > maxFragLen.

    First, a minFragLen of 150 is too high for DpnII: roughly 30% of fragments have a length below 150. Likewise, a maxFragLen of 40000 is probably way too high. A reasonable setting could be estimated by plotting the distribution of the total number of reads per other end as a function of fragment length (e.g., as boxplots across bins of length). There may be some slight dependence across the board, and it should be captured by the s_i's, but you may observe (as we did with the data we trained on) that very short and very long fragments have clearly shifted distributions compared with the rest. Such fragments are best filtered out - but I would aim to select the parameters such that the vast majority of fragments (~80-90%) are retained. Since you've got a bigger problem below, you may want to just set these parameters to the 10th and 90th percentiles (40 and 930, respectively) and leave fine-tuning until later.
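
    To make that concrete - a sketch of both the percentile calculation and the boxplot, assuming the standard 4-column .rmap and that the .chinput file is a plain tab-separated table with baitID, otherEndID, N, otherEndLen and distSign columns (adjust names/paths if yours differ):

    ```r
    library(data.table)

    # 10th and 90th percentiles of DpnII fragment length, straight from the rmap
    rmap <- fread("input_files/DpnII.sorted.rmap", header = FALSE,
                  col.names = c("chr", "start", "end", "fragID"))
    quantile(rmap$end - rmap$start, probs = c(0.1, 0.9))

    # Total reads per other end as a function of fragment length
    chinput <- fread("input_files/Hi-C_capture.chinput")
    oe <- chinput[, .(totalN = sum(N), len = otherEndLen[1]), by = otherEndID]
    oe[, lenBin := cut(len, breaks = unique(quantile(len, probs = seq(0, 1, 0.1))),
                       include.lowest = TRUE)]
    boxplot(totalN ~ lenBin, data = oe, las = 2,
            xlab = "other-end fragment length (deciles)",
            ylab = "total reads per other end")
    ```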

    *Filtered out 212087 interactions involving other ends < minFragLen or > maxFragLen. minNPerBait = 250 Filtered out 183586 baits with < minNPerBait reads.

    I've checked, and you've only got 189426 baits, so at this step you've lost nearly all of them because they have fewer than minNPerBait reads mapping to them (in this case, 250). In general, we'd expect baits to attract at least that many reads in a good experiment, so that's a bit of a red flag. Nonetheless, I'd consider lowering this parameter considerably (say, two- to threefold), such that a reasonable proportion of baits are retained, and then checking whether the interactions detected for them make sense. If they don't, the best solution would be to add more data - either by simply sequencing more or, if the profiles look suspicious, by repeating the whole experiment.
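
    To pick a sensible threshold, it may also help to look at the distribution of total reads per bait directly in the .chinput (same assumed column names as above):

    ```r
    library(data.table)

    chinput <- fread("input_files/Hi-C_capture.chinput")
    perBait <- chinput[, .(totalN = sum(N)), by = baitID]

    summary(perBait$totalN)

    # How many baits would survive a few candidate minNPerBait thresholds?
    sapply(c(250, 125, 80), function(thr) sum(perBait$totalN >= thr))
    ```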

  4. Rola Dali reporter

    Thank you Mikhail. I will look into the DpnII fragment length distribution and then adjust minFragLen and maxFragLen. I will also look at my experimental coverage and adjust minNPerBait as well. I will let you know how things go when I get my results.
