runChicago.R: The Brownian OE factors look weird

Create issue
Issue #64 new
Adrija Kalvisa created an issue

Hi all,

I am trying to identify interactions using the defaults of bam2chicago.sh and runChicago.R with hicup.bam files as an input.

One of diagnostic plots looks weird - it does not increase as it is supposed to according to the Chicago vignette:

  1. Brownian other end factors: The adjustment made to the mean Brownian read count, estimated in the pools of other ends. (“tlb” refers to the number of trans-chromosomal reads that the other end accumulates in total. “B2B” stands for a “bait-to-bait” interactions).
  • The red bars should generally increase in height from left to right. (This does not seem to be fulfilled)
  • The blue bars should be higher than the red bars on average, and should also increase in height from left to right. (this looks correct)

The output of runChicago.R:

***runChicago.R

Loading the Chicago package and dependencies...

Loading required package: data.table
Registered S3 methods overwritten by 'ggplot2':
method from
[.quosures rlang
c.quosures rlang
print.quosures rlang

Welcome to CHiCAGO - version 1.12.0
If you are new to CHiCAGO, please consider reading the vignette through the command: vignette("Chicago").
NOTE: Default values of tlb.minProxOEPerBin and tlb.minProxB2BPerBin changed as of Version 1.1.5. No action is required unless you specified non-default values, or wish to re-run the pipeline on old chicagoData objects. See news(package="Chicago")
Setting the experiment...

Locating <baitmapfile>.baitmap in /localhome/bric/qlr900/analysis/02_MLL/data/pcHiC/chicago/designDir/...
Found Digest_mm10_HindIII_with_0393571_probes_mm10.baitmap
Locating <rmapfile>.rmap in /localhome/bric/qlr900/analysis/02_MLL/data/pcHiC/chicago/designDir/...
Found Digest_mm10_HindIII.rmap
Locating <nperbinfile>.npb in /localhome/bric/qlr900/analysis/02_MLL/data/pcHiC/chicago/designDir/...
Found Digest_mm10_HindIII.npb
Locating <nbaitsperbinfile>.nbpb in /localhome/bric/qlr900/analysis/02_MLL/data/pcHiC/chicago/designDir/...
Found Digest_mm10_HindIII.nbpb
Locating <proxOEfile>.poe in /localhome/bric/qlr900/analysis/02_MLL/data/pcHiC/chicago/designDir/...
Found Digest_mm10_HindIII.poe
Checking the design files...

Reading chicago/chinput/pcHiC-veh-rep1/pcHiC-veh-rep1.chinput
Processing input...
minFragLen = 150 maxFragLen = 40000
Filtered out 566356 interactions involving other ends < minFragLen or > maxFragLen.
minNPerBait = 250
Filtered out 277 baits with < minNPerBait reads.

Removed interactions with fragments adjacent to baits.
Filtered out 0 baits without proximal non-Bait2bait interactions

Warning: directory /localhome/bric/qlr900/analysis/02_MLL/data/pcHiC/chicago/chicago_interactions/pcHiC-veh-rep1 exists and will be reused.

Starting chicagoPipeline...

*** Running normaliseBaits...

Normalising baits...
Reading NPerBin file...
Computing binwise means...

*** Running normaliseOtherEnds...

Preprocessing input...
Computing trans-counts...
Filtering out 76 other ends with top 0.01% number of trans-interactions
Binning...
Computing total bait counts...
Reading NBaitsPerBin file...
Computing scaling factors...
Computing binwise means...
Computing normalised counts...
Post-processing...

*** Running estimateTechnicalNoise...

Estimating technical noise based on trans-counts...
Binning baits based on observed trans-counts...
Defining interaction pools and gathering the observed numbers of trans-counts per pool...
Computing the total number of possible interactions per pool...
Preparing the data.....
Processing fragment pools....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Plotting...
Post-processing the results...

*** Running estimateDistFun...

*** Running estimateBrownianComponent...

s_i factors found - estimating Brownian component...
Reading ProxOE file...
Sampling the dispersion...
Sampling the dispersion...
Sampling the dispersion...
Sampling the dispersion...
Sampling the dispersion...
Getting consensus dispersion estimate...

*** Running getPvals...

Calculating p-values...
Approximating 108 very small p-values.

*** Running getScores...

Calculating p-value weights...
Calculating scores...

Saving the Chicago object...

Plotting examples...

Exporting peak lists...

My guess that this happens because there are too many fragments whose length does not fulfill the minFragLen and maxFragLen cutoff values:

minFragLen = 150 maxFragLen = 40000
Filtered out 566356 interactions involving other ends < minFragLen or > maxFragLen.

So I would like to modify the settings file in a way that was suggested in issue #17 , setting the minFragLen to 0. However, this differs from the recommended settings for HindIII:

minFragLen: the min fragment length cutoff (no default: recommended 75 for DpnII, 150 for HindIII)

maxFragLen: the max fragment length cutoff (no default: recommended 1200 for DpnII, 40000 for HindIII)

How is the recommended fragment length determined for each restriction enzyme?

Best regards,

Adrija

Edit: The chicago versions I am using:

conda list | grep chicago

bioconductor-chicago 1.12.0 r36_1 bioconda
chicagotools 1.2.0 2 bioconda

Comments (4)

  1. Adrija Kalvisa reporter

    An update: It seems that for some replicates the upward trend is present in at least some baits. It does not get affected by setting minFragLen to 0. Does this trend have to be pronounced? What happens if it is not obvious?

  2. Mikhail Spivakov

    You are filtering out lots of fragments. What restriction enzyme are you using? If it’s a four-cutter, we’ve just optimised non-default settings for DpnII - I suggest you try them out.

  3. Adrija Kalvisa reporter

    Hi Mikhail, thanks, I use HindIII. Setting the minFragLen to 0 reduces the number of filtered baits from 500 000 to 20 000 but does not affect the s_i levels in brownian OE factor plots.

  4. Log in to comment