Chicago works with each two of three replicates, but not with all three together

Issue #48 resolved
Former user created an issue

Hello everybody,

maybe you can help out here. We want to analyze Capture-C data and our 15 h time point works nicely, but when we go for the 0h time point the script produces an error. If we only use replicates 10 and 30 -> works, 30 and 40 -> works, 10 and 40 -> works. 10, 30, 40 -> does not work.

Here comes the output:

[1] "Setting up experiment" [1] "Reading files and merging replicates" Reading .../0h/DG75_10.chinput Processing input... minFragLen = 150 maxFragLen = 40000 Filtered out 95013 interactions involving other ends < minFragLen or > maxFragLen. minNPerBait = 250 Filtered out 0 baits with < minNPerBait reads.

Removed interactions with fragments adjacent to baits. Filtered out 4 baits without proximal non-Bait2bait interactions

Reading .../0h/DG75_40.chinput Processing input... minFragLen = 150 maxFragLen = 40000 Filtered out 81250 interactions involving other ends < minFragLen or > maxFragLen. minNPerBait = 250 Filtered out 0 baits with < minNPerBait reads.

Removed interactions with fragments adjacent to baits. Filtered out 4 baits without proximal non-Bait2bait interactions

Reading .../0h/DG75_30.chinput Processing input... minFragLen = 150 maxFragLen = 40000 Filtered out 112434 interactions involving other ends < minFragLen or > maxFragLen. minNPerBait = 250 Filtered out 0 baits with < minNPerBait reads.

Removed interactions with fragments adjacent to baits. Filtered out 4 baits without proximal non-Bait2bait interactions

Merging samples... Computing merged scores...

Computing sample scaling factors...

Reading NPerBin file... Computing normalisation scores... [1] "Starting CHiCAGO pipeline"

*** Running normaliseBaits...

Normalising baits... Reading NPerBin file... Computing binwise means...

*** Running normaliseOtherEnds...

Preprocessing input... Computing trans-counts... Filtering out 153 other ends with top 0.01% number of trans-interactions Binning... Error in approx(cum, xx, xout = (1:g) * nnm/g, method = "constant", rule = 2, : zero non-NA points Calls: chicagoPipeline -> normaliseOtherEnds -> .addTLB -> cut2 -> approx In addition: Warning message: In min(diff(x.unique)) : no non-missing arguments to min; returning Inf Execution halted

Thanks a lot

Comments (8)

  1. Alexanderino

    Hm session info was missing…here it comes:

    R version 3.6.1 (2019-07-05)

    Platform: x86_64-pc-linux-gnu (64-bit)

    Running under: Ubuntu 18.04.3 LTS

    Matrix products: default

    BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1

    LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

    locale:

    [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8

    [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8

    [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C

    [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

    attached base packages:

    [1] parallel stats4 stats graphics grDevices utils datasets

    [8] methods base

    other attached packages:

    [1] GenomicInteractions_1.16.0 InteractionSet_1.10.0

    [3] SummarizedExperiment_1.12.0 DelayedArray_0.8.0

    [5] BiocParallel_1.16.6 matrixStats_0.54.0

    [7] Biobase_2.42.0 GenomicRanges_1.34.0

    [9] GenomeInfoDb_1.18.2 IRanges_2.16.0

    [11] S4Vectors_0.20.1 BiocGenerics_0.28.0

    [13] Chicago_1.1.8 data.table_1.12.2

    loaded via a namespace (and not attached):

    [1] ProtGenerics_1.14.0 bitops_1.0-6

    [3] bit64_0.9-7 RColorBrewer_1.1-2

    [5] progress_1.2.0 httr_1.4.0

    [7] tools_3.6.1 backports_1.1.4

    [9] R6_2.4.0 rpart_4.1-15

    [11] Hmisc_4.2-0 DBI_1.0.0

    [13] lazyeval_0.2.2 Gviz_1.26.5

    [15] colorspace_1.4-1 nnet_7.3-12

    [17] tidyselect_0.2.5 gridExtra_2.3

    [19] prettyunits_1.0.2 bit_1.1-14

    [21] curl_3.3 compiler_3.6.1

    [23] htmlTable_1.13.1 rtracklayer_1.42.2

    [25] scales_1.0.0 checkmate_1.9.1

    [27] Delaporte_6.3.0 stringr_1.4.0

    [29] digest_0.6.18 Rsamtools_1.34.1

    [31] foreign_0.8-72 XVector_0.22.0

    [33] base64enc_0.1-3 dichromat_2.0-0

    [35] pkgconfig_2.0.2 htmltools_0.3.6

    [37] ensembldb_2.6.8 BSgenome_1.50.0

    [39] htmlwidgets_1.3 rlang_0.3.4

    [41] rstudioapi_0.10 RSQLite_2.1.1

    [43] acepack_1.4.1 dplyr_0.8.0.1

    [45] VariantAnnotation_1.28.13 RCurl_1.95-4.12

    [47] magrittr_1.5 GenomeInfoDbData_1.2.0

    [49] Formula_1.2-3 Matrix_1.2-17

    [51] Rcpp_1.0.1 munsell_0.5.0

    [53] stringi_1.4.3 MASS_7.3-51.4

    [55] zlibbioc_1.28.0 plyr_1.8.4

    [57] grid_3.6.1 blob_1.1.1

    [59] crayon_1.3.4 lattice_0.20-38

    [61] Biostrings_2.50.2 splines_3.6.1

    [63] GenomicFeatures_1.34.7 hms_0.4.2

    [65] knitr_1.22 pillar_1.3.1

    [67] igraph_1.2.4 biomaRt_2.38.0

    [69] XML_3.98-1.19 glue_1.3.1

    [71] biovizBase_1.30.1 latticeExtra_0.6-28

    [73] gtable_0.3.0 purrr_0.3.2

    [75] assertthat_0.2.1 ggplot2_3.1.1

    [77] xfun_0.6 AnnotationFilter_1.6.0

    [79] survival_2.44-1.1 tibble_2.1.1

    [81] GenomicAlignments_1.18.1 AnnotationDbi_1.44.0

    [83] memoise_1.1.0 cluster_2.1.0

  2. Mikhail Spivakov

    Thanks for the report. There seem to be some sparsity issues in your data, which seem to not be adequately accounted for by the existing Chicago code. The error is triggered by the cut2 command at either line 1103 or line 1124. It’s trying to split other ends into bins by the number of trans-chromosomal interactions (with baits) they are engaged into, but potentially gets an empty data table as input or something like that. Ideally, I’d like to ask you to put a breakpoint to these two lines, check what data cut2 receives as input there and let me know. If this kind of debugging is beyond your R capabilities, we’ll need you to share a reproducible sample of your data with us for further investigation.

  3. Mikhail Spivakov

    (PS. note that the .addTLB function within which this code rests is called twice, once for binning baits and then for binning other ends based on the number of trans-interactions they engage in, and the error in your case is triggered in the second instance).

  4. Alexanderino

    Thanks for the great support Mikhail!

    So, if i got you correctly, here is the output for

    cutsB2B = cut2(transLenB2B[abs(distSign) <= s$maxLBrownEst]$transLength,
    m = minProxB2BPerBin, onlycuts = TRUE)

    transLenB2B

    otherEndID transLength isBait2bait distSign
    1: 7127626 4 TRUE 231180718

    minProxB2BPerBin

    [1] 2500

    s$maxLBrownEst

    [1] 1500000

    and deeper to the cut2 function

    cuts <- approx(cum, xx, xout = (1:g) * nnm/g, method = "constant",
    rule = 2, f = 1)$y

    cum

    numeric(0)

    xx

    numeric(0)

    g

    1

    nnm

    0

  5. Mikhail Spivakov

    Ok, so this is in line with what I expected - a sparsity issue. For some reason, you seem to only have a single other end of bait-to-bait interactions presented to cut2, and its minimum interaction distance with other baits exceeds the proximal distance range. As a result, cut2 gets an empty vector to split and triggers an error.

    This is the first time this happens, so highly unusual. You probably need to investigate why no other bait 2 bait interactions are visible. Is this because your capture design is such that all baits are very far apart from each other? And/or is this because the replicates where the analysis fails haven't got that much signal anyway? Or are these interactions getting filtered out by Chicago at some stage as outliers?

    There are a couple of potential ways to just make Chicago run on your data - albeit none are guaranteed. One is to just pool the bam files for all your replicates and filter out duplicates. Hopefully this will make the data less sparse and resolve this problem.

    Alternatively, you can try disabling the special treatment of bait2bait interactions (cd@settings$adjBait2bait=FALSE). In this case however you'll need to regenerate the aux files with the python script using the same setting. And also in this case, I'd treat any detected significant bait2bait interactions with caution.

  6. Alexanderino

    Thanks a lot for your support Mikhail! (came also via mail)

    What i finally did:

    Background:

    First of all we used our baits on two cells lines. One cell line includes the DNA for a virus and so the baits so. The other cell line does not have viral DNA, still the baits are there.
    The file causing the problem stems from the cell line without viral DNA. So this may well have been the reason to cause problem. The bait number 7127626, which i found in the transLenB2B vector, is of viral origin.

    So i got rid of the baits in the *.baitmap file, recalculated all the files and let Chicago run again. I got the same error, but this time the transLenB2B was empty.

    → Didn’t solve the problem.

    Some years ago, i wrote my own script to analyze these kind of data. From these data i know that there are several genes, which also caused problems in my script. There was nearly no or no signal for most of these genes. I also excluded these from the *.baitmap and recalculated everything again and now it works.

    So the problem seems to origin in baits, which cause only little signal.

    I did not try pool the bam file. I started to try the cd@settings$adjBait2bait=FALSE setting (which needs to be set when you run: makeDesignFiles.py removeb2b=FALSE),

    but didn’t go on with this, because getting rid of the problematic baits worked.

    Maybe it helps anyone some day

  7. Mikhail Spivakov

    Great, thanks for the report! I suspect Chicago could be instructed to remove these baits via the minNPerBait setting, but either way - it’s worked and I’m happy to hear that.

  8. Log in to comment