Calculating p-values [update in Delaporte package breaks compatibility]

Issue #6 resolved
Jinrui Xu created an issue

Hi Mikhail,

I have been trying to run Chicago through the mESC data (only the replicate1, t2) used in your paper with default chicago parameters. But I got the following error right after sampling the dispersion.

Calculating p-values... Error in pdelap(N - 1L, alpha, beta = Bmean/alpha, lambda = Tmean, lower.tail = FALSE, : All parameters must be strictly greater than 0.

My R code is pasted here

library(Chicago) dataPath <- "/home/fas/gerstein/jx98/projects/enhancer_ntw/hicup/rst/mESC_reps" testDesignDir <- file.path(dataPath, "") dir(testDesignDir)

testDataPath <- file.path(dataPath, "") dir(testDataPath)

files <- c( file.path(testDataPath, "mESC_rep1_t2.chinput") )

cd <- setExperiment(designDir = testDesignDir) cd <- readAndMerge(files=files, cd=cd) cd <- chicagoPipeline(cd)

*.chinput

samplename=mESC_rep1_t2 bamname=mESC_rep1_t2_1_2.hicup baitmapfile=/home/fas/gerstein/jx98/projects/enhancer_ntw/hicup/rst/mESC_rep1_t2/mESC.baitmap_4col.txt digestfile=/home/fas/gerstein/jx98/projects/enhancer_ntw/hicup/rst/mESC_rep1_t2/m

ESC.rmap baitID otherEndID N otherEndLen distSign 100012 18554 1 1767 NA 100012 18839 1 24535 NA 100012 24086 1 1030 NA 100012 34364 1 1340 NA 100012 38013 1 3705 NA 100012 41026 1 17237 NA 100012 41897 1 10740 NA 100012 56587 1 2730 NA 100012 71339 1 1054 NA 100012 78995 1 2693 -63904272 100012 79081 1 3861 -63622542 100012 79132 1 1155 -63487043 100012 80276 1 1199 -60098213 100012 81908 1 1456 -55133484 100012 83099 1 967 -51699229

Is the problem likely due to data (bam file) or other reasons (Hi-cup and other input files)? Can it be solved by adding a small value to parameters in your future release?

How to combine hicup bam files of different technical and biological replicates for Chicago analysis?

Thanks!

Comments (22)

  1. Mikhail Spivakov

    Thanks for your message. Would you mind clarifying if you've used the .chinput file from mESC_chinput_final.tgz at osf.io/nemc6/ or you've started from the bam file (or even the raw reads)?

  2. Jinrui Xu reporter

    Hi Mikhail, I started from raw reads, hicupped them to bam, then used bam2chicago to generate the input file. Jinrui

  3. Mikhail Spivakov

    I suggest you first replicate the analysis starting from the .chinput file we shared (as above) to see if you run into the same problem.

  4. Mikhail Spivakov

    You can also - to start with - simply compare the .chinput file we provide with that you've generated to see if you spot any obvious problems.

  5. Jinrui Xu reporter

    Thank you for the the suggestions. I am trying them out.

    Your input file is much larger than mine because I guess you combine all technical replicates for each of the two mESC replicates. Could you tell me how did you combine the technical repeats? I think the p-value calculation problem is due to small number of reads. Thanks.

    Jinrui

  6. Jinrui Xu reporter

    I take your input files, but have the same issue.

    *** Running getPvals...

    Calculating p-values... Error in pdelap(N - 1L, alpha, beta = Bmean/alpha, lambda = Tmean, lower.tail = FALSE, : All parameters must be strictly greater than 0.

    I simply copy all your files into DesignFolder4Chicago, then run the same script. Thanks.

  7. Jinrui Xu reporter

    Sure, thanks.

    R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

    locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
    [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
    [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
    [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
    [9] LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages: [1] stats graphics grDevices utils datasets methods base

    other attached packages: [1] Chicago_1.1.1 data.table_1.9.6

    loaded via a namespace (and not attached): [1] Rcpp_0.12.5 matrixStats_0.50.2 lattice_0.20-33
    [4] Delaporte_2.3.0 Hmisc_3.17-4 MASS_7.3-45
    [7] chron_2.3-47 grid_3.2.2 plyr_1.8.4
    [10] gtable_0.2.0 acepack_1.3-3.3 scales_0.4.0
    [13] ggplot2_2.1.0 latticeExtra_0.6-28 rpart_4.1-10
    [16] Formula_1.2-1 splines_3.2.2 RColorBrewer_1.1-2 [19] foreign_0.8-66 munsell_0.4.3 survival_2.38-3
    [22] colorspace_1.2-6 cluster_2.0.3 nnet_7.3-12
    [25] gridExtra_2.2.1

  8. Mikhail Spivakov

    Thanks. I can see that the Delaporte package has updated compared with our setup (we use v2.2.3) as you can see here: https://static-content.springer.com/esm/art%3A10.1186%2Fs13059-016-0992-2/MediaObjects/13059_2016_992_MOESM3_ESM.pdf. One thing that may have changed is the treatment of cases where the parameters are zero (we recently discussed the treatment of this situation with the package author).

    Can you please catch the values of alpha, Bmean and Tmean that triggered the error in pdelap? If they are indeed zero (which they are allowed to be) we'll get back to the author and ask to fix the problem asap.

    Many thanks!

  9. Mikhail Spivakov

    PS. I've checked and this behaviour has indeed been updated in the new version of Delaporte, and this breaks our code.
    We'll work on a solution asap.

    Thanks for the heads-up!

  10. Jinrui Xu reporter

    I do not know how to check the values. Could you give me some tips? here are the objects in the R session.

    ls() [1] "cd" "dataPath" "files" "testDataPath" [5] "testDesignDir"

  11. Jinrui Xu reporter

    It is not straightforward to me how to remove dup reads of Hi-c (pair-end). Could you like me know what software and options used for this task? Thanks. Jinrui

  12. Log in to comment