Error in get(binCol) : invalid first argument

Issue #2 resolved
Noboru Sakabe created an issue
I am trying to run it on some data I have and I'm running into an error in the R code.
I was able to run successfully the Python scripts. I post below the output of the R code and my session info. I'd be glad to share my input files to help tracking the error.
I thank you in advance for your help!

preprocessing

python ~/programs/hic/chicago/chicagoTools/makeNBaitsPerBinFile.py python ~/programs/hic/chicago/chicagoTools/makeNPerBinFile.py python ~/programs/hic/chicago/chicagoTools/makeProxOEFile.py

~/programs/hic/chicago/chicagoTools/bam2chicago.sh /data/hi-c/1.hicup.bam probes.baitmap MboI.rmap 1

R code

library(Chicago)

writeLines(capture.output(sessionInfo()), "session.txt")

cd <- setExperiment(designDir = "./", settingsFile = "~/programs/hic/chicago/PCHiCdata/inst/extdata/sGM12878Settings/sGM12878.settingsFile")

files <- c( file.path("1/1.chinput") )

cd <- readAndMerge(files=files, cd=cd)

cd <- chicagoPipeline(cd)

R output

Loading required package: data.table Welcome to CHiCAGO - version 0.2.9 If you are new to CHiCAGO, please consider reading the vignette through the command: vignette("Chicago"). Locating <baitmapfile>.baitmap in ./... Found probes.baitmap Locating <rmapfile>.rmap in ./... Found MboI.rmap Locating <nperbinfile>.npb in ./... Found MboI.npb Locating <nbaitsperbinfile>.nbpb in ./... Found MboI.nbpb Locating <proxOEfile>.poe in ./... Found MboI.poe Reading 1/1.chinput Processing input... minFragLen = 150 maxFragLen = 40000 Filtered out 22618 interactions involving other ends < minFragLen or > maxFragLen. minNPerBait = 250 Filtered out 68142 baits with < minNPerBait reads.

Removed interactions with fragments adjacent to baits. Filtered out 0 baits without proximal non-Bait2bait interactions

Merging samples... Computing merged scores...

Computing sample scaling factors...

Reading NPerBin file... Computing normalisation scores... Warning messages: 1: In fread(file) : Starting data input on line 2 and discarding line 1 because it has too few or too many items to be column names or data: ## samplename=1 bamname=ind-10-R1_2.hicup baitmapfile=probes.baitmap_4col.txt digestfile=MboI.rmap 2: In [.data.table(x, , :=(isAllB2BProx, { : RHS 1 is length 1 (greater than the size (0) of group 1). The last 1 element(s) will be discarded. 3: In fread(s$nperbinfile) : Starting data input on line 2 and discarding line 1 because it has too few or too many items to be column names or data: # minFragLen=150 maxFragLen=40000 maxLBrownEst=1500000 binsize=20000 removeb2b=True removeAdjacent=True rmapfile=./MboI.rmap baitmapfile=./probes.baitmap

*** Running normaliseBaits...

Normalising baits... Reading NPerBin file... Error in get(binCol) : invalid first argument Calls: chicagoPipeline ... normaliseBaits -> .normaliseFragmentSets -> [ -> [.data.table -> get In addition: Warning messages: 1: In fread(s$nperbinfile) : Starting data input on line 2 and discarding line 1 because it has too few or too many items to be column names or data: # minFragLen=150 maxFragLen=40000 maxLBrownEst=1500000 binsize=20000 removeb2b=True removeAdjacent=True rmapfile=./MboI.rmap baitmapfile=./probes.baitmap 2: In [.data.table(x, , :=(binCol, do.call(paste0, list("bin", : RHS 1 is length 1 (greater than the size (0) of group 1). The last 1 element(s) will be discarded. Execution halted

Session info

R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.3 LTS

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets base

other attached packages: [1] Chicago_0.2.9 data.table_1.9.6

loaded via a namespace (and not attached): [1] Rcpp_0.12.1 Formula_1.2-1 cluster_2.0.3
[4] magrittr_1.5 splines_3.2.2 MASS_7.3-44
[7] munsell_0.4.2 colorspace_1.2-6 lattice_0.20-33
[10] stringr_1.0.0 plyr_1.8.3 tools_3.2.2
[13] nnet_7.3-11 grid_3.2.2 gtable_0.1.2
[16] Delaporte_2.2-3 latticeExtra_0.6-26 matrixStats_0.15.0 [19] survival_2.38-3 digest_0.6.8 gridExtra_2.0.0
[22] RColorBrewer_1.1-2 reshape2_1.4.1 ggplot2_1.0.1
[25] acepack_1.3-3.3 rpart_4.1-10 stringi_1.0-1
[28] methods_3.2.2 scales_0.3.0 Hmisc_3.17-0
[31] foreign_0.8-66 chron_2.3-47 proto_0.3-10
session.txt (END)

Comments (7)

  1. Mikhail Spivakov

    Dear Noboru

    Thanks for reporting this issue - we will certainly look into it.

    Thanks for offering to share the input files. Can you please let me know your email address, so we can send you the credentials of the FTP to upload them?

    Best wishes, Mikhail

  2. Jonathan Cairns

    Hi Nobaru,

    I had a look at your example:

    1) The code uses the CHiCAGO vignette's settings file - this is not appropriate for general analysis so I would suggest changing the setExperiment() line to:

    cd <- setExperiment(designDir = "./")
    

    2) readSample() features a filtering step, which requires each bait to have at least 250 reads. Unfortunately, none of the baits in your data satisfy this requirement - most of them have under 100 reads. As a result, they are all filtered out, leading to the error. The error is not very informative, so I will fix this in the next version of CHiCAGO.

    There are a couple of ways to address the issue, arranged from most preferred to least preferred:

    • If you have more technical replicates of sample 1.bam, then pool them together (ideally, prior to running them through HiCUP).

    • If you don't have any technical replicates of 1.bam, but you do have biological replicates, then you could try pooling the biological replicates together.

    • As an absolute last resort, you can relax the filtering requirement by changing the minNPerBait setting in the setExperiment() function. However, I would strongly recommend against doing this for human samples - in our experience, it is unlikely that you will get any significant interactions from such an analysis, and any that you do get will be extremely unreliable.

    Hope this is helpful. Thanks very much for providing feedback and helping us to improve CHiCAGO - let us know if you encounter any further problems!

    Jonathan

  3. Noboru Sakabe reporter

    Thanks for your help!

    1) I used the additional settings because my sample was also small.

    2) Is there a way to obtain those stats (< 100 reads per bait)? Maybe you could generate a report about certain parameters that you consider important in a good Hi-C dataset?

    I will pool other samples and see how it goes.

    Thank you!

  4. Jonathan Cairns

    1) Ah - I see; apologies for the lack of clarity in that section of the vignette, I will reword it.

    2) I was toying around with this and I've realized there's an issue with your baitmap file, which may be why the number of reads per bait is low. For a given bait, the fourth column of its entry in the .baitmap file should match the fourth column of its entry in the .rmap file. In other words, if example.baitmap contains:

    #!
    chr18 510000 511000 757 GeneName
    

    then example.rmap should contain a line

    #!
    chr18 510000 511000 757
    

    Currently, your baitIDs are numbered by the order they occur in .baitmap. If you change these IDs to match those in .rmap then, with any luck, the number of reads per bait should rise dramatically.

    Ideally, Chicago should check for mismatched .rmap and .baitmap; I will add this feature.

    A quick and dirty way to check the number of reads per bait, pre Chicago filtering:

    raw <- fread("1/1.chinput")
    x <- raw[,sum(N), by=baitID]
    x
    

    (This is not quite correct as is miscounts bait-to-bait interactions, but it should at least tell you if the number of reads is of the correct order of magnitude.)

    Thanks for your suggestion; we will discuss including such a feature in future CHiCAGO versions.

    Let me know if changing the .baitmap file solves the problem!

    Thanks,

    Jonathan

  5. Noboru Sakabe reporter

    Hi, Jonathan and Mikhail, thanks for your help with Chicago! Indeed, I didn't match .rmap and .baitmap ID's and that was the cause of the errors I was getting. I fixed this problem and Chicago ran to completion.

    In addition to checking if .baitmap is a subset of .rmap, I suggest other error checking:

    1. Check if the .baitmap file contains 5 columns: preprocessing works fine in a 4 column file and it takes hours to finish, but the R script will crash with this error "There are fewer columns in the baitmapfile than expected. Check that this file lists both the IDs and names for each baited fragment, and that the corresponding columns are specified in baitmapFragIDcol and baitmapGeneIDcol, respectively."

    2. Check if chromosome names match in .baitmap, .rmap and .bam files. At first I used your .rmap file and your chromosome names are 1, 2, 3, etc while mine are chr1, chr2, chr3. The error I got wasn't easy to trace back to this problem.

    Other suggestions to make Chicago easier to use:

    1. Group all the preprocessing scripts into one since they all use the same input files.

    2. Group the R script with bam2chicago.sh.

    Thanks for distributing Chicago!

  6. Mikhail Spivakov

    Happy to hear this has fixed it - and thanks for suggestions. We'll certainly look into them!

    Sent from Samsung Mobile

  7. Log in to comment