./bam2chicago.sh errors

Issue #7 resolved
Olivia Leavy created an issue

Hi,

I have been trying to use the ./bam2chicago.sh shell script to create .chinput files.

When running the following:

./bam2chicago.sh C51_0002_R1_2_hicup_captured.bam baitendBre80locus1nt.baitmap chr22RestrictedMapnt.rmap Bre80locus1 [nodelete]

I get the error:

    Checking rmap and baitmap files...
    Error! Baitmap entry at line 1 not found in rmap. (Check that <baitmapfile> and <rmapfile> are given as the 2nd and 3rd arguments respectively, and not the other way round).
    Error! Baitmap entry at line 2 not found in rmap. (Check that <baitmapfile> and <rmapfile> are given as the 2nd and 3rd arguments respectively, and not the other way round).
    Error! Baitmap entry at line 3 not found in rmap. (Check that <baitmapfile> and <rmapfile> are given as the 2nd and 3rd arguments respectively, and not the other way round).
    Error! Baitmap entry at line 4 not found in rmap. (Check that <baitmapfile> and <rmapfile> are given as the 2nd and 3rd arguments respectively, and not the other way round).

Etc for each line in the baitmap file.

I have checked that the baitmap lines are present in the rmap file and that the matching lines are identical. I have also made sure that the files are tab-separated. I am not sure why I am getting this error.

Comments (18)

  1. Mikhail Spivakov

    Thanks for your report. We've never encountered this problem, so could you perhaps share the .baitmap and .rmap files with us so we could check?

  2. Mikhail Spivakov

    The problem is in this awk script within bam2chicago.sh. You can execute it as a standalone command in shell:

    awk 'BEGIN{                                
       print "Checking rmap and baitmap files..." 
       ok=1
    }{
     if(FNR==NR){
       if(!id[$4]){
         id[$4]=1;
       }else{
         print "Error! Duplicated fragment IDs found in rmap file at line "FNR"."; 
         ok=0;
       }
       if(NF != 4){
         print "Error! Wrong number of columns in rmap file at line "FNR", should be 4.";
       } 
       rmap[$1"_"$2"_"$3"_"$4] = 1;
     }else{
       if(!bid[$4]){
         bid[$4]=1;
       }else{
         print "Error! Duplicated fragment IDs found in baitmap file at line "FNR".";
         ok=0;
       }
       if(NF < 4){
         print "Error! Wrong number of columns in baitmap file at line "FNR", should be at least 4.";
       } 
       if(!rmap[$1"_"$2"_"$3"_"$4]){
         print "Error! Baitmap entry at line "FNR" not found in rmap. (Check that <baitmapfile> and <rmapfile> are given as the 2nd and 3rd arguments respectively, and not the other way round).";  
         ok=0;
       }
     }
    }END{
     if (ok){print "Rmap and baitmap files checked successfully";}
     else{print "Checking completed with errors"; exit 1;}
    }' chr22RestrictedMapnt.rmap baitendBre80locus1nt.baitmap
    

    For some reason it doesn't recognise fields from these files as equal. We've never had this problem before. As an interim solution, you may just comment this script out in bam2chicago.sh. However, I notice that your numeric IDs are huge, and we've never dealt with them defined this way. What if you just number them from 1 to N in the rmap file and rerun the script?
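
    If you try the renumbering suggested above, note that the check matches baitmap and rmap entries on chromosome, start, end and ID together, so the new IDs have to be carried over to the baitmap as well. A minimal sketch, assuming a tab-separated 4-column rmap and a baitmap with at least 4 columns; the demo file names and contents are hypothetical:

    ```shell
    # Hypothetical demo files; replace with your own .rmap and .baitmap.
    tmp=$(mktemp -d)
    printf 'chr22\t1\t500\t900001\nchr22\t501\t1200\t900002\nchr22\t1201\t2000\t900003\n' > "$tmp/in.rmap"
    printf 'chr22\t501\t1200\t900002\tGENE_A\n' > "$tmp/in.baitmap"

    # Renumber the rmap fragment IDs by line number (1..N).
    awk 'BEGIN{FS=OFS="\t"} {$4=NR; print}' "$tmp/in.rmap" > "$tmp/out.rmap"

    # Give each bait the new ID of the rmap fragment with the same coordinates.
    awk 'BEGIN{FS=OFS="\t"}
         FNR==NR {newid[$1"_"$2"_"$3]=$4; next}
         {
           key=$1"_"$2"_"$3
           if (!(key in newid)) {
             print "No rmap fragment for baitmap line " FNR > "/dev/stderr"
             bad=1
             next
           }
           $4=newid[key]
           print
         }
         END{exit bad+0}' "$tmp/out.rmap" "$tmp/in.baitmap" > "$tmp/out.baitmap"

    cat "$tmp/out.baitmap"
    ```

    Matching on coordinates rather than on the old IDs means a bait whose coordinates are not in the rmap is reported instead of being silently renumbered.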

  3. Olivia Leavy reporter

    Thank you for your help! OK, so I will remove this awk script from bam2chicago.sh and run the script.

    I have previously tried numbering the numerical IDs 1 to N as I thought that was the problem, but I still got the same error.

  4. Marco Pinheiro

    Good Afternoon,

    I have been trying to use the bam2chicago.sh script to generate .chinput files. I previously had the same problem with the awk script and commented it out as suggested, which helped. However, I am now running into different problems with my replicates.

    Running this on my first replicate:

         ./chicagoTools/bam2chicago.sh Rep1_read1_2.hicup.bam ../designDir/CaptureBaits.baitmap ../designDir/DpnIIDigest.rmap Rep1
    

    Produces the following error:

                 Error: Type checker found wrong number of fields while tokenizing data line.
    

    Running this on my second replicate:

                 ./chicagoTools/bam2chicago.sh Rep2_read1_2.hicup.bam ../designDir/CaptureBaits.baitmap ../designDir/DpnIIDigest.rmap Rep2
    

    Produces the following error:

                 Error: line number 100 of file Rep2/Rep2_read1_2.hicup_mappedToBaits_baitOnRight.bedpe has 15 fields, but 14 were expected.
    

    Both errors occur after the step that intersects with bait fragments again to produce bait-to-bait interactions. I have checked the number of fields in the .bam files and they have 14 fields. I also converted the data to .bedpe for both replicates using bedtools and I get 14 fields for each. My .rmap file is numbered from 1 to N and my files are all tab-separated. I am unsure what might be causing this problem and would be grateful for any help.

    Thank you, Marco

  5. Mikhail Spivakov

    Dear Marco

    Sorry to hear you're having trouble with this. Perhaps the easiest way to get to the bottom of this would be for you to share a small subset of your data that reproduces the problem. We can then look at it on our end.

    Best wishes, Mikhail

  6. Marco Pinheiro

    Good Afternoon,

    Thank you for your reply. As requested please find attached file with a subset from both replicates that caused the mentioned errors. The .rmap and .baitmap files are also included.

    Thank you again for your time and assistance in this.

    Kind regards, Marco

  7. Mikhail Spivakov

    Hi Marco

    Thanks for sending me the files. Mysteriously, on my end I'm also getting the "awk script" issue you've referred to (certainly needs investigating), but with that bit commented out the script completes ok:

    Processing sample test...
    Using bam file Rep1_read1_2.hicup.1000lines.bam
    Using baitmap file CaptureBaits.baitmap
    Using digest map (rmap) file DpnIIDigest.rmap
    Baitmap file contains >4 columns. Checking if CaptureBaits.baitmap_4col.txt exists...
    It doesn't. So trimming the extra columns and saving the result in CaptureBaits.baitmap_4col.txt...
    Intersecting with bait fragments (using min overhang of 0.6)...
    *****WARNING: Query K00311:19:HFJHCBBXX:3:1101:27407:3196 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping. 
    Flipping all reads that overlap with the bait on to the right-hand side...
    Intersecting with bait fragments again to produce a list of bait-to-bait interactions that can be used separately; note they will also be retained in the main output...
    Intersecting with restriction fragments (using min overhang of 0.6)...
    Removing reads that failed the min overhang filter...
    Filtered out 0.000000 reads with <60% overlap with a single digestion fragment
    Adding frag length and signed distance from bait; removing self-ligation fragments (if any; not expected with HiCUP input)...
    Pooling read pairs...
    Done! The file to be used for Chicago R package input is test/test.chinput
    

    The errors you're referring to are generated by bedtools, which is called from within the shell script. Perhaps check what exactly you see at those offending lines? Also, FYI, we have bedtools v2.25.0 installed. If yours is much earlier, would it make sense to update?

  8. Marco Pinheiro

    Hi Mikhail,

    Thank you for your quick reply. That is very odd. I tested that file before sending it to you and got the following:

        ../chicagoTools/bam2chicago.sh Rep1_read1_2.hicup.1000lines.bam ../designDir/CaptureBaits.baitmap ../designDir/DpnIIDigest.rmap Rep1-1000lines
        Processing sample Rep1-1000lines...
        Using bam file Rep1_read1_2.hicup.1000lines.bam
        Using baitmap file ../designDir/CaptureBaits.baitmap
        Using digest map (rmap) file ../designDir/DpnIIDigest.rmap
        Baitmap file contains >4 columns. Checking if ../designDir/CaptureBaits.baitmap_4col.txt exists...
        Found ../designDir/CaptureBaits.baitmap_4col.txt
        Intersecting with bait fragments (using min overhang of 0.6)...
        *****WARNING: Query K00311:19:HFJHCBBXX:3:1101:27407:3196 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping.
        Flipping all reads that overlap with the bait on to the right-hand side...
        Intersecting with bait fragments again to produce a list of bait-to-bait interactions that can be used separately; note they will also be retained in the main output...
                Error: Type checker found wrong number of fields while tokenizing data line.
    

    Below is also the error message I got from testing the file from my second replicate that I had sent:

        samtools view -h Rep2_read1_2.hicup.bam | head -n 100000 | samtools view -bs - > Rep2_read1_2.hicup.100000lines.bam
        l-uosxx0eaf8jc:from_HiCUP mfbx9mp5$ ../chicagoTools/bam2chicago.sh Rep2_read1_2.hicup.100000lines.bam ../designDir/CaptureC.baitmap ../designDir/CaptureC.rmap Rep2-100000lines
        Processing sample Rep2-100000lines...
        Using bam file Rep2_read1_2.hicup.100000lines.bam
        Using baitmap file ../designDir/CaptureC.baitmap
        Using digest map (rmap) file ../designDir/CaptureC.rmap
        Baitmap file contains >4 columns. Checking if ../designDir/CaptureC.baitmap_4col.txt exists...
        Found ../designDir/CaptureC.baitmap_4col.txt
        Intersecting with bait fragments (using min overhang of 0.6)...
        *****WARNING: Query K00311:19:HFJHCBBXX:3:1103:26778:7451 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping.
        Flipping all reads that overlap with the bait on to the right-hand side...
        Intersecting with bait fragments again to produce a list of bait-to-bait interactions that can be used separately; note they will also be retained in the main output...
        Error: line number 100 of file Rep2-100000lines/Rep2_read1_2.hicup.100000lines_mappedToBaits_baitOnRight.bedpe has 15 fields, but 14 were expected.
    

    I checked my bedtools installation and I am using version 2.26.0.

  9. Mikhail Spivakov

    A long shot: what if you delete ../designDir/CaptureC.baitmap_4col.txt and try again? Failing that, I'd suggest inspecting what the 15th field in line 100 of the bedpe file contains. Is it just special characters, or is there something meaningful written there that shouldn't be?
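
    One way to carry out the inspection suggested above is to tally field counts with awk and dump the offending line with od. This is only a sketch: the demo file below is a hypothetical stand-in for the real *_mappedToBaits_baitOnRight.bedpe file, and the line number would need to be adjusted to the one in your error message.

    ```shell
    # Hypothetical demo file: line 2 ends in a stray tab, which creates an empty 15th field.
    tmp=$(mktemp -d)
    bedpe="$tmp/demo.bedpe"
    printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\tk\tl\tm\tn\n' > "$bedpe"
    printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\tk\tl\tm\tn\t\n' >> "$bedpe"

    # Tally how many lines have each field count.
    awk -F'\t' '{count[NF]++} END{for (n in count) print n " fields: " count[n] " line(s)"}' "$bedpe"

    # List the lines that do not have 14 fields, showing the last field in brackets.
    bad_lines=$(awk -F'\t' 'NF != 14 {print FNR ": " NF " fields, last field = [" $NF "]"}' "$bedpe")
    echo "$bad_lines"

    # Show invisible characters on one line (line 2 here; the error above pointed to line 100).
    sed -n '2p' "$bedpe" | od -c | head
    ```

    Empty brackets in the second command indicate that the extra field contains nothing at all, while od -c makes tabs (\t) and carriage returns (\r) visible.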

  10. Mikhail Spivakov

    Just an update to both Olivia and Marco: I've got to the bottom of the first problem (with the input-checking awk script). It has to do with the fact that one of your input files, but not the other, had some special characters at the end of each line. It'll take a little time for the updated shell script to make it into the release, but you can already patch it yourselves:

        print "Checking rmap and baitmap files..." 
        ok=1
     }{
    ### ADD THIS LINE HERE ###
     gsub(/[[:space:]]$/, "", $4);
    #########################
      if(FNR==NR){
        if(!id[$4]){
          id[$4]=1;
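
    As an alternative to patching the script, the input files themselves can be checked and cleaned. The sketch below assumes the special characters are carriage returns from Windows-style line endings, which this thread does not confirm; the demo files are hypothetical.

    ```shell
    # Hypothetical demo files: the rmap has \r\n line endings, the baitmap does not.
    tmp=$(mktemp -d)
    printf 'chr22\t1\t500\t1\r\nchr22\t501\t1200\t2\r\n' > "$tmp/demo.rmap"
    printf 'chr22\t501\t1200\t2\n' > "$tmp/demo.baitmap"

    # Count the lines that end in a carriage return in each file.
    cr=$(printf '\r')
    for f in "$tmp/demo.rmap" "$tmp/demo.baitmap"; do
      n=$(grep -c "${cr}\$" "$f" || true)
      echo "$(basename "$f"): $n line(s) ending in CR"
    done

    # Write a cleaned copy with trailing whitespace (including CR) removed from every line.
    sed 's/[[:space:]]*$//' "$tmp/demo.rmap" > "$tmp/clean.rmap"
    ```

    If only one of the two files reports a non-zero count, the fourth column of that file carries an extra invisible character, which would explain why otherwise identical entries fail to match.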
    
  11. Marco Pinheiro

    Hi Mikhail,

    Thank you for the patch, I added that and could get through the awk script without a problem.

    I have been working on the extra field problem and noticed that the extra fields were empty and were caused by a tab on some lines in the baitOnRight.bedpe files. I ended up modifying line 94 of the bam2chicago.sh script, removing the $15 from the print operation. Running the script with this modification worked, and both the intermediate files and the .chinput files look correct. Could you please advise whether this modification might have any implications that you are aware of, or whether there is a patch I could add instead?

    Kind regards, Marco
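
    For comparison, an empty extra field of this kind can also be removed from an intermediate file without editing the script. This is a sketch only, assuming the stray tab sits at the end of the line; the demo file is hypothetical and this is not the change made to line 94.

    ```shell
    # Hypothetical demo file: line 2 carries a stray trailing tab (an empty 15th field).
    tmp=$(mktemp -d)
    printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\tk\tl\tm\tn\n' > "$tmp/in.bedpe"
    printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\tk\tl\tm\tn\t\n' >> "$tmp/in.bedpe"

    # Remove exactly one trailing tab per line, if present.
    tab=$(printf '\t')
    sed "s/${tab}\$//" "$tmp/in.bedpe" > "$tmp/out.bedpe"

    # Confirm that every line now has the 14 fields that bedtools expected.
    if awk -F'\t' 'NF != 14 {bad = 1} END {exit bad + 0}' "$tmp/out.bedpe"; then
      status="all lines have 14 fields"
    else
      status="some lines still do not have 14 fields"
    fi
    echo "$status"
    ```

    Removing only a single trailing tab keeps the change conservative: lines that are already well-formed pass through unchanged.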

  12. Olivia Leavy reporter

    Hi Mikhail,

    Thank you for the patch! I shall try and run it now. Thank you for all your help!

    Kind regards, Olivia

  13. Mikhail Spivakov

    I am resolving this, although I'm not sure if Marco's problem is completely solved. Marco, please give us a shout if the issues still remain.

  14. Noboru Sakabe

    I didn't know if I should open another issue or just comment here, so I apologize if I should have opened another issue. I was getting this error "Error: Type checker found wrong number of fields while tokenizing data line." even when running the latest/modified bam2chicago.sh script.

    I got this error when running "Intersecting with bait fragments again to produce a list of bait-to-bait interactions that can be used separately; note they will also be retained in the main output..." It seems that one of the temporary files is not a canonical bed +3 or +9 etc file.

    bedtools >2.25 implements a .bed format checker. When running bedtools 2.25, I got no errors, but 2.26 gave me that error. My solution was to edit bam2chicago.sh and force 2.25.

    Chicago developers may have to force bedtools 2.25 or rewrite the script to get around the format checker.
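
    A guard of the following kind could warn about the installed bedtools version before a run. It is a sketch, assuming that `bedtools --version` prints a string of the form "bedtools v2.26.0"; the function name is hypothetical and nothing like it is claimed to exist in bam2chicago.sh.

    ```shell
    # Return 0 if the given version string reports a release newer than 2.25.x.
    is_newer_than_2_25() {
      ver=${1##*v}        # "bedtools v2.26.0" -> "2.26.0"
      major=${ver%%.*}    # "2"
      rest=${ver#*.}      # "26.0"
      minor=${rest%%.*}   # "26"
      [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -gt 25 ]; }
    }

    if command -v bedtools >/dev/null 2>&1; then
      installed=$(bedtools --version)
      if is_newer_than_2_25 "$installed"; then
        echo "Warning: $installed found; this thread reports errors with 2.26, while 2.25 worked." >&2
      fi
    else
      echo "bedtools not found on PATH" >&2
    fi
    ```

    The guard only warns rather than aborting, since the thread establishes the problem for 2.26 specifically and says nothing about later releases.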

  15. Mikhail Spivakov

    Hi Noboru,

    Thanks a lot for reporting this issue! I've contacted the bedtools authors to see if it's possible to turn off error checking as an option (and if not, whether they'd mind implementing it). In the meantime, we'll place a note on our website that bedtools v2.26 isn't compatible with CHiCAGO.

    Best wishes, Mikhail
