About bam2chicago.sh error
Hi,
I am trying to generate a .chinput file using the .rmap and .baitmap files from CapHiCdata and the attached BAM file, which I generated with HiCUP as recommended. When I run the following command, I get an error:
$ bam2chicago.sh CapHiC_BJ4ES_R1_2.hicup.genome1.bam designDir/S3175602_Covered_intersect_annotate2.baitmap designDir/mm9_MboI_fragment.rmap BJ4ES_genome1
Checking rmap and baitmap files...
Rmap and baitmap files checked successfully
Processing sample BJ4ES_genome1...
Using bam file CapHiC_BJ4ES_R1_2.hicup.genome1.bam
Using baitmap file designDir/S3175602_Covered_intersect_annotate2.baitmap
Using digest map (rmap) file designDir/mm9_MboI_fragment.rmap
Baitmap file contains >4 columns. Checking if designDir/S3175602_Covered_intersect_annotate2.baitmap_4col.txt exists...
Found designDir/S3175602_Covered_intersect_annotate2.baitmap_4col.txt
Intersecting with bait fragments (using min overhang of 0.6)...
Flipping all reads that overlap with the bait on to the right-hand side...
Intersecting with bait fragments again to produce a list of bait-to-bait interactions that can be used separately; note they will also be retained in the main output...
Error: Type checker found wrong number of fields while tokenizing data line.
I am not sure why I am getting this error. Could you please advise?
Best regards.
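The error message above points to a line with an unexpected number of fields, so one way to narrow it down is to tabulate field counts in the intermediate files. The following is a minimal sketch, assuming the intermediate file is tab-separated text; the function name `field_count_histogram` is illustrative and not part of bam2chicago.sh.

```shell
#!/usr/bin/env bash
# Print "<number of fields> <number of lines>" for a tab-separated file.
# More than one output row means the file mixes lines of different widths.
# Note: a trailing tab counts as an extra (empty) field.
field_count_histogram() {
  awk -F'\t' '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}
```

For example, `field_count_histogram some_intermediate.bedpe` (a hypothetical file name) would print a single row if every line has the same number of fields.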
Comments (7)
-
reporter Sorry… I ran the modified version of bam2chicago.sh on another BAM file, and the following error occurred.
$ ./bam2chicago_modified.sh CapHiC_JB2ES_R1_2.hicup.genome2.bam designDir/S3175602_Covered_intersect_annotate2.baitmap designDir/mm9_MboI_fragment.rmap JB2ES_genome2
Checking rmap and baitmap files...
Rmap and baitmap files checked successfully
Processing sample JB2ES_genome2...
Using bam file CapHiC_JB2ES_R1_2.hicup.genome2.bam
Using baitmap file designDir/S3175602_Covered_intersect_annotate2.baitmap
Using digest map (rmap) file designDir/mm9_MboI_fragment.rmap
Baitmap file contains >4 columns. Checking if designDir/S3175602_Covered_intersect_annotate2.baitmap_4col.txt exists...
Found designDir/S3175602_Covered_intersect_annotate2.baitmap_4col.txt
Intersecting with bait fragments (using min overhang of 0.6)...
*****WARNING: Query NB551733:5:HFCHVBGXB:1:11101:7827:8813 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query NB551733:5:HFCHVBGXB:1:11101:26809:10628 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query NB551733:5:HFCHVBGXB:1:11101:25176:12818 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query NB551733:5:HFCHVBGXB:1:11101:22480:13563 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping….
The BAM format is the same as for the previous files. Why am I getting this error? Could you please advise?
Best regards.
-
reporter - marked as bug
-
Happy to hear you’ve sorted the first issue. Regarding the second issue: how many warnings like this do you get? If they appear throughout the file, it simply means that you need to re-sort your file so that mate pairs are kept together (you can do this easily with samtools). If you only get a very small number of such warnings, I’d probably just disregard them.
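To get a rough count of how widespread the problem is, one could scan the records and count those whose mate is not on the adjacent line. This is a minimal sketch, assuming SAM text on standard input (for example from `samtools view`); the function name `count_unpaired_records` is illustrative, and the logic only approximates the adjacency check behind the warnings.

```shell
#!/usr/bin/env bash
# Count SAM records whose mate (same read name) is not the neighbouring record.
# Reads SAM text from stdin; header lines (starting with "@") are ignored.
count_unpaired_records() {
  awk -F'\t' '
    !/^@/ {
      if (have_prev && $1 == prev) {
        have_prev = 0              # mate found right after: a complete adjacent pair
      } else {
        if (have_prev) orphans++   # the previous record never found its mate
        prev = $1
        have_prev = 1
      }
    }
    END {
      if (have_prev) orphans++     # last record left without a mate
      print orphans + 0
    }
  '
}
```

Usage would be along the lines of `samtools view CapHiC_JB2ES_R1_2.hicup.genome2.bam | count_unpaired_records`; a result of 0 means every record is immediately followed or preceded by its mate.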
-
reporter Thank you for your reply.
Following your suggestion, I sorted my BAM file with samtools as follows:
samtools sort CapHiC_JB2ES_R1_2.hicup.genome1.bam > CapHiC_JB2ES_R1_2.hicup.genome1_sort.bam
However, I got the same error for all lines.
Could you check my BAM file if you have time?
-
Samtools sort with default parameters won't give you what you want. Please refer to samtools docs for the correct command line. I think it’s -n but please double check.
-
reporter Thanks. I re-sorted my BAM file with the -n option. The file is now sorted by read ID, but mates still do not appear on consecutive lines as they do in my other, working files.
$ samtools sort -n CapHiC_JB2ES_R1_2.hicup.genome1.bam > CapHiC_JB2ES_R1_2.hicup.genome1_sort.bam
$ samtools view CapHiC_JB2ES_R1_2.hicup.genome1_sort.bam | head
NB551733:5:HFCHVBGXB:1:11101:1070:6925 163 chr7 96981017 42 133M chr9 22064319
TACTAATACCATGTTATAACAGAATCCCAAGTGTGAGAGAGCATACAGCCTTGCAAGACTGTTGGAAAAGTAGTGGCCCCAGGGGACAGCTAAATTTTAACTAAGACAGGATAGGGAGCAGAGTAATGGGATC
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEE<AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEE
AS:i:-3 XN:i:3 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:45N8N10N67 YT:Z:UU CT:Z:TRANS XX:Z:G1
NB551733:5:HFCHVBGXB:1:11101:1082:5584 83 chr15 76343273 23 4M3I143M chr13 3707068 0
GCAAAGATCCTGTAGTATCCACTGACTCCTTCCCTCAGGTCACACTTTCTTCACGACACATCTCATGATGAGCAATCTGGGCTGCCCTGCAGGTGGTGTCTTTGTACATATGCAGAGAAAAGCGAACCCAGGCTTGGACTTTTGTGGATC
/EEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA
AS:i:-36 XN:i:4 XM:i:8 XO:i:1 XG:i:3 NM:i:11 MD:Z:0C2C1T38N0N6N1N32C59 YT:Z:UU CT:Z:TRANS XX:Z:G1
NB551733:5:HFCHVBGXB:1:11101:1087:17754 179 chr7 138076682 42 150M chr1 171895797
TGAGAGGGAGTCTTTTTGGAGGATTTGTTGGGACCTGCTCTGGTTTGGGTATGATTGGAGTGCAAGGTAGAATTTTGGTTCAGTGTTGTAAAAATGGAGGCCTAAGGCAGGTCACACATGATCATAGGACTGCAGGTCTCAGGGAGACTT
//EAAA/EEEEEEEEEEE<<EAAEE<<////E/</EE</EEEE/EEE/A</EA<E<E<6//EEAEA<<E/EE//E6/E<<AA/E/E/AEEAEEEE//A/EEEEEEE/EEEE/EEAEEEEEEEEEEEEEEEAA<EAEE/EEEEEEEAAAAA
AS:i:-7 XN:i:1 XM:i:XO:i:0 XG:i:0 NM:i:3 MD:Z:72G13G33N29 YT:Z:UU CT:Z:TRANS XX:Z:G1
NB551733:5:HFCHVBGXB:1:11101:1087:17754 179 chr7 138076682 42 150M chr17 23701470
TGAGAGGGAGTCTTTTTGGAGGATTTGTTGGGACCTGCTCTGGTTTGGGTATGATTGGAGTGCAAGGTAGAATTTTGGTTCAGTGTTGTAAAAATGGAGGCCTAAGGCAGGTCACACATGATCATAGGACTGCAGGTCTCAGGGAGACTT
//EAAA/EEEEEEEEEEE<<EAAEE<<////E/</EE</EEEE/EEE/A</EA<E<E<6//EEAEA<<E/EE//E6/E<<AA/E/E/AEEAEEEE//A/EEEEEEE/EEEE/EEAEEEEEEEEEEEEEEEAA<EAEE/EEEEEEEAAAAA
AS:i:-7 XN:i:1 XM:i:XO:i:0 XG:i:0 NM:i:3 MD:Z:72G13G33N29 YT:Z:UU CT:Z:TRANS XX:Z:G1
I'm not sure what happened here…
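In the output above, the read ID ending in 1087:17754 appears twice with flag 179, which has the 0x80 (second in pair) bit set both times, while the first two read IDs appear only once each. A minimal sketch for listing such read names follows, assuming SAM text on standard input; the function name `list_irregular_read_names` is illustrative.

```shell
#!/usr/bin/env bash
# List read names that do not occur as exactly one first-in-pair record
# and one second-in-pair record. Reads SAM text from stdin.
# Output columns: read name, count of first-in-pair, count of second-in-pair.
list_irregular_read_names() {
  awk -F'\t' '
    !/^@/ {
      if (int($2 / 64) % 2)  first[$1]++    # FLAG bit 0x40: first read of the pair
      if (int($2 / 128) % 2) second[$1]++   # FLAG bit 0x80: second read of the pair
      seen[$1] = 1
    }
    END {
      for (q in seen)
        if (first[q] != 1 || second[q] != 1)
          print q, first[q] + 0, second[q] + 0
    }
  ' | sort
}
```

Run as `samtools view CapHiC_JB2ES_R1_2.hicup.genome1_sort.bam | list_irregular_read_names`, this would print nothing for a file in which every read name has exactly one record per mate.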
-
I tried again to generate the .chinput files, using the suggestion in #33 as a guide:
“awk 'BEGIN{ OFS="\t" } { minRight=$13<$3?$13:$3; maxLeft=$12>$2?$12:$2; if($1==$11 && (minRight-maxLeft)/($3-$2)>=0.6){ print $4,$5,$6,$1,$2,$3,$7,$8,$10,$9,$11,$12,$13,$14,$15 } else { print $0 } }' ${samplename}/${bamname}_mappedToBaits.bedpe > ${samplename}/${bamname}_mappedToBaits_baitOnRight.bedpe
I have removed the last $15 from the print statement.”
With that change, I was able to generate the .chinput files!
Thanks!
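For reference, the change quoted from #33 can be sketched as a small filter with the trailing $15 dropped from the print statement. This assumes the intersected records have 14 columns (10 BEDPE columns plus a 4-column baitmap); under that assumption, printing $15 would plausibly append an empty extra field to the flipped lines only, which would fit the "wrong number of fields" error. The function name `flip_bait_to_right` is illustrative and not part of bam2chicago.sh.

```shell
#!/usr/bin/env bash
# Swap the two read ends when the first end overlaps the bait by at least 60%,
# so that the bait-overlapping end ends up on the right-hand side.
# Reads the bait-intersected BEDPE from stdin, writes tab-separated output.
flip_bait_to_right() {
  awk 'BEGIN{ OFS="\t" }
  {
    minRight = $13<$3 ? $13 : $3      # smaller of bait end and read-end-1 end
    maxLeft  = $12>$2 ? $12 : $2      # larger of bait start and read-end-1 start
    if ($1==$11 && (minRight-maxLeft)/($3-$2) >= 0.6) {
      # 14 fields only: the last $15 from the original command is removed
      print $4,$5,$6,$1,$2,$3,$7,$8,$10,$9,$11,$12,$13,$14
    } else {
      print $0
    }
  }'
}
```

In the script this would correspond to `flip_bait_to_right < ${samplename}/${bamname}_mappedToBaits.bedpe > ${samplename}/${bamname}_mappedToBaits_baitOnRight.bedpe`.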