Can't work with multiple samples

Issue #33 resolved
Livia Moura created an issue

Hello

I tried to run metabat2 with 5 samples (.bam). Every time I try to create a depth file with jgi it prints many warnings (WARNING: bam has improper refpos vs reflen, WARNING: bam has improper refpos + oplen) and never completes the task, ending in a segmentation fault or an abort.

To create these bams I have already tried: bowtie2 default; bowtie2 --end-to-end; bwa mem default; bwa mem with MAPQ > 20.

All my .bam files are sorted (samtools sort). They aren't big, about 300-500 MB each, and I work with 400 GB of RAM. jgi can create a depth file when I work with 2 files (.bam); it breaks with more than that. I can do a complete metabat analysis with individual bams.

Do you have any idea what's happening?

Best regards,

Livia

Comments (27)

  1. Don Kang

    Hi Livia,

    Looks like you have some issues in the bam files you produced with bowtie. There might be a way to work around it, but we need to investigate first, so it would be best if you could send us the bam files you had issues with. Here is the ftp information:

    Username: ddkang@lbl.gov.59a6f52adb9c3

    Password: hGP3YCU1

    Upload Server: upload.nersc.gov

    Thanks!

  2. Susanne Kraemer

    Hello, I am running into a very similar problem (WARNING: bam has improper refpos vs reflen, WARNING: bam has improper refpos + oplen) using metabat and 12 bam files. The bam files are on average 15 GB. The files are converted and sorted from sam files downloaded from JGI. I can run individual files without issue, though.

    Any help is greatly appreciated, Susanne

  3. Rob Egan

    Hi Susanne, Can you provide a reference to the bam files that you are using? Some aligners do not output a file conforming perfectly to the SAM / BAM specifications and that is what this warning is suggesting. If it completes okay, you can choose to ignore the warnings (if it is just a few of them), but ideally I can find a way to work with the imperfectly formatted BAM files.
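    A quick way to sanity-check a BAM before handing it to jgi_summarize_bam_contig_depths is samtools quickcheck (a sketch; the file names are placeholders). Note this only catches truncation and header damage, not every spec violation those refpos warnings refer to:

```shell
# Verify each BAM has an intact header and EOF block before depth calculation.
# quickcheck prints the name of any file that fails (-v) and exits non-zero.
if samtools quickcheck -v sample1.bam sample2.bam sample3.bam; then
    echo "all BAMs pass quickcheck"
else
    echo "some BAMs are truncated or malformed"
fi
```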

  4. Jeff Froula

    This is Jeff taking over for Rob. When I follow the link you provided, I don't see any sam or bam files. I don't think IMG keeps or even generates those files, though I could be wrong. Can you give me a subset of the fastq files (1102106, 1102112 and 1102109) and the command used to create your bam files? I can then run metabat to see if I can duplicate the problem.

    Also, check your link and help me understand where the sam files should be. Thanks

  5. Susanne Kraemer

    Hello Jeff. Sorry for the confusion, I should have been more precise. I did not create the bam files myself from fastq files. If you select a sample from the ones under the link provided and click download, it takes you to a page which contains, among others, a file called samplename.scaffolds.sam.gz (e.g. in the lower quarter of https://genome.jgi.doe.gov/pages/dynamicOrganismDownload.jsf?organism=ArcticOcean_MG_C_21), which is the file I downloaded. I expanded the files using gunzip and converted them to bam using the following code:

    samtools view -bS /mnt/test/Artic_SAM/$prefix.scaffolds.sam | samtools sort - ~/Desktop/AO_sorted_bam/$prefix

    Thanks, Susanne

  6. Jeff Froula

    Hi Susanne,

    Still can't find it. I'm thinking that we're looking at different views, i.e. maybe IMG deleted it? Anyways, I've included a screenshot of what I see when I click the link you supplied above for ArcticOcean_MG_C_21. I only see *.scaffolds.cov which is a table of read coverage per scaffold; this tells me a sam file was created at one point.

    Obviously you had a sam file at one time, but you may have to re-produce the ones that gave you trouble. I'm trying to find the IMG person that ran the pipeline that should have created this coverage file, so maybe we can figure out what's going on.

    Please see if the sam files are still there.

    Also, could you send me the 3 troublesome sam files so I can run metabat and see if I can reproduce the error? It's hard for me to debug this without reproducing the error. Are you sure the sam files were OK? To test them, maybe run samtools idxstats on the bam files to see if they produce the expected output.

  7. Susanne Kraemer

    Hi Jeff,

    I can't see your screenshot. I sent you one of the page as I see it, including the sam file. Maybe the difference is caused by me being logged in? I did recreate the three troublesome files and also checked their md5sums after download, but am still running into the same issue. I ran samtools idxstats and the output looked nominal, too. I'd be happy to send you my files; how can I do that? They are each about 15-17 GB. I'd best send you some that are working as well as the non-working files. Also please note that all files work well on their own, but not in combination.

    screen_shot.jpeg

  8. Jeff Froula

    I see the SAM files now (didn't even notice that I hadn't opened that folder). Great. I'll try metabat and get back to you.

  9. Jeff Froula

    I downloaded 12 sams and the co-assembly from IMG and ran metabat2. I got 174 bins (attached) with no errors.

    What I did:

    Create bams for each sam file separately:

    samtools view -F 0x04 -uS <sam> | samtools sort - <sam.sorted>

    Create bins:

    metabat -i 1102103.scaffolds.fasta -o bin *.bam

    Result:

    MetaBAT 2 (v2.12.1) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200. 174 bins (2754432479 bases in total) formed.
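    Looped over all of the downloaded SAM files, the procedure above might look like this (a sketch using current samtools syntax; the glob pattern is an assumption about the file names):

```shell
# Convert each JGI SAM to a sorted BAM, dropping unmapped reads (-F 0x04),
# then bin everything against the co-assembly in one metabat run.
for sam in *.scaffolds.sam; do
    samtools view -F 0x04 -u "$sam" | samtools sort -o "${sam%.sam}.sorted.bam" -
done
metabat -i 1102103.scaffolds.fasta -o bin *.sorted.bam
```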

    If you give me your email (I'm now seeing Susanne Kraemer <

  10. Susanne Kraemer

    Hi Jeff, I think I can only see part of your answer, the lower part is cut off. Might it be that the way you convert the sam files ('-F 0x04 -uS') makes a difference in the resulting sorted bam file?

  11. Susanne Kraemer

    I'm reconverting my sam files to see if I can solve this issue on my side as well. However, it would be great if you could send me your results from this run as well. Best, Susanne

  12. Susanne Kraemer

    I'm still getting the same error but now think that this might have to do with the creation of the depth file. Did you utilize jgi_summarize_bam_contig_depths to create it? Cheers, Susanne

  13. Jeff Froula

    I didn't need jgi_summarize_bam_contig_depths (I ran exactly the commands I posted in the Oct. 14th comment). In my case, which may for some reason be different from yours, I ran "metabat" (which for me is symlinked to "metabat2"), so you only need to supply the assembly and bam files. However, I looked at the documentation again and I see what you mean. It says to run jgi_summarize BEFORE metabat if you don't run "runMetaBat.sh".

    So we need to start from scratch. Please send me the exact commands you used to download metabat and run it. I need to reproduce your errors.

    I will try downloading from scratch and running everything to make sure I don't get errors.

  14. Susanne Kraemer

    Hi Jeff,

    thanks. It might be an issue of the bam files not being mapped to the same reference. We are looking into this issue and I'll let you know if this solves it.

    Best, Susanne

  15. Jeff Froula

    I gave you some bad info. If you just run (metabat -i <assembly> -o <outsuffix> *.bam) then you will get some bins, but they will only use tetra-nucleotide info and not coverage. For example, this will also give you bins: (metabat -i <assembly> -o <outsuffix>). So YOU DO NEED TO RUN runMetaBat.sh.

    Also, it looks like the sam files I pulled from IMG are the alignments from one library being mapped to the assembly of that library only and not to the co-assembly.

    I suggest that you take your reads from the 12 libraries and create your own bam files using the co-assembly. Then you know what you are getting, i.e. download bbtools and run bbmap.sh 12 times. The co-assembly I used was 1102103.scaffolds.fasta (14G). I'm not even sure this is really the co-assembly, but I'm assuming it is.
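    That re-mapping loop might look something like this (a sketch only: the library IDs come from earlier in the thread, the paired fastq names are placeholders, and bbmap.sh options beyond ref=/in=/in2=/out= are left at defaults):

```shell
# Map each library's reads against the one co-assembly so every BAM
# shares the same reference, then sort for jgi_summarize_bam_contig_depths.
for lib in 1102106 1102109 1102112; do
    bbmap.sh ref=1102103.scaffolds.fasta nodisk \
        in="${lib}_R1.fastq.gz" in2="${lib}_R2.fastq.gz" \
        out=stdout.sam | samtools sort -o "${lib}.sorted.bam" -
done
```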

  16. Rob Egan

    To add to what Jeff said, it definitely looks like the bam files that you downloaded from IMG were not all created with the same reference files. While the names of the entries are the same, each of the files has a different scaffold00001. MetaBAT requires that all bams are aligned to the exact same reference.

    regan@gpint200:~/workspace/MetaBAT> for i in 11021*.bam ; do samtools view -H $i | grep scaffold00001 ; done
    @SQ     SN:scaffold00001        LN:532811
    @SQ     SN:scaffold00001        LN:384962
    @SQ     SN:scaffold00001        LN:211466
    @SQ     SN:scaffold00001        LN:350957
    @SQ     SN:scaffold00001        LN:279399
    @SQ     SN:scaffold00001        LN:162416
    @SQ     SN:scaffold00001        LN:543103
    @SQ     SN:scaffold00001        LN:298157
    @SQ     SN:scaffold00001        LN:155325
    @SQ     SN:scaffold00001        LN:334158
    @SQ     SN:scaffold00001        LN:198408
    @SQ     SN:scaffold00001        LN:159803
    
  17. Bas E. Dutilh

    I had the same problem, and what you say makes sense.

    In my case it occurred because I was feeding BAM files to jgi_summarize_bam_contig_depths that contained different contigs as reference sequences, i.e. the BAM files came from mapping reads against different reference databases.

    The problem was resolved when I ran jgi_summarize_bam_contig_depths with only BAM files that were all derived from mapping against the same reference contigs. jgi_summarize_bam_contig_depths makes a table containing the read-mapping depth of every reference sequence (contig) in the BAM files, so if the contigs are inconsistent between the BAM files it can give this error.
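    That consistency check can be scripted up front (a sketch, not from this thread): hash the @SQ header lines of every BAM and count the distinct digests; any count above 1 means the BAMs were aligned to different references:

```shell
# All BAMs aligned to the same assembly must carry identical @SQ lines.
# A single distinct digest across all files means the references match.
for bam in *.bam; do
    samtools view -H "$bam" | grep '^@SQ' | md5sum
done | awk '{print $1}' | sort -u | wc -l
```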

  18. Rob Egan

    The code was supposed to validate that headers between the bam files are identical, but that check was not working properly. The latest version now fixes this bug and aborts with a clear message if the bam files were generated from different assemblies.

  19. huangjinqun

    Dear Rob Egan,

    It is declared that this was fixed in the latest version of MetaBAT 2 (since v2.12.1).

    depth.txt contains three samples, generated from three sorted bam files using "jgi_summarize_bam_contig_depths".

    I still get the "Segmentation fault" error.

    I have uploaded the files to the git repository below, which may be used for testing.

    Would you please do me a favor? Any help from you will be greatly appreciated.

    Best,

    JinQun Huang

    https://huangjinqun@bitbucket.org/huangjinqun/jgi_summarize_bam_contig_depths-metabat2-segmentation-fault.git

    $ jgi_summarize_bam_contig_depths --outputDepth viral_3_binning/depth.txt viral_2_map_to_vf_filtered_ctg/bam/AF022C.sorted viral_2_map_to_vf_filtered_ctg/bam/AF025A.sorted viral_2_map_to_vf_filtered_ctg/bam/AF026A.sorted

    Output depth matrix to viral_3_binning/depth.txt

    Output matrix to viral_3_binning/depth.txt

    Opening 3 bams

    Consolidating headers

    Processing bam files

    Thread 1 finished: AF025A.sorted with 16480512 reads and 883 readsWellMapped

    Thread 0 finished: AF022C.sorted with 21715024 reads and 187 readsWellMapped

    Thread 2 finished: AF026A.sorted with 28251742 reads and 2297 readsWellMapped

    Creating depth matrix file: viral_3_binning/depth.txt

    Closing most bam files

    Closing last bam file

    Finished

    $ metabat2 -i viral_1_vf_filtered_contigs/vf_filtered_contig.fasta -a viral_3_binning/depth.txt -o ./ -m 1500

    MetaBAT 2 (v2.12.1) using minContig 1500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.

    Segmentation fault

  20. Rob Egan

    Hi JinQun,

    So you are not running with the latest code from Bitbucket or docker; it should be 2.13-33-g236d20e

    MetaBAT: Metagenome Binning based on Abundance and Tetranucleotide frequency (version 2:v2.13-33-g236d20e; 2019-09-12T09:10:50)

    And if you want me to test on my machine with your data, you’ll need to make your git repo public.

    Cheers,

    Rob

  21. huangjinqun

    Dear Rob,

    Thanks for your reply.

    I have installed MetaBAT 2 (v2.13) from Bioconda (I do not see version 2.13 in this Bitbucket).

    But I got the same error. I have made the data public ( git clone https://huangjinqun@bitbucket.org/huangjinqun/jgi_summarize_bam_contig_depths-metabat2-segmentation-fault.git ).

    Thank you for your help.

    Best,

    JinQun Huang

    $ metabat2 -i viral_1_vf_filtered_contigs/vf_filtered_contig.fasta -a viral_3_binning/depth.txt -o viral_3_binning/bins_dir/bin -m 1500

    MetaBAT 2 (v2.13 (Bioconda)) using minContig 1500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.

    Segmentation fault

  22. Rob Egan

    Hi JinQun Huang,

    The data files are okay, but the binning criteria yielded nothing that was binnable.

    I added a check for 0 computed edges instead of throwing a segfault. Your data set is very small and short, so you may want to adjust the minimum contig length below 2500.

    Executing: 'metabat2 -v --inFile viral_1_vf_filtered_contigs/vf_filtered_contig.fasta --outFile vf_filtered_contig.fasta.metabat-bins-v-20191002_004459/bin --abdFile vf_filtered_contig.fasta.depth.txt' at Wed Oct 2 00:44:59 PDT 2019
    MetaBAT 2 (v2.14) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.
    [00:00:00] Executing with 32 threads
    [00:00:00] Parsing abundance file
    [00:00:00] Parsing assembly file
    [00:00:00] Number of large contigs >= 2500 are 3.
    [00:00:00] Reading abundance file
    [00:00:00] Finished reading 29 contigs and 3 coverages from vf_filtered_contig.fasta.depth.txt
    [00:00:00] Number of target contigs: 3 of large (>= 2500) and 26 of small ones (>=1000 & <2500).
    [00:00:00] Start TNF calculation. nobs = 3
    [00:00:00] Finished TNF calculation.
    [00:00:00] Finished Preparing TNF Graph Building [pTNF = 89.80]
    [00:00:00] Finished Building TNF Graph (3 edges) [-1.5Gb / 504.0Gb]
    There were 3 nodes and 0 edges -- insufficient to compute bins

    -Rob
