Bowtie output as SAM and not BAM

Issue #111 resolved
Guillermo Marco Puche created an issue

Hello Eduardo,

I'm analyzing a big dataset of miRNA. I'm experiencing some trouble with disk space since Bowtie1 output is being created as SAM and not BAM. The strange thing is that files are named with *.bam extension. I don't know if this is a known bug or proposal, I report it as bug since since file extesions of bowtie are marked as BAM when they're not.

Do you have any idea on how I could append samtools command to bowtie command to output compressed BAM files.

Regards,

Comments (11)

  1. Eduardo Andres Leon

    Hi, In fact it was the opposite. In the first releases miARma convert sam -> bam, but it was a very slow process, so we remove this step.

    But you can "use" it by modifying the code a little. For example in ->lib/CbBio/RNASeq/ReadCount.pm you can uncomment the line 207 and those sam will be converted into bam. I also recommend you to previously rename bowtie1 output to use sam instead of bam. For this find the line 981 in the ->lib/CbBio/RNASeq/Aligner.pm (or search for my $output_file_bw in this file), and change the extension in the 2 lines below.

    Let me know

  2. Guillermo Marco Puche reporter

    Hello Eduardo,

    Line 981 of Aligner.pm seems to be comments about bowtie1 method, also the next two following lines.

    I've modified the following line 1170 and seems to be working. I've tested it on Known miRNA pipeline.

    $command="bowtie ".$bowtiepardef." ".$bowtieindex." ".$file." | samtools view -bS - > ".$output_file_bw;
    

    This way output is created directly to BAM and you avoid making another conversion later. What do you think about this solution?

  3. Eduardo Andres Leon

    search for my $output_file_bw in this file.

    This way output is created directly to BAM and you avoid making another conversion later. What do you think about this solution?

    But the Readcount module will count twice (the sam and the bam)

  4. Guillermo Marco Puche reporter

    Found this in Aligner.pm module:

            my $output_file_bw;
            if($adapter){
                $output_file_bw=$projectdir.$output_dir.$output_file_final."_bw1.bam";
            }
            else{
                $output_file_bw=$projectdir.$output_dir.$output_file_final."_nat_bw1.bam";
            }
    

    Indeed the main problem for me to work with bowtie SAM output is disk usage piping the samtools command inside bowtie command aligner result is created only in BAM format and I save a lot of disk space. If I've to wait to all my samples to be mapped to be converted later to BAM I run out of disk space. With the solution I proposed only a BAM file is created, there's no SAM file output from bowtie. Why should Readcount module count twice then?

  5. Eduardo Andres Leon

    It was my fault, I did not read the command you proposed. If your problem is the disk space and you don't care about the processing time, you should also sort your reads: samtools view -bS $sam_file | samtools sort - $bam_file

  6. Guillermo Marco Puche reporter

    Appreciate your time. I'm also thinking about collapsing reads prior to alignment in a way mirDeep2 does. I think a collapsing reads module after read adapter trimming module would be a nice new feature at least for miRNA pipeline.

  7. Eduardo Andres Leon

    I'll take a look later because I don't see the point. If you save cutadapt output in fasta format, you lost the quality values, but cutadapt can also create trimmed fastq files (as we do in miARma) including quality values

  8. Log in to comment