- edited description
Bowtie output as SAM and not BAM
Hello Eduardo,
I'm analyzing a big dataset of miRNA. I'm experiencing some trouble with disk space since Bowtie1 output is being created as SAM and not BAM. The strange thing is that files are named with *.bam extension. I don't know if this is a known bug or proposal, I report it as bug since since file extesions of bowtie are marked as BAM when they're not.
Do you have any idea on how I could append samtools command to bowtie command to output compressed BAM files.
Regards,
Comments (11)
-
reporter -
Hi, In fact it was the opposite. In the first releases miARma convert sam -> bam, but it was a very slow process, so we remove this step.
But you can "use" it by modifying the code a little. For example in ->lib/CbBio/RNASeq/ReadCount.pm you can uncomment the line 207 and those sam will be converted into bam. I also recommend you to previously rename bowtie1 output to use sam instead of bam. For this find the line 981 in the ->lib/CbBio/RNASeq/Aligner.pm (or search for my $output_file_bw in this file), and change the extension in the 2 lines below.
Let me know
-
reporter Hello Eduardo,
Line 981 of Aligner.pm seems to be comments about bowtie1 method, also the next two following lines.
I've modified the following line 1170 and seems to be working. I've tested it on Known miRNA pipeline.
$command="bowtie ".$bowtiepardef." ".$bowtieindex." ".$file." | samtools view -bS - > ".$output_file_bw;
This way output is created directly to BAM and you avoid making another conversion later. What do you think about this solution?
-
search for my $output_file_bw in this file.
This way output is created directly to BAM and you avoid making another conversion later. What do you think about this solution?
But the Readcount module will count twice (the sam and the bam)
-
reporter Found this in Aligner.pm module:
my $output_file_bw; if($adapter){ $output_file_bw=$projectdir.$output_dir.$output_file_final."_bw1.bam"; } else{ $output_file_bw=$projectdir.$output_dir.$output_file_final."_nat_bw1.bam"; }
Indeed the main problem for me to work with bowtie SAM output is disk usage piping the samtools command inside bowtie command aligner result is created only in BAM format and I save a lot of disk space. If I've to wait to all my samples to be mapped to be converted later to BAM I run out of disk space. With the solution I proposed only a BAM file is created, there's no SAM file output from bowtie. Why should Readcount module count twice then?
-
It was my fault, I did not read the command you proposed. If your problem is the disk space and you don't care about the processing time, you should also sort your reads: samtools view -bS $sam_file | samtools sort - $bam_file
-
reporter Appreciate your time. I'm also thinking about collapsing reads prior to alignment in a way mirDeep2 does. I think a collapsing reads module after read adapter trimming module would be a nice new feature at least for miRNA pipeline.
-
What do you mean with "collapse"?
E
-
reporter Something like this: https://seqcluster.readthedocs.io/collapse.html Or the collapse_reads.pl in https://www.mdc-berlin.de/content/mirdeep2-documentation
-
I'll take a look later because I don't see the point. If you save cutadapt output in fasta format, you lost the quality values, but cutadapt can also create trimmed fastq files (as we do in miARma) including quality values
-
reporter - changed status to resolved
Resolved edited Aligner.pm lines provided by Eduardo.
- Log in to comment