STAR two-pass function

Issue #7 resolved
Alejandra Cervera created an issue

Katherine, Kaiyang and Alejandra know how to do this; ask them if you want to take it on. In short: run Align with STAR once, join the splice junctions found in all samples, regenerate the genome index with those junctions, and then run Align with STAR again against the new genome.
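For orientation, the same four steps can be sketched as plain STAR command lines. This minimal Python sketch only builds the argument lists and runs nothing. The sample names, file names and directory layout are hypothetical; the flags are standard STAR options. As in the anduril1 example, junctions from all samples are pooled into one second-pass index (STAR's built-in `--twopassMode Basic` instead works within a single run, using only that run's junctions).

```python
"""Sketch of a manual STAR two-pass workflow as command lines (not executed).

Assumptions: STAR is on PATH; sample names and paths are hypothetical.
"""
from pathlib import Path


def genome_generate(genome_dir, fasta, gtf, overhang, junction_files=(), threads=7):
    """Build a genomeGenerate command; junction_files adds first-pass junctions."""
    cmd = [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),
        "--sjdbOverhang", str(overhang),
    ]
    if junction_files:
        # Second-pass index: insert the junctions collected in the first pass.
        cmd += ["--sjdbFileChrStartEnd", *map(str, junction_files)]
    return cmd


def align(genome_dir, reads, mates, prefix, threads=7):
    """Build a paired-end alignment command for gzipped FASTQ input."""
    return [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(reads), str(mates),
        "--readFilesCommand", "zcat",
        "--outSAMtype", "BAM", "Unsorted",
        "--outFileNamePrefix", str(prefix),
    ]


def two_pass(samples, fasta, gtf, workdir, overhang=64):
    """Return the ordered commands for a pooled two-pass run over all samples."""
    work = Path(workdir)
    cmds = [genome_generate(work / "genome1", fasta, gtf, overhang)]
    sj_files = []
    for name, (r1, r2) in samples.items():
        prefix = work / "pass1" / f"{name}_"
        cmds.append(align(work / "genome1", r1, r2, prefix))
        # STAR writes the first-pass junctions to <prefix>SJ.out.tab
        sj_files.append(Path(f"{prefix}SJ.out.tab"))
    cmds.append(genome_generate(work / "genome2", fasta, gtf, overhang, sj_files))
    for name, (r1, r2) in samples.items():
        cmds.append(align(work / "genome2", r1, r2, work / "pass2" / f"{name}_"))
    return cmds
```

Each command could then be handed to `subprocess.run(cmd, check=True)`; keeping construction separate from execution makes the order of the steps easy to inspect.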

Here is an example of a two-pass STAR alignment in anduril1; try to recreate it in anduril2.

annotation = INPUT(path="/opt/share/annotation/human-ensembl38/Homo_sapiens.GRCh38.80.gtf")
reference = INPUT(path="/opt/share/annotation/human-ensembl38/genome.fa")
reads = INPUT(path="/mnt/storageBig6/execution-folder/cerverat/TCGA_Ovarian/qc-_reads_combiner1/array/")
mates = INPUT(path="/mnt/storageBig6/execution-folder/cerverat/TCGA_Ovarian/qc-_mates_combiner1/array/")
STARparameters = INPUT(path="/mnt/csc-gc5/opt/STAR-master/source/parametersDefault")

// First-pass genome index, built from the reference FASTA and the annotation only
genome = STARGenome(genomeFasta = reference, annotation = annotation, genomeParameters= "--sjdbOverhang 64", threads = 7, @cpu = 7)

// Pass 1: align every sample and keep its canonical splice junctions
splices = {}
for r:std.iterArray(reads.in) {

firstPass = Align( genome  = genome.genome,
                    reads   = reads.in[r.key],
                    mate    = mates.in[r.key],
                    parameters  = STARparameters,
                    aligner     = "star",
                    options     = "--readFilesCommand zcat --outSAMtype BAM Unsorted",
                    threads     = 7,
                    @name       = "firstPass_"+r.key,
                    @cpu        = 7)
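// Keep junctions on chromosomes 1-22, X and Y that pass the UniqueMapping lower bound;
// the pattern is grouped and anchored so that 10 and 20 are kept and scaffolds are dropped
canonical = CSVFilter(  csv             = firstPass.spliceJunctions,
                        regexp          = "Chromosome=^([1-9]|1[0-9]|2[0-2]|X|Y)$",
                        lowBound        = "UniqueMapping=5",
                        includeColumns  ="Chromosome,Start,End,Strand",
                        @name           = "canonical_"+r.key)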

splices[r.key] = canonical

}

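// Merge the filtered junctions of all samples into one table
joinSplices = CSVJoin(array = splices, useKeys = false)

// Write all columns without quotes (skipQuotes = "*") so that STAR can parse the junction file
cleanSplices = CSVCleaner(original = joinSplices, skipQuotes = "*")

// Second-pass genome index with the pooled first-pass junctions inserted
makeGenome = STARGenome(genomeFasta      = reference,
                        spliceJunctions  = cleanSplices,
                        annotation       = annotation,
                        genomeParameters = "--sjdbOverhang 64 --limitSjdbInsertNsj 7951343 --genomeChrBinNbits 14 --limitGenomeGenerateRAM 80000000000",
                        threads          = 7,
                        @cpu             = 7)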

// Pass 2: realign every sample against the junction-augmented genome
alignments = {}
for r:std.iterArray(reads.in) {

secondPass = Align( genome  = makeGenome.genome,
                    reads   = reads.in[r.key],
                    mate    = mates.in[r.key],
                    parameters  = STARparameters,
                    aligner     = "star",
                    mainAlignmentType = "toTranscriptome",
                    options     = "--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx --quantMode TranscriptomeSAM GeneCounts --quantTranscriptomeBan Singleend --limitBAMsortRAM 20000000000",
                    threads     = 8,
                    @name       = "secondPass_"+r.key,
                    @cpu        = 8)

// Optionally also set @memory = 27000 on secondPass

alignments[r.key] = secondPass.alignment

}
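What the CSVFilter, CSVJoin and CSVCleaner steps above do to the first-pass junction tables can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the behaviour of the Anduril components: the column names come from the CSVFilter call, while the tab-separated input, the inclusive `UniqueMapping` threshold and keeping each junction only once are assumptions. The output is an unquoted, headerless, tab-separated chromosome/start/end/strand file, the layout STAR's `--sjdbFileChrStartEnd` reads.

```python
import csv
import re

# Hypothetical equivalent of the CSVFilter regexp: chromosomes 1-22, X and Y only.
CANONICAL = re.compile(r"(?:[1-9]|1[0-9]|2[0-2]|X|Y)")
KEEP = ("Chromosome", "Start", "End", "Strand")


def canonical_junctions(lines, min_unique=5, delimiter="\t"):
    """Yield (Chromosome, Start, End, Strand) rows that pass the canonical filter.

    `lines` is any iterable of text lines with a header row (e.g. an open file);
    csv strips quoted fields such as "Chromosome".
    """
    for row in csv.DictReader(lines, delimiter=delimiter):
        if not CANONICAL.fullmatch(row["Chromosome"]):
            continue  # drop MT, scaffolds and unplaced contigs
        if int(row["UniqueMapping"]) < min_unique:
            continue  # assumed inclusive bound, like lowBound="UniqueMapping=5"
        yield tuple(row[col] for col in KEEP)


def merge_junctions(tables, min_unique=5):
    """Pool the filtered junctions of all samples, keeping each junction once."""
    merged = set()
    for table in tables:
        merged.update(canonical_junctions(table, min_unique))
    # Stable order: chromosome name, then numeric start/end, then strand.
    return sorted(merged, key=lambda j: (j[0], int(j[1]), int(j[2]), j[3]))


def write_sjdb_file(junctions, out_path):
    """Write unquoted, headerless, tab-separated rows (chr, start, end, strand)."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_NONE,
                            lineterminator="\n")
        writer.writerows(junctions)
```

Pass the open junction tables of all samples to `merge_junctions` and write the result with `write_sjdb_file`. The length of the returned list, plus the annotated junctions, gives a rough guide for sizing `--limitSjdbInsertNsj`.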

Comments (5)

  1. Julia Casado

    The pipeline for developing and testing the function (with TODO comments marking the missing parts) is at: /home/casado/Documents/Syncthing/Labsync/projects/CoCa16/repos/A2Functions/star2pass.scala
