rna-seq gold standard variant calling function

Issue #17 resolved
Alejandra Cervera created an issue

Amjad: feel free to reassign it to someone else

follow gatk gold standards for variant calling for making this function, here is already some code for it in anduril1, but check the gatk documentation as well

reference = INPUT(path="/opt/share/annotation/human-ensembl38/genome.fa")

for r:std.iterArray(bams.in) {

regroup = BashEvaluate( var1    = bams.in[r.key],
                        script  = "java -Xmx4096m -jar /mnt/csc-gc5/opt/picard/picard.jar AddOrReplaceReadGroups I=@var1@ O=@optOut1@ RGID=r.key RGLB=paired RGSM=r.key RGPL=ILLUMINA RGPU=ILLUMINA",
                        @name   = "regroup_"+r.key)

@out.optOut1.filename = "dedupped.bam"
dup = BashEvaluate( var1    = regroup.optOut1,
                    script  = "java -Xmx4096m -jar /mnt/csc-gc5/opt/picard/picard.jar MarkDuplicates I=@var1@ O=@optOut1@ CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M=@optOut2@",
                    @name   = "marked_"+r.key)

@out.optOut1.filename = "split.bam"
splitNtrim = BashEvaluate(  var1    = dup.optOut1,
                            var2    = reference,
                            script  ="""export _JAVA_OPTIONS=-Djava.io.tmpdir=@folder1@
                                        java -jar /opt/share/gatk-3.5/GenomeAnalysisTK.jar -T SplitNCigarReads -R @var2@ -I @var1@ -o @optOut1@ -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --fix_misencoded_quality_scores -fixMisencodedQuals""",
                            @name   = "splitNtrim_"+r.key,
                            @execute= "once")

@out.optOut1.filename = "calls.vcf"
hapCalls = BashEvaluate(var1    = splitNtrim.optOut1,
                        var2    = reference,
                        script  ="""java -jar /opt/share/gatk-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R @var2@ -I @var1@ -dontUseSoftClippedBases -stand_call_conf 20.0 -stand_emit_conf 20.0 -o @optOut1@""",
                        @name   = "hapCalls_"+r.key)

@out.optOut1.filename = "filtered.vcf"
filtered = BashEvaluate(var1    = hapCalls.optOut1,
                        var2    = reference,
                        script  ="""java -jar /opt/share/gatk-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R @var2@ -V @var1@ -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o @optOut1@""",
                        @name   = "filtered_"+r.key)

}

Comments (1)

  1. Amjad Alkodsi

    created PrepRnaBam function that should do all preprocessing. Variant calling should be separate since it can be done in several different ways

  2. Log in to comment