Lessons Learned
These pages are for workflow developers; feel free to edit. Questions and support requests should be posted to the Slack channel.
Setting up the Environment
- Shared workflows should use Docker containers for all tasks. Workflows should reference specific container versions (not "latest" or the default tag).
Trouble installing a package using conda install
I get an error when trying to run:
conda install -y -c bioconda bbmap
CondaMultiError: CondaVerificationError: The package for openjdk located at <mypath>/miniconda2/pkgs/openjdk-11.0.1-h01d97ff_1016 appears to be corrupted. The path 'lib/modules' specified in the package manifest cannot be found.
The solution is to clean out the old cached packages:

    # First try this, which removes the index cache, lock files, unused cache packages, and tarballs.
    conda clean --all

    # If that fails, do this, but be warned:
    # This will remove *all* writable package caches. This option is not included with the --all flag.
    # WARNING: This will break environments with packages installed using symlinks back to the package cache.
    conda clean --force-pkgs-dirs
WDL Files
Using ${} in bash commands (when you don't want Cromwell to interpret it as a variable)
- WDL doesn't have an escape for variable interpolation, so ${} is always interpolated by Cromwell. If we want to change a file suffix, the bash substitution ${file/suffix-old/suffix-new} won't work, because Cromwell will try to replace ${} with some value. This example won't work:

    command {
        fname="scaffolds.fasta"
        # doesn't work because Cromwell tries to interpret this
        # instead of letting the bash shell do it.
        fname1="${fname/fasta/trim.fasta}"
    }
You can use a Cromwell variable to hack around this behavior:
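    # Cromwell expands the dollar variable to "$", leaving a normal bash substitution for the shell.
    String dollar = "$"
    command <<<
        fname='scaffolds.fasta'
        fname1="${dollar}{fname/fasta/trim.fasta}"
    >>>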
Or use basename, perl, sed, or awk:
    command <<<
        fname="scaffolds.fasta"
        fname1="${fname/fasta/trim.fasta}"   # doesn't work
        fname2="`basename $fname .fasta`.trim.fasta"
        fname3=`echo "$fname" | perl -pe 's/fasta$/trim.fasta/'`
        fname4=`echo "$fname" | sed 's|fasta|trim.fasta|'`
    >>>
Using languages other than bash in the command stanza
Here is an example that runs Python code through a heredoc. If you need to pass information between bash and Python, you can use a "tmpfile" (shown here); there may be other ways, but I couldn't get them to work.

    command {
        # you can run bash and python
        echo We can run bash too
        python <<CODE > tmpfile
        import os, sys, glob
        if os.path.isfile('${dbpath}') and glob.glob('${dbpath}' + '*.nin'):
            print('${dbpath}')
        else:
            fname = os.path.basename('${dbpath}')
            cmd = "${cmd}" + " -in %s 1>makeblastdb.log"%fname
            os.symlink('${dbpath}', fname)
            os.system(cmd)
            print(os.path.join(os.getcwd(), fname))
        CODE
        cat tmpfile
    }
Create a hash & then access its contents
Create the hash (Map):

    Map[String, String] outputName = {
        "refseq.mito": "refseq.mito",
        "refseq.bacteria": "refseq.bacteria",
        "assembly": "assembly.fasta"
    }

Access the contents. Scattering over a Map yields one pair per entry; pair.left is the key and pair.right is the value:

    scatter (pair in outputName) {
        String prefix = pair.left
        String value = pair.right
    }
Testing if a file exists
You need to run a task that returns a boolean, then use that boolean in an "if" block.

    ### in the workflow section: ###
    # test if file exists. This task returns a boolean.
    call if_file_exists { input: myfile=reads }

    # now you can make a decision
    if (if_file_exists.answer) {
        call sometask {}
    }

    ### in the task section ###
    task if_file_exists {
        File myfile
        command {
            # -s is true when the file exists and is not empty
            if [[ -s ${myfile} ]]; then
                echo true
            else
                echo false
            fi
        }
        output {
            Boolean answer = read_boolean(stdout())
        }
    }
Remove File Extensions
In a task, do something like this:

    File reference = "DOE_UTEX.polished.t635masked.fasta"
    String reference_bname = basename(reference)
    String a = sub(reference_bname, "\\.\\w+$", "")
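
To see what that regular expression removes, here is a minimal Python sketch of the same two steps. It is only an illustration: the helper name strip_last_extension and the second path are made up, and Python's re module stands in for the regex engine that sub() uses, which should behave the same way for this simple pattern.

```python
import os
import re


def strip_last_extension(path: str) -> str:
    """Take the file name of `path` and drop its final ".ext" suffix."""
    name = os.path.basename(path)          # like WDL basename()
    return re.sub(r"\.\w+$", "", name)     # like WDL sub(name, "\\.\\w+$", "")


# Only the last extension is removed; earlier dots are kept.
print(strip_last_extension("DOE_UTEX.polished.t635masked.fasta"))
print(strip_last_extension("/data/reads.fastq.gz"))  # hypothetical path
```

Note that only one extension is stripped per call, so a name ending in .fastq.gz keeps its .fastq part.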
Scatter within Scatter
Nested scatters are not supported (yet), but you might try a subworkflow to achieve the same effect!
Setting a WDL variable depending on a conditional
The problem is that WDL does not allow a variable with the same name to be set in more than one place. The solution is the select_first function, which returns the first defined value from an array of optional values.

    if (caller == 'VARSCAN') {
        Array[File]? bams_varscan = gatkqc.gatk_bams
    }
    Array[File]? gatkBamList = select_first([bams_varscan, merge_qc_bam_files.merged_bam])

- gatkqc.gatk_bams and merge_qc_bam_files.merged_bam need to be arrays.
- Note that the question marks, which mark the types as optional, are also required.
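
The behavior of select_first can be pictured with a small Python sketch, where None plays the role of an undefined optional value. This is an illustration of the semantics only, not Cromwell's implementation; choose_bams and the file names are hypothetical.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(values: Sequence[Optional[T]]) -> T:
    """Return the first value that is not None; fail if every value is None."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value in the list")


def choose_bams(caller: str, gatk_bams: list, merged_bams: list) -> list:
    # Mirrors the WDL "if" block: bams_varscan is only defined for VARSCAN.
    bams_varscan = gatk_bams if caller == "VARSCAN" else None
    return select_first([bams_varscan, merged_bams])


print(choose_bams("VARSCAN", ["gatk.bam"], ["merged.bam"]))
print(choose_bams("OTHER", ["gatk.bam"], ["merged.bam"]))
```

The fallback value goes last in the list, so it is used only when the conditional branch did not run.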
Subworkflows
- For shared/production workflows, all subworkflows should be added to this repository and referenced by URL (use jaws --list to list available workflows and their URLs). You can also supply your own WDL via the jaws -f option; in that case, any referenced subworkflows must be in the same folder as the main workflow (symlinks are OK).
Inputs JSON file
- JSON format (i.e. inputs file) uses double-quotes, not single-quotes!
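
A quick way to catch quoting mistakes before submitting is to parse the inputs file with any strict JSON parser. The sketch below uses Python's standard json module; the function name and the workflow input key are hypothetical.

```python
import json


def is_valid_inputs_json(text: str) -> bool:
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical inputs: the same content with double and with single quotes.
good = '{"my_workflow.reads": "/path/to/reads.fastq"}'
bad = "{'my_workflow.reads': '/path/to/reads.fastq'}"

print(is_valid_inputs_json(good))
print(is_valid_inputs_json(bad))
```

The single-quoted version is rejected because JSON requires double quotes around both keys and string values.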
Cromwell Specific Issues
Cromwell caching behavior
- Outputs are reused if the inputs and the task command are identical. During development, if you change a script that the task calls, Cromwell will not recognize it as a different version (unless you also changed the filename or the command-line parameters). To avoid reusing stale results, use the "jaws --rm" command to purge old results.
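
The following Python sketch is a deliberately simplified model of why editing a called script does not invalidate the cache. It assumes the cache key depends only on the command text and the declared inputs; Cromwell's real hashing considers more than this, and cache_key, the script names, and the input contents are all hypothetical.

```python
import hashlib


def cache_key(command: str, inputs: dict) -> str:
    """Hash the command text plus each declared input (name and content)."""
    digest = hashlib.sha256()
    digest.update(command.encode())
    for name in sorted(inputs):
        digest.update(name.encode())
        digest.update(inputs[name])
    return digest.hexdigest()


command = "python trim.py reads.fastq"    # hypothetical task command
inputs = {"reads": b"@read1\nACGT\n"}     # hypothetical declared input

key_before = cache_key(command, inputs)

# Editing trim.py on disk changes neither the command text nor the declared
# inputs, so the key is the same and the old outputs would be reused.
key_after_script_edit = cache_key(command, inputs)

# Renaming the script changes the command text, so the key changes.
key_after_rename = cache_key("python trim_v2.py reads.fastq", inputs)

print(key_before == key_after_script_edit)
print(key_before == key_after_rename)
```

In this model, anything that is not part of the command text or the declared inputs is invisible to the cache, which matches the behavior described above.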