Welcome!

Introduction

Anduril is an open-source, component-based workflow framework for scientific data analysis, developed at the Systems Biology of Drug Resistance in Cancer Laboratory, University of Helsinki.

Many common pre-processing and downstream analysis steps have been encapsulated in components written in any of the supported languages (R, Matlab, Python, Java). Components are organized in bundles dedicated to specific kinds of data (anima: image processing; flowand: flow cytometry; sequencing: omics data; microarray; tools: general purpose) and are combined into pipelines in a Scala program run by the Anduril engine. Anduril constructs a graph from the pipeline and executes its tasks in parallel, keeping track of the changes and status (pass/fail) of each step to ensure reentrancy.

Installation

The easiest way to get started is to use Docker, but Anduril can also be installed natively on Linux. Follow the instructions below or at http://anduril.org to install Anduril and run your first pipeline.

Docker

If you have Docker installed, you can use an Anduril image to run a shell with Anduril installed. The anduril/full image is about 13.5 GB.

# Start an interactive container from the image:
docker run -ti --rm anduril/full

# Inside the container:
anduril --help

Linux

The Anduril stable repository is 185M on its own. With bundles installed, the size grows to:

  • with tools 673M
  • with anima 736M
  • with microarray 979M
  • with sequencing 1.2G

Ubuntu 18.04

sudo apt-get install -y apt-transport-https gnupg ca-certificates
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
echo deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ | sudo tee /etc/apt/sources.list.d/r-project.list
sudo apt-get update
sudo apt-get install ant mercurial git default-jre default-jdk python-dev r-base-dev curl wget 
curl https://bootstrap.pypa.io/get-pip.py | sudo python
export ANDURIL_HOME=~/anduril
git clone https://bitbucket.org/anduril-dev/anduril --branch stable "$ANDURIL_HOME"
export PATH="$ANDURIL_HOME/bin:$PATH"
anduril build
anduril install tools

For image analysis also do:

anduril install anima

For NGS analysis install microarray and then sequencing:

anduril install microarray
anduril install sequencing

Ubuntu 16.04

sudo apt-get install ant mercurial git openjdk-8-jdk python r-base-dev
export ANDURIL_HOME=~/anduril
git clone https://bitbucket.org/anduril-dev/anduril --branch stable "$ANDURIL_HOME"
export PATH="$ANDURIL_HOME/bin:$PATH"
anduril build
anduril install tools

For image analysis also do:

anduril install anima

For NGS analysis install microarray and then sequencing:

anduril install microarray
anduril install sequencing
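
The export lines in the Ubuntu instructions only affect the current shell session, so ANDURIL_HOME and PATH have to be set again in every new terminal. A minimal sketch of one way to make them persistent follows, assuming a shell that reads a startup file such as ~/.bashrc. The function name persist_anduril_env is hypothetical and not part of Anduril.

```shell
# Append the Anduril environment setup to a shell startup file, at most once.
# $1: startup file to edit (for example ~/.bashrc; an assumption about your shell)
# $2: Anduril install directory (defaults to ~/anduril, as in the steps above)
persist_anduril_env() {
    rc_file=$1
    home_dir=${2:-$HOME/anduril}

    # Do nothing if the file already exports ANDURIL_HOME.
    if [ -f "$rc_file" ] && grep -q '^export ANDURIL_HOME=' "$rc_file"; then
        return 0
    fi

    {
        printf 'export ANDURIL_HOME="%s"\n' "$home_dir"
        # Written literally so that PATH is expanded when the shell starts.
        printf 'export PATH="$ANDURIL_HOME/bin:$PATH"\n'
    } >> "$rc_file"
}

# Example call (not executed here):
# persist_anduril_env "$HOME/.bashrc"
```

Open a new terminal, or source the startup file, for the change to take effect.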

Other Linux

Install the following mandatory dependencies:

  • Java 7 or 8
  • Apache Ant
  • Mercurial
  • Git
  • Python 2.7
  • R 3.0+

Then follow the Ubuntu installation steps above, starting from the export ANDURIL_HOME line.
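
A quick way to see which of the mandatory dependencies are still missing is to look each command up on the PATH. The sketch below is an illustration and not part of Anduril. The function name missing_commands is hypothetical, and the command names in the example call (hg for Mercurial, R for R, python for Python 2.7) are assumptions that may differ on your distribution. It only checks presence, not versions.

```shell
# Print every command from the argument list that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (not executed here; command names are assumptions):
# missing_commands java ant hg git python R
```

If the call prints nothing and returns zero, every listed command was found.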

Test installation

Simple tutorial pipelines (stored in your local folder "tutorial"):

mkdir tutorial
docker run -ti --rm \
    -e USER_ID=$( id -u ) \
    -v $( pwd )/tutorial:/tutorial \
    anduril/full
cd /tutorial
cp -r /anduril/doc/tutorial/examples/* .

# See the different tutorials:
ls

# Run for example CSV processing: 
cd 01-CSVProcessing
./csvProcessor.scala

# See the output files: 
ls result_csvProcessor/output

Sequencing bundle test pipeline:

mkdir analysis
docker run -ti --rm \
    -e USER_ID=$( id -u ) \
    -v $( pwd )/analysis:/analysis \
    anduril/full

cd /analysis
cp /anduril/bundles/sequencing/doc/test/test.scala  .
./test.scala

See the output files in analysis/result_test/output.


To test an image analysis pipeline:

mkdir analysis
docker run -ti --rm \
    -e USER_ID=$( id -u ) \
    -v $( pwd )/analysis:/analysis \
    anduril/full

cd /analysis
cp /anduril/bundles/anima/doc/tutorial/analysis_FOSS.scala  .
./analysis_FOSS.scala

See the results in the folder analysis/result_analysis_FOSS/output.

Worked examples

More pipelines and worked examples for the sequencing bundle are available in the sequencing bundle wiki.

How to find components for my workflows

Check the component documentation. Any keyword can be used in the search function, which returns every component whose name or documentation contains that word. For example, a search for "bam" returns components with bam in the name, such as "Bam2Fastq", as well as STAR, a component that produces alignments in BAM format.

For examples of how to build workflows using these components and functions, check the sequencing bundle wiki.

Update Anduril 1 pipelines to Anduril 2

We recommend changing the file extension of your pipelines from .and to .scala to distinguish the two versions, although this is not strictly necessary. Then follow the steps below.

1.- Add a hashbang.

#!/usr/bin/env anduril

The script starts with #!/usr/bin/env anduril, which indicates that the script is executed using the anduril program. Anduril workflow files are not standalone Scala programs and cannot be executed using the scala executable, although they are syntactically Scala code and can be edited using any Scala editor. Anduril takes your workflow definition as input, and uses it to construct, verify and execute a workflow.

2.- Import any bundles you are going to be using in your pipeline.

import anduril.builtin._
import anduril.microarray._
import anduril.sequencing._
import anduril.anima._
import anduril.tools._
import org.anduril.runtime._

3.- Define your pipeline as a Scala object.

object myPipeline {
  val myCSV = INPUT(path="myFile.csv")
  val renameCol = CSVFilter(in=myCSV,
                            rename="Gene Name=geneName")
}

In short, place your old code between the curly brackets: object name { code }

4.- Add val or var, as appropriate, before your component instances and variables.

Anduril 1 code:

renameCol = CSVFilter(csv=myCSV,
                      rename="Gene Name=geneName")

Anduril 2 code:

val renameCol = CSVFilter(in=myCSV,
                          rename="Gene Name=geneName")

5.- Finally, check whether any inputs or outputs have changed name (as in the previous example, where the input to CSVFilter changed from csv to in). We have standardized the input and output names, so in most components with only one input and one output these are named in and out respectively. Parameter names have also been changed where they overlapped with the name of an input or output. Please check the component documentation.

Whole pipeline:

#!/usr/bin/env anduril

import anduril.builtin._
import anduril.microarray._
import anduril.sequencing._
import anduril.anima._
import anduril.tools._
import org.anduril.runtime._

object myPipeline {
  val myCSV = INPUT(path="myFile.csv")
  val renameCol = CSVFilter(in=myCSV,
                            rename="Gene Name=geneName")
}
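
The mechanical part of the migration (the new extension and the hashbang from step 1) can be scripted. The helper below is a sketch under stated assumptions, not an official Anduril tool. The function name and_to_scala is hypothetical. It copies the old file, prepends the hashbang, and sets the executable bit so the result can be started as ./name.scala. The imports, the object wrapper, val/var and renamed inputs and outputs (steps 2 to 5) still have to be edited by hand.

```shell
# Copy an Anduril 1 workflow (*.and) to a *.scala file that starts with the
# Anduril 2 hashbang. The original file is left untouched.
# Prints the path of the new file; fails if that file already exists.
and_to_scala() {
    src=$1
    dst=${src%.and}.scala

    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi

    {
        # Hashbang line followed by an empty line, then the old code as-is.
        printf '%s\n\n' '#!/usr/bin/env anduril'
        cat "$src"
    } > "$dst"

    # A script launched through its hashbang needs the executable bit.
    chmod +x "$dst"
    echo "$dst"
}

# Example call (not executed here; the file name is a placeholder):
# and_to_scala myPipeline.and
```

Keeping the original .and file makes it easy to compare the two versions while finishing the remaining steps.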