Friedrich

Introduction

Friedrich is a framework for bioinformatics application development in Scala. It is especially well suited for heavy data processing in a flexible, experimental setting. A basic genome assembler, the first application built on Friedrich, is included.

A paper on Friedrich was presented at PRIB 2012.

Developed in collaboration between:

(Previously: Centre for Comparative Genomics, Murdoch University and NIBIO)

Installation

Friedrich depends on the following software:

  • Scala (>= 2.9.1)
  • Java (>= 1.6)
  • Prefuse (optional; needed only for the experimental GUI, which is currently not in the stable version. Place it in a directory called lib/.)

In addition, the following software is highly recommended for compiling Friedrich and launching the console:

  • SBT (>= 0.11.2)

Assuming that you choose to use SBT, you can compile Friedrich as follows. Launch SBT in the friedrich directory (where the file build.sbt resides). The first time you launch SBT, it will download some libraries. At the SBT prompt, type 'compile'; if everything works, you should see SBT compiling the sources. Once compilation has finished, you can type 'run (options)' to run the assembler with the options you supply, or 'console' to launch an interactive console with access to the Friedrich and Assembler classes.

GENOME ASSEMBLER

COMMAND LINE

To run the assembler from the command line, use the following command:

scala -J-Xmx4g -classpath target/scala-2.9.1/classes miniasm.Assembler (options)

This assumes that you compiled with sbt into the directory target/scala-2.9.1. The option -J-Xmx4g passes -Xmx4g to the JVM, setting its maximum heap size to 4 GB; adjust the value to suit your data and machine.
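If you want to confirm that the heap setting took effect, the standard Java Runtime API reports the limit the JVM is running with. The object name HeapCheck below is only an illustration and is not part of Friedrich.

```scala
// Illustrative helper; HeapCheck is a hypothetical name, not a Friedrich class.
object HeapCheck {
  // Maximum amount of memory the JVM will attempt to use, in MiB,
  // as reported by java.lang.Runtime.
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println("Max heap: " + maxHeapMiB + " MiB")
}
```

The same expression, Runtime.getRuntime.maxMemory, can also be evaluated directly in the console. The reported value may be slightly below the requested size, depending on the JVM.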

For a complete reference to command line options, consult miniasm/util/ConfigReader.scala. In general, Miniasm, like all Friedrich applications, first looks for a given option on the command line, then consults the XML-based configuration in config.xml, if it exists, and finally falls back to a default value, if one is defined.
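The lookup order described above can be sketched as follows. This is a minimal illustration, not the code in miniasm/util/ConfigReader.scala: the object OptionLookupSketch, its methods, and the sample configuration values are all hypothetical, and the XML file is represented by a plain Map rather than being parsed.

```scala
// Hypothetical sketch of layered option lookup; not Friedrich's ConfigReader.
object OptionLookupSketch {
  // Turn "-name value" pairs from an argument list into a map keyed by name.
  // Arguments that do not form such a pair are skipped.
  def parseArgs(args: Seq[String]): Map[String, String] =
    args.grouped(2).collect {
      case Seq(name, value) if name.startsWith("-") => name.drop(1) -> value
    }.toMap

  // Resolve a key: command line first, then configuration file, then defaults.
  def lookup(key: String,
             commandLine: Map[String, String],
             configFile: Map[String, String],
             defaults: Map[String, String]): Option[String] =
    commandLine.get(key).orElse(configFile.get(key)).orElse(defaults.get(key))

  def main(args: Array[String]): Unit = {
    val cli = parseArgs(Seq("-input", "myReads.fq", "-k", "47"))
    val cfg = Map("k" -> "31", "cutoff" -> "10") // assumed config.xml contents
    val defaults = Map("config" -> "default")
    println(lookup("k", cli, cfg, defaults))        // command line wins over config
    println(lookup("cutoff", cli, cfg, defaults))   // taken from the config map
    println(lookup("minKmers", cli, cfg, defaults)) // not set anywhere
  }
}
```

A lookup that returns None for an option without a default corresponds to the case where the option must be supplied by the user.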

The following options have no defaults and must be supplied:

-k          Kmer size (for example, 31, but much larger values are possible)
-input      Input file in FASTA or FASTQ format (.fa or .fq)
-minKmers   Minimum length of contigs to output, in number of kmers. For example, 100.

The following options are optional:

-cutoff     Cutoff value for coverage. Kmers with less coverage than this are ignored. For example, 10.
-config     Configuration name (must be specified in config.xml). The default configuration is the one called "default".
-pipeline   Pipeline name. The default pipeline name is taken from the configuration being used.
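To make the roles of -k, -cutoff and -minKmers concrete, here is a small sketch of k-mer extraction and coverage filtering. It is an illustration under simplifying assumptions, not the assembler's implementation: the names in KmerSketch are hypothetical, reverse complements are ignored, and a contig is assumed to be a chain of k-mers in which each overlaps the next by k-1 bases.

```scala
// Hypothetical illustration of the -k, -cutoff and -minKmers concepts.
object KmerSketch {
  // All overlapping substrings of length k in a read (empty if the read is shorter than k).
  def kmers(read: String, k: Int): List[String] =
    if (read.length < k) Nil else read.sliding(k).toList

  // Count how often each k-mer occurs across all reads, then keep only
  // those whose coverage is at least the cutoff.
  def solidKmers(reads: Seq[String], k: Int, cutoff: Int): Map[String, Int] =
    reads.flatMap(r => kmers(r, k))
      .groupBy(identity)
      .map { case (kmer, occurrences) => kmer -> occurrences.size }
      .filter { case (_, coverage) => coverage >= cutoff }

  // Length in bases of a chain of numKmers k-mers overlapping by k-1 bases.
  def contigLength(numKmers: Int, k: Int): Int = numKmers + k - 1
}
```

Under that overlap assumption, -k 47 combined with -minKmers 100 corresponds to contigs of at least 100 + 47 - 1 = 146 bases.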

XML CONFIGURATION

When the assembler starts, the file config.xml must be present. It contains a list of configurations and a list of pipelines. By default, the configuration called "default" will be used, and thus by default, the pipeline referenced by this configuration will also be used. The syntax should be self-explanatory. In pipelines, the ordering of phases is important, as phases are executed in the order listed. No other configuration elements are sensitive to ordering.

BASIC USAGE

A basic assembler invocation from the command line might look like this:

scala -J-Xmx4g -classpath target/scala-2.9.1/classes miniasm.Assembler -input myReads.fq -k 47 -minKmers 100

This would read the "default" configuration from config.xml and assemble the reads from myReads.fq with k=47, outputting contigs that are at least 100 kmers long.

To perform the same assembly interactively, first launch a Scala console. We recommend doing this through sbt. Run sbt in the directory where Friedrich's build.sbt is located:

sbt

Then, type

console

This gives you access to the classes in Friedrich and miniasm, assuming everything compiles fine. You can then initialize a data object as follows:

import miniasm._
Assembler.initData("-input myReads.fq -k 47 -minKmers 100")

Assuming that the console bound the object returned by this statement to the name res0, the default assembly pipeline may be launched using

Assembler.defaultPipeline(res0)

For smooth interactive use, it is a good idea to be familiar with the Friedrich/miniasm API. A Scaladoc documentation site is available (stable version, unstable version).

THE FRIEDRICH FRAMEWORK

DEVELOPING A FRIEDRICH APPLICATION

General Friedrich classes are located in the friedrich package. In order to make a minimal Friedrich application, it is necessary to extend PhaseData to define a data type that phases will operate on. BasicPhaseData can be used if no data is to be stored (an unlikely scenario).

It is also necessary to define at least one phase that operates on the data type. Phases extend the Phase trait.

ConfigReader can be extended to read configuration files if desired (see miniasm.util.ConfigReader for an example). If no configurability is desired, it is possible to use the BasicConfiguration class as a substitute.

Finally, create a main application object that extends the FriedrichApp trait. Its main function can either call defaultPipeline() to invoke a default pipeline (which must have been read from a configuration), or construct a pipeline object with Pipeline.fromPhaseNames and run it by invoking runPipeline on the resulting object.
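The overall shape of such an application (a data type, phases that operate on it, and a pipeline built from phase names and run in order) can be sketched in plain Scala. Everything below is a hypothetical stand-in written for illustration: ReadData, SketchPhase, fromNames and run are not the Friedrich API, whose actual traits and signatures are documented in the Scaladoc.

```scala
// Hypothetical stand-ins for the concepts above; the real traits live in the
// friedrich package and may look quite different.
object PipelineSketch {
  // The data that phases operate on (the role played by a PhaseData subclass).
  case class ReadData(reads: List[String])

  // A phase is a named step that transforms the data.
  trait SketchPhase {
    def name: String
    def apply(data: ReadData): ReadData
  }

  // Removes the last base of every read.
  object Trim extends SketchPhase {
    val name = "trim"
    def apply(data: ReadData): ReadData = ReadData(data.reads.map(_.dropRight(1)))
  }

  // Discards reads shorter than four bases.
  object DropShort extends SketchPhase {
    val name = "dropShort"
    def apply(data: ReadData): ReadData = ReadData(data.reads.filter(_.length >= 4))
  }

  // Phases known to this application, indexed by name.
  val registry: Map[String, SketchPhase] =
    List[SketchPhase](Trim, DropShort).map(p => p.name -> p).toMap

  // Resolve names to phases, preserving the listed order.
  // An unknown name throws NoSuchElementException.
  def fromNames(names: Seq[String]): List[SketchPhase] =
    names.toList.map(n => registry(n))

  // Run the phases one after another, feeding each result to the next phase.
  def run(phases: Seq[SketchPhase], data: ReadData): ReadData =
    phases.foldLeft(data)((current, phase) => phase(current))
}
```

Because phases run in the order listed, reordering them can change the result: with the reads "acgt" and "acgta", running trim before dropShort keeps only "acgt", while running dropShort before trim yields "acg" and "acgt".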

Please consult the Scaladoc site (stable, unstable) for the details of the API.