InFusion: a toolkit for fusion gene and chimeric transcript detection 
from RNA-seq data.

SUMMARY
-------

InFusion can discover and analyze chimeric transcripts and fusion genes 
from whole-transcriptome sequencing data.

It is written in C++ and Python. 

The software is free for academic use. 
For detailed licensing information, please refer to the LICENSE file.

InFusion uses SeqAn library 1.4 (http://www.seqan.de).

Refer to the documentation for more details:
https://bitbucket.org/kokonech/infusion/wiki/


BUILDING:
---------

0) Install dependencies

Compile-time dependencies for InFusion:

-CMake >= 2.8
-Boost >= 1.40 (may also work with earlier versions)
-gcc >= 4.4.3  (requires C++11 support)
-zlib 

Runtime dependencies for InFusion:
-Python >= 2.6
-Samtools >= 0.1.18
-Bowtie2 aligner >= v2.0.2
-glibc >= 2.14

Bowtie and Samtools are included in the prebuilt binary package.
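When using a source build rather than the prebuilt package, it can help to confirm that the runtime tools are reachable before launching InFusion. The following is a minimal Python 3 sketch, not part of InFusion; it assumes the executables are named "samtools" and "bowtie2" and only checks that they are on PATH, not their versions.

```python
import shutil

# Assumed executable names for the runtime tools listed above.
RUNTIME_TOOLS = ["samtools", "bowtie2"]


def missing_tools(names=RUNTIME_TOOLS):
    """Return the names of executables that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Calling missing_tools() returns an empty list when both tools are found; any names it returns need to be installed or added to PATH.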

Additional dependencies (only required for running extra InFusion tools):

-numpy 1.6.1
-matplotlib 1.1.1
-pandas 0.12.0
-HTSeq 0.5.3p9
-Biopython 1.58
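The optional Python packages above can be checked in a similar way. This Python 3 sketch is an illustration only: it tests whether each module is importable and does not compare versions. Note that Biopython is imported as "Bio" and HTSeq as "HTSeq".

```python
import importlib.util

# Top-level import names of the optional packages listed above.
OPTIONAL_MODULES = ["numpy", "matplotlib", "pandas", "HTSeq", "Bio"]


def missing_modules(names=OPTIONAL_MODULES):
    """Return the module names that cannot be imported by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```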

1) Build the code from the command line

The directory of an external Boost installation should be specified, e.g.:
export BOOST_ROOT=/home/okonechn/tools/boost_1_51_0


An out-of-source build is advised:

mkdir -p build/Release
cd build/Release 
cmake ../../src -DCMAKE_BUILD_TYPE=Release
make

After the project has been built successfully, the source code directory can
be added to PATH.

Note that the main InFusion script searches for compiled binaries either in 
the InFusion home directory or in its build/Release subfolder. Alternatively, 
the path to the compiled binaries can be given with the '--bin-dir' parameter.
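The lookup described above can be pictured with the sketch below. It is a hypothetical illustration, not InFusion's actual code: the README does not state the precedence between the locations, so the order used here (the '--bin-dir' value first, then the home directory, then build/Release) is an assumption, and the function name find_binary is invented for this example.

```python
import os


def find_binary(name, infusion_home, bin_dir=None):
    """Return the first existing path to `name`, or None if it is not found.

    Assumed search order (not confirmed by the README): the --bin-dir value
    if given, then the InFusion home directory, then its build/Release subfolder.
    """
    candidates = []
    if bin_dir is not None:
        candidates.append(bin_dir)
    candidates.append(infusion_home)
    candidates.append(os.path.join(infusion_home, "build", "Release"))
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```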

2) Build the code using QtCreator

QtCreator (>= 2.5) can be used.
In QtCreator, open src/CMakeLists.txt.

In the wizard, select build/Release or build/Debug as your 
build destination. By default, InFusion is built in debug mode.
To build a release version, use the CMake option -DCMAKE_BUILD_TYPE=Release.

RUNNING:
--------

First, you need to create a reference dataset for your genome. 

The reference dataset includes: 
- reference genome sequence and its index; 
- reference transcriptome sequences and their index;
- gene annotations in GTF format;
- repeat regions (optional)
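As a rough illustration, the components above can be modelled as a small record whose files are checked before a run. This is a hypothetical Python 3 sketch: the class and field names are invented here, and it checks only the sequence, annotation and repeat files, not the indexes, whose file layout is not described in this README.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferenceDataset:
    """Hypothetical description of the reference dataset components listed above."""
    genome_fasta: str
    transcriptome_fasta: str
    annotation_gtf: str
    repeats: Optional[str] = None  # repeat regions are optional

    def missing_files(self):
        """Return the given paths that do not exist (indexes are not checked)."""
        paths = [self.genome_fasta, self.transcriptome_fasta, self.annotation_gtf]
        if self.repeats is not None:
            paths.append(self.repeats)
        return [p for p in paths if not os.path.exists(p)]
```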

Use the script setup_reference_dataset.py to automatically create a reference
dataset for the human genome. The script can reuse existing data on your
machine (such as the reference genome or annotations) or download the data
directly from the Ensembl FTP server.

The simplest way to launch InFusion is the following:
./infusion -1 path_to_reads_1 -2 path_to_reads_2 [configuration_file_path]
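When InFusion is driven from another Python script, the same invocation can be assembled as an argument list. This Python 3 sketch only uses the options shown above; the helper name infusion_command and the default executable path "./infusion" are assumptions for illustration.

```python
def infusion_command(reads_1, reads_2, config_path=None, executable="./infusion"):
    """Build the argument list for the basic InFusion invocation shown above."""
    cmd = [executable, "-1", reads_1, "-2", reads_2]
    if config_path is not None:
        # The configuration file is an optional positional argument.
        cmd.append(config_path)
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.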

To get a complete list of available options, use the help option:
./infusion -h

Certain options can be specified both in the configuration file and on the command 
line. If an option is given in both places, the command-line value overrides 
the value from the configuration file.
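The precedence rule above amounts to a simple merge in which command-line values win. The Python 3 sketch below is illustrative only: it assumes both sources have already been parsed into dicts and treats a command-line value of None as "not given"; it does not reflect InFusion's actual configuration format or option names.

```python
def effective_options(config_options, cli_options):
    """Merge options so that command-line values override config-file values.

    Both arguments are plain dicts; a CLI value of None means "not given".
    """
    merged = dict(config_options)
    for key, value in cli_options.items():
        if value is not None:
            merged[key] = value
    return merged
```

For example, with a hypothetical option name, effective_options({"threads": 2}, {"threads": 8}) yields {"threads": 8}.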

RUNNING TESTS:
--------------

Tests can be run with the difftest.py script, which is located in the test
directory.
Examples:

./difftest.py -t suite.minimal # Run minimal set of tests

./difftest.py -c find_breakpoint_candidates/tests.cfg # This runs a particular suite


ACKNOWLEDGEMENTS:
-----------------

We are thankful to German Grekhov for testing the software.


Special thanks to space aliens for not destroying Earth while the program 
was under development.