InFusion: a toolkit for fusion gene and chimeric transcript detection from RNA-seq data.

SUMMARY
-------

InFusion can discover and analyze chimeric transcripts and fusion genes from whole transcriptome sequencing data. It is written in C++ and Python.

The software is free for academic use. For detailed information about code licensing, please refer to the LICENSE file.

InFusion uses the SeqAn library 1.4 (http://www.seqan.de).

Refer to the documentation for more details:
https://bitbucket.org/kokonech/infusion/wiki/

BUILDING:
---------

0) Install dependencies

Compile-time dependencies for InFusion:
 - CMake >= 2.8
 - Boost >= 1.40 (will probably work with earlier versions)
 - gcc >= 4.4.3 (C++11 support is required)
 - zlib

Runtime dependencies for InFusion:
 - Python >= 2.6
 - Samtools >= 0.1.18
 - Bowtie2 aligner >= v2.0.2
 - glibc >= 2.14

Bowtie and Samtools are included in the prebuilt binary package.

Additional dependencies (only required for running extra InFusion tools):
 - numpy 1.6.1
 - matplotlib 1.1.1
 - pandas 0.12.0
 - HTSeq 0.5.3p9
 - Biopython 1.58

1) Build code using the command line

The directory of an external Boost installation should be specified, e.g.:

    export BOOST_ROOT=/home/okonechn/tools/boost_1_51_0

It is advised to perform an out-of-source build:

    mkdir -p build/Release
    cd build/Release
    cmake ../../src -DCMAKE_BUILD_TYPE=Release
    make

After the project is built successfully, the source code directory can be added to PATH. Note that the InFusion main script searches for compiled binaries either in the InFusion home directory or in its subfolder build/Release. Alternatively, the path to the compiled binaries can be given with the '--bin-dir' parameter.

2) Build code using QtCreator

QtCreator (>= 2.5) can be used. In QtCreator, open src/CMakeLists.txt. In the wizard, select build/Release or build/Debug as your build destination. By default, InFusion is built in debug mode. To build a release version, use the CMake option -DCMAKE_BUILD_TYPE=Release.

RUNNING:
--------

First, you need to create a reference dataset for your genome.
A reference dataset includes:
 - the reference genome sequence and its index;
 - reference transcriptome sequences and their index;
 - gene annotations in GTF format;
 - repeat regions (optional).

Use the script setup_reference_dataset.py to automatically create a reference dataset for the human genome. This script can reuse existing data from your machine (such as the reference genome or annotations) or download the data directly from the Ensembl FTP server.

The simplest way to launch InFusion is the following:

    ./infusion -1 path_to_reads_1 -2 path_to_reads_2 [configuration_file_path]

To get a complete list of available options, use the help command:

    ./infusion -h

Certain options can be specified both in the configuration file and on the command line. If an option is specified in both places, the command-line value takes precedence over the value from the configuration file.

RUNNING TESTS:
--------------

Tests are run with the difftest.py command, located in the test directory. Example:

    ./difftest.py -t suite.minimal                          # Runs a minimal set of tests
    ./difftest.py -c find_breakpoint_candidates/tests.cfg   # Runs a particular suite

ACKNOWLEDGEMENTS:
-----------------

We are thankful to German Grekhov for testing the software. Special thanks to space aliens for not destroying Earth while the program was under development.