
Software for Polymorphism Identification Regulating Expression (SPIRE) is a pipeline for expression quantitative trait locus (eQTL) analysis, that is, QTL analysis using expression data as phenotypes. It provides a framework for analyzing eQTLs with different univariate and multivariate methods and integrates useful tools for pre- and post-analysis of the results. SPIRE is optimized for multithreading and distributed computing to cope with Next-Generation Sequencing data analysis.

Below is an example of an analysis performed on two strains of C. elegans.

For the full documentation please refer to documentation_full_spire_right.pdf.


Prerequisites

  • Perl
  • Python
  • R (required packages: methods, MASS, nnet, limma, car, edgeR, BiocGenerics, Biobase, affy, gplots, qtl, eqtl, snow, amap, randomForest, foreach, doSNOW, maptools, RColorBrewer, iBMQ, PEER)

Download and Installation

The software is available at Bitbucket (https://bitbucket.org/bereste/spire).

The easiest way to obtain the program is to clone the git repository:

$ git clone git@bitbucket.org:bereste/spire.git

To be able to access the pipeline globally add the bin/ directory to your PATH variable:

### SPIRE
export PATH=$PATH:<path-to-spire>/bin/

The following required software and modules are already included in the archive:

  • far2 (Flexible Adapter Remover)
  • fastx (read filtering)
  • microRazorS (read mapping, 64-bit version included)
  • RazorS3.2 (read mapping, precompiled 64-bit version included)
  • The following parts of the mirDeep software package:
    • fastq2fasta.pl (converts fastq to fasta format)
    • collapse_reads.pl (collapses reads in the fasta file to make each sequence entry unique)
  • PEER (Data decomposition, normalization method)

Edit your ~/.bashrc file in the following way:

### SPIRE
export PATH=$PATH:<path-to-spire>/bin/
### Software for SPIRE
export PATH=$PATH:<path-to-spire>/spec_bin/micro_razorS/
export PATH=$PATH:<path-to-spire>/spec_bin/fastx/bin/
export PATH=$PATH:<path-to-spire>/spec_bin/far2/
export PATH=$PATH:<path-to-spire>/spec_bin/fastq2fasta/
export PATH=$PATH:<path-to-spire>/spec_bin/collapse_reads/
### far2: set the library path so that libtbb.so.2 can be found
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path-to-spire>/spec_bin/far2/
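
Before relying on the new PATH entries, it can help to confirm that the directories they point to actually exist. The following is a minimal sketch, not part of SPIRE: the function name spire_check_dirs and the idea of passing the installation root as an argument are assumptions made for illustration; the directory names are copied from the export lines above.

```shell
#!/usr/bin/env bash
# Sketch (not shipped with SPIRE): report which of the directories used in
# the export lines above are missing below a given installation root.
# "spire_check_dirs" is a hypothetical helper name.

spire_check_dirs() {
    local root="$1"
    local missing=0
    local sub
    for sub in bin \
               spec_bin/micro_razorS \
               spec_bin/fastx/bin \
               spec_bin/far2 \
               spec_bin/fastq2fasta \
               spec_bin/collapse_reads; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub"
            missing=1
        fi
    done
    # Return 0 when every directory exists, 1 otherwise.
    return "$missing"
}

# Usage (replace the placeholder with your installation location):
#   spire_check_dirs "<path-to-spire>"
```

The function only tests for directories; it does not verify that the bundled binaries run on your system.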

Execution

All of the scripts are located in the <path-to-spire>/bin/ folder.

The easiest way to execute SPIRE is to run the bash script spire_main.sh (in the bin/ folder).

$ spire_main.sh

(IMPORTANT) Every run requires an options file.

An example file (options_file_base.txt) with the default values is located in <path-to-spire>/. The options file contains all of the settings; modify them according to the SPIRE modules you want to run.

It is possible to specify the following options:

  • -h Show help message.
  • -x (!) Define the path to the options file. Make sure that all of the options are set correctly.

  • -m (SPECIAL) Use two libraries. Use two closely related sets of genes instead of one (e.g. two different strains of one species). In this case the read mapping is performed separately for each library, producing individual expression values for both libraries. Further options can be set in the options file.
  • -k (SPECIAL) Two gene libraries, preselection already done. If you use two gene libraries (-m) and the preselection of the expression values for all genes from both libraries has already been done, provide the file with the results.

  • -n (SPECIAL) Selective Normalization. Exclude one or more short-read files from the normalization (can be defined in the config file).

  • -o (SPECIAL) SNP Tagging. Select TagSNPs by clustering the genotype markers on each chromosome depending on the genotype across the samples. Speeds up the eQTL mapping.

  • -s (SPECIAL) Separate chromosomes. The genotype data will be processed one chromosome at a time. This will speed up the R processing of the data and the eQTL calculation.

  • -v (SPECIAL) Pre-calculated R-Object. If you already have a pre-calculated R-object (R/QTL package) with the dataset you can use it for the eQTL mapping.

  • -r (SPECIAL) RIL. Select if your dataset consists of Recombinant Inbred Lines (RIL).

  • -w (SPECIAL) Gene selection for eQTL. If only certain genes need to be tested for eQTLs, supply a list of their names, one name per line.


  • -t (DEBUGGING) If the clustering has already been run, skip it and use the previously generated output and the defined R object.
  • -u (DEBUGGING) Multivariate RF method: if the clustering has already been run, use the previously generated output. Provide the R object with the values for the p-value correction.

  • -a (PREPROCESSING) Read filtering. Read quality filtering (fastx). Adapter clipping (far2). All detailed options can be set in the options file.
    Necessary input data: Expression data.

  • -b (PREPROCESSING) Barcode splitting. Multiplexed reads are separated according to their respective barcode (requires -a).
    Necessary input data: Expression data.


  • -d (MAIN) Read coverage. Generates a read coverage file for all genes present in the given library.
    Necessary input data: Expression data, Gene Sequences.

  • -e (MAIN) Normalization. Normalization of the read count.
    Necessary input data: Expression data.

  • -f (MAIN) Further Statistical plots (e.g. MA_plot).

  • -g (MAIN) eQTL analysis/mapping. (Please adjust the settings for this function inside the options file).
    Necessary input data: Normalized expression data, Gene Annotations, Genotypes.

  • -q (MAIN) External methods. Include external eQTL-mapping methods (see below).

  • -i (MAIN) miRNA target genes. Plots the expression of the target genes of miRNAs (non-generic).

  • -j (MAIN) Hotspot. Calculates hotspots of significant QTLs and classifies the QTLs found as cis or trans.
    Necessary input data: eQTL mapping results, Gene annotations.


  • -q Include external eQTL mapping methods.
    The pipeline can use a method supplied by the user for the eQTL mapping. To use it, follow these steps:

    • A bash (.sh) script file must be supplied and has to be present in the script_dir/ folder.

    • In the script_dir/ the following file has to be present: external_eqtl_methods.txt.

    • SPIRE parses this file for the name of each algorithm's script file and the name of the algorithm. The format is as follows (tab-separated, one algorithm per line):

      script_name1.sh    <name-of-algorithm1>
      script_name2.sh    <name-of-algorithm2>
      .
      .
      .
      script_name_n.sh   <name-of-algorithmN>
      

    The methods will then be executed.
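
A malformed external_eqtl_methods.txt or a missing script is easier to catch before starting a run. The sketch below reads the tab-separated file described above and checks that every listed script is present in the same folder. It is an illustration only and not SPIRE's own parser: the function name check_external_methods and its output messages are assumptions.

```shell
#!/usr/bin/env bash
# Sketch (not shipped with SPIRE): validate external_eqtl_methods.txt.
# Each line is expected to hold "<script file><TAB><algorithm name>".
# "check_external_methods" is a hypothetical helper name.

check_external_methods() {
    local script_dir="$1"
    local list="$script_dir/external_eqtl_methods.txt"
    local status=0
    local tab script name
    tab="$(printf '\t')"

    if [ ! -f "$list" ]; then
        echo "missing: $list"
        return 1
    fi

    # Split each line on tabs; the "|| [ -n ... ]" part also handles a
    # last line that lacks a trailing newline.
    while IFS="$tab" read -r script name || [ -n "$script" ]; do
        if [ -z "$script" ]; then
            continue    # skip blank lines
        fi
        if [ -f "$script_dir/$script" ]; then
            echo "ok: $name ($script)"
        else
            echo "missing script for $name: $script"
            status=1
        fi
    done < "$list"

    return "$status"
}

# Usage:
#   check_external_methods script_dir/
```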


Output

Each time SPIRE starts, a temporary folder with a timestamp is created (e.g. output_dir_tmp_2013-12-05-155531/). The intermediate files are stored there and can be accessed for debugging purposes. The main output files are stored in the directory specified by the user in the options file. The output folder also contains a file named output_process_{time_stamp}.txt, which documents the working process of SPIRE. The timestamp corresponds to the one in the name of the temporary folder.
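
Because the temporary folder and the process log share one timestamp, the log file name can be derived from the folder name. The following sketch assumes that the timestamp is everything after "_tmp_" in the folder name, as in the example folder name above; the function name spire_log_for_tmp is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (not shipped with SPIRE): derive the process log name from the
# name of a temporary folder. Assumes the timestamp follows "_tmp_".
# "spire_log_for_tmp" is a hypothetical helper name.

spire_log_for_tmp() {
    local tmp_dir="${1%/}"              # drop a trailing slash, if any
    local stamp

    case "$tmp_dir" in
        *_tmp_*) ;;
        *) echo "no _tmp_ marker in: $tmp_dir" >&2
           return 1 ;;
    esac

    stamp="${tmp_dir##*_tmp_}"          # keep only what follows "_tmp_"
    echo "output_process_${stamp}.txt"
}

# Usage:
#   spire_log_for_tmp "output_dir_tmp_2013-12-05-155531/"
```

The function prints only the file name; the log itself is located in the output directory set in the options file.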


Examples

Here is an example command that runs SPIRE with the preprocessing and main modules enabled and with SNP tagging via clustering (-o). The -r option is required because the example dataset is derived from recombinant inbred lines (RIL). The options file is options.txt.

spire_main.sh -x options.txt -o -r -a -b -d -e -f -g -j
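
Since every run requires an options file, a thin wrapper can catch the two most common setup problems before the pipeline starts: a missing options file and spire_main.sh not being on the PATH. This is a minimal sketch under those assumptions; the wrapper name run_spire is hypothetical, and it only passes the given flags through to spire_main.sh unchanged.

```shell
#!/usr/bin/env bash
# Sketch (not shipped with SPIRE): check the basic setup, then run the
# pipeline. "run_spire" is a hypothetical wrapper name.

run_spire() {
    local options="$1"
    shift    # the remaining arguments are passed through to spire_main.sh

    if [ ! -f "$options" ]; then
        echo "options file not found: $options" >&2
        return 1
    fi
    if ! command -v spire_main.sh >/dev/null 2>&1; then
        echo "spire_main.sh is not on the PATH" >&2
        return 1
    fi

    spire_main.sh -x "$options" "$@"
}

# Usage, equivalent to the example command above:
#   run_spire options.txt -o -r -a -b -d -e -f -g -j
```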

The folder test_dataset/ contains some example datasets.


Contacts

  • Ivan Kel (ivan.kel@itb.cnr.it)
    Institute for Biomedical Technology – National Research Council of Italy.
  • Ivan Merelli (ivan.merelli@itb.cnr.it)
    Institute for Biomedical Technology – National Research Council of Italy.