RGAugury / Home

Welcome

Welcome to the RGAugury wiki page! Please follow the steps below to install RGAugury and configure your environment variables for the command-line or web version. Make sure the command-line version works before installing the web version.

Prerequisites

If the software and the required Perl or Python modules are installed for the web UI, install all of them in a directory that every user has permission to access.

Essential software

Make sure the programs below are installed according to their installation manuals and that their environment variables are set in .bashrc or .bash_profile. All of these programs and scripts can be downloaded from the corresponding links.

  • BLAST+ package: download the file ending in "x64-linux.tar.gz".

  • Hmmer3: install Hmmer before the pfam_scan package.

  • pfam_scan package: make sure pfam_scan.pl can be run from any directory without a path prefix. Check this link for easier dependency installation.

  • phobius1.01 package: this is a 32-bit program, so a 64-bit Linux operating system needs the 32-bit runtime (libstdc++6:i386) installed to load it. Refer to this thread for further help.

  • ncoils package: a copy is embedded in RGAugury with a minor source-code modification that adapts it to the pipeline, so please do not use the original version.

  • git: optional, for cloning our repository directly. We strongly suggest cloning with git because it preserves the file permissions.

  • jdk: JDK 1.8 is required when using InterProScan versions above v57.

  • interproscan: an HMM-based domain/motif identification package.

  • CViT: a Perl-based package for visualizing genomic linkage features. Make sure all required Perl modules are installed and that CViT runs without errors independently of RGAugury.

Library

Before installing the GD modules, you may need to install the libraries below first.

Modules

RGAugury dependency

  • Log::Log4perl: logs the progress status. Install it with the command "cpan install Log::Log4perl".

  • GD: graphics library.

CViT needs the modules below:

pfam_scan.pl needs the modules below:

  • Moose: an essential module for the pfam_scan package; see the pfam_scan README for installation. Follow this guide for an easier install, or use the command "cpan install Moose".

  • bioperl: install the BioPerl core via CPAN or from its official website.

Check the software and programs installed above and make sure their owner and file permissions are set correctly.

.bashrc or .bash_profile configuration

Below is an example of how I set up my environment variables from scratch on a clean Ubuntu 14.04/16.04 LTS; change the paths to match your system.

  export PATH=$PATH:/home/lipch/bin/phobius1.01  # to specify the path of phobius.pl script and binary.

  export PATH=$PATH:/home/lipch/bin/hmmer3/bin   # binary path

  export PATH=$PATH:/home/lipch/bin/blast/bin    # binary path of blast+ package

  export PATH=$PATH:/home/lipch/RGAugury_pipeline  # this package scripts path

  export PATH=$PATH:/home/lipch/RGAugury_pipeline/coils  #the path to scoils-ht, which is a modified version of coils to adapt to RGAugury pipeline.

  export PATH=$PATH:/home/lipch/database/interproscan-x.xx-xx.0    # download the latest version if you wish. Do not add the "bin" directory under the interproscan directory.

  export PATH=$PATH:/home/lipch/Downloads/PfamScan    #to specify the path for script of pfam_scan.pl

  export PATH=$PATH:/home/lipch/bin/cvit.1.2.1        #to specify the path of cvit.pl in CViT package, make sure cvit.pl can be found by 'which' command.

  export COILSDIR=/home/lipch/RGAugury_pipeline/coils # alternatively, put this single command in a plain file under /etc/profile.d/ (file permission 755) and point it to a directory all users can access; otherwise export it in each user's profile and point it to a directory that user is authorized to access

  export PERL5LIB=/home/lipch/Downloads/PfamScan:$PERL5LIB  #perl module for pfam_scan.pl

  export PFAMDB=/home/lipch/database/pfamdb           # path of the Pfam-A/B HMM database
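
A quick way to confirm that the exports above took effect is a small check script. The sketch below is in Python and is not part of RGAugury. The executable names blastp, hmmscan and interproscan.sh are assumptions based on what those packages normally ship, and check_environment is a hypothetical helper name; adjust the lists to your installation.

```python
import os
import shutil

# Executable names are assumptions based on the packages listed above;
# adjust them to match your installation.
REQUIRED_TOOLS = [
    "blastp",           # BLAST+
    "hmmscan",          # HMMER3
    "pfam_scan.pl",     # pfam_scan
    "phobius.pl",       # Phobius
    "scoils-ht",        # modified coils shipped with RGAugury
    "interproscan.sh",  # InterProScan launcher
    "cvit.pl",          # CViT
    "RGAugury.pl",      # the pipeline itself
]
REQUIRED_VARS = ["COILSDIR", "PERL5LIB", "PFAMDB"]


def check_environment(tools=REQUIRED_TOOLS, variables=REQUIRED_VARS, env=None):
    """Return (missing_tools, missing_vars) for the given environment mapping."""
    env = os.environ if env is None else env
    search_path = env.get("PATH", "")
    # shutil.which mimics the shell's 'which' lookup along the given PATH.
    missing_tools = [t for t in tools if shutil.which(t, path=search_path) is None]
    missing_vars = [v for v in variables if not env.get(v)]
    return missing_tools, missing_vars


if __name__ == "__main__":
    tools, variables = check_environment()
    for name in tools:
        print(f"not found on PATH: {name}")
    for name in variables:
        print(f"environment variable not set: {name}")
    if not tools and not variables:
        print("all tools and variables found")
```

Run it from a fresh login shell so that .bashrc or .bash_profile has been sourced; an empty report suggests the command-line prerequisites are reachable.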

interproscan.properties configuration

Because of our parallelization changes to Tools.pm, the number of InterProScan workers must be set to 1 to avoid exhausting RAM. Note that we have only optimized for a regular workstation with multi-threading support; if you want to take advantage of a grid, please refer to the corresponding InterProScan manual.

number.of.embedded.workers=1
maxnumber.of.embedded.workers=1
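
If you prefer to apply the two settings with a script rather than by hand, the following Python sketch rewrites them in the text of a properties file. set_properties and WORKER_SETTINGS are hypothetical names introduced here for illustration; only the two property keys come from this page.

```python
import re


def set_properties(text, updates):
    """Return properties text with each key in `updates` set to its value.

    Existing `key=value` lines are rewritten in place; keys that are absent
    are appended at the end. Comment lines are left untouched.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        # A key is the first run of non-space, non-'=' characters on a
        # line that does not start with '#'.
        match = re.match(r"\s*([^#=\s][^=\s]*)\s*=", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            out.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in updates.items() if k not in seen)
    return "\n".join(out) + "\n"


WORKER_SETTINGS = {
    "number.of.embedded.workers": "1",
    "maxnumber.of.embedded.workers": "1",
}
```

To use it, read interproscan.properties (assumed here to sit in your InterProScan installation directory), pass its content with WORKER_SETTINGS to set_properties, and write the result back after keeping a backup of the original file.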

Installation of RGAugury pipeline

If git is installed, download the pipeline on a Linux system with the command below.

git clone https://bitbucket.org/yaanlpc/rgaugury.git

Before running the pipeline, make sure the permission of all Perl script files is set to 755. In the RGAugury directory, run:

chmod 755 *.pl

In the coils directory, run:

chmod 755 scoils-ht
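
To double-check the result of the chmod commands, a short Python sketch such as the one below lists files the current user still cannot execute. not_executable is a hypothetical helper name, and the default pattern "*.pl" only mirrors the chmod command above.

```python
import os
from pathlib import Path


def not_executable(directory, pattern="*.pl"):
    """Return sorted names of files matching `pattern` that the current user cannot execute."""
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if p.is_file() and not os.access(p, os.X_OK)
    )
```

For example, not_executable("/home/lipch/RGAugury_pipeline") should return an empty list once the permissions are set, and not_executable("/home/lipch/RGAugury_pipeline/coils", "scoils-ht") covers the coils binary.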

database

  • pfam: follow the installation guide of the pfam_scan package ("Download Pfam data files" section) to prepare the binary files from 'Pfam-A.hmm'. Put all files under the directory /home/user_ID_to_be_replaced_by_yours/database/pfam/, because this path is hard-coded in our scripts. Alternatively, make sure the pfam folder is consistent with the setting of $pfam_index_folder in RGAugury.pl.

  • RGADB: embedded in this package. Do not change its location.

  • panther: if the PANTHER database will be used from either the command line or the web UI, install it according to the instructions of the interproscan package; the interproscan configuration file may also need to be modified.

File formats requirements

FASTA format for protein and DNA

A typical FASTA file is a text-based protein or DNA sequence file in which each record starts with the symbol ">" followed by a unique accession number. In the RGAugury package, the gene ID is used as the header. To reduce file parsing errors, the header must contain nothing but the accession number, as in the example below:

>AT1G52660.1
MGKDFKSLVTRCIYVGKMNDNAKKLKIATEELKDLGNNVMKRVKLCEEQQQMKRLDKVQTWLRQADTVIKEAEEYFLMSSSSSSSGLISSSHKMEKKICKKLKEVQEIKSRGMFEVVAESTGGIGGGAGGGLTIKDSDEQTIGLEAVSGLVWRCLTMENTGIIGLYGVEGVGKTTVLTQVNNRLLQQKANGFDFVLWVFVSKNLNLQKIQDTIREKIGFLDRTWTSKSEEEKAAKIFEILSKRRFALFLDDVWEKVDLVKAGVPPPDAQNRSKIVFTTCSEEVCKEMSAQTKIKVEKLAWERAWDLFKKNVGEDTIKSHPDIAKVAQEVAARCDGLPLALVTIGRAMASKKTPQEWRDALYILSNSPPNFSVLKLLDRN
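
The header rule above can be checked before a run. The Python sketch below is not part of RGAugury; fasta_headers and problem_headers are hypothetical helper names. It reports headers that are empty, contain whitespace (usually a sign of extra description text), or occur more than once.

```python
def fasta_headers(lines):
    """Yield the full header text (without '>') of each FASTA record."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].rstrip("\r\n")


def problem_headers(lines):
    """Return headers that are empty, contain whitespace, or are duplicated."""
    seen = set()
    problems = []
    for header in fasta_headers(lines):
        if not header or any(ch.isspace() for ch in header) or header in seen:
            problems.append(header)
        seen.add(header)
    return problems
```

Because an open file iterates line by line, the function can be called as problem_headers(open("proteins.fasta")), where the file name is only a placeholder.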

gff3

GFF3 stands for Generic Feature Format version 3. It has a fixed number of columns and defines gene structures and their coordinates on the genome. The example at the end of this section defines the major splice forms of genes AT1G01010.1 and AT1G01020.1.

All columns in a gff file are delimited by tabs. Before it is submitted to RGAugury, the gff3 file needs to be pre-processed or standardized for reliable parsing: sort it by chromosome (column 1) in ascending order, then by gene start (column 4) and gene end (column 5), also in ascending order, regardless of gene strand.
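
The sorting rule can be read in more than one way. The Python sketch below is one interpretation, chosen because it reproduces the row order of the example in this section: rows are grouped by the ID in column 9, genes are ordered by column 1 and by the mRNA start and end, and within a gene the mRNA row comes first. Column 1 is compared as plain text, which is also an assumption (Chr10 sorts before Chr2). sort_gff3 is a hypothetical helper, not an RGAugury script.

```python
import re

ID_PATTERN = re.compile(r"ID=([^;]+);")


def sort_gff3(lines):
    """Sort GFF3 rows gene by gene.

    Genes are ordered by (column 1, mRNA start, mRNA end), ignoring strand.
    Within a gene the mRNA row comes first, then the remaining rows by start.
    Comment and blank lines are dropped.
    """
    genes = {}  # gene ID -> list of rows, each row split into columns
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        match = ID_PATTERN.search(fields[8]) if len(fields) == 9 else None
        if match is None:
            raise ValueError(f"expected 9 columns with 'ID=...;' in column 9: {line!r}")
        genes.setdefault(match.group(1), []).append(fields)

    def gene_key(rows):
        # Prefer the mRNA row; fall back to the overall span of the gene's rows.
        mrna = [r for r in rows if r[2] == "mRNA"]
        if mrna:
            return (mrna[0][0], int(mrna[0][3]), int(mrna[0][4]))
        return (rows[0][0], min(int(r[3]) for r in rows), max(int(r[4]) for r in rows))

    ordered = []
    for rows in sorted(genes.values(), key=gene_key):
        rows.sort(key=lambda r: (r[2] != "mRNA", int(r[3]), int(r[4])))
        ordered.extend("\t".join(r) for r in rows)
    return ordered
```

The function takes any iterable of lines and returns the sorted rows without trailing newlines, so the result can be written back with "\n".join(...).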

If you want RGAugury to create a genomic RGA distribution figure, column 1 must follow the pattern 'Chr' + number, such as Chr1 or Chr01. Other names in column 1, such as scaffold_1234 or scaffold20:20-4500, will not produce a genomic RGA distribution figure.

Column 9 holds the gene accession number, which must start with "ID=" and end with a semicolon. The accession number in this column must exactly match the gene accession number in the protein or DNA FASTA file.

So far, only four features are allowed in column 3 (gene feature): mRNA, UTR, CDS and exon. Each gene MUST have an mRNA row.

Chr1    phytozomev10    mRNA    3631    5899    .   +   .   ID=AT1G01010.1;
Chr1    phytozomev10    UTR 3631    3759    .   +   .   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 3760    3913    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 3996    4276    .   +   2   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 4486    4605    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 4706    5095    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 5174    5326    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 5439    5630    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    UTR 5631    5899    .   +   .   ID=AT1G01010.1;
Chr1    phytozomev10    mRNA    5928    8737    .   -   .   ID=AT1G01020.1;
Chr1    phytozomev10    UTR 5928    6263    .   -   .   ID=AT1G01020.1;
Chr1    phytozomev10    UTR 6437    6914    .   -   .   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 6915    7069    .   -   2   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7157    7232    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7384    7450    .   -   1   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7564    7649    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7762    7835    .   -   2   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7942    7987    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 8236    8325    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 8417    8464    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 8571    8666    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    UTR 8667    8737    .   -   .   ID=AT1G01020.1;
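
The column rules described in this section can be checked mechanically before a run. The Python sketch below is not part of RGAugury; check_gff3 is a hypothetical helper that reports rows violating the rules stated here (nine tab-delimited columns, allowed features, "ID=...;" in column 9, one mRNA row per gene) and, optionally, IDs missing from the FASTA file. A column 1 name that does not match 'Chr' + number is reported too, although by the text above it only disables the distribution figure.

```python
import re

ALLOWED_FEATURES = {"mRNA", "UTR", "CDS", "exon"}
CHR_NAME = re.compile(r"Chr\d+$")
ATTRIBUTE = re.compile(r"ID=([^;]+);$")


def check_gff3(lines, fasta_ids=None):
    """Return a list of human-readable problems found in GFF3 `lines`."""
    problems = []
    seen_ids, mrna_ids = set(), set()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(f"line {number}: expected 9 tab-delimited columns, found {len(fields)}")
            continue
        if fields[2] not in ALLOWED_FEATURES:
            problems.append(f"line {number}: unsupported feature {fields[2]!r}")
        if not CHR_NAME.match(fields[0]):
            problems.append(f"line {number}: {fields[0]!r} is not 'Chr' + number (no distribution figure)")
        match = ATTRIBUTE.match(fields[8])
        if match is None:
            problems.append(f"line {number}: column 9 must look like 'ID=<gene ID>;'")
            continue
        seen_ids.add(match.group(1))
        if fields[2] == "mRNA":
            mrna_ids.add(match.group(1))
    # Gene-level checks run after all rows have been read.
    for gene_id in sorted(seen_ids - mrna_ids):
        problems.append(f"{gene_id}: no mRNA row")
    if fasta_ids is not None:
        for gene_id in sorted(seen_ids - set(fasta_ids)):
            problems.append(f"{gene_id}: not present in the FASTA file")
    return problems
```

An empty list means no rule from this section was violated; it does not guarantee that the pipeline accepts the file.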

Usage

The main script RGAugury.pl has six options, but only the input protein FASTA file (-p) is mandatory on the command line. Make sure each sequence title in the FASTA file contains only a gene ID without spaces. Add the RGAugury directory to the PATH environment variable.

Scripts: Resistance Gene Analogs (RGAs) prediction pipeline

 Programmed by Pingchuan Li @ AAFC - Dr. Frank You Lab

Usage :perl RGAugury.pl <options>

arguments: 

        -p           protein fasta file
        -n           corresponding cDNA/CDS nucleotide for -p   (optional)
        -g           genome file in fasta format   (optional)
        -gff         a modified gff3-like file, see below format  (optional)
        -c           cpu or threads number, default = 2
        -pfx         prefix for filename, useful for multiple speices input in same folder   (optional)
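
When RGAugury is driven from another script, the options above can be assembled programmatically. The Python sketch below only builds the argument list and does not run anything; build_command and its parameter names are hypothetical, while the flags are the ones listed in the usage text.

```python
def build_command(protein, cds=None, genome=None, gff=None, cpu=2,
                  prefix=None, script="RGAugury.pl"):
    """Return the argument list for one RGAugury.pl run.

    `script` is the path to RGAugury.pl; pass a full path unless the
    command is started from the pipeline directory.
    """
    argv = ["perl", script, "-p", protein]
    # Optional inputs are added only when a value was supplied.
    for flag, value in (("-n", cds), ("-g", genome), ("-gff", gff), ("-pfx", prefix)):
        if value is not None:
            argv += [flag, value]
    argv += ["-c", str(cpu)]
    return argv
```

The returned list can be passed to subprocess.run(..., check=True). For instance, build_command("proteins.fasta", gff="genes.gff3", cpu=8) yields the arguments for a protein-plus-gff3 run on eight threads; both file names are placeholders.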

Web UI installation introduction

Deploy website on Apache

Run the following configuration steps from the Ubuntu console.

####Step 1: Install Apache and mod_wsgi and allow the ports.

 ~$ sudo apt-get update
 ~$ sudo apt-get install apache2
 ~$ sudo apt-get install libapache2-mod-wsgi

Open "http://localhost:80". If the default Apache page is not displayed, run:

~$ sudo ufw allow in "Apache Full"

####Step 2: Copy and change permission

1 . Create a rgaugury directory under /var/www

~$ sudo mkdir /var/www/rgaugury  

2 . Copy the webUI folder to /var/www/rgaugury/

3 . Change the ownership, for example:

~$ sudo chown -R www-data:www-data /var/www/rgaugury/

####Step 3: Configure the website

1 . Create an 'rga.conf' file under '/etc/apache2/sites-available'. Sample content is shown below; change the values of yourWebsiteURL, yourAppName, UserName and GroupName, and the paths, to match your setup.

   <VirtualHost *:80>
    ServerName yourWebsiteURL

    WSGIDaemonProcess yourAppName user=UserName group=GroupName home=/var/www/rgaugury/webUI/
    WSGIScriptAlias / /var/www/rgaugury/webUI/rga.wsgi

    <Directory /var/www/rgaugury/webUI>
        WSGIProcessGroup yourAppName
        WSGIApplicationGroup %{GLOBAL}
        WSGIScriptReloading On
        Require all granted
    </Directory>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

   </VirtualHost>

2 . Enable the website

~$ sudo a2ensite rga.conf

3 . In the 'webUI/config.py' file, change the settings as follows:

ENABLE_CACHE = True

DEBUG = False 

ENVIR = {
'PATH' : '/usr/bin/:\
/opt/blast-2.2.31+/bin:\
/opt/hmmer3.1b2/bin:\
/opt/pfam_scan:\
/opt/interproscan:\
/opt/RGAugury:\
/opt/RGAugury/coils:\
/opt/cvit:\
/opt/phobius',

 'JAVA_HOME':'/usr/lib/jvm/java-8-oracle',
 'PERL5LIB':'/opt/pfam_scan',
 'COILSDIR':'/opt/RGAugury/coils',
 'PFAMDB':'/data/DATA/pfam.v27'
}


# Change this path to yours.
PIPELINE_HOME='/home/quanx/job/repo/rgaugury'
# Add your Bitbucket account to see the Help page
USER_NAME = 'your bitbucket email'
PASSWORD = 'your bitbucket password'

4 . Change the file permission to 600

~$ sudo chmod 600 /var/www/rgaugury/webUI/config.py
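
Because the web UI takes its tool locations from ENVIR in config.py, a wrong path there tends to surface only when a job fails. The Python sketch below reports entries that do not exist on disk; missing_directories is a hypothetical helper, and it assumes every ENVIR value is a directory or a colon-separated list of directories, as in the sample above.

```python
import os


def missing_directories(envir):
    """Map each ENVIR key to its colon-separated entries that are not existing directories."""
    missing = {}
    for key, value in envir.items():
        absent = [p for p in value.split(":") if p and not os.path.isdir(p)]
        if absent:
            missing[key] = absent
    return missing
```

Run from the webUI folder, a call such as missing_directories(ENVIR) after "from config import ENVIR" should return an empty dictionary when all paths are valid; this assumes config.py can be imported on its own.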

####Step 4: Restart the apache service

~$ sudo service apache2 restart

Essential software

1 . Install python2.7

~$ sudo apt-get install python2.7

2 . Install python-pip

~$ sudo apt-get install python-pip

3 . Install python-dev (needed to install psutil)

~$ sudo apt-get install python-dev

4 . Install the modules (requirements.txt is under the webUI folder)

~$ sudo pip install -r /var/www/rgaugury/webUI/requirements.txt

5 . Install the database

~$ sudo apt-get install sqlite3

6 . Initialize the sqlite database and hand it to the corresponding directory owner, here 'www-data'

~$ sudo ./initializeDB.py
~$ sudo chown -R www-data:www-data /var/www/rgaugury/webUI

Configuration

The configurations below are for developers.

  1. The cpu option can be turned on by setting cpu_toggle to 1 in webUI/config.py.

  2. By default, the Flask internal server port is 7000. It can be changed in the config.py file. If you deploy the website on Apache, the default port is 80.

  3. Stylus installation on Ubuntu (optional): the css files are generated by "stylus". If you want to modify the css, install stylus first, then compile the stylus file to generate the corresponding css file.

~$ sudo apt-get install npm
~$ sudo npm install stylus -g
~$ sudo ln -s /usr/bin/nodejs /usr/bin/node

Web UI Help

For more help on the web version of RGAugury, please click here

The above configurations have been fully tested on Ubuntu 14.04/16.04 LTS.
