Welcome

Welcome to the RGAugury wiki page! Please follow the steps below to install RGAugury and configure your environment variables for the command-line or web version. Make sure the command-line version works well before installing the web version.

Standard installation

Prerequisites

If the software and the required Perl or Python modules are installed for the web UI, keep in mind that all of them should be installed in a directory that every user has permission to access.

However, if you are able to run Docker, refer to the container section of this wiki for a simpler installation.

Essential software

Make sure the programs below are installed according to their installation manuals and that their environment variables are set in .bashrc or .bash_profile. All of these programs and scripts can be downloaded from the corresponding links.

  • BLAST+ package: download the file ending with the "x64-linux.tar.gz" extension.

  • Hmmer3: install Hmmer before the pfam_scan package.

  • Java: recent InterProScan releases usually need Java 11; make sure it is installed properly.

  • pfam_scan package: make sure pfam_scan.pl can be run from anywhere without a path prefix. Check this link for easier dependency installation.

  • phobius1.01 package: this is a 32-bit program, so a 64-bit Linux operating system needs the 32-bit runtime (libstdc++6:i386) installed to load it. Refer to this thread for further help.

  • ncoils package: a copy is embedded in this package. Its source code was slightly modified to adapt it to the pipeline, so please do not use the original one.

  • git: optional, for cloning our repository directly. We strongly suggest cloning with git because it keeps the file permissions intact.

  • jdk: JDK 1.8 is a required component when using InterProScan releases above v57. Check the InterProScan release notes for the Java version your release needs.

  • interproscan: an HMM-based domain/motif identification package.

  • CViT: a Perl-based package for visualizing genomic linkage features. Be sure all required Perl modules are installed and that CViT runs without errors independently of RGAugury.

Library

Before installing the GD modules, you might need to install the libraries below first.

Modules

RGAugury dependency

  • Log::Log4perl to log progressing status. Use command "cpan install Log::Log4perl" to install

  • GD graphic library.

CViT needs below modules:

Pfam_scan.pl needs below module:

  • Moose: an essential module for the pfam_scan package; see Pfam_scan's README to install it. Follow this guide for an easier install, or use the command "cpan install Moose".

  • bioperl: install the BioPerl core via CPAN or its official website.

Check the software and programs installed above and make sure the owner and file permissions are set correctly for all of them.

.bashrc or .bash_profile configuration

Below is an example of how I set up my environment variables from scratch on a clean Ubuntu 14.04/16.04 LTS. Change the paths to match your own system.

  export PATH=$PATH:/home/lipch/bin/phobius1.01  # to specify the path of phobius.pl script and binary.

  export PATH=$PATH:/home/lipch/bin/hmmer3/bin   # binary path

  export PATH=$PATH:/home/lipch/bin/blast/bin    # binary path of blast+ package

  export PATH=$PATH:/home/lipch/RGAugury_pipeline  # this package scripts path

  export PATH=$PATH:/home/lipch/RGAugury_pipeline/coils  #the path to scoils-ht, which is a modified version of coils to adapt to RGAugury pipeline.

  export PATH=$PATH:/home/lipch/database/interproscan-x.xx-xx.0    #download latest one as your wish. Do not add the path of "bin" under interproscan directory.

  export PATH=$PATH:/home/lipch/Downloads/PfamScan    #to specify the path for script of pfam_scan.pl

  export PATH=$PATH:/home/lipch/bin/cvit.1.2.1        #to specify the path of cvit.pl in CViT package, make sure cvit.pl can be found by 'which' command.

  export COILSDIR=/home/lipch/RGAugury_pipeline/coils:$COILSDIR # alternatively, put this command alone in a plain file under /etc/profile.d/ (file permission 755), pointing to a directory all users can access; otherwise export it in the user's profile and point it to a directory that user is authorized to access

  export PERL5LIB=/home/lipch/Downloads/PfamScan:$PERL5LIB  #perl module for pfam_scan.pl

  export PFAMDB=/home/lipch/database/pfamdb:$PFAMDB           #to specify the hmm pfam-A/B DB path
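After editing the profile and opening a new shell, it can help to confirm that every program resolves from PATH. The function below is a minimal sketch of such a check, not part of RGAugury; `check_tools` is a hypothetical helper name, and the command names in the example call (`blastp`, `hmmscan`, `interproscan.sh`) are assumptions about what each package installs.

```shell
# Report which of the given commands cannot be found on PATH.
# Prints one "missing: NAME" line per absent command and
# returns 1 if at least one command is missing, 0 otherwise.
check_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (command names are assumptions; adjust to your installation):
# check_tools blastp hmmscan phobius.pl pfam_scan.pl scoils-ht \
#             interproscan.sh cvit.pl RGAugury.pl
```

An empty output with exit status 0 means every listed command was found.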

interproscan.properties configuration

Because of the parallel modification in Tools.pm, the number of InterProScan workers needs to be set to 1, which avoids running out of RAM. Be aware that we only optimized for a regular workstation with multi-thread support; if you want to take advantage of a grid, please refer to the corresponding InterProScan manual.

number.of.embedded.workers=1
maxnumber.of.embedded.workers=1
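The two settings above can also be applied from the command line. The sketch below assumes the properties file uses plain `key=value` lines; `set_property` is a hypothetical helper, and the file location in the example call is a placeholder to replace with your own InterProScan directory.

```shell
# Set "key=value" in a properties file: rewrite the line if the key
# already exists, otherwise append it. The original is kept as FILE.bak.
set_property() {
    local file="$1" key="$2" value="$3" pattern
    # Escape dots so they match literally in the regular expressions below.
    pattern="$(printf '%s' "$key" | sed 's/\./\\./g')"
    cp "$file" "$file.bak"
    if grep -q "^${pattern}=" "$file"; then
        sed "s|^${pattern}=.*|${key}=${value}|" "$file.bak" > "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Example call (the path is a placeholder):
# props=/path/to/interproscan/interproscan.properties
# set_property "$props" number.of.embedded.workers 1
# set_property "$props" maxnumber.of.embedded.workers 1
```

Commented-out lines such as `#number.of.embedded.workers=...` are left untouched, so check the file afterwards if it contains them.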

Installation of RGAugury pipeline by git

Download the pipeline with the command below on a Linux system where git is installed.

git clone https://bitbucket.org/yaanlpc/rgaugury.git

Before running the pipeline, make sure the permission of all Perl script files is set to 755. In the RGAugury directory:

chmod 755 *.pl

In the coils directory:

chmod 755 scoils-ht

Also make sure the RGAugury path has been exported to the environment.

Database

  • pfam: follow the installation guide of the pfam_scan package ("Download Pfam data files" section) to prepare the binary files from three input files downloaded from the pfam db website (xfam.org): Pfam-A.hmm, Pfam-A.hmm.dat and active_site.dat. Make sure to put all files under the directory /home/user_ID_to_be_replaced_by_yours/database/pfam/, because this path is hard coded in our scripts. Alternatively, make sure the pfam folder is consistent with the setting of $pfam_index_folder in RGAugury.pl.

  • RGADB: RGADB is embedded in this package. Be sure to keep its location unchanged.

  • panther: if the panther db will be used in either the command line or the web UI, be sure to install it correctly according to the instructions of the interproscan package. The configuration file of interproscan might also need to be modified accordingly.

File formats requirements

FASTA format for protein and DNA

A typical fasta file is a text-based protein or DNA sequence file in which each record starts with the symbol ">" followed by a unique accession number. In the RGAugury package, the gene ID is usually used as the header. See the example below; to reduce file parsing errors, the header contains no supplementary information apart from the accession number:

>AT1G52660.1
MGKDFKSLVTRCIYVGKMNDNAKKLKIATEELKDLGNNVMKRVKLCEEQQQMKRLDKVQTWLRQADTVIKEAEEYFLMSSSSSSSGLISSSHKMEKKICKKLKEVQEIKSRGMFEVVAESTGGIGGGAGGGLTIKDSDEQTIGLEAVSGLVWRCLTMENTGIIGLYGVEGVGKTTVLTQVNNRLLQQKANGFDFVLWVFVSKNLNLQKIQDTIREKIGFLDRTWTSKSEEEKAAKIFEILSKRRFALFLDDVWEKVDLVKAGVPPPDAQNRSKIVFTTCSEEVCKEMSAQTKIKVEKLAWERAWDLFKKNVGEDTIKSHPDIAKVAQEVAARCDGLPLALVTIGRAMASKKTPQEWRDALYILSNSPPNFSVLKLLDRN
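Downloaded fasta files often carry extra annotation after the accession number. The function below is a minimal sketch of one way to trim every header to its first whitespace-delimited token; `clean_fasta_headers` is a hypothetical helper, not a script shipped with RGAugury, and the file names in the example call are placeholders.

```shell
# Keep only the first whitespace-delimited token of every FASTA header
# line; sequence lines pass through unchanged. Reads stdin, writes stdout.
clean_fasta_headers() {
    awk '/^>/ { print $1; next } { print }'
}

# Example call (file names are placeholders):
# clean_fasta_headers < proteins.raw.fas > proteins.fas
```

This assumes the accession number directly follows ">" with no space in between.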

gff3

gff3 is short for Generic Feature Format version 3, a format with a fixed number of columns that defines gene structures and their coordinates in the genome. The example further below defines the major splicing forms AT1G01010.1 and AT1G01020.1.

All columns in a gff file are delimited by tabs. Before submitting it to RGAugury, the gff3 file needs to be pre-processed (standardized) for better parsing: sort by chromosome (column 1) in ascending order, then by gene start (column 4) and end (column 5), also in ascending order, regardless of gene strand.

If you want RGAugury to create a genomic RGA distribution figure, column 1 must follow the nomenclature 'Chr' + number, such as Chr1 or Chr01. Other formats in column 1, such as scaffold_1234 or scaffold20:20-4500, will not produce a genomic RGA distribution figure.

Column 9 holds the gene accession number, which must start with "ID=" and end with a semicolon. The accession number in this column must match exactly the accession number in the protein or DNA fasta file.

So far, only four features are allowed in column 3 (gene feature): mRNA, UTR, CDS and exon. Each gene MUST have an mRNA row.

Chr1    phytozomev10    mRNA    3631    5899    .   +   .   ID=AT1G01010.1;
Chr1    phytozomev10    UTR 3631    3759    .   +   .   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 3760    3913    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 3996    4276    .   +   2   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 4486    4605    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 4706    5095    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 5174    5326    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    CDS 5439    5630    .   +   0   ID=AT1G01010.1;
Chr1    phytozomev10    UTR 5631    5899    .   +   .   ID=AT1G01010.1;
Chr1    phytozomev10    mRNA    5928    8737    .   -   .   ID=AT1G01020.1;
Chr1    phytozomev10    UTR 5928    6263    .   -   .   ID=AT1G01020.1;
Chr1    phytozomev10    UTR 6437    6914    .   -   .   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 6915    7069    .   -   2   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7157    7232    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7384    7450    .   -   1   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7564    7649    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7762    7835    .   -   2   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 7942    7987    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 8236    8325    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 8417    8464    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    CDS 8571    8666    .   -   0   ID=AT1G01020.1;
Chr1    phytozomev10    UTR 8667    8737    .   -   .   ID=AT1G01020.1;
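The format rules for this file can be checked before submission. The function below is a rough sketch, not a validator shipped with RGAugury; `check_gff` is a hypothetical name, and it assumes column 9 holds a single `ID=<accession>;` attribute as in the example above.

```shell
# Check a gff3-like file: nine tab-delimited columns, an allowed feature
# in column 3, column 9 of the form "ID=<accession>;", and one mRNA row
# per accession. Prints one line per problem; exit status 1 if any found.
check_gff() {
    awk -F '\t' '
        /^#/ || NF == 0 { next }    # skip comment and blank lines
        NF != 9 {
            printf "line %d: expected 9 tab-delimited columns, found %d\n", NR, NF
            bad = 1; next
        }
        $3 != "mRNA" && $3 != "UTR" && $3 != "CDS" && $3 != "exon" {
            printf "line %d: feature %s is not allowed\n", NR, $3
            bad = 1
        }
        $9 !~ /^ID=[^;]+;$/ {
            printf "line %d: column 9 must look like ID=<accession>;\n", NR
            bad = 1; next
        }
        {
            id = substr($9, 4, length($9) - 4)    # drop leading ID= and trailing ;
            seen[id] = 1
            if ($3 == "mRNA") has_mrna[id] = 1
        }
        END {
            for (id in seen)
                if (!(id in has_mrna)) { printf "%s: no mRNA row\n", id; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}

# Example call (file name is a placeholder):
# check_gff genes.gff3 && echo "gff3 looks fine"
```

The check does not verify sort order or that accessions match the fasta file.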

Usage

The main script RGAugury.pl has six options, but only the input file is mandatory on the command line. Make sure each sequence title in the fasta file contains only the gene ID, without spaces. Export the RGAugury directory to the PATH environment variable. The first run of the pipeline may take longer than expected, because InterProScan needs to initialize some of its datasets before scanning the database.

Scripts: Resistance Gene Analogs (RGAs) prediction pipeline

 Programmed by Pingchuan Li @ AAFC - Dr. Frank You Lab

Usage :perl RGAugury.pl <options>

arguments: 

        -p           protein fasta file
        -n           corresponding cDNA/CDS nucleotide for -p   (optional)
        -g           genome file in fasta format   (optional)
        -gff         a modified gff3-like file, see below format  (optional)
        -c           cpu or threads number, default = 2
        -pfx         prefix for filename, useful for multiple speices input in same folder   (optional)

Container version of RGAugury installation (Docker and Podman)

Years of feedback from our users showed that the installation is a challenge for users who are not bioinformaticians. To simplify the whole process and shorten the time before they can start using the tool, we wrapped everything mentioned above, except the giant database files, into a container image of RGAugury that can be downloaded from docker.io. Although a native installation gives the best efficiency, a container is the easiest way to deploy the pipeline.

Generally, as long as interproscan and pfamdb are ready on the host, the container takes care of the rest. The brief instructions below should ease the installation.

Steps:

Several key steps are discussed below. Although we have cut 90% of the installation steps for our users, the remaining 10% still relies on them, namely interproscan and the Pfam db.

Installation of Interproscan

Given the size of InterProScan, it is not included in the image on docker.io, so users have to install it themselves. The good news is that, compared with installing the entire package, installing InterProScan and its dependencies is far easier. The path to InterProScan will be needed afterward.

Installation of pfam db.

Simply download the required pfamdb files (those ending with .hmm and .hmm.dat, plus active_site.dat, in v33.1) and install them as described in the 'database' section. To be clear, the pfamscan package itself is not needed; only hmmpress (in Hmmer3) is needed to format the pfam files. Refer to the README of pfamscan regarding the requested DB and data initialization, and write down the path of the pfam db for use with the container.

Installation of Docker container.

In my case, I used the commands below to pull a Docker image from docker.io and start it. No virtual machine is involved, but you still work in an isolated environment without the concern of ruining your Linux system.

1) Download the Docker image, assuming Docker is already installed as a service.

$ docker image pull yaanlpc/rgaugury:2.2

2) Start the Docker container.

$ docker run -it  \
    --mount type=bind,source=/home/pingchuan/docker_project/database/pfam,target=/opt/pfam \
    --mount type=bind,source=/home/pingchuan/docker_project/database/interproscan,target=/opt/interproscan  \
    --mount type=bind,source=/home/pingchuan/docker_project/input,target=/root/input \
    yaanlpc/rgaugury:2.2 /bin/bash

Replace each source value with your own directory path. Note that the --mount value must not contain spaces after the commas. Afterward, you will be inside the container as if in a real Linux OS. The pfam and interproscan host directories are mounted under /opt in the container, and the input directory under /root/input. You just need to focus on running the pipeline.

If you exit the container for whatever reason, you can get back into it. Exiting the shell started by docker run stops the container, so start it again first:

$ docker start {CONTAINERID}
$ docker exec -it {CONTAINERID} /bin/bash

Replace {CONTAINERID} with the ID reported by the command below:

$ docker ps -a

The rgaugury container ID appears in the list. Be sure to note it after you exit the running container, so that you reuse the same container instead of creating duplicates again and again.
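The lookup and re-entry steps above can be wrapped in two small shell functions. This is a sketch under assumptions: `rga_container_id` and `rga_reenter` are hypothetical helper names, and the lookup relies on `docker ps` listing the most recently created container first.

```shell
# Print the ID of the most recently created container based on the given
# image (default: yaanlpc/rgaugury:2.2), whether running or stopped.
rga_container_id() {
    docker ps -a --filter "ancestor=${1:-yaanlpc/rgaugury:2.2}" \
        --format '{{.ID}}' | head -n 1
}

# Start that container if it is stopped, then open a shell inside it.
rga_reenter() {
    local cid
    cid="$(rga_container_id "$1")"
    if [ -z "$cid" ]; then
        echo "no container found for image ${1:-yaanlpc/rgaugury:2.2}" >&2
        return 1
    fi
    docker start "$cid" >/dev/null && docker exec -it "$cid" /bin/bash
}

# Example call:
# rga_reenter
```

With Podman, replacing `docker` by `podman` and passing `docker.io/yaanlpc/rgaugury:2.2` as the image name should behave the same way, though that variant is untested here.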

Installation of PODMAN container.

Podman's command line is nearly identical to Docker's. In most cases, simply replacing 'docker' with 'podman' in a command will work.

Download image

podman pull docker.io/yaanlpc/rgaugury:2.2

The example below assumes podman is already installed. The only difference is that the repository name needs to include the docker.io part.

podman run -it  \
    --mount type=bind,source=/home/pingchuan/docker_project/database/pfam,target=/opt/pfam \
    --mount type=bind,source=/home/pingchuan/docker_project/database/interproscan,target=/opt/interproscan  \
    --mount type=bind,source=/home/pingchuan/docker_project/input,target=/root/input \
    docker.io/yaanlpc/rgaugury:2.2 /bin/bash 

Docker/Podman test run

Once you are connected to the container with the correct database (pfam and interproscan) mounts above, you should be able to list them under /opt in the container, and you can find a sample.fas file under /root/input.

perl -S RGAugury.pl -p sample.fas -c 2 -pfx test

The results will be exported to the host directory, which is /home/pingchuan/docker_project/input in the example above.

Retrieve the data from the docker/podman

An SSH service is integrated in the docker image, and you can use either the scp command or the docker/podman command to copy files from the container to the host. The service below is therefore not essential in most cases, but it might be easier for some users.

$ service ssh start

Find the container's IP address with the command below, so that you can use scp or another command to copy the data to the host.

$ ifconfig

Then run the command below on the host, replacing the destination with a directory of your choice:

scp -r user@containerIP:/path/to/file /path/on/host

In this early container version, the WebUI is not integrated yet; stay tuned.

Web UI installation introduction

Deploy website on Apache

Perform the following configuration from the Ubuntu console.

Step 1: Install Apache and mod_wsgi and allow the ports

 ~$ sudo apt-get update
 ~$ sudo apt-get install apache2
 ~$ sudo apt-get install libapache2-mod-wsgi

Open "http://localhost:80". If the default Apache page is not displayed, run the command below:

~$ sudo ufw allow in "Apache Full"

Step 2: Copy and change permission

1 . Create a rgaugury directory under /var/www

~$ sudo mkdir /var/www/rgaugury  

2 . Copy the webUI folder to /var/www/rgaugury/

3 . Change the ownership, for example:

~$ sudo chown -R www-data:www-data /var/www/rgaugury/

Step 3: Configure the website

1 . Create an 'rga.conf' file under '/etc/apache2/sites-available'. Sample content is as follows; change the values of yourWebsiteURL, yourAppName, UserName and GroupName and the other paths accordingly.

   <VirtualHost *:80>
    ServerName yourWebsiteURL

    WSGIDaemonProcess yourAppName user=UserName group=GroupName home=/var/www/rgaugury/webUI/
    #usually, the user and group can be both valued as www-data

    WSGIScriptAlias / /var/www/rgaugury/webUI/rga.wsgi

    <Directory /var/www/rgaugury/webUI>
        WSGIProcessGroup yourAppName
        WSGIApplicationGroup %{GLOBAL}
        WSGIScriptReloading On
        Require all granted
    </Directory>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

   </VirtualHost>

2 . Enable the website

~$ sudo a2ensite rga.conf

3 . In the 'webUI/config.py' file, change the settings as follows:

ENABLE_CACHE = True

DEBUG = False 

ENVIR = {
'PATH' : '/usr/bin/:\
/opt/blast-2.2.31+/bin:\
/opt/hmmer3.1b2/bin:\
/opt/pfam_scan:\
/opt/interproscan:\
/opt/RGAugury:\
/opt/RGAugury/coils:\
/opt/cvit:\
/opt/phobius',

 'JAVA_HOME':'/usr/lib/jvm/java-8-oracle',
 'PERL5LIB':'/opt/pfam_scan',
 'COILSDIR':'/opt/RGAugury/coils',
 'PFAMDB':'/data/DATA/pfam.v27'
}


#Change the path with yours.
PIPELINE_HOME='/home/quanx/job/repo/rgaugury'
# add your bitbucket account to see Help page
USER_NAME = 'your bitbucket email'
PASSWORD = 'your bitbucket password'

4 . change the file permission to 600

~$ sudo chmod 600 /var/www/rgaugury/webUI/config.py

Step 4: Restart the apache service

~$ sudo service apache2 restart

Essential software

1 . install python2.7

~$ sudo apt-get install python2.7

2 . install python-pip

~$ sudo apt-get install python-pip

3 . install python-dev, which is needed to install psutil

~$ sudo apt-get install python-dev

4 . install modules (requirements.txt is under the webUI folder)

~$ sudo pip install -r /var/www/rgaugury/webUI/requirements.txt

5 . install database

~$ sudo apt-get install sqlite3

6 . initialize the sqlite database and give it to the corresponding directory owner, here 'www-data'

~$ sudo ./initializeDB.py
~$ sudo chown -R www-data:www-data /var/www/rgaugury/webUI

Configuration

The configurations below are for developers.

  1. The cpu option can be turned on by setting cpu_toggle to 1 in webUI/config.py.

  2. By default, the Flask internal server port is 7000. It can be changed in the config.py file. If you deployed the website on Apache, the default port is 80.

  3. Stylus installation on Ubuntu (optional): the css files are generated by "stylus". If you want to modify the css, install stylus first, then compile the stylus file to generate the corresponding css file.

~$ sudo apt-get install npm
~$ sudo npm install stylus -g
~$ sudo ln -s /usr/bin/nodejs /usr/bin/node

Web UI Help

For more help on the web version of RGAugury, please click here.

Above configurations have been fully tested on Ubuntu 14.04/16.04 LTS.
