Welcome
Welcome to our RGAugury wiki page! Please follow the steps below to install RGAugury and configure your environment variables for the command-line or web version. Make sure the command-line version works well before installing the web version.
Standard installation
Requirements
If you are installing the software and the required Perl or Python modules for the web UI, keep in mind that all of them must be installed in a directory that every user has permission to access.
However, if you can run Docker, refer to the Docker section of this wiki for a much simpler installation.
Essential software
Make sure the programs below are installed correctly according to their installation manuals, and set up their environment variables in .bashrc or .bash_profile. All of these programs and scripts can be downloaded from the corresponding links.
- BLAST+ package: download the file ending with "x64-linux.tar.gz".
- HMMER3: install HMMER before the pfam_scan package.
- Java: new InterProScan releases usually need Java 11; make sure it is installed properly.
- pfam_scan package: make sure pfam_scan.pl can be run directly from anywhere without a path prefix. Check this link for easier dependency installation.
- phobius 1.01 package: this is a 32-bit program, so a 64-bit Linux system needs the 32-bit runtime (libstdc++6:i386) installed to load it. Refer to this thread for further help.
- ncoils: ncoils is embedded in this package with a minor source-code modification that adapts it to the pipeline, so please do not use the original version.
- git (optional): lets you clone our repository directly. We strongly suggest cloning the repository with git so that the file permissions are preserved correctly.
- JDK: JDK 1.8 is required when using InterProScan releases after v57; the newest releases need Java 11 instead, as noted in the Java entry above.
- InterProScan: an HMM-based domain/motif identification package.
- CViT: a Perl-based package for visualizing genomic linkage features. Make sure all required Perl modules are installed and that CViT runs without errors on its own, independent of RGAugury.
Library
Before installing the GD module, you might need to install the libraries below first.
Modules
RGAugury dependency
- Log::Log4perl, used to log progress status. Install it with the command "cpan Log::Log4perl".
- GD graphics library.
CViT needs the modules below:
pfam_scan.pl needs the modules below:
- Moose: an essential module for the pfam_scan package; see the pfam_scan README for installation. Following this guide makes installation easier. Alternatively, use the command "cpan Moose".
- BioPerl: install the BioPerl core via CPAN or from its official website.
Check all the software and programs installed above and make sure each of them has the correct owner and file permissions.
.bashrc or .bash_profile configuration
Below is an example of how I set up my environment variables from scratch on a clean Ubuntu 14.04/16.04 LTS; change the paths to match your system.
export PATH=$PATH:/home/lipch/bin/phobius1.01 # path of the phobius.pl script and binary
export PATH=$PATH:/home/lipch/bin/hmmer3/bin # HMMER3 binaries
export PATH=$PATH:/home/lipch/bin/blast/bin # BLAST+ binaries
export PATH=$PATH:/home/lipch/RGAugury_pipeline # RGAugury scripts
export PATH=$PATH:/home/lipch/RGAugury_pipeline/coils # path to scoils-ht, a version of coils modified for the RGAugury pipeline
export PATH=$PATH:/home/lipch/database/interproscan-x.xx-xx.0 # use whichever release you like; do not add the "bin" directory under the InterProScan directory
export PATH=$PATH:/home/lipch/Downloads/PfamScan # directory containing pfam_scan.pl
export PATH=$PATH:/home/lipch/bin/cvit.1.2.1 # directory containing cvit.pl in the CViT package; make sure 'which cvit.pl' finds it
export COILSDIR=/home/lipch/RGAugury_pipeline/coils:$COILSDIR # for all users, put only this line (pointing to a directory every user can access) in a file under /etc/profile.d/ with permission 755; otherwise export it in your own profile and point it to a directory you can access
export PERL5LIB=/home/lipch/Downloads/PfamScan:$PERL5LIB # Perl modules for pfam_scan.pl
export PFAMDB=/home/lipch/database/pfamdb:$PFAMDB # path to the Pfam-A/B HMM database
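A quick way to confirm that the exports above took effect is to look up each tool on PATH. The Python 3 sketch below does this with `shutil.which`; the executable names in `REQUIRED_TOOLS` (for example `hmmscan`, `blastp`, `interproscan.sh`) are assumptions about what your installation provides, so trim or extend the list to match.

```python
import os
import shutil

# Assumed executable names; adjust to match what your installation provides.
REQUIRED_TOOLS = [
    "blastp",           # BLAST+
    "hmmscan",          # HMMER3
    "hmmpress",         # HMMER3
    "phobius.pl",       # Phobius
    "pfam_scan.pl",     # PfamScan
    "interproscan.sh",  # InterProScan
    "cvit.pl",          # CViT
    "scoils-ht",        # modified coils shipped with RGAugury
]


def missing_tools(tools, path=None):
    """Return the tools that cannot be found as executables.

    `path` defaults to the current PATH; pass an explicit string
    to test a different search path.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


def missing_env_vars(names=("COILSDIR", "PERL5LIB", "PFAMDB")):
    """Return the environment variables from `names` that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"not on PATH: {tool}")
    for name in missing_env_vars():
        print(f"not set: {name}")
```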
interproscan.properties configuration
Because Tools.pm was modified to run jobs in parallel, set the number of InterProScan workers to 1 in interproscan.properties to avoid exhausting RAM. Be aware that we only optimized the pipeline for regular multi-threaded workstations; if you want to take advantage of a grid, refer to the InterProScan manual.
number.of.embedded.workers=1
maxnumber.of.embedded.workers=1
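If you prefer to script this change rather than edit interproscan.properties by hand, a minimal Python 3 sketch like the one below can rewrite the two worker settings. It assumes simple `key=value` lines and is not a full Java properties parser (it ignores line continuations and escapes); the helper names are hypothetical.

```python
import re


def set_properties(text, updates):
    """Return `text` (contents of a .properties file) with each key in
    `updates` set to its new value.

    Existing key lines are rewritten in place; missing keys are appended.
    Blank lines and comments ('#' or '!') are left untouched.
    """
    lines = text.splitlines()
    remaining = dict(updates)
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue  # blank line or comment
        key = re.split(r"\s*[=:]\s*", stripped, maxsplit=1)[0]
        if key in remaining:
            lines[i] = f"{key}={remaining.pop(key)}"
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


def limit_interproscan_workers(path):
    """Hypothetical helper: set both worker limits to 1 in the given file."""
    with open(path) as fh:
        text = fh.read()
    updated = set_properties(text, {
        "number.of.embedded.workers": 1,
        "maxnumber.of.embedded.workers": 1,
    })
    with open(path, "w") as fh:
        fh.write(updated)
```

Keep a backup of the original interproscan.properties before running `limit_interproscan_workers` on it.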
Installation of RGAugury pipeline by git
If git is installed, download the pipeline on Linux with the command below.
git clone https://bitbucket.org/yaanlpc/rgaugury.git
Before running the pipeline, change the permissions of all Perl scripts and of scoils-ht to 755. In the RGAugury directory, run:
chmod 755 *.pl
chmod 755 coils/scoils-ht
Also make sure the RGAugury path has been exported to your environment (PATH).
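The same permission changes can be applied from Python 3, which is handy when scripting a fresh install. This is a sketch with hypothetical helper names; it assumes `scoils-ht` sits in the `coils/` subdirectory of the RGAugury checkout, as the PATH and COILSDIR exports earlier suggest.

```python
import os
import stat
from pathlib import Path


def make_pipeline_executable(rga_dir):
    """Set mode 755 on RGAugury's *.pl scripts and on coils/scoils-ht.

    Assumes scoils-ht lives in the coils/ subdirectory of the checkout.
    Returns the list of paths whose mode was changed.
    """
    root = Path(rga_dir)
    targets = sorted(root.glob("*.pl"))
    scoils = root / "coils" / "scoils-ht"
    if scoils.exists():
        targets.append(scoils)
    for target in targets:
        os.chmod(target, 0o755)  # rwxr-xr-x, same as `chmod 755`
    return targets


def is_mode_755(path):
    """True if the permission bits of `path` are exactly 755."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o755
```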
Database
- Pfam: follow the "Download Pfam data files" section of the pfam_scan installation guide to build the binary files from three files downloaded from the Pfam website (xfam.org): Pfam-A.hmm, Pfam-A.hmm.dat and active_site.dat. Put all files under /home/user_ID_to_be_replaced_by_yours/database/pfam/, because this path is hard-coded in our scripts. Alternatively, make sure your pfam folder matches the $pfam_index_folder setting in RGAugury.pl.
- RGADB: RGADB is embedded in this package. Do not change its location.
- PANTHER: if the PANTHER database will be used in either the command-line or the web UI version, install it according to the InterProScan instructions; the InterProScan configuration file might also need to be modified accordingly.
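Before the first run, it can help to check that the Pfam folder contains both the downloaded files and the binary files that `hmmpress` writes (`.h3f`, `.h3i`, `.h3m`, `.h3p`). The Python 3 sketch below assumes the three file names listed above; point it at your `/home/<user>/database/pfam/` folder or wherever `$pfam_index_folder` points. The function name is hypothetical.

```python
from pathlib import Path

# Files downloaded from the Pfam website, as listed above.
PFAM_INPUTS = ["Pfam-A.hmm", "Pfam-A.hmm.dat", "active_site.dat"]
# Binary files written by `hmmpress Pfam-A.hmm` (HMMER3).
HMMPRESS_OUTPUTS = [
    "Pfam-A.hmm.h3f",
    "Pfam-A.hmm.h3i",
    "Pfam-A.hmm.h3m",
    "Pfam-A.hmm.h3p",
]


def missing_pfam_files(pfam_dir):
    """Return the expected Pfam files that are absent from `pfam_dir`."""
    folder = Path(pfam_dir)
    return [name for name in PFAM_INPUTS + HMMPRESS_OUTPUTS
            if not (folder / name).is_file()]
```

If only the `.h3*` files are reported missing, `hmmpress` has not been run on Pfam-A.hmm yet.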
File format requirements
FASTA format for protein and DNA
A FASTA file is a text-based protein or DNA sequence file in which each sequence starts with a ">" header line carrying a unique accession number; in RGAugury, the gene ID is usually used as the header. See the example below. To reduce file-parsing errors, the header should contain nothing other than the accession number:
>AT1G52660.1
MGKDFKSLVTRCIYVGKMNDNAKKLKIATEELKDLGNNVMKRVKLCEEQQQMKRLDKVQTWLRQADTVIKEAEEYFLMSSSSSSSGLISSSHKMEKKICKKLKEVQEIKSRGMFEVVAESTGGIGGGAGGGLTIKDSDEQTIGLEAVSGLVWRCLTMENTGIIGLYGVEGVGKTTVLTQVNNRLLQQKANGFDFVLWVFVSKNLNLQKIQDTIREKIGFLDRTWTSKSEEEKAAKIFEILSKRRFALFLDDVWEKVDLVKAGVPPPDAQNRSKIVFTTCSEEVCKEMSAQTKIKVEKLAWERAWDLFKKNVGEDTIKSHPDIAKVAQEVAARCDGLPLALVTIGRAMASKKTPQEWRDALYILSNSPPNFSVLKLLDRN
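To bring an existing FASTA file in line with this rule, each header can be cut back to its first whitespace-separated token. The Python 3 sketch below does that and also reports duplicated accessions; the function names are hypothetical and not part of RGAugury.

```python
def clean_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to its accession.

    Everything after the first whitespace in a '>' line is dropped,
    so '>AT1G52660.1 some description' becomes '>AT1G52660.1'.
    Sequence lines are passed through unchanged (minus the newline).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header without an accession")
            yield ">" + parts[0]
        else:
            yield line


def duplicate_ids(lines):
    """Return accessions that occur in more than one header, in file order."""
    seen, dups = set(), []
    for line in clean_fasta_headers(lines):
        if line.startswith(">"):
            acc = line[1:]
            if acc in seen and acc not in dups:
                dups.append(acc)
            seen.add(acc)
    return dups
```

For example, `"\n".join(clean_fasta_headers(fh))` on an open input file gives the cleaned text to write to a new file.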
GFF3
GFF3 (General Feature Format version 3) files have a fixed number of columns (nine) and define gene structures and their coordinates in the genome. Below is an example defining the major splice forms of genes AT1G01010.1 and AT1G01020.1.
All columns in a GFF3 file are tab-delimited. Before submitting it to RGAugury, standardize the GFF3 file for reliable parsing: sort it by chromosome (column 1) in ascending order, then by gene start (column 4) and end (column 5), also in ascending order, regardless of strand.
If you want RGAugury to create a genomic RGA distribution figure, column 1 must follow the naming convention 'Chr' + number, such as Chr1 or Chr01. Other names in column 1, such as scaffold_1234 or scaffold20:20-4500, will not produce a genomic RGA distribution figure.
Column 9 holds the gene accession number. It must start with "ID=" and end with a semicolon, and the accession must match the gene accession in the protein or DNA FASTA file exactly.
Currently only four feature types are allowed in column 3: mRNA, UTR, CDS and exon. Each gene MUST have an mRNA row.
Chr1	phytozomev10	mRNA	3631	5899	.	+	.	ID=AT1G01010.1;
Chr1	phytozomev10	UTR	3631	3759	.	+	.	ID=AT1G01010.1;
Chr1	phytozomev10	CDS	3760	3913	.	+	0	ID=AT1G01010.1;
Chr1	phytozomev10	CDS	3996	4276	.	+	2	ID=AT1G01010.1;
Chr1	phytozomev10	CDS	4486	4605	.	+	0	ID=AT1G01010.1;
Chr1	phytozomev10	CDS	4706	5095	.	+	0	ID=AT1G01010.1;
Chr1	phytozomev10	CDS	5174	5326	.	+	0	ID=AT1G01010.1;
Chr1	phytozomev10	CDS	5439	5630	.	+	0	ID=AT1G01010.1;
Chr1	phytozomev10	UTR	5631	5899	.	+	.	ID=AT1G01010.1;
Chr1	phytozomev10	mRNA	5928	8737	.	-	.	ID=AT1G01020.1;
Chr1	phytozomev10	UTR	5928	6263	.	-	.	ID=AT1G01020.1;
Chr1	phytozomev10	UTR	6437	6914	.	-	.	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	6915	7069	.	-	2	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	7157	7232	.	-	0	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	7384	7450	.	-	1	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	7564	7649	.	-	0	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	7762	7835	.	-	2	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	7942	7987	.	-	0	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	8236	8325	.	-	0	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	8417	8464	.	-	0	ID=AT1G01020.1;
Chr1	phytozomev10	CDS	8571	8666	.	-	0	ID=AT1G01020.1;
Chr1	phytozomev10	UTR	8667	8737	.	-	.	ID=AT1G01020.1;
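The sorting and validation rules above can be expressed in a short Python 3 sketch. It sorts records by column 1 (lexically, like `sort -k1,1`), then numerically by columns 4 and 5, and flags unsupported feature types, malformed `ID=` fields and genes without an mRNA row. With this key, a UTR that starts at the same position as its mRNA sorts before the mRNA, unlike the hand-ordered example above; whether RGAugury is sensitive to that order is not stated here. All helper names are hypothetical.

```python
import re

ALLOWED_FEATURES = {"mRNA", "UTR", "CDS", "exon"}
CHR_NAME = re.compile(r"^Chr\d+$")        # e.g. Chr1, Chr01
ID_FIELD = re.compile(r"^ID=([^;\s]+);")  # column 9: "ID=<accession>;"


def parse_gff(lines):
    """Split non-empty, non-comment lines into 9-column records."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        records.append(cols)
    return records


def sort_gff(records):
    """Sort by column 1 (lexical), then start and end (numeric), ignoring strand."""
    return sorted(records, key=lambda c: (c[0], int(c[3]), int(c[4])))


def check_gff(records):
    """Return a list of problems found against the rules described above."""
    problems = []
    all_ids, mrna_ids = set(), set()
    for c in records:
        if c[2] not in ALLOWED_FEATURES:
            problems.append(f"unsupported feature {c[2]!r}")
        match = ID_FIELD.match(c[8])
        if not match:
            problems.append(f"column 9 not in 'ID=...;' form: {c[8]!r}")
            continue
        all_ids.add(match.group(1))
        if c[2] == "mRNA":
            mrna_ids.add(match.group(1))
    for gene_id in sorted(all_ids - mrna_ids):
        problems.append(f"{gene_id} has no mRNA row")
    return problems


def can_draw_distribution(records):
    """True if every column-1 name follows the 'Chr' + number convention."""
    return all(CHR_NAME.match(c[0]) for c in records)
```

To write the sorted file back, join each record with tabs (`"\t".join(record)`) and write one record per line.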
Usage
The main script RGAugury.pl has six options, but only the input file (-p) is mandatory on the command line. Make sure each FASTA sequence title contains only the gene ID, with no spaces, and export the RGAugury directory to your PATH. The first run of the pipeline takes longer than later runs, because InterProScan needs to initialize some of its datasets before scanning the database.
Scripts: Resistance Gene Analogs (RGAs) prediction pipeline
Programmed by Pingchuan Li @ AAFC - Dr. Frank You Lab

Usage: perl RGAugury.pl <options>

arguments:
  -p     protein fasta file
  -n     corresponding cDNA/CDS nucleotide for -p (optional)
  -g     genome file in fasta format (optional)
  -gff   a modified gff3-like file, see below format (optional)
  -c     cpu or threads number, default = 2
  -pfx   prefix for filename, useful for multiple species input in same folder (optional)
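When RGAugury is driven from another script, it can be convenient to build the argument list programmatically. The Python 3.8+ sketch below maps the six options above onto a list suitable for `subprocess.run`; the wrapper function is hypothetical, and it relies on `perl -S`, which makes perl search PATH for RGAugury.pl.

```python
import shlex


def build_rgaugury_cmd(protein, cdna=None, genome=None, gff=None,
                       cpu=2, prefix=None):
    """Build an argument list for RGAugury.pl.

    Only the protein FASTA (-p) is mandatory; optional files and the
    prefix are added only when given. -c defaults to 2, as in the usage text.
    """
    cmd = ["perl", "-S", "RGAugury.pl", "-p", protein, "-c", str(cpu)]
    optional = [("-n", cdna), ("-g", genome), ("-gff", gff), ("-pfx", prefix)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    print(shlex.join(build_rgaugury_cmd("sample.fas", cpu=2, prefix="test")))
```

Passing the list to `subprocess.run(cmd, check=True)` runs the pipeline and raises `CalledProcessError` if it exits with a non-zero status.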
Container version of RGAugury: installation with Docker or Podman
Over years of feedback from our users, we found that the installation is a challenge for users who are not bioinformaticians. To simplify the process and let users start using the tool sooner, we wrapped everything mentioned above, except the very large database files, into a container version of RGAugury that can be downloaded from docker.io. Although a native installation gives the best performance, the container is by far the easiest way to deploy the pipeline.
In general, as long as InterProScan and the Pfam database are ready on the host, the container takes care of the rest. The brief instructions below should make the installation much less painful.
Steps:
Several key steps are discussed below. Although the container removes about 90% of the installation work, the remaining 10%, installing InterProScan and the Pfam database, is still up to the user.
Installation of Interproscan
Because of its size, InterProScan is not included in the image on docker.io, so users have to install it themselves. The good news is that installing InterProScan and its dependencies is much easier than installing the entire package. Note the InterProScan installation path; you will need it later.
Installation of the Pfam database
Download the required Pfam database files (ending with .hmm, .hmm.dat and active_site.dat in v33.1) and install them as described in the Database section. To be clear, the pfam_scan package itself is not needed here; only hmmpress (from HMMER3) is needed to format the Pfam files. Refer to the pfam_scan README for the required database files and data initialization, and write down the Pfam database path for use with the container.
Installation of Docker container.
In my case, I used the commands below to pull the Docker image from docker.io and start it. No virtual machine is needed, yet you still work inside an isolated environment, so there is no risk of breaking your Linux installation.
1) Download the Docker image (this assumes Docker is already installed and running as a service):
$ docker image pull yaanlpc/rgaugury:2.2
2) Start the container, bind-mounting the Pfam, InterProScan and input directories from the host (replace the source paths with your own):
$ docker run -it \
    --mount type=bind,source=/home/pingchuan/docker_project/database/pfam,target=/opt/pfam \
    --mount type=bind,source=/home/pingchuan/docker_project/database/interproscan,target=/opt/interproscan \
    --mount type=bind,source=/home/pingchuan/docker_project/input,target=/root/input \
    yaanlpc/rgaugury:2.2 /bin/bash
If you exit the container for whatever reason, find its ID, start it again and reattach to it:
$ docker ps -a
$ docker start {CONTAINERID}
$ docker exec -it {CONTAINERID} /bin/bash
Installation of PODMAN container.
Podman is nearly identical to Docker in usage. In most cases, simply replacing 'docker' with 'podman' in the commands will work.
Download the image:
podman pull docker.io/yaanlpc/rgaugury:2.2
The example below assumes Podman is already installed. The only difference from Docker is that the image name must include the docker.io registry prefix.
podman run -it \
    --mount type=bind,source=/home/pingchuan/docker_project/database/pfam,target=/opt/pfam \
    --mount type=bind,source=/home/pingchuan/docker_project/database/interproscan,target=/opt/interproscan \
    --mount type=bind,source=/home/pingchuan/docker_project/input,target=/root/input \
    docker.io/yaanlpc/rgaugury:2.2 /bin/bash
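Because the Docker and Podman commands differ only in the engine name and the image prefix, both can be generated by one helper. This Python 3 sketch is hypothetical; it reproduces the three bind mounts used above and keeps each `--mount` value as a single comma-separated token with no spaces, which is the form `--mount` expects.

```python
import shlex


def build_run_cmd(engine, pfam_dir, interproscan_dir, input_dir, tag="2.2"):
    """Build a `docker run` or `podman run` command with the three bind mounts.

    Podman gets the fully qualified docker.io image name, as described above.
    """
    if engine not in ("docker", "podman"):
        raise ValueError("engine must be 'docker' or 'podman'")
    image = f"yaanlpc/rgaugury:{tag}"
    if engine == "podman":
        image = "docker.io/" + image
    mounts = [
        (pfam_dir, "/opt/pfam"),
        (interproscan_dir, "/opt/interproscan"),
        (input_dir, "/root/input"),
    ]
    cmd = [engine, "run", "-it"]
    for source, target in mounts:
        # One token per mount: no spaces after the commas.
        cmd += ["--mount", f"type=bind,source={source},target={target}"]
    cmd += [image, "/bin/bash"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_run_cmd("podman", "/data/pfam", "/data/interproscan", "/data/input")))
```

Docker's `--mount type=bind` reports an error if the source directory does not exist on the host, so create the directories before running the command.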
Docker/Podman test run
Once you are connected to the container with the correct database mounts (Pfam and InterProScan), you should be able to list them under /opt in the container, and you will find a sample.fas file under /root/input. Run a test from that directory:
cd /root/input
perl -S RGAugury.pl -p sample.fas -c 2 -pfx test
Retrieving data from the Docker/Podman container
An SSH service is included in the Docker image. Because you can also copy files from the container to the host with the docker/podman cp command, starting the SSH service is usually not essential, but some users may find it easier.
$ service ssh start
Find the container's IP address with the command below, so that you can use scp or another tool to copy the data to the host:
$ ifconfig
Then run the following command on the host:
scp -r user@containerIP:/path/to/file /path/on/host
The web UI is not yet integrated into the container version, so stay tuned. That's it.
Web UI installation
Deploy the website on Apache
Run the following configuration steps from the Ubuntu console.
#### Step 1: Install Apache and mod_wsgi and allow the ports
~$ sudo apt-get update
~$ sudo apt-get install apache2
~$ sudo apt-get install libapache2-mod-wsgi
Open "http://localhost:80" in a browser. If the default Apache page is not displayed, run:
~$ sudo ufw allow in "Apache Full"
#### Step 2: Copy files and change permissions
1. Create an rgaugury directory under /var/www:
~$ sudo mkdir /var/www/rgaugury
2. Copy the webUI folder to /var/www/rgaugury/.
3. Change the ownership, for example:
~$ sudo chown -R www-data:www-data /var/www/rgaugury/
#### Step 3: Configure the website
1. Create a 'rga.conf' file under '/etc/apache2/sites-available' with content like the sample below, replacing yourWebsiteURL, yourAppName, UserName, GroupName and the paths with your own values.
<VirtualHost *:80>
    ServerName yourWebsiteURL
    # Usually both the user and the group can be set to www-data.
    WSGIDaemonProcess yourAppName user=UserName group=GroupName home=/var/www/rgaugury/webUI/
    WSGIScriptAlias / /var/www/rgaugury/webUI/rga.wsgi
    <Directory /var/www/rgaugury/webUI>
        WSGIProcessGroup yourAppName
        WSGIApplicationGroup %{GLOBAL}
        WSGIScriptReloading On
        Require all granted
    </Directory>
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
2. Enable the website:
~$ sudo a2ensite rga.conf
3. In the 'webUI/config.py' file, change the settings as follows:
ENABLE_CACHE = True
DEBUG = False
ENVIR = {
    'PATH': ('/usr/bin/:'
             '/opt/blast-2.2.31+/bin:'
             '/opt/hmmer3.1b2/bin:'
             '/opt/pfam_scan:'
             '/opt/interproscan:'
             '/opt/RGAugury:'
             '/opt/RGAugury/coils:'
             '/opt/cvit:'
             '/opt/phobius'),
    'JAVA_HOME': '/usr/lib/jvm/java-8-oracle',
    'PERL5LIB': '/opt/pfam_scan',
    'COILSDIR': '/opt/RGAugury/coils',
    'PFAMDB': '/data/DATA/pfam.v27',
}
# Change the paths above to yours.
PIPELINE_HOME = '/home/quanx/job/repo/rgaugury'
# Add your Bitbucket account to see the Help page.
USER_NAME = 'your bitbucket email'
PASSWORD = 'your bitbucket password'
4. Change the file permission to 600:
~$ sudo chmod 600 /var/www/rgaugury/webUI/config.py
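Before restarting Apache, a small script can confirm that every directory named in ENVIR exists and that config.py ended up with mode 600. This is a hypothetical Python 3 helper, not part of the web UI; the ENVIR dictionary would typically be imported from webUI/config.py (for example `from config import ENVIR` when run from the webUI directory), which assumes config.py has no side effects on import.

```python
import os
import stat


def missing_dirs(envir):
    """Return (key, entry) pairs for ENVIR entries that are not existing directories.

    Every value in ENVIR is treated as a colon-separated list of directories.
    """
    problems = []
    for key, value in envir.items():
        for entry in value.split(":"):
            if entry and not os.path.isdir(entry):
                problems.append((key, entry))
    return problems


def has_mode_600(path):
    """True if `path` is readable and writable by its owner only."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600
```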
#### Step 4: Restart the Apache service
~$ sudo service apache2 restart
Essential software for the web UI
1. Install Python 2.7:
~$ sudo apt-get install python2.7
2. Install python-pip:
~$ sudo apt-get install python-pip
3. Install python-dev, which is needed to build psutil:
~$ sudo apt-get install python-dev
4. Install the required Python modules (requirements.txt is in the webUI folder):
~$ sudo pip install -r /var/www/rgaugury/webUI/requirements.txt
5. Install the database:
~$ sudo apt-get install sqlite3
6. Initialize the SQLite database, then give the directory to its owner, here 'www-data':
~$ sudo ./initializeDB.py
~$ sudo chown -R www-data:www-data /var/www/rgaugury/webUI
Configuration
The configurations below are for developers.
- The CPU option can be turned on by setting cpu_toggle to 1 in webUI/config.py.
- By default, the internal Flask server uses port 7000; this can be changed in config.py. If you deploy the website on Apache, the default port is 80.
- Stylus installation on Ubuntu (optional): the CSS files are generated by Stylus. If you want to modify the CSS, install Stylus first and compile the Stylus files to generate the corresponding CSS files.
~$ sudo apt-get install npm
~$ sudo npm install stylus -g
~$ sudo ln -s /usr/bin/nodejs /usr/bin/node
Web UI Help
For more help with the web version of RGAugury, please click here.
The above configurations have been fully tested on Ubuntu 14.04/16.04 LTS.