ApicoAMP / ApicoAMP (version 1) User Manual

Introduction

ApicoAMP is a program for predicting apicoplast-targeted membrane proteins that contain at least one transmembrane (TM) domain. ApicoAMP combines 10 classifier results, each obtained from a classification unit that contains two classifiers: a TM-classifier and a GO-classifier.

The TM-classifier uses features extracted exclusively from TM regions. The GO-classifier depends on the existence of Gene Ontology (GO) annotations for the protein in question.

IMPORTANT NOTE ON THE USE OF APICOAMP: ApicoAMP accepts FASTA formatted protein sequences. Due to the requirements of the algorithm, protein ids must be provided in addition to the protein sequences. Please make sure you use the EuPathDB ids of the proteins in the description line of the FASTA formatted protein sequences. An example FASTA formatted sequence is given later in this document, for your reference.
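Before running ApicoAMP, it can help to confirm that every record in an input file carries an id on its description line. The following is a minimal sketch, not part of ApicoAMP: the function names read_fasta and records_missing_id are hypothetical, and it assumes the protein id is the first whitespace-delimited token after ">". It only checks that an id is present; it does not check that the id is known to EuPathDB.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (protein_id, sequence) tuples."""
    records = []
    current_id = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if current_id is not None:
                records.append((current_id, "".join(chunks)))
            fields = line[1:].split()
            # Assumption: the id is the first token of the description line.
            current_id = fields[0] if fields else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        records.append((current_id, "".join(chunks)))
    return records


def records_missing_id(records):
    """Return 1-based positions of records whose description line has no id."""
    return [i for i, (pid, _) in enumerate(records, start=1) if not pid]


# First two lines of the example sequence from this manual.
example = """>BBOV_IV007730
MEIPAAASDLSNLDDHYVRSDDEVRDTTLIGRSRRCCVGKKTMWIVLLGTAILTAAITSG
IILLVTSLSGSKAKPSGGVKHIGKFDGLNRADCHVSPETFAELSSMAHLGEINVSDPAEI
"""
records = read_fasta(example)
print("id=%s, residues=%d" % (records[0][0], len(records[0][1])))
print("records without an id: %s" % records_missing_id(records))
```

Records reported by records_missing_id would need an id added to their description line before they are passed to ApicoAMP.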

Instructions for running ApicoAMP

Requirements: Python version 2.7.x is needed to launch the software; it can be downloaded from http://www.python.org/download/releases/2.7.3/ . First, unpack the archive ApicoAMPxx.zip in a directory of your choice. Please note that some files in this directory are essential, and any damage to them will cause ApicoAMP to stop working.

NOTE on Mac OS X: By default, downloads are stored in the Downloads folder. After downloading ApicoAMPxx.zip, open a Terminal and type the following commands:

cd Downloads

unzip ApicoAMPxx.zip

cd ApicoAMPxx

Please note that you'll need to replace each occurrence of xx with the actual version tag. For example, for the first Mac version of the software, xx should be replaced by _12_19_12_MAC_OS_X

ApicoAMP can be used from the command line or launched through a graphical user interface (GUI).

Launching ApicoAMP with the Graphical User Interface (GUI):

  • For Mac OS X, simply double-clicking ApicoAMP_GUI.py launches the software if the default program for opening ".py" files is set to the Terminal. If an editor opens the source code instead, and you don't know how to change the default program for a file type, open the Terminal app, change directories to the location of ApicoAMP (as explained in detail above), and type: python ApicoAMP_GUI.py

  • NOTE on Mac OS X: The splash screen of the program may not appear when the above instructions are followed. Instructions for converting the .py file to an .app file will be included as a separate wiki page at the end of this document. When the .app file is launched, the splash screen will be visible.

  • For Windows, simply double-click ApicoAMP_GUI.py, or open a command prompt, change directories to the location of ApicoAMP, and type: python ApicoAMP_GUI.py

Launching ApicoAMP from the command line: Please refer to the section entitled "ApicoAMP - command line"

Instructions for using ApicoAMP GUI

  • Set the 'Vote count' parameter. This is the one parameter ApicoAMP needs the user to set, and it can take any value between 6 and 10, inclusive. ApicoAMP is essentially an ensemble classifier in which 10 classifiers vote on a protein sequence. Setting the vote count to, say, 8 means that a protein sequence is predicted as positive only if it receives 8 or more positive votes from these classifiers. The larger this parameter is, the more stringent ApicoAMP becomes in predicting a protein as positive.

  • Copy and paste FASTA formatted protein sequences into the text box, or import a FASTA formatted file by selecting the "Import sequences from fasta file" option. NOTE: The exampleFasta directory contains some FASTA formatted files that can be used for testing ApicoAMP.

IMPORTANT NOTE: ApicoAMP makes use of the predicted TM topology of a given protein sequence to calculate certain features that are used in prediction. ApicoAMP also uses the GO terms associated with the given protein sequence. All of this external knowledge is accessible to ApicoAMP only if you provide appropriate protein ids in addition to the protein sequences.

Since we used EuPathDB for GO term extraction, you need to provide the EuPathDB ids of the proteins in the description line of the FASTA formatted protein sequence files.

Here is an example fasta formatted sequence (that contains protein id in description line):

>BBOV_IV007730
MEIPAAASDLSNLDDHYVRSDDEVRDTTLIGRSRRCCVGKKTMWIVLLGTAILTAAITSG
IILLVTSLSGSKAKPSGGVKHIGKFDGLNRADCHVSPETFAELSSMAHLGEINVSDPAEI
VKYMDFTRMAKKFDRKYDTVAERHTAFLNFRRNHDIVKSHEHNKAATYTKDLNHFFDKDI
KAVAAKLLHKIDVYNESNISVTPTDTTATKENQPIYATLKNYSVSAGYPPIGSKVNFEDI
DWRRADAVTPVKDQGMCGSCWAFAAVGSVESLLKRQKTDVRLSEQELVSCQLGNQGCNGG
YSDYALNYIKFNGIHRSEEWPYLAADGKCVAHDGTKYYIKGYHAAKGRSVANQLLVMGPT
VVYIAVSEDLMHYSGGVFNGECSDSELNHAVLLVGEGYDSALKKRYWLLKNSWGTSWGED
GYFRLERTNTPTDKCGVLSYGYVPY

  • Click the "Run ApicoAMP" button to see the predictions along with some extra information on each prediction. The second column contains one of the following predictions: non-ApicoATMP, ApicoATMP, noTM, unknownId. The first two are self-explanatory. noTM means that the sequence is predicted (by TMHMM) to contain no transmembrane (TM) region, which makes ApicoAMP not applicable, since it exclusively predicts TM-containing apicoplast-targeted proteins. unknownId means that the id is not known to ApicoAMP, so no TM-topology information could be found for the given sequence.

The third column gives the vote count the corresponding protein received, and the last column gives the GO-classifier result. In addition to the labels discussed above, the GO-classifier might output "noGO", meaning that there are no GO terms associated with the given id.

  • Optional: Save predictions to a file.

  • Click the "Enter new data" button to enter new data. Note: Clicking this button re-enables the rest of the form, which is disabled after prediction results are displayed.
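The 'Vote count' rule described in the steps above can be sketched as follows. This is an illustration only and is not taken from the ApicoAMP source code: the function name label_from_votes and the constant names are hypothetical, and the sketch covers only the ApicoATMP / non-ApicoATMP decision (the noTM and unknownId cases are decided before any voting takes place).

```python
N_CLASSIFIERS = 10    # number of voting classification units
MIN_VOTE_COUNT = 6    # smallest 'Vote count' accepted by the GUI
MAX_VOTE_COUNT = 10   # largest 'Vote count' accepted by the GUI


def label_from_votes(positive_votes, vote_count):
    """Return the predicted label for a protein given its positive votes.

    A protein is labelled positive when it receives at least
    `vote_count` positive votes out of N_CLASSIFIERS.
    """
    if not MIN_VOTE_COUNT <= vote_count <= MAX_VOTE_COUNT:
        raise ValueError("vote_count must be between 6 and 10, inclusive")
    if not 0 <= positive_votes <= N_CLASSIFIERS:
        raise ValueError("positive_votes must be between 0 and 10")
    if positive_votes >= vote_count:
        return "ApicoATMP"
    return "non-ApicoATMP"


# With 'Vote count' set to 8, a protein with 8 positive votes is positive
# and a protein with 7 positive votes is negative.
print(label_from_votes(8, 8))
print(label_from_votes(7, 8))
```

Raising the vote count from 6 toward 10 can only turn positive predictions into negative ones, never the reverse, which is why larger values are more stringent.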

ApicoAMP - command line

*Usage:* ApicoAMP_CL.py -f FASTA_FILE -o OUT_FILE [-v VOTE_THRESHOLD]

*Options:*
  -h, --help            show this help message and exit
  -f FILE, --fastaFile=FILE
                        Read FASTA formatted sequences from FILE.
  -o FILE, --outputFile=FILE
                        Write output to FILE.
  -v THRESHOLD, --voteThreshold=THRESHOLD
                        THRESHOLD number of ensembles (out of 10) need to vote
                        positive for a sequence to be predicted as positive.
                        [default: 10]
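
For scripted runs, the command line above can be assembled from Python. The sketch below only builds the argument list from the documented -f, -o and -v options; the helper name build_command and the file names proteins.fasta and predictions.txt are hypothetical, and it assumes that "python" on your path is a Python 2.7.x interpreter.

```python
def build_command(fasta_file, out_file, vote_threshold=None, python_exe="python"):
    """Build the argument list for one ApicoAMP_CL.py run."""
    cmd = [python_exe, "ApicoAMP_CL.py", "-f", fasta_file, "-o", out_file]
    # -v is optional; when it is omitted, ApicoAMP_CL.py uses its default.
    if vote_threshold is not None:
        cmd += ["-v", str(vote_threshold)]
    return cmd


cmd = build_command("proteins.fasta", "predictions.txt", vote_threshold=8)
print(" ".join(cmd))

# To actually run the tool, call this from inside the ApicoAMP directory:
#     import subprocess
#     subprocess.call(cmd)
```

Passing the command as a list rather than a single string avoids quoting problems when file names contain spaces.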

Please cite the following paper if you are using this tool:

Cilingir, Gokcen, Audrey OT Lau, and Shira L. Broschat. "ApicoAMP: The first computational model for identifying apicoplast-targeted transmembrane proteins in Apicomplexa." Journal of Microbiological Methods 95.3 (2013): 313-319.
