Welcome to the CABSflex wiki page! Installation instructions and an outline of the method are provided on the CABSflex OVERVIEW PAGE.
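Table of contents
1. CABSflex modeling scheme
1.1 Pipeline
1.2 Default distance restraints and simulation settings
1.3 Example simulation movies
1.4 CABS simulation engine
2. CABSflex options
2.1 Basic options
2.2 Protein structure input options
2.3 Distance restraints options
2.4 Simulation options
2.5 All-atom reconstruction options
2.6 Results analysis options
2.7 Output options
2.8 Miscellaneous options
2.9 Options' index
3. Ready-to-use examples
3.1 Default simulation
3.2 Multiple chains
3.3 Changing global flexibility
3.4 Changing local flexibility
3.5 Changing temperature
3.6 Rebuilding representative models to all-atom representation
3.7 Contact maps
4. Output analysis
4.1 Output files
4.2 RMSD plot analysis
4.3 Handling not identical input and reference models
4.4 Analysis of an already finished simulation
4.5 Visualization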
1. CABSflex modeling scheme
1.1 Pipeline
CABS-flex is a method for efficient simulations of protein flexibility. CABS-flex combines the CABS coarse-grained model with structural clustering of the simulation results and reconstruction of selected models to all-atom representation. The number of models generated at each modeling step can be set by the user. In the default mode, the CABS-flex trajectory contains 1000 models in coarse-grained representation, which are clustered into 10 representative models that are subsequently reconstructed to all-atom models. In addition to protein models in PDB format, CABS-flex standalone provides a number of analysis and visualization options, such as contact maps and histograms. The picture below shows the CABS-flex pipeline with default settings.
1.2 Default distance restraints and simulation settings
CABS-flex simulations with default settings are complementary to short-timescale Molecular Dynamics (MD) simulations in predicting protein regions that undergo conformational changes, as well as the extent of such changes. In default mode, CABS-flex uses a set of distance restraints and simulation settings obtained in the work of Jamroz et al. (JCTC, 2013, 9 (1), 119–125). The default settings and restraints were optimized to provide the best possible agreement between CABS-flex simulations and a consensus picture of protein fluctuations in aqueous solution derived from all-atom MD simulations (10 ns long, run with different force fields, for globular proteins). CABS-flex predictions of protein fluctuations have also been shown to correlate well with fluctuations seen in NMR ensembles (Bioinformatics, 30:2150–2154, 2014). The default settings are as follows:
CABSflex -i INPUT --protein-restraints ss2 3 3.8 8.0 --temperature 1.4 1.4 --replicas 1 --mc-cycles 50 --mc-steps 50 --mc-annealing 20
See the option descriptions in section 2.9 (Options' index).
1.3 Example simulation movies
The movies below show example fluctuations of the 1HPW protein structure generated by CABSflex with default settings:
- movie showing 10 representative all-atom models (created using PyMOL):
- morphing movie showing 10 representative all-atom models (created using PyMOL):
- movie showing CABS-flex trajectory (C-alpha trace only):
1.4 CABS simulation engine
The CABS-flex method uses an efficient simulation engine: the CABS coarse-grained protein model. The picture below compares the all-atom representation (left) with the CABS coarse-grained representation (right) for an example 4-residue protein fragment. In CABS, a single amino acid is represented by 4 atoms (or pseudo-atoms): C-alpha (CA), C-beta (CB), the center of mass of the side-chain group (SC) and the center of the peptide bond (cp). Note that the CABSflex modeling scheme allows applying or modifying distance restraints between selected CA atoms or between selected SC pseudo-atoms.
CABS design and applications have recently been described in the review: Chemical Reviews, 116:7898–7936, 2016.
2. CABSflex options
2.1 Basic options
- -i, --input-protein INPUT - Loads input protein structure.
2.2 Protein structure input options
- -f, --protein-flexibility FLEXIBILITY - Modifies flexibility of selected protein residues.
- -g, --protein-restraints MODE GAP MIN MAX - Generates a set of binary distance restraints for CA atoms that keep the protein in a predefined conformation.
- --protein-restraints-reduce FACTOR - Reduces the number of generated restraints.
- -N, --no-protein-restraints - Turns off automatic restraint generation.
- --weighted-fit ARG - Sets and customizes the way models are structurally aligned.
- --gauss-iterations NUM - Sets the maximum number of iterations of the dynamic weighted-fit algorithm to NUM.
2.3 Distance restraints options
- --ca-rest-add RESI RESJ DIST WEIGHT - Adds a distance restraint between the CA atom in residue RESI and the CA atom in residue RESJ.
- --sc-rest-add RESI RESJ DIST WEIGHT - Adds a distance restraint between the SC pseudo-atom in residue RESI and the SC pseudo-atom in residue RESJ.
- --ca-rest-weight WEIGHT - Sets a global weight for all CA restraints (including automatically generated restraints for the protein).
- --sc-rest-weight WEIGHT - Sets a global weight for all SC restraints.
- --ca-rest-file FILE - Reads CA restraints from a file (use multiple times to add multiple files).
- --sc-rest-file FILE - Reads SC restraints from a file (use multiple times to add multiple files).
2.4 Simulation options
- -a, --mc-annealing NUM - sets the number of Monte Carlo temperature annealing cycles to NUM (NUM > 0, default value = 20; changing the default value is recommended only for advanced users).
- -y, --mc-cycles NUM - sets the number of Monte Carlo cycles to NUM (NUM > 0, default value = 50).
- -s, --mc-steps NUM - sets the number of Monte Carlo cycles between trajectory frames to NUM (NUM > 0, default value = 50).
- -r, --replicas NUM - sets the number of replicas to be used in Replica Exchange Monte Carlo (NUM > 0, default value = 1; changing the default value is recommended only for advanced users).
- -D, --replicas-dtemp DELTA - sets the temperature increment between replicas (DELTA > 0, default value = 0.5).
- -t, --temperature TINIT TFINAL - sets the temperature range for simulated annealing: TINIT - initial temperature, TFINAL - final temperature (default values TINIT = 1.4, TFINAL = 1.4).
- -z, --random-seed SEED - sets the seed for the random number generator.
2.5 All-atom reconstruction options
- -A, --aa-rebuild - Rebuild final models to all-atom representation (requires MODELLER installed).
- -m, --modeller-iterations NUM - Set number of iterations for the reconstruction procedure in MODELLER (default: 3).
2.6 Results analysis options
- -R, --reference-pdb REF - Loads a reference complex structure.
- -k, --clustering-medoids NUM - Sets the number of medoids in the k-medoids clustering algorithm.
- --clustering-iterations NUM - Sets the number of iterations of the k-medoids clustering algorithm.
- -n, --filtering-count NUM - Sets the number of low-energy models from trajectories to be clustered (default 1000).
- --filtering-mode MODE - Chooses how models are selected for clustering: the top (filtering-count / replicas) models from each replica (each, default) or the top filtering-count models from all replicas combined (all).
- -M, --contact-maps - Stores contact map matrix plots and histograms of contact frequencies.
- -T, --contact-threshold DIST - Sets the contact distance between side-chain pseudo-atoms (SC) for contact map plotting.
- --contact-threshold-aa DIST - Sets the contact distance between heavy atoms for contact map plotting (all-atom representative models only).
- --contact-map-colors COLORS - Sets colors in hex code to be used in contact map color bars.
- --align METHOD - Method to be used to align the target with the reference PDB.
- --alignment-options - Options to be passed to the alignment method.
2.7 Output options
- -S, --save-cabs-files - Saves the CABSflex simulation file. The file name has the format yymmddHHMMSS<RANDOM 6-CHARACTER STRING>.cbs, for example: 181116161924knWPtn.cbs
- -L, --load-cabs-files FILE - Loads a CABSflex simulation file (.cbs). This option allows for repeated scoring and analysis of CABSflex trajectories (with new settings, for example using a reference complex structure).
- -C, --save-config - Saves simulation parameters in a config file.
- -o, --pdb-output SELECTION - Selects structures to be saved in PDB format.
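Because the -S file name starts with a timestamp, the time at which a simulation file was saved can be recovered from its name. The sketch below is illustrative only and not part of CABSflex: the helper parse_cbs_name is hypothetical, and it assumes the 6-character suffix consists of letters, digits or underscores, as in the sample name 181116161924knWPtn.cbs.

```python
import re
from datetime import datetime

# yymmddHHMMSS timestamp + 6-character random tag + .cbs extension
CBS_NAME = re.compile(r"^(\d{12})(\w{6})\.cbs$")


def parse_cbs_name(name):
    """Split a saved simulation file name into (timestamp, random tag)."""
    match = CBS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a CABSflex .cbs file name: {name!r}")
    stamp = datetime.strptime(match.group(1), "%y%m%d%H%M%S")
    return stamp, match.group(2)


stamp, tag = parse_cbs_name("181116161924knWPtn.cbs")
print(stamp, tag)  # 2018-11-16 16:19:24 knWPtn
```

Saved files can then be listed chronologically with, for example, sorted(names, key=lambda n: parse_cbs_name(n)[0]).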
2.8 Miscellaneous options
- --work-dir DIR - set working directory to DIR.
- --dssp-command PATH - provide path to dssp binary.
- --fortran-command PATH - provide path to fortran compiler binary.
- --image-file-format FMT - produces all the image files in given format.
- -V, --verbose VERBOSITY - Controls how explicit the program output is, 0 for silent mode (only critical messages), 4 for maximum verbosity (default: 2).
- --log - redirect all output to the log file (CABS.log)
- --version - print version and exit program
- -h, --help - print help and exit program
2.9 Options' index
-A, --aa-rebuild
Rebuild final models to all-atom representation. (default: True)
--align METHOD
Method to be used to align the model with the reference. Available options are:
- SW -- Smith-Waterman (default)
- blastp -- protein BLAST (requires the NCBI BLAST+ package installed)
- trivial -- simple sequential alignment; useful only to speed up the run (by skipping the Smith-Waterman algorithm) when the input and reference are obviously single-chain and of the same length (e.g. when input and reference are the same file).
- CSV -- loads the alignment from a given file (passed as the alignment setting called fname) in the format described by Berbalk et al. in 2009.
--alignment-options
Options to be passed to the method aligning the target, e.g.:
CABSflex --align blastp --alignment-options task=short-task
--ca-rest-add RESI RESJ DIST WEIGHT
Adds a distance restraint between the CA atom in residue RESI and the CA atom in residue RESJ. DIST is the distance between these atoms and WEIGHT is the restraint weight from [0, 1].
NOTES: Can be used multiple times to add multiple restraints.
--ca-rest-file FILE
Reads CA restraints from a file (use multiple times to add multiple files).
--ca-rest-weight WEIGHT
Sets a global weight for all CA restraints (including automatically generated restraints for the protein) (default: 1.0)
-c, --config CONFIG
Reads options from the configuration file CONFIG.
--clustering-iterations NUM
Sets the number of iterations of the k-medoids clustering algorithm (default: 100).
-k, --clustering-medoids NUM
Sets the number of medoids in the k-medoids clustering algorithm. This option also sets the number of final models to be generated. (default: 10)
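The k-medoids step behind -k and --clustering-iterations can be illustrated with a generic alternating (Voronoi-iteration) k-medoids on a precomputed distance matrix. This is a minimal sketch, not CABSflex's implementation: the function name, the random initialization and the use of a pairwise distance such as RMSD between models are assumptions.

```python
import random


def k_medoids(dist, k, iterations=100, seed=0):
    """Alternating (Voronoi-iteration) k-medoids on a square distance matrix.

    dist[i][j] is the distance between models i and j, e.g. their RMSD.
    Assumes distinct models (dist[i][j] > 0 for i != j).
    Returns (medoids, labels); labels[i] indexes the medoid of model i.
    """
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)  # random initial medoids

    def assign(meds):
        # Each model joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(iterations):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return medoids, assign(medoids)
```

Because each medoid is one of the clustered models, the medoids can serve directly as representative models, which fits -k also setting the number of final models.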
--contact-map-colors COLORS
Sets 6 colors (hex code, e.g. #00FF00 for green) to be used in contact map color bars.
-M, --contact-maps
Stores contact map matrix plots and histograms of contact frequencies.
--contact-threshold-aa DIST
Sets the contact distance between heavy atoms for contact map plotting (all-atom representative models only). (default: 5.5 Angstroms)
-T, --contact-threshold DIST
Sets the contact distance between side-chain pseudo-atoms (SC) for contact map plotting. (default: 6.5 Angstroms)
--dssp-command PATH
Uses the provided path to the dssp binary.
-n, --filtering-count NUM
Sets the number of low-energy models from trajectories to be clustered (default: 1000).
--filtering-mode MODE
Chooses the filtering mode used to select NUM (set by --filtering-count) models for clustering. MODE can be either (default: each):
each - models are ordered by protein energy and the top n = [NUM / R] (R is the number of replicas) are selected from EACH replica
all - models are ordered by protein energy and the top NUM are selected from ALL replicas combined
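The two filtering modes can be sketched as follows, assuming that "top" means lowest energy (the clustered models are described as low-energy models) and that [NUM / R] is integer division. The helper filter_models and the (energy, model) pair layout are hypothetical.

```python
def filter_models(replicas, count, mode="each"):
    """Select low-energy models for clustering.

    replicas: one list of (energy, model) pairs per replica trajectory.
    mode "each": the count // R lowest-energy models from every replica.
    mode "all":  the count lowest-energy models from all replicas pooled.
    """
    if mode == "each":
        per_replica = count // len(replicas)
        selected = []
        for frames in replicas:
            selected.extend(sorted(frames, key=lambda f: f[0])[:per_replica])
        return selected
    if mode == "all":
        pooled = [frame for frames in replicas for frame in frames]
        return sorted(pooled, key=lambda f: f[0])[:count]
    raise ValueError(f"unknown filtering mode: {mode!r}")
```

With a single replica (the default) both modes select the same models; they differ only when replicas sample different energy ranges.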
--fortran-command PATH
Uses the provided path to the Fortran compiler binary.
--gauss-iterations NUM
Sets the number of iterations of the dynamic weighted-fit algorithm used for superposition of structures. This option has no effect when --weighted-fit is set to anything other than gauss. NUM = 100 by default.
-h, --help
Prints help and exits the program.
--image-file-format FMT
Produces all image files in the given format.
-i, --input-protein INPUT
Loads the input protein structure. INPUT can be either:
- a PDB code (optionally with chain IDs), e.g. -i 1CE1:HL loads chains H and L of the 1CE1 protein structure downloaded from the PDB database;
- a path to a local PDB file (optionally gzipped).
--insertion-clash DIST
This option enables advanced settings for building starting conformations of modeled complexes. It sets the distance in Angstroms between any two atoms (of different modeled chains) at which a clash occurs while building the initial complex (default: 1.0 Angstrom).
-L, --load-cabs-files FILE
Loads CABSflex simulation files and allows for repeated scoring and analysis of CABSflex trajectories (with new settings, for example using a reference complex structure with the --reference-pdb option).
--log
Automatically redirects output to the CABS.log file created in the working directory, stops the progress bar from showing at higher verbosity levels and turns off log coloring. Piping standard error will not work with this option. If the log file already exists, output is appended to it.
-a, --mc-annealing NUM
Sets the number of Monte Carlo temperature annealing cycles to NUM (NUM > 0, default value = 20; changing the default value is recommended only for advanced users).
-y, --mc-cycles NUM
Sets the number of Monte Carlo cycles to NUM (NUM > 0, default value = 50). The total number of snapshots generated for each replica/trajectory = [mc-annealing] x [mc-cycles], by default 20 x 50 = 1000.
-s, --mc-steps NUM
Sets the number of Monte Carlo cycles between trajectory frames to NUM (NUM > 0, default value = 50). NUM = 1 means that every generated conformation appears in the trajectory. This option increases the simulation length (between saved snapshots) and does not affect the number of snapshots in the trajectories.
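The arithmetic relating --mc-annealing, --mc-cycles, --mc-steps and --replicas can be written out as below. simulation_size is a hypothetical helper; it assumes that every saved frame is preceded by exactly --mc-steps Monte Carlo cycles and that each replica produces its own trajectory of the same length.

```python
def simulation_size(mc_annealing, mc_cycles, mc_steps, replicas):
    """Snapshot and Monte Carlo cycle counts implied by the simulation options.

    Assumes every saved frame is preceded by exactly mc_steps Monte Carlo cycles.
    """
    frames_per_replica = mc_annealing * mc_cycles
    return {
        "frames_per_replica": frames_per_replica,
        "frames_all_replicas": frames_per_replica * replicas,
        "mc_cycles_per_replica": frames_per_replica * mc_steps,
    }


# Default settings: --mc-annealing 20 --mc-cycles 50 --mc-steps 50 --replicas 1
print(simulation_size(20, 50, 50, 1))
# {'frames_per_replica': 1000, 'frames_all_replicas': 1000, 'mc_cycles_per_replica': 50000}
```

Raising --mc-steps lengthens the simulation without changing the 1000 saved frames, which matches the description of -s above.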
-m, --modeller-iterations NUM
Sets the number of iterations for the reconstruction procedure in the MODELLER package (default: 3). Larger numbers may result in more accurate models, but reconstruction takes longer.
-N, --no-protein-restraints
Does not automatically generate any protein restraints. This option takes precedence over the --protein-restraints option and overrides any settings made by the latter. With this flag on, restraints can still be added with the --ca-rest-add or --ca-rest-file options.
-o, --pdb-output SELECTION
Selects structures to be saved in PDB format. Available options are:
* A - all (default)
* R - replicas
* F - filtered
* C - clusters
* M - models
* N - none
Example: -o RM - saves replicas and models
-f, --protein-flexibility FLEXIBILITY
Modifies flexibility of selected protein residues: 0 - fully flexible backbone, 1 - almost stiff backbone (default value, given an appropriate number of protein restraints), >1 - increased stiffness.
FLEXIBILITY can be either:
- a positive real number - all protein residues will be assigned flexibility equal to this number;
- bf - flexibility for each residue is read from the beta factor column of the CA atom in the PDB input file. Note that standard beta factors in PDB files have the opposite meaning to CABSflex flexibility. Remember to edit the PDB file accordingly (or use FLEXIBILITY = bfi);
- bfi - each residue is assigned its flexibility based on the inverted beta factors stored in the input PDB file, so that bf = 0.0 -> f = 1.0 and bf >= 1.0 -> f = 0.0;
- <filename> - flexibility is read from the file <filename> in the format of single residue entries: resid_ID <flexibility>, e.g. 12:A 0.75, or residue ranges: resid_ID - resid_ID <flexibility>, e.g. 12:A - 15:A 0.75. The default value for residues not explicitly specified can be set by inserting the following line at the top of the file: default <default flexibility value>; if this line is omitted, the default value becomes 1.0. Multiple entries can be used.
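A reader for the <filename> format and the bfi mapping might look like the sketch below. The function names are hypothetical; the parser accepts the 'default' line anywhere in the file, and the linear mapping for beta factors between 0 and 1 is an assumption, since only the end points are stated above.

```python
def parse_flexibility_file(lines):
    """Parse flexibility file lines into (default, entries).

    entries holds (first_residue, last_residue, value) tuples; single-residue
    lines give first_residue == last_residue. IDs such as '12:A' stay strings.
    """
    default = 1.0  # value used when no 'default' line is present
    entries = []
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "default" and len(tokens) == 2:
            default = float(tokens[1])
        elif len(tokens) == 4 and tokens[1] == "-":  # range: 12:A - 15:A 0.75
            entries.append((tokens[0], tokens[2], float(tokens[3])))
        elif len(tokens) == 2:  # single residue: 12:A 0.75
            entries.append((tokens[0], tokens[0], float(tokens[1])))
        else:
            raise ValueError(f"cannot parse flexibility line: {raw!r}")
    return default, entries


def flexibility_from_bfi(bf):
    """bfi mode: bf = 0.0 -> 1.0, bf >= 1.0 -> 0.0, linear in between (assumed)."""
    return min(1.0, max(0.0, 1.0 - bf))


default, entries = parse_flexibility_file(["default 0.5", "12:A 0.75", "45:A - 51:A 0.0"])
print(default, entries)  # 0.5 [('12:A', '12:A', 0.75), ('45:A', '51:A', 0.0)]
```

Turning range entries into per-residue values additionally requires the residue order of the input structure, which this sketch leaves out.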
-g, --protein-restraints MODE GAP MIN MAX
Generates a set of binary distance restraints for CA atoms that keep the protein in a predefined conformation (default: all, 5, 5.0, 15.0).
MODE can be either:
all - generates restraints for all protein residues
ss1 - generates restraints only when at least one restrained residue is assigned regular secondary structure (helix or sheet)
ss2 - generates restraints only when both restrained residues are assigned regular secondary structure (helix or sheet)
GAP specifies the gap along the main chain between the two residues to be restrained. MIN and MAX are the minimum and maximum distances (in Angstroms) between the two residues for a restraint to be created.
The default setting, recommended for standard applications, is all 5 5.0 15.0
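The selection rules of MODE, GAP, MIN and MAX can be sketched as a scan over residue pairs of the starting structure. This is an illustration, not CABSflex's code: reading GAP as the minimum sequence separation j - i is an assumption (the exact off-by-one convention is not stated here), and the 'H'/'E' secondary-structure string stands in for the DSSP assignment.

```python
import math


def generate_ca_restraints(coords, ss, mode, gap, dmin, dmax):
    """Collect CA-CA distance restraints from a starting structure.

    coords: (x, y, z) CA positions in chain order.
    ss: one character per residue, 'H' (helix) or 'E' (sheet) for regular
        secondary structure, anything else for loops and coil.
    Returns (i, j, distance) for pairs with j - i >= gap (assumed reading
    of GAP) whose distance lies within [dmin, dmax].
    """
    if mode not in ("all", "ss1", "ss2"):
        raise ValueError(f"unknown MODE: {mode!r}")
    regular = [s in ("H", "E") for s in ss]
    restraints = []
    for i in range(len(coords)):
        for j in range(i + gap, len(coords)):
            if mode == "ss1" and not (regular[i] or regular[j]):
                continue  # neither residue in a helix or sheet
            if mode == "ss2" and not (regular[i] and regular[j]):
                continue  # at least one residue outside helix/sheet
            d = math.dist(coords[i], coords[j])
            if dmin <= d <= dmax:
                restraints.append((i, j, d))
    return restraints
```

Going from ss2 to ss1 to all keeps more pairs, so the structure is held more rigidly, which is the effect used in section 3.3.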
--protein-restraints-reduce FACTOR
Reduces the number of protein restraints by a FACTOR, where FACTOR is a number from [0, 1]. This option reduces the number of automatically generated restraints for the protein molecule in order to speed up computation. Restraints are randomly selected from all generated restraints, so that the final number of restraints #reduced = #all * FACTOR.
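Random reduction by FACTOR amounts to sampling without replacement. The sketch below assumes the kept count is rounded down, which the description does not specify; reduce_restraints is a hypothetical name.

```python
import random


def reduce_restraints(restraints, factor, seed=None):
    """Keep a random subset of int(len(restraints) * factor) restraints."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("FACTOR must lie in [0, 1]")
    keep = int(len(restraints) * factor)  # rounding down is an assumption
    return random.Random(seed).sample(restraints, keep)
```

Passing a fixed seed makes the subset reproducible, in the same spirit as -z, --random-seed for the simulation itself.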
-z, --random-seed SEED
Sets the seed for the random number generator.
-R, --reference-pdb REF
Loads a reference complex structure. This option allows for comparison with the reference complex structure and triggers additional analysis features.
REF must be either:
[pdb code]:[protein chains]...
[pdb file]:[protein chains]...
Examples:
1abc:AB:C
1abc:AB:CD
myfile.pdb:AB:C
myfile.pdb.gz:AB:CDE
-r, --replicas NUM
Sets the number of replicas to be used in Replica Exchange Monte Carlo (NUM > 0, default value = 1; changing the default value is recommended only for advanced users).
-D, --replicas-dtemp DELTA
Sets the temperature increment between replicas (DELTA > 0, default value = 0.5; changing the default value is recommended only for advanced users).
-S, --save-cabs-files
Saves CABS-flex simulation files.
-C, --save-config
Saves simulation parameters in a config file.
--sc-rest-add RESI RESJ DIST WEIGHT
Adds a distance restraint between the SC pseudo-atom in residue RESI and the SC pseudo-atom in residue RESJ; DIST is the distance between these pseudo-atoms (the geometric centers of their side-chain atoms) and WEIGHT is the restraint weight from [0, 1]. Can be used multiple times to add multiple restraints.
--sc-rest-file FILE
Reads SC restraints from a file (use multiple times to add multiple files).
--sc-rest-weight WEIGHT
Sets a global weight for all SC restraints (default: 1.0).
-t, --temperature TINIT TFINAL
Sets the temperature range for the simulated annealing procedure: TINIT - initial temperature, TFINAL - final temperature (default values TINIT = 1.4, TFINAL = 1.4).
CABS-flex uses a temperature-like parameter that does not correspond straightforwardly to the real temperature. A temperature value around 1.0 roughly corresponds to a nearly frozen conformation, while the folding temperature of small proteins in the CABS model is usually around 2.0.
-V, --verbose VERBOSITY
Controls how explicit the program output is: 0 for silent mode (only critical messages), 4 for maximum verbosity, default 2.
--version
Prints version and exits the program.
--weighted-fit ARG
This option sets and customizes the way models are structurally aligned, which affects both the calculation of RMSD/RMSF and clustering, together with the selection of the final models. Models are aligned by the Kabsch optimal fit algorithm. This option assigns weights to all atoms, which specify how 'important' each atom is in the structural fit process. Weights are numbers from the [0, 1] range, with '0' meaning 'irrelevant in the fitting process'.
ARG can be either:
off - Turns off weighted fit (all weights are 1.0) (default).
gauss - Weights are generated automatically in the iterative procedure described in Biophys J. 2006 Jun 15; 90(12): 4558-4573. The procedure consists of the following steps: (1) Set wi = 1.0 for i = [1, 2 ... N], where N is the number of atoms. (2) Align structures using weights wi. (3) Calculate di, the displacement of the i-th atom. (4) Update weights according to the formula wi = exp(-0.5 * di * di). Repeat (2) through (4) until convergence (max 100 iterations, can be changed with --gauss-iterations).
flex - Weights are taken from the flexibility settings (see the help entry for --protein-flexibility).
ss - Weights are taken from the secondary structure assignment. Atoms in helices and sheets are given w = 1.0, while those in loops and coil get w = 0.0.
<filename> - Weights are read from the file <filename>. The file should follow this format:
default 1.0 (default value; if omitted, w = 1.0 is assumed)
1:A 0.5
5:A 0.1
...
1:B 0.99
...
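The gauss procedure (steps (1) to (4) above) can be sketched with NumPy and a weighted Kabsch superposition. This is an illustration under stated assumptions, not CABSflex's code: the function names, the convergence tolerance and the reflection guard are our own choices, and coordinates are assumed to be in Angstroms so that di enters the Gaussian weight directly.

```python
import numpy as np


def weighted_kabsch(mobile, target, w):
    """Superpose mobile onto target, minimizing the weighted RMSD."""
    w = w / w.sum()
    mu_m, mu_t = w @ mobile, w @ target        # weighted centroids
    a, b = mobile - mu_m, target - mu_t
    h = (a * w[:, None]).T @ b                 # weighted 3x3 covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # avoid improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return a @ rot.T + mu_t


def gauss_fit(mobile, target, max_iter=100, tol=1e-6):
    """Iterative Gaussian-weighted fit; returns fitted coordinates and weights."""
    w = np.ones(len(mobile))                            # (1) w_i = 1.0
    for _ in range(max_iter):
        fitted = weighted_kabsch(mobile, target, w)     # (2) weighted alignment
        disp = np.linalg.norm(fitted - target, axis=1)  # (3) displacements d_i
        new_w = np.exp(-0.5 * disp * disp)              # (4) w_i = exp(-0.5 d_i^2)
        converged = np.allclose(new_w, w, rtol=0.0, atol=tol)
        w = new_w
        if converged:
            break
    return fitted, w
```

Atoms that move far from the target quickly get weights close to 0, so the final superposition is driven by the rigid part of the structure.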
--work-dir DIR
Sets the working directory to DIR.
3. Ready-to-use examples
3.1 Default simulation
To run CABSflex using the default settings (see section 1.2), use the following command:
$ CABSflex -i PDB/FILE
For example, to download the structure 1hpw from the PDB:
$ CABSflex -i 1hpw
or to load the file 1hpw.pdb from the working directory:
$ CABSflex -i 1hpw.pdb
3.2 Multiple chains
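In contrast to the web server version, standalone CABSflex supports simulations of proteins composed of several chains. For example, the command:
$ CABSflex -i 4w2o
runs a simulation of all chains of the 4w2o structure, which can also be listed explicitly:
$ CABSflex -i 4w2o:ABCDEFGH
To simulate only selected chains, e.g. A, E and G, use:
$ CABSflex -i 4w2o:AEG
The same syntax works with a path to a file on a local drive:
$ CABSflex -i pdbs/4w2o.pdb:AEG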
3.3 Changing global flexibility
In some cases the default restraints need to be modified. For example, one may want to allow only small changes in backbone position by restraining the whole backbone, not only the elements with well-defined secondary structure (as the default restraints do). To change the behavior of the whole input complex, simply modify the parameters of the --protein-restraints option, which takes four arguments. Let us look at an example of streptavidin, PDB id 1KL3:
$ CABSflex -i 1kl3 --protein-restraints all 3 3.8 8.0
See the full description of the --protein-restraints option for information about all the arguments passed. Only the first one was changed with respect to the default settings: it tells CABSflex to prepare restraints for all residues, not only for residues in secondary structure elements, as would be the case with the default first argument ss2. The arguments tell CABS, respectively:
- for which residues to prepare restraints;
- how many nearby residues to skip;
- the minimum distance in space for a restraint to be created;
- the maximum distance in space for a restraint to be created.
By modifying them, one can adjust the level of flexibility through the number of initial restraints.
The picture below shows 10 representative models for that run:
3.4 Changing local flexibility
Let us look again at the previous example with global flexibility set to all. The protein with PDB id 1KL3 is a streptavidin. The loop 45-51 is known to play an important role in the dynamics of this protein. Unfortunately, in all mode the loop sticks to the protein core too much. To deal with that problem, one can use the --protein-flexibility (or -f) option, which modifies the local flexibility of chosen protein fragments.
To fix this particular simulation, we prepare and load a file telling CABS which part to modify and how much flexibility is needed. For example, 1kl3flex.inp contains only one line:
45:A - 51:A 0.0
This tells CABSflex that the loop from residue 45 to residue 51 in chain A is to be as flexible as possible (0.0; the scale is reversed). Other flexible regions can be defined in the following lines.
With such a file we are ready to run the desired CABSflex simulation with the command:
$ CABSflex -i 1kl3 --protein-restraints all 3 3.8 8.0 -f 1kl3flex.inp
The pictures below show 10 representative models for:
- the run with all passed to --protein-restraints (default flexibility);
- the run with all passed to --protein-restraints and the flexible loop (the loop is marked in red):
The movie below shows the pseudotrajectory with default flexibility (bottom) and with modified flexibility of the 45-51 loop:
The plots below show RMSF for both runs. Note the difference in fluctuation of the 45-51 loop:
- the run with all passed to --protein-restraints (default flexibility);
- the run with all passed to --protein-restraints and the flexible loop:
It is also possible to read flexibility from the b-factor column in the PDB file. See --protein-flexibility FLEXIBILITY for more information.
3.5 Changing temperature
The default temperature is set to 1.4, slightly above the temperature of the crystal (1.0 in CABS units). However, modeling intrinsically disordered regions in proteins, for example, may require temperatures similar to those at which protein folding is simulated. Here is an example of MDM2 with flexible C- and N-terminal loops.
To run the simulation at the default temperature, use:
$ CABSflex -i 1z1m
which, with the default settings from section 1.2 written out explicitly, reads:
$ CABSflex -i 1z1m --temperature 1.4 1.4 --protein-restraints ss2 3 3.8 8.0
To perform a simulation of the same protein at a higher temperature, simply run:
$ CABSflex -i 1z1m --temperature 2.0 2.0
3.6 Rebuilding representative models to all-atom representation
By default, representative models are not rebuilt to all-atom representation with the MODELLER package (see the pipeline). To force the rebuild, simply use the --aa-rebuild flag:
$ CABSflex -i 1hpw --aa-rebuild
Make sure that representative models are returned by CABSflex, i.e. that the --pdb-output (or -o) option is set to A (all: replicas, clusters and models) or at least to M (models). See the full option description for more information.
3.7 Contact maps
Adding the option -M or --contact-maps invokes creation of contact maps in the contact_maps directory.
Contact maps in CABSflex show frequencies of contacts over the full pseudotrajectory, in clusters and for representative models. In default mode, they are created based on the positions of the side-chain centers of mass (see section 1.4 for the CABS model representation) with a cutoff of 6.5 Angstroms. If representative models are rebuilt to all-atom models, a contact involves any heavy atoms and the cutoff is set to 5.5 Angstroms.
The contact_maps directory stores the contact map matrix plot (all.svg file) and data (all.txt). To change the image format, use the --image-file-format option with a format supported by matplotlib, e.g. png, pdf, ps, eps or svg. See the matplotlib savefig documentation for more information. By default, the svg format is used.
Sample contact map for 1HPW:
The TXT file contains residue PDB ids and the number of frames in which they were in contact. The total number of frames is given in the first comment line of the file:
# n=1000
A22 A23 302
A22 A24 301
A22 A25 677
A22 A26 162
...
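The counting behind the contact maps, and the layout of the TXT file, can be sketched as below. It is a hypothetical illustration, not the CABSflex code: the function names are ours, frames are assumed to be given as per-residue SC coordinates, and treating a distance exactly equal to the cutoff as a contact is an assumption. Every residue pair is considered, including sequence neighbours, in line with entries such as A22 A23 in the sample.

```python
import math
from itertools import combinations


def count_contacts(frames, labels, cutoff=6.5):
    """Write per-pair contact counts in the '# n=...' / 'RES1 RES2 COUNT' layout.

    frames: list of frames; each frame lists one (x, y, z) SC position per
            residue, in the same order as labels (e.g. 'A22').
    A pair is counted as a contact when its distance is <= cutoff (assumed).
    """
    counts = {}
    for coords in frames:
        for i, j in combinations(range(len(labels)), 2):
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    lines = [f"# n={len(frames)}"]
    lines += [f"{labels[i]} {labels[j]} {c}" for (i, j), c in sorted(counts.items())]
    return "\n".join(lines)


def read_contact_frequencies(text):
    """Turn saved counts into {(res1, res2): fraction of frames in contact}."""
    lines = text.strip().splitlines()
    n_frames = int(lines[0].split("n=")[1])  # header such as '# n=1000'
    freqs = {}
    for line in lines[1:]:
        res1, res2, count = line.split()
        freqs[(res1, res2)] = int(count) / n_frames
    return freqs
```

For the sample above, read_contact_frequencies would report a contact frequency of 0.677 for the pair A22 A25.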
For multiple chains, subsequent chains are marked along the same axis:
It is possible to change the default colors of contact maps. The scale is divided into six uneven regions, and one can pass a list of 6 colors in hex format to CABSflex with the --contact-map-colors option, e.g.:
$ CABSflex -i 1kpw -M --contact-map-colors '#ffffff' '#777777' '#4b8f24' '#e80915' '#f2d600' '#000000'
For now, a feature allowing users to pick their own number and width of ranges is not planned, but if it is urgently needed, Python developers are welcome to modify the method ContactMap.save_fig in the module CABS.cmap.
4. Output analysis
4.1 Output files
Simulation results are stored in the working directory in the following directories:
* output_pdbs -- PDB files storing replica pseudotrajectories, 10 top models and all models present in clusters.
* output_data -- RMSD file(s) for each frame of each replica (only one by default).
* plots -- plots (energy vs. RMSD and RMSF) data and graphics.
The first directory contains the pseudotrajectory in PDB file format and 10 representative models. The trajectory contains only the CA trace, while representative models are by default rebuilt to all-atom representation. A sample result for 1hpw in cartoon and sticks representation is given below:
The second directory contains only a csv file with the RMSD of each frame relative to the reference structure (the input structure by default).
The plots directory contains:
* total energy vs. RMSD for all frames of the pseudotrajectory;
* RMSD change during the simulation;
* RMSF profile (root mean square fluctuation vs. residue index).
For each plot there is also a csv file containing all the data shown in the pictures.
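The RMSD and RMSF values in these plots follow the standard definitions; a plain-Python sketch is given below, assuming the frames have already been superposed on the reference (see --weighted-fit) and that one position per residue, e.g. CA, is used. The function names are hypothetical.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally long, already superposed coordinate lists."""
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference))
    return math.sqrt(sq / len(reference))


def rmsf(frames):
    """Per-residue RMSF: RMS deviation of each position from its mean over frames."""
    n_frames = len(frames)
    profile = []
    for r in range(len(frames[0])):
        mean = [sum(f[r][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(math.dist(f[r], mean) ** 2 for f in frames) / n_frames
        profile.append(math.sqrt(msd))
    return profile
```

RMSD compares one frame against a reference, while RMSF averages over the whole trajectory for each residue, which is why the former is plotted per frame and the latter per residue.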
The user can change the directory in which output is stored with the --work-dir option, which can be useful for organizing projects that involve running multiple simulations:
$ for i in 1hpw 1no5 1kl3; do mkdir $i; CABSflex -i $i --work-dir ./$i; done
4.2 RMSD plot analysis
If the --reference-pdb PDB option is passed to CABSflex, it compares the whole pseudotrajectory to the given reference structure. If no reference is given, the input structure is used as the reference. The comparison generates:
* a plot of RMSF (root mean square fluctuation);
* total energy vs. RMSD (to the reference structure if given, to the input structure otherwise).
Sample output:
- RMSF (root mean square fluctuation) of subsequent target residues (around input position). For a long target protein, only some reference residues are marked on the x axis. Values of RMSF range from 0 to 1. Sample path: workdir/plots/RMSF_seq.svg. A plain text file containing this data is available in the corresponding workdir/RMSF.csv.
- Energy vs. RMSD to the reference structure. The output of this analysis consists of a plot and a histogram of the RMSD distribution along the pseudotrajectory.
Upper plot: energy vs. RMSD. Energy is given in CABS units. Sample paths to the plot and plain text file are, respectively, workdir/plots/E_RMSD.svg and workdir/plots/E_RMSD.csv. Lower histogram: counts of frames with a particular RMSD. Bins are at most 1 Å wide (narrower if the difference between the highest and lowest RMSD is less than 5 Å).
- RMSD to the reference vs. pseudotrajectory frame index. For each replica (only one by default), CABS provides the history of RMSD changes (to the reference PDB, if given; to the input structure otherwise). A dotted line between points is added to make the order of points clear. Sample paths: workdir/plots/RMSD_frame_replica_<replica number>.svg for the plot and workdir/plots/RMSD_frame_replica_<replica number>.csv for the plain text file.
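One possible reading of the histogram binning rule described above is sketched below: bins are 1 Å wide, and when the RMSD span is below 5 Å it is split into 5 narrower bins. This interpretation and the helper rmsd_histogram are assumptions, not taken from the CABSflex code.

```python
import math


def rmsd_histogram(values, max_width=1.0, min_bins=5):
    """Histogram counts for RMSD values under the assumed binning rule.

    Bins are max_width wide, but when the RMSD span is below
    min_bins * max_width the span is split into min_bins narrower bins.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    width = min(max_width, span / min_bins) if span > 0 else max_width
    # small tolerance so floating-point noise does not add an extra bin
    n_bins = max(1, math.ceil(span / width - 1e-9))
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # maximum goes to the last bin
        counts[idx] += 1
    return width, counts
```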
4.3 Handling not identical input and reference models
- Built-in sequence alignment.
When calculating RMSD to the reference structure, a sequence alignment between the simulated and reference target protein is created, as they may differ slightly. The --align option allows the user to choose the sequence alignment method. By default, CABSflex uses its own implementation of the Smith-Waterman algorithm. If the NCBI BLAST+ package is installed, protein BLAST can also be used. In that case, set the align method to blastp:
CABSflex -i 1hpw --reference-pdb 1qve:A --align blastp
- Loading alignment from file.
If the alignment to the reference structure is known, or when the available sequence alignment methods are not sufficient to align the structures properly, a path to a reference alignment can be passed to CABSflex. To do so, set the --align argument to CSV, which makes CABSflex use the alignment method that loads an external file, and use --alignment-options to pass the file name as fname=<path>, e.g.:
CABSflex -i 1hpw --reference-pdb 1qve:A --align CSV --alignment-options fname=external/file.csv
The file needs to be in the CSV format described by Berbalk et al. in 2009 (doi: 10.1002/pro.213; alignments returned by CABSflex are in that particular csv format). A sample file is given below:
reference template
B:687:H C:687:H
B:688:K C:688:K
B:689:I C:689:I
B:690:L C:690:L
B:691:H C:691:H
B:692:R C:692:R
B:693:L C:693:L
B:694:L C:694:L
B:695:Q C:695:Q
B:696:D C:696:D
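A reader for alignment files like the sample above could look as follows. It assumes a single header line and two whitespace-separated CHAIN:RESNUM:AA tokens per line, as in the sample; read_alignment is a hypothetical helper and does not handle gaps, whose representation is not shown here. Residue numbers are kept as strings so that insertion codes would survive.

```python
def read_alignment(text):
    """Parse a reference/template alignment into pairs of residue tuples.

    Each residue is returned as (chain, residue number, amino acid),
    e.g. ('B', '687', 'H').
    """
    pairs = []
    lines = [line for line in text.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the 'reference template' header
        ref, tpl = line.split()
        pairs.append((tuple(ref.split(":")), tuple(tpl.split(":"))))
    return pairs
```

dict(read_alignment(text)) then maps each reference residue to its template counterpart.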
4.4 Analysis of an already finished simulation
It is sometimes necessary to perform additional analysis of the results, for example to calculate RMSD to another reference complex or to produce contact maps with a slightly changed cutoff. To perform this kind of analysis, remember to run your original job with the --save-cabs-files and --save-config options, e.g.:
$ CABSflex -i 1hpw --save-cabs-files --save-config
This will result in storing three additional files: TRAF and SEQ in native CABS format and a CABSflex config file.
To re-run the default analysis of your job, use the -c (--config) and --load-cabs-files options:
$ CABSflex -c SAVED_CONFIG_FILE --load-cabs-files PATH_TO_TRAF_FILE PATH_TO_SEQ_FILE
You can use this syntax to specify any additional analysis option (your command-line options will overwrite any options specified in the CONFIG file).
4.5 Visualization
Output can easily be visualized with PyMOL. For example, the movie from section 1.3 can be prepared using the PyMOL morph and mroll utilities.
To generate exactly the effect from that movie, set cartoon_side_chain_helper to 1, use sticks and cartoon representations, and color the backbone atoms (N, CA, O, C) differently from other atoms.
PyMOL can easily be used to make movies showing top models, as in the server version of CABSflex (see examples). To do so, use this script:
from pymol import cmd

# assuming the script runs in the CABSflex working directory
for i in range(10):
    cmd.load('output_pdbs/model_%i.pdb' % i, 'my_prot', state=i + 1)
cmd.mset('1-10')  # 10 frames corresponding to models
for i in range(8):
    cmd.madd('1-10')  # repeating those 10 frames 8 times
cmd.util.mroll(1, 90)  # rolling the protein for 90 frames (ca. 4 seconds at 24 frames per second)
cmd.copy('surf', 'my_prot')
cmd.hide()
cmd.show('cartoon', 'my_prot')
cmd.show('surface', 'surf')
cmd.set('transparency', .3)
# coloring surface gray and chainbowing cartoon representation
cmd.util.chainbow('my_prot')
cmd.color('gray', 'surf')