
Welcome to the CABSflex wiki page! Installation instructions and an outline of the method are provided on the CABSflex OVERVIEW PAGE.


Table of contents

1. CABSflex modeling scheme

1.1 Pipeline

1.2 Default distance restraints and simulation settings

1.3 Example simulation movies

1.4 CABS simulation engine

2. CABSflex options

2.1 Basic options

2.2 Protein structure input options

2.3 Distance restraints options

2.4 Simulation options

2.5 All-atom reconstruction options

2.6 Results analysis options

2.7 Output options

2.8 Miscellaneous options

2.9 Options' index

3. Ready-to-use examples

3.1 Default simulation

3.2 Multiple chains

3.3 Changing global flexibility

3.4 Changing local flexibility

3.5 Changing temperature

3.6 Rebuilding representative models to all-atom representation

3.7 Contact maps

4. Output analysis

4.1 Output files

4.2 RMSD plot analysis

4.3 Handling not identical input and reference models

4.4 Analysis of an already finished simulation

4.5 Visualization


1. CABSflex modeling scheme

1.1 Pipeline

CABS-flex is a method for efficient simulations of protein flexibility. CABS-flex combines the CABS coarse-grained model with structural clustering of the simulation results and reconstruction of selected models to all-atom representation. The number of models generated at each modeling step can be set by the user. In the default mode, the CABS-flex trajectory consists of 1000 models in coarse-grained representation, which are clustered into 10 representative models that are subsequently reconstructed to all-atom models. In addition to protein models in PDB format, standalone CABS-flex provides a number of analysis and visualization options, such as contact maps and histograms. The picture below shows the CABS-flex pipeline with default settings.

pipeline-flex-standalone-800.jpg

1.2 Default distance restraints and simulation settings

CABS-flex simulations with default settings are complementary to short-timescale Molecular Dynamics (MD) simulations in predicting which protein regions undergo conformational changes and the extent of such changes. In default mode, CABS-flex uses the set of distance restraints and simulation settings obtained by Jamroz et al. (JCTC, 2013, 9 (1), 119–125). The default settings and restraints were optimized to give the best agreement between CABS-flex simulations and a consensus picture of protein fluctuations in aqueous solution derived from all-atom MD simulations (10 ns long, with different force fields, for globular proteins). CABS-flex predictions of protein fluctuations have also been shown to correlate well with fluctuations seen in NMR ensembles (Bioinformatics, 30:2150–2154, 2014). The default settings are as follows:

CABSflex -i INPUT \
    --protein-restraints ss2 3 3.8 8.0 \
    --temperature 1.4 1.4 \
    --replicas 1 \
    --mc-cycles 50 \
    --mc-steps 50 \
    --mc-annealing 20

See section 2.9 (Options' index) for descriptions of these options.

1.3 Example simulation movies

The movies below show example fluctuations of the 1HPW protein structure generated by CABSflex with default settings:

  • movie showing 10 representative all-atom models (created using PyMOL):

Video 0

  • morphing movie showing 10 representative all-atom models (created using PyMOL):

Video 1

  • movie showing CABS-flex trajectory (C-alpha trace only):

Video 2

1.4 CABS simulation engine

The CABS-flex method uses an efficient simulation engine: the CABS coarse-grained protein model. The picture below compares the all-atom representation (left) and the CABS coarse-grained representation (right) for an example 4-residue protein fragment. In CABS, each amino acid is represented by 4 atoms (or pseudo-atoms): C-alpha (CA), C-beta (CB), the center of mass of the side-chain group (SC) and the center of the peptide bond (cp). Note that the CABSflex modeling scheme allows applying or modifying distance restraints between selected CA atoms or between selected SC pseudo-atoms.

representation.png

CABS design and applications have been recently described in the review: Chemical Reviews, 116:7898–7936, 2016

2. CABSflex options

2.1 Basic options

Full option descriptions are given in section 2.9 (Options' index).

2.2 Protein structure input options

Full option descriptions are given in section 2.9 (Options' index).

2.3 Distance restraints options

Full option descriptions are given in section 2.9 (Options' index).

2.4 Simulation options

Full option descriptions are given in section 2.9 (Options' index).

  • -a, --mc-annealing NUM - sets the number of Monte Carlo temperature annealing cycles to NUM (NUM > 0, default value = 20; changing the default value is recommended only for advanced users).
  • -y, --mc-cycles NUM - sets the number of Monte Carlo cycles to NUM (NUM > 0, default value = 50).
  • -s, --mc-steps NUM - sets the number of Monte Carlo cycles between trajectory frames to NUM (NUM > 0, default value = 50).
  • -r, --replicas NUM - sets the number of replicas used in Replica Exchange Monte Carlo (NUM > 0, default value = 1; changing the default value is recommended only for advanced users).
  • -D, --replicas-dtemp DELTA - sets the temperature increment between replicas (DELTA > 0, default value = 0.5).
  • -t, --temperature TINIT TFINAL - sets the temperature range for simulated annealing: TINIT is the initial temperature, TFINAL the final temperature (default values TINIT = 1.4, TFINAL = 1.4).
  • -z, --random-seed SEED - sets the seed for the random number generator.

2.5 All-atom reconstruction options

Full option descriptions are given in section 2.9 (Options' index).

2.6 Results analysis options

Full option descriptions are given in section 2.9 (Options' index).

2.7 Output options

Full option descriptions are given in section 2.9 (Options' index).

  • -S, --save-cabs-files - Saves the CABSflex simulation file. The filename has the format yymmddHHMMSS<RANDOM 6-CHARACTER STRING>.cbs, for example: 181116161924knWPtn.cbs
  • -L, --load-cabs-files FILE - Loads a CABSflex simulation file (.cbs). This option allows for repeated scoring and analysis of CABSflex trajectories (with new settings, for example using a reference complex structure).
  • -C, --save-config - Saves simulation parameters in a config file.
  • -o, --pdb-output SELECTION - Selects structures to be saved in PDB format.

2.8 Miscellaneous options

Full option descriptions are given in section 2.9 (Options' index).

2.9 Options' index

-A, --aa-rebuild

Rebuild final models to all-atom representation. (default: True)


--align METHOD

Method to be used to align model with reference. Available options are:

  • SW -- Smith-Waterman (default)

  • blastp -- protein BLAST (requires NCBI+ package installed)

  • trivial -- simple sequential alignment, useful only to speed up the run (by skipping the Smith-Waterman algorithm) for an obvious single-chain input and a reference of the same length (e.g. when input and reference are the same file).

  • CSV -- loads the alignment from a given file (passed as an alignment setting called fname) in the format described by Berbalk et al. in 2009.


--alignment-options

Options to be passed to the method aligning the target, for example:

CABSflex --align blastp --alignment-options task=short-task

--ca-rest-add RESI RESJ DIST WEIGHT

Adds a distance restraint between the CA atom in residue RESI and the CA atom in residue RESJ.

DIST is the distance between these atoms and WEIGHT is the restraint weight, a number from [0, 1].

Can be used multiple times to add multiple restraints.


--ca-rest-file FILE

Reads CA restraints from a file (use multiple times to add multiple files).


--ca-rest-weight WEIGHT

Sets a global weight for all CA restraints (including automatically generated restraints for the protein) (default: 1.0)


-c, --config CONFIG

Reads options from the configuration file CONFIG


--clustering-iterations NUM

Sets the number of iterations of the k-medoids clustering algorithm (default: 100).


-k, --clustering-medoids NUM

Sets the number of medoids in the k-medoids clustering algorithm. This option also sets the number of final models to be generated. (default: 10)
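The clustering step can be illustrated with a minimal k-medoids sketch. It works on a precomputed distance matrix (for example pairwise RMSD between the filtered models) and alternates an assignment step and a medoid-update step until the medoids stop changing or the iteration limit (cf. --clustering-iterations) is reached. This is a generic textbook variant written for illustration, not CABSflex's own implementation; the function name and the random initialization are assumptions.

```python
import random


def k_medoids(dist, k, iterations=100, seed=None):
    """Cluster n items into k clusters from an n x n distance matrix.

    Returns (medoids, labels): the medoid index of each cluster and, for
    every item, the index (0..k-1) of the cluster it is assigned to.
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # random initial medoids (assumption)

    def assign(meds):
        # Attach every item to its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(iterations):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep its old medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid = member with the smallest summed distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # converged: medoids did not change
        medoids = new_medoids
    return medoids, assign(medoids)
```

With k = 10 on a 1000 x 1000 RMSD matrix this would mirror the default 1000 models to 10 representatives, though real implementations usually add smarter initialization.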


--contact-map-colors COLORS

Sets 6 colors (hex code, e.g. #00FF00 for green etc.) to be used in contact map color bars.


-M, --contact-maps

Stores contact map matrix plots and histograms of contact frequencies.


--contact-threshold-aa DIST

Set contact distance between heavy atoms for contact map plotting (all-atom representative models only). (default: 5.5 Angstroms)


-T, --contact-threshold DIST

Sets the contact distance between side-chain pseudo-atoms (SC) for contact map plotting. (default: 6.5 Angstroms)


--dssp-command PATH

Use the provided path to the dssp binary.


-n, --filtering-count NUM

Sets the number of low-energy models from trajectories to be clustered (default: 1000).


--filtering-mode MODE

Choose the filtering mode to select NUM (set by --filtering-count) models for clustering.

MODE can be either: (default: each)

  • each - models are ordered by protein energy and top n = [NUM / R] (R is the number of replicas) is selected from EACH replica
  • all - models are ordered by protein energy and top NUM is selected from ALL replicas combined
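The two filtering modes can be sketched as follows, assuming each replica is given as a list of (energy, model) pairs and that [NUM / R] means integer (floor) division. The function name and data layout are hypothetical, not part of the CABSflex API.

```python
from operator import itemgetter


def filter_models(replicas, count, mode="each"):
    """Pick low-energy models for clustering.

    replicas: list of replicas, each a list of (energy, model) pairs.
    Returns the selected (energy, model) pairs.
    """
    by_energy = itemgetter(0)  # sort key: the energy of a frame
    if mode == "each":
        per_replica = count // len(replicas)  # n = [NUM / R], floor assumed
        selected = []
        for frames in replicas:
            selected.extend(sorted(frames, key=by_energy)[:per_replica])
        return selected
    if mode == "all":
        pooled = [frame for frames in replicas for frame in frames]
        return sorted(pooled, key=by_energy)[:count]
    raise ValueError(f"unknown filtering mode: {mode!r}")
```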

--fortran-command PATH

Use the provided path to the fortran compiler binary.


--gauss-iterations NUM

Sets the number of iterations of the dynamic weighted-fit algorithm used for superposition of structures. This option has no effect when --weighted-fit is set to anything other than gauss (default: 100).


-h, --help

print help and exit program


--image-file-format FMT

Produces all image files in the given format (any format supported by matplotlib, e.g. svg, png or pdf).


-i, --input-protein INPUT

Loads input protein structure.

INPUT can be either:

  • PDB code (optionally with chain IDs), e.g. -i 1CE1:HL loads chains H and L of the 1CE1 protein structure downloaded from the PDB database

  • path to a local PDB file (optionally gzipped)


--insertion-clash DIST

This option enables advanced settings of building starting conformations of modelled complexes. The option sets distance in Angstroms between any two atoms (of different modeled chains) at which a clash occurs while building initial complex (default: 1.0 Angstrom)


-L, --load-cabs-files FILE

Loads CABSflex simulation files and allows for repeated scoring and analysis of CABSflex trajectories (with new settings, for example using a reference complex structure; see the --reference-pdb option).


--log

Automatically redirects output to the CABS.log file created in the working directory, hides the progress bar at higher verbosity levels and turns off log coloring. Piping standard error will not work with this option. If the log file already exists, output is appended to it.


-a, --mc-annealing NUM

Sets the number of Monte Carlo temperature annealing cycles to NUM (NUM > 0, default value = 20, changing the default value is recommended only for advanced users).


-y, --mc-cycles NUM

Sets the number of Monte Carlo cycles to NUM (NUM > 0, default value = 50). The total number of snapshots generated for each replica/trajectory is [mc-annealing] x [mc-cycles]; default: 20 x 50 = 1000.


-s, --mc-steps NUM

Sets the number of Monte Carlo cycles between trajectory frames to NUM (NUM > 0, default value = 50). NUM = 1 means that every generated conformation is saved in the trajectory. This option increases the simulation length (between saved snapshots) without changing the number of snapshots in the trajectories.

loops-in-cabs-dock-wide.png
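The relation between --mc-annealing, --mc-cycles and --mc-steps can be checked with a small helper. It follows the option descriptions above: frames per replica = annealing x cycles, and mc-steps Monte Carlo cycles are run between consecutive frames, so with defaults each replica saves 1000 frames out of 20 x 50 x 50 = 50,000 cycles. The helper name is hypothetical, and the total-cycle count is derived from these descriptions rather than taken from the CABSflex source.

```python
def simulation_size(mc_annealing=20, mc_cycles=50, mc_steps=50, replicas=1):
    """Frame and Monte Carlo cycle counts implied by the --mc-* options."""
    frames_per_replica = mc_annealing * mc_cycles  # saved snapshots per replica
    # mc_steps Monte Carlo cycles are performed between consecutive frames
    mc_cycles_per_replica = frames_per_replica * mc_steps
    return {
        "frames_per_replica": frames_per_replica,
        "frames_total": frames_per_replica * replicas,
        "mc_cycles_per_replica": mc_cycles_per_replica,
    }


print(simulation_size())
# {'frames_per_replica': 1000, 'frames_total': 1000, 'mc_cycles_per_replica': 50000}
```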


-m, --modeller-iterations NUM

Sets the number of iterations for the reconstruction procedure in the MODELLER package (default: 3). Larger numbers may result in more accurate models, but reconstruction will take longer.


-N, --no-protein-restraints

Do not automatically generate any protein restraints. This option has precedence over the --protein-restraints option and will overwrite any settings set by the latter. With this flag on, restraints can still be added with the --ca-rest-add or --ca-rest-file options.


-o, --pdb-output SELECTION

Selects structures to be saved in PDB format. Available options are:

  • A - all (default)
  • R - replicas
  • F - filtered
  • C - clusters
  • M - models
  • N - none

Example: -o RM - saves replicas and models


-f, --protein-flexibility FLEXIBILITY

Modifies flexibility of selected protein residues:

  • 0 - fully flexible backbone,
  • 1 - almost stiff backbone (default value, given appropriate number of protein restraints),
  • >1 - increased stiffness.

FLEXIBILITY can be either:

  • a positive real number - all protein residues will be assigned flexibility equal to this number.

  • bf - flexibility for each residue is read from the beta factor column of the CA atom in the PDB input file. Note that standard beta factors in PDB files have the opposite meaning to CABSflex flexibility; remember to edit the PDB file accordingly or use FLEXIBILITY = bfi.

  • bfi - each residue is assigned its flexibility based on the inverted beta factors stored in the input PDB file, so that bf = 0.0 -> f = 1.0 and bf >= 1.0 -> f = 0.0

  • <filename> - flexibility is read from the file <filename>, either as single-residue entries: resid_ID <flexibility>, e.g. 12:A 0.75, or as residue ranges: resid_ID - resid_ID <flexibility>, e.g. 12:A - 15:A 0.75

The default value for residues not explicitly specified can be set by inserting the following line at the top of the file: default <default flexibility value>. If this line is omitted, the default value is 1.0. Multiple entries can be used.
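A minimal parser for the flexibility file format described above might look like this. It assumes integer residue numbers without insertion codes, ranges that stay within one chain, and that a later entry overrides an earlier one for the same residue; none of these details is confirmed by the description, and the function names are hypothetical.

```python
def parse_flexibility(text):
    """Parse a flexibility file into (default, rules).

    Each rule is (chain, first_resnum, last_resnum, flexibility).
    """
    default = 1.0  # used when no "default" line is present
    rules = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0] == "default":                    # "default 0.5"
            default = float(tokens[1])
        elif len(tokens) == 2:                        # "12:A 0.75"
            num, chain = tokens[0].split(":")
            rules.append((chain, int(num), int(num), float(tokens[1])))
        elif len(tokens) == 4 and tokens[1] == "-":   # "12:A - 15:A 0.75"
            first, chain = tokens[0].split(":")
            last, chain2 = tokens[2].split(":")
            if chain != chain2:
                raise ValueError(f"range spans two chains: {raw!r}")
            rules.append((chain, int(first), int(last), float(tokens[3])))
        else:
            raise ValueError(f"cannot parse line: {raw!r}")
    return default, rules


def flexibility_of(resnum, chain, default, rules):
    """Flexibility of one residue; later rules win (assumption)."""
    value = default
    for ch, first, last, flex in rules:
        if ch == chain and first <= resnum <= last:
            value = flex
    return value
```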


-g, --protein-restraints MODE GAP MIN MAX

Generates a set of binary distance restraints for CA atoms that keep the protein in a predefined conformation (default: ss2, 3, 3.8, 8.0)

MODE can be either:

  • all - generates restraints for all protein residues
  • ss1 - generates restraints only when at least one restrained residue is assigned regular secondary structure (helix or sheet)
  • ss2 - generates restraints only when both restrained residues are assigned regular secondary structure (helix, sheet)

GAP specifies the gap along the main chain between the two residues to be restrained. MIN and MAX are the minimum and maximum distances (in Angstroms) between the two residues for a restraint to be created.

The default setting, recommended for standard applications, is ss2 3 3.8 8.0
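The restraint-generation rule can be sketched for a single chain as follows. It assumes that 'H' and 'E' letters mark regular secondary structure (helix and sheet) and reads GAP as "residues at least GAP positions apart along the chain"; both are interpretations, and the function is an illustration rather than the CABSflex code.

```python
import math


def generate_ca_restraints(ca_coords, sec_struct, mode="ss2", gap=3, min_d=3.8, max_d=8.0):
    """Return (i, j, distance) CA-CA restraints for a single chain.

    ca_coords:  list of (x, y, z) CA positions in chain order.
    sec_struct: one letter per residue; 'H' (helix) and 'E' (sheet) are
                treated as regular secondary structure (assumption).
    """
    if mode not in ("all", "ss1", "ss2"):
        raise ValueError(f"unknown mode: {mode!r}")
    regular = [s in ("H", "E") for s in sec_struct]
    restraints = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + gap, n):  # pairs at least `gap` apart (assumption)
            if mode == "ss1" and not (regular[i] or regular[j]):
                continue  # ss1: at least one residue in helix/sheet
            if mode == "ss2" and not (regular[i] and regular[j]):
                continue  # ss2: both residues in helix/sheet
            d = math.dist(ca_coords[i], ca_coords[j])
            if min_d <= d <= max_d:
                restraints.append((i, j, d))
    return restraints
```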


--protein-restraints-reduce FACTOR

Reduce the number of protein restraints by a FACTOR, where FACTOR is a number from [0, 1]. This option reduces the number of automatically generated restraints for the protein molecule in order to speed up computation. Restraints are randomly selected from all generated restraints, so that the final number of restraints #reduced = #all * FACTOR.
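The random reduction described above can be sketched with Python's random.sample. Rounding #all * FACTOR to the nearest integer is an assumption, and the function name is hypothetical.

```python
import random


def reduce_restraints(restraints, factor, seed=None):
    """Randomly keep about factor * len(restraints) restraints."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("FACTOR must be a number from [0, 1]")
    keep = round(len(restraints) * factor)  # #reduced = #all * FACTOR, rounded
    return random.Random(seed).sample(restraints, keep)
```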


-z, --random-seed SEED

Sets the seed for random number generator.


-R, --reference-pdb REF

Loads a reference complex structure. This option allows for comparison with the reference complex structure and triggers additional analysis features.

REF must be either:

  • [pdb code]:[protein chains]...
  • [pdb file]:[protein chains]...

Examples:

  • 1abc:AB:C
  • 1abc:AB:CD
  • myfile.pdb:AB:C
  • myfile.pdb.gz:AB:CDE

-r, --replicas NUM

Sets the number of replicas to be used in Replica Exchange Monte Carlo (NUM > 0, default value = 1, changing the default value is recommended only for advanced users)


-D, --replicas-dtemp DELTA

Sets the temperature increment between replicas (DELTA > 0, default value = 0.5, changing the default value is recommended only for advanced users)


-S, --save-cabs-files

Saves CABS-flex simulation files.


-C, --save-config

Save simulation parameters in config file.


--sc-rest-add RESI RESJ DIST WEIGHT

Adds a distance restraint between the SC pseudo-atom in residue RESI and the SC pseudo-atom in residue RESJ; DIST is the distance between these pseudo-atoms (the geometric centers of their side-chain atoms) and WEIGHT is the restraint weight, a number from [0, 1]. Can be used multiple times to add multiple restraints.


--sc-rest-file FILE

Reads SC restraints from a file (use multiple times to add multiple files).


--sc-rest-weight WEIGHT

Sets a global weight for all SC restraints (default: 1.0)


-t, --temperature TINIT TFINAL

Sets the temperature range for simulated annealing procedure: TINIT - initial temperature, TFINAL - final temperature (default values TINIT=1.4, TFINAL=1.4).

CABS-flex uses a temperature-like parameter that does not correspond straightforwardly to the real temperature. A temperature value around 1.0 roughly corresponds to a nearly frozen conformation, while the folding temperature of small proteins in the CABS model is usually around 2.0.


-V, --verbose VERBOSITY

Controls how explicit the program output is, 0 for silent mode (only critical messages), 4 for maximum verbosity, default 2.


--version

print version and exit program


--weighted-fit ARG

This option sets and customizes the way models are structurally aligned, which affects both the calculation of RMSD/RMSF and clustering, together with the selection of the final models. Models are aligned by the Kabsch optimal fit algorithm. This option assigns weights to all atoms, which specify how 'important' each atom is in the structural fit. Weights are numbers from the [0, 1] range, with 0 meaning 'irrelevant in the fitting process'.

ARG can be either:

  • off Turns off weighted-fit (all weights are 1.0) (default).
  • gauss Weights are generated automatically in the iterative procedure described in Biophys J. 2006 Jun 15; 90(12): 4558-4573. The procedure consists of the following steps: (1) Set wi = 1.0 for i = [1,2 ... N], where N is the number of atoms. (2) Align structures using weights wi. (3) Calculate di - displacement of the i-th atom. (4) Update weights according to formula: wi = exp(-0.5 * di * di). Repeat (2) through (4) until convergence (max 100 iterations, can be changed with --gauss-iterations).
  • flex Weights are taken from the flexibility settings. (See help entry for --protein-flexibility).
  • ss Weights are taken from the secondary structure assignment. Atoms in helices and sheets are given w = 1.0, while those in loops and coil get w = 0.0.
  • <filename> Weights are read from a file <filename>. The file should follow this format:
    default 1.0 (default value, if omitted w = 1.0 is assumed)
    1:A 0.5
    5:A 0.1
    ...
    1:B 0.99
    ...
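The gauss procedure above can be sketched with NumPy: a weighted Kabsch superposition followed by the weight update w_i = exp(-0.5 * d_i * d_i), repeated until the weights stop changing. This is a minimal illustration of the stated formula, not the CABSflex implementation; the function names, the convergence tolerance and the assumption that coordinates are in Angstroms are all assumptions.

```python
import numpy as np


def weighted_kabsch(mobile, target, w):
    """Return `mobile` (N x 3) superimposed on `target` with per-atom weights w."""
    w = np.asarray(w, dtype=float)
    if w.sum() <= 0.0:
        raise ValueError("all weights are zero")
    mu_m = np.average(mobile, axis=0, weights=w)  # weighted centroids
    mu_t = np.average(target, axis=0, weights=w)
    p, q = mobile - mu_m, target - mu_t
    h = (w[:, None] * p).T @ q                    # 3 x 3 weighted covariance
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0  # exclude reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + mu_t


def gauss_weighted_fit(mobile, target, max_iter=100, tol=1e-6):
    """Iterative Gaussian-weighted superposition following steps (1)-(4)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    w = np.ones(len(mobile))                            # (1) w_i = 1.0
    for _ in range(max_iter):
        fitted = weighted_kabsch(mobile, target, w)     # (2) align with weights
        disp = np.linalg.norm(fitted - target, axis=1)  # (3) displacements d_i
        new_w = np.exp(-0.5 * disp * disp)              # (4) update weights
        converged = np.max(np.abs(new_w - w)) < tol
        w = new_w
        if converged:
            break
    return weighted_kabsch(mobile, target, w), w
```

Atoms that move a lot (flexible loops) end up with weights near 0, so the rigid core dominates the superposition.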
    

--work-dir DIR

Set working directory to DIR.



3. Ready-to-use examples

3.1 Default simulation

To run CABSflex using the default settings (see chapter 1.2) use the following command:

$ CABSflex -i PDB/FILE
For example:
$ CABSflex -i 1hpw
downloads the 1hpw.pdb file from the PDB to a cache and starts a simulation with default settings, while
$ CABSflex -i 1hpw.pdb
tries to open the file 1hpw.pdb from the working directory.

3.2 Multiple chains

In contrast to the web server version, standalone CABSflex supports simulations of proteins composed of several chains. For example, the command:

$ CABSflex -i 4w2o
loads all protein chains from 4w2o PDB entry and is equivalent to

$ CABSflex -i 4w2o:ABCDEFGH
since the 4w2o protein is composed of 8 chains named A to H. To simulate only selected protein chains, write the appropriate chain symbols after the colon, e.g.:
$ CABSflex -i 4w2o:AEG

The same syntax works with a path to a file on the local drive:

$ CABSflex -i pdbs/4w2o.pdb:AEG

4w2o_top10_800.png

3.3 Changing global flexibility

In some cases, the default restraints need to be modified. For example, one may want to allow only small changes in backbone position by restraining the whole backbone, not only elements with well-defined secondary structure (as the default restraints do). To change the behavior of the whole input complex, modify the parameters of the --protein-restraints option, which takes four arguments. Let us look at an example of streptavidin, PDB ID 1KL3:

CABSflex -i 1kl3 --protein-restraints all 3 3.8 8.0
See the full description of the --protein-restraints option for information about all arguments. Only the first one was changed with respect to the default settings. It tells CABSflex to prepare restraints for all residues, not only for residues in secondary structures, as it would with the default first argument ss2. The arguments tell CABS, respectively:

  • for which residues to prepare restraints;
  • how many nearby residues (along the chain) to skip;
  • the minimum spatial distance for a restraint to be created;
  • the maximum spatial distance for a restraint to be created.

By modifying them, one can adjust the level of flexibility, since they change the number of initial restraints.

The picture below shows the 10 representative models from this run:

1kl3_all_topsi.png

3.4 Changing local flexibility

Let's look again at the previous example with global flexibility set to all. The protein with PDB ID 1KL3 is a streptavidin. The loop 45-51 is known to play an important role in the dynamics of this protein. Unfortunately, in all mode the loop sticks to the protein core too much. To deal with this problem, one can use the --protein-flexibility (-f) option, which modifies the local flexibility of chosen protein fragments. To fix this particular simulation, we prepare and load a file telling CABS which part to modify and how much flexibility is needed, e.g. 1kl3flex.inp, which contains only one line:

45:A - 51:A 0.0

tells CABSflex that the loop from residue 45 to residue 51 in chain A is to be as flexible as possible (0.0; the scale is reversed). Other flexible regions can be defined in the following lines.

With this file, we are ready to run the CABSflex simulation with the command:

$ CABSflex -i 1kl3 -f 1kl3flex.inp

The pictures below show 10 representative models for:

  • the run with all passed to --protein-restraints and default flexibility:

1kl3_all_topsi.png

  • the run with all passed to --protein-restraints and the flexible loop:

1kl3_loop_topsi.png

The loop is marked in red.

The movie below shows the pseudotrajectories for the default (bottom) and modified flexibility of the 45-51 loop:

Video 3

The plots below show RMSF for both runs. Note the difference in the fluctuation of the 45-51 loop:

  • the run with all passed to --protein-restraints and default flexibility:

RMSF_seq_1kl3all.png

  • the run with all passed to --protein-restraints and the flexible loop:

RMSF_seq_1kl3loop.png

It is also possible to read flexibility from the b-factor column of the PDB file. See --protein-flexibility FLEXIBILITY for more information.

3.5 Changing temperature

The default temperature is 1.4, slightly above the temperature of the crystal (1.0 in CABS units). However, modeling intrinsically disordered regions in proteins, for example, may require temperatures similar to those at which protein folding is simulated. Here is an example of MDM2 with flexible C- and N-terminal loops.

To run a simulation at the default temperature, use:

$ CABSflex -i 1z1m
It is equivalent to:
$ CABSflex -i 1z1m --temperature 1.4 1.4 --protein-restraints ss2 3 3.8 8.0

To perform a simulation of the same protein at a higher temperature, run:

$ CABSflex -i 1z1m --temperature 2.0 2.0
At the default temperature, both terminal regions stick to the protein core (upper picture), while at the higher temperature one obtains much more extensive sampling of conformational space (lower picture):

1z1m800LT.png

1z1m800.png

3.6 Rebuilding representative models to all-atom representation

By default, representative models are not rebuilt to all-atom representation with the MODELLER package (see the pipeline). To force rebuilding, use the --aa-rebuild flag:

$ CABSflex -i 1hpw --aa-rebuild

Make sure that representative models are returned by CABSflex, i.e. that the --pdb-output (-o) option is set to A (all: replicas, clusters and models) or at least to M (models). See the full option description for more information.

3.7 Contact maps

Adding the -M or --contact-maps option creates contact maps in the contact_maps directory.

Contact maps in CABSflex show frequencies of contacts over the full pseudotrajectory, in clusters and for representative models. In default mode, they are based on the positions of the side-chain centers of mass (SC pseudo-atoms; see section 1.4 for the CABS model representation) with a cutoff of 6.5 Angstroms. If representative models are rebuilt to all-atom models, a contact involves any heavy atoms and the cutoff is set to 5.5 Angstroms.

The contact_maps directory contains the contact map matrix plot (all.svg) and data (all.txt). To change the image format, use the --image-file-format option with a format supported by matplotlib, e.g. png, pdf, ps, eps or svg. See the matplotlib savefig documentation for more information. By default, svg format is used.

Sample contact map for 1HPW:

1hpwCM.png

The TXT file contains residue PDB IDs and the number of frames in which they were in contact. The total number of frames is given in the first comment line of the file:

# n=1000
A22 A23 302
A22 A24 301
A22 A25 677
A22 A26 162
...
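The counting behind the TXT file can be sketched as follows, assuming one point per residue in each frame (for example the SC pseudo-atom) and that a pair counts as a contact when its distance is at most the cutoff. The function names and the inclusive cutoff are assumptions; the output only mimics the layout of the sample above.

```python
import numpy as np


def count_contacts(frames, labels, cutoff=6.5):
    """Count in how many frames each residue pair is in contact.

    frames: array-like of shape (n_frames, n_residues, 3).
    labels: residue identifiers such as "A22", in the same order.
    Returns {(label_i, label_j): count} for pairs with count > 0.
    """
    frames = np.asarray(frames, dtype=float)
    n_res = len(labels)
    counts = np.zeros((n_res, n_res), dtype=int)
    for xyz in frames:  # one frame at a time keeps memory at n_res x n_res
        dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
        counts += dist <= cutoff
    return {
        (labels[i], labels[j]): int(counts[i, j])
        for i in range(n_res)
        for j in range(i + 1, n_res)
        if counts[i, j] > 0
    }


def format_contact_table(counts, n_frames):
    """Render counts as '# n=<frames>' followed by 'RES1 RES2 COUNT' lines."""
    lines = [f"# n={n_frames}"]
    lines += [f"{a} {b} {c}" for (a, b), c in counts.items()]
    return "\n".join(lines)
```

Dividing each count by n gives the contact frequency shown in the color bars.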

For multiple chains, subsequent chains are marked along the same axis:

1ck0CM.png

It is possible to change the default colors of contact maps. The scale is divided into six uneven regions, and a list of 6 colors in hex format can be passed to CABSflex with the --contact-map-colors option, e.g.:

$ CABSflex -i 1kpw -M --contact-map-colors '#ffffff' '#777777' '#4b8f24' '#e80915' '#f2d600' '#000000'

A feature allowing users to choose their own number and width of ranges is not planned for now, but if it is urgently needed, Python developers are welcome to modify the ContactMap.save_fig method in the CABS.cmap module.

4. Output analysis

4.1 Output files

Simulation results are stored in the working directory in the following directories:

  • output_pdbs -- PDB files storing the replica(s) pseudotrajectory, the 10 top models and all models present in clusters.
  • output_data -- RMSD file(s) for each frame of each replica (only one replica by default).
  • plots -- plots (energy vs. RMSD and RMSF): data and graphics.

The first directory contains the pseudotrajectory in PDB format and the 10 representative models. The trajectory contains only the CA trace, while representative models are by default rebuilt to all-atom representation. A sample result for 1hpw in cartoon and sticks representation is given below:

pipeline-flex-standalone-800.jpg

The second directory contains only a CSV file with the RMSD of each frame relative to the reference structure (the input structure by default).

The plots directory contains:

  • total energy vs. RMSD for all frames of the pseudotrajectory;
  • RMSD change during the simulation;
  • the RMSF profile (root mean square fluctuation vs. residue index).

For each plot, there is also a CSV file containing all data shown in the picture.

The output directory can be changed with the --work-dir option, which is useful for organizing projects that involve running multiple simulations:

$ for i in 1hpw 1no5 1kl3; do
      mkdir "$i"
      CABSflex -i "$i" --work-dir "./$i"
  done

4.2 RMSD plot analysis

If the --reference-pdb PDB option is passed to CABSflex, the whole pseudotrajectory is compared to the given reference structure. If no reference is given, the input structure is used as the reference. The comparison generates:

  • a plot of RMSF (root mean square fluctuation);
  • total energy vs. RMSD (to the reference structure if given, to the input structure otherwise).

Sample output:

  • RMSF (root mean square fluctuation) of subsequent target residues (around the input position). For a long target protein, only some reference residues are marked on the x axis. RMSF values range from 0 to 1. Sample path: workdir/plots/RMSF_seq.svg. A plain text file containing these data is available in the corresponding workdir/RMSF.csv.

RMSF_seq.png

  • Energy vs. RMSD to the reference structure. The output of this analysis consists of a plot and a histogram of the RMSD distribution along the pseudotrajectory. Upper plot: energy vs. RMSD, with energy given in CABS units. Sample paths to the plot and the plain text file are, respectively, workdir/plots/E_RMSD.svg and workdir/plots/E_RMSD.csv. Lower histogram: counts of frames with a particular RMSD. Bins are at most 1 Å wide (narrower if the difference between the highest and lowest RMSD is less than 5 Å).

E_RMSD_AB_total.png

  • RMSD to reference vs. pseudotrajectory frame index. For each replica (only one by default), CABS provides the history of RMSD changes (to the reference PDB if given, to the input structure otherwise). A dotted line between points shows the order of frames. Sample paths: workdir/plots/RMSD_frame_replica_<replica number>.svg for the plot and workdir/plots/RMSD_frame_replica_<replica number>.csv for the plain text file.
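The two quantities can be computed from superimposed frames as sketched below (one point per residue, e.g. the CA atom). RMSF is computed around the mean position by default; passing the input structure as center gives fluctuations around the input position, which is how the RMSF plot above is described. The function names are hypothetical, and the frames are assumed to be already superimposed on the reference.

```python
import numpy as np


def rmsd_per_frame(frames, reference):
    """RMSD of each frame to the reference; frames must already be superimposed."""
    frames = np.asarray(frames, dtype=float)        # (n_frames, n_residues, 3)
    reference = np.asarray(reference, dtype=float)  # (n_residues, 3)
    sq = ((frames - reference) ** 2).sum(axis=-1)   # squared displacement per residue
    return np.sqrt(sq.mean(axis=1))


def rmsf_per_residue(frames, center=None):
    """RMSF of each residue around `center` (mean position if None)."""
    frames = np.asarray(frames, dtype=float)
    center = frames.mean(axis=0) if center is None else np.asarray(center, dtype=float)
    sq = ((frames - center) ** 2).sum(axis=-1)      # (n_frames, n_residues)
    return np.sqrt(sq.mean(axis=0))
```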

4.3 Handling not identical input and reference models

  • Built-in sequence alignment. When calculating RMSD to the reference structure, a sequence alignment between the simulated and the reference protein is created, as their sequences may slightly differ. The --align option determines the sequence alignment method. By default, CABSflex uses its own implementation of the Smith-Waterman algorithm. If the NCBI+ package is installed, protein BLAST can also be used by setting the align method to blastp:
    CABSflex -i 1hpw --reference-pdb 1qve:A --align blastp
    
  • Loading alignment from file. If the alignment to the reference structure is known, or when the available sequence alignment methods are not enough to align the structures properly, a path to the reference alignment can be passed to CABSflex. To do so, set the --align argument to CSV, which makes CABSflex load the alignment from an external file, and use --alignment-options to pass the file name as fname=<path>, e.g.:
    CABSflex -i 1hpw --reference-pdb 1qve:A --align CSV --alignment-options fname=external/file.csv
    
    The file needs to be in the CSV format described by Berbalk et al. in 2009 (doi: 10.1002/pro.213; alignments returned by CABSflex are in this format). A sample file is given below:
                             reference   template
                             B:687:H C:687:H
                             B:688:K C:688:K
                             B:689:I C:689:I
                             B:690:L C:690:L
                             B:691:H C:691:H
                             B:692:R C:692:R
                             B:693:L C:693:L
                             B:694:L C:694:L
                             B:695:Q C:695:Q
                             B:696:D C:696:D
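For illustration, a textbook Smith-Waterman local alignment with a linear gap penalty is sketched below. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions; CABSflex's own implementation and scoring scheme may differ, for example by using a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # scoring matrix, first row/column = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:   # came diagonally: aligned pair
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:   # gap in b
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:                                # gap in a
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("HKILHR", "HKIHR"))  # (8, 'HKILHR', 'HKI-HR')
```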
    

4.4 Analysis of an already finished simulation

It is sometimes necessary to perform additional analysis of the results, for example to calculate RMSD to another reference complex or to produce contact maps with a slightly changed cut-off. To perform this kind of analysis, remember to run your original job with the --save-cabs-files and --save-config options, e.g.:

$ CABSflex -i 1hpw --save-cabs-files --save-config

These options result in storing three additional files: TRAF and SEQ in native CABS format and a CABSflex config file.

To re-run the default analysis of your job, use the following command with the -c (--config) and --load-cabs-files options:

$ CABSflex -c SAVED_CONFIG_FILE --load-cabs-files PATH_TO_TRAF_FILE PATH_TO_SEQ_FILE

You can use this syntax to specify any additional analysis option (your command line options will overwrite any options specified in the CONFIG file).

4.5 Visualization

Output can be easily visualized with PyMOL, e.g. the movie from section 1.3:

Video 1

can be prepared using the PyMOL morph and mroll utilities.

To reproduce the exact effect of the above movie, set cartoon_side_chain_helper to 1, use sticks and cartoon representations, and color backbone atoms (N, CA, C, O) differently from other atoms.

PyMOL can easily be used to make movies showing the top models, as in the server version of CABSflex (see examples). To do so, use this script:

from pymol import cmd, util

# Load the 10 representative models as consecutive states of one object
# (assuming the script runs in the CABSflex working directory).
for i in range(10):
    cmd.load('output_pdbs/model_%i.pdb' % i, 'my_prot', state=i + 1)

cmd.mset('1 -10')    # 10 movie frames corresponding to the 10 models
for i in range(8):
    cmd.madd('1 -10')    # append those 10 frames 8 more times (90 frames in total)

util.mroll(1, 90, 1)   # rotate the protein once over frames 1-90 (3.75 s at 24 frames per second)

cmd.copy('surf', 'my_prot')
cmd.hide()
cmd.show('cartoon', 'my_prot')
cmd.show('surface', 'surf')
cmd.set('transparency', 0.3)

util.chainbow('my_prot')   # rainbow-color the cartoon representation by chain
cmd.color('gray', 'surf')  # color the surface gray
