jfinkels committed 7140605

added previous version of README, with information about how to run Matt

Files changed (2)

 original author sometimes used a value of -1 as a sentinel value to denote an
 unknown or uninitialized value, or something similar.
+Running Matt, and much, much more
+For more information about compiling, running, and effectively using Matt, see
+the README.old file contained in this directory, which is a README file from an
+older version of the code.
+Matt is a multiple protein structure alignment program. It uses local geometry
+to align segments of two sets of proteins, allowing limited bends in the
+backbones between the segments.  If you use Matt, please cite: M. Menke,
+B. Berger, L. Cowen, "Matt: Local Flexibility Aids Protein Multiple Structure
+Alignment", 2007, preprint.
+Matt is licensed under the GNU public license version 2.0. If you would like to
+license Matt in an environment where the GNU public license is unacceptable
+(such as inclusion in a non-GPL software package), commercial Matt licensing is
+available through the MIT and Tufts offices of Technology Transfer. Contact
+betawrap@csail.mit.edu or cowen@cs.tufts.edu for more information. Contact
+mmenke@mit.edu for issues involving the code itself.
+To compile under Linux, simply type "make". Note that the makefile will build
+Matt without OpenMP support.  To build it with OpenMP support, gcc 4.2 or higher
+is required.  Just add the "-fopenmp" switch to the command.  Matt has not yet
+been tested with OpenMP under Linux.
+Microsoft Visual Studio 6.0 and 2005 project files are included.  To compile
+with either one, just open up the corresponding project file and compile.  By
+default, the Visual Studio 2005 project is set to compile with OpenMP enabled.
+The Express Edition does not support OpenMP, so will return an error message.
+The option to disable it is under Project Properties > C/C++ > Language.
+To install, simply copy the binary to the directory you want Matt to run from
+and type in the command to run it.  Matt needs no environment variables and
+does not need to be in the active directory to run properly.
+Matt takes a set of pdb files as input.  Individual chains can optionally be
+specified for individual source files.  Source files can be compressed in gzip
+or compress file formats.  The ".Z" and ".gz" extensions can optionally be left
+off the file name.
+Up to eight files will be created.  Their names are <outprefix>.<extension> and
+<outprefix>_bent.<extension>, where extension is fasta, txt, pdb, and spt.
+<outprefix> is specified as a command line option.  In all files, proteins are
+listed in the input order, except for the assembly order section of the txt
+files.  By default, the "bent" files will not be created.
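The naming scheme above can be sketched in a few lines of Python.  This is
only an illustration of the documented pattern, not code from Matt itself; the
function name expected_output_files and the prefix "alignment" are made up for
the example.

```python
# Extensions listed in the README, in the order they are mentioned there.
EXTENSIONS = ("fasta", "txt", "pdb", "spt")


def expected_output_files(outprefix, bent=False):
    """Return the file names Matt is documented to create for a prefix.

    Hypothetical helper: it only mirrors the naming rule
    <outprefix>.<extension> and <outprefix>_bent.<extension>.
    """
    names = [f"{outprefix}.{ext}" for ext in EXTENSIONS]
    if bent:
        # The "bent" files are not created by default.
        names += [f"{outprefix}_bent.{ext}" for ext in EXTENSIONS]
    return names


if __name__ == "__main__":
    for name in expected_output_files("alignment", bent=True):
        print(name)
```

With bent=True the helper lists all eight names; with the default it lists
only the four regular files.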
+The fasta file contains the alignment in fasta format, using periods to
+indicate unoccupied positions.  Only residues in the common core (i.e. those
+aligned across all input structures) are currently aligned.  This will be
+changed in a future version.
+The txt file is a visual alignment of the sequences of the input structures.
+It also includes the assembly order, RMSD, number of core residues, raw score,
+p-value (for pairwise alignments), and the reference structure.  The reference
+structure is the one that is untransformed in the final pass of the algorithm.
+It also is not transformed in the output pdb files.  Other than this and the
+fact that it's one of the two structures used to calculate rotation angles in
+the final alignment, the reference structure has no special significance.
+The pdb files contain 3D atomic coordinates of the structural alignment.  The
+spt files are Rasmol/Jmol scripts that highlight aligned regions.  To run the
+scripts with Jmol, just open the PDB files and drag the script to the Jmol
+window.  With Rasmol, open the pdb file, and type "script <filename>.spt".  The
+core residues from each structure will be set to a different color.  The colors
+repeat after 10 structures.
+Note that the pdb and txt files will have insertion codes in them if they were
+present in the pdb file.  To get rid of the codes, use the "-r" option.
+The bent files contain the results generated before the final pass, which aligns
+the unbent structures and fills in gaps.  The bent pdb files are the output
+structures generated by the first phase of the algorithm.  The source
+structures may be broken apart between different fragments.  The RMSD in the
+text file is the RMSD of the alignment of deformed structures, so should not be
+compared directly to the RMSD of other structural alignment algorithms.  The -b
+switch enables creating the bent files.
+Matt uses the OpenMP multithreading extensions when built with OpenMP enabled
+with a compliant compiler, such as gcc 4.2 and retail versions of Microsoft
+Visual Studio 2005.  Note that for compatibility reasons, the included makefile
+will not create a binary with OpenMP support.
+Each thread works on aligning a different pair of structures, so there's only a
+benefit from this when running multiple alignments.  Unless otherwise
+specified, Matt will use OpenMP's default number of threads, which is generally
+one thread per CPU per core.  More information on that command line option (-t)
+is below.  When run on long proteins, particularly those with a lot of
+self-similarity, each thread can use a fairly large amount of memory.  If Matt
+takes up too much memory when run on a particular set of structures, running
+slowly or crashing as a result, try reducing the number of threads.
+Matt will report the number of threads that are created, not the number of
+threads that are active.  Therefore, when running a pairwise alignment on a
+multi-core system, it may report multiple threads, even though only one is
+doing any work.
+Running Matt with no parameters will display version and usage information.  If
+built with multithreading support, the version number will be followed by
+Matt -o outprefix [-c cutoff] [-t threads] [-[rlsVd][01]]*
+     [file[:chain[,chain]*]]* [-L listfile]*
+Command line notes:
+For options that don't take a space before their parameter (r, l, s, V, and d),
+giving the option with no parameter is equivalent to specifying a parameter of
+1.  Also, the order of parameters is irrelevant.  For example, "-s", which
+affects how pdb files are read, applies to pdb files listed both before and
+after the -s option.  You
+can also combine multiple options with a single hyphen, so the following two
+lines are equivalent:
+Matt 1plu.pdb 1tsp.pdb -r1 -s -d0 -c 4.0 -o alignment
+Matt -rsd0c 4.0 -o alignment 1plu.pdb 1tsp.pdb
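To make the equivalence of the two command lines concrete, here is a rough
Python sketch that expands combined options according to the rules described
above.  It is not Matt's actual parser: the names normalize, TOGGLES and VALUED
are invented for illustration, and -L is left out for brevity.

```python
# Options that take an optional 0/1 suffix with no space (default 1).
# "b" is included because the option list documents -b[01] as well.
TOGGLES = set("rlsVdb")
# Options whose value is the next, space-separated argument.
VALUED = set("cto")


def normalize(argv):
    """Return (options, files) for a Matt-style argument list (sketch only)."""
    options, files = {}, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        i += 1
        if not arg.startswith("-") or len(arg) == 1:
            files.append(arg)          # anything else is an input file spec
            continue
        j = 1
        while j < len(arg):
            ch = arg[j]
            j += 1
            if ch in TOGGLES:
                if j < len(arg) and arg[j] in "01":
                    options[ch] = int(arg[j])
                    j += 1
                else:
                    options[ch] = 1    # bare toggle means "1"
            elif ch in VALUED:
                if j != len(arg) or i >= len(argv):
                    raise ValueError(f"-{ch} needs a space, then a value")
                options[ch] = argv[i]
                i += 1
            else:
                raise ValueError(f"unknown option -{ch}")
    return options, files


if __name__ == "__main__":
    first = "1plu.pdb 1tsp.pdb -r1 -s -d0 -c 4.0 -o alignment".split()
    second = "-rsd0c 4.0 -o alignment 1plu.pdb 1tsp.pdb".split()
    print(normalize(first) == normalize(second))
```

Both argument lists reduce to the same option settings and the same file
order, which is what "equivalent" means in the paragraph above.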
+Mandatory command line parameters:
+-o outprefix:  Specifies prefix of output file names.
+[file[:chain[,chain]*]]*: Specifies the files and chain names to load.  Each
+chain within a single file should only be listed once.  If no chains are
+specified, and the file has named chains, all named chains are loaded.  If the
+file does not have named chains, the single unnamed chain is loaded.  Commas
+are required when more than one chain is specified.  A colon followed by no
+chain names, or two commas in a row, indicates the chain with no name.  Chain
+names are case sensitive.
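The file and chain syntax can be illustrated with a small Python sketch.  It
is an interpretation of the rules above rather than Matt's own code; the
function name parse_input_spec is hypothetical, and the sketch splits on the
first colon, so it does not handle paths that themselves contain a colon.

```python
def parse_input_spec(spec):
    """Split "file[:chain[,chain]*]" into (filename, chains).

    chains is None when no colon is given (load all chains), otherwise a
    list of chain names in which "" stands for the chain with no name.
    """
    filename, colon, chain_part = spec.partition(":")
    if not colon:
        return filename, None
    # "file:" yields [""], and "A,,B" yields ["A", "", "B"].
    chains = chain_part.split(",")
    if len(set(chains)) != len(chains):
        # Each chain within a single file should only be listed once.
        raise ValueError(f"chain listed more than once in {spec!r}")
    return filename, chains


if __name__ == "__main__":
    for spec in ("1plu.pdb", "1tsp.pdb:A,B", "1tsp.pdb:", "1tsp.pdb:A,,b"):
        print(spec, "->", parse_input_spec(spec))
```

Chain names are compared exactly, so "A" and "a" are treated as different
chains, in line with the case sensitivity noted above.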
+-L listfile: Specifies a file containing a list of pdb files to load.  Each
+line of the file must specify a file.  Individual chains can optionally be
+specified using the same syntax as above.  Leading and trailing white space is
+ignored.  Blank lines are allowed.
+Optional command line parameters:
+-b[01]: Disables or enables creation of bent files.  Disabled by default.
+-c <cutoff>: Sets the distance cutoff value, in Angstroms, for the final pass,
+which fills in some of the gaps.  Cutoff can be any non-negative floating point
+number.  This does not affect any of the bent files.  The default value is 5.0
+angstroms, which is what was used in the paper.  A value of 0 prevents the last
+pass from running at all.  Note that there should be a space after the c.
+P-values are calculated before the final extension pass, so the cutoff does not
+affect reported p-values.
+-r[01]: Disables or enables renumbering of all residues in all proteins.  Each
+protein will start from residue 1 and all residues will be numbered
+consecutively.  Insertion codes will be removed.  Note that all loaded residues
+will be given a number, so if SEQRES entries are loaded or some residues have
+no alpha carbons, the first residue used in the alignment may not be residue 1.
+Disabled by default.
+-l[01]: Disables or enables renaming chains in the output pdb files.  When
+enabled, chains are first labeled by capital letters, then numbers, then
+symbols, then lowercase letters, and then the pattern repeats when there are
+over 90 chains.  The limit is due to the fact that chains in a pdb file can
+only have a single character label.  Chains are numbered according to the order
+they're specified in the command line.  Enabled by default.
+-s[01]: Disables or enables reading SEQRES lines in source pdb files.  When
+enabled, the program tries to align residues in the ATOM entries to residues in
+the SEQRES entries.  This allows detecting gaps between residues that would
+otherwise be assumed to be adjacent.  Fragments cannot cross over regions with
+no alpha carbon coordinates.  Note that residues with ATOM entries but no alpha
+carbon coordinates will always be loaded.  -s also affects residue renumbering
+if -r is set.  Enabled by default.
+-V[01]: Sets verbosity of feedback to stdout.  A value of 0 will only display
+errors and warnings, and 1 will display a list of chains as they are loaded.
+Default value is 1.
+-d[01]: Disables or enables sending current progress to stderr.  Enabled by
+default.
+-t <thread count>: Sets the number of threads Matt uses.  If not specified,
+Matt will use OpenMP's default number of threads, which is implementation
+dependent, though it is generally the number of threads a system is capable of
+running simultaneously.  When not compiled with OpenMP support, a warning will
+be displayed and the option will be ignored. Note that there should be a space
+between t and the number of threads.
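For scripted runs, the options above can be assembled into a command line
programmatically.  The following Python sketch only builds the argument list;
it assumes the binary is called "Matt" and can be found by the shell, and the
helper name build_matt_command is made up for this example.

```python
import shlex


def build_matt_command(outprefix, inputs, cutoff=None, threads=None,
                       renumber=False, executable="Matt"):
    """Assemble a Matt argument list from documented options (sketch only)."""
    if not inputs:
        raise ValueError("at least one input file is needed")
    cmd = [executable, "-o", outprefix]
    if cutoff is not None:
        if cutoff < 0:
            raise ValueError("cutoff must be non-negative")
        cmd += ["-c", str(cutoff)]      # -c takes a space before its value
    if threads is not None:
        cmd += ["-t", str(threads)]     # -t takes a space before its value
    if renumber:
        cmd.append("-r1")               # -r takes its 0/1 with no space
    cmd += list(inputs)
    return cmd


if __name__ == "__main__":
    command = build_matt_command("alignment", ["1plu.pdb", "1tsp.pdb:A"],
                                 cutoff=4.0, threads=2, renumber=True)
    # Print the command instead of running it; subprocess.run(command)
    # could be used to execute it where Matt is installed.
    print(shlex.join(command))
```

Lowering the threads argument is the knob to try first if a run uses too much
memory, as discussed in the multithreading notes.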
+Matt will list residues with no alpha-carbon coordinates in its sequence
+alignments, but will not align them.  The recommended way to unambiguously
+figure out which atom entries in the alignment files correspond to which
+entries in the created pdb files is to enable renumbering, keeping in mind that
+Matt includes HETATM entries with alpha carbons in the alignment.
+Matt currently makes no effort to align residues not in the common core in the
+sequence alignments it produces.  This will be changed in a future version.