NightShift: NMR shift inference by General Hybrid model Training 
		 --- A pipeline for automated NMR shift prediction training ---
		Anna Dehof
		July 2011

**** Prerequisites ****

 - BALL 1.5.0-alpha or later (http://www.ball-project.org or http://bitbucket.org/ball/ball)
 - ClustalW  (http://www.clustal.org/clustal2/)
 - PISCES    (optional - http://dunbrack.fccc.edu/Guoli/pisces_download.php)
 - ShiftX2   (optional - http://www.shiftx2.ca/download.html) 
 - sqlite3 development packages (http://www.sqlite.org/download.html)
 - python 
 - R with packages:
   - RSQLite
   - outliers
   - foreach
   - MASS
   - randomForest
   - e1071
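
Before configuring, it can be useful to check that the command-line prerequisites are on your PATH.
The following is a minimal sketch and not part of NightShift: the function name missing_tools is made up
for illustration, and the executable names in the usage line (clustalw2, python, Rscript) are assumptions
about how the tools are installed on your system.

```shell
# Print every tool from the argument list that is not found on PATH.
# Returns 0 if all tools were found, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return $status
}

# Assumed executable names; adjust them to your installation.
# missing_tools clustalw2 python Rscript
```

Libraries such as BALL and the sqlite3 development packages are not covered by this check; they are
looked up during the CMake configuration step.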

**** Installation ****

1.) Install BALL
NightShift relies on a recent version of BALL (1.5.0-alpha or later). Install it either from
your operating system distribution, or download it from http://bitbucket.org/ball/ball.
Guidelines on how to install BALL are provided on http://bitbucket.org/ball/ball as well.

2.) Install further prerequisites as listed above. For installation of the R packages,
you can use the provided script "installRPackages.sh" in the scripts directory like this:

Rscript scripts/installRPackages.sh

When downloading PISCES, please remember that you also need to download and extract BLASTDB.tar.gz.
For more information, please refer to the PISCES homepage. If you choose not to install PISCES,
homology restriction will be disabled. This is NOT recommended.

SQLite development packages can either be installed from the operating system distribution (e.g.,
package libsqlite3-dev on Ubuntu), or downloaded from http://www.sqlite.org/download.html.

3.) Configure NightShift C++ source
To configure NightShift for compilation, create a build directory (e.g., with mkdir build) and change into it
(e.g., with cd build). The configuration process uses CMake, and can be customized in several different ways
to adapt to your local environment. In the simplest case, i.e., if you have installed every dependency in standard
paths and want to use the default settings of NightShift, it suffices to use

cmake .. -DBALL_INSTALL_DIRECTORY=<path to your BALL installation>

For example, if you installed BALL in /home/ball, you'd type

cmake .. -DBALL_INSTALL_DIRECTORY=/home/ball

To adapt NightShift to your local environment, you will most likely have to set a number of additional options.
Each option can be set by passing

-D<OPTION>=<value>

to the cmake call. NightShift's options are described in the following:


BALL_CONTRIB_PATH
If you compiled BALL using its provided contrib package, NightShift will need
to know about its location to help it find some third-party libraries. Use the
same path you set during the CMake run used to configure BALL.


CMAKE_INSTALL_PREFIX
The root directory into which NightShift will be installed. The default is set by cmake.
On Unix-like systems, this is typically /usr/local.


NIGHTSHIFT_OUTPUT_DIRECTORY
The directory where NightShift will store its output. Please note that this directory will grow to several GB,
and will contain log files, databases, downloaded structures and NMR data, and statistical models.
The default is set to



Full path to the Pisces executable used for homology restriction. If Pisces cannot be found,
homology restriction will be disabled.


Full path to the ShiftX2 installation directory (the directory containing shiftx2.py). If ShiftX2
cannot be found, comparison of shift predictions against ShiftX2 will be disabled.


Directory where ShiftX2 results will be stored. Please note that this directory can become quite large.
Default is set to



If the sip-module could not be found during configuration of NightShift,
you can set this option to help it along the way. Typically, sip should
either be installed in a standard location and detected automatically, or
should be contained in BALL's contrib package and covered by the BALL_CONTRIB_PATH option.


Full path to the clustalw installation directory (the directory containing clustalw2).
Since alignments are an integral part of the assignment of shifts to PDB atoms, 
clustalw is a required dependency.


cmake .. -DBALL_INSTALL_DIRECTORY=/home/ball -DCMAKE_INSTALL_PREFIX=/home/nightshift -DNIGHTSHIFT_OUTPUT_DIRECTORY=/local/nightshift/data -DBALL_CONTRIB_PATH=/home/ball_contrib
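
If you reconfigure often, it can help to assemble the cmake call from its individual paths. The following
is a sketch under assumptions, not part of NightShift: the function name nightshift_cmake_cmd is
hypothetical, only the four options from the call above are covered, and paths containing spaces are
not handled. It only prints the command, so you can inspect it before running it.

```shell
# Print a cmake call for the given BALL directory, install prefix,
# output directory and (optional) BALL contrib path.
nightshift_cmake_cmd() {
    cmd="cmake .. -DBALL_INSTALL_DIRECTORY=$1"
    cmd="$cmd -DCMAKE_INSTALL_PREFIX=$2"
    cmd="$cmd -DNIGHTSHIFT_OUTPUT_DIRECTORY=$3"
    # The contrib path is only needed if BALL was built with its contrib package.
    if [ -n "${4:-}" ]; then
        cmd="$cmd -DBALL_CONTRIB_PATH=$4"
    fi
    echo "$cmd"
}

# Usage (run inside the build directory):
# nightshift_cmake_cmd /home/ball /home/nightshift /local/nightshift/data /home/ball_contrib
```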

Now you are ready to build the NightShift binaries with

make

and to install them with

make install

4.) Run NightShift

The simplest way to run NightShift is to execute the provided NightShift.sh script, which was
installed into $CMAKE_INSTALL_PREFIX/bin in the last step. Several options guide the execution;
they can be set either in the NightShift.sh script itself, or on the command line in the
form --option=value. The recognized options are described in the following:


Set to true if all entries in the database (structures, shifts, features, and predictions) should be
removed before the pipeline run. Default: false.


Set to true to download current data from PDB and BMRB. Files that have already been downloaded in a 
previous run are cached and will not be downloaded again. Default: true.


Set to true to create a PDB-to-BMRB mapping (table PDB_BMRB). Setting it to false saves some
time if a mapping already exists from a previous run, and allows the use of a user-defined mapping. Default: true.


Set to true to use a homology filter during dataset construction (directly while downloading the data or afterwards).
This requires the Pisces binary and will have no effect if Pisces could not be found during configuration. 
Default: true.


Set to true to compute the features used for predictor creation, create the ATOM_PROPS table,
and fill it with the features. Can be set to false to save time if features have already been
computed in a previous run. Default: true.


Set to true to remove all data from the ATOM_PROPS table so that all properties are recomputed. Default: false.


Set to true to run ShiftX2 and add values to the data set for comparison of the models trained by NightShift.
This requires the ShiftX2 binary and will have no effect if ShiftX2 could not be found during configuration.
Default: true.


--download_refdb_data
Set to true if reference-corrected versions should be downloaded from the RefDB. Default: false.


--train_from_refdb
Set to true to use RefDB data as input for training and evaluation. Requires --download_refdb_data=true. Default: false.


Set to true to train and evaluate a random forest model. Default: true.


Set to true to train a second random forest model based on the significant features of the last run.
Default: false.


Set to true to train and evaluate a linear model. Default: false.


Separate data into test and training data based on the PDB ID. Otherwise,
shifts are separated independently of their origin. Default: false.

Example: To download data from all sources, including the refdb, compute all features,
perform homology restriction, train and evaluate a random forest
model based on the refdb, and compare the results to those of ShiftX2, use

NightShift.sh --download_refdb_data=true --train_from_refdb=true

(if NightShift.sh was not found, use its full path below the installation directory;
e.g., if you set CMAKE_INSTALL_PREFIX=/home/nightshift/install, then try
/home/nightshift/install/bin/NightShift.sh --... instead).
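
Since --train_from_refdb=true requires --download_refdb_data=true, a small wrapper can catch an
inconsistent combination before the long pipeline run starts. The following is a minimal sketch, not part
of NightShift: the function name check_refdb_options is hypothetical, only the two options named here are
inspected, and it assumes that the defaults inside NightShift.sh (both false) have not been changed.

```shell
# Return 1 (and print a message) if --train_from_refdb=true is given
# without --download_refdb_data=true; return 0 otherwise.
check_refdb_options() {
    download=false
    train=false
    for arg in "$@"; do
        case "$arg" in
            --download_refdb_data=true) download=true ;;
            --train_from_refdb=true)    train=true ;;
        esac
    done
    if [ "$train" = true ] && [ "$download" = false ]; then
        echo "error: --train_from_refdb=true requires --download_refdb_data=true"
        return 1
    fi
    return 0
}

# Usage: check the options first, then pass them on unchanged.
# check_refdb_options "$@" && NightShift.sh "$@"
```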

Be prepared to wait for a while. You should also expect this to generate several GB of data.
The results can then be found in the output directory you set during configuration. The analysis of the
final models can be found in the corresponding log files.

Should you encounter problems at this point, try removing the CMakeCache.txt file contained
in your build directory and running cmake with the correct options again.
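
The cache reset can be wrapped in a small helper. This is only a sketch: the function name
reset_cmake_cache is made up for illustration, and the paths in the usage lines are the example paths
used earlier in this document.

```shell
# Remove the cached CMake configuration from a build directory so that the
# next cmake run starts from scratch. A missing cache file is not an error.
reset_cmake_cache() {
    build_dir=${1:-.}
    rm -f "$build_dir/CMakeCache.txt"
}

# Usage (from the NightShift source directory):
# reset_cmake_cache build
# (cd build && cmake .. -DBALL_INSTALL_DIRECTORY=/home/ball)
```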