bnpy-dev / HowToUseBrownCSGrid

FAQ

Please read our FAQ page to answer any questions you have. If you have a question that isn't answered there, please add it to the list!

Also, you can check out the department's guide here: http://cs.brown.edu/about/system/services/hpc/gridengine/ That page is more general than this one and lacks much of the Python-specific advice given here. Advanced users should read both carefully.

Setup

Python at Brown CS

The technical staff maintains a default system installation of Python 2.7, but this is rarely updated and is not configured for ideal numerical performance. Instead, I (user mhughes) recommend using a custom install that I maintain via contrib. This install is (as of Dec 2016) based on Anaconda, so it uses Intel's MKL libraries for fast matrix computations and makes installing additional packages via pip very simple.

To use bnpy effectively, you'll want to make this version of python the default one. That is, the one you get just by typing python or ipython at the prompt. This is done in just one step:

1) Edit your PATH environment variable to include the following absolute path.

export PATH="/contrib/projects/anaconda-python/miniconda2/bin/:$PATH"

To verify this, you can type which python. You should see stdout display the /contrib/projects/anaconda-python/miniconda2/bin/... file path.

Using custom Python packages with the Brown CS installation

Suppose you need some package that is not available out-of-the-box with the install in /contrib/.

The best thing to do is simply:

  • 0) Make sure your PATH is set as above.
  • 1a) First, try to install with conda install <package-name>.
  • 1b) If that fails, try pip, making sure to use all the options below so the package installs in the right place:

pip install <package-name> --global-option="--no-user-cfg" --no-cache-dir -v -v -v

This will automatically install the named package inside /contrib/projects/anaconda-python/miniconda2.

Permissions for anybody in the "liv" group should be set AUTOMAGICALLY! So anybody else in the group can now also use that package.

First-timers: HelloWorld with the Grid

Here is a self-contained python script we can run on the grid. This script times how long it takes to compute the matrix product X.T * X for random matrices of size NxD at various values of D, and prints the results.

#!/contrib/projects/anaconda-python/miniconda2/bin/python
#
# ^^^ hashbang line above guarantees the conda executable of python is used
# when the script is run as an executable at the terminal

import numpy as np
import time

if __name__ == '__main__':
  print '<<<<<<<<<<<<<<<<< This is HelloGrid.py'
  N = 10000
  nTrial = 5
  for D in xrange(10, 101, 10): # D = 10, 20, ..., 100
    # Generate random N x D matrix
    X = np.random.randn(N, D)
    starttime = time.time()
    for trial in xrange(nTrial):
      Y = np.dot(X.T, X)
    elapsedtime = time.time() - starttime
    print '%5d %8.3f sec' % (D, elapsedtime/nTrial)

You can save this file as HelloGrid.py. To run it as a script on the grid, you need to make sure the file is executable. For more info, see http://linuxcommand.org/wss0010.php

We can submit this script from the command line to the grid with one command.

qsub -cwd -o stdout.txt -e stderr.txt HelloGrid.py

qsub takes lots of optional arguments. Here, we just highlight three of the most important ones.

  • -cwd : sets the grid machine's working directory to match the directory of the terminal you launched the job from
  • -o path/to/some/file : sets where to save the stdout log from your script.
  • -e path/to/another/file : sets where to save the stderr log from your script.

If everything went well, after a few seconds the job should be done and you can look at the file stdout.txt (in your current directory) to see the output.

That's it! You've just run a Python script on the grid.

Basic Everyday Use

Here are some tips for day-to-day use of the grid.

Monitoring jobs with qstat

Want to know about all the active jobs you're running? Just type this:

qstat

You'll get a nice printout like this:

6818111 0.50006 gridSearch sghosh       r     11/11/2014 10:19:26 long.q@dblade40.cs.brown.edu       4 1
6818111 0.50006 gridSearch sghosh       r     11/11/2014 10:19:26 long.q@mblade1201.cs.brown.edu     4 2
6817985 0.50006 gridSearch sghosh       r     11/11/2014 10:18:57 long.q@dblade13.cs.brown.edu       4 1
6817985 0.50006 gridSearch sghosh       r     11/11/2014 10:18:57 long.q@dblade14.cs.brown.edu       4 2
6817985 0.50006 gridSearch sghosh       r     11/11/2014 10:18:57 long.q@dblade27.cs.brown.edu       4 3

The most important columns here are these:

  • First column: unique job id.

This is the number you need to track to ask the grid about a current job or to request killing an active job

  • Final column: task id.

For bnpy, we often want to run multiple initializations of an experiment under the same conditions (model, data, algorithm). The grid makes it easy to do that, with an extra optional variable called "task id". Here, we can see 2 tasks for job 6818111 and 3 tasks for 6817985.

  • 5th column: job status.

'r' means running. This is good. Other options are not good: 'Eqw' means there was an error loading your job, while 'dr' means the job died while running.
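If you ever want to script around qstat, the fields can be pulled apart with a simple whitespace split. A minimal sketch using one of the sample lines above (the field positions follow the output shown here, not any official qstat API):

```python
# Split one line of `qstat` output into its fields.
line = ('6818111 0.50006 gridSearch sghosh       r     '
        '11/11/2014 10:19:26 long.q@dblade40.cs.brown.edu       4 1')

fields = line.split()
jobid = fields[0]     # first column: unique job id
state = fields[4]     # 5th column: job status ('r' = running)
taskid = fields[-1]   # final column: task id

print('%s %s %s' % (jobid, state, taskid))
```

The same split works on every line of the listing, so a short loop over the output of qstat can, for example, collect all job ids in state 'Eqw'.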

Deleting your jobs

Delete the job with a specified job-id:

$ qdel job-id

Delete all jobs for a given username:

$ qdel -u username

Submitting jobs with qsub

Here are some helpful options for qsub.

  • -t 3 : specify a task number for the job. This will run task 3.
  • -t 1-4 : specify a range of tasks for the job. This will run tasks 1, 2, 3, and 4.
  • -l hour/day/inf : specify the run length of the job. The default is one hour. Using inf is discouraged but sometimes unavoidable, so please monitor carefully.
  • -l vf=2G : require this job to run on a grid machine with at least 2 GB of free RAM. Similarly, you can use 3G for 3 GB, etc.
  • -v ENVVAR : set the specified environment variable on the grid machine to its value on the calling machine

To specify multiple options that require the -l flag (like a queue type and a memory requirement), include the -l flag each time, like so

qsub -l hour -l vf=3G HelloGrid.py
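One detail worth knowing when using -t: Grid Engine hands each running task its number through the SGE_TASK_ID environment variable (for jobs submitted without -t, the variable is absent or set to the literal string 'undefined'). A minimal sketch of reading it safely inside a submitted Python script; the fallback default of 1 is just an illustration, not a grid convention:

```python
import os

def get_task_id(default=1):
    """Return the Grid Engine task id as an int.

    Grid Engine sets SGE_TASK_ID for array jobs (-t); for plain
    jobs the variable is missing or the string 'undefined', in
    which case we fall back to a default.
    """
    raw = os.environ.get('SGE_TASK_ID', '')
    try:
        return int(raw)
    except ValueError:
        return default

taskid = get_task_id()
print('Running as task %d' % taskid)
```

This is how one script submitted with -t 1-10 can, say, choose a different random seed or output subfolder for each of its ten tasks.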

Specifying which machine

Brown CS has many different machines available. Here is a short list of the most "modern" machines on the complete list

-q '*@@name' | # machines | # cores / machine | RAM per machine
============ | ========== | ================= | ================
liv          |  7         | 64                | 128 GB or 256 GB
mblade12     | 20         | 32                |  64 GB
blade09      | 16         | 16                |  24 GB
dblade       | 70         |  4                |   8 GB

You unfortunately cannot request a specific machine, like mblade1205. However, you can request cores from one of the categories above (liv, mblade12, blade09, dblade).

To request a machine from the mblade12 category, supply this option to qsub

  • -q '*@@mblade12'

LIV group permissions

Only students with advanced liv group privileges can request the machines in the liv category.

Most introductory students do not have such privileges yet. However, there are plenty of other fine machines on the grid, so this is not too much of a disadvantage. The mblade12 machines have basically the same specs, so this will only be a disadvantage if you really need much more than 4-8 GB per run.

Alternative way to specify qsub options: as comments in the script

If you find you use some options all the time, it can be a pain to type them out each time you run qsub. Instead, you can bake them into your script header, as follows

#!/contrib/projects/anaconda-python/miniconda2/bin/python
# ------ set working directory
#$ -cwd 
# ------ set required RAM to 2GB
#$ -l vf=2G
# ------ attach environment variables
#$ -v HOME -v PATH -v PYTHONPATH -v OMP_NUM_THREADS
<main script text goes here>

Making a script grid-ready

Any script that can be run from your terminal can run on the grid. This includes python scripts and bash scripts.

Always make sure to do chmod 755 to ensure the script is executable. Also, include a shebang so the program used to execute the script is clear.

Here's the recommended shebang for python files.

#!/contrib/projects/anaconda-python/miniconda2/bin/python

Using the grid with bnpy

RunBNPYonGrid.py

Download Python source file: RunBNPYonGrid.py

I highly recommend using this wrapper script to run any bnpy jobs on the grid. It works the same as Run.py (as any wrapper script should), but includes a few extra preprocessing steps to make your life on the grid a lot easier. Specifically, it:

  • configures the task id variable within bnpy to match the task id from the grid
  • sets up log files for stdout and stderr inside the same directory where Run.py saves all its logs and output files
  • fixes weird default behavior of Python on the grid where log files are not written frequently, sometimes not until a run has completely finished (see below)

Basic usage

Consider the following basic bnpy call

python -m bnpy.Run AsteriskK8 MixModel Gauss VB --nLap 10 --K 25

Here's how we would execute the same call via RunBNPYonGrid.py in the same terminal:

./RunBNPYonGrid.py AsteriskK8 MixModel Gauss VB --nLap 10 --K 25

And here's how we'd execute the same call via qsub, to run on a grid machine.

qsub -v BNPYOUTDIR -v PYTHONPATH -cwd -o stdout -e stderr -t 1 -l hour \
         RunBNPYonGrid.py AsteriskK8 MixModel Gauss VB --nLap 10 --K 25

This call makes sure that the values of key environment variables are passed along.

To run 10 independent tasks, we can just replace -t 1 with -t 1-10 in the above line. These all run in parallel on separate grid machines. Awesome!

Log files and Python

Frequently, we wish to run a script for a long time, and monitor progress as we go. Unfortunately, the default behavior of file writing on the Brown filesystem is weird, and doesn't save buffered output to disk frequently enough to be useful. There are some fixes in RunBNPYonGrid.py that take care of this.
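The exact fix inside RunBNPYonGrid.py isn't reproduced here, but the usual remedy for this problem is to flush stdout explicitly after each progress message (or to run the interpreter unbuffered via python -u). A minimal sketch of the idea:

```python
import sys

def log(msg):
    """Print a progress message and force it out of the buffer immediately.

    Buffered stdout may not reach the log file until the process
    exits; an explicit flush makes progress visible as it happens.
    """
    print(msg)
    sys.stdout.flush()

log('starting long run...')
```

With a helper like this, tailing the stdout log on the grid shows each message as soon as it is printed, rather than all at once when the job finishes.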

Using jobname intelligently

Remember that the bnpy keyword pair --jobname <myname> controls the name of the folder where all bnpy output for this run is saved. Make sure to set this as an option when submitting jobs to the grid, so that all your results will be stored in the proper place.
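For example, extending the earlier qsub call (the jobname value demo-K25 here is just an illustration):

```
qsub -v BNPYOUTDIR -v PYTHONPATH -cwd -o stdout -e stderr -t 1-10 -l hour \
         RunBNPYonGrid.py AsteriskK8 MixModel Gauss VB --nLap 10 --K 25 --jobname demo-K25
```

This way, results from all ten tasks land together under one predictable, descriptively named folder.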
