bnpy-dev / HowToUseBrownCSGrid
FAQ
Please read our FAQ page to answer any questions you have. If you have a question that isn't answered, please add it to the list!
Also, you can check out the department's guide here: http://cs.brown.edu/about/system/services/hpc/gridengine/ That page is more general than this one and lacks much of the Python-specific advice. Advanced users should read both carefully.
Setup
Python at Brown CS
The technical staff maintains a default system installation of Python 2.7, but this is rarely updated and is not configured for ideal numerical performance. Instead, I (user mhughes) recommend using a custom install that I maintain via contrib. This install is (as of Dec 2016) based on Anaconda, so it uses MKL libraries for fast matrix computations and makes installing additional packages via pip very simple.
To use bnpy effectively, you'll want to make this version of python the default one. That is, the one you get just by typing python or ipython at the prompt. This is done in just one step:
1) Edit your PATH environment variable to include the following absolute path.
export PATH="/contrib/projects/anaconda-python/miniconda2/bin/:$PATH"
To verify this, you can type which python. You should see stdout display the /contrib/projects/anaconda-python/miniconda2/bin/... file path.
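The same check can also be done from inside Python. The snippet below is a minimal sketch, not part of the install itself: the helper name uses_contrib_python is made up for illustration, and the only fact it relies on is the bin directory given in the PATH edit above.

```python
from __future__ import print_function
import sys

# Directory named in the PATH edit above.
CONTRIB_BIN = "/contrib/projects/anaconda-python/miniconda2/bin"


def uses_contrib_python(executable, expected_bin=CONTRIB_BIN):
    """Return True if the interpreter path lives inside expected_bin.

    Hypothetical helper for illustration only.
    """
    return executable.startswith(expected_bin.rstrip("/") + "/")


if __name__ == "__main__":
    # sys.executable is the path of the interpreter running this script.
    print("interpreter:", sys.executable)
    print("contrib install in use:", uses_contrib_python(sys.executable))
```

Running this with plain python at the prompt should report the contrib install if the PATH edit took effect.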
Using custom Python packages with the Brown CS installation
Suppose you need some package that is not available out-of-the-box with the install in /contrib/.
The best thing to do is simply:
- 0) Make sure your PATH is set as above
- 1a) First, try to install with "conda install <package-name>"
- 1b) If the above fails, try to use pip (but make sure to use all the options below to install in the right place)
pip install <package-name> --global-option="--no-user-cfg" --no-cache-dir -v -v -v
This will automatically install the named package inside /contrib/projects/anaconda-python/miniconda2.
Permissions for anybody in the "liv" group should be set AUTOMAGICALLY! So anybody else in the group can now also use that package.
First-timers: HelloWorld with the Grid
Here is a self-contained python script we can run on the grid. This script tracks how long it takes to multiply matrices of size NxD for various D, and prints the results.
#!/contrib/projects/anaconda-python/miniconda2/bin/python
#
# ^^^ hashbang line above guarantees the conda executable of python is used
# when the script is run as an executable at the terminal

import numpy as np
import time

if __name__ == '__main__':
    print '<<<<<<<<<<<<<<<<< This is HelloGrid.py'
    N = 10000
    nTrial = 5
    for D in xrange(10, 101, 10): # 10, 20, ... 100
        # Generate random N x D matrix
        X = np.random.randn(N, D)
        starttime = time.time()
        for trial in xrange(nTrial):
            # Multiply the D x N transpose by the N x D matrix
            Y = np.dot(X.T, X)
        elapsedtime = time.time() - starttime
        # Report the average time per multiplication
        print '%5d %8.3f sec' % (D, elapsedtime/nTrial)
We can submit this script from the command line to the grid with one command.
qsub -cwd -o stdout.txt -e stderr.txt HelloGrid.py
qsub takes lots of optional arguments. Here, we just highlight three of the most important ones.
- -cwd : sets the grid machine's working directory to match the directory of the terminal where you launched the job.
- -o path/to/some/file : sets where to save the stdout log from your script.
- -e path/to/another/file : sets where to save the stderr log from your script.
If everything went well, after a few seconds the job should be done and you can look at the file stdout.txt (in your current directory) to see the output.
That's it! You've just run a Python script on the grid.
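If you prefer to launch jobs from a Python script rather than by typing the command, the qsub call above can be assembled as an argument list. This is only a sketch: build_qsub_command and submit are hypothetical helper names, and the default is a dry run that prints the command instead of calling qsub.

```python
from __future__ import print_function
import subprocess


def build_qsub_command(script, stdout_path="stdout.txt", stderr_path="stderr.txt"):
    """Assemble the qsub call shown above as a list of arguments."""
    return ["qsub", "-cwd", "-o", stdout_path, "-e", stderr_path, script]


def submit(script, dry_run=True):
    """Print the command (dry run) or hand it to qsub.

    With dry_run=False this requires qsub to be on your PATH.
    """
    cmd = build_qsub_command(script)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    submit("HelloGrid.py")
```

Passing the command as a list avoids shell quoting problems if a path contains spaces.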
Basic Everyday Use
Here are some tips for using the grid on a regular basis.
Monitoring jobs with qstat
Want to know about all the active jobs you're running? Just type this:
qstat
6818111 0.50006 gridSearch sghosh r 11/11/2014 10:19:26 long.q@dblade40.cs.brown.edu 4 1
6818111 0.50006 gridSearch sghosh r 11/11/2014 10:19:26 long.q@mblade1201.cs.brown.edu 4 2
6817985 0.50006 gridSearch sghosh r 11/11/2014 10:18:57 long.q@dblade13.cs.brown.edu 4 1
6817985 0.50006 gridSearch sghosh r 11/11/2014 10:18:57 long.q@dblade14.cs.brown.edu 4 2
6817985 0.50006 gridSearch sghosh r 11/11/2014 10:18:57 long.q@dblade27.cs.brown.edu 4 3
- First column: unique job id.
This is the number you need to track to ask the grid about a current job or to request killing an active job
- Final column: task id.
For bnpy, we often want to run multiple initializations of an experiment under the same conditions (model, data, algorithm). The grid makes it easy to do that, with an extra optional variable called "task id". Here, we can see 2 tasks for job 6818111 and 3 tasks for 6817985.
- 5th column: job status.
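'r' means running. This is good. Other options are not good: 'Eqw' means there was an error loading your job, while 'dr' means the job has been marked for deletion while running, which often shows up when a run has died or its machine has stopped responding.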
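To summarize a long qstat listing, the three columns described above can be pulled out with a few lines of Python. This is a rough sketch that assumes every row has exactly the ten whitespace-separated fields of the sample output; real qstat output also prints header lines, and rows for waiting jobs may have fewer fields, so those rows are simply skipped here. The function names are made up for illustration.

```python
from __future__ import print_function
from collections import defaultdict

# The sample rows from the qstat listing above.
SAMPLE = """\
6818111 0.50006 gridSearch sghosh r 11/11/2014 10:19:26 long.q@dblade40.cs.brown.edu 4 1
6818111 0.50006 gridSearch sghosh r 11/11/2014 10:19:26 long.q@mblade1201.cs.brown.edu 4 2
6817985 0.50006 gridSearch sghosh r 11/11/2014 10:18:57 long.q@dblade13.cs.brown.edu 4 1
6817985 0.50006 gridSearch sghosh r 11/11/2014 10:18:57 long.q@dblade14.cs.brown.edu 4 2
6817985 0.50006 gridSearch sghosh r 11/11/2014 10:18:57 long.q@dblade27.cs.brown.edu 4 3
"""


def parse_qstat_rows(text):
    """Extract job id, status, queue, and task id from each 10-field row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators, and rows with a different layout.
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        rows.append({
            "job_id": fields[0],   # first column
            "state": fields[4],    # 5th column
            "queue": fields[7],
            "task_id": fields[9],  # final column
        })
    return rows


def tasks_per_job(rows):
    """Count how many task rows are listed for each job id."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["job_id"]] += 1
    return dict(counts)


if __name__ == "__main__":
    rows = parse_qstat_rows(SAMPLE)
    for job_id, n_tasks in sorted(tasks_per_job(rows).items()):
        print(job_id, n_tasks)
    print("not running:", [r for r in rows if r["state"] != "r"])
```

On the sample listing this counts 2 tasks for job 6818111 and 3 tasks for job 6817985, matching the description above.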
Deleting your jobs
Delete job with specified job-id.
$ qdel job-id
Delete all jobs for username
$ qdel -u username
Submitting jobs with qsub
Here are some helpful options for qsub.
- -t 3 : specify a task number for the job. This will run task 3.
- -t 1-4 : specify a range of tasks for the job. This will run tasks 1, 2, 3, and 4.
- -l hour, -l day, or -l inf : specify the run length of the job. Default is one hour. Using inf is discouraged but sometimes unavoidable, so please monitor carefully.
- -l vf=2G : limit this job to run on a grid machine with 2GB of RAM. Similarly, you can do 3G for 3GB, etc.
- -v ENVAR : set the specified environment variable on the grid machine to the value from the calling machine.
To specify multiple options that require the -l flag (like a queue type and a memory requirement), include the -l flag each time, like so:
qsub -l hour -l vf=3G HelloGrid.py
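When commands are built in a script, the "one -l per request" rule and the -t task syntax are easy to get wrong by hand. The helpers below are an illustrative sketch; resource_flags and task_flag are hypothetical names, and they only produce the flags already described in this section.

```python
from __future__ import print_function


def resource_flags(*requests):
    """Emit a separate -l flag for every resource request."""
    flags = []
    for request in requests:
        flags.extend(["-l", request])
    return flags


def task_flag(first, last=None):
    """Emit -t N for one task, or -t N-M for a range of tasks."""
    if last is None:
        return ["-t", str(first)]
    return ["-t", "%d-%d" % (first, last)]


if __name__ == "__main__":
    cmd = ["qsub"] + resource_flags("hour", "vf=3G") + ["HelloGrid.py"]
    print(" ".join(cmd))
    print(" ".join(task_flag(1, 4)))
```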
Specifying which machine
Brown CS has many different machines available. Here is a short list of the most "modern" machines on the complete list
-q '*@@name' | # machines | # cores / machine | RAM per machine
===============================================================
liv          | 7          | 64                | 128 GB or 256 GB
mblade12     | 20         | 32                | 64 GB
blade09      | 16         | 16                | 24 GB
dblade       | 70         | 4                 | 8 GB
To request a machine from the mblade12 category, supply this option to qsub
-q '*@@mblade12'
LIV group permissions
Only students with advanced liv group privileges can request the machines in the liv category.
Most introductory students do not have such privileges yet. However, there are plenty of other fine machines on the grid, so this is not too much of a disadvantage. The mblade12 machines have basically the same specs, so this will only be a disadvantage if you really need much more than 4-8 GB per run.
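As a rough illustration of how the table and the liv restriction combine, the sketch below picks the smallest machine category that meets a RAM and core requirement. The numbers are copied from the table above (using the smaller 128 GB figure for liv); the names MACHINE_GROUPS, pick_group, and queue_flag are hypothetical, and "smallest category that fits" is just one reasonable policy, not a rule of the grid. Note that the table lists total RAM per machine, which is an upper bound on what a single job can get.

```python
from __future__ import print_function

# (name, number of machines, cores per machine, RAM in GB, needs liv privileges)
# Ordered from smallest to largest machine, values taken from the table above.
MACHINE_GROUPS = [
    ("dblade", 70, 4, 8, False),
    ("blade09", 16, 16, 24, False),
    ("mblade12", 20, 32, 64, False),
    ("liv", 7, 64, 128, True),  # table says 128 GB or 256 GB; lower value used
]


def pick_group(min_ram_gb, min_cores=1, has_liv_access=False):
    """Return the first (smallest) category meeting the request, or None."""
    for name, _n_machines, cores, ram_gb, needs_liv in MACHINE_GROUPS:
        if needs_liv and not has_liv_access:
            continue
        if ram_gb >= min_ram_gb and cores >= min_cores:
            return name
    return None


def queue_flag(group):
    """Build the -q option for a machine category."""
    return ["-q", "*@@" + group]


if __name__ == "__main__":
    group = pick_group(min_ram_gb=32)
    print(group, queue_flag(group))
```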
Alternative way to specify qsub options: as comments in the script
If you find you use some options all the time, it can be a pain to type them out each time you run qsub. Instead, you can bake them into your script header, as follows:
#!/contrib/projects/anaconda-python/miniconda2/bin/python
# ------ set working directory
#$ -cwd
# ------ set required RAM to 2GB
#$ -l vf=2G
# ------ attach environment variables
#$ -v HOME -v PATH -v PYTHONPATH -v OMP_NUM_THREADS

< main script text goes here>
Making a script grid-ready
Any script that can be run from your terminal can run on the grid. This includes python scripts and bash scripts.
Always make sure to run chmod 755 on the script so that it is executable. Also, include a shebang so the program used to execute the script is clear.
Here's the recommended shebang for python files.
#!/contrib/projects/anaconda-python/miniconda2/bin/python
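Both requirements can be checked from Python before submitting. The sketch below uses only the standard library; the function names are made up for illustration, and it checks only that some shebang is present, not which interpreter it names.

```python
from __future__ import print_function
import os
import stat


def has_shebang(path):
    """Return True if the file starts with the two characters '#!'."""
    with open(path, "rb") as f:
        return f.read(2) == b"#!"


def make_executable(path):
    """Equivalent of running chmod 755 on the file."""
    os.chmod(path, 0o755)


def is_grid_ready(path):
    """Check for a shebang and for the owner's execute permission bit."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return has_shebang(path) and (mode & stat.S_IXUSR) != 0


if __name__ == "__main__":
    import sys
    for script in sys.argv[1:]:
        print(script, "ready" if is_grid_ready(script) else "NOT ready")
```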
Using the grid with bnpy
RunBNPYonGrid.py
Download Python source file: RunBNPYonGrid.py
I highly recommend using this wrapper script to run any bnpy jobs on the grid. It works the same as Run.py (as any wrapper script should), but includes a few extra preprocessing steps to make your life on the grid a lot easier. Specifically, it does things like
- configure the task id variable within bnpy to match the task id from the grid.
- set up log files for stdout and stderr that are saved within the same directory that Run.py outputs all its logs and saved files to
- fix weird default behavior for Python on the grid where log files are not written frequently, sometimes not until a run is completely finished (see below)
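For readers curious how a script can learn its task id, here is one plausible sketch. It is an assumption about the general mechanism, not the actual code in RunBNPYonGrid.py: Grid Engine exports an SGE_TASK_ID environment variable to each task of a -t job and sets it to the string "undefined" for jobs submitted without -t. The function name get_grid_task_id and the default of 1 are illustrative choices.

```python
from __future__ import print_function
import os


def get_grid_task_id(environ=None, default=1):
    """Read the task id set by Grid Engine, falling back to a default.

    SGE_TASK_ID is missing when run from a normal terminal and is the
    string "undefined" for jobs submitted without -t; both cases fall
    back to `default`.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("SGE_TASK_ID", "")
    try:
        return int(raw)
    except ValueError:
        return default


if __name__ == "__main__":
    print("task id:", get_grid_task_id())
```

Falling back to a default keeps the same script usable both at the terminal and on the grid.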
Basic usage
Consider the following basic bnpy call
python -m bnpy.Run AsteriskK8 MixModel Gauss VB --nLap 10 --K 25
Here's how we would execute the same call via RunBNPYonGrid.py in the same terminal
./RunBNPYonGrid.py AsteriskK8 MixModel Gauss VB --nLap 10 --K 25
And here's how we'd execute the same call via qsub, to run on a grid machine.
qsub -v BNPYOUTDIR -v PYTHONPATH -cwd -o stdout -e stderr -t 1 -l hour \
    RunBNPYonGrid.py AsteriskK8 MixModel Gauss VB --nLap 10 --K 25
To run 10 independent tasks, we can just replace -t 1 with -t 1-10 in the above line. These all run in parallel on separate grid machines. Awesome!
Log files and Python
Frequently, we wish to run a script for a long time, and monitor progress as we go. Unfortunately, the default behavior of file writing on the Brown filesystem is weird, and doesn't save buffered output to disk frequently enough to be useful. There are some fixes in RunBNPYonGrid.py that take care of this.
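If you write your own long-running scripts, a common generic remedy is to flush after every progress message. The sketch below is not the fix used inside RunBNPYonGrid.py (that code is not shown here); log_line is a hypothetical helper, and whether the extra os.fsync call helps on a networked filesystem is an assumption worth testing for yourself.

```python
from __future__ import print_function
import os
import sys


def log_line(message, stream=None):
    """Write one line and push it out of Python's buffer right away."""
    if stream is None:
        stream = sys.stdout
    stream.write(message + "\n")
    stream.flush()
    # For real files, also ask the operating system to write to disk.
    # In-memory streams have no usable file descriptor, so skip them.
    try:
        fd = stream.fileno()
    except (AttributeError, IOError, OSError, ValueError):
        return
    try:
        os.fsync(fd)
    except OSError:
        # Terminals and pipes may not support fsync; flushing is enough.
        pass


if __name__ == "__main__":
    for lap in range(1, 4):
        log_line("finished lap %d" % lap)
```

Another option with the same effect on Python's own buffering is to start the interpreter with the -u flag.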
Using jobname intelligently
Remember that the bnpy keyword pair --jobname <myname> controls the name of the folder where all bnpy output for this run is saved. Make sure to set this as an option when submitting jobs to the grid, so that all your results will be stored in the proper place.