gkw / HeliosTips

Summary: Tips for using GKW on IFERC Helios.

(Much content COPIED directly from an internal wiki)

Official resources

Login

  • You must connect from an IP address that was registered with your account when it was created
  • Login via SSH once the account has been created:
    > ssh -o ServerAliveInterval=300 -X user@helios.iferc-csc.org
    
    (the ServerAliveInterval option is, or at least was, needed to avoid the terminal freezing; see the ~/.ssh/config sketch below)
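If you connect regularly, the same options can be kept in ~/.ssh/config so that a plain ssh helios is enough. A minimal sketch; the host alias and username are placeholders of my choosing, not anything official:

    # ~/.ssh/config
    Host helios
        HostName helios.iferc-csc.org
        User your_username            # placeholder
        ServerAliveInterval 300
        ForwardX11 yes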

User Environment Settings

  • Add the following lines to your ~/.bashrc
# If not running interactively, don't do anything (avoids problems for non login shells)
[ -z "$PS1" ] && return

## Needed to compile GKW
module load intel bullxmpi fftw/3.3/default

## Needed for GKW scripts
export GKW_HOME=${HOME}/gkw
export PATH=${PATH}:${GKW_HOME}/scripts
export EMAIL_ADDRESS=something # for job notifications
export HOST=`hostname`
export HOSTNAME=`hostname`
export IFERC_PROJECT=RESIDUAL #This determines which project gkwnlin uses
#Set to either RESIDUAL or TURBISLE, or RESIDUAL-0 or TURBISLE-0 for low priority (free) queue

## The rest is just for taste ##

# nicer prompt
export PS1="\u@\h:\w>"

# Use $SCRATCHDIR for running GKW
echo WORKDIR= $WORKDIR        "Lustre L1: Mounted without flock"
echo SCRATCHDIR= $SCRATCHDIR  "Lustre L1: Mounted with flock (use for MPI-IO)"
echo PROJECT= $PROJECT        "Lustre L2: Storage shared with others in your project"

# Resource reporting
echo
echo "Project resource usage (node-hours):"
alias gresource='gresource -o PROJECT,Days,Passed,Total,NA,Used,Loss,NLoss'
gresource
echo
uresource
#To see top users since certain US format date (slow)
#sreport user TopUsage TopCount=50 Group Start=073112 | nl -v -5

# Set your local time zone to correctly display file dates
TZ='Europe/Berlin'; export TZ

# don't use ls color option - very slow on large dirs
# standard(ish) ls abbreviations
alias ll='ls -l'
alias la='ls -la'
alias l='ls -alF'
alias ls-l='ls -l'
alias lt="ls -lat"
alias lr="ls -lat | head"

# If you are used to pbs
alias qsta='squeue'
alias qstat='squeue -o "%.7i %.9P %.8j %.8u %.2t %.10M %.10L %.6D %.10r %.3p" -S "Mp"'
alias qdel='scancel'
alias qsub='sbatch'

Error: mpirun not found

The module setup is not very robust (ticket #1384). It can go wrong, for instance when module purge does not behave as expected. The module settings seem to propagate into the job scripts, and if incorrect settings are picked up while a job script runs, you can get a message such as 'mpirun not found':

  • Problem: Job error: mpirun not found or mpiexec not found

  • Diagnosis: Add module list to your batch script and see what is loaded.

  • Possible solution 1: Before submitting jobs, logout once from the system, re-login to the system and submit the jobs without changing the module environment

  • Possible solution 2: Add the following to your batch script to reset the module system before any module load commands:

    . /etc/profile.d/00-modules.sh
    module ()
    {
        eval `/csc/softs/cscst/modules-3.2.10/bin/modulecmd bash $*`
    }
    

  • Robust solution 3: Do not use any module commands in your job script. Instead, use the full path to the mpirun you need, and explicitly set the LD_LIBRARY_PATH variable to the value it has when the modules you need are correctly loaded (see the sketch below). gkwnlin now does this.
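A minimal sketch of solution 3 inside a batch script. The paths here are placeholders, not the real Helios install locations: copy in the output of 'which mpirun' and 'echo $LD_LIBRARY_PATH' from a login shell in which the modules are loaded correctly.

    # Placeholder paths: replace with the values from a correctly configured login shell
    MPIRUN=/path/to/bullxmpi/bin/mpirun
    export LD_LIBRARY_PATH=/path/to/bullxmpi/lib:/path/to/intel/lib/intel64
    $MPIRUN -np $SLURM_NTASKS ./gkw.x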

CPU quota

alias gresource='gresource -o PROJECT,Days,Passed,Total,NA,Used,Loss,NLoss'

Available CPU time is allocated daily from the total quota for the year. The maximum available at any one time is (or seems to be) three months' worth. Further info here

gresource reports the number of node-hours available. NLoss is the number of days until CPU time will be lost (if nothing is run). More info on the fields is available with man gresource.

uresource reports total node-hour usage per user on your project.

If you want to run a low priority job which is uncharged, the project name should have a "-0" appended to it (maximum 12 hours).

export IFERC_PROJECT=RESIDUAL-0
#or
export IFERC_PROJECT=TURBISLE-0

File transfer from IFERC-CSC

The transfer speed can depend very much on the machine you are connecting from, possibly due to the size of the receive buffers rather than the connection speed, so try a few options.

  • Using rsync:
    rsync -avuWPz user@helios.iferc-csc.org:~/dir_to_transfer ./local
    
  • Much faster for many small files (does not open a new connection each time)
  • Faster for big binary files too (6Mb/s in my tests).
  • The z flag uses compression, which may or may not help, depending on your data type, but is good for our ascii files.

  • Using scp:

    scp user@helios.iferc-csc.org:path/to/remote_file ./local_dir
    

  • Large binary files: 1.5Mb/s in my tests
  • Many small files are very slow (best to tar them first; see the sketch below)
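If you do want to move many small files in one go without first creating a tarball on Helios, a common alternative (a sketch; the directory name is a placeholder) is to stream a tar archive through ssh and unpack it locally:

    # pack remotely, stream over ssh, unpack locally; no intermediate file is written
    ssh user@helios.iferc-csc.org 'tar czf - dir_with_many_small_files' | tar xzf -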

To copy a selected subset of the gkw project folders using rsync, I use a script like the one below to pull files to my local machine. It can be called repeatedly to update the sync with only the newer files. While a run is still going, it will pull the flux time traces (because they are temporarily softlinked by gkwnlin to where they will end up in the final folder).

#!/bin/bash
# The script is called gpull_helios and I call it like this:
#> cd ~/runs/gkw
#> gpull_helios project_to_sync
# Do not add a trailing /

#Function for starting an ssh-agent and ssh-add.  (2 hours) Also gets settings across shells.
#SSH_ENV="$HOME/.ssh/environment"

#function start_agent {
#     echo "Initialising new SSH agent..."
#     /usr/bin/ssh-agent -t 7200 | sed 's/^echo/#echo/' > "${SSH_ENV}"
#     echo succeeded
#     chmod 600 "${SSH_ENV}"
#     . "${SSH_ENV}" > /dev/null
#     /usr/bin/ssh-add -t 7200;
#}

# You can use the above function if you have an SSH key set up on Helios, to remember your login for 2 hours.  Otherwise you will have to type the password every time.  If you don't know how to set up an SSH key login it is probably easier not to bother, but there are many explanations on Google.

#start_agent

until [[ -z "$1" ]]; do
    echo '**'
    echo Syncing $1 to `pwd`
    echo
    rsync -avuLzPW --exclude='other' --exclude='distr**' --exclude='hamada' --exclude='runs' --exclude='restart' --exclude='DMP' --exclude='FDS' --exclude='job_out' user@helios.iferc-csc.org:~/scratch/$1 ./
    echo '**'
    shift
done

Just a note: if you have echo statements in your .bashrc which print to the screen, then both rsync and scp will fail, as they do not expect screen output. To avoid this, any screen output can be wrapped in an if statement such as:

if ["$SSH_TTY" ]
then
> !ECHO STATEMENTS
fi
which still prints the wanted information when logging into Helios interactively. Alternatively, as in the example .bashrc above, its first line ensures that nothing is printed when the shell is not interactive.

Job Control via the SLURM resource manager

sbatch jobscript ### Submit jobscript
squeue -u user   ### Show status of all pending and running jobs for the specified user
scontrol show jobs     ### Show details for all jobs
scancel jobid          ### Cancel a job
sinfo                  ### View information about SLURM nodes and partitions

More docs on SLURM

The queues are currently a bit inflexible. Your job may get rejected if you do not ask for 2^n cores, or a round number of hours or minutes (try a few combinations).
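For reference, a minimal job script sketch that respects these constraints. The project name, node count and 16 tasks per node are assumptions for illustration; gkwnlin writes a more complete script for you:

#!/bin/bash
#SBATCH -J gkw_run                # job name
#SBATCH -N 8                      # nodes: 8*16 = 128 = 2^7 cores
#SBATCH --ntasks-per-node=16      # assumed cores per node
#SBATCH -t 02:00:00               # a round number of hours
#SBATCH -o gkw_run.%j.out
#SBATCH -A RESIDUAL               # or TURBISLE, or the -0 variants for the free queue

module load intel bullxmpi fftw/3.3/default
mpirun -np 128 ./gkw.x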

Chaining jobs to keep restarting

  1. Edit your input file to have read_file=.true.
  2. Submit using gkwnlin -f .... filename (Say the job number is 168673)
  3. cd ../runs
  4. sbatch --dependency=afterok:168673 filename.pbs (Say it gives job no 168683)
  5. sbatch --dependency=afterok:168683 filename.pbs (Say it gives job no 168692)
  6. sbatch --dependency=afterok:168692 filename.pbs
  7. etc. (the chain can also be scripted; see the sketch after this list)
  8. Run the filename.mv script only when all the jobs have finished: bash ./filename.mv
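A small sketch of scripting the chain, assuming the filename.pbs produced by gkwnlin and the usual 'Submitted batch job <id>' output from sbatch; the first job number comes from the gkwnlin submission in step 2:

#!/bin/bash
# chain three restarts of filename.pbs after an already submitted job
jobid=168673                    # replace with the job number from step 2
for i in 1 2 3; do
    jobid=$(sbatch --dependency=afterok:$jobid filename.pbs | awk '{print $NF}')
    echo "submitted chained job $jobid"
done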

Optimization Experiences

Preliminary... YMMV

Intel Fortran Compiler

Some quick experiments with GKW showed NO performance gains using the ifort optimisation flags -O3 or -axCORE-AVX2 (which adds code paths for newer Intel instruction sets). There might be more benefit from selecting only a subset of SSE instruction optimizations, but this has not yet been investigated (see man ifort for all the options). The best GKW performance was obtained with -O2 -ip -ipo -no-prec-div.

ORB5 finds a significant slowdown with -O3 as compared to -O2.
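For anyone repeating these tests, the comparison above amounts to swapping compile flags along these lines (the source file name is a placeholder; in practice the flags live in your GKW config .mk file):

# best GKW performance in these quick tests
mpif90 -O2 -ip -ipo -no-prec-div -c some_module.f90

# no measurable gain here (and a slowdown for ORB5 with -O3)
mpif90 -O3 -c some_module.f90
mpif90 -axCORE-AVX2 -c some_module.f90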

MPI libraries

gkwnlin and GKW compilation are setup for bullxmpi by default.

Tests with GKW found bullxmpi to be better than intelmpi when communicating derived MPI datatypes in the ghost cells. However, intelmpi seems to do better for large MPI allreduce operations, even after the OMPI environment variables have been tweaked for bullxmpi using export OMPI_MCA_coll="^ghc". Therefore the optimal MPI library actually depends on the problem. For small jobs (< 1024 cores), the differences are not really noticeable. For very large jobs, you are advised to test both on a small number of time-steps (and since results can be quite variable, take the median of 3 runs).

Note that intelmpi compiles with mpiifort, whilst bullxmpi compiles with mpif90. To use intelmpi, one should do the following:

module swap bullxmpi intelmpi
make clean
make FC=mpiifort  # or softlink the intelmpi.mk file to your username.mk
export USE_IMPI=1 # for gkwnlin settings
