
Getting Started on Farm

This page walks you through the process of using Farm with the example code contained in the Computing repository. For more details see the Farm page.

Outline

  1. Create & Test Code locally (on your computer)
  2. Create Cluster Version of Code
  3. Upload Data
  4. Upload Code
  5. Run Code

    • How to set up a job
    • How to set up a batch job
    • Keep tabs on the work
    • Get notified when done
  6. Retrieve results

Example, Step 1: Creating and testing code locally

1. Get a copy of the code

git clone https://bitbucket.org/hijmans-lab/computing.git
#git clone git@bitbucket.org:hijmans-lab/computing.git

2. Generate sample data

The repository includes a script that generates some fake data for testing purposes: 10 text files of random numbers between 0 and 100.

Run the following script once. Note that if you're not running this from the command line, you will need to set the working directory first.

Rscript generateexample.R
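
If you're curious what such a generator does, here is a minimal sketch, assuming 10 CSV files of random values between 0 and 100 written to an example/ directory (the column name and row count are assumptions; the actual generateexample.R may differ):

# Sketch of a data generator (not the actual generateexample.R)
dir.create("example", showWarnings = FALSE)
for (i in 1:10) {
  d <- data.frame(value = runif(1000, min = 0, max = 100))  # row count is an assumption
  write.csv(d, file.path("example", paste0(i, ".csv")), row.names = FALSE)
}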

3. Test a simple script

Once you have created the sample data, you can test a script locally that works with one file at a time.

The single-test.R script calculates summary statistics and plots a histogram. It takes one argument: the path to the file you want to process.

Rscript single-test.R example/1.csv
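
If you want to see roughly how a single-file script like this is put together, here is a hedged sketch (the actual single-test.R may read its argument and write its plot differently):

# Sketch of a single-file script (not the actual single-test.R)
args <- commandArgs(trailingOnly = TRUE)   # args[1] is the path to one CSV file
d <- read.csv(args[1])
print(summary(d[[1]]))                     # summary statistics go to standard output
png(paste0(args[1], ".png"))               # save a histogram next to the input file
hist(d[[1]])
dev.off()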

4. Try adding looping

The single-loop-test.R script calculates summary statistics and plots a histogram for all numbered #.csv files in a directory. It takes one argument: the path to the directory you want to process.

Rscript single-loop-test.R example
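
The looping version follows the same pattern, except it finds every numbered .csv file in the directory you pass it. A rough sketch (again, not the actual script):

# Sketch of the looping version (not the actual single-loop-test.R)
args <- commandArgs(trailingOnly = TRUE)   # args[1] is the directory to process
files <- list.files(args[1], pattern = "^[0-9]+\\.csv$", full.names = TRUE)
for (f in files) {
  d <- read.csv(f)
  print(summary(d[[1]]))
  png(paste0(f, ".png"))
  hist(d[[1]])
  dev.off()
}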

Example, Step 2+: Moving to the cluster

1. Clone the code to the cluster

For Biogeo users we have a deployment key; otherwise, log in to Farm and use git clone as you normally do on other machines.

Alternatively, you can upload your code and data to Farm using SFTP (e.g. Filezilla).

Log in to Farm

ssh <user>@farm.cse.ucdavis.edu

Clone your code if you didn't upload it already

git clone git@bitbucket.org:hijmans-lab/computing.git

2. Running code

For your sanity, start with code that does the bare minimum on the cluster. You can test the single-file scripts from the previous section to get used to calling commands on a Linux machine.

Next, try cluster-slurm-test-simple.R; this script just prints the array task id. sbatch requires you to send it a shell script (the .sh file). In this case we've provided a generic one that can wrap any R script you want to run on the cluster. Have a peek in clusterR.sh to see how it works.

sbatch --array=1-10 --job-name=test1 --partition=serial --mail-type=ALL --mail-user=<you>@ucdavis.edu /home/<username>/computing/clusterR.sh /home/<username>/computing/cluster-slurm-test-simple.R
The directory you are in when you run this command will automatically be the place where the log files (.out) are written. Note that any print commands in R will end up in the .out files, which is great for debugging.
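
Under the hood, the simple test script is reading the SLURM_ARRAY_TASK_ID environment variable that SLURM sets for each array task; an R script can read it with Sys.getenv(). A minimal sketch along those lines (the actual cluster-slurm-test-simple.R may differ):

# Minimal sketch: print this process's array task id
# (the actual cluster-slurm-test-simple.R may differ)
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))
print(paste("I am array task", task_id))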

3. Running complicated code

Now that you've gotten a simple script working on the cluster, you can try the same analysis we did earlier, but use the cluster so the files are processed in parallel (all at the same time instead of one by one). To do this you'll want the cluster version of single-test, called cluster-slurm-test.R. You might ask why the single-file version: when you run things in parallel, it's each single unit running independently of all the others that makes parallelism work.
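
Conceptually, each array task uses its task id to decide which file it is responsible for, so task 1 processes 1.csv, task 2 processes 2.csv, and so on. A hedged sketch of that idea (file paths are assumptions; this is not the actual cluster-slurm-test.R):

# Sketch: each array task processes the file matching its task id
# (paths are placeholders; the actual cluster-slurm-test.R may differ)
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))
f <- file.path("example", paste0(task_id, ".csv"))
d <- read.csv(f)
print(summary(d[[1]]))
png(paste0(f, ".png"))
hist(d[[1]])
dev.off()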

Of course, for an initial test you might want to run only one item. To do this, simply provide a single number to the --array option:

sbatch --array=2 --job-name=test2 --partition=serial --mail-type=ALL --mail-user=<you>@ucdavis.edu /home/<username>/computing/clusterR.sh /home/<username>/computing/cluster-slurm-test.R
In this example, only 2.csv will get processed.

Once you've confirmed that it works as expected, you can create an array as long as the number of items you need to process. In our example we have 10 files to process, so --array=1-10:

sbatch --array=1-10 --job-name=test10 --partition=serial --mail-type=ALL --mail-user=<you>@ucdavis.edu /home/<username>/computing/clusterR.sh /home/<username>/computing/cluster-slurm-test.R
Now you wait a little while (the email notification on completion doesn't seem to be working).

Want to check on your job?

squeue -u $(whoami)

4. Getting Results

When it finishes, you can reconnect your SFTP client (e.g. Filezilla) and download your results. As a courtesy to others, please clean up your files after you've confirmed your downloads. See Checksums for tips on how to verify that you downloaded all your files correctly. If you are using Hadoop, see the relevant wiki page for how to transfer your results directly into Hadoop instead of downloading them.

Next Steps

There are more helpful tips about using Farm on the Farm page. In particular, learning to use the array mechanism and command-line arguments to control which analysis is run will take a little practice.
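
For example, if the wrapper script forwards extra arguments on to the R script, the script can combine a command-line argument with the array task id so the same code can be pointed at different input directories. A hypothetical sketch (argument order and names are assumptions):

# Hypothetical sketch: combine a command-line argument with the array task id
args <- commandArgs(trailingOnly = TRUE)   # e.g. args[1] is a data directory
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))
f <- file.path(args[1], paste0(task_id, ".csv"))
print(paste("Task", task_id, "would process", f))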
