#Getting Started

This is a guide to scaling up your code when you need more compute power. At the moment that means using our BigR server, the College of Ag FARM compute cluster, or making more efficient use of your current computer.

#Software

There are a few software tools and setup steps you'll need before you can use any of the compute resources.

  1. SSH software - To connect to remote machines with a shell and to generate a secure SSH key.
    • Linux, Mac, Windows 10 - you should already have this; just open a Terminal (aka Command Prompt on Windows)
    • Windows 7 - you will need PuTTY and PuTTYgen - we suggest you use putty-*-installer.msi; look for the text "A Windows MSI installer package for everything except PuTTYtel" in the binaries section.
  2. SFTP software (Optional) - Makes bulk file transfers easier
    • FileZilla works well, however any SFTP program will do.

#Create Accounts

  1. Generate an SSH key (see the example after this list).
  2. Farm - A traditional compute cluster for running many analyses in parallel.
  3. Biogeo - A flexible computing cloud for analysis, websites, databases, and other server needs.
    • contact Robert or Alex for setup; see Biogeo for more details.
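
On Linux, Mac, or Windows 10, a minimal sketch of generating a key pair with OpenSSH looks like this (the email address is just a placeholder comment; on Windows 7 use PuTTYgen's GUI instead):

    # create a key pair; accept the default location (~/.ssh/id_rsa)
    # and choose a passphrase when prompted
    ssh-keygen -t rsa -b 4096 -C "you@ucdavis.edu"

    # the public half, which you hand to servers, ends up here:
    cat ~/.ssh/id_rsa.pub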

#Getting Work Done

BigR

BigR is more like your regular computer, just bigger and easier to leave long-running jobs on. The main difference is that code you wrote locally may need minor adjustments to file and directory paths. For more details see the BigR page.

Paths: if you used a hard-coded full path like c:/someplace/code.R, it won't work on the servers. Instead, switch to relative paths or to full paths under your user's home directory:

    /home/you/someplace/code.R or ~/someplace/code.R
~ is a shortcut for your home directory, which is nice because other people can run the same code under their own user as long as they have the same folder structure.
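
A quick sketch of why this is portable, runnable in any shell (the paths are hypothetical):

    # ~ expands to the current user's home directory
    echo ~                         # e.g. /home/you on the server
    # so the same command works for every user with the same folder layout
    Rscript ~/someplace/code.R     # hypothetical script path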

Using BigR

  1. Create your account (by contacting Alex).
  2. Start by opening a web browser and connecting to https://bigr.biogeo.ucdavis.edu
  3. Login with your campus Kerberos/CAS username and password (we authenticate directly against the campus servers and never store your password)
  4. Use the interface just like a local RStudio

Farm

FARM is a compute cluster. It consists of many identical computers, plus a job management tool that queues jobs and sends them to machines as they become available. We currently have access to 32 computers, each with 24 GB of RAM and an 8-core CPU.

Learning to use FARM comes down to one main principle: learning to code for parallel processing. While not strictly required to use FARM, it's the key to using the resources effectively.
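
To make the idea concrete, here is a minimal sketch of the parallel mindset (the script and data file names are hypothetical): rather than one long loop, structure the work as many independent runs of the same script, each on its own input, so a cluster can spread them over machines.

    # run the same analysis once per input chunk; each run is an independent job
    for f in data/chunk_*.csv; do
        Rscript analyze.R "$f" &   # '&' runs them concurrently on one machine;
    done                           # on FARM the job scheduler spreads them across nodes
    wait                           # wait for all background jobs to finish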

Basic Usage

  1. Create an account as described above.
  2. Log in to FARM with SSH.
  3. Upload code with SFTP or by cloning it from Bitbucket.
  4. Test run 1 job (fix any bugs if it breaks).
  5. Queue 1-10,000 jobs at once (see the sketch after this list).
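
A rough sketch of that workflow from a terminal. The username is a placeholder, and the last line assumes FARM's scheduler is SLURM with a wrapper script you've written (run_one.sh is a hypothetical name); see the Getting Started on Farm page for the exact commands:

    # log in (from your own machine)
    ssh yourusername@farm.cse.ucdavis.edu

    # upload code with SFTP, or clone it from Bitbucket while logged in to FARM
    sftp yourusername@farm.cse.ucdavis.edu

    # after a successful test run, queue many copies at once
    sbatch --array=1-100 run_one.sh   # one job per array index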

More details

See the Getting Started on Farm page for a full example of how to take local code and convert it to code that will run in parallel on FARM.

Useful tricks using ssh

Since you'll log in to bigr or farm often, it's useful to create an alias for each. Assuming your SSH key pair is stored in ~/.ssh, open a text editor, type the following, and save it as 'config' in ~/.ssh:

Host bigr
    HostName  bigr.biogeo.ucdavis.edu
    User yourusername
    IdentityFile  ~/.ssh/id_rsa
    IdentitiesOnly yes
    Compression yes
    TCPKeepAlive no
    ServerAliveInterval 180
    ServerAliveCountMax 10

Host farm
    HostName  farm.cse.ucdavis.edu
    User yourusername
    IdentityFile  ~/.ssh/id_rsa
    IdentitiesOnly yes
    Compression yes
    TCPKeepAlive no
    ServerAliveInterval 180
    ServerAliveCountMax 10

Copy your public key to bigr so that you don't need to type your password each time:

    ssh-copy-id bigr

Now you just type ssh bigr or ssh farm to connect.
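
The aliases also shorten file transfers, e.g. with scp (the file and folder names are hypothetical):

    scp results.csv farm:~/someplace/   # copy a local file to your home directory on FARM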
