This is a guide to scaling your code to bigger compute resources. At the moment that means using our BigR server, the College of Ag FARM compute cluster, or making more efficient use of your current computer.
# Software
There are a few software tools and steps you'll need in order to use any of the compute resources.
- SSH software - to connect to remote machines with a shell and to generate a secure SSH key.
  - Linux, Mac, Windows 10 - you should already have this; just open a Terminal (aka Command Prompt).
  - Windows 7 - you will need PuTTY and PuTTYgen. We suggest putty-*-installer.msi; look for the text "A Windows MSI installer package for everything except PuTTYtel" in the binaries section.
- SFTP software (optional) - makes bulk transferring files easier.
  - FileZilla works well, but any SFTP program will do.
# Create Accounts
- Generate an SSH key
- Farm - a traditional compute cluster for running many analyses in parallel.
- Biogeo - a flexible computing cloud for analysis, websites, databases, and other server needs.
  - Contact Robert or Alex for setup; see Biogeo for more details.
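Generating the key uses the standard OpenSSH tools (already present on Linux, Mac, and Windows 10; Windows 7 users do the equivalent in PuTTYgen). A minimal sketch, using the default `id_rsa` filename that the SSH config example later on this page assumes:

```shell
# Create an SSH key pair if you don't already have one.
mkdir -p ~/.ssh
# (-N "" skips the passphrase prompt for brevity here; setting a real
#  passphrase is safer.)
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""
# The public half is what you send when requesting an account;
# the private key (~/.ssh/id_rsa) never leaves your machine.
cat ~/.ssh/id_rsa.pub
```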
# Getting Work Done
BigR
BigR is much like your regular computer, just bigger and easier to leave long-running jobs on. The code you write may need only minor adjustments to file and directory paths. For more details see the BigR page.
Paths: if you used a hard-coded full path like `c:/someplace/code.r`, that won't work on the servers. Instead, change it to a relative path, or a full path under your home directory:

`/home/you/someplace/code.R` or `~/someplace/code.R`
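The reason home-relative paths travel well: the shell (and R's `path.expand`) resolves `~` to the current user's home directory on whatever machine the code runs on. A small sketch, where `someplace` is just the hypothetical directory from the example above:

```shell
# A hard-coded Windows path only exists on your own PC:
#   c:/someplace/code.r
# A home-relative path resolves for any user on any machine:
mkdir -p "$HOME/someplace"       # "someplace" is the example directory above
echo "$HOME/someplace/code.R"    # prints the full path under *your* home
```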
Using BigR
- Create your account (by contacting Alex).
- Start by opening a web browser and connecting to https://bigr.biogeo.ucdavis.edu
- Log in with your campus Kerberos/CAS username and password (we authenticate directly against the campus servers and never store your password).
- Use the interface just like a local RStudio.
Farm
FARM is a compute cluster: many identical computers plus a job management tool that queues jobs and sends them to machines as they become available. We currently have access to 32 computers, each with 24 GB of RAM and an 8-core CPU.
Learning to use FARM relies on one main principle: learning to code for parallel processing. While not strictly required, it's the key to using the resources effectively.
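The idea can be sketched right in the shell: independent tasks launched in the background run at the same time, which is what the cluster does for you across whole machines (the `sleep`/`echo` tasks below are just placeholders for real analyses):

```shell
# Four independent "analyses" run simultaneously instead of back to back.
# Each subshell stands in for one job; on FARM the scheduler starts these
# on separate machines for you.
: > results.txt                        # start with an empty results file
for i in 1 2 3 4; do
  (sleep 1; echo "task $i done" >> results.txt) &   # & runs it in the background
done
wait                                   # block until every task finishes
echo "all tasks finished"
```

Run serially this would take about four seconds; in parallel it takes about one, which is the same kind of speedup FARM gives you across many machines.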
Basic Usage
- Create an account as described above.
- Login to FARM with ssh
- Upload code with SFTP or by cloning code from Bitbucket
- Test running 1 job (fix any bugs if it breaks)
- Queue 1-10,000 jobs at once.
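As a rough sketch of the queueing step, assuming FARM uses a SLURM-style scheduler (the job name, time, and memory values below are placeholder assumptions; see the Getting Started on Farm page for the real settings):

```shell
# Write a minimal batch script; everything after #SBATCH is a
# directive for the scheduler, not a shell comment it executes.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob       # name shown in the queue
#SBATCH --time=01:00:00        # wall-clock limit (placeholder)
#SBATCH --mem=2G               # memory for the job (placeholder)
Rscript ~/someplace/code.R "$SLURM_ARRAY_TASK_ID"
EOF
bash -n job.sh && echo "job script looks OK"
# Submit one copy, or thousands at once as an array:
#   sbatch job.sh
#   sbatch --array=1-10000 job.sh
```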
More details
See the Getting Started on Farm page for a full example of converting local code into code that runs in parallel on FARM.
Useful tricks using SSH
Since you will log in to bigr or farm often, it is useful to create aliases for them. Assuming your key pair is stored in ~/.ssh, create a file named 'config' in ~/.ssh with the following contents:
    Host bigr
        HostName bigr.biogeo.ucdavis.edu
        User yourusername
        IdentityFile ~/.ssh/id_rsa
        IdentitiesOnly yes
        Compression yes
        TCPKeepAlive no
        ServerAliveInterval 180
        ServerAliveCountMax 10

    Host farm
        HostName farm.cse.ucdavis.edu
        User yourusername
        IdentityFile ~/.ssh/id_rsa
        IdentitiesOnly yes
        Compression yes
        TCPKeepAlive no
        ServerAliveInterval 180
        ServerAliveCountMax 10

(Note that IdentityFile should point to the private key, `~/.ssh/id_rsa`, not the `.pub` file.)
Copy your public key to the server once with `ssh-copy-id bigr`, then run `ssh bigr` or `ssh farm` to start.