Main development repository for Galaxy. Active development happens here, and this repository is thus intended for those working on Galaxy development. See http://bitbucket.org/galaxy/galaxy-dist/ for a more stable repository intended for end-users.
$ hg clone http://bitbucket.org/galaxy/galaxy-central
Galaxy on the cloud
Because data often arrives sporadically, individuals and labs may need to process widely varying amounts of data over time. This variability in data volume translates into variable demand for the compute resources used to process the data. So that users need not purchase and maintain their own compute resources, or wait a long time for data processing jobs to complete, the Galaxy Team has enabled Galaxy to be instantiated on cloud computing infrastructures, including Amazon Elastic Compute Cloud (EC2) and Eucalyptus. An instance of Galaxy on the cloud behaves just like a local instance of Galaxy, except that it offers the benefits of on-demand cloud computing resources and a pay-as-you-go resource ownership model. With simple access to Galaxy on the cloud, as many instances of Galaxy can be acquired and started as are needed to process the given data. Once the need subsides, those instances can be released as simply as they were acquired. With this paradigm, one pays only for the resources one needs and uses, while the concerns and costs of purchasing and maintaining hardware are eliminated.
Slides with talks about Galaxy Cloud (including screenshots) can be found here.
Instantiating a Galaxy instance
For the purposes of executing Galaxy on the cloud, we have packaged Galaxy and Galaxy-required tools as a virtual machine (VM) image that resides with Amazon (referred to as an AMI - Amazon Machine Image). This VM acts as a complete unit that can be easily instantiated offering the same functionality as any other instance of Galaxy. The following steps will guide you to starting your own Galaxy Cloud (GC) cluster:
- Create an Amazon Web Services (AWS) account and sign up for Elastic Compute Cloud (EC2) and Simple Storage Service (S3) services
- Use the AWS Management Console to start an EC2 instance
- Use the GC web interface on the started EC2 instance to start the desired number of worker instances
- Enjoy a personal instance of Galaxy on the cloud
Detailed steps
(for even more detailed steps, see this page)
Step 1 (one time only):
- Because AWS services implement a pay-as-you-go access model for compute resources, every user of the service must register with Amazon. Once registered, the user is assigned an AWS access key and an accompanying secret key.
- Once your account has been approved by Amazon (note that this may take up to one business day), log into the AWS Management Console (http://aws.amazon.com/ -> Sign in to the AWS Management Console)
- Create a Key Pair by clicking 'Key Pairs -> Create Key Pair' (near the bottom of the menu on the left). Save the created key pair to your local machine. The key pair is used to access started instances from the command line (useful only in case of errors).
- Create a Security Group by clicking 'Security Groups -> Create Security Group'. Specify a name (e.g., galaxyWeb) and provide a brief description. Then, add HTTP and SSH rules (by selecting those from the drop-down menu). Lastly, add a 'Custom' rule with the following parameters: Protocol=TCP, From Port=42284, To Port=42284, Source=0.0.0.0/0. This rule opens a port on the remote instance, allowing access to the cloud controller web interface.
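The security group described in Step 1 can be written down as plain data, which makes it easy to double-check that the cloud controller port is actually open before launching. This is only a minimal sketch: the `IngressRule` class, the `GALAXY_WEB_RULES` list, and the `port_is_open` helper are hypothetical names introduced for illustration, and the HTTP and SSH entries assume the standard ports 80 and 22 with a source of 0.0.0.0/0. It does not call AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """One inbound rule of a security group (hypothetical helper type)."""
    protocol: str
    from_port: int
    to_port: int
    source: str


# Rules for the 'galaxyWeb' group from Step 1.
# Assumption: HTTP and SSH use their standard ports (80 and 22)
# and are open to any source address.
GALAXY_WEB_RULES = [
    IngressRule("tcp", 80, 80, "0.0.0.0/0"),        # HTTP
    IngressRule("tcp", 22, 22, "0.0.0.0/0"),        # SSH
    IngressRule("tcp", 42284, 42284, "0.0.0.0/0"),  # cloud controller web interface
]


def port_is_open(rules, protocol, port):
    """Return True if any rule allows the given protocol on the given port."""
    return any(
        rule.protocol == protocol and rule.from_port <= port <= rule.to_port
        for rule in rules
    )


if __name__ == "__main__":
    for port in (22, 80, 42284):
        print(port, port_is_open(GALAXY_WEB_RULES, "tcp", port))
```

If the check for port 42284 fails against the rules you actually entered, the custom rule is the likely place to look.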
Step 2 (required every time a cloud instance of Galaxy is desired):
- Choose an AMI. On the AWS EC2 Dashboard, click 'Launch Instances'->'Community AMIs' and then search for 'galaxy'. Galaxy AMIs are described by a manifest (i.e., name) in the following format: 115971652512/galaxy-<DATE_RELEASED> (e.g., 115971652512/galaxy-2010-04-20). When choosing a Galaxy AMI, use the latest released one unless you are reinstating a previously created cluster. In that case, use the same AMI as you used before.
- Specify 1 instance, then choose an availability zone and an instance type. Remember which availability zone you chose because every time you instantiate the same cluster, you must select the same availability zone! See this page for instance capabilities and pricing.
- Supply user data. User data specifies the desired name of the cloud cluster and provides GC with the user account information. The user data must use the following format: '<CLUSTER_NAME>|<AWS_ACCESS_KEY>|<AWS_SECRET_KEY>|<Desired Galaxy Cloud Password>' (e.g., galaxy_cloud_cluster|NUEINXSAXCA|NCEW/2OJCCCXLDS|SomePwd). The access key and secret key can be obtained from your AWS account under Security Credentials (http://aws.amazon.com/ -> Account -> Security Credentials, then copy the values from Access Key ID and Secret Access Key).
Extra information about user data: Because nothing stops a user from simultaneously starting multiple clusters on the cloud, the cluster name is needed by GC (and the user) to identify a given cluster. In addition, GC needs the user account information because it will need to create persistent data storage volumes as well as start the user-specified number of additional cloud instances. Lastly, because anyone could start/stop instances on your behalf if they knew your instance's URL, you should specify a password that will allow only you to log into the Galaxy Cloud Web Console.
- Choose the key pair you created during the initial setup.
- Choose 'default' and 'galaxyWeb' security groups (select multiple security groups by holding 'Command' or 'Ctrl' buttons and clicking desired groups)
- 'Launch' the instance and wait (less than 5 minutes on average) for the instance and GC to boot.
- Check the status of the instance by clicking on 'Instances' on the main AWS console (you may have to refresh the page manually to get the current instance status). Note that once the instance status shows 'running', it will take an extra minute or two for GC to start. The instance (and thus GC) is accessible by copying the instance's Public DNS (available from the AWS Management Console under instance details) into a web browser. NOTE: to access the GC Web Console, you must append the '/cloud' path to the instance's public DNS, for example: http://ec2-184-73-10-5.compute-1.amazonaws.com/cloud
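Two strings in Step 2 are easy to get wrong by hand: the pipe-separated user data and the GC Web Console address. The sketch below assembles both. The function names `build_user_data` and `console_url` are hypothetical, and rejecting a '|' inside a field is an assumption based on '|' being the field separator, not documented GC behavior.

```python
def build_user_data(cluster_name, access_key, secret_key, password):
    """Join the four fields in the order Step 2 requires, separated by '|'."""
    fields = [cluster_name, access_key, secret_key, password]
    for field in fields:
        if not field:
            raise ValueError("every user data field must be non-empty")
        # Assumption: a '|' inside a field would be read as a separator.
        if "|" in field:
            raise ValueError("user data fields must not contain '|'")
    return "|".join(fields)


def console_url(public_dns):
    """Build the GC Web Console address from an instance's Public DNS."""
    return "http://" + public_dns.strip().rstrip("/") + "/cloud"


if __name__ == "__main__":
    # Placeholder values taken from the examples in this page.
    print(build_user_data("galaxy_cloud_cluster", "NUEINXSAXCA",
                          "NCEW/2OJCCCXLDS", "SomePwd"))
    print(console_url("ec2-184-73-10-5.compute-1.amazonaws.com"))
```

Paste the first printed string into the user data field when launching the instance; the second is the address to open once GC has booted.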
Step 3:
- Once available, use the GC web interface to start additional worker instances. To log in, use the password you provided as part of the user data and leave the 'User' field empty. Once the GC Web Console is visible, you can see the status of the individual services needed to run Galaxy on the cloud. In order to actually start Galaxy and be able to run jobs, at least one worker instance is needed (note that this instance is in addition to the master instance where the GC web interface runs). To start worker node(s), click the 'Start Cluster' button and specify the amount of persistent data storage you would like to associate with the cluster and the number of worker instances to start. The amount of storage requested should be based on expected cluster usage and is specified only the first time a cluster is instantiated.
- As the cluster configures itself and worker instances boot, relevant log messages are displayed in the Cluster Log console. Once at least one worker instance is ready, Galaxy will start and an 'Access Galaxy' link will appear toward the top of the page. Click it to access the new instance of Galaxy.
Step 4:
- Use Galaxy as you normally would. Once the need for a cluster diminishes, visit the GC web page again and power off the cluster. Once the worker instances have terminated, terminate the master instance manually through the AWS console. Also, as a sanity check, make sure all other instances associated with the cluster have been terminated.
- Next time the same cluster is needed, follow these steps and start a new instance of the cluster. Note that all data uploaded to the cloud and analyzed through Galaxy will be preserved even though the cluster is not running. In order to access this data, an instance of the cluster must be running.
Galaxy AMIs
Latest AMI:
- AMI: ami-ed03ed84
- Manifest: 115971652512/galaxy-2010-04-20_2
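Because manifests follow the 115971652512/galaxy-<DATE_RELEASED> format, the latest release can be picked out of a list of search results mechanically. The following is a rough sketch: `parse_manifest` and `latest_manifest` are hypothetical helpers, and treating a trailing `_2` as a revision number within the same release date is an assumption drawn only from the manifest name shown above.

```python
import re
from datetime import date

# <owner id>/galaxy-<YYYY-MM-DD>, optionally followed by _<N>.
# Assumption: the optional _<N> suffix is a revision of that day's release.
MANIFEST_RE = re.compile(r"^(\d+)/galaxy-(\d{4})-(\d{2})-(\d{2})(?:_(\d+))?$")


def parse_manifest(manifest):
    """Split a manifest name into (owner id, release date, revision)."""
    match = MANIFEST_RE.match(manifest)
    if match is None:
        raise ValueError("not a Galaxy AMI manifest: %r" % manifest)
    owner, year, month, day, revision = match.groups()
    released = date(int(year), int(month), int(day))
    return owner, released, (int(revision) if revision else 0)


def latest_manifest(manifests):
    """Return the manifest with the newest release date, then highest revision."""
    return max(manifests, key=lambda name: parse_manifest(name)[1:])


if __name__ == "__main__":
    candidates = [
        "115971652512/galaxy-2010-04-20",
        "115971652512/galaxy-2010-04-20_2",
    ]
    print(latest_manifest(candidates))
```

Remember that the latest AMI is the right choice only for a new cluster; when reinstating an existing cluster, use the AMI it was created with.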
Notes
Amazon EC2 is a pay-as-you-go service; all that is needed to use it is a valid credit card. Rates for Amazon EC2 can be found here.
Instantiating Galaxy on the cloud is a brand new feature being developed by the Galaxy team, so the code base is under rapid development and changes continuously. As a result, functionality may suddenly break or not operate as intended.
This revision is from 2010-09-03 01:36


