bioBakery: Basic Usage

The information that follows describes how to install and run bioBakery locally, in Google Cloud, or in Amazon EC2.



1. Installation

bioBakery can be installed on Linux, Mac, and Windows operating systems. We recommend installing on a machine with at least 12 GB of RAM and 16 GB of available disk space.

1.1. Prerequisites

Before you install bioBakery, please install the following software:

  1. Vagrant
  2. VirtualBox

The current bioBakery was built with Vagrant 1.8.1 and VirtualBox 5.0.16_RPMFusionr105871. It is compatible with VirtualBox versions up to and including v5.1, but not with VirtualBox v5.2 (released 2017-10-18).
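
The compatibility rule above can be checked from the command line before installing. The following is a minimal sketch, not part of bioBakery: the function name vbox_version_supported is hypothetical, and it assumes the version string starts with MAJOR.MINOR.PATCH, as in the output of VBoxManage --version.

```shell
# Hypothetical helper (not part of bioBakery): succeed if a VirtualBox
# version string is v5.1 or older, the range this page lists as compatible.
vbox_version_supported() {
  version=$1
  major=${version%%.*}   # text before the first dot, e.g. "5"
  rest=${version#*.}     # text after the first dot, e.g. "0.16r105871"
  minor=${rest%%.*}      # text before the next dot, e.g. "0"
  [ "$major" -lt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -le 1 ]; }
}
```

For example, $ vbox_version_supported "$(VBoxManage --version)" || echo "VirtualBox is newer than v5.1" prints the warning only when the installed version falls outside the supported range.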

1.2. Install bioBakery

  1. Download and unpack bioBakery
  2. Run bioBakery by double-clicking (or executing from the command line) the file for your operating system
    1. Linux and macOS: start_biobakery.command
      1. On Linux, if double-clicking opens the file in a text editor, go to "Files -> Preferences -> Behavior" and change the selection for "Executable Text Files" to "Ask each time". Then double-click the file and select "Run in Terminal" from the prompt.
    2. Windows: start_biobakery.bat

Please note that the first time you start bioBakery, it will download and extract the box. This can take about 30 minutes (depending on your download rate), as the box is around 7.5 GB. Subsequent starts should take much less time (~30 seconds).

bioBakery will automatically be installed at $HOME/biobakery (%HOMEPATH%\biobakery\ on Windows).

1.3. Install bioBakery dependencies

Some dependencies of the bioBakery tool suite require licenses. Therefore, they must be installed manually.

  1. Install USEARCH
    1. Request a USEARCH Linux License
    2. Install USEARCH (replace $USEARCH_URL with the URL from the license email)
      1. $ sudo wget -O /usr/local/bin/usearch $USEARCH_URL
      2. $ sudo chmod +x /usr/local/bin/usearch
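
The two commands above can be wrapped in a small function so that nothing is downloaded when the license URL is missing. This is only a sketch: the function name install_usearch is hypothetical, and the URL must still come from your own license email.

```shell
# Hypothetical wrapper around the two install commands above.
# It refuses to run without a URL instead of writing an empty file.
install_usearch() {
  usearch_url=$1
  if [ -z "$usearch_url" ]; then
    echo "usage: install_usearch URL_FROM_LICENSE_EMAIL" >&2
    return 1
  fi
  # Download the binary, then make it executable only if the download succeeded.
  sudo wget -O /usr/local/bin/usearch "$usearch_url" &&
    sudo chmod +x /usr/local/bin/usearch
}
```

Run it as $ install_usearch "$USEARCH_URL" after setting USEARCH_URL to the URL from the license email.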

2. Tool Suite

bioBakery currently includes the following tools in three categories.

Composition Analysis

  1. HUMAnN2
  2. MelonnPan *
  3. MetaPhlAn2
  4. PanPhlAn
  5. PhyloPhlAn **
  6. PICRUSt
  7. PPANINI
  8. ShortBRED
  9. StrainPhlAn

Statistical Analysis

  1. Arepa **
  2. CCREPE *
  3. BAnOCC *
  4. HAllA
  5. LEfSe
  6. MaAsLin
  7. MicroPITA
  8. SparseDOSSA

Infrastructure and Utilities

  1. AnADAMA2
  2. bioBakery workflows
  3. BreadCrumbs
  4. GraPhlAn
  5. Hclust2
  6. KneadData

Most bioBakery tools (18 of 23) are installed with Linuxbrew: in the default Linuxbrew location, $HOME/.linuxbrew, on the Vagrant box, and in /usr/local/ on the cloud images. Pure R packages (*) are installed with R in its default folder, /usr/local/. Tools that must be run from their source folder (**) are installed in $HOME and are included only in the Vagrant box (since it has a persistent user, vagrant).

Run $ brew list --versions for a list of all installed Linuxbrew tools and their version numbers.
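
To look up a single tool rather than scanning the full list, the output can be filtered. This is a minimal sketch: the helper name tool_versions is hypothetical, and it assumes the usual brew list --versions layout of one formula per line, with the name followed by its version(s).

```shell
# Hypothetical helper: read `brew list --versions` output on stdin and
# print each version recorded for the formula named in $1, one per line.
tool_versions() {
  awk -v tool="$1" '$1 == tool { for (i = 2; i <= NF; i++) print $i }'
}
```

For example, $ brew list --versions | tool_versions humann2 prints the installed HUMAnN2 version, assuming the formula is named humann2.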


3. Shared folders

By default, the bioBakery /vagrant folder becomes the shared folder between the bioBakery VM (guest OS) and your local machine (host OS). The shared folder on your local machine will be located at $HOME/biobakery. Any files that you place in this folder will automatically be accessible from the /vagrant folder in your bioBakery VM.
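
The mapping between the two sides of the shared folder can be sketched as a path translation. The helper name guest_path is hypothetical and only illustrates the default layout described above; it does not query Vagrant.

```shell
# Hypothetical helper: translate a host path under the shared folder
# ($HOME/biobakery by default) into the path the VM sees under /vagrant.
guest_path() {
  host_path=$1
  shared_root=${2:-$HOME/biobakery}
  case "$host_path" in
    "$shared_root")
      echo "/vagrant" ;;
    "$shared_root"/*)
      # Strip the host-side prefix and keep the rest of the path.
      suffix=${host_path#"$shared_root"}
      printf '/vagrant%s\n' "$suffix" ;;
    *)
      echo "not inside the shared folder: $host_path" >&2
      return 1 ;;
  esac
}
```

For example, a file saved on the host as $HOME/biobakery/data/sample.fastq would appear in the VM as /vagrant/data/sample.fastq.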

For more information on shared folders, see the advanced configuration section on shared folders.


4. Demo analysis

Once you have installed bioBakery, the tool suite is ready for use on your datasets. Please refer to the following demo for an example of how to perform a GraPhlAn visualization analysis.

  1. Start bioBakery (all steps that follow should be performed in the bioBakery environment)
  2. Open a terminal by clicking the Terminal icon in the left pane.
  3. Run the GraPhlAn demo (all GraPhlAn commands that are run will be printed to the screen)
    1. $ biobakery_demos --tool graphlan --mode run --output ~/Desktop/vagrant/
    2. This should create output files. For your reference, the generated cladogram is shown below.
https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/graphlan/output/merged_abundance.png

For more information about how to use the other tools, please refer to their documentation or the associated tutorials.


5. Upgrade bioBakery

We continue to improve the bioBakery experience for our users, which involves adding new tools and updating existing tools. If you wish to upgrade bioBakery to the latest version, please follow the instructions below.

  1. Uninstall the current bioBakery
  2. Install the latest version of bioBakery

If you would like to manually upgrade individual tools, please refer to the section on Installing individual bioBakery tools.


6. Uninstall bioBakery

bioBakery can be uninstalled by removing the bioBakery Vagrant and VirtualBox files. Before uninstalling, please move any data you would like to keep out of the shared vagrant folder. For more information about shared folders, please refer to the advanced configuration section on shared folders.

  1. Remove all bioBakery Vagrant files (the top level folder will contain the bioBakery Vagrantfile)
    1. This folder is $HOME/biobakery (%HOMEPATH%\biobakery\ on Windows)
  2. Remove all bioBakery VirtualBox files
    1. Open VirtualBox
    2. Right click on the bioBakery VM
    3. Select "Remove"
    4. Then select "Delete all files"

7. bioBakery in Google Cloud

A public bioBakery Google Cloud image is available. To get started running bioBakery in Google Cloud, create a Google Cloud account or log in to an existing account, then follow these instructions. As of August 2017, running this instance for 8 hours is estimated to cost about $1.00. For current costs, please refer to the Google Cloud Cost Calculator.

  1. Create your own bioBakery image from the public image
    1. From the left-hand menu select Compute Engine -> Images
    2. From the Images page, select Create Image
    3. On the Create an Image page, select the following and then click Create:
      1. Name: image-biobakery
      2. Source: Cloud Storage file
      3. Cloud Storage file: biobakery_bucket/biobakery_image_v1.6.tar.gz
  2. Create your own bioBakery instance
    1. From the left-hand menu select Compute Engine -> VM Instances
    2. From the VM Instances page, select Create Instance
    3. On the Create an Instance page, select the following and then click Create:
      1. Machine type: n1-standard-2
        1. This machine has 2 CPUs and 7.5 GB of RAM. The machine selected is the minimal configuration we suggest.
      2. Boot disk -> Your image: image-biobakery (this is the image created in the previous step)
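
The same two steps can also be scripted with the gcloud command-line tool instead of the web console. This is a sketch under stated assumptions: the function name is hypothetical, the instance name and zone are chosen by you, and the Cloud Storage file listed above is assumed to be addressable as a gs:// URI.

```shell
# Hypothetical wrapper: $1 is the instance name, $2 is the zone.
# The image name, bucket path and machine type are the values from the steps above.
create_biobakery_instance() {
  instance_name=$1
  zone=$2

  # Step 1: create your own image from the public bioBakery image file.
  gcloud compute images create image-biobakery \
    --source-uri gs://biobakery_bucket/biobakery_image_v1.6.tar.gz || return 1

  # Step 2: create an instance that boots from that image.
  gcloud compute instances create "$instance_name" \
    --zone "$zone" \
    --machine-type n1-standard-2 \
    --image image-biobakery
}
```

For example, $ create_biobakery_instance my-biobakery us-east1-b, where both arguments are placeholders for your own instance name and zone.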

7.1. SSH access

  1. When your instance is ready, connect to it by clicking the SSH button
    1. A terminal window will appear containing a command prompt.
    2. If there are any issues connecting to your instance with SSH, please refer to the Google Cloud documentation about SSH access.
  2. In the SSH prompt, run the commands in the instructions to install bioBakery dependencies that require licenses (section 1.3).

7.2. Desktop access

To access the bioBakery desktop, first connect with SSH and then follow these instructions.

  1. Set up desktop access to your instance
    1. Start the vncserver
      1. From the SSH prompt for your instance run: $ vncserver
      2. Remember the password you selected, as it will be used to log in to the machine.
    2. Add a new firewall rule to allow desktop access
      1. Go to Networking -> Firewall rules -> Create rule, select the following and click Create:
        1. Source Filter: Allow from any source
        2. Allowed protocols and ports: tcp:5901
      2. This will only need to be done once as the rule will apply to all current and future instances.
  2. Use a VNC viewer to log in to the bioBakery desktop
    1. Install a VNC viewer (we suggest RealVNC Viewer)
    2. Start the VNC viewer and log in to bioBakery with the following settings
      1. Address: Instance External IP Address : 5901
        1. Replace Instance External IP Address with the external IP address of your instance.
        2. The external IP address of your instance is listed to the left of the SSH button.
      2. Password: This is the password you selected when you started the vncserver
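
Port 5901 in the steps above follows from the VNC convention that display :N listens on TCP port 5900 + N. The sketch below shows that arithmetic and a command-line equivalent of the firewall step. The function names and the rule name allow-vnc are hypothetical, and it assumes vncserver started display :1.

```shell
# VNC display :N listens on TCP port 5900 + N, so display :1 uses port 5901.
vnc_port() {
  echo $((5900 + $1))
}

# Sketch of the firewall step with the gcloud CLI. The rule name
# "allow-vnc" is a hypothetical choice; 0.0.0.0/0 allows any source.
allow_vnc() {
  gcloud compute firewall-rules create allow-vnc \
    --allow "tcp:$(vnc_port 1)" \
    --source-ranges 0.0.0.0/0
}
```

If vncserver reports a different display number, pass that number to vnc_port and use the resulting port in both the firewall rule and the viewer address.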

Please stop your bioBakery instance when it is not in use. As of June 2016, you are charged only for the disk space, not for the instance, while the instance is stopped.
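
Stopping and restarting can also be done from the command line with the gcloud tool. This is a small sketch: the function names are hypothetical, and the instance name and zone are whatever you chose when creating the instance.

```shell
# Hypothetical wrappers: $1 is the instance name, $2 is its zone.
stop_biobakery_instance() {
  gcloud compute instances stop "$1" --zone "$2"
}

start_biobakery_instance() {
  gcloud compute instances start "$1" --zone "$2"
}
```

For example, $ stop_biobakery_instance my-biobakery us-east1-b at the end of a session, with both arguments replaced by your own values.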

Refer to the Google Cloud documentation for how to access files on your cloud instance with your file browser.


8. bioBakery in Amazon EC2

A public bioBakery Amazon Machine Image (AMI) is available. To get started running bioBakery in Amazon EC2, create an Amazon Web Services account or log in to an existing account, then follow these instructions. As of August 2017, running this instance for 8 hours is estimated to cost about $0.80. For current costs, please refer to the Amazon EC2 On Demand Pricing.

To create a bioBakery Amazon EC2 instance (from region east1), select "Launch Instance" from Services -> EC2 -> Instances. In the "Community AMIs" list, search for "biobakery" to find the public bioBakery AMI. Select this AMI, then choose an instance type of at least t2.large (2 CPUs and 8 GB of memory, the minimum required to run the demos). Finally, click "Review and Launch" and start the instance with either a new or an existing key pair.

8.1. SSH access

You can ssh directly to the instance using its IP address, the private key of the key pair you selected when starting the instance, and the username ubuntu. Refer to the Amazon EC2 instructions on How to connect to your instance with SSH.
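
The connection described above can be sketched as follows. The function name is hypothetical, and the key file path and IP address are placeholders for your own values; only the username ubuntu comes from this page.

```shell
# Hypothetical wrapper: $1 is the private key file, $2 is the instance IP address.
connect_to_biobakery() {
  key_file=$1
  instance_ip=$2
  # ssh rejects private key files that other users can read.
  chmod 400 "$key_file" || return 1
  ssh -i "$key_file" "ubuntu@$instance_ip"
}
```

For example, $ connect_to_biobakery ~/keys/my-key.pem 203.0.113.10, where both arguments are placeholders.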

Depending on the bioBakery tools you will be running, you might need to install dependencies with licenses. In the SSH prompt, run the commands in the instructions to install bioBakery dependencies that require licenses (section 1.3).

8.2. Desktop access

You can connect to your instance through a desktop by following the same instructions as for a Google Cloud instance (section 7.2). The only difference is that on AWS you update the firewall rules through "Security Groups". For detailed information, refer to the Amazon EC2 instructions on Authorizing Inbound Traffic for Your Linux Instances.
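
The security group change can also be made with the AWS command-line tool. This is a sketch only: the function name is hypothetical, the security group ID and source address range are placeholders for your own values, and port 5901 is the VNC port used in the Google Cloud instructions.

```shell
# Hypothetical wrapper: $1 is the security group ID of the instance,
# $2 is the source address range (CIDR) allowed to reach the VNC port.
allow_vnc_aws() {
  security_group_id=$1
  source_cidr=$2
  aws ec2 authorize-security-group-ingress \
    --group-id "$security_group_id" \
    --protocol tcp \
    --port 5901 \
    --cidr "$source_cidr"
}
```

Passing 0.0.0.0/0 as the second argument matches the "allow from any source" setting used for Google Cloud; a narrower range, such as your own IP address followed by /32, limits who can reach the desktop.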
