Step-by-step guide for installing BAM-matcher

This installation guide has been tested on Ubuntu 14.04 LTS. However, the steps should work for most Linux environments and for Mac OS X (with some modifications).


Installing requirements and dependencies

Requirements

1. Python 2.7

BAM-matcher was written in Python 2.7, which should be available by default in most Linux distributions.

You can check the version of Python available on your system by running:

python --version
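The same check can be made from inside Python, for example at the top of a setup script. The following is a minimal sketch that uses only the standard library. The helper name `is_python_27` is hypothetical and is not part of BAM-matcher.

```python
import sys


def is_python_27(version_info=sys.version_info):
    """Return True if the given version tuple is Python 2.7.x (hypothetical helper)."""
    # sys.version_info starts with (major, minor, micro, ...); only the
    # first two fields matter for this check.
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    if is_python_27():
        print("Python 2.7 found")
    else:
        print("BAM-matcher expects Python 2.7, found {0}.{1}".format(*sys.version_info[:2]))
```

Passing an explicit tuple such as `is_python_27((2, 7, 18))` makes the helper easy to test without switching interpreters.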

If you don't have Python available, please see online resources for instructions on how to get and install Python for your system.

2. git

To install BAM-matcher using git, you will also need to have git available on your system.

For Ubuntu/Debian-based distributions, you can install git by:

apt-get install git (as root)
or
sudo apt-get install git
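Once git is installed, `git --version` should print its version. A minimal Python sketch that runs this command and parses the result is shown below. The helper names are hypothetical, and the parser assumes the usual `git version X.Y.Z` output format.

```python
import re
import subprocess


def parse_git_version(output):
    """Extract (major, minor, patch) from text like 'git version 2.34.1'."""
    match = re.search(r"git version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_git_version():
    """Run 'git --version' and parse it; return None if git is unavailable."""
    try:
        output = subprocess.check_output(["git", "--version"])
    except (OSError, subprocess.CalledProcessError):
        # OSError is raised when the git executable cannot be found at all.
        return None
    return parse_git_version(output.decode("utf-8", "replace"))
```

A `None` result from `installed_git_version()` means git still needs to be installed.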

3. Python libraries

BAM-matcher also requires some extra Python libraries:

  • PyVCF
  • ConfigParser
  • Cheetah
  • pysam
  • fisher (which also requires numpy)

The simplest method to install these is via pip.

If pip is not available on your machine, you can install it by:

sudo apt-get install python-pip
(for Ubuntu/Debian-based distributions).

Then you can install the extra packages by:

# PyVCF, for parsing VCF files
sudo pip install PyVCF

# ConfigParser, for parsing config files
sudo pip install ConfigParser

# Cheetah, for writing/reading templates
sudo pip install Cheetah

# pysam, for reading BAM file header
sudo apt-get install python-dev zlib1g-dev
sudo pip install pysam

# fisher, Fisher's exact test
# sudo apt-get install python-dev    ## required to install numpy, but should be installed already for pysam
sudo pip install numpy               ## required for fisher
sudo pip install fisher

For non-Ubuntu systems, you may need to find the equivalent packages for python-dev and zlib1g-dev.

Once these dependencies have been installed, you are ready to install BAM-matcher.
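To confirm that the Python libraries can actually be imported before moving on, a small check like the one below may help. It is a sketch that assumes the import names `vcf` (PyVCF), `ConfigParser`, `Cheetah`, `pysam`, `numpy` and `fisher`. These module names are assumptions about how each package is imported, not something BAM-matcher itself defines.

```python
import importlib

# Assumed mapping from pip package name to the module name it provides.
REQUIRED_MODULES = [
    ("PyVCF", "vcf"),
    ("ConfigParser", "ConfigParser"),  # Python 2.7 module name (assumption)
    ("Cheetah", "Cheetah"),
    ("pysam", "pysam"),
    ("numpy", "numpy"),
    ("fisher", "fisher"),
]


def missing_modules(required=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be imported."""
    missing = []
    for package, module in required:
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing
```

Calling `missing_modules()` returns an empty list when everything is installed; any names it returns can be passed straight to `sudo pip install`.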

4. Third-party variant/genotype callers

BAM-matcher requires a third-party germline variant caller to calculate genotype data. Currently it supports:

  • GATK
  • Freebayes
  • VarScan

Please see each caller's documentation for installation instructions.

You will only need one of these; however, having alternatives available is useful.

GATK is a very popular variant caller and is probably the fastest of the three options (BAM-matcher uses the UnifiedGenotyper for genotyping). However, it is also the most fastidious about the input BAM files. For example, GATK will not accept BAM files that don't have read group (RG) information.

Freebayes is a little bit slower, but is more tolerant of the input BAM files.

VarScan is the slowest method, as it requires generating pileup data (using SAMtools). However, it is probably the most robust.
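Because GATK rejects BAM files without read group information, it can be worth checking the header before choosing a caller. The sketch below reads the header with `samtools view -H` and looks for `@RG` lines. The helpers `read_header`, `has_read_groups` and `suggest_caller` are hypothetical illustrations of the trade-offs described above, not BAM-matcher's own logic.

```python
import subprocess


def read_header(bam_path):
    """Return the header of a BAM file as text, using 'samtools view -H'."""
    output = subprocess.check_output(["samtools", "view", "-H", bam_path])
    return output.decode("utf-8", "replace")


def has_read_groups(header_text):
    """Return True if a SAM/BAM header contains at least one @RG line."""
    return any(line.startswith("@RG\t") for line in header_text.splitlines())


def suggest_caller(header_text):
    """Pick a caller name for the --caller option (hypothetical heuristic)."""
    # GATK rejects BAM files without read group (RG) information, so fall
    # back to Freebayes, which is more tolerant of its input.
    if has_read_groups(header_text):
        return "gatk"
    return "freebayes"
```

For example, `suggest_caller(read_header("sample.bam"))` (with a hypothetical file name) returns `"gatk"` or `"freebayes"`. VarScan remains an option when both fail, at the cost of speed.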

VERY BRIEF INSTALLATION GUIDE FOR THE VARIANT CALLERS

  1. GATK: you just need to download the GATK .jar file. However, Java is required, and each GATK version requires a specific version of the Java VM.
  2. Freebayes: you will likely need to compile it from source.
  3. VarScan: you just need to download the .jar file. However, it also requires Java and SAMtools (http://samtools.sourceforge.net/).

SAMtools can also be installed (in Ubuntu) by:

sudo apt-get install samtools
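Both GATK and VarScan need Java, and a given GATK release only runs on a particular Java version, so it helps to know which Java is installed. The sketch below runs `java -version` (which writes to stderr) and extracts the major version. It assumes the common `... version "X.Y..."` output format; the helper names are hypothetical, and which Java version your GATK release needs should be taken from the GATK documentation.

```python
import re
import subprocess


def parse_java_major_version(output):
    """Return the Java major version from 'java -version' output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report "1.x" (e.g. "1.8.0_292" is Java 8).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major_version():
    """Run 'java -version' and parse it; return None if Java is unavailable."""
    try:
        output = subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_java_major_version(output.decode("utf-8", "replace"))
```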


Installing BAM-matcher

1) Navigate to the directory where you wish to install BAM-matcher:

cd /path/to/install/

2) Clone the BAM-matcher repository

git clone https://bitbucket.org/sacgf/bam-matcher.git

BAM-matcher will now be installed in /path/to/install/bam-matcher/

3) [Optional] If you want BAM-matcher to be executable from anywhere on your machine, you'll need to add the BAM-matcher directory path to your PATH variable.

In Ubuntu, edit your ~/.bashrc file by adding the following line:

export PATH=$PATH:/path/to/install/bam-matcher/

However, the file to modify and the exact command may vary depending on your system. If you are not sure, check your distribution's documentation.
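To verify that the new PATH entry is active, the sketch below checks whether a directory appears in a PATH-style string. Trailing slashes are normalised, so `/path/to/install/bam-matcher/` and `/path/to/install/bam-matcher` compare equal. The helper `dir_on_path` is hypothetical.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if directory is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    for entry in path_value.split(os.pathsep):
        # Skip empty entries; normalise so "/a/b/" and "/a/b" compare equal.
        if entry and os.path.normpath(os.path.expanduser(entry)) == target:
            return True
    return False
```

Python inherits PATH from the shell that starts it, so run `dir_on_path("/path/to/install/bam-matcher/")` from a new terminal (or after reloading ~/.bashrc) to see the updated value.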

You should also make sure that bam-matcher.py is executable (run this from within /path/to/install/bam-matcher/):

chmod a+x bam-matcher.py
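If you are scripting the installation in Python instead, the same permission change can be sketched with `os.chmod`. The helper `make_executable` is hypothetical; it adds execute permission for user, group and others while keeping the existing permission bits, mirroring `chmod a+x`.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others (like 'chmod a+x')."""
    # Keep only the permission bits of the current mode, then add the x bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For example, `make_executable("/path/to/install/bam-matcher/bam-matcher.py")`.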

Once this is done, you should be able to run bam-matcher.py from anywhere on your system after starting a new terminal session. Alternatively, run bash or reload the .bashrc file (. ~/.bashrc) in the current session.


Quick check of your installation

To quickly check BAM-matcher was installed correctly, run:

bam-matcher.py --help

This should bring up the help message:

$ ./bam-matcher.py -h
usage: bam-matcher.py [-h] --bam1 BAM1 --bam2 BAM2 [--config CONFIG]
                      [--generate-config GENERATE_CONFIG] [--output OUTPUT]
                      [--short-output] [--html] [--no-report]
                      [--scratch-dir SCRATCH_DIR] [--vcf VCF]
                      [--caller {gatk,freebayes,varscan}]
                      [--dp-threshold DP_THRESHOLD]
                      [--number_of_snps NUMBER_OF_SNPS] [--fastfreebayes]
                      [--gatk-mem-gb GATK_MEM_GB] [--gatk-nt GATK_NT]
                      [--varscan-mem-gb VARSCAN_MEM_GB]
                      [--reference REFERENCE] [--ref-alternate REF_ALTERNATE]
                      [--chromosome-map CHROMOSOME_MAP]
                      [--about-alternate-ref ABOUT_ALTERNATE_REF]
                      [--do-not-cache] [--recalculate] [--cache-dir CACHE_DIR]
                      [--debug] [--verbose]

Compare two BAM files to see if they are from the same samples, using
frequently occuring SNPs reported in the 1000Genome database

(and so on...)

Not quite there yet...

However, to run a test with example data (and to use BAM-matcher in general), you will need to set up the configuration file. See this page for detailed steps on setting up the configuration file.
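Once the configuration file is in place, BAM-matcher can also be called from other Python scripts. The sketch below builds an argument list from flags shown in the usage message above (`--bam1`, `--bam2`, `--config`, `--caller`, `--output`). The `build_command` helper is hypothetical, and it assumes bam-matcher.py is on your PATH (optional step 3).

```python
VALID_CALLERS = ("gatk", "freebayes", "varscan")


def build_command(bam1, bam2, config=None, caller=None, output=None):
    """Assemble an argument list for bam-matcher.py (hypothetical helper)."""
    command = ["bam-matcher.py", "--bam1", bam1, "--bam2", bam2]
    if config is not None:
        command += ["--config", config]
    if caller is not None:
        # --caller only accepts these three choices, per the usage message.
        if caller not in VALID_CALLERS:
            raise ValueError("caller must be one of: " + ", ".join(VALID_CALLERS))
        command += ["--caller", caller]
    if output is not None:
        command += ["--output", output]
    return command
```

With hypothetical file names, `subprocess.call(build_command("sample1.bam", "sample2.bam", config="bam-matcher.conf", caller="freebayes"))` would then run the comparison. Passing a list rather than a single string avoids shell quoting problems with unusual file names.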
