Troubleshooting

Installation

If you have problems installing BAM-matcher, please follow the instructions in the installation guide.

Make sure that you have all the dependencies and requirements installed as well.

BAM-matcher was written specifically for Linux, as some parts of it require standard Linux command-line tools. It may also work on Mac OS X, but this has not been tested. If you have successfully installed and used BAM-matcher on OS X, we would be very happy to hear from you. Similarly, if you would like to try it on OS X, I would be happy to help; just get in touch with me (paul.wang @ sa.gov.au).


Configuration

BAM-matcher does not work out of the box without manual configuration, because there are many system-specific variables and some choices that should be made consciously by the user.

Please follow the configuration guide to set up your configuration file for BAM-matcher.

BAM-matcher should also report informative errors that will help you in determining the nature of most problems you may encounter.

Important things to check:

1) make sure you have a configuration file, even if it is not fully and correctly set up. Use

bam-matcher.py --generate-config [OUTPUT_PATH, optional]

to generate a template of the configuration file.

2) try to specify full paths to files and executables wherever possible. The only cases where this is not necessary are commands that the system can already find on its PATH (e.g. samtools, freebayes and java).

3) make sure genome reference fasta files are indexed by samtools:

samtools faidx genome.fasta

4) make sure BAM files are indexed by samtools:

samtools index sample.bam

5) make sure to specify a valid cache directory (CACHE_DIR) in the configuration file. This directory should be readable and writable by all potential users of BAM-matcher. Unlike the scratch directory, which BAM-matcher creates automatically if it is not specified, the cache directory must be set deliberately by the user: it may store patient/individual data long-term, so the user should know exactly where this data is kept.

In contrast, scratch directories are deleted upon completion of each run, unless the user specifies otherwise with debug mode (--debug/-d), in which case BAM-matcher reports where the scratch directory and temporary files are located.
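The checks in items 2) to 5) can be scripted before a run. The sketch below is a hypothetical helper, not part of BAM-matcher; the function names and the way problems are reported are assumptions. It assumes the default index locations written by `samtools faidx` (genome.fasta.fai) and `samtools index` (sample.bam.bai), and also accepts sample.bai, which some other tools write. JAR files such as GenomeAnalysisTK.jar are run through java and need not be executable themselves, so check those paths with `os.path.isfile` rather than `check_executable`.

```python
# Hypothetical pre-flight helpers for illustration; not part of BAM-matcher.
import os
import shutil


def check_executable(cmd):
    """True if cmd is an executable file path, or a bare command found on PATH."""
    if os.path.sep in cmd:
        return os.path.isfile(cmd) and os.access(cmd, os.X_OK)
    return shutil.which(cmd) is not None


def check_fasta_index(fasta):
    """`samtools faidx genome.fasta` writes genome.fasta.fai next to the FASTA."""
    return os.path.isfile(fasta + ".fai")


def check_bam_index(bam):
    """`samtools index sample.bam` writes sample.bam.bai; some other tools
    write sample.bai instead, so accept either."""
    stem, _ = os.path.splitext(bam)
    return os.path.isfile(bam + ".bai") or os.path.isfile(stem + ".bai")


def check_cache_dir(path):
    """CACHE_DIR must be an existing directory we can list, read and write."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


def preflight(executables, fasta, bams, cache_dir):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    for exe in executables:
        if not check_executable(exe):
            problems.append("executable not found: %s" % exe)
    if not check_fasta_index(fasta):
        problems.append("missing FASTA index, run: samtools faidx %s" % fasta)
    for bam in bams:
        if not check_bam_index(bam):
            problems.append("missing BAM index, run: samtools index %s" % bam)
    if not check_cache_dir(cache_dir):
        problems.append("CACHE_DIR missing or not read/writable: %s" % cache_dir)
    return problems
```

For example, `preflight(["samtools", "java"], "/hg19/genome.fa", ["sampleX.bam"], "/path/to/cache")` returns an empty list when everything is in place; otherwise each entry names the failed check and, for missing indexes, the samtools command that creates them.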


Running BAM-matcher

If you are having problems running BAM-matcher, first check that your installation works with the test data (see this page).

Also try enabling debug (--debug/-d) and verbose (--verbose/-v) modes, which provide details about each processing step.


Common Problems

1. GATK / Java incompatibility

BAM-matcher error message:

+-----------------------+
| VARIANT CALLING ERROR |
+-----------------------+
Caller check failed.

Command tested:
java6 -jar /tools/GATK/GenomeAnalysisTK.jar -version

Python error msg:
Command '['java6', '-jar', '/tools/GATK/GenomeAnalysisTK.jar', '-version']' returned non-zero exit status 1

See log file for system error msg:
/tmp/########/caller_check.log

Please check the caller command or path to the binary.

The corresponding caller_check.log looks something like:

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/gatk/engine/CommandLineGATK : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(Unknown Source)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$000(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: org.broadinstitute.gatk.engine.CommandLineGATK. Program will exit.

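This is most likely caused by running GATK with an incompatible version of Java. In this example, "Unsupported major.minor version 51.0" means that GATK was compiled for Java 7 (class file version 51), but the java6 binary can only load class files up to version 50.

Please check the GATK documentation for the version of Java required by your version of GATK.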
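To find which Java release a failing JAR needs, the class-file major version in caller_check.log can be translated directly: from Java 5 onward, major version 49 corresponds to Java 5 and each release adds one, so 51 means Java 7 and 52 means Java 8. A minimal sketch of that mapping follows; the helper names are hypothetical and not part of BAM-matcher. It also recognises the wording used by newer JVMs ("class file version 55.0"), as an assumption about what your log may contain.

```python
# Hypothetical helpers for illustration; not part of BAM-matcher.
import re

# Older JVMs print "Unsupported major.minor version 51.0";
# newer ones mention "class file version 55.0". Capture the major number.
_CLASS_VERSION = re.compile(
    r"(?:Unsupported major\.minor version|class file version) (\d+)\.\d+"
)


def java_release_for_class_major(major):
    """Class-file major version -> Java release (49 -> 5, 50 -> 6, 51 -> 7, ...)."""
    if major < 49:
        raise ValueError("class files older than Java 5 are not handled here")
    return major - 44


def required_java_release(log_text):
    """Java release needed by the failing JAR, or None if no version error is found."""
    match = _CLASS_VERSION.search(log_text)
    if match is None:
        return None
    return java_release_for_class_major(int(match.group(1)))
```

Compare the result with the output of `java -version` for the exact Java binary named in your configuration (java6 in the example above).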

2. Using GATK

If you run BAM-matcher using GATK and see an error message like this:

+-----------------------+
| VARIANT CALLING ERROR |
+-----------------------+
Variant calling failed.

Caller command:
java -jar -Xmx4g -XX:ParallelGCThreads=1 /tools/GATK/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /hg19/genome.fa -I /sampleX.bam -o /tmp/######/bam1.vcf --output_mode EMIT_ALL_SITES -nt 1 -L /tmp/######/bam1.intervals

Check caller log: /tmp/######/caller0.log

You should check the generated caller0.log (or caller1.log, depending on which sample caused the problem).

GATK has stricter requirements for input files than the other supported callers; see the GATK documentation for details.

Common GATK issues:

1) The BAM file does not contain read group (RG) information (see the GATK guide on this issue).

Example GATK error message:

##### ERROR MESSAGE: SAM/BAM/CRAM file /data/sacgf/others/data/aligned/160127JanKokavec_Nextseq/stRNA/bams/CL_N_stRNA/CL_N_stRNA.star.hg19.bam is malformed: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups

2) Mismatching contig order in BAM file and reference.

Example GATK error message:

##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
##### ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam
##### ERROR   reads contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]

You can either fix the BAM files as recommended by GATK, or switch to a different caller. Both Freebayes and VarScan should have no problem with these issues.
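The contig-order problem can be spotted from the BAM header (`samtools view -H sample.bam`) before running BAM-matcher. The sketch below is a hypothetical helper, not part of BAM-matcher or GATK, and GATK's own checks are stricter. It reads contig names from the @SQ lines, flags numbered chromosomes that are not in ascending order (the lexicographic chr1, chr10, ..., chr2 pattern in the error above), and compares the relative contig order with the reference .fai index.

```python
# Hypothetical helpers for illustration; not part of BAM-matcher or GATK.


def sq_contigs(header_text):
    """Contig names (SN:) from @SQ lines of a SAM header, in header order."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fai_contigs(fai_text):
    """Contig names from a .fai index: first tab-separated column, in file order."""
    return [line.split("\t", 1)[0] for line in fai_text.splitlines() if line.strip()]


def numbered_contigs_in_order(contigs):
    """True if numbered chromosomes (1, 2, ... or chr1, chr2, ...) appear in
    ascending numeric order; lexicographic order puts chr10 before chr2."""
    numbers = []
    for name in contigs:
        core = name[3:] if name.lower().startswith("chr") else name
        if core.isdigit():
            numbers.append(int(core))
    return all(a < b for a, b in zip(numbers, numbers[1:]))


def same_relative_order(bam_contigs, ref_contigs):
    """True if contigs present in both lists occur in the same relative order."""
    shared = set(bam_contigs) & set(ref_contigs)
    return ([c for c in bam_contigs if c in shared]
            == [c for c in ref_contigs if c in shared])
```

If either check fails, reorder the BAM file as GATK suggests, or use Freebayes or VarScan instead.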

3. Multi-sample BAM files

The current version of BAM-matcher does not work well with BAM files containing multiple samples (indicated by multiple read groups).

If your BAM file contains multiple RG IDs, you may see an error message similar to this:

Traceback (most recent call last):
  File "/home/paul/localwork/bam-matcher/bam-matcher.py", line 1380, in <module>
    if is_hom(gt1) and is_hom(gt2):
  File "/home/paul/localwork/bam-matcher/bammatcher_methods.py", line 133, in is_hom
    if gt_[0] == gt_[1]:
IndexError: list index out of range

Even if no error is shown, BAM-matcher will compare only one of the read groups in a multi-RG BAM file.
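Read-group problems (no RG at all, which GATK rejects, or several RG IDs in one file) can be detected from the @RG lines of the BAM header. This is a minimal sketch with hypothetical helper names, not part of BAM-matcher. Following the wording above, it counts @RG ID values, even though several read groups may belong to the same sample (same SM tag). `bam_header_text` assumes samtools is on the PATH.

```python
# Hypothetical helpers for illustration; not part of BAM-matcher.
import subprocess


def bam_header_text(bam_path):
    """Header of a BAM file as text, via `samtools view -H` (samtools must be on PATH)."""
    result = subprocess.run(["samtools", "view", "-H", bam_path],
                            stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout


def read_group_ids(header_text):
    """ID: values of all @RG lines in a SAM header."""
    ids = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            for field in line.split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.append(field[3:])
    return ids


def read_group_problem(header_text):
    """Describe a read-group problem, or return None if there is exactly one RG."""
    ids = read_group_ids(header_text)
    if not ids:
        return "no read groups: GATK will refuse this file"
    if len(ids) > 1:
        return "%d read groups (%s): BAM-matcher may compare only one" % (
            len(ids), ", ".join(ids))
    return None
```

For example, `read_group_problem(bam_header_text("sample.bam"))` returns None for a single-RG file and a short description otherwise.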

4. Mismatching VCF file and genome reference

The VCF file should contain variant genomic positions compatible with the default genome reference file (REFERENCE in the configuration file, or --reference/-R at run time), NOT the alternate reference file (REF_ALTERNATE).

Example Freebayes error:

unable to find FASTA index entry for '1'

Example GATK error:

##### ERROR MESSAGE: Badly formed genome loc: Contig '1' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

Example VarScan error (VarScan will fail at the SAMtools mpileup stage):

[E::mpileup] fail to parse region '1:1376567-1376567' with sample.bam
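All three errors above come from a contig name in the VCF that is absent from the reference, for example '1' when the reference uses 'chr1'. A minimal sketch for comparing the two sets of names follows; the helper names are hypothetical and not part of BAM-matcher. It assumes reference names come from the first column of the REFERENCE .fai file and recognises only the common 'chr' prefix mismatch (e.g. MT vs chrM is reported as plain "not found"). Matching names do not guarantee that the coordinates come from the same genome build.

```python
# Hypothetical helpers for illustration; not part of BAM-matcher.


def vcf_contigs(vcf_lines):
    """Distinct CHROM values (first column) of VCF data lines, in first-seen order."""
    seen, order = set(), []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and ## meta / #CHROM header lines
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.add(chrom)
            order.append(chrom)
    return order


def diagnose_contigs(vcf_names, ref_names):
    """Map each VCF contig missing from the reference to a short explanation."""
    ref = set(ref_names)
    report = {}
    for name in vcf_names:
        if name in ref:
            continue
        if "chr" + name in ref:
            report[name] = "reference names it 'chr%s'" % name
        elif name.startswith("chr") and name[3:] in ref:
            report[name] = "reference names it '%s'" % name[3:]
        else:
            report[name] = "not found in reference"
    return report
```

For example, with `ref = [line.split("\t", 1)[0] for line in open("/hg19/genome.fa.fai")]`, calling `diagnose_contigs(vcf_contigs(open("sites.vcf")), ref)` (sites.vcf being a placeholder name) returns an empty dict when every VCF contig exists in the reference.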
