SWAN / FAQ

Install

Error in unloadNamespace(package) after "preparing package for lazy loading":

This is most likely caused by an unclean unload in a previous R session, or by newly installed package libraries that are not yet ready, e.g.:

  1. Error in unloadNamespace(package) : namespace ‘Rsamtools’ is imported by ‘GenomicAlignments’, ‘rtracklayer’, ‘BSgenome’ so cannot be unloaded

It will often fix itself if you quit the current R session and close the shell session. Then start a new shell and R session and resume the installation from where you left off.

swan.so: "Symbol not found: __ZN5boost13match_resultsIN9..." on OSX/Linux?

There are two possibilities. The first one is easy to fix. Check whether the last g++ line of your install log contains an empty -L (an -L with no path after it), as in this one:

g++ -m64 -shared -L/usr/lib64/R/lib -Wl,-z,relro -o swan.so libswan.o libswan_sclip.o **-L** -lboost_regex -fopenmp -lgomp /root/R/x86_64-redhat-linux-gnu-library/3.2/swan/usrlib/libbam.a /root/R/x86_64-redhat-linux-gnu-library/3.2/swan/usrlib/libbwa.a -lz -L/usr/lib64/R/lib -lR

In that case you simply did not set the $BOOST_INCLUDE_DIR and $BOOST_LIBRARY_DIR environment variables. Likewise, a -L that is not empty but points to an incorrect place causes the same error. If set correctly, this -L should point to $BOOST_LIBRARY_DIR, the directory that contains libboost_regex[mt].{a,so,dylib}. For how to set these variables, see the slides in doc/SWAN_Installation.ppt .
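As a minimal sketch of the check described above, the following sets the two variables and verifies that the library directory really contains a libboost_regex file before you reinstall. The helper name has_boost_regex and the /usr/local paths are assumptions for illustration only; use the paths of your own Boost installation.

```shell
# Hypothetical helper: succeed if directory $1 contains a libboost_regex library.
has_boost_regex() {
  dir=$1
  for f in "$dir"/libboost_regex*.a "$dir"/libboost_regex*.so "$dir"/libboost_regex*.dylib; do
    # An unmatched glob stays literal and fails the -e test, so it is skipped.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Example paths only (assumption); keep any values that are already set.
export BOOST_INCLUDE_DIR=${BOOST_INCLUDE_DIR:-/usr/local/include}
export BOOST_LIBRARY_DIR=${BOOST_LIBRARY_DIR:-/usr/local/lib}

if has_boost_regex "$BOOST_LIBRARY_DIR"; then
  echo "found libboost_regex in $BOOST_LIBRARY_DIR"
else
  echo "no libboost_regex in $BOOST_LIBRARY_DIR; -L would point to the wrong place" >&2
fi
```

Run it in the same shell session that you then use to install SWAN, so that the exported variables are visible to the build.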

The second, more complex possibility is that swan.so and libboost_regex-mt.so/.a/.dylib were linked against different C++ standard libraries, most commonly libc++.1.dylib from clang++ versus libstdc++.6.dylib from g++. The only way to resolve this is to compile your boost library with EXACTLY THE SAME compiler used for SWAN. On OSX, you can check which libraries swan.so and libboost_regex-mt.dylib are linked against:

otool -L src/swan.so
otool -L $BOOST_LIBRARY_DIR/libboost_regex-mt.dylib

Building boost with the same gcc compiler solves the problem:

brew install boost --build-from-source --cc=gcc-4.7 --cxx=g++-4.7        # OSX(homebrew)
sudo port upgrade -s --force boost configure.compiler=macports-gcc-4.8   # OSX(macports)

On Linux, try:

ldd src/swan.so
ldd $BOOST_LIBRARY_DIR/libboost_regex-mt.so

Fixing this on Linux is more involved. You may need to build your own boost library against your current C++ standard library and set $LD_LIBRARY_PATH accordingly. I do not provide a guide for this, but advanced g++ users should know how to do it. In any case, make sure the correct libstdc++ is on either the user's or the system's $LD_LIBRARY_PATH.
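To make the ldd comparison less error-prone, here is a small sketch that extracts only the C++ runtime entries (libstdc++ or libc++) from ldd output and compares them for two files. The helper names cxx_runtime and compare_cxx_runtime are hypothetical, and the sketch only applies to shared objects, since ldd cannot inspect a static .a archive.

```shell
# Hypothetical helper: print the C++ runtime libraries named in ldd output (stdin).
cxx_runtime() {
  grep -oE 'lib(std)?c\+\+\.so[.0-9]*' | sort -u
}

# Hypothetical helper: compare the C++ runtimes of two shared objects.
compare_cxx_runtime() {
  a=$(ldd "$1" | cxx_runtime)
  b=$(ldd "$2" | cxx_runtime)
  if [ "$a" = "$b" ]; then
    echo "same C++ runtime: ${a:-none found}"
  else
    echo "different C++ runtimes: [$a] vs [$b]" >&2
    return 1
  fi
}

# Only run the comparison when both files are actually present.
if [ -e src/swan.so ] && [ -e "${BOOST_LIBRARY_DIR:-}/libboost_regex-mt.so" ]; then
  compare_cxx_runtime src/swan.so "$BOOST_LIBRARY_DIR/libboost_regex-mt.so"
fi
```

A mismatch reported here suggests rebuilding boost with the compiler that built swan.so.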

R lazy load: "memory not mapped", "segfault" on OSX/Linux?

This is typically caused by a broken or stale installation of "Rcpp" and "RcppArmadillo". Reinstall them from source:

R> install.packages(pkgs=c("Rcpp","RcppArmadillo"),type="source")

R compiling: "not found lgfortran" or "not found lquadmath" on OSX/Linux?

This typically happens when the fortran libraries were linked with a different gcc version.

On Ubuntu:

sudo dpkg --purge gfortran
sudo apt-get install gfortran
sudo apt-get install r-base-dev
sudo ln -s /usr/lib/x86_64-linux-gnu/libgfortran.so.3 /usr/lib/libgfortran.so # only if needed, see the note below

Depending on the Ubuntu version, you might need to create the symlink for libgfortran manually. This is a known Ubuntu issue, widely discussed in this Stack Overflow thread: http://stackoverflow.com/questions/6302209/building-r-package-getting-error-ld-cannot-find-lgfortran

On OSX:

curl -O http://r.research.att.com/libs/gfortran-4.8.2-darwin13.tar.bz2
sudo tar -xvf gfortran-4.8.2-darwin13.tar.bz2 -C / # tar detects the compression type on extraction

See also: http://thecoatlessprofessor.com/programming/rcpp-rcpparmadillo-and-os-x-mavericks-lgfortran-and-lquadmath-error/

General: resolving dependencies to older R<3.1 compatible packages

When installing SWAN on R<3.1, you might get dependency errors like the following:

ERROR: dependencies ‘httr’, ‘curl’, ‘rversions’, ‘git2r’ are not available for package ‘devtools’
ERROR: dependency ‘xml2’ is not available for package ‘rversions’

You might be able to resolve these dependencies by manually downloading and installing older, compatible versions of the packages.
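One possible way to do the manual install is sketched below. CRAN keeps older source releases under src/contrib/Archive, so the sketch builds the archive URL for a package and version. The helper name cran_archive_url is hypothetical, and X.Y.Z is a placeholder: which version is compatible with your R has to be looked up on the package's CRAN archive page.

```shell
# Hypothetical helper: print the CRAN archive URL for package $1 at version $2.
cran_archive_url() {
  pkg=$1
  ver=$2
  printf 'https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz\n' "$pkg" "$pkg" "$ver"
}

# Example usage (X.Y.Z is a placeholder, not a recommended version):
# curl -O "$(cran_archive_url httr X.Y.Z)"
# R CMD INSTALL httr_X.Y.Z.tar.gz
```

Install the dependencies in order (for example xml2 before rversions, and rversions before devtools) so that each package finds what it needs.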

General: how to install R dependencies with a non-root R?

To set up a non-root R and its dependencies, set the "R_LIBS_USER" environment variable, e.g. $ export R_LIBS_USER=/path/user/lib . Then follow the same dependency installation process as for a root R, and make sure "/path/user/lib" appears in .libPaths() . Users may refer to the author's development document here
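The setup can be sketched as a small shell helper. The function name setup_r_libs_user and the example directory are assumptions. The directory is created first because R only adds R_LIBS_USER to its library paths when the directory already exists.

```shell
# Hypothetical helper: create a user library directory and point R at it.
setup_r_libs_user() {
  lib_dir=$1
  mkdir -p "$lib_dir" || return 1
  export R_LIBS_USER="$lib_dir"
}

# Example usage (the path is only an example):
# setup_r_libs_user "$HOME/R/library"
# Rscript -e '.libPaths()'   # the new directory should be listed
```

To make the setting permanent, add the export line to your shell startup file.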

Usage

1. What is a gap definition file and what is chromosome index format?

A gap definition file is a BED-formatted file describing the big gaps (runs of consecutive Ns) in a specific genome. The file must be supplied to scan.R and must match the reference sequence used to produce your input BAM file. SWAN needs this file to skip the gap regions during scanning and to keep its estimates of global parameters meaningful.

The chromosome index format simply refers to the different conventions for naming human chromosomes in a reference. UCSC uses the "chr"+seq_index format, while the 1000 Genomes Project uses just seq_index. Even though most studies now use hg19/v37 with the same coordinates, their index formats may unfortunately differ. The safe way is to check beforehand with "samtools view -H your.chr.bam" and look at the SQ field there.
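The header check can be automated with a short filter, sketched here. The function name detect_chr_format is hypothetical. It reads SAM header text on standard input and looks at the SN: tag of the first @SQ line.

```shell
# Hypothetical helper: report the chromosome naming style of a SAM header on stdin.
# Prints "chr" (UCSC style), "none" (1000 Genomes style) or "unknown" (no @SQ line).
detect_chr_format() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          print (($i ~ /^SN:chr/) ? "chr" : "none")
          found = 1
          exit
        }
    }
    END { if (!found) print "unknown" }'
}

# Example usage:
# samtools view -H your.chr.bam | detect_chr_format
```

Only the first @SQ line is inspected, on the assumption that a reference does not mix both naming styles.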

You will need the reference genome FASTA file and the gap definition file in the matching index format on hand before running SWAN. Most people already have the reference FASTA file. We provide two gap definition files for the hg19 genome along with the SWAN source code. You can find them in the data subdir after you unzip the SWAN package locally.

For the index format without the "chr" prefix, use data/human_g1k_v37.fasta.gap.txt.gz

For the index format with the "chr" prefix, use data/human_g1k_v37.ucsc.fasta.gap.txt.gz

Here is an easily adaptable bash snippet for pairing up your reference and gap definition files.

#!/usr/bin/env bash
# Pick the reference FASTA and the matching gap definition file.
echo "usage: run_scan.sh sample_dir chr_format"
echo "generating pbs for real data run in sample_dir{$1}/*.chr*.bam"
echo "chromosome reference is formatted as chr_format{$2}+seq_index"
chr_format=$2
if [ -z "$chr_format" ]; then # no prefix (formatted as X), use the 1kg files
  ref_file="/home/lixia/work/hg/hg19/human_g1k_v37.fasta"
  gap_file="/home/lixia/work/hg/hg19/human_g1k_v37.fasta.gap.txt.gz"
else # "chr" prefix (formatted as chrX), use the UCSC-style files
  ref_file="/home/lixia/work/hg/hg19/human_g1k_v37.ucsc.fasta"
  gap_file="/home/lixia/work/hg/hg19/human_g1k_v37.ucsc.fasta.gap.txt.gz"
fi

If you need to generate your own gap definition file for the human genome, one way is to go to the UCSC genome browser table page (http://genome.ucsc.edu/cgi-bin/hgTables?command=start), pick the version of the genome you are using, pick the group "all tables" and the table "gap", and then click "get the output" to export a .txt.gz file, or set the output format to "BED" and export a BED file. These official gap files are also included in the data folder of the SWAN source code.
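If you exported the full table rather than BED, a sketch like the following can reduce it to three BED columns. It assumes the usual column order of the UCSC gap table dump (bin, chrom, chromStart, chromEnd, ...) and a header line starting with "#". Check both against your own export, because the layout is an assumption here. The function name gap_table_to_bed is hypothetical.

```shell
# Hypothetical helper: convert a UCSC gap table dump (stdin) to 3-column BED (stdout).
# Assumes columns: bin, chrom, chromStart, chromEnd, ... and "#" header lines.
gap_table_to_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" } !/^#/ && NF >= 4 { print $2, $3, $4 }'
}

# Example usage (file names are placeholders):
# gunzip -c gap.txt.gz | gap_table_to_bed > gap.bed
```

The chromosome names are passed through unchanged, so the result keeps whatever index format the export used.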

2. What does "cannot allocate vector of size" mean?

This means you have reached the memory limit of your machine while running SWAN. For now, memory has to be tuned by trial and error. The best approach is first to increase the memory allocated to SWAN as far as you can, since this preserves SWAN's speed. After that, the important option to change is the trunk size argument, which defaults to 5M bp. Reduce it (e.g. -t 1000000) until the error no longer occurs. We have run the default successfully with 16G of pmem and 32G of vmem on Ubuntu clusters for all human chromosome sizes at about 50x coverage, but your mileage may vary. Another option, -j, may also help.
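The trial-and-error search over the trunk size can be scripted. The sketch below is generic: shrink_trunk_until_ok is a hypothetical helper that calls a command you supply with the trunk size as its only argument, and halves the size after each failure. The starting value matches the 5M bp default mentioned above, while the 100000 bp floor is an arbitrary assumption.

```shell
# Hypothetical helper: run "$1 <trunk_size>", halving the size until it succeeds.
# $2 = starting size (default 5000000), $3 = smallest size to try (assumed 100000).
shrink_trunk_until_ok() {
  run=$1
  size=${2:-5000000}
  min=${3:-100000}
  while [ "$size" -ge "$min" ]; do
    if "$run" "$size"; then
      echo "$size"   # the trunk size that worked
      return 0
    fi
    size=$((size / 2))
  done
  return 1
}

# Example usage (your_swan_command is a placeholder for your own SWAN command line):
# run_swan() { your_swan_command -t "$1"; }
# shrink_trunk_until_ok run_swan
```

Halving is only one possible schedule. Smaller steps find a larger working trunk size at the cost of more failed runs.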

3. How to get somatic variants between normal and cancer files?

You can use ibed2vcf.R in combination with bedtools to achieve this. Say you have $tumor_bed and $normal_bed from bed2vcf.R. You can then do:

intersectBed -v -a $tumor_bed -b $normal_bed >$somatic_bed
ibed2vcf.R -q -c 9,10,11 -t $seqname -r 'IMPRECISE/IMPRECISE;SOMATIC&^/SS:&^/3:' -d $somatic_bed,synthetic,no,DNA -y $ref_file $somatic_bed
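To illustrate what the intersectBed -v step keeps, here is a naive awk re-implementation of the same idea: print every tumor interval that overlaps no normal interval on the same chromosome. This is an illustrative stand-in and not bedtools itself. The function name subtract_overlaps is hypothetical, and the quadratic loop is only suitable for small files.

```shell
# Hypothetical helper: print lines of BED file $1 that overlap no interval in BED file $2.
# BED intervals are half-open, so two intervals overlap when start1 < end2 and start2 < end1.
subtract_overlaps() {
  awk -F'\t' '
    FILENAME == ARGV[1] {            # first file read: the normal intervals
      n++; c[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0
      next
    }
    {                                # second file read: the tumor intervals
      hit = 0
      for (i = 1; i <= n; i++)
        if (c[i] == $1 && s[i] < $3 + 0 && e[i] > $2 + 0) { hit = 1; break }
      if (!hit) print
    }' "$2" "$1"
}

# Example usage:
# subtract_overlaps "$tumor_bed" "$normal_bed" > "$somatic_bed"
```

For real data, prefer the intersectBed command shown above, which handles large files efficiently.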

4. scanFa not found?

This means your SWAN installation is broken: the samtools library was not compiled. You need to download and reinstall the swan package. The error typically looks like this:

> rg = .Call("scanFa", "/mnt/genomics/REF/hg19/human_g1k_v37.fasta", PACKAGE="swan")
Error in .Call("scanFa", "/mnt/genomics/REF/hg19/human_g1k_v37.fasta",  :
"scanFa" not available for .Call() for package "swan"

5. Getting all-zero statistics from swan_stat although the BAM file is good?

e.g.

rl  cvg     p_left  p_right q_right q_left  nreads  is      sdR     sdL     lCd     lCi     lDl     lDr     lSl     lSr
0   0       0       0       0       0       1393762 0       0       0       0       0       TRUE    TRUE    TRUE    TRUE

Most likely you are using samtools <0.19. SWAN relies on the samtools subsampling feature, which is buggy in samtools <0.19 and leads to this error. The solution is to upgrade your samtools.
