Manual

$SWAN_BIN

The environment variable $SWAN_BIN must point to the directory that contains the SWAN binaries. Either add $SWAN_BIN to $PATH, or call each binary with the $SWAN_BIN prefix: $SWAN_BIN/binary.
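As an illustration of the prefixed form, the following minimal sketch resolves a binary name against $SWAN_BIN. The helper name `swan_binary` and the install location `/opt/swan/bin` are assumptions made for this example only, not part of SWAN.

```python
import os
from pathlib import PurePosixPath


def swan_binary(name, env=None):
    """Return the path of a SWAN binary located under $SWAN_BIN.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to try without changing the real environment.
    """
    env = os.environ if env is None else env
    swan_bin = env.get("SWAN_BIN")
    if not swan_bin:
        raise RuntimeError("SWAN_BIN is not set")
    return str(PurePosixPath(swan_bin) / name)


# /opt/swan/bin is a placeholder install location, not a SWAN default.
print(swan_binary("swan_stat", {"SWAN_BIN": "/opt/swan/bin"}))
```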

swan_stat

The inst/swan_stat.R script reports library statistics for the user and for downstream SWAN analysis: coverage, insert-size mean and standard deviation, global hanging-read rate, and clipping-read rate. The input is one or more BAM files (spX.bam). The script is multi-library aware: a BAM that contains several libraries has to be split into library-wise BAMs, which are passed as comma-separated filenames, such as "spX.lib1.bam,spX.lib2.bam,...". The output is a summary statistics table with headers and a histogram plot of the insert-size distribution with fitted curves.
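The comma-separated input convention can be sketched as follows. The function `swan_stat_command` is a hypothetical helper; it only assembles an argument list using the `-o` and `-y` options from the help text and does not run anything.

```python
def swan_stat_command(lib_bams, swan_bin, oprefix=None, yield_reads=None):
    """Build the argument list for a swan_stat call.

    lib_bams    -- library-wise BAM filenames belonging to one sample
    swan_bin    -- directory holding the SWAN binaries (value of $SWAN_BIN)
    oprefix     -- optional value for -o (bam-wise stat output prefix)
    yield_reads -- optional value for -y (number of reads to sample)
    """
    if not lib_bams:
        raise ValueError("at least one BAM file is required")
    cmd = [f"{swan_bin}/swan_stat"]
    if oprefix is not None:
        cmd += ["-o", oprefix]
    if yield_reads is not None:
        cmd += ["-y", str(yield_reads)]
    # All libraries of the sample travel as ONE comma-separated argument.
    cmd.append(",".join(lib_bams))
    return cmd


# /opt/swan/bin is a placeholder for the real $SWAN_BIN directory.
print(swan_stat_command(["spX.lib1.bam", "spX.lib2.bam"], "/opt/swan/bin", oprefix="spX"))
```

The resulting list could be handed to `subprocess.run` once the binaries are installed.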

Usage: $SWAN_BIN/swan_stat [options] bamfile


Options:
    -x XMAX, --xmax=XMAX
            x limit on histogram, [default 2000]

    -y YIELD, --yield=YIELD
            number of reads to be sampled for stats [default 1000000]

    -c CHRNAME, --chrname=CHRNAME
            use all if bam have all chromosomes or use the chr name if only have one, or chr name separated by ',' if multiple [default]

    -o OPREFIX, --oprefix=OPREFIX
            bam-wise stat output prefixes [default none]

    -m MPREFIX, --mprefix=MPREFIX
            merged stat output prefixes [default none]

    -s STEP, --step=STEP
            bin step size on histogram [default 10]

    -q, --noQuiet
            show verbose, [default FALSE]

    -a, --debug
            save debug, [default FALSE]

    -h, --help
            Show this help message and exit
Output:
  • spX.stat:

    rl cvg p_left p_right q_right q_left nreads is sdR sdL lCd lCi lDl lDr lSl lSr
    100 20 0.000056 0.000068 0.011 0.011 1122801 300 30 30 TRUE TRUE TRUE TRUE FALSE FALSE

    rl is the read length; cvg is the mean coverage; p_left/p_right is the hanging rate for read1 or read2; q_left/q_right is the upstream or downstream soft-clipping rate; nreads is the total number of mapped reads; is is the mean insert size; sdR/sdL is the right or left fit of the insert-size distribution; lCd/lCi/lDl/lDr/lSl/lSr are preliminary quality assessments of the LCd, LCi, LD and LU tracks and of the left/right soft-clipping clusters, reported as guidance for the user.

  • spX.hist.pdf:

    [Figure: histogram of the insert-size distribution with fitted curves]
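To make the spX.stat layout concrete, here is a minimal parsing sketch. It assumes the file consists of one whitespace-delimited header line followed by one value line, as in the example above; the function names are hypothetical and not part of SWAN.

```python
SAMPLE_STAT = """\
rl cvg p_left p_right q_right q_left nreads is sdR sdL lCd lCi lDl lDr lSl lSr
100 20 0.000056 0.000068 0.011 0.011 1122801 300 30 30 TRUE TRUE TRUE TRUE FALSE FALSE
"""


def convert(token):
    """Turn one table field into bool, None, int, float or str."""
    if token in ("TRUE", "FALSE"):
        return token == "TRUE"
    if token == "NA":
        return None
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_stat(text):
    """Parse a header line plus one value line into a dict."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and a value line")
    header, values = lines[0].split(), lines[1].split()
    if len(header) != len(values):
        raise ValueError("header and value counts differ")
    return {name: convert(value) for name, value in zip(header, values)}


stat = parse_stat(SAMPLE_STAT)
# Names of the quality flags that are switched on for this library.
flags = ("lCd", "lCi", "lDl", "lDr", "lSl", "lSr")
print([name for name in flags if stat[name]])
```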

swan_scan

The inst/swan_scan.R script performs the genome-wide likelihood scan. The input is the reference file "hg19.fasta" and one or more BAM files: "spX.bam". The script is multi-library aware: a BAM that contains several libraries has to be split into library-wise BAMs, which are passed as comma-separated filenames, such as "spX.lib1.bam,spX.lib2.bam,...". The output consists of spX.swan.par.txt, a summary table (with headers) of the parameters actually used; spX.swan.txt.gz, a gzipped text file of likelihood scores per window; and spX.bigd.txt, spX.disc.txt and spX.anch.txt, which list clustered read pairs with mapping abnormalities.

Usage: swan_scan [options] ref_file sp.rg1.bam,sp.rg2.bam,sp.rg3.bam,...


Options:
    -c CHROMOSOMENAME, --chromosomeName=CHROMOSOMENAME
            chromosome to scan [default 11]

    -u SCANSTART, --scanStart=SCANSTART
            1-indexed scan start, [default 1]

    -v SCANEND, --scanEnd=SCANEND
            1-indexed scan end, [default 300000000]

    -r MIXINGRATE, --mixingRate=MIXINGRATE
            mixing rates, [default 0.5]

    -w WINDOWWIDTH, --windowWidth=WINDOWWIDTH
            scan window width, must be an integer >0 [default 100]

    -g LWWINDOWWIDTH, --lwWindowWidth=LWWINDOWWIDTH
            Lw scan window width, must be an integer >0 [default 1000]

    -s STEPSIZE, --stepSize=STEPSIZE
            scan window step size for the scan [default 10]

    -n GAP, --gap=GAP
            gap/N locations of hg19 in ucsc format [default ]

    -k, --stat
            provide precomputed stat file to disable tracks, [default FALSE]

    -x PROPCLIP, --propClip=PROPCLIP
            required aligned length soft isize; e.g. 0=> use all [default learn, .5xRL]

    -y HANGCLIP, --hangClip=HANGCLIP
            required aligned length soft hang; e.g. RL=> use all [default learn, RL-5]

    -b COVERAGEMEAN, --coverageMean=COVERAGEMEAN
            coverage mean, [default learn]

    -l READLENGTH, --readLength=READLENGTH
            read length, [default learn]

    -i INSERTSIZE, --insertSize=INSERTSIZE
            biological insert size mean,sdR,sdL [default learn]

    -m MARGINDELTA, --marginDelta=MARGINDELTA
            margin/delta size [default learn, IS+6*ISSD]

    -e BIGDEL, --bigDel=BIGDEL
            big deletion size [default learn, IS+3*ISSD]

    -p PROBHANG, --probHang=PROBHANG
            global probability of seeing a hanging read [default learn]

    -d PROBSOFT, --probSoft=PROBSOFT
            global probability of seeing a soft-clipped read [default learn]

    -t OTHEROPT, --otherOpt=OTHEROPT
            other options [default smallDel=20,smallIns=20,maxInsert=learn,multiCore=1]

    -z TRUNKSIZE, --trunkSize=TRUNKSIZE
            trunk size for processing scanning bamfile, for 50x within 8G mem use, must be multiples of stepsize -s and blocksize -k [default 1000000]

    -o SPOUT, --spout=SPOUT
            sample output prefix, [default input]

    -f FASTSAVE, --fastSave=FASTSAVE
            compute fast, can use normal, fast or super [default normal]

    -j, --memSave
            save memory, [default FALSE]

    -q, --noQuiet
            show verbose, [default FALSE]

    -a, --debug
            save debug, [default FALSE]

    -h, --help
            Show this help message and exit
Output:
  • spX.swan.par.txt
delta     hang_clip       prop_clip       rl      coverage        isize   isize_sdR       isize_sdL       smallDel        smallIns        bigDel  maxInsert       p_left  p_right q_left  q_right start   end     chr     w       lw_width        r       lambda  r_start r_end   success trunk_size      block_size      n_wins  speed_factor    stepsize        fy_cap  lCd     lW      lCi     lDl     lDr     lSl     lSr
1000      80      50      100     20      300     30      30      20      20      1200    2200    0.0011  0.0012  0.00026 0.0003  1       3799223 2       100     1000    0.5     0.2     16      3799215 TRUE    1000000 1000000 379923  0       10      20      TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE

delta is the size of the vicinity used; hang_clip is the 1-percentage_aligned threshold for considering a read clipped (currently inactive); prop_clip is the 1-percentage_aligned threshold for considering a read to have a usable insert size (currently inactive); rl is the read length; coverage is the mean coverage; isize is the mean insert size; isize_sdR/isize_sdL is the right or left fit of the insert-size distribution; smallIns/smallDel is the minimum indel size to look for within the CIGAR string; bigDel is the minimum insert size to look for large deletions; maxInsert is the maximum MPR insert size allowed in the LCd scan; q_left/q_right is the upstream or downstream soft-clipping rate; chr, start and end are the coordinates of the scan range; w is the scan window size for LC, LU and LD; lw_width is the scan window size for LW; r is the formal fraction; lambda is the square root of read coverage; r_start/r_end is the actual scan start/end, excluding leading and trailing gap regions; success indicates whether the scan succeeded; trunk_size is the size of the trunk read into memory at one time during the scan; block_size is the size of the scan blocks within trunks (currently inactive); n_wins and stepsize are the total number of scan windows and the sliding-window step size; speed_factor speeds up the scan by ignoring reads within the 1 sd (fast) or 2 sd (super) range for LC scores; fy_cap caps the LC score contribution from an individual MPR; lW/lCd/lCi/lDl/lDr/lSl/lSr indicate whether the corresponding LCd, LCi, LD and LU tracks were actually activated in the scan.
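The help text for `-z` states that the trunk size must be a multiple of the step size and of the block size, and the parameter table records all three values. The sketch below checks that constraint on values taken from the example table; `check_trunk_size` is a hypothetical helper, not a SWAN function.

```python
def check_trunk_size(trunk_size, stepsize, block_size):
    """Return a list of problems; an empty list means the sizes are consistent."""
    problems = []
    if trunk_size <= 0 or stepsize <= 0 or block_size <= 0:
        problems.append("all sizes must be positive integers")
        return problems
    if trunk_size % stepsize != 0:
        problems.append(f"trunk_size {trunk_size} is not a multiple of stepsize {stepsize}")
    if trunk_size % block_size != 0:
        problems.append(f"trunk_size {trunk_size} is not a multiple of block_size {block_size}")
    return problems


# trunk_size, stepsize and block_size as listed in the example par table.
print(check_trunk_size(trunk_size=1000000, stepsize=10, block_size=1000000))
```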

  • spX.swan.txt.gz
start     lW      lCd     lCi     lDr     lDl     lSr     lSl     cvg     cCd     cCi     cDr     cDl     ins     del     HAF     HAR
49841     -39.2323        -26.5053        -34.2676        0       0       0       0       19      40      54              0       NA      NA      0       0
49851     -40.6186        -27.8215        -35.653 0       0       0       0       20      42      56      0               NA      NA      0       0

start is the start of the current window; lW/lCd/lCi/lDl/lDr/lSl/lSr are the raw score tracks; cvg is the window-wise coverage; cCd/cCi/cDl/cDr is the window-wise number of MPRs contributing to the corresponding score; ins/del is the window-wise pile-up of CIGAR I/D operations; HAF/HAR is the window-wise pile-up of read1 and read2 hanging reads.
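A minimal sketch for reading the per-window scores follows. It assumes that spX.swan.txt.gz is tab-delimited with a header row and that empty or NA fields mean "no value"; the delimiter is an assumption, because the column separators are not recoverable from the listing above. The function names are hypothetical.

```python
import csv
import gzip


def to_number(token):
    """Convert one field; missing, empty and NA fields become None."""
    if token is None or token in ("", "NA"):
        return None
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_swan_windows(lines):
    """Yield one dict per scan window from an iterable of text lines."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # DictReader stores surplus fields under the key None; skip them.
        yield {key: to_number(value) for key, value in row.items() if key is not None}


def read_swan_gz(path):
    """Stream windows from a gzipped score file."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from parse_swan_windows(handle)


# A reduced, hand-made table using a few of the documented column names.
demo = ["start\tlW\tlCd\tcvg\tins\n", "49841\t-39.2323\t-26.5053\t19\tNA\n"]
for window in parse_swan_windows(demo):
    print(window["start"], window["lCd"], window["ins"])
```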

  • spX.{bigd,disc}.txt
617483    617588  619613  619729  6
1120235   1120455 1143327 1143427 4

The first and second columns give the upstream confidence interval of the breakpoint; the third and fourth columns give the downstream confidence interval of the breakpoint; the fifth column is the number of MPRs supporting the bigd/disc cluster.
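The five-column layout can be read as in the sketch below, which uses the two example rows above. The `Cluster` type and both functions are hypothetical, and the support filter is only an illustration of thresholding on the fifth column, not a description of how SWAN applies its own options.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    up_start: int    # upstream confidence interval, start
    up_end: int      # upstream confidence interval, end
    down_start: int  # downstream confidence interval, start
    down_end: int    # downstream confidence interval, end
    support: int     # number of supporting MPRs


def parse_clusters(lines):
    """Parse whitespace-delimited bigd/disc rows into Cluster records."""
    clusters = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        clusters.append(Cluster(*(int(field) for field in fields)))
    return clusters


def well_supported(clusters, min_support):
    """Keep clusters backed by at least min_support MPRs."""
    return [cluster for cluster in clusters if cluster.support >= min_support]


rows = ["617483    617588  619613  619729  6", "1120235   1120455 1143327 1143427 4"]
for cluster in well_supported(parse_clusters(rows), min_support=5):
    print(cluster)
```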

sclip_scan

The inst/sclip_scan.R script performs the genome-wide soft-clip scan. The input is the reference file "hg19.fasta" and one or more BAM files: "spX.bam". The script is multi-library aware: a BAM that contains several libraries has to be split into library-wise BAMs, which are passed as comma-separated filenames, such as "spX.lib1.bam,spX.lib2.bam,...". The output is an RData file, spX.sclip.RData, which stores results for downstream use by swan_join.R (not human-readable), plus optionally spX.sclip.vcf, which contains the standalone sclip_scan.R results in VCF format.

Usage: $SWAN_BIN/sclip_scan [options] ref_file [spY.rg1.bam,spY.rg2.bam]:spX.rg1.bam,spX.rg2.bam
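The BAM argument in this usage line combines an optional contrast sample (spY) and the sample of interest (spX), separated by a colon, with the library-wise BAMs of each sample separated by commas. The sketch below splits such an argument; `parse_bam_spec` is a hypothetical helper, and treating a missing colon as "no contrast sample" is an assumption based on the bracket notation.

```python
def parse_bam_spec(spec):
    """Split '[contrast bams:]sample bams' into two lists of filenames."""
    parts = spec.split(":")
    if len(parts) == 1:
        contrast, sample = "", parts[0]
    elif len(parts) == 2:
        contrast, sample = parts
    else:
        raise ValueError("at most one ':' separator is expected")

    def split_bams(text):
        return [name for name in text.split(",") if name]

    sample_bams = split_bams(sample)
    if not sample_bams:
        raise ValueError("no sample BAM files given")
    return split_bams(contrast), sample_bams


print(parse_bam_spec("spY.rg1.bam,spY.rg2.bam:spX.rg1.bam,spX.rg2.bam"))
print(parse_bam_spec("spX.rg1.bam,spX.rg2.bam"))
```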


Options:
  -c CHROMOSOMENAME, --chromosomeName=CHROMOSOMENAME
          chromosome to scan [default 11]

  -n GAPFILE, --gapfile=GAPFILE
          gap/N locations of hg19 in ucsc format [default none]

  -i MINREADPERCLUSTER, --minReadPerCluster=MINREADPERCLUSTER
          minimal number of reads per cluster, [default 3,5]

  -j MINBASEPERCLUSTER, --minBasePerCluster=MINBASEPERCLUSTER
          minimal number of total bases per cluster, [default 30,30]

  -u SCANSTART, --scanStart=SCANSTART
          1-indexed scan start, [default 1]

  -v SCANEND, --scanEnd=SCANEND
          1-indexed scan end, [default 300000000]

  -z TRUNKSIZE, --trunkSize=TRUNKSIZE
          trunk size for scanning bamfile [default 1000000]

  -d CONTDIR, --contdir=CONTDIR
          contrast directory, [default none]

  -r SAMPLE, --sample=SAMPLE
          manual override of spX information [default spX,INFO,MIX,DESCRIPTION]

  -s STAT, --stat=STAT
          .par file, necessary if contrast bamfile given [default none]

  -t CONSTAT, --constat=CONSTAT
          contrast .par file, necessary if contrast bamfile given [default none]

  -e DELTHRESH, --delthresh=DELTHRESH
          foldchange threshold for deletion events, [default 0.8]

  -k DUPTHRESH, --dupthresh=DUPTHRESH
          foldchange threshold for duplication events, [default 1.2]

  -m MAXFC, --maxfc=MAXFC
          maximum region size for fold change check (due to memory considerations), [default 20000000]

  -b MINGAPPAIR, --minGapPair=MINGAPPAIR
          A breakpoint and its mate must be separated by at least this value, [default 25]

  -f MINFC, --minfc=MINFC
          minimum region size for fold change check for del/dup calls, [default 10000]

  -y INSPARAM, --insparam=INSPARAM
          parameters for calling insertions, 0 means to estimate from data [default 0:0]

  -x HOTSPOT, --hotspot=HOTSPOT
          setting for hotspot filtering, [default 10000:3]

  -g GAPDIST, --gapdist=GAPDIST
          setting for gaps (centromere or telomere) filtering, [default 1e+06]

  -o SPOUT, --spout=SPOUT
          sample output prefix, [default input]

  -p PLOT, --plot=PLOT
          file for diagnostic plots, [default sclip_events.pdf]

  --vcf
          output VCF file, [default FALSE]

  --nobam
          Use bam file for calling, [default FALSE]

  -a, --debug
          save debug, [default FALSE]

  -q, --noQuiet
          show verbose, [default FALSE]

  -h, --help
          Show this help message and exit

swan_join

The inst/swan_join.R script joins the multiple lines of evidence. The input is the reference file "hg19.fasta", one or more BAM files ("spX.bam"), and any combination of the files generated by swan_scan.R, sclip_scan.R and seqcbs_scan.R (see usage). The script is multi-library aware: a BAM that contains several libraries has to be split into library-wise BAMs, which are passed as comma-separated filenames, such as "spX.lib1.bam,spX.lib2.bam,...". The output is the BED files spX.{raw,conf}.bed plus optionally spX.{raw,conf}.vcf.

Usage: $SWAN_BIN/swan_join [options] refFile [spY.rg1.bam,spY.rg2.bam,...:]spX.rg1.bam,spX.rg2.bam,...


Options:
  -c CHRNAME, --chrname=CHRNAME
          chromosome name, [default: 22]


  -t STAT, --stat=STAT
          stat inputs: [spY.stat:]spX.stat;
    [spY.stat:]spX.stat implicitly assumed


  -i SWAN, --swan=SWAN
          swan inputs: [spY.swan.txt.gz:]spX.swan.txt.gz;
    [spY.swan.par.txt:]spX.swan.par.txt implicitly assumed


  -j BIGD, --bigd=BIGD
          big deletion inputs: [spY.bigd.txt:]spX.bigd.txt;
    [spY.swan.par.txt:]spX.swan.par.txt implicitly assumed


  -k SEQCBS, --seqcbs=SEQCBS
          seqcbs inputs: spX.seqcbs.txt;
    spX.seqcbs.par.txt implicitly assumed


  -l SCLIP, --sclip=SCLIP
          sclip inputs: spX.sclip.Rdata;
    spX.sclip.par.txt implicitly assumed


  -m DISC, --disc=DISC
          discordant cluster inputs: [spY.disc.txt:]spX.disc.txt;
    [spY.swan.par.txt:]spX.swan.par.txt implicitly assumed


  -u SWAN_OPT, --swan_opt=SWAN_OPT
          swan options: [spY_opt:]track=t1_key1=value1_key2=value2,track=t2_...,
 default1: track=lCd,method=empr,thresh=9,sup=100,gap=100_track=lDr+lDl,method=theo,thresh=level3,sup=100,gap=100_track=ins,sup=50,cvg=5_track=del,sup=50,cvg=5
 default2: track=lCd,method=empr,thresh=8,sup=50,gap=100_track=lDr+lDl,method=theo,thresh=level2,sup=50,gap=100_track=ins,sup=20,cvg=2_track=del,sup=20,cvg=2:track=lCd,method=empr,thresh=9,sup=100,gap=100_track=lDr+lDl,method=theo,thresh=level3,sup=100,gap=100_track=ins,sup=50,cvg=5_track=del,sup=50,cvg=5


  -v BIGD_OPT, --bigd_opt=BIGD_OPT
          swan big deletion options: [spY_opt:]key1=value1,key2=value2,...,
 default1: minmpr=5,maxins=50000
 default2: minmpr=2,maxins=100000:minmpr=5,maxins=50000


  -w SEQCBS_OPT, --seqcbs_opt=SEQCBS_OPT
          seqcbs options: key1=value1,key2=value2,..., default: minstat=0,sup=1500,gap=1000,expand=2000,good=4


  -x SCLIP_OPT, --sclip_opt=SCLIP_OPT
          sclip inputs: key1=value1,key2=value2,..., default:


  -y DISC_OPT, --disc_opt=DISC_OPT
          swan discordant clusters options: [spY_opt:]key1=value1,key2=value2,...,
 default1: minmpr=5,maxins=10000
 default2: minmpr=2,maxins=20000:minmpr=5,maxins=10000


  -d OVERRIDE, --override=OVERRIDE
          bed formatted with colnames, parameter overriding files for swan calling:
    [spY.swan.ovrd.txt:]spX.swan.ovrd.txt


  -f, --fineconf
          fine conf mode and .bam is assumed for all inputs, see manual [default FALSE]

  -o OUTPREFIX, --outprefix=OUTPREFIX
          prefix for output file [default input]

  -p SAMPLE, --sample=SAMPLE
          manual override of spX information [default spX,INFO,MIX,DESCRIPTION]

  -q, --noQuiet
          verbose mode and additional information outputs [default FALSE]

  -r CONFIRM, --confirm=CONFIRM
          use which confirmation? [default dedup]

  -s SAVEVCF, --savevcf=SAVEVCF
          whether to savevcf file (slower) and parameters, e.g.
                                  species=human_sapien:other_opt=other_value
                                  [default ]

  -a, --debug
          debug mode and additional .RData is assumed for all inputs, see manual [default FALSE]

  -h, --help
          Show this help message and exit
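The `--swan_opt` defaults listed above pack several per-track settings into one string. The sketch below unpacks such a string, following the separators used in the defaults: ':' between the spY and spX option sets, '_' between tracks, and ',' between key=value pairs. Note that the format line in the help text shows the separators differently, so this reading is an assumption; `parse_track_options` is a hypothetical helper.

```python
def parse_track_options(option_string):
    """Return one list of per-track dicts for each ':'-separated option set."""
    option_sets = []
    for sample_opt in option_string.split(":"):
        tracks = []
        for track_opt in sample_opt.split("_"):
            settings = {}
            for item in track_opt.split(","):
                key, sep, value = item.partition("=")
                if not sep or not key:
                    raise ValueError(f"expected key=value, got {item!r}")
                settings[key] = value  # values are kept as strings
            tracks.append(settings)
        option_sets.append(tracks)
    return option_sets


# The single-sample default (default1) from the help text.
DEFAULT1 = (
    "track=lCd,method=empr,thresh=9,sup=100,gap=100"
    "_track=lDr+lDl,method=theo,thresh=level3,sup=100,gap=100"
    "_track=ins,sup=50,cvg=5"
    "_track=del,sup=50,cvg=5"
)

for track in parse_track_options(DEFAULT1)[0]:
    print(track["track"], track)
```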

See also Example. Have fun!
