Manual
$SWAN_BIN
The environment variable $SWAN_BIN must point to the directory that contains the SWAN binaries. Either add $SWAN_BIN to $PATH, or call each binary with the $SWAN_BIN prefix: $SWAN_BIN/binary.
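When SWAN is driven from a script, the prefix rule above can be wrapped in a small helper. This is a minimal sketch; the function name `swan_binary` and the path `/opt/swan/bin` are hypothetical and are not part of SWAN.

```python
import os


def swan_binary(name, env=None):
    """Return the path of a SWAN binary located under $SWAN_BIN.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to try out without changing real environment variables.
    """
    env = os.environ if env is None else env
    swan_bin = env.get("SWAN_BIN")
    if not swan_bin:
        raise RuntimeError("SWAN_BIN is not set")
    return os.path.join(swan_bin, name)


# Hypothetical install location, used only for illustration.
print(swan_binary("swan_stat", {"SWAN_BIN": "/opt/swan/bin"}))
```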
swan_stat
The inst/swan_stat.R script reports library statistics such as coverage, insert size mean and standard deviation, global hanging read rate, and clipping read rate, both for the user and for downstream SWAN analysis. The input is one or more BAM files, e.g. spX.bam. The script is multi-library aware: a BAM that contains several libraries has to be split into per-library BAMs and passed as comma-separated filenames, such as "spX.lib1.bam,spX.lib2.bam,...". The output is a summary statistics table with headers, plus a histogram plot of the insert-size distribution with fitted curves.
Usage: $SWAN_BIN/swan_stat [options] bamfile

Options:
    -x XMAX, --xmax=XMAX
        x limit on histogram, [default 2000]
    -y YIELD, --yield=YIELD
        number of reads to be sampled for stats [default 1000000]
    -c CHRNAME, --chrname=CHRNAME
        use all if bam have all chromosomes or use the chr name if only have one, or chr name separated by ',' if multiple [default]
    -o OPREFIX, --oprefix=OPREFIX
        bam-wise stat output prefix [default none]
    -m MPREFIX, --mprefix=MPREFIX
        merged stat output prefix [default none]
    -s STEP, --step=STEP
        bin step size on histogram [default 10]
    -q, --noQuiet
        show verbose, [default FALSE]
    -a, --debug
        save debug, [default FALSE]
    -h, --help
        Show this help message and exit
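The comma-separated multi-library input can be assembled programmatically. The sketch below builds an argument list for swan_stat using only the -c and -o flags from the usage text; the function name `swan_stat_args` and the example paths are hypothetical, and the list is meant to be handed to something like `subprocess.run`.

```python
def swan_stat_args(swan_bin, lib_bams, chrname=None, oprefix=None):
    """Build the argument list for a swan_stat call.

    swan_bin : directory holding the SWAN binaries (the value of $SWAN_BIN)
    lib_bams : per-library BAM filenames, joined with ',' as swan_stat expects
    chrname  : optional value for -c
    oprefix  : optional value for -o
    """
    if not lib_bams:
        raise ValueError("at least one BAM file is required")
    args = [f"{swan_bin}/swan_stat"]
    if chrname is not None:
        args += ["-c", str(chrname)]
    if oprefix is not None:
        args += ["-o", oprefix]
    # Options come first, then the single positional bamfile argument.
    args.append(",".join(lib_bams))
    return args


# Hypothetical paths, for illustration only.
print(swan_stat_args("/opt/swan/bin", ["spX.lib1.bam", "spX.lib2.bam"],
                     chrname=11, oprefix="spX"))
```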
- Output:
spX.stat:
    rl   cvg  p_left    p_right   q_right  q_left  nreads   is   sdR  sdL  lCd   lCi   lDl   lDr   lSl    lSr
    100  20   0.000056  0.000068  0.011    0.011   1122801  300  30   30   TRUE  TRUE  TRUE  TRUE  FALSE  FALSE
rl is the read length; cvg is the mean coverage; p_left/p_right is the hanging read rate for read1 or read2; q_left/q_right is the upstream or downstream soft-clipping rate; nreads is the total number of mapped reads; is is the mean insert size; sdR/sdL is the right or left fit of the insert size distribution; lCd/lCi/lDl/lDr/lSl/lSr are preliminary assessments of the quality of the LCd, LCi, LD, LU tracks and of the left/right soft-clipping clusters, reported for the user's information.
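For scripted checks, the stat table can be read into a dictionary keyed by the column names above. This is a sketch that assumes the file is whitespace-delimited with a single header line, as in the sample; the helper names `convert` and `parse_table` are hypothetical.

```python
def convert(token):
    """Convert one table cell to bool, None, int, or float where possible."""
    if token in ("TRUE", "FALSE"):
        return token == "TRUE"
    if token == "NA":
        return None
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_table(text):
    """Parse a header line plus data lines into a list of dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    records = []
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row has %d fields, header has %d"
                             % (len(row), len(header)))
        records.append(dict(zip(header, map(convert, row))))
    return records


# The sample spX.stat content shown above.
SAMPLE = """rl cvg p_left p_right q_right q_left nreads is sdR sdL lCd lCi lDl lDr lSl lSr
100 20 0.000056 0.000068 0.011 0.011 1122801 300 30 30 TRUE TRUE TRUE TRUE FALSE FALSE"""

stat = parse_table(SAMPLE)[0]
print(stat["is"], stat["sdR"], stat["lSl"])
```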
spX.hist.pdf:
swan_scan
The inst/swan_scan.R script performs the genome-wide likelihood scan. The input is the reference file, e.g. "hg19.fasta", and one or more BAM files, e.g. "spX.bam". The script is multi-library aware: a BAM that contains several libraries has to be split into per-library BAMs and passed as comma-separated filenames, such as "spX.lib1.bam,spX.lib2.bam,...". The output is a summary parameter table, spX.swan.par.txt, with headers and the parameters actually used; spX.swan.txt.gz, a gzipped text file with likelihood scores per window; and spX.bigd.txt, spX.disc.txt and spX.anch.txt for clustered read pairs with mapping abnormalities.
Usage: swan_scan [options] ref_file sp.rg1.bam,sp.rg2.bam,sp.rg3.bam,...

Options:
    -c CHROMOSOMENAME, --chromosomeName=CHROMOSOMENAME
        chromosome to scan [default 11]
    -u SCANSTART, --scanStart=SCANSTART
        1-indexed scan start, [default 1]
    -v SCANEND, --scanEnd=SCANEND
        1-indexed scan end, [default 300000000]
    -r MIXINGRATE, --mixingRate=MIXINGRATE
        mixing rates, [default 0.5]
    -w WINDOWWIDTH, --windowWidth=WINDOWWIDTH
        scan window width, must be an integer >0 [default 100]
    -g LWWINDOWWIDTH, --lwWindowWidth=LWWINDOWWIDTH
        Lw scan window width, must be an integer >0 [default 1000]
    -s STEPSIZE, --stepSize=STEPSIZE
        scan window step size for the scan [default 10]
    -n GAP, --gap=GAP
        gap/N locations of hg19 in ucsc format [default ]
    -k, --stat
        provide precomputed stat file to disable tracks, [default FALSE]
    -x PROPCLIP, --propClip=PROPCLIP
        required aligned length soft isize; e.g. 0=> use all [default learn, .5xRL]
    -y HANGCLIP, --hangClip=HANGCLIP
        required aligned length soft hang; e.g. RL=> use all [default learn, RL-5]
    -b COVERAGEMEAN, --coverageMean=COVERAGEMEAN
        coverage mean, [default learn]
    -l READLENGTH, --readLength=READLENGTH
        read length, [default learn]
    -i INSERTSIZE, --insertSize=INSERTSIZE
        biological insert size mean,sdR,sdL [default learn]
    -m MARGINDELTA, --marginDelta=MARGINDELTA
        margin/delta size [default learn, IS+6*ISSD]
    -e BIGDEL, --bigDel=BIGDEL
        big deletion size [default learn, IS+3*ISSD]
    -p PROBHANG, --probHang=PROBHANG
        global probability seeing hang read [default learn]
    -d PROBSOFT, --probSoft=PROBSOFT
        global probability seeing soft read [default learn]
    -t OTHEROPT, --otherOpt=OTHEROPT
        other options [default smallDel=20,smallIns=20,maxInsert=learn,multiCore=1]
    -z TRUNKSIZE, --trunkSize=TRUNKSIZE
        trunk size for processing scanning bamfile, for 50x within 8G mem use, must be multiples of stepsize -s and blocksize -k [default 1000000]
    -o SPOUT, --spout=SPOUT
        sample output prefix, [default input]
    -f FASTSAVE, --fastSave=FASTSAVE
        compute fast, can use normal, fast or super [default normal]
    -j, --memSave
        save memory, [default FALSE]
    -q, --noQuiet
        show verbose, [default FALSE]
    -a, --debug
        save debug, [default FALSE]
    -h, --help
        Show this help message and exit
- Output:
- spX.swan.par.txt
    delta hang_clip prop_clip rl coverage isize isize_sdR isize_sdL smallDel smallIns bigDel maxInsert p_left p_right q_left q_right start end chr w lw_width r lambda r_start r_end success trunk_size block_size n_wins speed_factor stepsize fy_cap lCd lW lCi lDl lDr lSl lSr
    1000 80 50 100 20 300 30 30 20 20 1200 2200 0.0011 0.0012 0.00026 0.0003 1 3799223 2 100 1000 0.5 0.2 16 3799215 TRUE 1000000 1000000 379923 0 10 20 TRUE TRUE TRUE TRUE TRUE TRUE TRUE

delta is the size of the vicinity used; hang_clip is the 1-percentage_aligned threshold for considering a read clipped (currently inactive); prop_clip is the 1-percentage_aligned threshold for considering a read to have a usable insert size (currently inactive); rl is the read length; coverage is the mean coverage; isize is the mean insert size; isize_sdR/isize_sdL is the right or left fit of the insert size distribution; smallIns/smallDel is the minimum size of indel to look for within the CIGAR string; bigDel is the minimum insert size to look for large deletions; maxInsert is the maximum MPR insert allowed to be used in the LCd scan; q_left/q_right is the upstream or downstream soft-clipping rate; chr, start, end are the coordinates of the scan range; w is the scan window size for LC, LU and LD; lw_width is the scan window size for LW; r is the formal fraction; lambda is the square root of read coverage; r_start/r_end is the actual scan start/end excluding leading and trailing gap regions; success indicates whether the scan was successful; trunk_size is the size of the trunk loaded into memory at one time during the scan; block_size is the size of scan blocks within trunks (currently inactive); n_wins and stepsize are the total number of scan windows and the sliding window step size; speed_factor speeds up the scan by ignoring reads within the 1sd (fast) or 2sd (super) range for LC scores; fy_cap caps the LC score contribution from an individual MPR; lW/lCd/lCi/lDl/lDr/lSl/lSr indicate whether the corresponding LCd, LCi, LD, LU tracks were actually activated in the scan.
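One way to sanity-check a par file is to recompute the number of windows from the scan range and step size. The sketch below assumes windows start every stepsize bases from start through end; that assumption reproduces n_wins in the sample row above (start 1, end 3799223, stepsize 10), but it has not been checked against the SWAN source. The function name `expected_windows` is hypothetical.

```python
def expected_windows(start, end, stepsize):
    """Count window start positions start, start+stepsize, ... up to end.

    Assumption (not confirmed by the manual): one window begins at every
    multiple of `stepsize` counted from `start`, inclusive of `start`.
    """
    if stepsize <= 0:
        raise ValueError("stepsize must be positive")
    if end < start:
        raise ValueError("end must not be smaller than start")
    return (end - start) // stepsize + 1


# Values taken from the sample spX.swan.par.txt row: start, end, stepsize.
print(expected_windows(1, 3799223, 10))  # 379923, the n_wins value in the sample
```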
- spX.swan.txt.gz
    start lW lCd lCi lDr lDl lSr lSl cvg cCd cCi cDr cDl ins del HAF HAR
    49841 -39.2323 -26.5053 -34.2676 0 0 0 0 19 40 54 0 NA NA 0 0
    49851 -40.6186 -27.8215 -35.653 0 0 0 0 20 42 56 0 NA NA 0 0

start is the start of the current window; lW/lCd/lCi/lDl/lDr/lSl/lSr are the raw score tracks; cvg is the window-wise coverage; cCd/cCi/cDl/cDr is the window-wise number of MPRs that contributed to the corresponding score; ins/del is the window-wise pileup of CIGAR I/D operations; HAF/HAR is the window-wise pileup of read1 and read2 hanging reads.
- spX.{bigd,disc}.txt
    617483   617588   619613   619729   6
    1120235  1120455  1143327  1143427  4

The first and second columns give the upstream confidence interval of the breakpoint; the third and fourth columns give the downstream confidence interval of the breakpoint; the fifth column is the number of MPRs supporting the bigd/disc cluster.
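The five-column cluster files are easy to load for custom filtering. The sketch below assumes whitespace-delimited integer columns without a header line, as in the sample rows; the names `Cluster` and `parse_clusters` are hypothetical. The support cut-off of 5 mirrors the minmpr=5 default listed for swan_join, applied here only as an illustration.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    """One bigd/disc cluster: two confidence intervals plus MPR support."""
    up_start: int
    up_end: int
    down_start: int
    down_end: int
    support: int


def parse_clusters(text):
    """Parse five-column cluster lines into Cluster records."""
    clusters = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError("expected 5 columns, got %d" % len(fields))
        clusters.append(Cluster(*map(int, fields)))
    return clusters


# The two sample rows shown above.
SAMPLE = """617483 617588 619613 619729 6
1120235 1120455 1143327 1143427 4"""

clusters = parse_clusters(SAMPLE)
well_supported = [c for c in clusters if c.support >= 5]
print(well_supported)
```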
sclip_scan
The inst/sclip_scan.R script performs the genome-wide soft-clip scan. The input is the reference file, e.g. "hg19.fasta", and one or more BAM files, e.g. "spX.bam". The script is multi-library aware: a BAM that contains several libraries has to be split into per-library BAMs and passed as comma-separated filenames, such as "spX.lib1.bam,spX.lib2.bam,...". The output is an RData file, spX.sclip.RData, that stores results for downstream swan_join.R (not human readable), plus optionally spX.sclip.vcf, which contains the standalone sclip_scan.R results in VCF format.
Usage: $SWAN_BIN/sclip_scan [options] ref_file [spY.rg1.bam,spY.rg2.bam]:spX.rg1.bam,spX.rg2.bam

Options:
    -c CHROMOSOMENAME, --chromosomeName=CHROMOSOMENAME
        chromosome to scan [default 11]
    -n GAPFILE, --gapfile=GAPFILE
        gap/N locations of hg19 in ucsc format [default none]
    -i MINREADPERCLUSTER, --minReadPerCluster=MINREADPERCLUSTER
        minimal number of reads per cluster, [default 3,5]
    -j MINBASEPERCLUSTER, --minBasePerCluster=MINBASEPERCLUSTER
        minimal number of total bases per cluster, [default 30,30]
    -u SCANSTART, --scanStart=SCANSTART
        1-indexed scan start, [default 1]
    -v SCANEND, --scanEnd=SCANEND
        1-indexed scan end, [default 300000000]
    -z TRUNKSIZE, --trunkSize=TRUNKSIZE
        trunk size for scanning bamfile [default 1000000]
    -d CONTDIR, --contdir=CONTDIR
        contrast directory, [default none]
    -r SAMPLE, --sample=SAMPLE
        manual override of spX information [default spX,INFO,MIX,DESCRIPTION]
    -s STAT, --stat=STAT
        .par file, necessary if contrast bamfile given [default none]
    -t CONSTAT, --constat=CONSTAT
        contrast .par file, necessary if contrast bamfile given [default none]
    -e DELTHRESH, --delthresh=DELTHRESH
        foldchange threshold for deletion events, [default 0.8]
    -k DUPTHRESH, --dupthresh=DUPTHRESH
        foldchange threshold for duplication events, [default 1.2]
    -m MAXFC, --maxfc=MAXFC
        maximum region size for fold change check (due to memory considerations), [default 20000000]
    -b MINGAPPAIR, --minGapPair=MINGAPPAIR
        A breakpoint and its mate must be separated by at least this value, [default 25]
    -f MINFC, --minfc=MINFC
        minimum region size for fold change check for del/dup calls, [default 10000]
    -y INSPARAM, --insparam=INSPARAM
        parameters for calling insertions, 0 means to estimate from data [default 0:0]
    -x HOTSPOT, --hotspot=HOTSPOT
        setting for hotspot filtering, [default 10000:3]
    -g GAPDIST, --gapdist=GAPDIST
        setting for gaps (centromere or telomere) filtering, [default 1e+06]
    -o SPOUT, --spout=SPOUT
        sample output prefix, [default input]
    -p PLOT, --plot=PLOT
        file for diagnostic plots, [default sclip_events.pdf]
    --vcf
        output VCF file, [default FALSE]
    --nobam
        Use bam file for calling, [default FALSE]
    -a, --debug
        save debug, [default FALSE]
    -q, --noQuiet
        show verbose, [default FALSE]
    -h, --help
        Show this help message and exit
swan_join
The inst/swan_join.R script joins the multiple lines of evidence. The input is the reference file, e.g. "hg19.fasta", one or more BAM files, e.g. "spX.bam", and any combination of the files generated by swan_scan.R, sclip_scan.R and seqcbs_scan.R (see usage). The script is multi-library aware: a BAM that contains several libraries has to be split into per-library BAMs and passed as comma-separated filenames, such as "spX.lib1.bam,spX.lib2.bam,...". The output is the BED files spX.{raw,conf}.bed, plus optionally spX.{raw,conf}.vcf.
Usage: $SWAN_BIN/swan_join [options] refFile [spY.rg1.bam,spY.rg2.bam,...:]spX.rg1.bam,spX.rg2.bam,...

Options:
    -c CHRNAME, --chrname=CHRNAME
        chromosome name, [default: 22]
    -t STAT, --stat=STAT
        stat inputs: [spY.stat:]spX.stat; [spY.stat:]spX.stat implicitly assumed
    -i SWAN, --swan=SWAN
        swan inputs: [spY.swan.txt.gz:]spX.swan.txt.gz; [spY.swan.par.txt:]spX.swan.par.txt implicitly assumed
    -j BIGD, --bigd=BIGD
        big deletion inputs: [spY.bigd.txt:]spX.bigd.txt; [spY.swan.par.txt:]spX.swan.par.txt implicitly assumed
    -k SEQCBS, --seqcbs=SEQCBS
        seqcbs inputs: spX.seqcbs.txt; spX.seqcbs.par.txt implicitly assumed
    -l SCLIP, --sclip=SCLIP
        sclip inputs: spX.sclip.Rdata; spX.sclip.par.txt implicitly assumed
    -m DISC, --disc=DISC
        discordant cluster inputs: [spY.disc.txt:]spX.disc.txt; [spY.swan.par.txt:]spX.swan.par.txt implicitly assumed
    -u SWAN_OPT, --swan_opt=SWAN_OPT
        swan options: [spY_opt:]track=t1_key1=value1_key2=value2,track=t2_...,
        default1: track=lCd,method=empr,thresh=9,sup=100,gap=100_track=lDr+lDl,method=theo,thresh=level3,sup=100,gap=100_track=ins,sup=50,cvg=5_track=del,sup=50,cvg=5
        default2: track=lCd,method=empr,thresh=8,sup=50,gap=100_track=lDr+lDl,method=theo,thresh=level2,sup=50,gap=100_track=ins,sup=20,cvg=2_track=del,sup=20,cvg=2:track=lCd,method=empr,thresh=9,sup=100,gap=100_track=lDr+lDl,method=theo,thresh=level3,sup=100,gap=100_track=ins,sup=50,cvg=5_track=del,sup=50,cvg=5
    -v BIGD_OPT, --bigd_opt=BIGD_OPT
        swan big deletion options: [spY_opt:]key1=value1,key2=value2,...,
        default1: minmpr=5,maxins=50000
        default2: minmpr=2,maxins=100000:minmpr=5,maxins=50000
    -w SEQCBS_OPT, --seqcbs_opt=SEQCBS_OPT
        seqcbs options: key1=value1,key2=value2,...,
        default: minstat=0,sup=1500,gap=1000,expand=2000,good=4
    -x SCLIP_OPT, --sclip_opt=SCLIP_OPT
        sclip inputs: key1=value1,key2=value2,..., default:
    -y DISC_OPT, --disc_opt=DISC_OPT
        swan discordant clusters options: [spY_opt:]key1=value1,key2=value2,...,
        default1: minmpr=5,maxins=10000
        default2: minmpr=2,maxins=20000:minmpr=5,maxins=10000
    -d OVERRIDE, --override=OVERRIDE
        bed formatted with colnames, parameter overriding files for swan calling: [spY.swan.ovrd.txt:]spX.swan.ovrd.txt
    -f, --fineconf
        fine conf mode and .bam is assumed for all inputs, see manual [default FALSE]
    -o OUTPREFIX, --outprefix=OUTPREFIX
        prefix for output file [default input]
    -p SAMPLE, --sample=SAMPLE
        manual override of spX information [default spX,INFO,MIX,DESCRIPTION]
    -q, --noQuiet
        verbose mode and additional information outputs [default FALSE]
    -r CONFIRM, --confirm=CONFIRM
        use which confirmation? [default dedup]
    -s SAVEVCF, --savevcf=SAVEVCF
        whether to savevcf file (slower) and parameters, e.g. species=human_sapien:other_opt=other_value [default ]
    -a, --debug
        debug mode and additional .RData is assumed for all inputs, see manual [default FALSE]
    -h, --help
        Show this help message and exit

See also Example. Have fun!
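The --swan_opt strings in the usage above are dense, so it can help to split one into its parts before editing it. The sketch below follows the separators used in the listed default values (':' between the optional spY part and the spX part, '_' between tracks, ',' between key=value pairs). Note that the syntax template in the help text places '_' and ',' the other way round, so this reading is an assumption. The function name `parse_track_options` is hypothetical and values are kept as strings.

```python
def parse_track_options(opt):
    """Split a swan_opt-style string into groups of per-track dicts.

    Returns one list per ':'-separated group; each group is a list of
    dicts, one per '_'-separated track, built from ','-separated
    key=value pairs.
    """
    groups = []
    for group in opt.split(":"):
        tracks = []
        for track in group.split("_"):
            pairs = dict(item.split("=", 1) for item in track.split(","))
            tracks.append(pairs)
        groups.append(tracks)
    return groups


# "default1" for --swan_opt, copied from the usage text.
DEFAULT1 = ("track=lCd,method=empr,thresh=9,sup=100,gap=100"
            "_track=lDr+lDl,method=theo,thresh=level3,sup=100,gap=100"
            "_track=ins,sup=50,cvg=5"
            "_track=del,sup=50,cvg=5")

for track in parse_track_options(DEFAULT1)[0]:
    print(track)
```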