
Sample Config file for Loading Haplotypes.

To load haplotypes, you need a config file. The sample below has most of the required settings filled in. Replace every value marked UNASSIGNED with a correct value. Values marked OPTIONAL are needed only for specific steps; remove those lines if your run does not use them.

#!

###Example config file. 
### Anything marked with UNASSIGNED needs to be set for at least one of the steps
### If it is marked as OPTIONAL, it will only need to be set if you want to run specific steps. 
host=localHost
user=sqlite
password=sqlite
DB=/phg/smallSeqDB.db
DBtype=sqlite

#System parameters. Xmx is the Java heap size; numThreads sets the number of threads available to multithreaded components.
Xmx=10G
numThreads=10

liquibaseOutdir=/phg/outputDir

anchors=***UNASSIGNED***
genomeData=***UNASSIGNED***

referenceFasta=***UNASSIGNED***

asmMethodName=mummer4
asmKeyFile=***OPTIONAL***

wgsMethodName=GATK_PIPELINE
wgsKeyFile=***OPTIONAL***

consensusMethodName=CONSENSUS
inputConsensusMethods=GATK_PIPELINE

fastqFileDir=/phg/inputDir/loadDB/fastq/
dedupedBamDir=/phg/inputDir/loadDB/bam/dedup/

gvcfFileDir=/phg/inputDir/loadDB/gvcf/
filteredBamDir=/phg/inputDir/loadDB/bam/mapqFiltered/

# BAM and GVCF uploading parameters
mapQ=48
refRangeMethods=FocusRegion,FocusComplement
extendedWindowSize=1000

# WGS Haplotype Filtering criteria.  These are the defaults.
GQ_min=50
QUAL_min=200
DP_poisson_min=.01
DP_poisson_max=.99
filterHets=true

## If you have a Sentieon license, you can set the server location here (and remove the #). If it is set, Sentieon will be used instead of GATK.
#sentieon_license= ***OPTIONAL***


##Consensus Plugin Parameters
minFreq=0.5
maxClusters=30
minSite=30
minCoverage=0.1
maxThreads=10
minTaxa=1
mxDiv=0.01

#This sets the type of clustering mode.
#Valid params are: upgma, upgma_assembly, and kmer_assembly
#The two assembly modes are designed for assembly haplotypes and choose a representative haplotype as the consensus instead of attempting to merge calls, as upgma does.
clusteringMode=upgma

#If you want to use an assembly clusteringMode, you must have a ranking file.
#The ranking file must be a tab-separated list of taxon\trankingScore, where higher numbers indicate a better rank. This file is used to choose the representative haplotype.
rankingFile=***OPTIONAL***

##Optional. Used only if clusteringMode is kmer_assembly; otherwise ignored.
kmerSize=7
distanceCalculation=Euclidean
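
Because the sample ships with placeholder values, it can help to check an edited copy for any that remain before starting a run. The following is a minimal Python sketch of such a check and is not part of PHG. The function name `find_placeholders` and the `referenceFasta` path in the sample text are hypothetical, and the sketch assumes the file keeps the plain `key=value` layout shown above, with `#` starting a comment line.

```python
import re

# Matches the two placeholder markers used in the sample config.
PLACEHOLDER = re.compile(r"\*\*\*(UNASSIGNED|OPTIONAL)\*\*\*")


def find_placeholders(config_text):
    """Return {key: marker} for each key=value line whose value is still a placeholder.

    Blank lines, comment lines (starting with '#'), and lines without '=' are skipped.
    """
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        match = PLACEHOLDER.search(value)
        if match:
            found[key.strip()] = match.group(1)
    return found


# Small illustrative input; the referenceFasta path is made up for this example.
sample = """\
anchors=***UNASSIGNED***
referenceFasta=/phg/ref.fa
asmKeyFile=***OPTIONAL***
#sentieon_license= ***OPTIONAL***
"""

result = find_placeholders(sample)
for key, marker in result.items():
    print(f"{key}: {marker}")
```

Keys reported as UNASSIGNED still need a real value. Keys reported as OPTIONAL should either be set or have their line removed, depending on which steps you plan to run. Commented-out lines such as `#sentieon_license` are skipped.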
