opticall committed 6d0772f Draft

Edited online

Files changed (1)

-==== REQUIREMENTS ====
+You can find the new (& user friendly!) docs here:
 
-opticall includes all the libraries it needs and requires nothing beyond the C++ standard library. It compiles with g++ on Unix/Mac-based systems.
-
-==== INSTALLATION ====
-
-You can download the latest release of opticall from:
-
-https://bitbucket.org/tss101/opticall/downloads
-
-Download the most recent compressed file in the format you're most comfortable with. For a .gz download, uncompress the file by running:
-
-tar xvzf <filename>
-
-which creates a folder of the same name as filename. cd into the opticall folder:
-
-cd <foldername>/opticall
-
-then compile the opticall code by running the make command:
-
-make
-
-after which the opticall executable will appear, and you've successfully installed opticall!
-
-==== USAGE ====
-
-./opticall [options] [-in INPUT_FILE] [-out OUTPUT_FILE] [options]
-
- 
-==== DESCRIPTION ====
-
-opticall reads in a file of intensity data (currently Illumina normalized intensities), clusters the intensities using both per-SNP and per-sample information, and provides genotype calls as output. The intensity input file is tab-separated, with SNPs as rows and samples as columns. So a line would be:
-
-rsid<tab>rscoord<tab>allelesAB<tab>id1A<tab>id1B<tab>id2A<tab>id2B etc.
-
-where id1A is the allele A intensity value for sample 1, and id1B is the allele B intensity value for sample 1. The algorithm is known to perform well with Illumina normalized intensities. Any missing intensities should be input as NaN for both the A and B alleles.
-
-The first line of the file should also be a header line of the form:
-
-SNP<tab>Coor<tab>Alleles<tab>sample1idA<tab>sample1idB<tab>sample2idA<tab>sample2idB etc.
-
-where sample1id is your identifier for the first sample, and the A, B correspond to the different allele intensities. 
-
-An example input intensity file is provided with the program for your information.
-
-**We're still in the process of optimising code execution, and recommend chunking the input data into separate files for each chromosome (each one in the format above), and calling each chunk in parallel on a different processing node to significantly reduce execution time.**
- 
-==== INPUT/OUTPUT ====
-
--in FILE
- 
-The input intensity file name, including the path
-
--out FILE
- 
-The output file name, including the path. Two files will be created by the algorithm with the filepath specified. One will have the suffix '.calls' appended to it for the genotype calls, and the other '.probs' for the posterior probabilities. 
-
-The output format is space-delimited with columns: rs, coordinate, allelesAB, perturbation value, call_1, call_2, call_3,.... The order of the calls/probs is given by the order of the sample ids in the header line of the output file.
-
-The calls are encoded as 1 = AA, 2 = AB (heterozygote), 3 = BB, 4 = NN (no call). The perturbation value is output only to make call files more compatible with those of the Illuminus caller, and does not reflect any perturbation analysis. The probs file gives probabilities in the order: P(AA), P(BB), P(AB), P(NN). In cases where the maximum genotype probability is less than the probability threshold, the call will be NN but the posterior probabilities might not have P(NN) as the highest value.
-
-==== OUTLIER HANDLING OPTIONS ====
-
--nointcutoff
-
-This will stop optiCall excluding outliers with intensity values that are too high. Use this if you have already filtered samples/SNPs for intensity outliers.
-
--meanintfilter
-
-With this flag set, optiCall identifies samples with mean intensity values (across SNPs) that are too high or too low, and excludes them from the clustering. These samples have their genotypes set to NN at all SNPs. If you already deal with outlying intensities before using optiCall, there's no need to use this flag.
-
--noblank
-
-By default, optiCall calls all samples at a SNP as NN if the result of genotype clustering produces a Hardy-Weinberg Equilibrium (HWE) p-value of less than optiCall's threshold (by default 1e-15). Setting this flag means optiCall will make a call, even if the SNP deviates significantly from HWE.
-
-==== THRESHOLD SETTING OPTIONS ====
-
--hwep NEW_VALUE
-
-By default this is 1e-15. This is the threshold Hardy-Weinberg Equilibrium (HWE) p-value at which optiCall determines a clustering has failed. If the HWE p-value is less than this threshold, optiCall will first attempt a rescue clustering for the SNP, and if that fails, call all genotypes at that SNP NN (use the -noblank option to stop this). To get the best calls, we recommend you set this value to the HWE p-value QC threshold for your study, so that any SNPs that would otherwise be lost can potentially be fixed by the rescue clustering. If you're not sure about your QC thresholds, or want faster execution, the default will suffice.
-
--minp NEW_VALUE
-
-By default this is 0.7. This is the threshold at which optiCall will make a call. If no posterior genotype probability is above this value, then optiCall sets the genotype to NN.
-
-==== PROVIDING SAMPLE INFORMATION & CALLING SUBSETS & DIFFERENT BATCHES/ETHNICITIES ====
-
--info FILE
-
-The info file specifies sample genders and also whether samples should be excluded from calling. It is whitespace separated and the format is:
-
-sampleid gender excludeflag batchid
-
-with a line for all the samples in the intensity data supplied. An example info file is provided with the optiCall download.
-
-sampleid should match the sampleid given in the header of the intensity file. gender is either 1 for male or 2 for female; any other integer value is treated as unknown.
-excludeflag is 1 if the sample is to be excluded from calling, or 0 if it is to be included in calling.
-
-The batchid is designed to account for possible batch/ethnicity heterogeneity. For example, when optiCall calculates Hardy-Weinberg equilibrium, calling different ethnicities together could pose a problem. To handle this, give a separate batchid to each unique group being called. Batchids are integers greater than or equal to 0; the special batchid -9 excludes the sample from any Hardy-Weinberg equilibrium calculations.
-
-
-==== X,Y Chromosome, and Mitochondrial DNA calling ====
-
-Simply run optiCall as normal, adding the flag -X, -Y or -MT according to the type of data you're trying to call.
-
-
+http://opticall.bitbucket.org/
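
The input layout and call encoding described in the removed README text can be illustrated with a short Python sketch. This is not opticall code: the function names (`parse_header`, `parse_intensity_line`, `decode_calls_line`) and the sample values are made up for illustration, and the sketch assumes header columns are the sample id followed by a trailing `A` or `B`.

```python
import math

# Call codes as listed in the README text: 1 = AA, 2 = AB, 3 = BB, 4 = NN.
CALL_CODES = {1: "AA", 2: "AB", 3: "BB", 4: "NN"}


def parse_header(line):
    """Return sample ids from a tab-separated header: SNP, Coor, Alleles, then A/B pairs."""
    fields = line.rstrip("\n").split("\t")
    sample_cols = fields[3:]
    if len(sample_cols) % 2 != 0:
        raise ValueError("expected one A and one B column per sample")
    # Assumption: each column name is the sample id plus a trailing 'A' or 'B'.
    return [name[:-1] for name in sample_cols[0::2]]


def parse_intensity_line(line):
    """Split one SNP row into its id, coordinate, alleles and (A, B) intensity pairs."""
    fields = line.rstrip("\n").split("\t")
    rsid, coord, alleles = fields[:3]
    values = [float(v) for v in fields[3:]]  # "NaN" parses to a float NaN
    pairs = list(zip(values[0::2], values[1::2]))
    return rsid, coord, alleles, pairs


def decode_calls_line(line):
    """Decode a space-delimited .calls row: rs, coordinate, alleles, perturbation, calls."""
    fields = line.split()
    rsid = fields[0]
    calls = [CALL_CODES[int(code)] for code in fields[4:]]
    return rsid, calls


# Hypothetical data, for illustration only.
header = "SNP\tCoor\tAlleles\ts1A\ts1B\ts2A\ts2B"
row = "rs1\t12345\tAG\t0.91\t0.05\tNaN\tNaN"

samples = parse_header(header)
rsid, coord, alleles, pairs = parse_intensity_line(row)
for sample, (a, b) in zip(samples, pairs):
    status = "missing" if math.isnan(a) and math.isnan(b) else "ok"
    print(rsid, sample, a, b, status)

print(decode_calls_line("rs1 12345 AG 0 1 4"))
```

Missing intensities are kept as NaN pairs rather than dropped, which matches the README's requirement that both the A and B values be given as NaN for a missing sample.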