FAST analysis options and output file format for various methods

Note: FAST appends the chromosome number to the output file name prefix given with the '--out-file' option. For example, '--out-file outfile' together with '--chr 10' produces the output file prefix outfile.chr10, so the GWiS method with linear regression writes a file named outfile.chr10.GWiS.Linear.txt. The same naming applies to all other methods.
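The naming rule in the note above can be sketched as a small helper. This is only an illustration: the function name and its parameters are hypothetical and are not part of FAST, and the pattern is the one described in the note (prefix, chromosome, method, regression type).

```python
def fast_output_name(prefix, chrom, method, regression):
    """Build a per-chromosome output file name following the pattern in the note.

    Hypothetical helper for illustration; FAST itself builds these names internally.
    """
    return f"{prefix}.chr{chrom}.{method}.{regression}.txt"

# The example from the note: --out-file outfile with --chr 10, GWiS, linear regression.
print(fast_output_name("outfile", 10, "GWiS", "Linear"))
```

File names that do not follow this four-part pattern, such as the SAPPHO result files listed further down, would need separate handling.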

Method : GWiS using Linear Regression

Options : --linear-gwis (no permutations) --linear-gwis-perm (with permutations)

Output Filename:  Out.chrXX.GWiS.Linear.txt

Multiple lines are present for each multi-SNP model in a gene :-
  1. First line of each model has SNP.name=NONE indicating the null (intercept only model when no 
     covariates, or covariates only model when covariates are present).
  2. Followed by one or more lines for each SNP added to the model. 
  3. Line with SNP.name=SUMMARY indicates end of the model for the gene. This line also prints the
     values of K, SSM, BIC, F.stat, n.stop, n.better and pval for the final K-snp model in the gene.

Format
   Chr                :   Chromosome
   GeneID             :   Unique gene id 
   Name               :   Gene name
   Start              :   Gene start in bp
   End                :   Gene end in bp 
   Length             :   Gene length in bp
   SNPs               :   No. of snps in the gene
   Tests              :   Effective no. of snps in the gene
   SNP.name           :   SNP entering the model
   SNP.pos            :   SNP position in bp     
   SNP.MAF            :   SNP minor allele frequency    
   SNP.qual           :   SNP imputation quality      
   K                  :   Current model size
   SSM                :   Sum of the squares of the model
   BIC                :   BIC increment  for the snp
   F.stat             :   Current model F-statistic 
   R2                 :   Multiple R2 of the snp with the others in the model.
   n.stop             :   No of permutations executed
   n.better           :   No of permutations with better BIC score
   pval               :   Gene pvalue
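Because each gene's model spans several lines and ends with a SUMMARY line, a common first step is to keep only the SUMMARY rows. The sketch below assumes the file is whitespace-delimited with a header row holding the column names listed above; that layout is an assumption and should be checked against a real output file. The helper names and the toy values are made up for illustration.

```python
import io

def read_fast_table(handle):
    """Read a whitespace-delimited table with a header row into a list of dicts."""
    header = handle.readline().split()
    return [dict(zip(header, line.split())) for line in handle if line.strip()]

def gene_summaries(rows):
    """Return the SUMMARY row that closes each gene's model."""
    return [row for row in rows if row["SNP.name"] == "SUMMARY"]

# Toy input with made-up values and only a subset of the documented columns.
toy = io.StringIO(
    "GeneID Name SNP.name K pval\n"
    "1 GENE_A NONE 0 NA\n"
    "1 GENE_A rs0001 1 NA\n"
    "1 GENE_A SUMMARY 1 0.01\n"
    "2 GENE_B NONE 0 NA\n"
    "2 GENE_B SUMMARY 0 1\n"
)
summaries = gene_summaries(read_fast_table(toy))
for row in summaries:
    print(row["Name"], row["K"], row["pval"])
```

With a real file, the StringIO object would be replaced by an open file handle, for example open("outfile.chr10.GWiS.Linear.txt").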

Method : GWiS using Logistic Regression

Options : --logistic-gwis (no permutations) --logistic-gwis-perm (with permutations)

Output Filename:  Out.chrXX.GWiS.Logistic.txt

Multiple lines are present for each multi-SNP model in a gene :-
  1. First line of each model has SNP.name=NONE indicating the null (intercept only model when no 
     covariates, or covariates only model when covariates are present).
  2. Followed by one or more lines for each SNP added to the model. 
  3. Line with SNP.name=SUMMARY indicates end of the model for the gene. This line also prints the
     values of K, SSM, BIC, chi2, n.stop, n.better and pval for the  final K-snp model in the gene.

Format
   Chr                :   Chromosome
   GeneID             :   Unique gene id 
   Name               :   Gene name
   Start              :   Gene start in bp
   End                :   Gene end in bp 
   Length             :   Gene length in bp
   SNPs               :   No. of snps in the gene
   Tests              :   Effective no. of snps in the gene
   SNP.name           :   SNP entering the model
   SNP.pos            :   SNP position in bp     
   SNP.MAF            :   SNP minor allele frequency    
   SNP.qual           :   SNP imputation quality      
   K                  :   Current model size
   SSM                :   Sum of the squares of the model
   BIC                :   BIC increment  for the snp
   chi2               :   Current model chi squared  
   R2                 :   Multiple R2 of the snp with the others in the model.
   n.stop             :   No of permutations executed
   n.better           :   No of permutations with better BIC score
   pval               :   Gene pvalue

Method : minSNP using Linear Regression

Options : --linear-minsnp (no permutations) --linear-minsnp-perm (with permutations)

Output Filename : Out.chrXX.minSNP.Linear.txt (for minSNP)

Description of methods: One line for each SNP mapped to a gene. Both methods assign each gene the 
pvalue of its most significant SNP, which in the output file is the row whose isBest column has 
value 1. Minsnp calculates this pvalue from the test statistic (F-statistic/Chi2), while 
minsnp-perm obtains it with permutations (because the statistic-based pvalue loses power for 
variants with low MAF). For minsnp-perm, permutations are performed only for the most significant 
SNP in the original test. n.tot and n.better are output only for minsnp-perm.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id 
   Name         :   Gene name
   Start        :   Gene start in bp (includes flank)
   End          :   Gene end in bp (includes flank)
   Length       :   Gene length in bp (includes flank)
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   SNP.name     :   SNP entering the model
   SNP.pos      :   SNP position in bp   
   SNP.MAF      :   SNP minor allele frequency  
   SNP.qual     :   SNP imputation quality  
   chi2         :   SNP chi squared statistic
   n.tot        :   No of permutations executed
   n.better     :   No of permutations with better chi2
   pval         :   p-value
   isBest       :   0/1 indicating if this SNP has best chi2 in the gene.
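Since the gene pvalue is carried by the row flagged isBest=1, per-gene results can be collected by filtering on that column. The sketch below works on rows that have already been parsed into dicts keyed by the column names above; the function name and the toy values are hypothetical.

```python
def gene_pvalues(rows):
    """Map each GeneID to (SNP.name, pval) of the row flagged isBest=1."""
    best = {}
    for row in rows:
        if row["isBest"] == "1":
            best[row["GeneID"]] = (row["SNP.name"], float(row["pval"]))
    return best

# Made-up rows with only the columns needed for this example.
rows = [
    {"GeneID": "1", "SNP.name": "rs0001", "pval": "0.20", "isBest": "0"},
    {"GeneID": "1", "SNP.name": "rs0002", "pval": "0.003", "isBest": "1"},
    {"GeneID": "2", "SNP.name": "rs0003", "pval": "0.5", "isBest": "1"},
]
print(gene_pvalues(rows))
```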

Method : minSNP using Logistic Regression

Options : --logistic-minsnp (no permutations) --logistic-minsnp-perm (with permutations)

Output Filename : Out.chrXX.minSNP.Logistic.txt (for minSNP)

Description of methods: One line for each SNP mapped to a gene. Both methods assign each gene the 
pvalue of its most significant SNP, which in the output file is the row whose isBest column has 
value 1. Minsnp calculates this pvalue from the test statistic (F-statistic/Chi2), while 
minsnp-perm obtains it with permutations (because the statistic-based pvalue loses power for 
variants with low MAF). For minsnp-perm, permutations are performed only for the most significant 
SNP in the original test. n.tot and n.better are output only for minsnp-perm.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id 
   Name         :   Gene name
   Start        :   Gene start in bp (includes flank)
   End          :   Gene end in bp (includes flank)
   Length       :   Gene length in bp (includes flank)
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   SNP.name     :   SNP entering the model
   SNP.pos      :   SNP position in bp   
   SNP.MAF      :   SNP minor allele frequency  
   SNP.qual     :   SNP imputation quality      
   chi2         :   SNP chi squared statistic
   n.tot        :   No of permutations executed
   n.better     :   No of permutations with better chi2
   pval         :   p-value
   isBest       :   0/1 indicating if this SNP has best chi2 in the gene.

Method : minSNP Gene using Linear Regression

Options : --linear-minsnp-gene-perm (with permutations)

Output Filename : Out.chrXX.minSNP_Gene.Linear.txt

Description of method: Unlike minSNP, which runs permutations only on the single best SNP in a gene, 
Minsnp-gene-perm runs permutations for every SNP in the gene. The test statistic (or pvalue) of the 
most significant SNP in each permutation is compared with that of the most significant SNP in the original 
data to obtain a gene-level pvalue. The columns have the same meaning as for the minsnp and minsnp-perm methods.
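The comparison described above can be sketched as follows. This is not FAST's implementation: it assumes that a permutation counts as "better" when its best chi2 is at least as large as the observed best chi2, and that the gene pvalue is n.better / n.tot. FAST may treat ties differently, stop permutations early, or use another estimator. All names and numbers are made up for illustration.

```python
def gene_perm_pvalue(observed_chi2, permuted_chi2):
    """Gene-level permutation pvalue from per-SNP chi2 statistics.

    observed_chi2: chi2 of each SNP in the gene on the original data.
    permuted_chi2: one list of per-SNP chi2 values per permutation.
    Assumption: ties count as "better" and pval = n_better / n_tot.
    """
    best_observed = max(observed_chi2)
    n_tot = len(permuted_chi2)
    n_better = sum(1 for perm in permuted_chi2 if max(perm) >= best_observed)
    return n_better, n_tot, n_better / n_tot

observed = [1.2, 6.5, 3.0]
permuted = [
    [0.5, 2.0, 1.0],
    [7.1, 0.3, 0.2],
    [1.0, 1.0, 6.5],
    [3.3, 0.1, 0.4],
]
n_better, n_tot, pval = gene_perm_pvalue(observed, permuted)
print(n_better, n_tot, pval)
```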

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id
   Name         :   Gene name
   Start        :   Gene start in bp (includes flank)
   End          :   Gene end in bp (includes flank)
   Length       :   Gene length in bp (includes flank)
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   SNP.name     :   SNP entering the model
   SNP.pos      :   SNP position in bp
   SNP.MAF      :   SNP minor allele frequency
   SNP.qual     :   SNP imputation quality
   chi2         :   SNP chi squared statistic
   n.tot        :   No of permutations executed
   n.better     :   No of permutations with better chi2
   pval         :   p-value
   isBest       :   0/1 indicating if this SNP has best chi2 in the gene.

Method : minSNP Gene using Logistic Regression

Options : --logistic-minsnp-gene-perm (with permutations)

Output Filename : Out.chrXX.minSNP_Gene.Logistic.txt

Description of method: Unlike minSNP, which runs permutations only on the single best SNP in a gene, 
Minsnp-gene-perm runs permutations for every SNP in the gene. The test statistic (or pvalue) of the 
most significant SNP in each permutation is compared with that of the most significant SNP in the original 
data to obtain a gene-level pvalue. The columns have the same meaning as for the minsnp and minsnp-perm methods.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id
   Name         :   Gene name
   Start        :   Gene start in bp (includes flank)
   End          :   Gene end in bp (includes flank)
   Length       :   Gene length in bp (includes flank)
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   SNP.name     :   SNP entering the model
   SNP.pos      :   SNP position in bp
   SNP.MAF      :   SNP minor allele frequency
   SNP.qual     :   SNP imputation quality
   chi2         :   SNP chi squared statistic
   n.tot        :   No of permutations executed
   n.better     :   No of permutations with better chi2
   pval         :   p-value
   isBest       :   0/1 indicating if this SNP has best chi2 in the gene.

Method : Bimbam using Linear Regression

Options : --linear-bf (no permutations) --linear-bf-perm (with permutations)

Filename : Out.chrXX.BF.Linear.txt

One line for each gene in the chromosome.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id 
   Name         :   Gene name
   Start        :   Gene start in bp
   End          :   Gene end in bp
   Length       :   Gene length in bp
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   BF_sum       :   Linear regression based Bayes  Factor sum for the gene.
   n.tot        :   No of permutations executed
   n.better     :   No of permutations with better BF_sum
   pval         :   pvalue

Method : Bimbam using Logistic Regression

Options : --logistic-bf (no permutations) --logistic-bf-perm (with permutations)

Filename : Out.chrXX.BF.Logistic.txt

One line for each gene in the chromosome.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id 
   Name         :   Gene name
   Start        :   Gene start in bp
   End          :   Gene end in bp
   Length       :   Gene length in bp
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   BF_sum       :   Logistic regression based Bayes  Factor sum for the gene.
   n.tot        :   No of permutations executed
   n.better     :   No of permutations with better BF_sum
   pval         :   pvalue

Method : Vegas using Linear Regression

Options : --linear-bf (no permutations) --linear-bf-perm (with permutations)

Filename : Out.chrXX.Vegas.Linear.txt

One line for each gene in the chromosome.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id 
   Name         :   Gene name
   Start        :   Gene start in bp
   End          :   Gene end in bp
   Length       :   Gene length in bp
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   Vegas_sum    :   Linear regression based Vegas score for the gene.
   n.tot        :   No of permutations executed
   n.better     :   No of permutations with better Vegas_sum
   pval         :   pvalue

Method : Vegas using Logistic Regression

Options : --logistic-bf (no permutations) --logistic-bf-perm (with permutations)

Filename : Out.chrXX.Vegas.Logistic.txt

One line for each gene in the chromosome.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id 
   Name         :   Gene name
   Start        :   Gene start in bp
   End          :   Gene end in bp
   Length       :   Gene length in bp
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   Vegas_sum    :   Logistic regression based Vegas score for the gene.
   n.tot        :   No of permutations executed
   n.better     :   No of permutations with better Vegas_sum
   pval         :   pvalue

Method : Gates using Linear Regression

Options : --linear-bf (no permutations)

Filename : Out.chrXX.Gates.Linear.txt

One line for each gene in the chromosome.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id
   Name         :   Gene name
   Start        :   Gene start in bp
   End          :   Gene end in bp
   Length       :   Gene length in bp
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   Gates        :   Linear regression based Gates score for the gene.
   pval         :   pvalue

Method : Gates using Logistic Regression

Options : --logistic-bf (no permutations)

Filename : Out.chrXX.Gates.Logistic.txt

One line for each gene in the chromosome.

Format
   Chr          :   Chromosome
   GeneID       :   Unique gene id
   Name         :   Gene name
   Start        :   Gene start in bp
   End          :   Gene end in bp
   Length       :   Gene length in bp
   SNPs         :   No. of snps in the gene
   Tests        :   Effective no. of snps in the gene
   Gates        :   Logistic regression based Gates score for the gene.
   pval         :   pvalue

Tip 
  1. When a method is specified with the -perm suffix, permutations are performed when mode=genotype and 
     simulations are performed when mode=summary.
  2. If you only have a few genes on a chromosome, use the option --linear-snp-gene or 
     --logistic-snp-gene. This will limit the single SNP computations to only these genes.

Method : Single SNP Cox PH Regression

Options : --cox-snp

Filename : Out.chrXX.allSNP.COX.txt

One line for each SNP.

Format
   SNP.id          : SNP name
   chr             : chromosome
   pos             : SNP position in base pairs
   NonCoded.Allele : Allele coded as 0
   Coded.Allele    : Allele coded as 1
   Beta            : Regression coefficient
   Se              : Regression standard error
   Z               : Z-test score
   log10BF         : Log (base 10) Bayes Factor
   Coded.Af        : Allele frequency of coded allele
   Qual            : SNP imputation quality
   eSampleSize     : Effective sample size for the SNP computed as (#samples) x Qual x 2 x MAF x (1-MAF)
   nGenes          : No of genes to which this SNP belongs (set to 0)
   Nmiss           : Number of samples with missing values 
   pvalue          : SNP parametric pvalue 
   loglik          : Log-likelihood ratio of current model over null model
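The eSampleSize formula given in the table above, (#samples) x Qual x 2 x MAF x (1-MAF), can be written out directly. The function name is hypothetical and the inputs are made-up numbers; the formula itself is taken from the column description.

```python
def effective_sample_size(n_samples, qual, maf):
    """eSampleSize = (#samples) x Qual x 2 x MAF x (1 - MAF)."""
    return n_samples * qual * 2 * maf * (1 - maf)

# Made-up inputs: 1000 samples, imputation quality 0.9, MAF 0.25.
print(effective_sample_size(1000, 0.9, 0.25))
```

Because MAF x (1-MAF) is unchanged when a frequency f is replaced by 1-f, passing the Coded.Af value in place of MAF gives the same result.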

Method : Gene-based Cox PH Regression

Options : --cox-gene

Filename : Out.chrXX.allSNP.COX.GENE.txt

  1. First line of each model has SNP.name=NONE indicating the null (intercept only model when no 
     covariates, or covariates only model when covariates are present).
  2. Followed by one or more lines for each SNP added to the model. 
  3. Line with SNP.name=SUMMARY indicates end of the model for the gene. This line also prints the
     values of K, loglik, BIC for the  final K-snp model in the gene.

Format
   Chr                :   Chromosome
   GeneID             :   Unique gene id 
   Name               :   Gene name
   Start              :   Gene start in bp
   End                :   Gene end in bp 
   Length             :   Gene length in bp
   SNPs               :   No. of snps in the gene
   Tests              :   Effective no. of snps in the gene
   SNP.name           :   SNP entering the model
   SNP.pos            :   SNP position in bp     
   SNP.MAF            :   SNP minor allele frequency    
   SNP.qual           :   SNP imputation quality      
   K                  :   Current model size
   loglik             :   Log-likelihood ratio of current model over null model
   BIC                :   GWiS model score based on Cox PH model

Method : SAPPHO

Options : --sapphoI/--sapphoC

Filename : Out.sapphoI.result.txt/Out.sapphoC.result.txt

One line for each SNP in the model.

Format
   SNP.id          : SNP name
   Chr             : Chromosome Number
   Pos             : SNP position in base pairs
   NonCoded.Allele : Allele coded as 0
   Coded.Allele    : Allele coded as 1
   SNP.MAF         : SNP minor allele frequency 
   SNP.qual        : SNP imputation quality
   K               : Current number of associations in the model
   log|Det(SIGMA)| : Log of the absolute determinant of the var-cov matrix of the residuals
   SapphoScore     : SAPPHO model score
   Pheno           : The phenotype that current SNP is associated with
   SapphoScoreDiff : Score for that SNP in the model, a measure of importance of that SNP

Additional Output files

Out.chrXX.allSNP.Linear.txt : This file lists the single SNP linear regression results for each SNP.

Format 
   SNP.id          : SNP name
   pos             : SNP position in base pairs
   NonCoded.Allele : Allele coded as 0
   Coded.Allele    : Allele coded as 1
   Beta            : Regression coefficient
   Se              : Regression standard error
   Chi2            : Chi Square
   logBF           : Log Bayes Factor
   Coded.Af        : Allele frequency of coded allele
   Qual            : SNP imputation quality
   eSampleSize     : Effective sample size for the SNP computed as (#samples) x Qual x 2 x MAF x (1-MAF)
   nGenes          : No of genes to which this SNP belongs
   Fmiss           : Fraction of samples with missing values 
   pvalue          : SNP parametric pvalue 

Out.chrXX.allSNP.Logistic.txt : This file lists the single SNP logistic regression results for each SNP.

Format 
   SNP.id          : SNP name
   Chr             : Chromosome
   pos             : SNP position in base pairs
   NonCoded.Allele : Allele coded as 0
   Coded.Allele    : Allele coded as 1
   Beta            : Regression coefficient
   Se              : Regression standard error
   Wald            : Wald-statistic
   logBF           : Log Bayes Factor
   Coded.Af        : Allele frequency of coded allele
   Qual            : SNP imputation quality
   eSampleSize     : Effective sample size for the SNP computed as (#samples) x Qual x 2 x MAF x (1-MAF)
   nGenes          : No of genes to which this SNP belongs
   Fmiss           : Fraction of samples with missing values 
   pvalue          : SNP parametric pvalue

Out.chrXX.geneSNP.txt : This file lists the mapping between SNPs and genes. A SNP can appear multiple times in this file if it belongs to multiple overlapping genes.

Format
   SNP.name    : SNP name
   SNP.chr     : chromosome
   SNP.bp      : SNP position in base pairs
   GeneID      : Unique gene id
   Gene.name   : Gene name    
   Gene.start  : Gene start in bp 
   Gene.end    : Gene end in bp 
   SNP.maf     : Minor allele frequency
   Qual        : SNP quality
   eSampleSize : SNP effective sample size
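Since a SNP can appear once per overlapping gene in this file, it is often useful to group the rows by gene and to list the SNPs shared between genes. The sketch below works on rows already parsed into dicts keyed by the column names above; the function names and toy values are hypothetical.

```python
from collections import defaultdict

def snps_by_gene(rows):
    """Group SNP names by GeneID, preserving file order."""
    genes = defaultdict(list)
    for row in rows:
        genes[row["GeneID"]].append(row["SNP.name"])
    return dict(genes)

def shared_snps(rows):
    """Return the sorted names of SNPs mapped to more than one gene."""
    gene_sets = defaultdict(set)
    for row in rows:
        gene_sets[row["SNP.name"]].add(row["GeneID"])
    return sorted(snp for snp, genes in gene_sets.items() if len(genes) > 1)

# Made-up rows with only the columns needed for this example.
rows = [
    {"SNP.name": "rs0001", "GeneID": "1"},
    {"SNP.name": "rs0002", "GeneID": "1"},
    {"SNP.name": "rs0002", "GeneID": "2"},
    {"SNP.name": "rs0003", "GeneID": "2"},
]
print(snps_by_gene(rows))
print(shared_snps(rows))
```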
