Q. How do I run FAST on a specific chromosomal region?

A: Use the gene-set file (specified with the option --gene-set filename) to define contiguous chromosomal regions (i.e. hypothetical genes) for FAST to analyze.

Q. What input data formats does FAST accept?

A: FAST accepts both plain-text and gzip-compressed input files. The input file formats are described in detail in Input Formats. In addition to its own input format, FAST also accepts files formatted as output from the imputation software IMPUTE2. Note, however, that a typical IMPUTE2 imputation splits each chromosome into chunks. Before running FAST, you must concatenate all the chunks for each chromosome into a single file so that no gene spans multiple chunks.
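Combining the chunks for one chromosome can be sketched in Python as below. This is a minimal sketch, not a FAST utility: the function names and the chunk file names in the test are hypothetical, and it assumes the standard IMPUTE2 output layout in which the third column is the base-pair position. Sorting the chunks by their first position is a precaution added here so the combined file stays in genomic order even if the chunk files are listed out of order. Files ending in .gz are read and written with gzip, since FAST accepts gzip-compressed input.

```python
import gzip
import sys
from pathlib import Path
from typing import Iterable


def _open_text(path: Path, mode: str):
    """Open a file in text mode, using gzip when the name ends in .gz."""
    if path.suffix == ".gz":
        return gzip.open(path, mode + "t")
    return open(path, mode)


def first_position(path: Path) -> int:
    """Base-pair position (3rd column in IMPUTE2 output) of a chunk's first SNP.

    Empty chunks sort last.
    """
    with _open_text(path, "r") as fh:
        for line in fh:
            fields = line.split()
            if fields:
                return int(fields[2])
    return sys.maxsize


def concatenate_chunks(chunks: Iterable[Path], out_path: Path) -> int:
    """Write all chunks of one chromosome, in positional order, to one file.

    Returns the number of SNP lines written.
    """
    ordered = sorted(chunks, key=first_position)
    written = 0
    with _open_text(out_path, "w") as out:
        for chunk in ordered:
            with _open_text(chunk, "r") as fh:
                for line in fh:
                    if not line.strip():
                        continue  # skip blank lines between chunks
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```

For example, `concatenate_chunks(sorted(Path(".").glob("chr1.chunk*.impute2.gz")), Path("chr1.impute2.gz"))` would combine every chunk of a (hypothetically named) chromosome 1 into one gzip-compressed file.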

Q. How do I run a case-control analysis?

A: You can use logistic regression (option --logistic-<method name>) with either genotype or summary data.

Q. I have genotype data in PLINK format. How can I convert it to FAST format?

A: Use the plink2fast utility located in Utils/PLINK2FAST/. It takes PLINK tped files as input, NOT ped files. See the Readme.txt file in Utils/PLINK2FAST/ for more details.
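If you only have standard .ped/.map files, PLINK itself can usually write the transposed .tped/.tfam files (for example with `--recode transpose` in PLINK 1.9; check the documentation for your PLINK version). For small datasets, the transposition can also be sketched in Python as below. The function name ped_map_to_tped is hypothetical; the sketch holds everything in memory and writes space-delimited lines. You would still run plink2fast on the resulting tped file.

```python
from typing import List, Sequence, Tuple


def ped_map_to_tped(ped_lines: Sequence[str],
                    map_lines: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Transpose PLINK .ped/.map content into .tped/.tfam lines.

    .map : chromosome, SNP ID, genetic distance, bp position (one SNP per line)
    .ped : 6 family/sample columns, then 2 allele columns per SNP
    .tped: the 4 .map columns, then 2 alleles per individual
    .tfam: the first 6 .ped columns
    """
    snps = [line.split() for line in map_lines if line.strip()]
    people = [line.split() for line in ped_lines if line.strip()]
    n_snps = len(snps)

    tfam = []
    for row in people:
        # every individual must carry exactly two alleles per SNP in the .map
        if len(row) != 6 + 2 * n_snps:
            raise ValueError(f"sample {row[:2]}: expected {2 * n_snps} allele columns")
        tfam.append(" ".join(row[:6]))

    tped = []
    for j, snp in enumerate(snps):
        alleles = []
        for row in people:
            # alleles for SNP j sit in columns 6+2j and 7+2j (0-based)
            alleles.extend(row[6 + 2 * j: 8 + 2 * j])
        tped.append(" ".join(snp[:4] + alleles))
    return tped, tfam
```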

Q. Do I need any other software to run FAST?

A: Yes, you will need the GNU Scientific Library (GSL), available from http://www.gnu.org/software/gsl/. To install it, follow the steps in the INSTALL file in the GSL directory, or follow the steps here. To run the utility script FAST.utils.sh, which combines the results from multiple methods, you will also need R, Perl, and the Perl module ‘Statistics::Distributions’, available from http://search.cpan.org/~mikek/Statistics-Distributions-1.02/Distributions.pm. You can install this Perl module by following the instructions here.
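A quick way to see which of these dependencies are already visible on a machine is sketched below. This is an illustrative helper, not part of FAST, and check_fast_dependencies is a hypothetical name. It assumes that a GSL installation puts gsl-config on the PATH (true for source installs and most development packages, but a runtime-only package may lack it), and it tests the Perl module with `perl -MStatistics::Distributions -e 1`, which exits with status 0 only if the module loads.

```python
import shutil
import subprocess
from typing import Dict


def check_fast_dependencies() -> Dict[str, bool]:
    """Report which FAST-related tools can be found on this machine."""
    found = {
        "GSL (gsl-config)": shutil.which("gsl-config") is not None,
        "R": shutil.which("R") is not None,
        "perl": shutil.which("perl") is not None,
    }
    module_ok = False
    if found["perl"]:
        # perl -M<Module> -e 1 fails with a non-zero exit code if the module is missing
        result = subprocess.run(
            ["perl", "-MStatistics::Distributions", "-e", "1"],
            capture_output=True,
        )
        module_ok = result.returncode == 0
    found["Statistics::Distributions"] = module_ok
    return found


for name, ok in check_fast_dependencies().items():
    print(f"{name:28s} {'found' if ok else 'MISSING'}")
```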

Q. Why did FAST crash?

A: This can happen for a number of reasons, most often formatting errors in the input data. Make sure that all data files are tab-delimited and follow one of the input file formats described in Input File Formats. Also check that SNP names are no longer than 20 characters, all SNPs are biallelic, the input files contain no hidden spaces or special characters, and every SNP has the same number of genotypes.
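Several of these checks can be automated before running FAST. The sketch below is a hypothetical pre-flight checker, not a FAST tool: it assumes one SNP per line with the SNP name in the first column (adjust name_col to match the layout in Input File Formats), and it flags inconsistent field counts, names over 20 characters, empty fields or stray spaces, and non-ASCII or non-printable characters such as the carriage return left by Windows line endings. It does not check that SNPs are biallelic, because which columns hold the alleles depends on the input format.

```python
from typing import Iterable, List, Tuple

MAX_SNP_NAME = 20  # SNP name length limit stated in this FAQ


def check_tab_file(lines: Iterable[str], name_col: int = 0) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for common input formatting errors."""
    problems: List[Tuple[int, str]] = []
    expected_fields = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")  # keep any "\r" so it gets reported below
        if not line:
            continue
        fields = line.split("\t")
        # every SNP line should have the same number of columns (genotypes)
        if expected_fields is None:
            expected_fields = len(fields)
        elif len(fields) != expected_fields:
            problems.append((lineno, f"{len(fields)} fields, expected {expected_fields}"))
        if name_col < len(fields) and len(fields[name_col]) > MAX_SNP_NAME:
            problems.append((lineno, f"SNP name longer than {MAX_SNP_NAME} characters"))
        for col, field in enumerate(fields, start=1):
            if field == "" or " " in field:
                problems.append((lineno, f"column {col}: empty field or stray space"))
            elif not (field.isascii() and field.isprintable()):
                problems.append((lineno, f"column {col}: non-ASCII or non-printable character"))
    return problems
```

Because it accepts any iterable of lines, an open file works directly, e.g. `check_tab_file(open("genotypes.txt"))`, or `gzip.open(path, "rt")` for compressed input.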

Another possible cause is insufficient memory, especially when running FAST genome-wide on imputed data, where large genes can span several thousand SNPs. Make sure you request enough memory for the job, e.g. when running FAST on a Sun Grid Engine cluster.
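As a rough illustration of why large genes are costly, the sketch below assumes (without having checked FAST's internals) that an LD-based gene test holds one dense SNP-by-SNP correlation matrix per gene in double precision, at 8 bytes per entry. Memory then grows with the square of the number of SNPs: a gene with 5,000 SNPs needs 8 × 5,000² = 200,000,000 bytes (about 191 MiB) for that matrix alone, and doubling the gene to 10,000 SNPs quadruples this to 800,000,000 bytes (about 763 MiB). The function name is hypothetical.

```python
def correlation_matrix_bytes(n_snps: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for one dense n_snps x n_snps matrix (8-byte doubles by default)."""
    return n_snps * n_snps * bytes_per_value


# rough per-gene matrix sizes for a few gene sizes
for n in (1_000, 5_000, 10_000):
    size = correlation_matrix_bytes(n)
    print(f"{n:>6} SNPs -> {size / 2**20:8.1f} MiB")
```

On Sun Grid Engine, memory is typically requested at submission time through a resource such as h_vmem, but resource names and limits are site-specific, so check your cluster's documentation.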