What is this repository for?
This repository contains scripts and Java code used in the maize HapMap3 project.
The pipeline consists of four steps: raw genotype calling, IBD-based filtering
of the raw calls, LD filtering, and LDKNN imputation. Each of these steps is accomplished
using custom Java code and some
Perl helper scripts. The Java code uses
classes from the TASSEL package developed in the Buckler Lab at Cornell University.
Different versions of TASSEL are used in different pipeline programs; all of them are included
in this distribution.
The HapMap3 project involves large amounts of data, processed over several years. Individual parts of the pipeline have been executed separately, generating intermediate output files to be used in subsequent steps. The computations have been parallelized in different ways (some over taxa, others over genomic coordinates) and distributed over multiple machines.
The code runs on Linux only.
How do I get set up?
Download the repository
git clone https://email@example.com/bukowski1/maize_hapmap3_code.git
To do this, you will need to have git installed on your machine.
The git clone command will create a directory
maize_hapmap3_code in the
directory where the command was executed. This directory will contain the following subfolders:
- IBD: helper scripts for the IBD filtering step
- LD: helper scripts for the LD filtering and conversion of genotypes to VCF format with proper INFO field parameters
- LDKNNi: helper scripts for the LDKNN imputation step
- raw_genos: helper scripts for the raw genotyping step
- java_code: contains Netbeans projects with the Java code used in the pipeline
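After cloning, the layout listed above can be checked quickly. The following is a minimal sketch, not part of the repository: the function name check_layout is hypothetical, and it assumes the clone sits in maize_hapmap3_code unless another path is passed as the first argument.

```shell
#!/bin/sh
# check_layout [ROOT]: report whether each subfolder named in this README
# exists under ROOT (default: maize_hapmap3_code in the current directory).
# Returns 0 if all five are present, 1 otherwise.
check_layout() {
    root=${1:-maize_hapmap3_code}
    missing=0
    for d in IBD LD LDKNNi raw_genos java_code; do
        if [ -d "$root/$d" ]; then
            echo "ok       $d"
        else
            echo "MISSING  $d"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as check_layout from the directory where git clone was executed, or as check_layout /some/other/path if the clone lives elsewhere.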
Each of the subfolders contains a
README file explaining how to execute this
In each of the subfolders of
maize_hapmap3_code except for
edit all shell and Perl scripts (all
*.pl files) and change the declaration
of the variable
BINDIR to the full path of the
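Before editing, it can help to see where BINDIR is mentioned. The sketch below only lists matching lines and changes nothing. It assumes GNU grep (reasonable, since the code runs on Linux only); the function name list_bindir_lines is hypothetical, and the exact form of the declaration in each script is not assumed.

```shell
#!/bin/sh
# list_bindir_lines [ROOT]: print "file:line:text" for every line under ROOT
# that mentions BINDIR as a whole word. Binary files and the .git directory
# are skipped. Returns grep's status: 0 if anything matched, 1 if nothing did.
list_bindir_lines() {
    root=${1:-maize_hapmap3_code}
    grep -rnIw --exclude-dir=.git 'BINDIR' "$root"
}
```

The output shows both the declarations to edit and the places where the variable is used, so each script can be reviewed before it is changed by hand.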
To examine or modify the Java code, it would be convenient to set it up in a Java IDE, such
as Netbeans or Eclipse. In the directory
java_code, each of the subfolders
tassel5 is a separate Netbeans project and should
be straightforward to open in Netbeans, possibly with some minor configuration adjustments.
The pipeline requires
samtools to be available. The
samtools command should be in the
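One way to confirm that the required commands can be found by the shell before running anything is sketched below. The function name check_prereqs is hypothetical; it relies only on the standard command -v builtin and checks whatever tool names are passed to it.

```shell
#!/bin/sh
# check_prereqs TOOL...: for each tool name, report whether the shell can
# resolve it to a command. Returns 0 if all were found, 1 otherwise.
check_prereqs() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found    $tool -> $(command -v "$tool")"
        else
            echo "missing  $tool"
            status=1
        fi
    done
    return "$status"
}
```

For this pipeline a typical call would be check_prereqs git samtools; any line starting with "missing" points to a tool that still has to be installed or made visible to the shell.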
How to run tests
There are currently no test data to run on.
Owner and admin of the repository
Institute of Biotechnology
Biotechnology Resource Center
Bioinformatics Facility (formerly: CBSU)
620 Rhodes Hall, Ithaca, NY 14853