harp -- Haplotype Analysis of Reads in Pools
Created by Darren Kessner with John Novembre at UCLA.

Copyright (c) 2012 Regents of the University of California

------------------------------------------------------------------------------------------

harp implements an EM algorithm to calculate the frequencies of known haplotypes
from pooled sequence data.
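
The general shape of such an EM iteration can be sketched in Python.  This is
a minimal illustration, not harp's implementation.  It assumes the likelihood
of each read under each known haplotype has already been computed, and the
function name em_haplotype_frequencies and its parameters are hypothetical.

```python
def em_haplotype_frequencies(likelihoods, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies (mixture proportions) by EM.

    likelihoods[j][i] is assumed to hold P(read j | haplotype i).
    Returns a list of frequencies that sums to 1.
    """
    n_haps = len(likelihoods[0])
    freqs = [1.0 / n_haps] * n_haps  # start from uniform frequencies

    for _ in range(n_iter):
        totals = [0.0] * n_haps
        for row in likelihoods:
            # E-step: posterior probability that this read came from
            # each haplotype, given the current frequency estimates.
            weighted = [f * like for f, like in zip(freqs, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read is impossible under every haplotype; skip it
            for i, w in enumerate(weighted):
                totals[i] += w / norm

        used = sum(totals)
        if used == 0.0:
            return freqs  # no informative reads; keep the current estimate

        # M-step: each new frequency is the average posterior weight.
        new_freqs = [t / used for t in totals]
        change = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs = new_freqs
        if change < tol:
            break

    return freqs
```

For example, with three reads that match only the first of two haplotypes and
one read that matches only the second, the estimate converges to frequencies
of 0.75 and 0.25.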

Details on the method can be found in the paper:

Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data 
Darren Kessner; Tom Turner; John Novembre
2013 Molecular Biology and Evolution; doi: 10.1093/molbev/mst016

Link to Open Access article:
http://mbe.oxfordjournals.org/cgi/content/abstract/mst016

------------------------------------------------------------------------------------------

Running harp:

Running harp on the command line with no arguments will give usage information.
Arguments may be passed on the command line or in a configuration file.

harp can be run in two modes:

1) Single reference:  pooled reads have been mapped to a single reference, with
the assumption that the haplotypes represent strains that are identical to the
reference except for single-nucleotide variants.  In this mode, harp needs:
  - a single BAM file with mapped reads
  - a reference sequence in FASTA format
  - a SNP file in DGRP format, which is a comma-separated table listing variants
    by genomic position and strain (see test_files/sparse_snps.txt for an example).
    The included tool index_snp_table can be used to speed up access to this file.

2) Multiple reference:  pooled reads have been mapped to multiple references,
one for each strain/species of interest.  In this mode, harp needs:
  - a list of BAM files (one for each mapping)
  - a corresponding list of reference sequences in FASTA format

Typical usage:

1) Use the harp subfunction "like" or "like_multi" (for single or multiple
reference, respectively) to create an intermediate binary file (extension .hlk)
containing the computed haplotype likelihoods.

2) Use the harp subfunction "freq" on the .hlk file to run the frequency estimation.
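
To make the idea of a haplotype likelihood concrete, here is a simplified
Python sketch of a per-read likelihood in the single-reference setting.  It is
an assumption-laden illustration, not harp's actual model or the .hlk format:
it treats base calls as independent, converts Phred quality scores to error
probabilities, and spreads each error evenly over the three other bases.  The
function name read_likelihood and the data layout are hypothetical.

```python
def read_likelihood(read, haplotype):
    """Likelihood of one read given one haplotype (simplified model).

    read:      list of (position, base, phred_quality) tuples, covering
               only the variant positions the read overlaps (assumed layout).
    haplotype: dict mapping position -> allele carried by that haplotype.
    """
    likelihood = 1.0
    for position, base, quality in read:
        # Phred scale: Q = -10 * log10(error probability)
        error = 10 ** (-quality / 10.0)
        if haplotype[position] == base:
            likelihood *= 1.0 - error   # base call agrees with the haplotype
        else:
            likelihood *= error / 3.0   # miscall to one specific other base
    return likelihood


# Hypothetical example: one read covering two variant positions.
read = [(100, "A", 20), (105, "C", 30)]
hap_a = {100: "A", 105: "C"}  # matches the read at both positions
hap_b = {100: "G", 105: "C"}  # differs from the read at position 100

like_a = read_likelihood(read, hap_a)
like_b = read_likelihood(read, hap_b)
```

Here like_a is much larger than like_b, because explaining the read with the
second haplotype requires a sequencing error at a high-quality base.  A table
of such values, one row per read and one column per haplotype, is the kind of
input a frequency estimation step would consume.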

More details and examples may be found in harp_docs.pdf.

------------------------------------------------------------------------------------------

Building harp:

Source code for harp is hosted on Bitbucket:
https://bitbucket.org/dkessner/harp 

harp uses the samtools API for accessing BAM files.  Before building harp, you
will need to download and build samtools.  This can be done with the script
get_samtools.sh in the source directory.

harp makes extensive use of the Boost C++ libraries, as well as the Boost build
system, so you will need to install Boost before building harp.  To build, run
"bjam" from the project directory, which contains the Jamroot file with the
build instructions.  Executables will be placed in the "bin" subdirectory.