What are the different motif files?
Documentation for JASPAR motifs is available here.
- Jaspar.txt: JASPAR CORE motifs only
- Jasparfix.txt: JASPAR motifs, including the UniProt and other collections
Why does Jasparfix.txt contain only 1316 motifs?
The original set of 1331 motifs included 15 motifs that we generated ourselves. We chose not to include them because their quality is lower than that of the JASPAR motifs.
Which base is the 'coordinate' in the output?
PIQ reports the leftmost base of the motif (the base closest to zero, i.e. with the smallest coordinate) in its output.
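Because the reported coordinate is always the leftmost base, the span a motif covers follows from the coordinate and the motif width alone. The sketch below assumes you know the motif width (for example, from the length of its PWM); `motif_span` and `motif_center` are hypothetical helpers, not part of PIQ, and whether the coordinate is 0- or 1-based is not stated here, so the arithmetic simply stays in whatever base the output uses.

```python
def motif_span(left_coord: int, width: int) -> tuple:
    """Return the inclusive (first, last) positions covered by a motif.

    Assumes left_coord is the leftmost base (smallest coordinate), as PIQ
    reports it. Positions stay in the same base (0- or 1-based) as the input.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return left_coord, left_coord + width - 1


def motif_center(left_coord: int, width: int) -> float:
    """Return the midpoint of the motif span (may be a half-integer)."""
    first, last = motif_span(left_coord, width)
    return (first + last) / 2


# A 19 bp motif reported at coordinate 1000 covers positions 1000..1018.
print(motif_span(1000, 19))    # (1000, 1018)
print(motif_center(1000, 19))  # 1009.0
```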
Why does the call file include entries with purity < 0.7?
In the current version (v1.2), the caller outputs at least 50 binding sites for each TF regardless of purity. If you do not want this behavior, use the beta branch, which does not enforce this minimum.
In addition, purity is not strictly monotone in score. The caller picks the score cutoff that allows the largest number of sites with purity > 0.7, so some sites above that cutoff will still have purity < 0.7.
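To see why low-purity sites can survive, here is a minimal sketch of one simple reading of the cutoff rule; it is an illustration, not PIQ's actual implementation. It assumes each site is a hypothetical `(score, purity)` pair, places the cutoff at the lowest-scoring site whose purity exceeds the threshold (so every high-purity site is kept), and applies the v1.2 minimum of 50 sites. The function name and parameters are illustrative.

```python
from typing import List, Tuple

Site = Tuple[float, float]  # (score, purity); hypothetical representation


def select_sites(sites: List[Site],
                 purity_threshold: float = 0.7,
                 min_sites: int = 50) -> List[Site]:
    """Illustrative score-cutoff selection (not PIQ's code).

    Sites are ranked by score, highest first. The cutoff is placed at the
    lowest-ranked site whose purity exceeds purity_threshold, which keeps the
    largest number of high-purity sites. Because purity is not monotone in
    score, some kept sites can still fall below the threshold. At least
    min_sites sites are returned when available (as in v1.2).
    """
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)
    last_pure = -1  # index of the lowest-scoring site above the threshold
    for i, (_, purity) in enumerate(ranked):
        if purity > purity_threshold:
            last_pure = i
    n_keep = max(last_pure + 1, min(min_sites, len(ranked)))
    return ranked[:n_keep]


sites = [(9.0, 0.9), (8.0, 0.6), (7.0, 0.8), (6.0, 0.5), (5.0, 0.4)]
# Without the minimum, the site with score 8.0 (purity 0.6) is still kept,
# because it scores above the high-purity site at 7.0.
print(select_sites(sites, min_sites=0))
```

In this toy input, the site with purity 0.6 is kept only because a higher-purity site sits below it in the ranking, which is exactly the non-monotone case described above.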
Will PIQ work with FAIRE-seq or ATAC-seq?
We have observed that ATAC-seq data of the same coverage perform similarly to DNase-seq data. Some users at the NIH have reported success with FAIRE-seq, but only for factors with large effects on chromatin, such as CTCF or NRSF.
Note: PIQ does not apply the 4/5 bp offset to ATAC-seq reads. This does not matter if your goal is only bound/unbound calls at motif sites, because PIQ simply learns a TF binding footprint that is shifted by 4 or 5 bp depending on strand. However, if you are reading the diag.pdf output of PIQ and interpreting the position of the footprint relative to the motif start, you will need to shift each strand by the appropriate amount.
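If you do want to correct positions yourself, the per-strand shift is simple. The sketch below assumes the commonly used ATAC-seq convention of +4 bp for reads on the plus strand and -5 bp for reads on the minus strand, applied to each read's 5' end; this convention is not specified in this FAQ, and `shift_atac_cut` is a hypothetical helper, not part of PIQ.

```python
def shift_atac_cut(pos: int, strand: str) -> int:
    """Shift a read's 5' end position to the assumed Tn5 cut site.

    Assumed convention: +4 bp on the plus strand, -5 bp on the minus strand.
    PIQ itself does not apply this shift.
    """
    if strand == "+":
        return pos + 4
    if strand == "-":
        return pos - 5
    raise ValueError(f"unknown strand: {strand!r}")


print(shift_atac_cut(100, "+"))  # 104
print(shift_atac_cut(100, "-"))  # 95
```

Applying the shift per strand, rather than as one fixed offset, matches the point above that the learned footprint is displaced by a different amount on each strand.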
What about paired-end data?
Use pairedbam2rdata.r to make PIQ paired-end aware.