WREP - Workflow for REPertoire sequencing
Data analysis workflow for T- and B-cell receptor repertoire sequencing. The workflow identifies clones and calculates their frequency from sequence data (in fastq format) and includes steps for quality control and bias correction.
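To show what "calculating clone frequencies" means in practice, here is a minimal Python sketch that counts reads per clone and divides by the total. It assumes, for illustration only, that a clone is keyed by its (V gene, J gene, CDR3 amino acid sequence) triple. The function name `clone_frequencies` and the example reads are hypothetical, and this is not the workflow's own code.

```python
from collections import Counter


def clone_frequencies(assignments):
    """Return (clone, read_count, frequency) tuples, most abundant first.

    `assignments` is an iterable with one (v_gene, j_gene, cdr3_aa) tuple per
    read. Using that triple as the clone key is an assumption of this sketch.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    # most_common() sorts by count, highest first; an empty input yields [].
    return [(clone, n, n / float(total)) for clone, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up example reads, not real data.
    reads = [
        ("TRBV5-1", "TRBJ2-7", "CASSLGQAYEQYF"),
        ("TRBV5-1", "TRBJ2-7", "CASSLGQAYEQYF"),
        ("TRBV20-1", "TRBJ1-1", "CSARDRGNTEAFF"),
        ("TRBV5-1", "TRBJ2-7", "CASSLGQAYEQYF"),
    ]
    for clone, n, freq in clone_frequencies(reads):
        print(clone, n, round(freq, 2))
```

In a real repertoire, the raw counts would first go through the quality control and bias correction steps mentioned above.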
Paths to the required software are defined in execute-all.sh, run-fastqc.sh, batch-pear.sh and align-sequences.sh; adjust them to match your installation.
Required for V and J assignment
Download the reference sequences of the chains you are interested in (nucleotide sequences in fasta format) from the IMGT website.
- Select the species, gene type, functionality (functional) and locus in IMGT/GENE-DB
- Select all genes and download the "F+ORF+all P nucleotide sequences", store as e.g. TRBV_human.fasta and TRBJ_human.fasta
- Do this for both the variable (V) and joining (J) genes
- Build a BWA index and a samtools faidx index for each fasta file
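The indexing step above boils down to running `bwa index` and `samtools faidx` on every reference fasta file. The Python sketch below only assembles and (optionally) runs those two commands. It assumes both tools are on the PATH and uses the example file names from the steps above. The helper names `index_commands` and `build_indexes` are hypothetical and are not part of the workflow's scripts.

```python
import subprocess


def index_commands(fasta_path):
    """Return the commands that index one reference fasta file."""
    return [
        ["bwa", "index", fasta_path],       # writes .amb/.ann/.bwt/.pac/.sa next to the fasta
        ["samtools", "faidx", fasta_path],  # writes <fasta>.fai
    ]


def build_indexes(fasta_paths, dry_run=True):
    """Print the commands (dry_run=True) or execute them for each fasta file."""
    for path in fasta_paths:
        for cmd in index_commands(path):
            if dry_run:
                print(" ".join(cmd))
            else:
                # Assumes bwa and samtools are on PATH; raises CalledProcessError on failure.
                subprocess.check_call(cmd)


if __name__ == "__main__":
    # Dry run: only prints the commands for the example V and J reference files.
    build_indexes(["TRBV_human.fasta", "TRBJ_human.fasta"])
```

Start with `dry_run=True` to confirm the file list, then switch to `dry_run=False` to build the indexes.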
Required for CDR3 identification
Download the peptide sequences of the variable genes from IMGT (F+ORF+in-frame P amino acid sequences with IMGT gaps). Convert the downloaded fasta entries with the helper-ref-table.py script.
- Python 2.7
- future (print_function)
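The IMGT-gapped V gene peptides matter for CDR3 identification because, in IMGT numbering, every character of a gapped sequence (residue or '.') is one position. The conserved cysteine that opens the junction therefore always sits at IMGT position 104. The following hypothetical Python sketch is not helper-ref-table.py, whose output format is not described here. It reads such a fasta file and records, for each allele, the functionality and the residue at position 104. It assumes the usual IMGT header layout, where the second '|'-separated field is the allele name and the fourth is the functionality. The function names and the example records are made up.

```python
IMGT_CYS_POSITION = 104  # conserved 2nd-CYS, first residue of the IMGT junction


def read_fasta(lines):
    """Yield (header, sequence) pairs from fasta lines (sequences may be wrapped)."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cys104_table(lines):
    """Map allele -> (functionality, residue at IMGT position 104 or None)."""
    table = {}
    for header, seq in read_fasta(lines):
        fields = header.split("|")
        allele, functionality = fields[1], fields[3]  # assumed IMGT header layout
        if len(seq) >= IMGT_CYS_POSITION:
            residue = seq[IMGT_CYS_POSITION - 1]  # 1-based IMGT position -> 0-based index
        else:
            residue = None  # sequence too short to reach position 104
        table[allele] = (functionality, residue)
    return table


if __name__ == "__main__":
    # Synthetic records for illustration, not real IMGT entries.
    example = [
        ">X00000|TRBVX-1*01|Homo sapiens|F|V-REGION|",
        "." * 50 + "A" * 53 + "CASS",
        ">X00001|TRBVX-2*01|Homo sapiens|P|V-REGION|",
        "A" * 40,
    ]
    for allele, info in sorted(cys104_table(example).items()):
        print(allele, info)
```

An allele whose residue at position 104 is not 'C' (or that is too short) is a candidate to flag before it is used as a CDR3 anchor.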
How to run
List the paths to the input files (in fastq format) in the file SAMPLES. The other parameters must be set at the top of execute-all.sh.
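Before starting a long run, it can help to check that every path listed in SAMPLES exists. The Python sketch below does that under two assumptions that are not stated here: SAMPLES holds one path per line, and blank lines or lines starting with '#' can be ignored. The function names are hypothetical.

```python
import os


def parse_samples(lines):
    """Return the non-empty, non-comment entries from SAMPLES-style lines."""
    samples = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):  # comment handling is an assumption
            samples.append(line)
    return samples


def read_samples(path="SAMPLES"):
    """Read and parse a SAMPLES file (assumed: one fastq path per line)."""
    with open(path) as handle:
        return parse_samples(handle)


def missing_files(samples):
    """Return the listed paths that do not exist as regular files."""
    return [p for p in samples if not os.path.isfile(p)]


if __name__ == "__main__":
    if os.path.exists("SAMPLES"):
        missing = missing_files(read_samples())
        print("missing:", missing if missing else "none")
```

Running this check from the directory that contains SAMPLES catches typos in paths before any job is started.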
Job monitoring (on distributed resources)
Divide the samples over multiple (virtual) machines to process them in parallel. You can download a lightweight job monitoring tool HERE.
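One simple way to divide the samples is round-robin, which keeps the number of samples per machine within one of each other. The Python sketch below splits a sample list into per-machine batches. The function name is hypothetical, and writing each batch to its own SAMPLES file (for example SAMPLES.1, SAMPLES.2, renamed to SAMPLES on each machine) is an illustrative assumption, not a documented convention of the workflow.

```python
def split_samples(samples, n_machines):
    """Distribute samples round-robin over n_machines batches."""
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")
    batches = [[] for _ in range(n_machines)]
    for i, sample in enumerate(samples):
        batches[i % n_machines].append(sample)
    return batches


if __name__ == "__main__":
    samples = ["s1.fastq", "s2.fastq", "s3.fastq", "s4.fastq", "s5.fastq"]
    for i, batch in enumerate(split_samples(samples, 2), 1):
        # Hypothetical per-machine file name; each machine would use its batch as SAMPLES.
        print("SAMPLES.{0}: {1}".format(i, " ".join(batch)))
```

Round-robin ignores file size. If sample sizes differ a lot, assigning the largest files first to the least-loaded machine balances run time better.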
How to cite
Barbera D. C. van Schaik, Paul L. Klarenbeek, Marieke E. Doorenspleet, Sabrina Pollastro, Anne Musters, Giulia Balzaretti, Rebecca E. Esveldt, Frank Baas, Niek de Vries and Antoine H. C. van Kampen (2016) T- and B-cell Receptor Repertoire Sequencing: Quality Control and Clone Identification. In prep.
WREP - Workflow for REPertoire data analysis
Copyright (C) 2016 Barbera DC van Schaik
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
WREP has a new name:
RESEDA (REPertoire SEquencing Data Analysis)