WREP - Workflow for REPertoire sequencing

Data analysis workflow for T- and B-cell receptor repertoire sequencing.
The workflow identifies clones and calculates their frequency from sequence data (in fastq format) and includes steps for quality control and bias correction.
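As a rough illustration of what "calculating clone frequency" means, the sketch below counts identical reads and divides by the total. It assumes a clone is defined by the combination of V gene, J gene and CDR3 peptide; that definition, the function name clone_frequencies and the example reads are assumptions made for illustration, not taken from the workflow scripts.

```python
from __future__ import division, print_function
from collections import Counter


def clone_frequencies(reads):
    """Count identical (V, J, CDR3) tuples and return each clone's fraction."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}


# Hypothetical, hand-made assignments: (V gene, J gene, CDR3 peptide)
reads = [
    ("TRBV7-9", "TRBJ2-1", "CASSLGQGNEQFF"),
    ("TRBV7-9", "TRBJ2-1", "CASSLGQGNEQFF"),
    ("TRBV7-9", "TRBJ2-1", "CASSLGQGNEQFF"),
    ("TRBV20-1", "TRBJ1-2", "CSARDRGYGYTF"),
]

for clone, freq in sorted(clone_frequencies(reads).items()):
    print(clone, freq)
```

The real workflow also applies quality control and bias correction before counting, which this sketch leaves out.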

Workflow

[Figure: workflow diagram]

Required software

Paths to this software are defined in execute-all.sh, run-fastqc.sh, batch-pear.sh and align-sequences.sh.

Required data

Required for V and J assignment

Download the reference sequences of the chains you are interested in (nucleotide sequences in fasta format) from the IMGT website.

  • Select the species, gene type, functionality (functional) and locus in IMGT/GENE-DB
  • Select all genes and download the "F+ORF+all P nucleotide sequences"; store them as e.g. TRBV_human.fasta and TRBJ_human.fasta
  • Do this for both the variable and the joining genes
  • Build a BWA index and a samtools faidx index for each fasta file
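The indexing step in the last bullet can be sketched as follows. This assumes bwa and samtools are on the PATH and that the reference files carry the example names used above; the helper names index_commands and build_indexes are made up for this sketch. By default it only prints the commands.

```python
from __future__ import print_function
import subprocess


def index_commands(fasta):
    """Return the two indexing commands for one reference fasta file."""
    return [
        ["bwa", "index", fasta],       # writes .amb, .ann, .bwt, .pac, .sa next to the fasta
        ["samtools", "faidx", fasta],  # writes <fasta>.fai
    ]


def build_indexes(fasta_files, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for fasta in fasta_files:
        for cmd in index_commands(fasta):
            print(" ".join(cmd))
            if not dry_run:
                subprocess.check_call(cmd)  # raises if the tool fails


# Example file names from the list above; adjust to the chains you downloaded.
build_indexes(["TRBV_human.fasta", "TRBJ_human.fasta"])
```

Call build_indexes(..., dry_run=False) to actually run the tools once the fasta files are in place.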

Required for CDR3 identification

Download the peptide sequences of the variable genes ("F+ORF+in-frame P amino acid sequences with IMGT gaps") and convert the downloaded fasta entries with the helper-ref-table.py script.

Other requirements

  • Bash
  • Python 2.7
    • biopython
    • __future__ (print_function)
    • gzip
    • matplotlib
    • numpy
    • os
    • regex
    • sqlite3
    • sys
  • R
    • beeswarm

How to run

List the paths to the input files (in fastq format) in the file SAMPLES. The other parameters have to be set at the top of execute-all.sh.
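The exact layout of SAMPLES is not described here. The sketch below assumes the simplest possible form: one fastq path per line, with blank lines and lines starting with "#" ignored. That layout, the function name read_samples and the example paths are assumptions for illustration.

```python
from __future__ import print_function
import io
import os
import tempfile


def read_samples(path="SAMPLES"):
    """Return the fastq paths listed in the file, one per line."""
    with io.open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        # Skip empty lines and (assumed) comment lines
        return [line for line in lines if line and not line.startswith("#")]


# Write a small demo file with hypothetical paths, then read it back.
demo = os.path.join(tempfile.mkdtemp(), "SAMPLES")
with io.open(demo, "w", encoding="utf-8") as handle:
    handle.write(u"# hypothetical paths\n"
                 u"/data/run1/sampleA_R1.fastq.gz\n"
                 u"\n"
                 u"/data/run1/sampleA_R2.fastq.gz\n")

print(read_samples(demo))
```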

Job monitoring (on distributed resources)

Divide the samples over multiple (virtual) machines and run everything in parallel. You can download a lightweight job monitoring tool HERE.
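One simple way to divide samples over machines is to deal them out round-robin, so each machine gets a similar number of files. The sketch below shows that idea only; the function name split_samples and the file names are hypothetical, and how each list reaches its machine (for example as a per-machine SAMPLES file) is left open.

```python
from __future__ import print_function


def split_samples(samples, n_machines):
    """Deal the samples round-robin into n_machines lists."""
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")
    return [samples[i::n_machines] for i in range(n_machines)]


# Hypothetical file names
samples = ["s1.fastq.gz", "s2.fastq.gz", "s3.fastq.gz",
           "s4.fastq.gz", "s5.fastq.gz"]

for number, chunk in enumerate(split_samples(samples, 2), 1):
    print("machine", number, chunk)
```

Round-robin ignores file size; if the fastq files differ a lot in size, sorting them by size before splitting tends to balance the load better.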

How to cite

Barbera D. C. van Schaik, Paul L. Klarenbeek, Marieke E. Doorenspleet, Sabrina Pollastro, Anne Musters, Giulia Balzaretti, Rebecca E. Esveldt, Frank Baas, Niek de Vries and Antoine H. C. van Kampen (2016) T- and B-cell Receptor Repertoire Sequencing: Quality Control and Clone Identification. In prep.

License

WREP - Workflow for REPertoire data analysis
Copyright (C) 2016 Barbera DC van Schaik

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

WREP has a new name:

RESEDA (REPertoire SEquencing Data Analysis)
