What is Tango?

Tango stands for Toolkit for ANalysis of nGs Outputs. It is currently a collection of three command-line tools for short-read FASTQ preprocessing and analysis.

It was designed to help biologists work with Illumina data in the context of bacterial genomics.

SNIFER (SNp Illumina FindER)

SNIFER is a small tool that searches for SNPs in BAM-formatted files and displays information about them.

Usage:

-b	Alignment file in BAM format
-f	Reference file in FASTA format
-w	Window size used to calculate the average quality and depth around the mismatches
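
To illustrate what the -w option describes, here is a minimal Python sketch of averaging a per-position value (quality or depth) around a mismatch. It is not SNIFER's code: the function name `window_mean` is hypothetical, and the sketch assumes the window is centred on the mismatch and clipped at the ends of the reference, which this page does not specify.

```python
def window_mean(values, pos, window):
    """Average `values` around index `pos`.

    Assumption (not stated on this page): the window is centred on the
    mismatch, spans window // 2 positions on each side, and is clipped
    at the sequence boundaries.
    """
    half = window // 2
    start = max(0, pos - half)
    end = min(len(values), pos + half + 1)
    chunk = values[start:end]
    return sum(chunk) / len(chunk)


# Hypothetical per-position depth values with a mismatch at index 3.
depth = [10, 12, 11, 30, 9, 8, 10]
print(window_mean(depth, 3, 3))  # averages positions 2, 3 and 4
```

The same function can be applied to a list of per-position average base qualities.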

FIRE (FIlter for illumina REads)

FIRE is a full-featured read quality filter for FASTQ files.

Usage:

-s	Name of the single-end fastq file (-s single_file.fq)
-1	Name of the first paired-end fastq file (-1 paired_file_1.fq)
-2	Name of the second paired-end fastq file (-2 paired_file_2.fq)
-C	Convert quality values from Illumina pipeline v1.2 or higher to PHRED quality values
-d	Describe the reads: base composition by position; minimum, maximum, average, median, first and third quartiles, and interquartile range of quality by position; number of homopolymers
	If the option -P is not used, a read is considered a homopolymer if it is made exclusively of one nucleotide
	If the option -P is used, a read is considered a homopolymer if it contains the same nucleotide REMOVE_POLY or more times consecutively
-D	Describe the reads: same description as the -d option, but without the median and quartiles of quality per position
	This option is preferred to the -d option for very large data sets, since it consumes less memory
-g	Good quality threshold (bases with a quality lower than this value are treated as low quality bases)
	If this option is used without the options -l or -n, only the reads in which every nucleotide has a quality greater than or equal to this value are kept
	If this option is used with the option -l, the reads in which at least LENGHT_QPERCENT % of the length has a quality greater than or equal to this value are kept
	If this option is used with the option -n, the reads in which CONSECUTIVE_NUCLEOTIDES or more consecutive nucleotides have a quality greater than or equal to this value are kept
-l	Percentage of the read length that must have a quality greater than or equal to GOODQ_THRESHOLD for a read to be treated as a good quality read
	This option requires the option -g
-b	Bad quality threshold (bases with a quality lower than this value are treated as bad quality bases)
	This option requires the option -m
-m	Maximum number of nucleotides with quality lower than or equal to BADQ_THRESHOLD allowed in a read
	This option requires the option -b
-a	Reads with an average quality greater than or equal to this value are kept
-n	Number of consecutive nucleotides that must have a quality greater than or equal to GOODQ_THRESHOLD for a read to be treated as a good quality read
	This option requires the option -g
-c	Reads occurring more than COPY_NUMBER times are displayed
-P	Remove reads containing the same nucleotide (A, C, G or T) REMOVE_POLY or more times in a row
-N	Remove reads containing REMOVE_NS or more Ns
-t	Quality trimming threshold for the first bases of reads (all nucleotides at the start of a read with a quality lower than this threshold are trimmed)
-T	Quality trimming threshold for the last bases of reads (all nucleotides at the end of a read with a quality lower than this threshold are trimmed)
-L	Reads shorter than MINIMUM_READ_LENGTH are discarded
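
The rules described for -g, -l, -n, -t, -T and -L can be sketched in a few lines of Python. This is an illustration only, not FIRE's implementation: the function names and parameters are hypothetical, qualities are assumed to be already decoded PHRED integers, trimming is assumed to remove the leading and trailing runs of low quality bases, and -l and -n are treated as alternatives because this page does not say how they combine.

```python
def passes_good_quality(quals, good_q, length_percent=None, consecutive=None):
    """Apply the -g rule, optionally refined by -l or -n (hypothetical names)."""
    if not quals:
        return False
    good = [q >= good_q for q in quals]
    if length_percent is not None:
        # -g with -l: enough of the read length is good quality.
        return 100.0 * sum(good) / len(quals) >= length_percent
    if consecutive is not None:
        # -g with -n: a long enough run of consecutive good bases exists.
        run = best = 0
        for is_good in good:
            run = run + 1 if is_good else 0
            best = max(best, run)
        return best >= consecutive
    # -g alone: every base must be good quality.
    return all(good)


def trim_ends(quals, head_q=None, tail_q=None):
    """Return (start, end) slice bounds after -t / -T style trimming."""
    start, end = 0, len(quals)
    if head_q is not None:
        while start < end and quals[start] < head_q:
            start += 1
    if tail_q is not None:
        while end > start and quals[end - 1] < tail_q:
            end -= 1
    return start, end


# Hypothetical per-base PHRED qualities for one read.
quals = [2, 30, 35, 10, 32, 33, 34, 5]
start, end = trim_ends(quals, head_q=20, tail_q=20)
trimmed = quals[start:end]
min_read_length = 4  # stands in for MINIMUM_READ_LENGTH (-L)
keep = (len(trimmed) >= min_read_length
        and passes_good_quality(trimmed, 30, consecutive=3))
print(start, end, keep)
```

Returning slice bounds rather than a trimmed list lets the same bounds be applied to both the sequence and the quality string of a read.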

DUPLEX (DUPLicates EXpurger)

DUPLEX is a tool that searches for duplicated reads in FASTQ-formatted files.

Usage:

NAME
./duplex -- Remove duplicate reads

SYNOPSIS
	Usage: ./duplex -i <in.seq> [-z <int>] [-n <int>] [-e <double>] [-f] [-h]

DESCRIPTION
	INPUT: -i <in.fastq> a fastq file
	OUTPUT:
		<output_name>.fq.gz a fastq file of unique reads
		<output_name>_stats.txt.gz a simple tab-delimited file with the sequence of each read and its number of copies
	-z Compression level: with -z in [-1:9].
		 -1: Default compression
		  0: No compression
		  1: Best speed
		  9: Best compression
	-n Number of reads
	-e False positive rate: with -e in ]0:1[.
	-f Force. If the output files already exist, remove them and create new files without prompting for confirmation, regardless of their permissions.
	-h Display Usage.
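
As an illustration of the two outputs described above (the unique reads and the per-sequence copy counts), here is a minimal Python sketch. It is not DUPLEX's algorithm: it counts sequences exactly in memory, whereas the -n and -e options suggest that the tool itself may work differently, and the sketch does not attempt to reproduce that. The function names are hypothetical, and reads are assumed to be duplicates when their sequences are identical.

```python
from collections import Counter


def parse_fastq(lines):
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def deduplicate(records):
    """Keep the first read seen for each sequence and count the copies."""
    counts = Counter()
    unique = []
    for header, seq, qual in records:
        if counts[seq] == 0:
            unique.append((header, seq, qual))
        counts[seq] += 1
    return unique, counts


# Hypothetical three-read input containing one duplicated sequence.
fastq_lines = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "TTGA\n", "+\n", "IIII\n",
    "@read3\n", "ACGT\n", "+\n", "IIII\n",
]
unique, counts = deduplicate(parse_fastq(fastq_lines))
for seq, n in counts.items():
    print(f"{seq}\t{n}")  # same layout as the tab-delimited stats file
```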

Have fun!