What is Tango?
Tango stands for Toolkit for ANalysis of nGs Outputs. It is currently a collection of three command-line tools for preprocessing and analyzing short-read sequencing data.
It was designed to help biologists handle Illumina data in the context of bacterial genomics.
SNIFER (SNp Illumina FindER)
A small tool that searches for SNPs in BAM-formatted files and displays information about them.
Usage:
    -b  Alignment file in BAM format
    -f  Reference file in FASTA format
    -w  Window size used to calculate the average quality and depth around the mismatches
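The role of the -w window can be illustrated with a minimal Python sketch. It is not SNIFER's code: the function name `window_average`, the choice of a window centred on the mismatch, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def window_average(values, center, window):
    """Average `values` over `window` positions centred on `center`.

    Hypothetical helper: SNIFER's exact window definition is not documented
    here, so a centred window clipped at the sequence ends is assumed.
    """
    half = window // 2
    start = max(0, center - half)
    end = min(len(values), center + half + 1)
    return mean(values[start:end])


# Made-up per-position depth and mean base quality around a mismatch at index 3.
depth = [10, 12, 11, 30, 9, 10, 8]
quality = [30, 32, 31, 12, 33, 30, 29]
mismatch_pos = 3

print(window_average(depth, mismatch_pos, 3))
print(window_average(quality, mismatch_pos, 3))
```

A larger window smooths out local spikes in depth or drops in quality, while a smaller one reflects only the immediate neighbourhood of the mismatch.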
FIRE (FIlter for illumina REads)
A full-featured read quality filter for FASTQ files.
Usage:
    -s  Name of the single-end fastq file (-s single_file.fq)
    -1  Name of the first paired-end fastq file (-1 paired_file_1.fq)
    -2  Name of the second paired-end fastq file (-2 paired_file_2.fq)
    -C  Quality values from Illumina pipeline v1.2 or higher have to be converted to PHRED quality values
    -d  Describe the reads: base composition by position; minimum, maximum, average, median, quartiles 1 and 3 and interquartile range of quality by position; number of homopolymers
        If the option -P is not used, a read is considered a homopolymer if it is made exclusively of one nucleotide
        If the option -P is used, a read is considered a homopolymer if it contains the same nucleotide REMOVE_POLY or more consecutive times
    -D  Describe the reads: same description as the -d option but without the median and quartiles of quality per position
        This option is preferred to the -d option for very large data sets since it uses less memory
    -g  Good quality threshold (bases with a quality lower than this value are treated as low quality bases)
        If this option is used without the options -l or -n, only the reads for which all nucleotides have a quality greater than or equal to this value are kept
        If this option is used with the option -l, the reads for which LENGHT_QPERCENT % of their length has a quality greater than or equal to this value are kept
        If this option is used with the option -n, the reads for which CONSECUTIVE_NUCLEOTIDES or more nucleotides have a quality greater than or equal to this value are kept
    -l  Percentage of the read length that has to be of quality greater than or equal to GOODQ_THRESHOLD for a read to be treated as a good quality read
        This option requires the option -g
    -b  Bad quality threshold (bases with a quality lower than this value are treated as bad quality bases)
        This option requires the option -m
    -m  Maximum number of nucleotides with quality lower than or equal to BADQ_THRESHOLD allowed in a read
        This option requires the option -b
    -a  Reads with an average quality greater than or equal to this value are kept
    -n  Number of consecutive nucleotides that have to be of quality greater than or equal to GOODQ_THRESHOLD for a read to be treated as a good quality read
        This option requires the option -g
    -c  Reads occurring more than COPY_NUMBER times are displayed
    -P  Remove reads containing the same nucleotide (A, C, G or T) REMOVE_POLY times or more in a row
    -N  Remove reads containing Ns REMOVE_NS times or more
    -t  Quality trimming threshold for the first bases of reads (all nucleotides at the beginning of a read with a quality lower than this threshold are trimmed)
    -T  Quality trimming threshold for the last bases of reads (all nucleotides at the end of a read with a quality lower than this threshold are trimmed)
    -L  Reads shorter than MINIMUM_READ_LENGTH are discarded
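The -g, -l, -n, -t and -T rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described, not FIRE's implementation: the function names and arguments are hypothetical, qualities are assumed to be already decoded to PHRED integers, and "LENGHT_QPERCENT % of their length" is read as "at least that percentage".

```python
def passes_good_quality(quals, goodq, length_percent=None, consecutive=None):
    """Apply the -g rule to one read's PHRED qualities (hypothetical helper)."""
    if not quals:
        return False
    good = [q >= goodq for q in quals]
    if length_percent is not None:
        # -g with -l: enough of the read length must be of good quality.
        return 100.0 * sum(good) / len(quals) >= length_percent
    if consecutive is not None:
        # -g with -n: look for a long enough run of good-quality bases.
        run = best = 0
        for is_good in good:
            run = run + 1 if is_good else 0
            best = max(best, run)
        return best >= consecutive
    # -g alone: every base must be of good quality.
    return all(good)


def trim_ends(seq, quals, trim_start=0, trim_end=0):
    """Trim low-quality bases from both ends, as -t and -T are described."""
    start = 0
    while start < len(quals) and quals[start] < trim_start:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trim_end:
        end -= 1
    return seq[start:end], quals[start:end]


quals = [30, 30, 10, 30]
print(passes_good_quality(quals, 20))                     # -g 20
print(passes_good_quality(quals, 20, length_percent=75))  # -g 20 -l 75
print(passes_good_quality(quals, 20, consecutive=3))      # -g 20 -n 3
print(trim_ends("ACGT", [5, 30, 30, 5], 20, 20))          # -t 20 -T 20
```

The example read fails -g alone because of its single low-quality base, but it is kept once -l 75 relaxes the rule to a fraction of the read length.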
DUPLEX (DUPLicates EXpurger)
A tool that finds duplicated reads in FASTQ-formatted files.
Usage:
    NAME
        ./duplex -- Remove duplicates reads
    SYNOPSIS
        Usage: ./duplex -i <in.seq> [-z <int>] [-n <int>] [-e <double>] [-f] [-h]
    DESCRIPTION
        INPUT:
            -i <in.fastq>               a fastq file
        OUTPUT:
            <output_name>.fq.gz         a fastq file of unique reads
            <output_name>_stats.txt.gz  a simple tab delimited file with the sequence of a read and number of copy
        -z  Compression level: with -z in [-1:9].
                -1: Default compression
                 0: No compression
                 1: Best speed
                 9: Best compression
        -n  Reads Number
        -e  False positive rate: with -e in ]0:1[.
        -f  Force. If output files already exists, remove it and create a new file without prompting for confirmation regardless of its permissions.
        -h  Display Usage.
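The two outputs described above (unique reads and a per-sequence copy count) can be illustrated with a minimal Python sketch. It is not DUPLEX's algorithm, which is not described on this page: the -n and -e options suggest a probabilistic structure, whereas this sketch uses an exact in-memory counter. The function name `deduplicate_fastq` is hypothetical, and records are assumed to be standard 4-line FASTQ entries.

```python
from collections import Counter


def deduplicate_fastq(lines):
    """Return (unique_records, copy_counts) from an iterable of FASTQ lines.

    Exact, in-memory sketch: the first record seen for each sequence is kept,
    and every occurrence of the sequence is counted.
    """
    lines = [line.rstrip("\n") for line in lines]
    counts = Counter()
    unique = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if counts[seq] == 0:
            unique.append((header, seq, plus, qual))
        counts[seq] += 1
    return unique, counts


# Made-up reads: r1 and r2 share the same sequence.
reads = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "IIII",
    "@r3", "TTTT", "+", "IIII",
]
unique, counts = deduplicate_fastq(reads)

for record in unique:
    print("\n".join(record))
for seq, n in counts.items():
    print(f"{seq}\t{n}")  # tab-delimited, like the stats file described above
```

An exact counter keeps every distinct sequence in memory, which may be impractical for very large read sets.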
Have fun!