A series of wrappers for external calls to various bioinformatics tools.
A base class that handles generic wrapper functionality.
Wrappers for specific programs should inherit this class, call self.init to specify their name (which is a key into the executable entries in the BioLite configuration file), and append their arguments to the self.args list.
By convention, a wrapper should call self.run() as the final line in its __init__ function. This allows for clean syntax and use of the wrapper directly, without assigning it to a variable name, e.g.
wrappers.MyWrapper(arg1, arg2, ...)
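The convention above can be sketched with a stand-in base class. This is a minimal illustration, not BioLite's implementation: BaseWrapperSketch, CountLinesSketch and the command attribute are hypothetical names, and run() here only assembles the command line instead of launching a program.

```python
import shlex


class BaseWrapperSketch:
    """Hypothetical stand-in for BaseWrapper (not BioLite's code)."""

    def init(self, name):
        self.name = name  # key into the executable entries of the config
        self.args = []    # subclasses append their arguments here

    def run(self):
        # The real class would launch the program; this sketch only builds
        # the command line so the calling convention is visible.
        parts = [self.name] + self.args
        self.command = " ".join(shlex.quote(p) for p in parts)
        return self.command


class CountLinesSketch(BaseWrapperSketch):
    def __init__(self, *inputs, threads=None):
        self.init("count_lines")
        if threads:
            self.args += ["-t", str(threads)]
        self.args += list(inputs)
        self.run()  # by convention, the final line of __init__
```

Because run() is the last statement of __init__, constructing the object is enough to trigger the call, which is what allows the one-line usage shown above.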
When your wrapper runs, BaseWrapper will do the following:
A shortcut for calling the BaseWrapper __init__ from a subclass.
If value evaluates to True, append flag and value to the argument list.
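As a rough illustration of this behaviour, assuming the argument list is a plain Python list (the function name add_flag is hypothetical and used only for this sketch):

```python
def add_flag(args, flag, value):
    """Append flag and value to args only when value is truthy."""
    if value:
        args += [flag, str(value)]
    return args
```

Note that under this rule falsy values such as None, an empty string, or 0 are skipped, so a flag that must accept 0 would need to be appended explicitly.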
Indicates that this wrapper should use threading by appending an argument with the specified flag followed by the number of threads specified in the BioLite configuration file.
Indicates that this wrapper should use OpenMP by setting the $OMP_NUM_THREADS environment variable equal to the number of threads specified in the BioLite configuration file.
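The two threading styles can be contrasted in a short sketch. The function names are hypothetical, and the thread count is passed in directly rather than read from the BioLite configuration file.

```python
import os


def threaded_args(args, flag, threads):
    """Flag-based threading: append e.g. ['-p', '4'] to the arguments."""
    return args + [flag, str(threads)]


def openmp_env(threads, base_env=None):
    """OpenMP threading: return an environment with OMP_NUM_THREADS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)
    return env
```

The environment returned by openmp_env could be passed as the env argument of subprocess.Popen, leaving the argument list itself unchanged.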
Generates and logs a hash to distinguish this particular installation of the program (on a certain host, with a certain compiler, program version, etc.)
Specify the optional ‘binary’ argument if the wrapper name is not actually the program, e.g. if your program has a Perl wrapper script. Set ‘binary’ to the binary program that is likely to change between versions.
Specify the optional ‘cmd’ argument if the command to run for version information is different from what will be invoked by run (e.g. if the program has a Perl wrapper script, but you want to version an underlying binary executable).
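One way such a hash could be formed is sketched below. The choice of inputs (host name, binary path, version output) and of SHA-256 are assumptions made for illustration; the source does not state which fields or which hash function BioLite uses.

```python
import hashlib
import platform


def installation_hash(binary_path, version_output, host=None):
    """Hash the host, binary path and version text into one identifier."""
    host = platform.node() if host is None else host
    digest = hashlib.sha256()
    for part in (host, binary_path, version_output):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent fields cannot merge
    return digest.hexdigest()
```

Any change to one of the inputs, such as a new program version, yields a different hash, which is what makes the logged value useful for telling installations apart.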
For tools that need insert sizes, use available estimates from the diagnostics database, or resort to the default values in the BioLite configuration file.
Returns an AttributeDict with the fields mean, stddev and max.
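The fallback logic can be sketched as follows. The AttributeDict below is a stand-in written for this example, and the function name and dictionary inputs are hypothetical; the real lookup reads from the diagnostics database and the configuration file.

```python
class AttributeDict(dict):
    """Stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def insert_size(estimates, defaults):
    """Prefer available estimates, falling back to configured defaults."""
    result = AttributeDict(defaults)
    for key, value in (estimates or {}).items():
        if value is not None:
            result[key] = value
    return result
```

A caller can then write insert_size(estimates, defaults).mean regardless of whether the value came from an estimate or a default.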
Bases: biolite.wrappers.BaseWrapper
usage: count_lines [-t THREADS] [INPUT ...]
Count the number of lines in the INPUT files using multiple threads to increase throughput.
Bases: biolite.wrappers.BaseWrapper
usage: coverage [-i SAM] [-o STATS]
Parses a SAM alignment file and writes a coverage table to STATS with columns for the reference name, the length of the reference, and the number of reads covering it in the alignment.
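A minimal sketch of how such a table could be derived from SAM text, assuming that reference names and lengths come from the @SQ header lines and that a read counts toward a reference when that reference appears in its RNAME column. The actual tool may define coverage differently; the function name is hypothetical.

```python
def coverage_table(sam_lines):
    """Return (reference name, reference length, read count) rows."""
    lengths = {}
    counts = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header fields look like SN:<name> and LN:<length>.
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        rname = line.split("\t")[2]  # third column: reference name
        if rname != "*":             # "*" marks an unmapped read
            counts[rname] = counts.get(rname, 0) + 1
    return [(name, length, counts.get(name, 0))
            for name, length in lengths.items()]
```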
Bases: biolite.wrappers.BaseWrapper
usage: exclude -x EXCLUDE_FILE [-k] [...] [-i INPUT ...] [-o OUTPUT ...]
Filters all the reads in the input files (FASTA or FASTQ is automatically detected) and excludes those with ids found in any of the EXCLUDE_FILEs.
If multiple input files are specified, these are treated as paired files. So if a sequence in one input is excluded, its pair is also excluded from the same position in all other input files.
If the -k flag is specified, invert the selection to keep instead of exclude.
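The paired behaviour and the -k inversion can be sketched in memory. Here each input is assumed to be a list of (id, sequence) records in matching order, and exclude_paired is a hypothetical name; the real tool streams FASTA or FASTQ files.

```python
def exclude_paired(inputs, exclude_ids, keep=False):
    """Filter positionally paired record lists by a set of ids.

    A position is dropped from every input if any of its records has an
    excluded id; keep=True inverts the selection, like the -k flag.
    """
    outputs = [[] for _ in inputs]
    for records in zip(*inputs):
        hit = any(rec_id in exclude_ids for rec_id, _ in records)
        if hit == keep:
            for out, rec in zip(outputs, records):
                out.append(rec)
    return outputs
```

Deciding once per position, rather than once per file, is what keeps the output files in step with each other.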
Bases: biolite.wrappers.BaseWrapper
Converts each FASTQ input file to a FASTA file and quality score file with the names <basename>.fasta and <basename>.fasta.qual, where <basename> is the name of INPUT up to the last period (or with the names FASTA and QUAL if specified).
FASTA and QUAL are appended to (not truncated).
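The conversion and the naming rule can be sketched as below. The sketch assumes four-line FASTQ records, a quality offset of 33, and a QUAL layout of space-separated integer scores; none of these details are stated in the text above, and the function names are hypothetical.

```python
import os


def output_names(input_path):
    """Derive <basename>.fasta and <basename>.fasta.qual from INPUT."""
    base = os.path.splitext(input_path)[0]  # name up to the last period
    return base + ".fasta", base + ".fasta.qual"


def fastq_to_fasta_qual(fastq_lines, offset=33):
    """Return (FASTA text, QUAL text) for four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    fasta, qual = [], []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _, quals = lines[i:i + 4]
        name = header[1:]  # drop the leading "@"
        scores = " ".join(str(ord(c) - offset) for c in quals)
        fasta.append(">%s\n%s\n" % (name, seq))
        qual.append(">%s\n%s\n" % (name, scores))
    return "".join(fasta), "".join(qual)
```

To match the append behaviour described above, the returned text would be written to files opened in mode "a" rather than "w".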
Bases: biolite.wrappers.BaseWrapper
usage: fasta2fastq -i FASTA [...] -q QUAL [...] [-o FASTQ] [-a] [-t OFFSET]
Merges each FASTA file and its corresponding QUAL file into a FASTQ file with the name <basename>.fastq, where <basename> is the FASTA name up to the last period (or with the name FASTQ if specified).
FASTQ is appended to (not truncated).
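The merge can be sketched as the inverse operation. The sketch assumes single-line sequences, records in the same order in both files, space-separated integer scores in the QUAL file, and that OFFSET is the value added to each score before it is encoded as a character; these are assumptions, and the function name is hypothetical.

```python
def fasta_qual_to_fastq(fasta_lines, qual_lines, offset=33):
    """Merge FASTA and QUAL records into FASTQ text."""
    fasta = [line.strip() for line in fasta_lines if line.strip()]
    qual = [line.strip() for line in qual_lines if line.strip()]
    out = []
    for i in range(0, len(fasta) - 1, 2):
        name, seq = fasta[i][1:], fasta[i + 1]
        # Encode each numeric score as one character.
        encoded = "".join(chr(int(s) + offset) for s in qual[i + 1].split())
        out.append("@%s\n%s\n+\n%s\n" % (name, seq, encoded))
    return "".join(out)
```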
Bases: biolite.wrappers.BaseWrapper
Filters out low-quality and adapter-contaminated reads from Illumina data.
If multiple input files are specified, these are treated as paired files. So if a sequence in one input is filtered, its pair is also filtered from the same position in all other input files.
Bases: biolite.wrappers.BaseWrapper
usage: interleave -i INPUT [...] [-o OUTPUT] [-s SEP]
Interleaves the records in the input files (FASTA or FASTQ is automatically detected) and writes them to OUTPUT, or to stdout if no OUTPUT is specified.
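Interleaving amounts to taking one record from each input in turn. A minimal in-memory sketch, assuming each input is already parsed into a list of records (the function name is hypothetical, and the SEP option is not modelled):

```python
from itertools import chain


def interleave(*inputs):
    """Alternate records from each input: a1, b1, a2, b2, ..."""
    return list(chain.from_iterable(zip(*inputs)))
```

Because zip stops at the shortest input, trailing records of a longer input are dropped in this sketch; the real tool may handle unequal inputs differently.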
Bases: biolite.wrappers.BaseWrapper
usage: randomize [-i INPUT] [-o OUTPUT] [-r READ-ORDER] [-w WRITE-ORDER]
Randomizes the order of sequences in each INPUT file and writes these to a corresponding OUTPUT file. By default, a new random write order is generated and saved to WRITE-ORDER, if specified. Alternatively, specifying a READ-ORDER file uses that order instead of a random one.
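The relationship between the write order and the read order can be sketched as follows. The order is modelled as a list of record indices, which is an assumption about what the order files contain; the function name and the seed parameter are hypothetical.

```python
import random


def randomize(records, read_order=None, seed=None):
    """Return (reordered records, order used).

    With no read_order, a new random permutation is generated; otherwise
    the supplied order is reused.
    """
    if read_order is None:
        order = list(range(len(records)))
        random.Random(seed).shuffle(order)
    else:
        order = list(read_order)
    return [records[i] for i in order], order
```

Saving the order from one file and reusing it for its mate keeps paired files aligned after shuffling.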
Bases: biolite.wrappers.BaseWrapper
usage: insert_stats -i SAM -o HIST -m MAX_INSERT
Reads a SAM alignment file and uses it to estimate the mean and standard deviation of the insert size of the mapped paired-end reads. A histogram of all insert sizes encountered is written to the HIST file.
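A rough sketch of such an estimate, assuming the insert size is taken from the TLEN column (the ninth SAM field), that only positive values are used so each pair is counted once, and that MAX_INSERT is an upper cutoff. These rules are assumptions for illustration and the function name is hypothetical.

```python
import math
from collections import Counter


def insert_stats(sam_lines, max_insert):
    """Return (mean, stddev, histogram) of insert sizes from SAM text."""
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        tlen = int(fields[8])  # TLEN: signed observed template length
        if 0 < tlen <= max_insert:
            hist[tlen] += 1
    n = sum(hist.values())
    if n == 0:
        return None, None, hist
    mean = sum(size * c for size, c in hist.items()) / n
    var = sum(c * (size - mean) ** 2 for size, c in hist.items()) / n
    return mean, math.sqrt(var), hist
```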
Bases: biolite.wrappers.BaseWrapper
usage: pileup_stats -i PILEUP -o HIST -m OVERLAP
Reads a SAMtools pileup file and uses it to find potential sequence disconnects. A histogram of all disconnect events encountered is written to the HIST file.
Bases: biolite.wrappers.BaseWrapper
A wrapper for FastQC. http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
Bases: biolite.wrappers.BaseWrapper
A wrapper for dustmasker from NCBI Blast+. http://nebc.nerc.ac.uk/bioinformatics/docs/blast+.html
Bases: biolite.wrappers.BaseWrapper
A wrapper for segmasker from NCBI Blast+. http://nebc.nerc.ac.uk/bioinformatics/docs/blast+.html
Bases: biolite.wrappers.BaseWrapper
A wrapper for blastn from NCBI Blast. http://blast.ncbi.nlm.nih.gov/
Bases: biolite.wrappers.BaseWrapper
A wrapper for blastn from NCBI Blast. http://blast.ncbi.nlm.nih.gov/
Bases: biolite.wrappers.BaseWrapper
A wrapper for blastx from NCBI Blast. http://blast.ncbi.nlm.nih.gov/
Bases: biolite.wrappers.BaseWrapper
A wrapper for blastn from NCBI Blast. http://blast.ncbi.nlm.nih.gov/
Bases: biolite.wrappers.BaseWrapper
usage: multiblast BLAST THREADS QUERY_LIST OUT [ARGS]
Runs a Blast PROGRAM (the BLAST argument in the usage line, e.g. blastx, blastn, blastp) in parallel on the list of query files given in QUERY_LIST. Additional arguments to PROGRAM can be appended as ARGS.
The PROGRAM is called on each query with threading equal to THREADS. Recommendation: set THREADS to the number of cores divided by the number of query files.
The individual XML outputs for each query file are concatenated into a single output file OUT.
Example usage: multiblast blastn 4 "query1.fa query2.fa" all-queries.xml -e 1e-6
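The thread recommendation and the argument order from the usage line can be combined in a small sketch. The helper name multiblast_command and the cores parameter are hypothetical; the sketch only assembles an argument list and does not run anything.

```python
def multiblast_command(program, cores, query_files, out, extra_args=()):
    """Build a multiblast argument list following the usage line.

    THREADS is set to cores divided by the number of query files, with a
    minimum of one thread per query.
    """
    threads = max(1, cores // len(query_files))
    # QUERY_LIST is passed as a single space-separated argument.
    query_list = " ".join(query_files)
    return ["multiblast", program, str(threads), query_list, out,
            *extra_args]
```

On an 8-core machine with two query files this reproduces the example usage above, with 4 threads per query.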
Bases: biolite.wrappers.BaseWrapper
A wrapper for makeblastdb from NCBI Blast. http://blast.ncbi.nlm.nih.gov/
Bases: biolite.wrappers.BaseWrapper
A wrapper for the bowtie2 short-read aligner. http://bowtie-bio.sourceforge.net/
Bases: biolite.wrappers.BaseWrapper
A wrapper for bowtie2-build component of Bowtie2. http://bowtie-bio.sourceforge.net/
Bases: biolite.wrappers.BaseWrapper
Bases: biolite.wrappers.BaseWrapper
Bases: biolite.wrappers.BaseWrapper
Bases: biolite.wrappers.BaseWrapper
Bases: biolite.wrappers.BaseWrapper
Bases: biolite.wrappers.BaseWrapper
Bases: biolite.wrappers.BaseWrapper
Bases: biolite.wrappers.BaseWrapper
A wrapper for Oases, a de novo transcriptome assembler. http://www.ebi.ac.uk/~zerbino/oases/
Bases: biolite.wrappers.BaseWrapper
A wrapper for the velveth component of the Velvet de novo assembler. http://www.ebi.ac.uk/~zerbino/velvet/
If merge is True, input_path must be a list of transcript FASTA files. Otherwise, input_path should be a single FASTQ filename containing shuffled short reads or a list of FASTQ files where the first two form a paired file and the third is unpaired short reads.
Bases: biolite.wrappers.BaseWrapper
A wrapper for the velvetg component of the Velvet de novo assembler. http://www.ebi.ac.uk/~zerbino/velvet/
Bases: biolite.wrappers.BaseWrapper
Multiple alignment of coding sequences.
Bases: biolite.wrappers.BaseWrapper
Multiple alignment of coding sequences, run in parallel.
Bases: biolite.wrappers.BaseWrapper
Maximum Likelihood based inference of phylogenetic trees.
Bases: biolite.wrappers.BaseWrapper
Selection of conserved blocks from multiple sequence alignments for phylogenetics.
Bases: biolite.wrappers.BaseWrapper
Analysis of networks.
Bases: biolite.wrappers.BaseWrapper
GNU parallel utility. http://www.gnu.org/software/parallel/
Bases: biolite.wrappers.BaseWrapper
String Graph Assembler