Calling external tools

wrappers Module

A series of wrappers for external calls to various bioinformatics tools.

class biolite.wrappers.BaseWrapper(name)[source]

A base class that handles generic wrapper functionality.

Wrappers for specific programs should inherit this class, call self.init to specify their name (which is a key into the executable entries in the BioLite configuration file), and append their arguments to the self.args list.

By convention, a wrapper should call self.run() as the final line in its __init__ function. This allows for clean syntax and use of the wrapper directly, without assigning it to a variable name, e.g.

wrappers.MyWrapper(arg1, arg2, ...)
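The convention above can be sketched as follows. To keep the example runnable without BioLite installed, SketchBaseWrapper is a hypothetical stand-in that records the command line instead of executing it; it is not BioLite's BaseWrapper. The program name 'mytool' and its -i/-o flags are made up for illustration.

```python
class SketchBaseWrapper(object):
    """Hypothetical stand-in for BaseWrapper: records commands, runs nothing."""

    calls = []  # every command line "run" by any wrapper in this sketch

    def init(self, name):
        # In BioLite the name is a key into the executable entries of the
        # configuration file; here it is simply used as the program name.
        self.name = name
        self.args = []

    def run(self):
        SketchBaseWrapper.calls.append([self.name] + [str(a) for a in self.args])


class MyWrapper(SketchBaseWrapper):
    """Illustrative wrapper for a made-up program called 'mytool'."""

    def __init__(self, input_path, output_path):
        self.init('mytool')
        self.args += ['-i', input_path, '-o', output_path]
        self.run()  # final line of __init__, following the convention


# Used directly, without assigning the instance to a variable.
MyWrapper('reads.fastq', 'out.txt')
```

Because run() is the last statement of __init__, constructing the wrapper is the same as invoking the tool.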

When your wrapper runs, BaseWrapper will do the following:

  • log the complete command line to diagnostics;
  • optionally call the program with a version flag (invoked with version) to obtain a version string, then log this to the programs Table along with a hash of the binary executable file;
  • append the command’s stderr to a file called name.log in the CWD;
  • also append the command’s stdout to the same log file, unless you set self.stdout, in which case stdout is redirected to a file of that name;
  • on Linux, add a memory profiling library to the LD_PRELOAD environment variable;
  • call the command and check its return code (which should be 0 on success, unless you specify a different code with self.return_ok), optionally using the CWD specified in self.cwd or the environment specified in self.env;
  • parse the stderr of the command to find [biolite.profile] markers and use the rusage values from utils.safe_call to populate a profile entity in the diagnostics with walltime, usertime, systime, mem, and vmem attributes.
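A few of the steps listed above (appending stderr and stdout to name.log, redirecting stdout on request, and checking the return code) can be sketched with the standard library alone. This is a simplified illustration, not BioLite's implementation: run_and_log and its parameters are hypothetical names, and the diagnostics logging, version hashing and memory profiling steps are left out.

```python
import subprocess


def run_and_log(name, args, return_ok=0, stdout_path=None, cwd=None, env=None):
    """Run a command, append its output to <name>.log, check the return code."""
    log_path = name + '.log'
    with open(log_path, 'a') as log:
        # stdout goes to its own file if one is requested,
        # otherwise it shares the log file with stderr.
        out = open(stdout_path, 'w') if stdout_path else log
        try:
            code = subprocess.call(args, stdout=out, stderr=log, cwd=cwd, env=env)
        finally:
            if out is not log:
                out.close()
    if code != return_ok:
        raise RuntimeError('%s exited with code %d' % (name, code))
    return log_path
```

Opening the log in append mode means repeated calls with the same name accumulate in one file, which matches the "append" wording in the list.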

init(name)

A shortcut for calling the BaseWrapper __init__ from a subclass.

check_arg(flag, value)[source]

If value evaluates to True, append flag and value to the argument list.
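A minimal sketch of this behaviour, written as a free function over an explicit list rather than as the BaseWrapper method (converting the value with str() is an assumption made here so the list can be passed to a subprocess):

```python
def check_arg(args, flag, value):
    """Append flag and value to args only when value is truthy."""
    if value:
        args += [flag, str(value)]
    return args


args = []
check_arg(args, '-q', 20)    # appended
check_arg(args, '-n', None)  # skipped: None is falsy
check_arg(args, '-t', 0)     # also skipped: 0 is falsy
print(args)  # ['-q', '20']
```

If the check is an ordinary truth test, as in this sketch, a legitimate value of 0 or an empty string is silently dropped, so such values would have to be appended to the argument list directly.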

add_threading(flag)[source]

Indicates that this wrapper should use threading by appending an argument with the specified flag followed by the number of threads specified in the BioLite configuration file.

add_openmp()[source]

Indicates that this wrapper should use OpenMP by setting the $OMP_NUM_THREADS environment variable equal to the number of threads specified in the BioLite configuration file.
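The two threading styles can be illustrated side by side. In this sketch the thread count is passed in explicitly, whereas the methods above read it from the BioLite configuration file; threading_args and openmp_env are hypothetical helper names.

```python
import os


def threading_args(flag, threads):
    """Command-line style: the flag followed by the thread count."""
    return [flag, str(threads)]


def openmp_env(threads, base_env=None):
    """Environment style: a copy of the environment with OMP_NUM_THREADS set."""
    env = dict(os.environ if base_env is None else base_env)
    env['OMP_NUM_THREADS'] = str(threads)
    return env


print(threading_args('-p', 4))                    # ['-p', '4']
print(openmp_env(8, base_env={})['OMP_NUM_THREADS'])  # 8
```

Copying the environment rather than modifying os.environ keeps the setting local to the wrapped command.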

version(flag=None, cmd=None, path=None)[source]

Generates and logs a hash to distinguish this particular installation of the program (on a certain host, with a certain compiler, program version, etc.)

Specify the optional ‘binary’ argument if the wrapper name is not actually the program, e.g. if your program has a Perl wrapper script. Set ‘binary’ to the binary program that is likely to change between versions.

Specify the optional ‘cmd’ argument if the command to run for version information is different from the one that will be invoked by run (e.g. if the program has a Perl wrapper script, but you want to version an underlying binary executable).
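Hashing the executable file is what distinguishes two installations that report the same version string. A sketch of that step, assuming SHA-256 (the hash function BioLite actually uses is not stated here) and a hypothetical helper name hash_binary:

```python
import hashlib


def hash_binary(path, chunk_size=1 << 20):
    """Return a hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()  # assumption: any stable hash would serve
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat even for large statically linked binaries.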

version_jar()[source]

Special case of version() when the executable is a JAR file.

run(cmd=None)[source]

Call this function at the end of your class’s __init__ function.

run_jar(mem=None)[source]

Special case of run() when the executable is a JAR file.

biolite.wrappers.estimate_insert_size()[source]

For tools that need insert sizes, use available estimates from the diagnostics database, or resort to the default values in the BioLite configuration file.

Returns an AttributeDict with the fields mean, stddev and max.

class biolite.wrappers.CountLines(*inputs)[source]

Bases: biolite.wrappers.BaseWrapper

usage: count_lines [-t THREADS] [INPUT ...]

Count the number of lines in the INPUT files using multiple threads to increase throughput.

class biolite.wrappers.Coverage(sam, stats)[source]

Bases: biolite.wrappers.BaseWrapper

usage: coverage [-i SAM] [-o STATS]

Parses a SAM alignment file and writes a coverage table to STATS with columns for the reference name, the length of the reference, and the number of reads covering it in the alignment.
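As an illustration of what such a table contains, the sketch below builds the three columns from SAM text in pure Python. It is not the coverage tool itself: it assumes that reference lengths come from the @SQ header lines and that a read "covers" the reference named in its RNAME field; coverage_table is a hypothetical name.

```python
from collections import OrderedDict


def coverage_table(sam_lines):
    """Return (reference name, reference length, read count) rows."""
    lengths = OrderedDict()
    counts = {}
    for line in sam_lines:
        line = line.rstrip('\n')
        if not line:
            continue
        fields = line.split('\t')
        if line.startswith('@'):
            # Header section: only @SQ lines describe reference sequences.
            if fields[0] == '@SQ':
                tags = dict(f.split(':', 1) for f in fields[1:])
                lengths[tags['SN']] = int(tags['LN'])
                counts.setdefault(tags['SN'], 0)
            continue
        rname = fields[2]  # third SAM column; '*' marks an unmapped read
        if rname != '*':
            counts[rname] = counts.get(rname, 0) + 1
    return [(name, length, counts[name]) for name, length in lengths.items()]
```

References with no aligned reads still appear in the output with a count of 0, since every @SQ entry is registered up front.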

class biolite.wrappers.Exclude(exclude_files, input_files, output_files, keep=False)[source]

Bases: biolite.wrappers.BaseWrapper

usage: exclude -x EXCLUDE_FILE [-k] [...] [-i INPUT ...] [-o OUTPUT ...]

Filters all the reads in the input files (FASTA or FASTQ is automatically detected) and excludes those with ids found in any of the EXCLUDE_FILEs.

If multiple input files are specified, these are treated as paired files. So if a sequence in one input is excluded, its pair is also excluded from the same position in all other input files.

If the -k flag is specified, invert the selection to keep instead of exclude.
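The paired behaviour can be sketched on in-memory records. Here each input is a list of (id, sequence) tuples aligned by position, and a position is treated as a hit if any of its ids is listed; that matching rule and the name exclude_paired are assumptions for illustration, not a description of the exclude tool's internals.

```python
def exclude_paired(inputs, exclude_ids, keep=False):
    """Filter position-aligned record lists by id, keeping pairs in sync."""
    exclude_ids = set(exclude_ids)
    outputs = [[] for _ in inputs]
    for records in zip(*inputs):
        hit = any(rec_id in exclude_ids for rec_id, _ in records)
        # keep=False retains the non-hits; keep=True inverts the selection.
        if hit == keep:
            for out, rec in zip(outputs, records):
                out.append(rec)
    return outputs


r1 = [('a/1', 'AC'), ('b/1', 'GT'), ('c/1', 'TT')]
r2 = [('a/2', 'CA'), ('b/2', 'TG'), ('c/2', 'AA')]
print(exclude_paired([r1, r2], ['b/2']))
# [[('a/1', 'AC'), ('c/1', 'TT')], [('a/2', 'CA'), ('c/2', 'AA')]]
```

Although only b/2 is listed, its mate b/1 is dropped from the first file as well, so the outputs stay aligned.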

class biolite.wrappers.Fastq2Fasta(fastq_path, fasta_path=None, qual_path=None, suffix=None)[source]

Bases: biolite.wrappers.BaseWrapper

usage: fastq2fasta -i FASTQ [...] [-o FASTA ...] [-q QUAL ...] [-a]
[-t OFFSET] [-s SUFFIX]

Converts each FASTQ input file to a FASTA file and quality score file with the names <basename>.fasta and <basename>.fasta.qual, where <basename> is the name of INPUT up to the last period (or with the names FASTA and QUAL if specified).

FASTA and QUAL are appended to (not truncated).
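For readers unfamiliar with the conversion, the following sketch splits four-line FASTQ records into FASTA lines and quality lines. It assumes the quality file holds space-separated numeric Phred scores and that OFFSET is the ASCII offset of the quality encoding (33 here); neither detail is confirmed by the usage text above, and fastq_to_fasta is a hypothetical name.

```python
def fastq_to_fasta(fastq_lines, offset=33):
    """Return (fasta_lines, qual_lines) for four-line FASTQ records."""
    lines = [line.rstrip('\n') for line in fastq_lines]
    fasta, qual = [], []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _, quality = lines[i:i + 4]
        name = header[1:]  # drop the leading '@'
        fasta += ['>' + name, seq]
        scores = ' '.join(str(ord(c) - offset) for c in quality)
        qual += ['>' + name, scores]
    return fasta, qual


fasta, qual = fastq_to_fasta(['@r1', 'ACGT', '+', 'IIII'])
print(fasta)  # ['>r1', 'ACGT']
print(qual)   # ['>r1', '40 40 40 40']
```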

class biolite.wrappers.Fasta2Fastq(fasta_path, qual_path, fastq_path=None)[source]

Bases: biolite.wrappers.BaseWrapper

usage: fasta2fastq -i FASTA [...] -q QUAL [...] [-o FASTQ] [-a] [-t OFFSET]

Merges each FASTA file and its corresponding QUAL file into a FASTQ file with the name <basename>.fastq, where <basename> is the FASTA name up to the last period (or with the name FASTQ if specified).

FASTQ is appended to (not truncated).

class biolite.wrappers.FilterIllumina(inputs, outputs, unpaired_output=None, offset=None, quality=None, nreads=None, adapters=True, bases=True, sep=None)[source]

Bases: biolite.wrappers.BaseWrapper

usage: filter_illumina [-i INPUT ...] [-o OUTPUT ...] [-u UNPAIRED-OUTPUT]
[-t OFFSET] [-q QUALITY] [-n NREADS] [-a] [-b] [-s SEP]

Filters out low-quality and adapter-contaminated reads from Illumina data.

If multiple input files are specified, these are treated as paired files. So if a sequence in one input is filtered, its pair is also filtered from the same position in all other input files.

class biolite.wrappers.Interleave(inputs, output, sep=None)[source]

Bases: biolite.wrappers.BaseWrapper

usage: interleave -i INPUT [...] [-o OUTPUT] [-s SEP]

Interleaves the records in the input files (FASTA or FASTQ is automatically detected) and writes them to OUTPUT, or to stdout if no OUTPUT is specified.
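Interleaving takes the first record of every input, then the second of every input, and so on. A sketch on already-parsed records (the function name is hypothetical, and record parsing and format detection are omitted):

```python
from itertools import chain


def interleave_records(*inputs):
    """Alternate records from each input, position by position."""
    return list(chain.from_iterable(zip(*inputs)))


print(interleave_records(['a/1', 'b/1'], ['a/2', 'b/2']))
# ['a/1', 'a/2', 'b/1', 'b/2']
```

Note that zip stops at the shortest input, so this sketch silently drops trailing records when the inputs differ in length.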

class biolite.wrappers.Randomize(input, output, order_mode=None, order_file='order.txt')[source]

Bases: biolite.wrappers.BaseWrapper

usage: randomize [-i INPUT] [-o OUTPUT] [-r READ-ORDER] [-w WRITE-ORDER]

Randomizes the order of sequences in each INPUT file and writes these to a corresponding OUTPUT file. By default, a new random write order is generated and saved to WRITE-ORDER, if specified. Alternatively, specifying a READ-ORDER file uses that order instead of a random one.

class biolite.wrappers.InsertStats(input, histogram=None, histogram_max=None)[source]

Bases: biolite.wrappers.BaseWrapper

usage: insert_stats -i SAM -o HIST -m MAX_INSERT

Reads a SAM alignment file and uses it to estimate the mean and std. dev. of the insert size of the mapped paired-end reads. A histogram of all insert sizes encountered is written to the HIST file.
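The estimate can be illustrated with a short pure-Python sketch. It assumes the insert size is taken from the TLEN field (ninth SAM column), counts each pair once by using only positive values up to the maximum, and reports a population standard deviation; these choices and the name insert_stats_sketch are assumptions, not a description of the insert_stats tool.

```python
import math
from collections import Counter


def insert_stats_sketch(sam_lines, max_insert=1000):
    """Return (mean, stddev, histogram) of positive TLEN values."""
    hist = Counter()
    for line in sam_lines:
        if line.startswith('@'):
            continue  # skip header lines
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9:
            continue  # not a complete alignment line
        tlen = int(fields[8])
        if 0 < tlen <= max_insert:
            hist[tlen] += 1
    n = sum(hist.values())
    if n == 0:
        return None, None, hist
    mean = sum(size * count for size, count in hist.items()) / float(n)
    var = sum(count * (size - mean) ** 2 for size, count in hist.items()) / float(n)
    return mean, math.sqrt(var), hist
```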

class biolite.wrappers.PileupStats(input, histogram=None, overlap=None)[source]

Bases: biolite.wrappers.BaseWrapper

usage: pileup_stats -i PILEUP -o HIST -m OVERLAP

Reads a SAMtools pileup file and uses it to find potential sequence disconnects. A histogram of all disconnect events encountered is written to the HIST file.

class biolite.wrappers.FastQC(input, outdir)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for FastQC. http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

class biolite.wrappers.Dustmasker(input, output, window=None, level=None, linker=None, infmt='fasta', outfmt='fasta')[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for dustmasker from NCBI Blast+. http://nebc.nerc.ac.uk/bioinformatics/docs/blast+.html

class biolite.wrappers.Segmasker(input, output, window=None, locut=None, hicut=None, infmt='fasta', outfmt='fasta')[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for segmasker from NCBI Blast+. http://nebc.nerc.ac.uk/bioinformatics/docs/blast+.html

class biolite.wrappers.Blastn(query, db, out, outfmt=5, evalue=0.0001, targets=20)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for blastn from NCBI Blast. http://blast.ncbi.nlm.nih.gov/

class biolite.wrappers.Blastp(query, db, out, outfmt=5, evalue=0.0001, targets=20)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for blastp from NCBI Blast. http://blast.ncbi.nlm.nih.gov/

class biolite.wrappers.Blastx(query, db, out, outfmt=5, evalue=0.0001, targets=20)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for blastx from NCBI Blast. http://blast.ncbi.nlm.nih.gov/

class biolite.wrappers.Rpsblast(query, db, out, outfmt=5, evalue=0.0001)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for rpsblast from NCBI Blast. http://blast.ncbi.nlm.nih.gov/

class biolite.wrappers.MultiBlast(blast, threads, qlist, db, out, evalue=0.0001, targets=20)[source]

Bases: biolite.wrappers.BaseWrapper

usage: multiblast BLAST THREADS QUERY_LIST OUT [ARGS]

Runs the Blast program BLAST (e.g. blastx, blastn, blastp) in parallel on a list of queries (in QUERY_LIST). Additional arguments to the program can be appended as ARGS.

The program is called on each query with threading equal to THREADS. Recommendation: set THREADS to the number of cores divided by the number of query files.

The individual XML outputs for each query file are concatenated into a single output file OUT.

Example usage: multiblast blastn 4 "query1.fa query2.fa" all-queries.xml -e 1e-6

class biolite.wrappers.MakeBlastDB(dbtype, in_name, db_name)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for makeblastdb from NCBI Blast. http://blast.ncbi.nlm.nih.gov/

class biolite.wrappers.Bowtie2(inputs, mapping_file, output_path, local=True, sensitive=True, all=True, max_insert=None)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for the bowtie2 short-read aligner. http://bowtie-bio.sourceforge.net/

class biolite.wrappers.Bowtie2Build(input_path, outdir_path)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for the bowtie2-build component of Bowtie2. http://bowtie-bio.sourceforge.net/

class biolite.wrappers.SamToBam(input_path, output_path)[source]

Bases: biolite.wrappers.BaseWrapper

class biolite.wrappers.SamView(input_path, regions, output_path)[source]

Bases: biolite.wrappers.BaseWrapper

class biolite.wrappers.SamSort(input_path, output_path)[source]

Bases: biolite.wrappers.BaseWrapper

class biolite.wrappers.SamIndex(input_path)[source]

Bases: biolite.wrappers.BaseWrapper

class biolite.wrappers.SamPileup(reference_path, bam_path, output_path)[source]

Bases: biolite.wrappers.BaseWrapper

class biolite.wrappers.Trinity(inputs, outdir, max_insert=None, min_length=None, seq_type='fq')[source]

Bases: biolite.wrappers.BaseWrapper

class biolite.wrappers.ParallelButterfly(commands, *args, **kwargs)[source]

Bases: biolite.wrappers.BaseWrapper

class biolite.wrappers.Oases(outdir, ins_length=None, ins_length_sd=None, min_length=None, merge=False)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for Oases, a de novo transcriptome assembler. http://www.ebi.ac.uk/~zerbino/oases/

class biolite.wrappers.VelvetH(inputs, outdir, kmer=61, merge=False)[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for the velveth component of the Velvet de novo assembler. http://www.ebi.ac.uk/~zerbino/velvet/

If merge is True, inputs must be a list of transcript FASTA files. Otherwise, inputs should be a single FASTQ filename containing shuffled short reads, or a list of FASTQ files where the first two form a paired file and the third contains unpaired short reads.

class biolite.wrappers.VelvetG(outdir, ins_length=None, ins_length_sd=None, min_length=None, merge=False, exp_cov='auto')[source]

Bases: biolite.wrappers.BaseWrapper

A wrapper for the velvetg component of the Velvet de novo assembler. http://www.ebi.ac.uk/~zerbino/velvet/

class biolite.wrappers.Macse(input, output, frameshift=-40, stopcodon=-150)[source]

Bases: biolite.wrappers.BaseWrapper

Multiple alignment of coding sequences.

class biolite.wrappers.ParallelMacse(inputs, outputs, frameshift=-40, stopcodon=-150, commands='macse.commands.txt')[source]

Bases: biolite.wrappers.BaseWrapper

Multiple alignment of coding sequences, run in parallel.

class biolite.wrappers.Raxml(input, output, model, output_dir, pars_rseed=None, extra_flags=None)[source]

Bases: biolite.wrappers.BaseWrapper

Maximum Likelihood based inference of phylogenetic trees.

class biolite.wrappers.Gblocks(input, t='p', b1=None, b2=None, b3=10, b4=5, b5='a')[source]

Bases: biolite.wrappers.BaseWrapper

Selection of conserved blocks from multiple sequence alignments for phylogenetics.

class biolite.wrappers.Mcl(input, output, inflation=2.1)[source]

Bases: biolite.wrappers.BaseWrapper

Analysis of networks.

class biolite.wrappers.Parallel(commands, *args, **kwargs)[source]

Bases: biolite.wrappers.BaseWrapper

GNU parallel utility http://www.gnu.org/software/parallel/

class biolite.wrappers.Sga(command, *args, **kwargs)[source]

Bases: biolite.wrappers.BaseWrapper

String Graph Assembler

class biolite.wrappers.Transdecoder(input)[source]

Bases: biolite.wrappers.BaseWrapper

Identification of candidate coding sequences http://transdecoder.sourceforge.net

class biolite.wrappers.Oma(workdir)[source]

Bases: biolite.wrappers.BaseWrapper
