Provides a collection of helper functions that coordinate multiple wrappers from the wrappers module to accomplish a unified goal or automate a common analysis task.
Workflows are available for the following groups of tasks:
Bases: tuple
BlastHit(query, title, definition, id, evalue, rank, orient, mask, score, bitscore, length, percent)
bitscore: Alias for field number 9
definition: Alias for field number 2
evalue: Alias for field number 4
id: Alias for field number 3
length: Alias for field number 10
mask: Alias for field number 7
orient: Alias for field number 6
percent: Alias for field number 11
query: Alias for field number 0
rank: Alias for field number 5
score: Alias for field number 8
title: Alias for field number 1
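The field numbers follow directly from the order of names in the BlastHit signature. A minimal sketch, assuming BlastHit is an ordinary collections.namedtuple (the values are made up for illustration and do not come from a real BLAST report):

```python
from collections import namedtuple

# Assumed reconstruction of the type from the signature shown above.
BlastHit = namedtuple(
    "BlastHit",
    "query title definition id evalue rank orient mask score bitscore length percent",
)

# Illustrative values only.
hit = BlastHit(
    query="contig_1",
    title="example title",
    definition="hypothetical protein",
    id="example_id",
    evalue=1e-50,
    rank=1,
    orient=1,
    mask=None,
    score=500.0,
    bitscore=200.0,
    length=300,
    percent=95.0,
)

# Each field is reachable by name or by its position in the tuple.
assert hit.bitscore == hit[9]
assert hit.query == hit[0]
```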
Bases: tuple
ContigHeader(locus, transcript, confidence, length)
confidence: Alias for field number 2
length: Alias for field number 3
locus: Alias for field number 0
transcript: Alias for field number 1
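As an illustration of how a ContigHeader might be filled in, the sketch below parses a contig name into the four fields. The header layout in the regular expression and the function name parse_contig_header are assumptions made for this example, not taken from the module:

```python
import re
from collections import namedtuple

ContigHeader = namedtuple("ContigHeader", "locus transcript confidence length")

# Assumed header layout, e.g. "Locus_12_Transcript_3/5_Confidence_0.421_Length_873".
HEADER_RE = re.compile(
    r"Locus_(\d+)_Transcript_(\d+)/\d+_Confidence_([0-9.]+)_Length_(\d+)"
)

def parse_contig_header(name):
    """Hypothetical helper: split a contig name into a ContigHeader."""
    match = HEADER_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized contig header: {name!r}")
    locus, transcript, confidence, length = match.groups()
    return ContigHeader(int(locus), int(transcript), float(confidence), int(length))

header = parse_contig_header("Locus_12_Transcript_3/5_Confidence_0.421_Length_873")
print(header.locus, header.confidence)  # 12 0.421
```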
Implements the Oases-M protocol for merging several Oases assemblies, as described in:
Schulz, M. H., Zerbino, D. R., Vingron, M., & Birney, E. (2012). Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics (Oxford, England), 1-7. doi:10.1093/bioinformatics/bts094
Performs Oases assemblies sweeping over the provided kmers list, then performs an Oases merge assembly with merge_kmer.
Sums the lengths of all contigs in the given FASTA file.
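The summation can be sketched in plain Python as follows. The function name total_contig_length is hypothetical, and the sketch takes an iterable of lines rather than a file path so that it runs without any input file:

```python
def total_contig_length(fasta_lines):
    """Sum sequence lengths over all records in an iterable of FASTA lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Header lines start with '>' and carry no sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total

example = [">contig1", "ACGTACGT", "ACG", ">contig2", "TTTT"]
print(total_contig_length(example))  # 15
```

An open file handle can be passed in place of the list, since iterating over a file yields its lines.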
Iterates through the records in fasta_in and looks for a hit in a dict of BlastHit objects, hits.
For each record with a hit, the RPKM (if provided), hit title, and evalue are added to the ID and the record is written to hits_out.
If there is no hit, the record is written to misses_out.
If all_out is True, then hits are also written to misses_out.
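The routing described above can be sketched with in-memory lists standing in for the output files. The function name split_by_hits, the (title, evalue) tuples used in place of BlastHit objects, and the exact text appended to the ID are all assumptions made for this example:

```python
def split_by_hits(records, hits, rpkm=None, all_out=False):
    """Route (record_id, sequence) pairs into hit and miss lists.

    hits maps record_id -> (title, evalue); rpkm optionally maps
    record_id -> RPKM value. The annotation format is an assumption.
    """
    hits_out, misses_out = [], []
    for record_id, sequence in records:
        hit = hits.get(record_id)
        if hit is None:
            # No hit: the record goes to the misses unchanged.
            misses_out.append((record_id, sequence))
            continue
        title, evalue = hit
        parts = [record_id]
        if rpkm is not None and record_id in rpkm:
            parts.append(f"rpkm={rpkm[record_id]:g}")
        parts.append(title)
        parts.append(f"evalue={evalue:g}")
        annotated = (" ".join(parts), sequence)
        hits_out.append(annotated)
        if all_out:
            # With all_out, records with hits are written to the misses as well.
            misses_out.append(annotated)
    return hits_out, misses_out

records = [("a", "ACGT"), ("b", "GG")]
hits = {"a": ("rRNA_18S", 1e-30)}
print(split_by_hits(records, hits, rpkm={"a": 12.5}))
```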
Reads an XML formatted BLAST report, and yields one named tuple per alignment, i.e. per hit between a query and a subject. Each named tuple has the following elements:
query, title, definition, id, evalue, rank, orient, mask, score, bitscore, length, percent
where:
Similar to blast_hits, but returns an OrderedDict keyed by query name with only one hit (the top hit) per query.
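One way to reduce a stream of hits to a single top hit per query is sketched below. It assumes that hits arrive in report order, so the first hit seen for a query is its top hit; the trimmed Hit tuple and the function name keep_top_hits are stand-ins invented for this example:

```python
from collections import OrderedDict, namedtuple

# Trimmed stand-in for BlastHit, keeping only the fields used here.
Hit = namedtuple("Hit", "query title evalue")

def keep_top_hits(hits):
    """Return an OrderedDict of query -> first hit seen for that query."""
    top = OrderedDict()
    for hit in hits:
        # setdefault leaves an existing entry alone, so later hits
        # for the same query are ignored.
        top.setdefault(hit.query, hit)
    return top

stream = [
    Hit("q1", "A", 1e-50),
    Hit("q1", "B", 1e-10),
    Hit("q2", "C", 1e-20),
]
top = keep_top_hits(stream)
print(list(top))        # ['q1', 'q2']
print(top["q1"].title)  # A
```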
Blastn against rRNA, transferring sequences with or without a hit to their own files. Even when rRNA reads are removed prior to assembly, some may make it through and be assembled from the full dataset (including low frequency contaminant rRNAs).
Blastx against SwissProt (a protein database), transferring sequences with or without a hit to their own files. Used in comparing assemblies.
Blastn against UniVec, transferring sequences with or without a hit to their own files. This removes sequences that still have adapters, or that are contaminated with plasmids (including the protein expression plasmids used to manufacture sample prep enzymes).
Parses the assembled contigs in fasta_path and writes a histogram of contig length to hist_path.
Writes the total contig count, mean length, and N50 length to the diagnostics.
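The three summary values can be computed from a list of contig lengths as sketched below. The function name contig_stats is hypothetical, and N50 is taken in its usual sense: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the total assembly length.

```python
def contig_stats(lengths):
    """Return (count, mean length, N50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    count = len(ordered)
    total = sum(ordered)
    mean = total / count if count else 0.0
    n50 = 0
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if running * 2 >= total:
            n50 = length
            break
    return count, mean, n50

print(contig_stats([100, 200, 300, 400, 500]))  # (5, 300.0, 400)
```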
Extracts a single exemplar transcript for each locus in an Oases assembly at input_path and writes it to output_path. Only transcripts longer than min_length are considered.
The exemplar is chosen as the transcript with the highest confidence score.
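The selection rule can be sketched on already-parsed headers. The function name pick_exemplars is hypothetical, and breaking ties in favor of the first transcript seen is an assumption, since the text does not say how ties are resolved:

```python
from collections import namedtuple

ContigHeader = namedtuple("ContigHeader", "locus transcript confidence length")

def pick_exemplars(headers, min_length=0):
    """Return a dict of locus -> highest-confidence ContigHeader."""
    best = {}
    for header in headers:
        # Only transcripts strictly longer than min_length are considered.
        if header.length <= min_length:
            continue
        current = best.get(header.locus)
        # Strict '>' keeps the first transcript seen when confidences tie.
        if current is None or header.confidence > current.confidence:
            best[header.locus] = header
    return best

headers = [
    ContigHeader(1, 1, 0.25, 500),
    ContigHeader(1, 2, 0.75, 400),
    ContigHeader(2, 1, 1.00, 90),
]
print(pick_exemplars(headers, min_length=100))
```

In this example locus 2 is dropped entirely, because its only transcript is shorter than min_length.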
Parses the FASTA file and returns a SeqRecord for the longest contig.
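A dependency-free sketch of the same idea follows. It returns a plain (header, sequence) tuple as a stand-in for a SeqRecord, and the names read_fasta and longest_contig are hypothetical:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def longest_contig(lines):
    """Return the (header, sequence) pair with the longest sequence, or None."""
    return max(read_fasta(lines), key=lambda record: len(record[1]), default=None)

lines = [">a", "ACGT", ">b", "ACGTAC", "GT", ">c", "A"]
print(longest_contig(lines))  # ('b', 'ACGTACGT')
```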
Prepares a single query file for the multiblast by dividing the queries into one chunk per node, where nodes = threads/cores and threads is taken from the BioLite configuration file.
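The chunking step might look like the sketch below. Round-robin assignment is an assumption chosen for this example because it keeps chunk sizes within one record of each other; the text does not say how queries are distributed, and the function name split_into_chunks is hypothetical:

```python
def split_into_chunks(records, nodes):
    """Distribute records round-robin into at most `nodes` non-empty chunks."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    chunks = [[] for _ in range(nodes)]
    for index, record in enumerate(records):
        chunks[index % nodes].append(record)
    # Drop empty chunks when there are fewer records than nodes.
    return [chunk for chunk in chunks if chunk]

print(split_into_chunks(list(range(7)), 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```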
Executes the Blast operation blast (e.g. ‘blastx’) in parallel on each node, then concatenates the XML output into a single XML file out.
Automates Oases assemblies that sweep multiple kmers.
If inputs is a list of FASTQ files, they are automatically shuffled together. Or, provide a singleton list with the path to a pre-shuffled FASTQ file.
Cleans up a work directory that was used for an Oases assembly.
Performs Oases assemblies sweeping over the provided kmers list, and concatenates all contigs to concat_path.
If inputs is a list of FASTQ files, they are automatically shuffled together. Or, provide a singleton list with the path to a pre-shuffled FASTQ file.
Bases: tuple
rRNAhit(locus, gene, confidence, orient, query)
confidence: Alias for field number 2
gene: Alias for field number 1
locus: Alias for field number 0
orient: Alias for field number 3
query: Alias for field number 4
Reads an XML formatted BLAST report, and saves one top hit per locus, using the transcript with the highest confidence for the locus.
The locus name and confidence are extracted from the query name with the supplied ‘unpack_header_func’ function.
Returns both a set of all the queries in the XML report, and a dictionary keyed by locus and storing the rRNA hits:
set(queries), dict(hits)
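The per-locus selection can be sketched without the XML parsing step. Here the input is an iterable of (query, gene, orient) tuples standing in for parsed alignments, and both the function name top_rrna_hits and the example unpack_header function (with its "locus|confidence" query format) are hypothetical:

```python
from collections import namedtuple

rRNAhit = namedtuple("rRNAhit", "locus gene confidence orient query")

def top_rrna_hits(alignments, unpack_header_func):
    """Return (set of queries, dict of locus -> best rRNAhit)."""
    queries = set()
    hits = {}
    for query, gene, orient in alignments:
        queries.add(query)
        locus, confidence = unpack_header_func(query)
        best = hits.get(locus)
        # Keep the hit from the transcript with the highest confidence.
        if best is None or confidence > best.confidence:
            hits[locus] = rRNAhit(locus, gene, confidence, orient, query)
    return queries, hits

def unpack_header(query):
    """Hypothetical unpacker for query names of the form 'locus|confidence'."""
    locus, confidence = query.split("|")
    return locus, float(confidence)

alignments = [("L1|0.2", "18S", 1), ("L1|0.9", "28S", -1), ("L2|0.5", "18S", 1)]
queries, hits = top_rrna_hits(alignments, unpack_header)
print(sorted(queries))
print(hits["L1"].gene)  # 28S
```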