EnteroTools Old
EnteroTools currently consists of four “black boxes”: an engine for downloading and parsing metadata (MetaParser), a bulk downloading engine (RCatch), a calculation engine (CRobot) and a separate nomenclature server (NServ). All four are accessible via APIs that communicate with the EnteroBase website. The website will also offer an API for access by external computers, but that is not yet available. EnteroTools is written in Python 2.7 and stores its information in a PostgreSQL 9.3 database. Source code is available upon request.
MetaParser implements
- the automated downloading (via ENTREZ utilities) of all GenInfo Identifiers (GI numbers) for NCBI Short Read Archives (SRAs) and for complete or partial assemblies with the genus designation Salmonella, Escherichia / Shigella, Yersinia or Moraxella, together with the corresponding metadata
- parsing of the metadata into a consistent EnteroBase format
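As an illustration of the first step, the sketch below builds query URLs for the public NCBI E-utilities ESearch endpoint, one per genus named above. It is not MetaParser's code: the helper name `build_esearch_url`, the choice of the `sra` database, the `[Organism]` search field and the paging values are all assumptions made for the example, and it is written for Python 3 rather than the Python 2.7 used by EnteroTools. It only constructs the URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Genera named in the text above.
GENERA = ["Salmonella", "Escherichia", "Shigella", "Yersinia", "Moraxella"]


def build_esearch_url(genus, db="sra", retstart=0, retmax=500):
    """Return an ESearch URL listing record identifiers for one genus.

    Hypothetical helper: the database, search field and page size are
    assumptions for illustration, not MetaParser's actual settings.
    """
    params = {
        "db": db,
        "term": "{}[Organism]".format(genus),
        "retstart": retstart,  # offset of the first identifier to return
        "retmax": retmax,      # number of identifiers per request
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


urls = [build_esearch_url(genus) for genus in GENERA]
for url in urls:
    print(url)
```

Fetching each URL would return a page of record identifiers, and increasing `retstart` in steps of `retmax` would page through the full result set.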
Given a GI number, RCatch downloads the corresponding short read archive from any of three major public sequence databases (SRA/NCBI, ENA/EBI and DRA/DDBJ).
CRobot is a calculation engine that assembles short read archives or user uploaded reads, evaluates and modifies the assemblies, and passes the final assemblies onto NServ for genotyping.
NServ genotypes genomes from assembly data. It currently handles the 7-gene MLST, rMLST, cgMLST and wgMLST genotyping schemes. NServ automatically assigns nomenclature to all incoming genomes, and concurrently synchronises 7-gene MLST with the main MLST web site and rMLST with the rMLST web site.
Components within CRobot
CRobot implements the following bioinformatic tools. The parameters and outputs of each tool are documented on its linked page.
- QAssembly_ST - refers to sequential processing via QAssembly, QA evaluation, QAtoFasta and NServ.
- QAssembly - a one-stop solution from short reads to high quality assemblies, including read pre-processing, trimming, assembly, post-correction and filtering.
- QA evaluation - evaluates the quality of assemblies based on multiple criteria.
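To make the idea of evaluating an assembly concrete, the following sketch computes a few widely used assembly metrics (total bases, number of contigs, N50 and the proportion of N placeholders) from a list of contig sequences. It is a minimal illustration, not the QA evaluation code; the function name `assembly_metrics` and the toy contigs are hypothetical.

```python
def assembly_metrics(contigs):
    """Compute basic quality metrics from a list of contig sequences (strings)."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)

    # N50: walking from the longest contig downwards, the length of the
    # contig at which the running total first reaches half of the assembly.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    # Scaffolding placeholders are counted case-insensitively.
    n_count = sum(c.upper().count("N") for c in contigs)

    return {
        "total_bases": total,
        "num_contigs": len(lengths),
        "n50": n50,
        "n_fraction": n_count / total if total else 0.0,
    }


# Toy input: three contigs of 100, 60 and 20 bases.
metrics = assembly_metrics(["ACGT" * 25, "ACGTNN" * 10, "AC" * 10])
print(metrics)
```

In practice the contigs would be read from the assembly FASTA file rather than written inline.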
Workflow after receiving reads: 1. Automatic assembly
Assembly criteria for Salmonella
Metrics | Criteria |
---|---|
Number of bases | 4 Mbp – 5.8 Mbp |
N50 value | >20kb |
Number of contigs | <600 |
Proportion of scaffolding placeholders (N’s) | <3% |
Species assignment using Kraken | >70% contigs |
Assembly criteria for Escherichia / Shigella
Metrics | Criteria |
---|---|
Number of bases | 3.7 Mbp – 6.4 Mbp |
N50 value | >20kb |
Number of contigs | <600 |
Proportion of scaffolding placeholders (N’s) | <3% |
Species assignment using Kraken | >70% contigs |
Assembly criteria for Yersinia
Metrics | Criteria |
---|---|
Number of bases | 3.7 Mbp – 5.5 Mbp |
N50 value | >15kb |
Number of contigs | <600 |
Proportion of scaffolding placeholders (N’s) | <3% |
Species assignment using Kraken | >65% contigs |
Assembly criteria for Moraxella
Metrics | Criteria |
---|---|
Number of bases | 1.8 Mbp – 2.6 Mbp |
N50 value | >20kb |
Number of contigs | <600 |
Proportion of scaffolding placeholders (N’s) | <3% |
Species assignment using Kraken | >65% contigs |
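The four tables above can be expressed as a single lookup and a check function, as in the sketch below. The thresholds are transcribed from the tables; everything else is an assumption made for illustration: the names `CRITERIA` and `failed_criteria`, treating the base-count range as inclusive, and reading "kb" as 1,000 bases. It is not the code CRobot runs.

```python
# Per-genus thresholds transcribed from the tables above (sizes in bases).
CRITERIA = {
    "Salmonella": {
        "min_bases": 4000000, "max_bases": 5800000,
        "min_n50": 20000, "min_kraken": 0.70,
    },
    "Escherichia/Shigella": {
        "min_bases": 3700000, "max_bases": 6400000,
        "min_n50": 20000, "min_kraken": 0.70,
    },
    "Yersinia": {
        "min_bases": 3700000, "max_bases": 5500000,
        "min_n50": 15000, "min_kraken": 0.65,
    },
    "Moraxella": {
        "min_bases": 1800000, "max_bases": 2600000,
        "min_n50": 20000, "min_kraken": 0.65,
    },
}

# Thresholds shared by all four genera.
MAX_CONTIGS = 600
MAX_N_FRACTION = 0.03


def failed_criteria(genus, total_bases, n50, num_contigs,
                    n_fraction, kraken_fraction):
    """Return the names of the criteria an assembly fails (empty list = pass).

    Assumption: the base-count range is inclusive; all other limits are
    strict, following the '>' and '<' signs in the tables.
    """
    c = CRITERIA[genus]
    checks = [
        ("Number of bases", c["min_bases"] <= total_bases <= c["max_bases"]),
        ("N50 value", n50 > c["min_n50"]),
        ("Number of contigs", num_contigs < MAX_CONTIGS),
        ("Proportion of N's", n_fraction < MAX_N_FRACTION),
        ("Species assignment using Kraken", kraken_fraction > c["min_kraken"]),
    ]
    return [name for name, ok in checks if not ok]


# Hypothetical metric values, chosen only to exercise the checks.
good = failed_criteria("Salmonella", 4800000, 150000, 80, 0.001, 0.95)
bad = failed_criteria("Yersinia", 4600000, 12000, 700, 0.001, 0.90)
print(good)
print(bad)
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easy to report why an assembly was rejected.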