EnteroTools currently consists of four “black boxes”: an engine for downloading and parsing metadata (MetaParser), a bulk downloading engine (RCatch), a calculation engine (CRobot) and a separate nomenclature server (NServ). All four are accessible via APIs that communicate with the EnteroBase website. The website will also offer an API for access by external computers, but that is not yet available. EnteroTools is written in Python 2.7 and stores its information in a PostgreSQL 9.3 database. Source code is available upon request.

MetaParser implements:

- the automated downloading of all GenInfo Identifiers (GI numbers) for NCBI Short Read Archives (SRAs) or complete or partial assemblies with the genus designation Salmonella, Escherichia / Shigella, Yersinia or Moraxella, together with the corresponding metadata (via ENTREZ utilities)
- parsing of the metadata into a consistent EnteroBase format
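
As an illustration of the first step, the sketch below builds NCBI E-utilities `esearch` request URLs for the genera listed above. The `esearch.fcgi` endpoint and its `db`, `term` and `retmax` parameters belong to the public E-utilities interface. The function name `build_esearch_url`, the `[Organism]` search term, the `sra` database choice and the `retmax` value are assumptions made for this example and do not describe MetaParser's actual code. The sketch is written for Python 3 and only constructs URLs; it performs no network access.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Genus designations named in the text above.
GENERA = ["Salmonella", "Escherichia", "Shigella", "Yersinia", "Moraxella"]


def build_esearch_url(genus, db="sra", retmax=100):
    """Return an esearch URL for one genus (hypothetical helper)."""
    params = [
        ("db", db),                          # database to search
        ("term", "%s[Organism]" % genus),    # assumed form of the query
        ("retmax", retmax),                  # maximum number of IDs returned
    ]
    return EUTILS_ESEARCH + "?" + urlencode(params)


for genus in GENERA:
    print(build_esearch_url(genus))
```

Each URL would return a list of record identifiers, which a separate request could then use to fetch the corresponding metadata.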

Given a GI number, RCatch downloads the corresponding short read archive from any of three major public sequence databases (SRA/NCBI, ENA/EBI and DRA/DDBJ).

CRobot is a calculation engine that assembles short read archives or user-uploaded reads, evaluates and modifies the assemblies, and passes the final assemblies on to NServ for genotyping.

NServ genotypes genomes from assembly data. It currently handles the 7-gene MLST, rMLST, cgMLST and wgMLST genotyping schemes. NServ automatically assigns nomenclature to all incoming genomes, and concurrently synchronises 7-gene MLST with the main MLST web site and rMLST with the rMLST web site.

Components within CRobot

CRobot implements the following bioinformatic tools; the parameters and outputs of each are documented on its linked page.

QAssembly_ST - refers to sequential processing via QAssembly, QA evaluation, QAtoFasta and NServ.
- QAssembly - a one-stop solution from short reads to high quality assemblies, including read pre-processing, trimming, assembly, post-correction and filtering.
- QA evaluation - evaluates the quality of assemblies based on multiple criteria.

Workflow after receiving reads

1. Automatic assembly

Assembly criteria for Salmonella

Metrics	Criteria
Number of bases	4 Mbp – 5.8 Mbp
N50 value	>20kb
Number of contigs	<600
Proportion of scaffolding placeholders (N’s)	<3%
Species assignment using Kraken	>70% contigs

Assembly criteria for Escherichia / Shigella

Metrics	Criteria
Number of bases	3.7 Mbp – 6.4 Mbp
N50 value	>20kb
Number of contigs	<600
Proportion of scaffolding placeholders (N’s)	<3%
Species assignment using Kraken	>70% contigs

Assembly criteria for Yersinia

Metrics	Criteria
Number of bases	3.7 Mbp – 5.5 Mbp
N50 value	>15kb
Number of contigs	<600
Proportion of scaffolding placeholders (N’s)	<3%
Species assignment using Kraken	>65% contigs

Assembly criteria for Moraxella

Metrics	Criteria
Number of bases	1.8 Mbp – 2.6 Mbp
N50 value	>20kb
Number of contigs	<600
Proportion of scaffolding placeholders (N’s)	<3%
Species assignment using Kraken	>65% contigs
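
The four tables above can be read as a single lookup of per-genus thresholds. The following is a minimal sketch of such a check, not the QA evaluation code itself. The names `CRITERIA`, `n50` and `failed_criteria`, the `kraken_fraction` argument (the proportion of contigs assigned to the expected species) and the treatment of 1 kb as 1000 bases are assumptions made for illustration. The base-count range is taken as inclusive and all other limits as strict, following the `>` and `<` signs in the tables.

```python
# Thresholds transcribed from the criteria tables above.
CRITERIA = {
    "Salmonella": {"min_bases": 4.0e6, "max_bases": 5.8e6,
                   "min_n50": 20000, "min_kraken": 0.70},
    "Escherichia/Shigella": {"min_bases": 3.7e6, "max_bases": 6.4e6,
                             "min_n50": 20000, "min_kraken": 0.70},
    "Yersinia": {"min_bases": 3.7e6, "max_bases": 5.5e6,
                 "min_n50": 15000, "min_kraken": 0.65},
    "Moraxella": {"min_bases": 1.8e6, "max_bases": 2.6e6,
                  "min_n50": 20000, "min_kraken": 0.65},
}
MAX_CONTIGS = 600       # same limit for all four genera
MAX_N_FRACTION = 0.03   # same limit for all four genera


def n50(contig_lengths):
    """Length of the contig at which the longest contigs reach half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


def failed_criteria(genus, contigs, kraken_fraction):
    """Return the names of the criteria that an assembly fails.

    contigs is a list of sequence strings; an empty result means
    that every criterion is met.
    """
    limits = CRITERIA[genus]
    lengths = [len(c) for c in contigs]
    total = sum(lengths)
    n_bases = sum(c.upper().count("N") for c in contigs)

    failures = []
    if not limits["min_bases"] <= total <= limits["max_bases"]:
        failures.append("number_of_bases")
    if not n50(lengths) > limits["min_n50"]:
        failures.append("n50")
    if not len(contigs) < MAX_CONTIGS:
        failures.append("number_of_contigs")
    if total == 0 or not float(n_bases) / total < MAX_N_FRACTION:
        failures.append("n_fraction")
    if not kraken_fraction > limits["min_kraken"]:
        failures.append("kraken_species_assignment")
    return failures


# 90 contigs of 50 kb each, 4.5 Mbp in total, no N's.
example = ["ACGT" * 12500] * 90
print(failed_criteria("Salmonella", example, kraken_fraction=0.9))
print(failed_criteria("Moraxella", example, kraken_fraction=0.9))
```

The example assembly meets every Salmonella criterion, so the first call prints an empty list. The second call reports `number_of_bases`, because 4.5 Mbp lies outside the Moraxella range.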