enterobase-web / EnteroBase Backend Pipeline: SeroPred

SeroPred

Overview

SeroPred is a serovar prediction pipeline for the Salmonella and Escherichia/Shigella databases. It runs on an assembly, once QAssembly has successfully assembled the short reads, and predicts the serovar from the assembled sequences. (For Salmonella, the prediction is carried out using SISTR.) To view the predictions in search results, select Serovar Prediction from the Experimental Data drop-down menu; they appear in the experimental data area on the right-hand side.

SeroPred is currently at version 1.0.

Serovar prediction for Salmonella

For Salmonella, serovar prediction is done by running the sistr_cmd.py Python program from the command-line version of SISTR (version 0.3.2). SISTR predicts serovars from whole-genome sequence assemblies by using BLAST to determine antigen gene and cgMLST gene alleles. (The command-line NCBI BLAST version 2.2.31 is used.)

Serovar prediction for Escherichia/Shigella

Serovar prediction starts by creating a BLAST database with the program makeblastdb. The sequences in a FASTA file, which serves as a database of antigen genes, are then aligned against the assembly sequences using the BLAST program blastn. For each antigen gene sequence, the alignment scores of all its hits in the assembly sequences are summed, and the serovar is predicted from the antigen sequences with the highest total alignment scores.

(As with SISTR for Salmonella, the command-line NCBI BLAST version 2.2.31 is used.)
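
The Escherichia/Shigella steps described above can be sketched roughly as follows. This is an illustration only, not the SeroPred source code. It assumes that the BLAST database is built from the assembly, that the antigen FASTA file is used as the blastn query, that blastn writes the default 12-column tabular format (-outfmt 6), and that the bit score is the alignment score being summed. The function names, file paths and antigen identifiers such as fliC_H7 are hypothetical.

```python
from collections import defaultdict


def build_blast_commands(assembly_fasta, antigen_fasta, db_prefix, hits_tsv):
    """Return the two command lines (as argument lists) for the BLAST steps.

    Assumption: the database is built from the assembly and the antigen
    genes are the query. Pass each list to subprocess.run(..., check=True).
    """
    makeblastdb = ["makeblastdb", "-in", assembly_fasta,
                   "-dbtype", "nucl", "-out", db_prefix]
    blastn = ["blastn", "-query", antigen_fasta, "-db", db_prefix,
              "-outfmt", "6", "-out", hits_tsv]
    return makeblastdb, blastn


def total_scores(tabular_text):
    """Sum the bit scores of all hits for each query (antigen) sequence.

    Expects blastn -outfmt 6 text: column 1 is the query id and
    column 12 is the bit score.
    """
    totals = defaultdict(float)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        totals[fields[0]] += float(fields[11])
    return dict(totals)


def best_antigen(totals):
    """Return the antigen id with the highest total score, or None."""
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Summing over hits, rather than taking the single best hit, lets an antigen gene that is split across several contigs still accumulate a high total. How SeroPred combines the best antigen sequences into a final serovar call is not described here, so the sketch stops at ranking the antigen sequences.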
