GrapeTree Documentation

GrapeTree is a novel visualization component within EnteroBase. It generates GrapeTree figures using the neighbor-joining (NJ) algorithm, the classical minimum spanning tree algorithm (MSTree), similar to PhyloViz, or an improved minimum spanning tree algorithm that we call MSTree V2.

GrapeTree is also available as a stand-alone version, Click Here.

Installation instructions, manuals and tutorials are available on this site.

The source code for GrapeTree is available, Click Here.

GrapeTree is also available as a live online demo, Click Here.

Aims of GrapeTree

GrapeTree aims to address two central issues:

  • We have found that classical phylograms do not scale visually when showing large numbers of taxa.
  • Some implementations of minimum spanning trees and neighbor joining do not scale to large numbers of taxa and do not handle missing data correctly.

About GrapeTree

GrapeTree is our name for the cluster of related bacterial strains that tends to be presented in minimum spanning trees. Our GrapeTree GUI is available within EnteroBase once you have created a workspace or connected to somebody else's workspace. It is also available here as a stand-alone version. The EnteroBase version interacts directly with EnteroBase data, whereas you need to provide your own data for the stand-alone version.

Our new approach for minimum spanning trees, MSTreeV2, calculates distances by Edmonds' algorithm, which is a directed version of the minimum spanning tree algorithm that accounts for missing data correctly (

Edmonds' algorithm is very important for core genome MLST (cgMLST) because we estimate that, on average, each entry lacks an allele number for one or more of the 3002 cgMLST loci in Salmonella, because the genes are unassembled, disrupted or defective. The same applies to individual SNPs in a core genome SNP comparison. As a result, topologies are distorted unless directional distances, from entries with more data to entries with less data, are used.
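As a rough illustration of what a directional distance can look like, the following Python sketch compares allelic profiles asymmetrically. It is a minimal sketch under stated assumptions, not GrapeTree's actual implementation: the names `directional_distance` and `distance_matrix`, the use of 0 as the missing-allele marker, and the counting rule itself are all hypothetical choices made for this example.

```python
# Assumption: 0 marks a locus with no allele call (hypothetical convention).
MISSING = 0


def directional_distance(src, dst):
    """Illustrative directed distance from profile `src` to profile `dst`.

    Counts the loci that are called in `dst` and at which `src` is either
    missing or carries a different allele. Loci missing in `dst` are ignored,
    so moving from a more complete profile to a less complete one is cheap,
    while the reverse direction is penalised.
    """
    if len(src) != len(dst):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for s, d in zip(src, dst) if d != MISSING and s != d)


def distance_matrix(profiles):
    """Full asymmetric matrix: entry [i][j] is the distance from i to j."""
    return [[directional_distance(a, b) for b in profiles] for a in profiles]


# Toy profiles over four loci (made-up allele numbers).
complete = [1, 2, 3, 4]   # every locus called
partial = [1, 2, 0, 0]    # last two loci missing
other = [1, 5, 3, 4]      # differs from `complete` at one locus

matrix = distance_matrix([complete, partial, other])
for row in matrix:
    print(row)
```

Under this rule the edge from `complete` to `partial` costs 0, whereas the edge from `partial` to `complete` costs 2. This is the kind of asymmetry that a directed spanning-tree method such as Edmonds' algorithm can take into account and that an undirected minimum spanning tree cannot.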