Tools for interactive phylogeny visualization
The anatomy of a phylogeny
A phylogeny is a depiction of evolutionary relationships, such as a family tree for a
group of species or DNA sequences.
A phylogeny can be unrooted (i.e., an undirected acyclic graph), in which case no node is
identified as the oldest and no assertion about the direction of time is made on
any edge, or rooted (i.e., a DAG), in which case one node is declared to be the root
ancestor and time proceeds along edges from the root to the tips.
Edge lengths can have no meaning (a cladogram), indicate the amount of observed or expected
evolutionary change (a phylogram), or be proportional to time (a chronogram).
Sometimes the topology of the tree is the main thing of interest. Sometimes the topology
is the backbone for understanding other data that are mapped onto the tree.
The most common format for storing phylogenies is Newick, which writes a tree as nested
parentheses.
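As a small illustration of the notation, the string `((A:1.0,B:2.0)AB:0.5,C:3.0);` describes three tips (A, B, C), one labelled internal node (AB), and an unnamed root, with edge lengths after each colon. The sketch below reads such a string into nested dictionaries. The function names and the dictionary layout are assumptions made for this example, and the parser deliberately ignores quoted labels and bracketed comments, which full Newick readers such as ape or DendroPy handle.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length, and children."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0  # index of the next unread character

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this group of children
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        # Optional label: everything up to the next delimiter.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos].strip()
        # Optional edge length after a colon.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",():;":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return root


def tip_names(node):
    """Return the names of all tips below a node, in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in tip_names(child)]


tree = parse_newick("((A:1.0,B:2.0)AB:0.5,C:3.0);")
print(tip_names(tree))
```

A cladogram would simply omit the `:length` parts, in which case `length` stays `None` for every node.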
It is also common to store trees as simple graph tables. There can be separate tables for
nodes and edges, or one table for nodes that includes a field specifying each node's
parent.
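The two table layouts carry the same information, and one can be derived from the other. The sketch below starts from a single node table with a parent field and splits it into separate node and edge tables. The column names (`id`, `name`, `parent`, `child`) and the toy rows are assumptions chosen for illustration, not a schema used by any particular tool.

```python
# Single-table form: each row names its parent; the root's parent is None.
node_table = [
    {"id": 0, "name": "root", "parent": None},
    {"id": 1, "name": "AB", "parent": 0},
    {"id": 2, "name": "C", "parent": 0},
    {"id": 3, "name": "A", "parent": 1},
    {"id": 4, "name": "B", "parent": 1},
]


def split_tables(rows):
    """Split a node table with a parent field into node and edge tables."""
    nodes = [{"id": row["id"], "name": row["name"]} for row in rows]
    edges = [
        {"parent": row["parent"], "child": row["id"]}
        for row in rows
        if row["parent"] is not None  # the root has no incoming edge
    ]
    return nodes, edges


def children_index(rows):
    """Map each parent id to the list of its child ids."""
    index = {}
    for row in rows:
        if row["parent"] is not None:
            index.setdefault(row["parent"], []).append(row["id"])
    return index


nodes, edges = split_tables(node_table)
print(len(nodes), "nodes,", len(edges), "edges")
print(children_index(node_table))
```

In a rooted tree every node except the root has exactly one parent, so the edge table always has one row fewer than the node table.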
NeXML (http://www.nexml.org) is a promising standard for representing phylogenetic data.
Most tools for phylogeny visualization render static snapshots.
These include stand-alone executables:
FigTree (http://tree.bio.ed.ac.uk/software/figtree/) - the primary workhorse used by the
scientific community for drawing trees for publication. It enables basic interaction,
such as toggling
There are also libraries for tree manipulation and rendering in several languages:
ape (http://ape.mpl.ird.fr) - R tools for manipulating and analyzing trees, with extensive
functionality for rendering trees and showing data on the trees. It is now very widely
used for analyzing evolutionary data on phylogenies, and increasingly often for rendering
trees for publication.
DendroPy (http://packages.python.org/DendroPy/tutorial/index.html) - Python tools for
manipulating and analyzing trees, with little support for displaying them
There are also a few web-based tools for viewing and exploring trees. These include:
jstree (http://lh3lh3.users.sourceforge.net/jstree.shtml) - an editor for phylogenetic trees
onezoom (http://www.onezoom.org) - a set of static trees that can be viewed in the browser
What tools are needed
The ideal tool would be:
- Scalable, working well for trees that have a handful of tips up to millions of tips
(there are about 2 million described species, and probably at least 10 million currently
living on the planet)
- Interactive, enabling the user to explore the tree (traversing different parts, changing
the zoom), manipulate the tree layout (move tips around, control node density, rotate
subtrees, etc), subset the tree according to data (such as removing all nodes for species
that were described after a particular date), and control what data is shown about nodes
and edges (toggle the names, control what color the nodes are)
- Animated, with meaningful transitions when different portions of the tree are shown (e.g.,
a dynamic layout that optimizes the view of the tree as nodes are added or removed) and when
data are shown or hidden
itis_sql_to_json.py - parses taxonomic data from itis.gov into a tabular tree encoded in
JSON. Each node has an associated date. For tips, the date is the year in which the
species was described. For internal nodes, it is the year that the youngest descendant
species was described. The JSON file is in a nodes-and-links format.
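The date propagation and the nodes-and-links layout can be sketched as follows. This is not the code of itis_sql_to_json.py: the function names, the field names (`name`, `year`, `source`, `target`), and the toy taxa and years are all assumptions made for illustration. The sketch also assumes that "youngest" means the earliest-described descendant, so each internal node takes the minimum year among its descendant tips. With that reading, an ancestor becomes visible as soon as its first descendant does.

```python
import json


def propagate_years(parent_of, tip_year):
    """Give each internal node the earliest year among its descendant tips."""
    year = dict(tip_year)
    for tip, described in tip_year.items():
        node = parent_of.get(tip)
        while node is not None:  # walk from the tip up to the root
            year[node] = min(year.get(node, described), described)
            node = parent_of.get(node)
    return year


def to_nodes_links(parent_of, year):
    """Build a nodes-and-links document; links refer to positions in the node list."""
    names = sorted(year)  # fixed ordering so link indices are reproducible
    index = {name: i for i, name in enumerate(names)}
    nodes = [{"name": name, "year": year[name]} for name in names]
    links = [
        {"source": index[parent], "target": index[child]}
        for child, parent in parent_of.items()
        if parent is not None
    ]
    return {"nodes": nodes, "links": links}


# Made-up toy tree: child -> parent, plus made-up description years for the tips.
parent_of = {
    "root": None,
    "clade_x": "root",
    "tip_a": "clade_x",
    "tip_b": "clade_x",
    "tip_c": "root",
}
tip_year = {"tip_a": 1820, "tip_b": 1905, "tip_c": 1758}

year = propagate_years(parent_of, tip_year)
doc = to_nodes_links(parent_of, year)
print(json.dumps(doc, indent=2))
```

Index-based `source` and `target` fields are the form a d3 force layout accepts for links, which is why they are used here.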
force - a d3 tree viewer that operates on *.json files produced by itis_sql_to_json.py
Example use cases
Show the history of biological exploration
Show a tree with all described species and the date they were first described. Place a
slider below the tree that goes from the first described species to the present day. As
the user moves the slider, only the species that were described before the indicated date
are shown. This allows the user to see how some groups filled in slowly and then quickly
in a burst of discovery, and how entire new groups were discovered and then expanded.
- When the slider is all the way to the left there would be fewer than 50 species; when it
is all the way to the right there would be hundreds of thousands or millions. The layout
would dynamically change to accommodate this change in density.
- As the slider moves, it would be very cool to change the color or transparency of the
nodes and edges that were just added in or are about to be removed. That would make it
easy to tell at a glance what is changing without depending on the motion itself.
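One way to drive such a highlight is to map the gap between a node's date and the slider position to a number between 0 and 1, which the viewer could then turn into a color or an opacity. The sketch below is only an assumption about how this might be done: the function name, the linear fade, and the default 10-year window are all illustrative choices.

```python
def highlight_strength(node_year, slider_year, window=10):
    """Return 1.0 for a node added at slider_year, fading linearly to 0.0
    for nodes added `window` or more years earlier."""
    age = slider_year - node_year
    if age < 0:
        return 0.0  # not yet described at this slider position, so not drawn
    return max(0.0, 1.0 - age / window)


for node_year in (1900, 1895, 1880, 1910):
    print(node_year, highlight_strength(node_year, slider_year=1900))
```

Nodes that are about to be removed when the slider moves left are the same recently added nodes, so a single measure covers both directions of travel.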
The simplest way to implement this would be to parse the taxonomy of itis.gov and use it
as a proxy for the tree of life. itis is a database of categories, not relationships, but
we don't have a single tree with true relationships yet. itis also records the dates on
which species were described, which is convenient.
itis_sql_to_json.py parses the tree structure and date. It propagates dates to internal
nodes, so that, given year Y on the slider, all nodes with a date greater than Y would be
removed from the current view.
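The filtering step for a given year Y can be sketched as below, working on a nodes-and-links document. The field names (`name`, `year`, `source`, `target`), the function name, and the toy data are assumptions for illustration, not the viewer's actual code. Links are assumed to refer to positions in the node list, so the sketch renumbers them after dropping nodes.

```python
def visible_subtree(doc, slider_year):
    """Keep only nodes dated at or before slider_year, and the links among them."""
    keep = [i for i, node in enumerate(doc["nodes"]) if node["year"] <= slider_year]
    new_index = {old: new for new, old in enumerate(keep)}
    nodes = [doc["nodes"][i] for i in keep]
    links = [
        {"source": new_index[link["source"]], "target": new_index[link["target"]]}
        for link in doc["links"]
        if link["source"] in new_index and link["target"] in new_index
    ]
    return {"nodes": nodes, "links": links}


# Made-up toy document with dates already propagated to internal nodes.
doc = {
    "nodes": [
        {"name": "root", "year": 1758},
        {"name": "clade_x", "year": 1820},
        {"name": "tip_a", "year": 1820},
        {"name": "tip_b", "year": 1905},
        {"name": "tip_c", "year": 1758},
    ],
    "links": [
        {"source": 0, "target": 1},
        {"source": 1, "target": 2},
        {"source": 1, "target": 3},
        {"source": 0, "target": 4},
    ],
}

for slider_year in (1800, 1850, 2000):
    view = visible_subtree(doc, slider_year)
    print(slider_year, [node["name"] for node in view["nodes"]])
```

Because each internal node in the toy data carries the earliest date among its descendants, every remaining tip still has an unbroken path to the root after filtering.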
I used itis_sql_to_json.py to generate siphonophorae.json, which can be used as a test
dataset to get the viewer working. We can then create larger and larger subtrees, up to
all the species in itis.