Commits

Casey Dunn committed 0e69bae

reformatted README with markdown and updated it

  • Parent commits a5930f6


Files changed (2)

README

-phylogeny-d3
-
-Tools for interactive phylogeny visualization
-
-
-
-Background
-==========
-==========
-
-The anatomy of a phylogeny
-==========================
-
-A phylogeny is a depictions of the evolutionary relationships, such as a family tree for a
-group of species or DNA sequences. 
-
-A phylogeny can be unrooted (ie, an undirected acyclic graph), in which case it isn't
-indicated which node is the oldest and to assertion about the direction of time is made on
-any edge, or rooted (ie, a DAG), in which case one node is declared to be the root
-ancestor and time proceeds along edges from the root to the tips. 
-
-Edge length can have no meaning (a cladogram), indicate the amount of observed or expected
-evolutionary time (a phylogram), or be proportional to time (a chronogram).
-
-Sometimes the topology of the tree is the main thing of interest. Sometimes the topology
-is the backbone for understanding other data that are mapped onto the tree.
-
-Data structures
-===============
-
-The most common phylogeny datastructure is the Newick format:
-
-http://en.wikipedia.org/wiki/Newick_format
-
-It is also common to store trees as simple graph tables. There can be separate tables for
-nodes and edges, or one table for nodes that includes a field that specifies the parent
-node.
-
-NeXML (http://www.nexml.org) is a promising standard for representing phylogenetic data, 
-including trees.
-
-
-Existing tools
-==============
-
-Most tools for phylogeny visualization render static snapshots. 
-
-These include stand-alone executables:
-
-FigTree (http://tree.bio.ed.ac.uk/software/figtree/) - the primary workhorse used by the
-scientific community used for drawing trees for publication. It enables basic interaction,
-such as toggling 
-
-Dendroscope (http://ab.inf.uni-tuebingen.de/software/dendroscope/)
-
-
-There are also libraries for tree manipulation and rendering in several languages,
-including:
-
-ape (http://ape.mpl.ird.fr) - R tools for manipulating and analyzing trees, extensive
-functionality for rendering trees and showing data on the trees. It is now very widely
-used for analyzing evolutionary data on phylogenies, and increasingly often for rendering
-trees for publication.
-
-Dendropy (http://packages.python.org/DendroPy/tutorial/index.html) - python tools for
-manipulating and analyzing trees, not much for displaying them
-
-
-There are also a few web-based tools for viewing and exploring trees. These include:
-
-jsPhyloSVG (http://www.jsphylosvg.com) - a javascript library for rendering svg's of 
-trees.
-
-jstree (http://lh3lh3.users.sourceforge.net/jstree.shtml) - an editor for phylogenetic 
-trees
-
-PhyloWidget (http://www.phylowidget.org)
-
-onezoom (http://www.onezoom.org) - a set of static trees that can be viewed in the browser
-
-
-What tools are needed
-=====================
-
-The ideal tool would be:
-
-- Scalable, working well for trees that have a handful of tips up to millions of tips
-(there are about 2 million described species, and probably at least 10 million currently
-living on the planet)
-
-- Interactive, enabling the user to explore the tree (traversing different parts, changing
-the zoom), manipulate the tree layout (move tips around, control node density, rotate
-subtrees, etc), subset the tree according to data (such as removing all nodes for species
-that were described after a particular date), and control what data is shown about nodes
-and edges (toggle the names, control what color the nodes are)
-
-- Have meaningful transitions when different portions of the tree are shown (eg, dynamic
-layout that optimizes the view of the tree as nodes are added or removed) and data are
-shown or hidden
-
-
-Files
-=====
-=====
-itis_sql_to_json.py - parses taxonomic data from itis.gov into a tabular tree encoded in
-json. Each node has an associated date. For tips, the date is the year oin which the
-species was described. For internal nodes, it is the year that the youngest descendent
-species was described. The json file is in a nodes, links format.
-
-force - a d3 tree viewer, that operates on *.json files produced by itis_sql_to_json.py
-
-
-
-Example use cases
-=================
-=================
-
-Show the history of biological exploration
-==========================================
-
-Show a tree with all described species and the date they were first described. Place a
-slider below the tree that goes from the first described species to the present day. As
-the user moves the slider, only the species that were described before the indicated date
-are shown. This allows the user to see how some groups filled in slowly and then quickly
-in a burst of discovery, and how entire new groups were discovered and then expanded.
-
-Some thoughts:
-- When the slider is all the way to the left there would be <50 species, when to the right
-there would be hundreds of thousands or millions. The layout would dynamically change to
-accommodate this change in density.
-
-- As the slider moves, it would be very cool to change the color or transparency of the
-nodes and edges that were just added in or are about to be removed. That would make it
-easy to tell at a glance what is changing without depending on the motion itself.
-
-The simplest way to implement this would be to parse the taxonomy of itis.gov and use it
-as a proxy for the tree of life. itis is a database of categories, not relationships, but
-we don't have a single tree with true relationships yet. itis also has the dates that
-species were described on, which is convenient.
-
-itis_sql_to_json.py parses the tree structure and date. It propagates dates to internal
-nodes, so that given year Y on the slider all nodes with a date great than Y would be
-removed from the current view.
-
-I used itis_sql_to_json.py to generate siphonophorae.json, which can be used as a test
-dataset to get the viewer working. We can then create larger and larger subtrees, up to
-all the species in itis.
+# phylogeny-d3
+
+phylogeny-d3 is a phylogeny viewer implemented with the d3 JavaScript library 
+(http://d3js.org). There are already many excellent phylogeny viewers; this 
+project is an experiment to see how well the dynamic, interactive features of 
+d3 suit phylogeny visualization. 
+
+
+## Example use
+
+To view the siphonophore phylogeny:
+
+    cd phylogeny-d3/force
+    python -m SimpleHTTPServer 8888 &
+    # (with Python 3 the equivalent is: python3 -m http.server 8888 &)
+    
+Then point your browser to http://localhost:8888/force.html. Click and drag the 
+nodes; mouse over a node to see its name.
+
+The included sample tree, siphonophorae.json, has 191 nodes (internal and 
+external) and works well in Chrome and Safari. Here is how to generate the 
+siphonophore tree from scratch, as well as some larger trees (increasingly 
+inclusive subgraphs that all contain the siphonophores):
+
+    curl http://www.itis.gov/downloads/itisMySQLTables.tar.gz | tar zx
+    cd [path to phylogeny-d3...]
+    # In the command below, use the full paths to the files you downloaded above
+    # 191 node tree:
+    python itis_sql_to_json.py taxonomic_units strippedauthor 718928 > force/siphonophorae.json
+    # 2437 node tree:
+    python itis_sql_to_json.py taxonomic_units strippedauthor 48739 > force/hydrozoa.json
+    # 6240 node tree:
+    python itis_sql_to_json.py taxonomic_units strippedauthor 48738 > force/cnidaria.json
+    # 366809 node tree:
+    python itis_sql_to_json.py taxonomic_units strippedauthor 202423 > animalia.json
+
+To examine these trees, replace 'siphonophorae.json' in force/force.html with 
+the name of the new file generated above. Also, don't forget to chmod 755 your 
+files.
+
+
+## Files
+
+itis_sql_to_json.py - parses taxonomic data from itis.gov into a tabular tree 
+encoded in json. Each node has an associated date. For tips, the date is the 
+year in which the species was described. For internal nodes, it is the year 
+that the youngest descendant species was described. The json file is in a 
+nodes, links format. The itis taxonomy is not a phylogeny (it describes the 
+categorization of animals, not their evolutionary relationships). But it is the 
+simplest way to get a tree with hundreds of thousands of tips for working on 
+large visualization projects.
+
+force - a d3 tree viewer. It can read the json files, including those produced 
+by itis_sql_to_json.py
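
The date propagation described for itis_sql_to_json.py can be sketched roughly as below. This is an illustration only, not the code in the script: the function name `propagate_years`, the `parent` and `tip_year` dictionaries, and the `combine` argument are all hypothetical. Because "youngest descendant" can be read either way, the caller chooses how tip years are combined: `max` gives the most recently described descendant, `min` the earliest.

```python
from collections import defaultdict


def propagate_years(parent, tip_year, combine):
    """Assign a year to every internal node from the years of its descendants.

    parent:   dict mapping node id -> parent id (the root maps to None)
    tip_year: dict mapping tip id -> year the species was described
    combine:  max or min (hypothetical knob, see the text above)
    """
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    years = dict(tip_year)
    # Iterative post-order walk, so very deep trees do not hit the
    # recursion limit.
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        kids = children.get(node, [])
        if not kids:
            continue  # a tip: its year is already known
        if children_done:
            years[node] = combine(years[k] for k in kids)
        else:
            stack.append((node, True))
            stack.extend((k, False) for k in kids)
    return years


parent = {"root": None, "A": "root", "a1": "A", "a2": "A", "b1": "root"}
tip_year = {"a1": 1758, "a2": 1901, "b1": 1820}
latest = propagate_years(parent, tip_year, max)
earliest = propagate_years(parent, tip_year, min)
```

With `min`, an ancestor becomes visible as soon as its first descendant has been described, which is what a year slider would need to keep the displayed tree connected.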
+
+## Background
+
+### The anatomy of a phylogeny
+
+A phylogeny is a depiction of evolutionary relationships, such as a family 
+tree for a group of species or DNA sequences. 
+
+A phylogeny can be unrooted (i.e., an undirected acyclic graph), in which case 
+no node is identified as the oldest and no assertion about the direction of 
+time is made on any edge, or rooted (i.e., a DAG), in which case one node is 
+declared to be the root ancestor and time proceeds along edges from the root 
+to the tips. 
+
+Edge length can have no meaning (a cladogram), indicate the amount of observed 
+or expected evolutionary change (a phylogram), or be proportional to time (a 
+chronogram).
+
+Sometimes the topology of the tree is the main thing of interest. Sometimes the 
+topology is the backbone for understanding other data that are mapped onto the 
+tree.
+
+### Data structures
+
+The most common format for storing phylogenies is Newick:
+
+http://en.wikipedia.org/wiki/Newick_format
+
+It is also common to store trees as simple graph tables. There can be separate 
+tables for nodes and edges, or one table for nodes that includes a field that 
+specifies the parent node.
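
To make the relationship between the two representations concrete, here is a minimal sketch that turns a single node table with a parent field into a Newick string. The function name `table_to_newick` and the `(node_id, parent_id, name)` row layout are assumptions chosen for the example, not a format used by this project. Labels are written as-is, so names containing spaces, commas, or parentheses would need Newick quoting that is not handled here.

```python
from collections import defaultdict


def table_to_newick(rows):
    """Convert rows of (node_id, parent_id, name) into a Newick string.

    Exactly one row is expected to have parent_id None (the root).
    """
    children = defaultdict(list)
    names = {}
    root = None
    for node_id, parent_id, name in rows:
        names[node_id] = name
        if parent_id is None:
            root = node_id
        else:
            children[parent_id].append(node_id)

    def render(node_id):
        kids = children.get(node_id, [])
        if not kids:
            return names[node_id]  # a tip is just its label
        inner = ",".join(render(k) for k in kids)
        return "(" + inner + ")" + names[node_id]

    return render(root) + ";"


rows = [
    (1, None, "root"),
    (2, 1, "A"),
    (3, 1, "B"),
    (4, 2, "a1"),
    (5, 2, "a2"),
]
newick = table_to_newick(rows)
```

The recursion is fine for small trees; a tree with very long root-to-tip paths would call for an explicit stack instead.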
+
+NeXML (http://www.nexml.org) is a promising standard for representing 
+phylogenetic data, including trees.
+
+
+### Existing phylogeny viewers
+
+There are already many excellent tools for viewing phylogenies. Most of these 
+tools are intended to render static snapshots, and interactivity is usually 
+focused on optimizing the layout rather than active data exploration. 
+
+#### Existing stand-alone viewers
+
+FigTree (http://tree.bio.ed.ac.uk/software/figtree/) - the primary workhorse 
+used by the scientific community for drawing trees for publication. It 
+enables basic interaction, such as toggling 
+
+Dendroscope (http://ab.inf.uni-tuebingen.de/software/dendroscope/) is 
+particularly well suited for very large trees with many tips.
+
+Mesquite (http://mesquiteproject.org/mesquite/mesquite.html) is particularly 
+well suited for examining the evolution of characters on phylogenies. 
+
+#### Existing libraries for manipulating and rendering trees
+
+ape (http://ape.mpl.ird.fr) - R tools for manipulating and analyzing trees, 
+extensive functionality for rendering trees and showing data on the trees. It 
+is now very widely used for analyzing evolutionary data on phylogenies, and 
+increasingly often for rendering trees for publication. There are quite a few 
+other R packages for examining phylogenies; these are summarized by Brian 
+O'Meara at http://cran.r-project.org/web/views/Phylogenetics.html.
+
+DendroPy (http://packages.python.org/DendroPy/tutorial/index.html) - Python 
+tools for manipulating and analyzing trees, with little support for displaying 
+them
+
+#### Existing web-based viewers
+
+jsPhyloSVG (http://www.jsphylosvg.com) - a JavaScript library for rendering 
+SVGs of trees. It produces very nice graphics, and 
+
+jstree (http://lh3lh3.users.sourceforge.net/jstree.shtml) - an editor for 
+phylogenetic trees
+
+PhyloWidget (http://www.phylowidget.org)
+
+onezoom (http://www.onezoom.org) - a set of static trees that can be viewed in 
+the browser
+
+
+## The ideal phylogeny viewer
+
+The ideal tool would be:
+
+- Scalable, working well for trees that have a handful of tips up to millions 
+of tips (there are about 2 million described species, and probably at least 10 
+million currently living on the planet)
+
+- Interactive, enabling the user to explore the tree (traversing different 
+parts, changing the zoom), manipulate the tree layout (move tips around, 
+control node density, rotate subtrees, etc), subset the tree according to data 
+(such as removing all nodes for species that were described after a particular 
+date), and control what data is shown about nodes and edges (toggle the names, 
+control what color the nodes are)
+
+- Equipped with meaningful transitions when different portions of the tree are 
+shown (e.g., dynamic layout that optimizes the view of the tree as nodes are 
+added or removed) and when data are shown or hidden
+
+
+## Example use cases (not yet implemented)
+
+### Show the history of biological exploration
+
+Show a tree with all described species and the date they were first described. 
+Place a slider below the tree that goes from the first described species to the 
+present day. As the user moves the slider, only the species that were described 
+before the indicated date are shown. This allows the user to see how some 
+groups filled in slowly and then quickly in a burst of discovery, and how 
+entire new groups were discovered and then expanded.
+
+Some thoughts:
+- When the slider is all the way to the left there would be <50 species, when 
+to the right there would be hundreds of thousands or millions. The layout would 
+dynamically change to accommodate this change in density.
+
+- As the slider moves, it would be very cool to change the color or 
+transparency of the nodes and edges that were just added in or are about to be 
+removed. That would make it easy to tell at a glance what is changing without 
+depending on the motion itself.
+
+The simplest way to implement this would be to parse the taxonomy of itis.gov 
+and use it as a proxy for the tree of life. itis is a database of categories, 
+not relationships, but we don't have a single tree with true relationships yet. 
+itis also has the dates that species were described on, which is convenient.
+
+itis_sql_to_json.py parses the tree structure and date. It propagates dates to 
+internal nodes, so that for a given year Y on the slider, all nodes with a date 
+greater than Y would be removed from the current view.
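
The slider filter described above could look roughly like this on a nodes, links structure. It is a sketch under stated assumptions: the field names `year`, `source`, and `target`, and the use of integer indices into the node list for link endpoints, are hypothetical and may not match what itis_sql_to_json.py actually writes.

```python
def subset_by_year(graph, cutoff):
    """Return a copy of the graph without nodes dated after `cutoff`.

    graph: {"nodes": [{"name": ..., "year": ...}, ...],
            "links": [{"source": i, "target": j}, ...]}
    Link endpoints are assumed to be indices into graph["nodes"], so they
    are renumbered after filtering.
    """
    keep = [i for i, node in enumerate(graph["nodes"]) if node["year"] <= cutoff]
    new_index = {old: new for new, old in enumerate(keep)}
    nodes = [graph["nodes"][i] for i in keep]
    links = [
        {"source": new_index[link["source"]], "target": new_index[link["target"]]}
        for link in graph["links"]
        # A link survives only if both of its endpoints survive.
        if link["source"] in new_index and link["target"] in new_index
    ]
    return {"nodes": nodes, "links": links}


graph = {
    "nodes": [
        {"name": "root", "year": 1758},
        {"name": "a", "year": 1758},
        {"name": "b", "year": 1901},
    ],
    "links": [{"source": 0, "target": 1}, {"source": 0, "target": 2}],
}
view_1800 = subset_by_year(graph, 1800)
```

Recomputing the subset on every slider move is the simplest approach; for very large trees, sorting the nodes by year once would let each move add or remove only the nodes that changed.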
+
+I used itis_sql_to_json.py to generate siphonophorae.json, which can be used as 
+a test dataset to get the viewer working. We can then create larger and larger 
+subtrees, up to all the species in itis.
+
+### A widget for embedding interactive phylogenies into web sites
+
+The owner of the site could hard-code the tree into the web page. The widget 
+could also be configured to allow users to paste trees into the browser. In 
+both cases, Newick-encoded strings would be accepted.
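
Accepting Newick strings means parsing them into a tree first. The sketch below shows the idea in Python for consistency with the rest of the repository's tooling; a browser widget would need the same logic in JavaScript. The function name `parse_newick` and the nested-dict output are assumptions made for illustration. It handles nested parentheses, labels, and optional branch lengths only; quoted labels, comments, and whitespace inside the string are not supported.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts of the form
    {"name": str, "length": float or None, "children": [...]}.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # consume ")"
        # Optional label, running up to the next structural character.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    tree = parse_node()
    if pos != len(text) - 1:
        raise ValueError("unexpected content at position %d" % pos)
    return tree


tree = parse_newick("((a1:1.0,a2:2.5)A:0.5,B)root;")
```

The resulting nested structure maps directly onto the hierarchical JSON that tree layouts usually consume, so the same parser output could feed either a hard-coded tree or one pasted in by a user.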