
<html>

<head>
<title>obiTaxonomy: NCBI Taxonomy</title>
<link rel=stylesheet href="style.css" type="text/css">
<link rel=stylesheet href="style-print.css" type="text/css" media=print>
</head>

<body>
<h1>obiTaxonomy: Organism Taxonomy</h1>
<p>The module obiTaxonomy provides access to <a href="http://www.ncbi.nlm.nih.gov/Taxonomy/">NCBI's organism taxonomy information</a>.
The taxonomy information is pre-loaded, that is, it becomes available upon installation. To keep response times short, it is updated on the Orange server
rather than through direct calls to NCBI's site from your local machine. The list of organism names is also updated using
information from <a href="http://www.genome.jp/kegg/">KEGG</a> and the <a href="http://www.geneontology.org/">Gene Ontology</a>,
so that a search for "sgd" or "mmu" returns the expected result.</p>

<p>The module is also used throughout Orange Bioinformatics to unify organism names across different modules.</p>

<p class=section>Functions</p>
<dl class=attributes>
	<dt>name(taxid)</dt>
	<dd>Return the scientific name for organism with taxid.</dd>

	<dt>other_names(taxid)</dt>
	<dd>Return a list of (name, name_type) tuples but exclude the scientific name.</dd>

	<dt>search(string, onlySpecies=True, exact=False)</dt>
	<dd>Search the NCBI taxonomy database for an organism by <code>string</code>, returning a list of
	taxonomy IDs for which any of the organism's names includes <code>string</code>. If <code>onlySpecies</code> is
	True, search only the species and subspecies nodes of the taxonomy. If <code>exact=True</code>, the entire
	name has to match <code>string</code>.</dd>
	
	<dt>lineage(taxid)</dt>
    <dd>Return a list of taxids ordered from the topmost node (root) to taxid.</dd>

	<dt>to_taxid(code)</dt>
	<dd>Check whether <code>code</code> is a valid organism code (codes are obtained from, for instance,
	the KEGG and GO databases) and return a set of its taxids.</dd>

	<dt>taxids()</dt>
	<dd>Returns a list of all NCBI taxonomy IDs (about half a million!).</dd>

	<dt>common_taxids()</dt>
	<dd>Returns a list of taxonomy IDs for common organisms (see the list of <a href="http://www.ncbi.nlm.nih.gov/Taxonomy/">common organisms from NCBI</a>).
	These are also the organisms for which the information files, such as Gene Ontology annotations and KEGG pathways, will be
	pre-loaded on the Orange server and made available through the Orange Bioinformatics database update. If there is
	an organism you wish to include in this list, please contact the authors or post your request on <a href="">Orange's Forum</a>.</dd>

	<dt>essential_taxids()</dt>
	<dd>Returns a set of taxonomy IDs, which is a subset of those returned by <code>common_taxids()</code>. These are also the organisms for which any annotation
	information will be pre-loaded on your computer upon installation of Orange Bioinformatics.</dd>
	
</dl>
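<p>To make the described behaviour of <code>search</code> and <code>lineage</code> more concrete, here is a minimal, self-contained sketch
that mimics them on a tiny in-memory table. It is not the module's implementation: the table <code>TOY_TAXONOMY</code>, its heavily
abbreviated parent links, and the case-sensitive matching are assumptions made only for illustration.</p>

```python
# Hypothetical miniature taxonomy: taxid -> (names, rank, parent taxid).
# Parent links skip most intermediate ranks; this is NOT the real NCBI tree.
TOY_TAXONOMY = {
    1: (["root"], "no rank", None),
    2759: (["Eukaryota"], "superkingdom", 1),
    9605: (["Homo"], "genus", 2759),
    9606: (["Homo sapiens", "human"], "species", 9605),
    10090: (["Mus musculus", "house mouse"], "species", 2759),
}


def search(string, onlySpecies=True, exact=False):
    """Return taxids for which at least one name matches `string`.

    Matching is case-sensitive here, which is an assumption of this sketch.
    """
    result = []
    for taxid in sorted(TOY_TAXONOMY):
        names, rank, _parent = TOY_TAXONOMY[taxid]
        # With onlySpecies=True, ignore everything above species level.
        if onlySpecies and rank not in ("species", "subspecies"):
            continue
        if exact:
            matched = any(string == name for name in names)
        else:
            matched = any(string in name for name in names)
        if matched:
            result.append(taxid)
    return result


def lineage(taxid):
    """Return taxids from the root down to `taxid`."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = TOY_TAXONOMY[taxid][2]  # step up to the parent node
    path.reverse()
    return path


print(search("Homo"))                     # [9606]
print(search("Homo", onlySpecies=False))  # [9605, 9606]
print(search("Homo", exact=True))         # []
print(lineage(9606))                      # [1, 2759, 9605, 9606]
```

<p>With the real module, the same calls would be made as <code>obiTaxonomy.search(...)</code> and <code>obiTaxonomy.lineage(...)</code>; the
results would then come from the full NCBI taxonomy rather than from this toy table.</p>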

<p class=section>Examples</p>

<p>The following script takes the list of common taxonomy IDs and prints out their names:</p>

<p class="header"><a href="taxonomy1.py">taxonomy1.py</a></p>
<xmp class=code>import obiTaxonomy

for taxid in obiTaxonomy.common_taxids():
    # Print the taxid left-aligned in a 6-character column, followed by the scientific name.
    print("%-6s %s" % (taxid, obiTaxonomy.name(taxid)))
</xmp>

<p>The output of the script is:</p>

<xmp class=code>3702   Arabidopsis thaliana
9913   Bos taurus
6239   Caenorhabditis elegans
3055   Chlamydomonas reinhardtii
7955   Danio rerio
352472 Dictyostelium discoideum AX4
7227   Drosophila melanogaster
562    Escherichia coli
11103  Hepatitis C virus
9606   Homo sapiens
10090  Mus musculus
2104   Mycoplasma pneumoniae
4530   Oryza sativa
5833   Plasmodium falciparum
4754   Pneumocystis carinii
10116  Rattus norvegicus
4932   Saccharomyces cerevisiae
4896   Schizosaccharomyces pombe
31033  Takifugu rubripes
8355   Xenopus laevis
4577   Zea mays
</xmp>

<h2>Update from other Orange modules</h2>
<p>This section is intended for developers only. For unification, each module (e.g. obiKEGG, obiGO, ...) should provide the following interface:</p>
<dl class=section>
	<dt>from_taxid(taxid)</dt>
	<dd>Convert taxid to module's internal organism code.</dd>
	
	<dt>to_taxid(organism)</dt>
	<dd>Convert module's internal organism code to taxid.</dd>
	
	<dt>organisms()</dt>
	<dd>Returns a list of tuples (taxid, internal_name).</dd>
</dl> 
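<p>A minimal sketch of how a module might satisfy this interface is given below. The lookup table, the choice of three KEGG-style
organism codes as sample data, and the representation of taxids as strings are all assumptions made for illustration; they do not
describe how obiKEGG or obiGO are actually implemented.</p>

```python
# Hypothetical mapping for an imaginary module: taxid -> internal organism code.
# Taxids are stored as strings here, which is an assumption of this sketch.
_CODE_BY_TAXID = {
    "9606": "hsa",   # Homo sapiens
    "10090": "mmu",  # Mus musculus
    "4932": "sce",   # Saccharomyces cerevisiae
}

# Reverse mapping, built once: internal organism code -> taxid.
_TAXID_BY_CODE = dict((code, taxid) for taxid, code in _CODE_BY_TAXID.items())


def from_taxid(taxid):
    """Convert a taxid to the module's internal organism code."""
    return _CODE_BY_TAXID[str(taxid)]


def to_taxid(organism):
    """Convert the module's internal organism code to a taxid."""
    return _TAXID_BY_CODE[organism]


def organisms():
    """Return a list of (taxid, internal_name) tuples."""
    return sorted(_CODE_BY_TAXID.items())


print(from_taxid("10090"))  # mmu
print(to_taxid("hsa"))      # 9606
print(organisms())
```

<p>In this sketch an unknown taxid or code raises <code>KeyError</code>; a real module may prefer a dedicated exception or a different
fallback.</p>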