1. biolab
  2. Untitled project
  3. orange-bioinformatics


orange-bioinformatics / docs / reference / obiMeSH.htm

<LINK REL=StyleSheet HREF="style.css" TYPE="text/css">
<LINK REL=StyleSheet HREF="../style-print.css" TYPE="text/css" MEDIA=print></LINK>


<index name="modules+multidimensional scaling">

<p>This module provides the functionality to computer <a href="http://www.nlm.nih.gov/bsd/disted/mesh/">MeSH</a> term enrichment and to annotate (MeSH ontology) chemicals with MeSH terms.</p>


<p><INDEX name="classes/obiMeSH (in obiMeSH)">obiMeSH is the main class for all task related to MeSH ontology.</p>

<P class=section>Methods</P>
<DL class=attributes>
<DD>Constructor has no arguments.</DD>

<dt>findFrequentTerms(data, minSizeInTerm, callback=None)</dt>
<DD>Function returns a dictionary where keys are MeSH terms ids and values are integers representing number of examples annotated with corresponding MeSH term. Data has to be instance of <code>ExampleTable</code>. With argument <code>minSizeInTerm</code> you can select only MeSH terms that have at least <code>minSizeInTerm</code> annotated examples.</DD>

<dt>findEnrichedTerms(reference, cluster, pThreshold=0.05, callback=None)</dt>
<DD>Function returns a dictionary where keys are MeSH terms and values are lists of four integers (number of annotated reference examples, number of annotated cluster examples, MeSH term enrichment, fold enrichment). With attribute <code>pThreshold</code> you can limit MeSH terms in returned dictionary to terms with enrichment less or equal to defined constant. Data sets (<code>reference</code> and <code>cluster</code>) have to be instances of <code>ExampleTable</code></DD>

<dt>printMeSH(data, selection = ["term","r","c", "p"])</dt>
<DD>Function performs a pretty print of a dictionary returned by function <code>findFrequentTerms</code> or <code>findEnrichedTerms</code>. When you are printing a dictionary of enriched MeSH terms (returned by <code>findEnrichedTerms</code>) you can also specify their properties and their order to print. At the moment you can choose among "term" (MeSH term name), "desc" (MeSH term description), "r" (number of examples from reference), "c" (number of examples from cluster), "p" (MeSH term enrichment) and "fold" (fold enrichment). </DD>
<dt>printHtmlMeSH(data, selection = ["term","r","c", "p"])</dt>
<DD>Similar to previous one except it returns html code. </DD>
<dt>findTerms(ids, idType="cid")</dt>
<DD>Function returns a dictionary where keys are members of the list <code>ids</code> and values are lists of MeSH terms that apply to a key. Note that this function is using a local annotation database.</DD><dt>
<dt>parsePubMed(filename, attributes = ["pmid", "title","abstract","mesh"], skipExamplesWithout = ["mesh"])</dt>
<DD>Function parses <a href="http>//www.pubmed.gov">PubMed</a> XML file (search results saved in XML format) into Oranges <code>ExampleTable</code>. Of course you can select only certain attributes. At the moment supported attributes are "pmid" (PubMed ID), "title" (article title), "abstract" (article abstract), "mesh" (MeSH terms) and "affilation". </DD><dt>
<dt>findSubset(examples, meshTerms, callback = None)</dt>
<DD>Function return a new dataset (subset of <code>examples</code>) with examples that apply to one or more MeSH term from the list <code>meshTerms</code>. Argument <code>examples</code> has to be instance of <code>ExampleTable</code>. </DD><dt>
<p class=section>Attributes</p>
<DL class =attributes></dl>
<DD>Dictionary <code>toID</code> provides mapping between MeSH term and MeSH term ids. Please note that some MeSH terms have more than one MeSH term id (one to many relation).  </DD>

<DD>Dictionary <code>toName</code> provides mapping between MeSH term id and MeSH term. </DD>

<DD>Dictionary <code>toName</code> provides mapping between MeSH term and MeSH term description.</code></DD>

<DD>Dictionary <code>fromCID</code> provides local mapping between CID (compound id) and a list of MeSH terms.</DD>

<h3>Basic operations on MeSH ontology</h3>

<p>In our first example, we will show how to manipulate with MeSH ontology. Let's start with simple mapping between MeSH terms and their ids. This is done by the following code:</p>

<p class="header">part of <a href="mesh1.py">mesh1.py</a> </p>

<xmp class=code>import orange
import obiMeSH
</xmp> -->

<h3>Using basic obiMeSH attributes</h3>
<p>Following example shows usage of main obiMeSH attributes.</p>
<xmp class=code>>>> import obiMeSH
>>> d = obiMeSH.obiMeSH()
>>> dir(d)
# We look if there is an annotation for chemical with CID 6240 in our local database.
>>> d.fromCID[6240]
# We could also used function which is using online database ...
>>> d.findTerms([1,2,3,4,5,6])
{1: ['Acetylcarnitine'], 2: [], 3: [], 4: ['Propanolamines'], 5: [], 6: ['Dinitrochlorobenzene']}
# After we know MeSH terms we can get their description ...
>>> d.toDesc['Chlorpromazine']
"The prototypical phenothiazine antipsychotic drug. Like the other drugs in this class chlorpromazine's antipsychotic actions
are thought to be due to long-term adaptation by the brain to blocking DOPAMINE RECEPTORS. Chlorpromazine has several other
actions and therapeutic uses, including as an antiemetic and in the treatment of intractable hiccup."
# or MeSH ids ...
>>> d.toID['Chlorpromazine']
['D02.886.369.198', 'D03.494.741.198']
# which can be easily converted back to MeSH terms.
>>> d.toName['D03']
'Heterocyclic Compounds'
>>> d.toName['D03.494']
'Heterocyclic Compounds, 3-Ring'

<h3>Calculating and printing MeSH term enrichment</h3>

<p>In the following example you can see how to calculate and pretty print enriched MeSH terms.</p>
<p>part of <a href='obiMeSH-calculate-enrichment.py'>obiMeSH-calculate-enrichment.py</a></p>

<xmp class=code>import obiMeSH
import orange

# load datasets
reference = orange.ExampleTable('obiMeSH-reference-dataset.tab')
cluster = orange.ExampleTable('obiMeSH-cluster-dataset.tab')

# find and print enriched MeSH terms with p-value < 0.1
d = obiMeSH.obiMeSH()
enrichment = d.findEnrichedTerms(reference, cluster, pThreshold=0.1)

<h3>Parsing PubMed XML data</h3>

<h3>Advanced: using MeSH terms relationship data</h3>

<p class="header"><a href="mds3.py">mds3.py</a> (uses <a href="reference.tab">reference.tab</a> and <a href="cluster.tab">cluster.tab</a>)</p>
<XMP class= code>import orange
import obiMeSH


<XMP class= code>i=0 while 100>i:


<p><INDEX name="classes/pubChemAPI (in obiMeSH)">pubChemAPI is the main class used to query online PubChem database.</p>

<P class=section>Methods</P>
<dt>getSMILE(id, typ)</dt>
<dl>Functions returns corresponding chemical formula in SMILES format for desired <code>id</code>. Argument <code>typ</code> indicates type of identifier. At the moment there are two possibilities for a value of <code>typ</code> ('cid' for compound id and 'sid' for substance id). </dl>

<dl>Function returns a list of MeSH terms (classification) for a given CID.</dl>

<dl>Function returns a list of possible CIDs for a given SID.</dl>

<dl>Function returns a list of pharmacological actions for a given CID.</dl>

<dt>getCIDs(name, weight)</dt>
<dl>Function returns a list of possible chemical names for a given CID and molecular weight.</dl>

<dl>Function returns a list of possible chemical names for a given CID.</dl>


<h3>Basic functions</h3>

<p>In the following example you can see how to do basic operations with pubChemAPI library.</p>

<xmp class=code>from obiMeSH import *

api = pubChemAPI()
chemical = 'Chlorpromazine'
cid = api.getCIDs(chemical)
classTerms = api.getMeSHterms(cid[0])
pharmTerms = api.getPharmActionList(cid[0])
smiles = api.getSMILE(cid,'cid')

print 'Drug name', chemical
print 'All available CID numbers', cid
print 'Classification MeSH terms', classTerms
print 'Pharmacological action MeSH terms', pharmTerms
print 'Smiles notation', smiles

<P>The output should be
<xmp  class=code>Drug name  Chlorpromazine
All available CID numbers  [2726, 70413, 522335, 6240, 9683, 6431825, 4926, 84362, 62861, 443037, 23724898, 165214,
 160588, 122845, 9682, 6474604, 125595, 125358, 6474605, 465099, 465100, 481770, 159916, 465103, 465098, 461555, 107410,
 465104, 486143, 467415, 114324, 117674, 117673, 3026449, 72287, 6420056, 3916, 24182520, 24182516, 91499, 67356, 66064, 
66062, 6602611, 6444542, 6436410, 6436057, 5282418, 5282417, 5281881, 5281878, 5281032, 36207, 2913535, 17012, 5887, 5566, 
5452, 4917, 4748, 4744, 4078, 3372]
Classification MeSH terms  ['Organic Chemicals', 'Heterocyclic Compounds, 3-Ring', 'Phenothiazines', 'Sulfur Compounds', 'Heterocyclic Compounds', 'Chlorpromazine']
Pharmacological action MeSH terms  ['Antiemetics', 'Autonomic Agents', 'Dopamine Antagonists', 'Central Nervous System Agents', 'Chemical Actions and Uses', 'Dopamine Agents', 'Gastrointestinal Agents', 'Molecular Mechanisms of Pharmacological Action', 'Pharmacologic Actions', 'Physiological Effects of Drugs', 'Therapeutic Uses', 'Tranquilizing Agents', 'Central Nervous System Depressants', 'Neurotransmitter Agents', 'Peripheral Nervous System Agents', 'Psychotropic Drugs', 'Antipsychotic Agents']
Smiles notation  CN(C)CCCN1C2=CC=CC=C2SC3=C1C=C(C=C3)Cl

<h3>Generating drug dataset with pubChemAPI and obiMeSH</h3>

<p>In this example you can see how to print chemical dataset suitable for usage in Orange. We will assume that our initial data is a set of drug names. Note that following procedure is vague and may not be accurate for all the chemicals. The crux of the problem is the fact that function getCIDs returns a list of CIDs. We assume that first CID is the one we are looking for but that is not always true.</p>
<p>part of <a href='obiMeSH-dataset-generator.py'>obiMeSH-dataset-generator.py</a></p>
<xmp class=code>for i in names:
	cids = chem.getCIDs(i.strip())
	if len(cids) > 0:
		cid = cids[0] 		
		terms = chem.getMeSHterms([int(cid)])[cid]
		smiles = chem.getSMILE(int(cid),"cid")
		print cid, "\t", i, "\t", smiles, "\t", terms

<P>The output should be
<xmp  class=code>cid	name	smiles	mesh
string	string	string	string

6427782 	1,7-octadiene 	CC=CCC=CC 	[]
24199357 	2-Dimethylamnoethyl cloride 	CC(C)C1=CC=C(C=C1)C2C(C(=O)NC(=S)N2)C#N.[K] 	[]
3385 	5-fluorouracil 	C1=C(C(=O)NC(=O)N1)F 	['Pyrimidines', 'Uracil', 'Fluorouracil', 'Pyrimidinones', 'Heterocyclic Compounds, 1-Ring', 'Heterocyclic Compounds']