orange-bioinformatics / docs / reference / obiAssess.htm

<html>

<head>
<title>obiAssess: pathway enrichment for each sample</title>
<link rel=stylesheet href="style.css" type="text/css">
<link rel=stylesheet href="style-print.css" type="text/css" media=print>
</head>

<body>
<h1>obiAssess: pathway enrichment for each sample</h1>
<index name="modules/assess enrichment gsea">

<p>Gene Set Enrichment Analysis (GSEA) is a method which tries to identify groups of genes that are regulated together. It computes pathway enrichments for the whole data set. ASSESS in inspired by GSEA and it computes enrichments for each sample in the data set.</p>

<p>ASSESS takes gene expression with sample phenotypes and computes gene set enrichments for given gene sets. First pathway &quot;models&quot; have to be created with AssessLearner. Afterwards they are used to calculate enrichments for each pair of sample and pathway.</p>

<h2>AssessLearner</h2>

Class is used to build models, that can be used later to determine enrichments scores for each example. Note that domains of input data to <code>AssessLearner</code> and <code>Assess</code> instances must be the same.

<dl class=attributes>

<dt>__call__(self, data, organism, geneSets, minSize=3, maxSize=1000, minPart=0.1, classValues=None, rankingf=None)</dt>
<dd>Function <code>__call__</code> returns an instance of class <code>Assess</code> which can be given an example and returns its enrichemnt in all pathways. Argument descriptions follow. 

<dl class=arguments>

  <dt>data</dt>
  <dd>An <A href="ExampleTable.htm"><CODE>ExampleTable</CODE></A> with gene expression data. An example
  should correspond to a sample with its phenotype (class value). Attributes represent individual genes. Their names 
  should be meaningful gene aliases.</dd>

  <dt>organism</dt>
  <dd>Organism code as used in KEGG. Needed for matching gene names in data to those in gene sets. Some
  examples: <code>hsa</code> for human, <code>mmu</code> for mouse. This is an required argument.</dd> 

  <dt>classValues</dt>
  <dd>A pair of class values describing phenotypes that are chosen as two distinct phenotypes on which gene correlations
  are computed. Only examples with one of chosen class values are considered for analysis. If not specified, first
  two class values in <code>classVar</code> attribute descriptor are used.</dd>

  <dt>geneSets</dt>
  <dd>A python dictionary of gene sets, where key is a gene set name which points to a list of gene aliases for genes
  in the gene set. Default: gene sets in your collection directory.</dd>

  <dt>minSize, maxSize</dt>
  <dd>Minimum and maximum number of genes from gene set also present in the data set for that gene set to be analysed.
  Defaults: 3 and 1000.</dd>

  <dt>minPart</dt>
  <dd>Minimum fraction of genes from the gene set also present in the data set for that gene set to be analysed. Default: 0.1.</dd> 

  <dt>rankingf</dt>
  <dd>Used to specify model type for individual gene sets. See source code for reference. We recommend leaving the parameter blank. In that case, a parametric model from Edelman, 2006 is used.</dd>

</dl>

</dd>

</dl>

<h2>Assess</h2>

<dl class=attributes>

<dt>__init__(**kwargs)</dt>
<dd>Function <code>__init__</code> is usually called only by <code>AssessLearner</code>. It is used to save built &quot;model&quot; data. Saves all keyword arguments into object's namespace.</dd>

<dt>__call__(example)</dt>
<dd>Returns enrichments of all gene sets for this example. Enrichments are returned in a dictionary, where keys are gene set and values their enrichments. Note that example's domain must be the same as the domain on which the &quot;model&quot; was built.</dd>


</dl>

<h3>Example 1</h3>

This example prints enrichmentes for the first sample in the data set. It uses KEGG as a gene set source.

<p class="header"><a href="assess1.py">assess1.py</a> (uses <a href="http://www.ailab.si/orange/datasets/DLBCL.tab">DLBCL.tab</a>)</p>

<xmp class=code>import orange
import obiAssess
import obiGeneSets

gs = obiGeneSets.collections([":kegg:hsa"])
data = orange.ExampleTable("DLBCL.tab")

asl = obiAssess.AssessLearner()
ass = asl(data, "hsa", geneSets=gs)

print "Enrichments for the first example (10 pathways)"
enrichments = ass(data[0])
for patw, enric in sorted(enrichments.items())[:10]:
    print patw, enric
</xmp>

<p>Output:</p>

<xmp class=code>Enrichments for the first example (10 pathways)
[KEGG] 1- and 2-Methylnaphthalene degradation -0.84674671525
[KEGG] 3-Chloroacrylic acid degradation -0.587923507915
[KEGG] ABC transporters - General -0.292198856631
[KEGG] Acute myeloid leukemia 0.305086037192
[KEGG] Adherens junction 0.387903973883
[KEGG] Adipocytokine signaling pathway 0.404448748545
[KEGG] Alanine and aspartate metabolism 0.400113861834
[KEGG] Alkaloid biosynthesis I -0.677360944415
[KEGG] Alkaloid biosynthesis II -0.437492650183
[KEGG] Allograft rejection 0.491535468415
</xmp>

<h3>Example 2: transforming data sets</h3>

This example builds a new data set, where attributes are gene sets instead of genes. It prints first 10 attributes for the first example of transformed data set. Note, that the output matches previous example (well, with the exception of floating point discrepancies).

<p class="header"><a href="assess2.py">assess2.py</a> (uses <a href="http://www.ailab.si/orange/datasets/DLBCL.tab">DLBCL.tab</a>)</p>

<xmp class=code>import orange
import obiAssess
import obiGeneSets

gs = obiGeneSets.collections([":kegg:hsa"])
data = orange.ExampleTable("DLBCL.tab")

asl = obiAssess.AssessLearner()
ass = asl(data, "hsa", geneSets=gs)

def genesetsAsAttributes(data, ass, domain=None):
    """
    Construct new data set with gene sets as attributes from data
    set "data" with assess model "ass".
    """

    ares = {}
    for ex in data:
        cres = ass(ex)
        for name,val in cres.items():
            aresl = ares.get(name, [])
            aresl.append(val)
            ares[name] = aresl

    ares = sorted(ares.items())

    if not domain: #construct new domain instance if needed
        domain = orange.Domain([ orange.FloatVariable(name=name) \
            for name in [ a[0] for a in ares]], data.domain.classVar )

    examples = [ [ b[zap] for a,b in ares ] + \
        [ data[zap][-1] ]   for zap in range(len(data)) ]

    et = orange.ExampleTable(domain, examples)
    return et

tdata = genesetsAsAttributes(data, ass)

print "First 10 attributes of the first example in transformed data set"
for pathw, enric in zip(tdata.domain,tdata[0])[:10]:
    print pathw.name, enric.value
</xmp>

<p>Output:</p>

<xmp class=code>First 10 attributes of the first example in transformed data set
[KEGG] 1- and 2-Methylnaphthalene degradation -0.846746742725
[KEGG] 3-Chloroacrylic acid degradation -0.587923526764
[KEGG] ABC transporters - General -0.292198866606
[KEGG] Acute myeloid leukemia 0.305086046457
[KEGG] Adherens junction 0.387903988361
[KEGG] Adipocytokine signaling pathway 0.404448747635
[KEGG] Alanine and aspartate metabolism 0.400113850832
[KEGG] Alkaloid biosynthesis I -0.6773609519
[KEGG] Alkaloid biosynthesis II -0.437492638826
[KEGG] Allograft rejection 0.491535454988
</xmp>

<h3>Example 3: testing transformed data set quality</h3>

We measure CA and AUC of transformed data set using cross validation and compare them to the original data set. Care needs to be taken to prevent overfitting: we must not use any knowledge about testing set when creating &quot;ASSESS models&quot; and we have to use the same &quot;ASSESS model&quot; for both learning and testing set. We solve this by saving the model to a global variable.

<p class="header">part of <a href="assess3.py">assess3.py</a> (uses <a href="http://www.ailab.si/orange/datasets/DLBCL.tab">DLBCL.tab</a>)</p>

<xmp class=code>offer = None

def transformLearningS(data):
    ass = asl(data, "hsa", geneSets=gs)
    et = genesetsAsAttributes(data, ass)

    global offer
    offer = (et.domain, ass) #save assess model

    return et
   
def transformTestingS(data):
    global offer
    if not offer:
        a = fdfsdsdd #exception

    domain, ass = offer
    offer = None

    return genesetsAsAttributes(data, ass, domain)


import orngBayes, orngTest, orngStat
learners = [ orngBayes.BayesLearner() ]

resultsOriginal = orngTest.crossValidation(learners, data, folds=10)
resultsTransformed = orngTest.crossValidation(learners, data, folds=10, 
    pps = [("L", transformLearningS), ("T", transformTestingS)])

print "Original", "CA:", orngStat.CA(resultsOriginal), "AUC:", orngStat.AUC(resultsOriginal)
print "Transformed", "CA:", orngStat.CA(resultsTransformed), "AUC:", orngStat.AUC(resultsTransformed)
</xmp>

<p>Output:</p>

<xmp class=code>Original CA: [0.8214285714285714] AUC: [0.78583333333333338]
Transformed CA: [0.80714285714285716] AUC: [0.84250000000000003]
</xmp>

<HR>
<H2>References</H2>

<p>Edelman E, Porrello A, Guinney J, Balakumaran B, Bild A, Febbo PG, Mukherjee S. Analysis of sample set enrichment scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles. Bioinformatics. 2006 Jul 15; 22(14):e108-16. </p>

</body>
</html>
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.