Commits

Marko Toplak committed e10da8f

Documentation updates.

Files changed (14)

docs/rst/reference/gene.rst

 Gene name matching (:mod:`gene`)
 ********************************************************
 
-To use gene matchers
-first set the target gene names with :obj:`~Matcher.set_targets` and then
-match  with :obj:`~Matcher.match` or :obj:`~Matcher.umatch` functions. The
+Genes can have multiple aliases. When we combine data from different
+sources, for example expression data with GO gene sets, we have to
+match gene aliases representing the same genes. All implemented matching
+methods are based on sets of gene aliases.
+
+Gene matchers in this module match genes to a user-specified set of target
+gene names. For gene matching, initialize a gene matcher (:obj:`Matcher`),
+set the target gene names with :obj:`~Matcher.set_targets`, and then
+match with :obj:`~Matcher.match` or :obj:`~Matcher.umatch` functions. The
 following example (:download:`genematch1.py <code/genematch1.py>`)
 matches gene names to NCBI gene IDs:
 
 Gene name matching
 ==================
 
-Genes can have multiple aliases. When we combine data from different
-sources, for example expression data with GO gene sets, we have to
-match gene aliases representing the same genes. All implemented matching
-methods are based on sets of gene aliases for one gene.
+The base class for all the following gene matchers is :obj:`Matcher`.
 
 .. autoclass:: Matcher
    :members:
 
 .. autoclass:: MatcherDirect
 
-Gene name matchers can applied in sequence (until the first match) or combined (overlapping sets of gene aliases of multiple gene matchers are combined) with the :obj:`matcher` function.
+Gene name matchers can be applied in sequence (until the first match) or
+combined (overlapping sets of gene aliases of multiple gene matchers
+are combined) with the :obj:`matcher` function.
 
 .. autofunction:: matcher
 
-The following example tries to match input genes onto KEGG gene aliases (:download:`genematch2.py <code/genematch2.py>`).
+The following example tries to match input genes onto KEGG gene aliases
+(:download:`genematch2.py <code/genematch2.py>`).
 
 .. literalinclude:: code/genematch2.py
 
-Results show that GO aliases can not match onto KEGG gene IDs. For the last gene only joined GO and KEGG aliases produce a match::
+Results show that GO aliases cannot match onto KEGG gene IDs. For the
+last gene, only joined GO and KEGG aliases produce a match::
 
         gene         KEGG           GO      KEGG+GO
         cct7    hsa:10574         None    hsa:10574
       a2a299         None         None     hsa:7052
 
 
-The following example finds KEGG pathways with given genes (:download:`genematch_path.py <code/genematch_path.py>`).
+The following example finds KEGG pathways with given genes
+(:download:`genematch_path.py <code/genematch_path.py>`).
 
 .. literalinclude:: code/genematch_path.py
 
     Olfr1403 is in
       Olfactory transduction
 
-
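The ``set_targets`` / ``match`` / ``umatch`` workflow described in this file can be illustrated with a minimal, self-contained sketch. The class ``AliasSetMatcher`` and the alias data below are hypothetical and invented for illustration; this is not the Orange implementation, only one way matching on sets of aliases might work.

```python
class AliasSetMatcher:
    """Hypothetical matcher built on sets of gene aliases (not the Orange class)."""

    def __init__(self, alias_sets, ignore_case=True):
        self.ignore_case = ignore_case
        # Each alias set groups all known names of one gene.
        self.alias_sets = [frozenset(self._norm(a) for a in s) for s in alias_sets]
        self.targets = {}

    def _norm(self, name):
        return name.lower() if self.ignore_case else name

    def set_targets(self, targets):
        # Remember, for every alias set, which target names it contains.
        self.targets = {}
        for target in targets:
            for index, aliases in enumerate(self.alias_sets):
                if self._norm(target) in aliases:
                    self.targets.setdefault(index, []).append(target)

    def match(self, gene):
        # Map the gene to the alias sets containing it, then return
        # the targets that belong to the same sets.
        name = self._norm(gene)
        found = []
        for index, aliases in enumerate(self.alias_sets):
            if name in aliases:
                for target in self.targets.get(index, []):
                    if target not in found:
                        found.append(target)
        return found

    def umatch(self, gene):
        # A unique match, or None when there are no or several matches.
        mat = self.match(gene)
        return mat[0] if len(mat) == 1 else None


# Invented alias sets; the identifiers carry no biological meaning.
toy_matcher = AliasSetMatcher([{"abc1", "alpha", "ID:1"}, {"xyz2", "beta", "ID:2"}])
toy_matcher.set_targets(["ID:1", "ID:2"])
print(toy_matcher.umatch("Alpha"))
print(toy_matcher.umatch("unknown"))
```

Here targets are stored per alias set, so ``match`` only needs to find the sets that contain the input name.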

docs/rst/reference/genesets.rst

 Gene sets (:mod:`geneset`)
 ********************************************************
 
-This module can load either gene sets distributed with Orange (through ServerFiles) 
+This module can load either gene sets distributed with Orange
 or custom gene sets in the `GMT file format <http://www.molmine.com/magma/fileformats.html>`_.
 
 The available gene set collection can be listed with :obj:`list_all`.
 
     Orange.bio.geneset.collections((("KEGG",), "10090"), (("GO",), "10090"))
 
-You could also open a file with gene sets. The following line would open ``specific.gmt`` from the current working directory::
+You could also open a file with gene sets. The following line would open
+``specific.gmt`` from the current working directory::
 
     Orange.bio.geneset.collections("specific.gmt")
 
 
 .. autoclass:: Orange.bio.geneset.GeneSets
    :members:
+   :show-inheritance:
 
 .. autoclass:: Orange.bio.geneset.GeneSet
    :members:
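For readers unfamiliar with GMT files, the following sketch parses the tab-separated layout commonly used for them: a set name, a description, then one gene per remaining field. The function ``parse_gmt`` and the sample content are hypothetical and do not reflect how ``Orange.bio.geneset`` reads files.

```python
import io


def parse_gmt(handle):
    """Parse GMT lines into {set name: (description, set of genes)}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and lines that list no genes.
        if len(fields) < 3 or not fields[0]:
            continue
        name, description, genes = fields[0], fields[1], fields[2:]
        gene_sets[name] = (description, {g for g in genes if g})
    return gene_sets


# Invented sample content standing in for a file such as specific.gmt.
sample = io.StringIO(
    "set_a\tfirst toy set\tg1\tg2\tg3\n"
    "set_b\tsecond toy set\tg2\tg4\n"
)
parsed = parse_gmt(sample)
print(sorted(parsed))
```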

docs/rst/reference/geo.rst

 <http://www.ncbi.nlm.nih.gov/>`_'s `Gene Expression Omnibus
 <http://www.ncbi.nlm.nih.gov/geo/>`_ repository. It 
 supports `GEO DataSets <http://www.ncbi.nlm.nih.gov/sites/GDSbrowser>`_
-information querying and retrieval.
+query and retrieval.
 
 In the following example :obj:`GDS.getdata`
 constructs a data set with genes in rows and samples in
 columns. Notice that the annotation about each sample is retained
 in ``.attributes``.
 
-::
-
-    >>> import Orange
-    >>> gds = Orange.bio.geo.GDS("GDS1676")
-    >>> data = gds.getdata()
-    >>> len(data)
-    667
-    >>> data[0]
-    [?, ?, -0.803, 0.128, 0.110, -2.000, -1.000, -0.358], {"gene":'EXO1'}
-    >>> data.domain.attributes[0]
-    FloatVariable 'GSM63816'
-    >>> data.domain.attributes[0].attributes
-    Out[191]: {'dose': '20 U/ml IL-2', 'infection': 'acute ', 'time': '1 d'}
+>>> import Orange.bio.geo
+>>> gds = Orange.bio.geo.GDS("GDS1676")
+>>> data = gds.getdata()
+>>> len(data)
+667
+>>> data[0]
+[?, ?, -0.803, 0.128, 0.110, -2.000, -1.000, -0.358], {"gene":'EXO1'}
+>>> data.domain.attributes[0]
+FloatVariable 'GSM63816'
+>>> data.domain.attributes[0].attributes
+{'dose': '20 U/ml IL-2', 'infection': 'acute ', 'time': '1 d'}
 
 GDS classes
 ===========
 .. autoclass:: GDSInfo
    :members:
 
-An example that uses obj:`GDSInfo`::
+An example with :obj:`GDSInfo`::
 
     >>> import Orange
     >>> info = Orange.bio.geo.GDSInfo()
 mining it would be useful to find out which data sets provide enough
 samples for each label. It is (semantically) convenient to perform
 classification within sample subsets of the same type. The following
-scripts therefore goes through all data sets and finds those with enough
+script goes through all data sets and finds those with enough
 samples within each of the subsets for a specific type. The function
 ``valid`` determines which subset types (if any) satisfy our criteria. The
 minimum number of samples in the subset was set to ``n=40``
    :lines: 8-
 
 The requested number of samples, ``n=40``, seems to be quite
-a stringent criteria met - at the time of writing of this documentation -
+a stringent criterion, met - at the time of writing this -
 by 35 sample subsets. The output starts with::
 
     GDS1611
 
 Let us now pick data set GDS2960 and see if we can predict the disease
 state. We will use logistic regression, and within 10-fold cross
-validation measure AUC, the area under ROC. AUC is the probably for
-correctly distinguishing between two classes if picking the sample from
-target (e.g., the disease) and non-target class (e.g., control). From
-(:download:`geo_gds6.py <code/geo_gds6.py>`)
+validation measure AUC, the area under ROC. AUC is the probability of
+correctly distinguishing the two classes (e.g., the disease and control).
+From (:download:`geo_gds6.py <code/geo_gds6.py>`):
 
 .. literalinclude:: code/geo_gds6.py
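The interpretation of AUC given above can be checked with a small pair-counting sketch: over all (target, non-target) pairs, count how often the target sample receives the higher score, with ties counted as one half. The function ``auc`` and the scores below are invented for illustration and are unrelated to ``geo_gds6.py``.

```python
def auc(scores, labels):
    """Area under ROC computed by comparing every positive/negative pair."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Invented classifier scores: 1 marks the target class, 0 the control.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))  # 3 of the 4 pairs are ordered correctly: 0.75
```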
 

docs/rst/reference/omim.rst

 
 .. autofunction:: gene_diseases
 
-The following example creates a network of connected diseases and save it in pajek .net format 
-sets information file (:download:`omim_disease_network.py <code/omim_disease_network.py>`).
+The following example creates a network of connected diseases and
+saves it in `Pajek <http://vlado.fmf.uni-lj.si/pub/networks/pajek/>`_
+.net format (:download:`omim_disease_network.py
+<code/omim_disease_network.py>`).
 
 .. literalinclude:: code/omim_disease_network.py
 

docs/rst/reference/ontology.rst

-
 .. automodule:: Orange.bio.ontology
 
-
 .. autoclass:: OBOOntology
    :members:
    :member-order: bysource

orangecontrib/bio/gene/__init__.py

         return self.set_targets(targets)
 
     def set_targets(self, targets):
-        """
-        Set input list of gene names (a list of strings) as target genes.
+        """ Set input list of gene names (a list of strings) as target genes.
         """
         notImplemented()
 
         notImplemented()
 
     def umatch(self, gene):
-        """Return an the single (unique)  matching target gene or None, if there are no matches or multiple matches."""
+        """Return a single (unique) matching target gene or None, if there are no matches or multiple matches."""
         mat = self.match(gene)
         return mat[0] if len(mat) == 1 else None
 
 
     def match(self, gene):
         """
-        Input gene is first mapped to ids of sets of aliases which contain
-        it. Target genes belonding to the same sets of aliases are returned
+        The `gene` is first mapped to ids of sets of aliases which contain
+        it. Target genes from the same sets of aliases are returned
         as input's match.
         """
         inputgeneids = self.parent.to_ids(gene)
 def matcher(matchers, direct=True, ignore_case=True):
     """
     Builds a new matcher from a list of gene matchers. Apply matchers in
-    the input list successively until a match is found. If a list element
-    is a a list, join matchers in the list by joining overlapping sets
-    of aliases.
+    the input list successively until a match is found. If an element of
+    `matchers` is a list, combine matchers in the sublist by joining overlapping 
+    sets of aliases.
 
-    :param matchers: gene matchers.  
-    :param direct: If True, first try
+    :param list matchers: Gene matchers. 
+    :param bool direct: If True, first try
       to match gene directly (a :obj:`MatcherDirect` is inserted in front of the
       gene matcher sequence).  
-    :param ignore_case: passed to the added
+    :param bool ignore_case: passed to the added
       direct matcher.
     """
     seqmat = []
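The two combination modes in the ``matcher`` docstring (apply in sequence until the first match, or join overlapping alias sets) can be sketched on plain Python sets. The helpers ``join_alias_sets`` and ``match_in_sequence`` and all data below are hypothetical; they illustrate the idea only and are not the code of this module.

```python
def join_alias_sets(*collections):
    """Merge alias sets that share at least one alias, across all collections."""
    merged = []  # pairwise disjoint sets built so far
    for collection in collections:
        for aliases in collection:
            current = set(aliases)
            rest = []
            for existing in merged:
                if existing & current:
                    current |= existing   # overlapping: absorb into current
                else:
                    rest.append(existing)
            rest.append(current)
            merged = rest
    return merged


def match_in_sequence(gene, targets, sources):
    """Try each source in order; return the hits of the first one that matches."""
    for alias_sets in sources:
        hits = [t for t in targets
                if any(gene in s and t in s for s in alias_sets)]
        if hits:
            return hits
    return []


# Invented data: source_a knows the target ids, source_b only extra aliases.
source_a = [{"g1", "T:1"}, {"x9", "T:2"}]
source_b = [{"g1", "alt1"}, {"q7", "x9"}]
toy_targets = ["T:1", "T:2"]

in_sequence = match_in_sequence("q7", toy_targets, [source_a, source_b])
joined = match_in_sequence("q7", toy_targets, [join_alias_sets(source_a, source_b)])
print(in_sequence, joined)
```

In this toy data ``q7`` reaches a target only through the joined sets, which mirrors the situation where neither source alone produces a match.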

orangecontrib/bio/geneset/__init__.py

     update_server_list(serverFiles)
 
 def register(genesets, serverFiles=None):
-    """ Registers given :class:`GeneSets` locally.  The gene set is registered
+    """ Registers given genesets locally.  The gene set is registered
     by the common hierarchy or organism (None if organisms are different).
 
-    If :obj:`serverFiles` as a authenticated ServerFiles connection,
-    the given gene sets are uploaded to the ServerFiles repository.  
+    :param GeneSets genesets:
+    :param serverFiles: If `serverFiles` is an authenticated ServerFiles connection,
+        the input gene sets are uploaded to the repository.  
     """
     if serverFiles == None:
         _register_local(genesets)
     def __init__(self, genes=[], name=None, id=None, \
         description=None, link=None, organism=None, hierarchy=None, pair=None):
         """
-        :param pair: Backward compatibility: convert a tuple (name, genes)
+        :param pair: Only for backward compatibility: convert a tuple (name, genes)
             into this object.
         """
 
 
     def to_odict(self, source=True, name=True):
         """
-        Backward compatibility: returns a gene set as a tuple
+        For backward compatibility. Return a gene set as a tuple
         (id, list of genes).
         """
         return self.cname(source=source, name=name), self.genes
 
 class GeneSets(set):
     """ A collection of gene sets: contains :class:`GeneSet` objects. 
-    It is a subclass of Python's :obj:`set`. 
     """
     
     def __init__(self, input=None):
         """
-        If input is a dictionary, the gene sets are converted to the current format.
+        If `input` is a dictionary, the gene sets are converted to the current format.
         """
         if input != None and len(input) > 0:
             self.update(input)

orangecontrib/bio/geo.py

 class GDSInfo:
 
     """
-    Retreive the infomation about `GEO DataSets
+    Retrieve information about `GEO DataSets
     <http://www.ncbi.nlm.nih.gov/sites/GDSbrowser>`_.  The class accesses
     the Orange server file that either resides on the local computer or
-    is automatically retreived from Orange server. Notice that the call
-    of this class does not access any NCBI's servers directly.
+    is automatically retrieved from Orange server. Calls to 
+    this class do not access any NCBI servers.
 
     Constructor returning the object with GEO DataSets information. If
-    :obj:`force_update` is True, the constructor will download GEO DataSets
-    information file (gds_info.pickled) from Orange server, otherwise,
-    it will first check if the local copy exists.
+    `force_update` is True, the constructor will download GEO DataSets
+    information file (gds_info.pickled) from Orange server, otherwise
+    it will first check the local copy.
 
     An instance behaves like a dictionary: the keys are GEO DataSets
     IDs, and each value is a dictionary providing various
 
 class GDS():
     """ 
-    GDS is a class that
-    provides methods for retreival of a specific GEO DataSet. The data
-    is provided as a :obj:`Orange.data.Table`.
+    Retrieval of a specific GEO DataSet as a :obj:`Orange.data.Table`.
 
-    Constructor returning the object to be used to retreive
-    GEO DataSet table (samples and gene expressions). Checks
+    Constructor returns the object that can retrieve
+    GEO DataSet (samples and gene expressions). It first checks
     a local cache directory if the particular data file is
     loaded locally, else it downloads it from `NCBI's GEO FTP site
     <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/GDS/>`_.  The compressed
-    data file resides in the cache directory after the call of the
-    constructor (call to ``Orange.utils.serverfiles.localpath("GEO")`` reveals
+    data file resides in the cache directory afterwards 
+    (call to ``Orange.utils.serverfiles.localpath("GEO")`` reveals
     the path of this directory).
 
-    :param gdsname: an NCBI's ID for the data set in the form "GDSn"
+    :param gdsname: An NCBI's ID for the data set in the form "GDSn"
       where "n" is a GDS ID number.
 
-    :param force_download: force the download.
+    :param force_download: Force the download.
 
     """
 
                  sample_type=None, missing_class_value=None,
                  transpose=False, remove_unknown=None):
         """
-        Returns the data from GEO DataSet in
-        Orange format. 
+        Returns the GEO DataSet as an :obj:`Orange.data.Table`.
 
-        :param report_genes: Micorarray spots reported in the GEO data set can
-          either be merged according to their gene id's
+        :param report_genes: Microarray spots reported in the GEO data set can
+          either be merged according to their gene ids
           (if True) or can be left as spots. 
 
-        :param transpose: The data
+        :param transpose: The output
           table can have spots/genes in rows and samples in columns
           (False, default) or samples in rows and  spots/genes in columns
           (True). 
           The entire annotation of samples will
           be included either in the class value or in
           the ``.attributes`` field of each data set
-          attributes. 
+          attribute. 
 
         :param remove_unknown: Remove spots with sample profiles that
           include unknown values. They are removed if the proportion
     return new
 
 def transpose(data):
-    """Transposes data matrix, converts class information to attribute label and back"""
+    """ Transposes data matrix, converts class information to attribute label and back. """
     if data.domain.classVar:
         return transpose_class_to_labels(data)
     else:

orangecontrib/bio/go.py

     _CollectAnnotations = _collect_annotations
 
     def get_all_annotations(self, id):
-        """ Return a set of all annotations (instances of `AnnotationRecord`)
+        """ Return a set of all annotations (instances of :obj:`AnnotationRecord`)
         for GO term `id` and all its subterms.
 
         :param str id: GO term id
 
 
 class Taxonomy(object):
-    """Maps NCBI taxonomy ids to corresponding GO organism codes
+    """Maps NCBI taxonomy ids to corresponding GO organism codes.
     """
     common_org_map = {"297284": "9913", "30523": "9913",  # Bos taurus
                       "5782": "352472", "44689": "352472", "366501": "352472",  # Dictyostelium discoideum
 
 
 def from_taxid(id):
-    """ Return a set of GO organism codes that correspond to NCBI taxonomy id
+    """ Return a set of GO organism codes that correspond to NCBI taxonomy id.
     """
     return Taxonomy()[id]
 
 
 def to_taxid(db_code):
-    """ Return a set of NCBI taxonomy ids from db_code GO organism annotations
+    """ Return a set of NCBI taxonomy ids from db_code GO organism annotations.
     """
     r = [key for key, val in Taxonomy().code_map.items() if db_code == val]
     return set(r)

orangecontrib/bio/kegg/__init__.py

 :mod:`kegg` is a python module for accessing `KEGG (Kyoto Encyclopedia
 of Genes and Genomes) <http://www.genome.jp/kegg/>`_ using its web services.
 
-.. note:: To use this module you need to have `slumber`_ and `requests`_
-          package installed.
+.. note:: This module requires `slumber`_ and `requests`_ packages.
 
 .. _`slumber`: https://pypi.python.org/pypi/slumber/
 
 .. _`requests`: https://pypi.python.org/pypi/requests
 
 
+>>> from Orange.bio.kegg import *
 >>> # Create a KEGG Genes database interface
 >>> genome = KEGGGenome()
 >>> # List all available entry ids
 DEFINITION  Homo sapiens (human)
 ...
 
-The :class:`Organism` class can be used as a convenient starting point
+The :class:`Organism` class can be a convenient starting point
 for organism specific databases.
 
 >>> organism = Organism("Homo sapiens")  # searches for the organism by name
     A convenience class for retrieving information regarding an
     organism in the KEGG Genes database.
 
-    :param org: KEGG organism code (e.g. "hsa", "sce"). Can also be a
+    :param str org: KEGG organism code (e.g. "hsa", "sce"). Can also be a
         descriptive name (e.g. 'yeast', "homo sapiens") in which case the
         organism code will be searched for by using KEGG `find` api.
-    :type org: str
 
     .. seealso::
 

orangecontrib/bio/omim.py

 from collections import defaultdict
 
 class disease(object):
-    """ A class representing a disease in the OMIM database
+    """ A disease in the OMIM database.
     """
     regex = re.compile(r'(?P<name>.*?),? (?P<id>[0-9]{3,6} )?(?P<m1>\([123?]\) )?(?P<m2>\([123?]\) )? *$')
     __slots__ = ["name", "id", "mapping"]

orangecontrib/bio/ontology.py

 ==============================
 
 This module provides an interface for parsing, creating and manipulating
-OBO ontologies (http://www.obofoundry.org/)
+`OBO ontologies <http://www.obofoundry.org/>`_.
 
 Construct an ontology from scratch with custom terms ::
 
     name: Foo bar
     <BLANKLINE>
 
-To load an ontology from a file pass the file or filename to the
+To load an ontology from a file, pass the file or filename to the
 :class:`OBOOntology` constructor or call its load method ::
 
     >>> buffer.seek(0) # rewind
     >>> ontology.load(buffer)
 
 
-See http://www.geneontology.org/GO.format.obo-1_2.shtml for the definition
-of the .obo file format.
+See the definition of the `.obo file format <http://www.geneontology.org/GO.format.obo-1_2.shtml>`_.
 
 """
 
 
 class OBOObject(object):
     """
-    Represents a generic OBO object (e.g. Term, Typedef, Instance, ...)
+    A generic OBO object (e.g. Term, Typedef, Instance, ...).
     Example::
 
         >>> term = OBOObject(stanza_type="Term", id="FOO:001", name="bar")
 
     def _format_single_tag(self, index):
         """
-        Return a formated string representing index-th tag pair value
+        Return a formatted string representing index-th tag pair value.
 
         Example::
 
     def _cache_relations(self):
         """
         Collect all relations from parent to a child and store it in
-        `self._related_to` member.
+        ``self._related_to`` member.
 
         """
         related_to = defaultdict(list)
 
     def parent_terms(self, term):
         """
-        Return a set of all parent terms for this `term`
+        Return a set of all parent terms for this `term`.
         """
         term = self.term(term)
         parents = []
 
 def load(file):
     """
-    Load an ontology from a .obo file
+    Load an ontology from a .obo file.
     """
     return OBOOntology(file)
 

orangecontrib/bio/ppi.py

     """
     A general interface for protein-protein interaction database access.
 
-    An example usage::
+    An example::
 
         >>> ppidb = MySuperPPIDatabase()
         >>> ppidb.organisms() # List all organisms (taxids)
 class BioGRIDInteraction(object):
     """ An object representing a BioGRID interaction. Each member of this object
     represents data from a single column of BIOGRID-ALL.tab file.
+
     Attributes:
         - *interactor_a*    - BioGRID identifier
         - *interactor_b*    - BioGRID identifier
             del self.protein_names[case("N/A")]
 
     def proteins(self):
-        """ Return all protein names in BioGRID (from INTERACTOR_A, and INTERACTOR_B columns)
+        """ Return all protein names in BioGRID (from INTERACTOR_A, and INTERACTOR_B columns).
         """
         return self.protein_interactions.keys()
 
     def __iter__(self):
-        """ Iterate over all BioGRIDInteraction objects
+        """ Iterate over all BioGRIDInteraction objects.
         """
         return iter(self.interactions)
 
     def __getitem__(self, key):
-        """ Return a list of protein interactions that a protein is a part of
+        """ Return a list of protein interactions that a protein is a part of.
         """
         key = self._case(key)
 #        keys = self.protein_alias_matcher.match(key)
             raise KeyError(key)
 
     def get(self, key, default=None):
-        """ Return a list of protein interactions that a protein is a part of
+        """ Return a list of protein interactions that a protein is a part of.
         """
         key = self._case(key)
 #        keys = self.protein_alias_matcher.match(key)
 
 
 def biogrid_interactions(name=None):
-    """Return a list of protein interactions (BioGRIDInteraction objects) that a protein is a part of
+    """Return a list of protein interactions (BioGRIDInteraction objects) that a protein is a part of.
     """
     if name:
         return list(_BioGRID_Old.get_instance().get(name, set()))
 
 
 def biogrid_proteins():
-    """ Return all protein names in BioGRID (from INTERACTOR_A, and INTERACTOR_B columns)
+    """ Return all protein names in BioGRID (from INTERACTOR_A, and INTERACTOR_B columns).
     """
     return _BioGRID_Old.get_instance().proteins()
 

orangecontrib/bio/taxonomy.py

 
 
 def taxname_to_taxid(name):
-    """Return taxonomy ID for a taxonomy name"""
+    """Return taxonomy ID for a taxonomy name."""
     name_to_taxid = dict(map(reversed, _COMMON_NAMES))
 
     if name in name_to_taxid: