Commits

Chris Mutel committed d93acf0

Finishing up documentation updates for 0.11

  • Participants
  • Parent commits 6cecb1e

Comments (0)

Files changed (6)

bw2data/data_store.py

 
 
 class DataStore(object):
-    """Base class for all Brightway2 data stores. Subclasses should define:
+    """
+Base class for all Brightway2 data stores. Subclasses should define:
 
-        * **metadata**: A :ref:`serialized-dict` instance, e.g. ``databases`` or ``methods``. The custom is that each type of data store has a new metadata store, so the data store ``Foo`` would have a metadata store ``foos``.
-        * **dtype_fields**: A list of fields to construct a NumPy structured array, e.g. ``[('foo', np.int), ('bar', np.float)]``.
-        * **validator**: A data validator. Optional. See bw2data.validate.
+    * **metadata**: A :ref:`serialized-dict` instance, e.g. ``databases`` or ``methods``. The custom is that each type of data store has a new metadata store, so the data store ``Foo`` would have a metadata store ``foos``.
+    * **dtype_fields**: A list of fields to construct a NumPy structured array, e.g. ``[('foo', np.int), ('bar', np.float)]``. Uncertainty fields (``base_uncertainty_fields``) are added automatically.
+    * **validator**: A data validator. Optional. See bw2data.validate.
+
+In order to use ``dtype_fields``, subclasses should override the method ``process_data``. This method is called once per row of data, and returns the correct values for the custom dtype fields (as a tuple), **and** the ``amount`` field with its associated uncertainty. This second part is a little flexible: if there is no uncertainty, a plain number can be returned; otherwise, an uncertainty dictionary should be returned.
+
+Subclasses should also override ``add_mappings``. This method takes the entire dataset, and loads objects to :ref:`mapping` or :ref:`geomapping` as needed.
 
     """
     validator = None
         raise NotImplementedError
 
     def process(self):
-        """Process intermediate data from a Python dictionary to a `stats_arrays <https://pypi.python.org/pypi/stats_arrays/>`_ array, which is a `NumPy <http://numpy.scipy.org/>`_ `Structured <http://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html#numpy.recarray>`_ `Array <http://docs.scipy.org/doc/numpy/user/basics.rec.html>`_. A structured array (also called record array) is a heterogeneous array, where each column has a different label and data type.
+        """
+Process intermediate data from a Python dictionary to a `stats_arrays <https://pypi.python.org/pypi/stats_arrays/>`_ array, which is a `NumPy <http://numpy.scipy.org/>`_ `Structured <http://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html#numpy.recarray>`_ `Array <http://docs.scipy.org/doc/numpy/user/basics.rec.html>`_. A structured array (also called record array) is a heterogeneous array, where each column has a different label and data type.
 
-        Processed arrays are saved in the ``processed`` directory.
+Processed arrays are saved in the ``processed`` directory.
+
+Uses ``pickle`` instead of the native NumPy ``.tofile()``. Although pickle is ~2 times slower, this difference in speed has no practical effect (e.g. one twentieth of a second slower for ecoinvent 2.2), and the NumPy ``fromfile`` and ``tofile`` functions don't preserve the datatype of structured arrays.
+
         """
         data = self.load()
         arr = np.zeros((len(data),), dtype=self.dtype)
             )
 
     def add_mappings(self, data):
+        """Add objects to ``mapping`` or ``geomapping``, if necessary.
+
+        Args:
+            * *data* (object): The entire dataset
+
+        """
         return
 
     def validate(self, data):
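
To make the requirements above concrete, here is a minimal sketch of a subclass (the ``Foo``/``foos`` names, the dtype field, and the row layout are illustrative, not part of bw2data)::

    import numpy as np

    from bw2data.data_store import DataStore

    foos = {}  # stand-in for a :ref:`serialized-dict` metadata store

    class Foo(DataStore):
        metadata = foos
        dtype_fields = [("input", np.uint32)]  # uncertainty fields appended automatically
        validator = None  # optional

        def process_data(self, row):
            # Values for the custom dtype fields, as a tuple, plus the
            # amount: a plain number here, or an uncertainty dict.
            # Assumes each row already carries an integer ``input`` id.
            return (row["input"],), row["amount"]

        def add_mappings(self, data):
            # Nothing to load into mapping or geomapping in this toy case
            return

And the reason ``process`` pickles the array instead of using ``tofile``: a pickle round-trip keeps the structured dtype::

    import pickle

    arr = np.zeros(2, dtype=[("amount", np.float32)])
    pickle.loads(pickle.dumps(arr)).dtype  # dtype([('amount', '<f4')]) - preserved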

bw2data/database.py

 
 
 class Database(DataStore):
-    """A manager for a database. This class can register or deregister databases, write intermediate data, process data to parameter arrays, query, validate, and copy databases.
+    """
+    A data store for LCI databases.
 
     Databases are automatically versioned.
 
-    The Database class never holds intermediate data, but it can load or write intermediate data. The only attribute is *database*, which is the name of the database being managed.
+    Instantiation does not load any data. If this database is not yet registered in the metadata store, a warning is written to ``stdout``.
 
-    Instantiation does not load any data. If this database is not yet registered in the metadata store, a warning is written to ``stdout``.
+    The data schema for databases is:
+
+    .. code-block:: python
+
+        Schema({valid_tuple: {
+            Required("name"): basestring,
+            Required("type"): basestring,
+            Required("exchanges"): [{
+                Required("input"): valid_tuple,
+                Required("type"): basestring,
+                Required("amount"): Any(float, int),
+                **uncertainty_fields
+                }],
+            "categories": Any(list, tuple),
+            "location": object,
+            "unit": basestring
+            }}, extra=True)
+
+    where:
+        * ``valid_tuple`` is a dataset identifier, like ``("ecoinvent", "super strong steel")``
+        * ``uncertainty_fields`` are fields from an uncertainty dictionary
+
+    The data format is explained in more depth in the `Brightway2 documentation <http://brightway2.readthedocs.org/en/latest/key-concepts.html#documents>`_.
+
+    Processing a Database actually produces two parameter arrays: one for the exchanges, which make up the technosphere and biosphere matrices, and a geomapping array which links activities to locations.
 
     Args:
         *name* (str): Name of the database to manage.
         except OSError:
             raise MissingIntermediateData("This version (%i) not found" % version)
 
-
     def process(self, version=None):
         """
-Process intermediate data from a Python dictionary to a `stats_arrays <https://pypi.python.org/pypi/stats_arrays/>`_ array, which is a `NumPy <http://numpy.scipy.org/>`_ `Structured <http://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html#numpy.recarray>`_ `Array <http://docs.scipy.org/doc/numpy/user/basics.rec.html>`_. A structured array (also called record array) is a heterogeneous array, where each column has a different label and data type.
+Process inventory documents.
 
-Processed arrays are saved in the ``processed`` directory.
-
-Uses ``pickle`` instead of the native NumPy ``.tofile()``. Although pickle is ~2 times slower, this difference in speed has no practical effect (e.g. one twentieth of a second slower for ecoinvent 2.2), and the numpy ``fromfile`` and ``tofile`` functions don't preserve the datatype of structured arrays.
-
-The structure for processed inventory databases includes additional columns beyond the basic ``stats_arrays`` format:
-
-================ ======== ===================================
-Column name      Type     Description
-================ ======== ===================================
-uncertainty_type uint8    integer type defined in `stats_arrays.uncertainty_choices`
-input            uint32   integer value from `Mapping`
-output           uint32   integer value from `Mapping`
-geo              uint32   integer value from `GeoMapping`
-row              uint32   column filled with `NaN` values, used for matrix construction
-col              uint32   column filled with `NaN` values, used for matrix construction
-type             uint8    integer type defined in `bw2data.utils.TYPE_DICTIONARY`
-amount           float32  amount without uncertainty
-loc              float32  location parameter, e.g. mean
-scale            float32  scale parameter, e.g. standard deviation
-shape            float32  shape parameter
-minimum          float32  minimum bound
-maximum          float32  maximum bound
-negative         bool     `amount` < 0
-================ ======== ===================================
-
-See also `NumPy data types <http://docs.scipy.org/doc/numpy/user/basics.types.html>`_.
+Creates both a parameter array for exchanges, and a geomapping parameter array linking inventory activities to locations.
 
 Args:
     * *version* (int, optional): The version of the database to process
 
-Doesn't return anything, but writes a file to disk.
+Doesn't return anything, but writes two files to disk.
 
         """
         data = self.load(version)
         with open(filepath, "wb") as f:
             pickle.dump(arr, f, protocol=pickle.HIGHEST_PROTOCOL)
 
-
     def query(self, *queries):
         """Search through the database. See :class:`query.Query` for details."""
         return Query(*queries)(self.load())
             datetime.datetime.fromtimestamp(os.stat(os.path.join(
             config.dir, directory, name)).st_mtime)) for name in files])
 
-
     def write(self, data):
         """Serialize data to disk.
 
+        Normalizes units when found.
+
         Args:
             * *data* (dict): Inventory data
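
Putting the schema above to work, a tiny hypothetical database could be written and processed like this (all names and values invented)::

    from bw2data import Database

    db = Database("toy db")
    db.register()  # registration arguments omitted in this sketch
    db.write({
        ("toy db", "steel"): {
            "name": "super strong steel",
            "type": "process",
            "exchanges": [{
                "input": ("toy db", "steel"),
                "type": "production",
                "amount": 1.0,
            }],
            "location": "GLO",
            "unit": "kilogram",
        }
    })
    db.process()  # writes the exchanges and geomapping parameter arrays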
 

bw2data/ia_data_store.py

 
 class ImpactAssessmentDataStore(DataStore):
     """
-A subclass of ``DataStore`` for impact assessment methods, which uses the ``abbreviate`` function to transform tuples of strings into a single string, and looks up abbreviations to generate filenames.
-
-A manager for a impact assessment data. This class can register or deregister methods, write intermediate data, and copy methods.
-
-This is meant to be subclassed, and should not be used directly.
-
-Subclasses should define the following:
-
-======== ========= ===========================================
-name     type      description
-======== ========= ===========================================
-metadata attribute metadata class instances, e.g. ``methods``
-validate method    method that validates input data
-process  method    method that writes processesd data to disk
-======== ========= ===========================================
-
-The ImpactAssessmentDataStore class never holds intermediate data, but it can load or write intermediate data. The only attribute is *name*, which is the name of the method being managed.
-
-Instantiation does not load any data. If this IA object is not yet registered in the metadata store, a warning is written to ``stdout``.
+A subclass of ``DataStore`` for impact assessment methods, which uses the ``abbreviate`` function to transform tuples of strings into a single string, and looks up abbreviations to generate filenames. In less technical language: we can't use ``('ReCiPe Endpoint (E,A)', 'human health', 'ionising radiation')`` as a filename, but we can use ``recipee(hhir-70eeef20a20deb6347ad428e3f6c5f3c``.
 
IA objects are hierarchically structured, and this structure is preserved in the name. It is a tuple of strings, like ``('ecological scarcity 2006', 'total', 'natural resources')``.
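
A sketch of the idea behind ``abbreviate`` (not the exact bw2data implementation, which also handles digits and unicode)::

    import hashlib

    def abbreviate(names):
        # First word kept whole, then the first letter of each following
        # word, plus an MD5 hash of the full name for uniqueness.
        words = " ".join(names).split(" ")
        stem = words[0].lower() + "".join(w[0].lower() for w in words[1:])
        return stem + "-" + hashlib.md5(" ".join(names).encode("utf-8")).hexdigest()

    abbreviate(('ReCiPe Endpoint (E,A)', 'human health', 'ionising radiation'))
    # 'recipee(hhir-...' (the exact hash depends on encoding details)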
 

bw2data/method.py

 
    Methods are hierarchically structured, and this structure is preserved in the method name. It is a tuple of strings, like ``('ecological scarcity 2006', 'total', 'natural resources')``.
 
-    Method metadata should include the following:
-        ``unit``:
+    The data schema for IA methods is:
+
+    .. code-block:: python
+
+            Schema([Any(
+                [valid_tuple, maybe_uncertainty],         # site-generic
+                [valid_tuple, maybe_uncertainty, object]  # regionalized
+            )])
+
+    where:
+        * ``valid_tuple`` is a dataset identifier, like ``("biosphere", "CO2")``
+        * ``maybe_uncertainty`` is either a number or an uncertainty dictionary
+        * ``object`` is a location, needed only for regionalized LCIA
 
     Args:
         * *name* (tuple): Name of the method to manage. Must be a tuple of strings.
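
Hypothetical data matching this schema, with one plain characterization factor, one uncertain factor, and one regionalized factor (all flows and values invented)::

    data = [
        [("biosphere", "CO2"), 1.0],          # site-generic, no uncertainty
        [("biosphere", "CH4"), {              # site-generic, uncertain
            "amount": 25.0,
            "uncertainty type": 3,            # normal distribution in stats_arrays
            "loc": 25.0,
            "scale": 2.5,
        }],
        [("biosphere", "NOx"), 10.0, "GLO"],  # regionalized: location appended
    ]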

bw2data/weighting_normalization.py

 
 
 class Weighting(ImpactAssessmentDataStore):
+    """
+    LCIA weighting data - used to combine or compare different impact categories.
+
+    The data schema for weighting is a one-element list:
+
+    .. code-block:: python
+
+            Schema(All(
+                [uncertainty_dict],
+                Length(min=1, max=1)
+            ))
+
+    """
     metadata = weightings
     validator = weighting_validator
     dtype_fields = []
         super(Weighting, self).write(data)
 
     def process_data(self, row):
-        return (), row
+        return ((),  # no custom dtype fields to fill
+            row)     # the amount, or an uncertainty dict
 
 
 class Normalization(ImpactAssessmentDataStore):
+    """
+    LCIA normalization data - used to transform meaningful units, like mass or damage, into "person-equivalents" or some such thing.
+
+    The data schema for IA normalization is:
+
+    .. code-block:: python
+
+            Schema([
+                [valid_tuple, maybe_uncertainty]
+            ])
+
+    where:
+        * ``valid_tuple`` is a dataset identifier, like ``("biosphere", "CO2")``
+        * ``maybe_uncertainty`` is either a number or an uncertainty dictionary
+
+    """
     metadata = normalizations
     validator = normalization_validator
     dtype_fields = [
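
For example, valid weighting data is a single uncertainty dictionary in a one-element list, while normalization data follows the site-generic pattern of IA methods (values invented)::

    weighting_data = [{
        "amount": 0.5,
        "uncertainty type": 4,  # uniform distribution in stats_arrays
        "minimum": 0.2,
        "maximum": 0.8,
    }]

    normalization_data = [
        [("biosphere", "CO2"), 7.3e3],  # e.g. person-equivalents per kilogram
    ]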
     * :ref:`weighting`
     * :ref:`normalization`
 
+Validation
+----------
+
+Data validation is done using the great `voluptuous library <https://pypi.python.org/pypi/voluptuous/>`_. Each data store can define its own validation schema; see each data store's documentation for details on its data format.
+
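A quick taste of voluptuous (a sketch; ``Invalid`` is the error voluptuous raises for bad data)::

    from voluptuous import Invalid, Required, Schema

    schema = Schema({Required("name"): str, Required("amount"): float})
    schema({"name": "steel", "amount": 1.0})  # returns the validated data
    try:
        schema({"name": "steel"})  # missing required "amount"
    except Invalid as err:
        print(err)
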
 Document and processed data
 ===========================
 
 During processing, the uncertainty dictionaries are converted to rows in a NumPy array.
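
For example, a lognormal uncertainty dictionary (type 2 in ``stats_arrays``)::

    {"amount": 1.0, "uncertainty type": 2, "loc": 0.0, "scale": 0.5}

becomes a single array row, with ``loc`` and ``scale`` stored in their own columns and the unused uncertainty columns left at default values.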
 
 Mappings
---------
+========
 
Sometimes, important data can't be stored as a numeric value. For example, the location of an inventory activity is important for regionalization, but is given by a text string, not an integer. In this case, we use :ref:`serialized-dict` to store mappings between objects and integer indices. Brightway2-data uses two such mappings: