Illustration of graph traversal
-------------------------------
+It's easiest to understand how graph traversal is implemented with a simple example. Take this system:
+.. image:: images/gt-system.png
+* This system has four **nodes**, which are LCI processes, also called transforming activities. Each **node** has one reference product, and a set of zero or more technosphere inputs. By convention, node ``A`` produces one unit of product ``A``.
+* This system has four **edges** which define the inputs of each node. An edge has a start, an end, and an amount.
+* We consider solving this system for a *functional unit* of one unit of ``A``.
+As we traverse this supply chain, we will keep different data for the nodes and the edges. For nodes, we are interested in the following:
+* ``amount``: The total amount of this node needed to produce the functional unit.
+* ``cum``: The cumulative LCA impact score attributable to the needed amount of this node, *including it's specific supply chain*.
+* ``ind``: The individual LCA impact score directly attributable to one unit of this node, i.e. the score from the direct emissions and resource consumption of this node.
+For edges, we want to know:
+* ``to``: The **id** of the node consuming the product.
+* ``from``: The **id** of the node producing the product.
+* ``amount``: The total amount of product ``from`` needed for the amount of ``to`` needed.
+* ``exc_amount``: The amount of ``from`` needed for *one unit* of ``to``, i.e. the value given in the technosphere matrix.
+* ``impact``: The total LCA impact score embodied in this edge, i.e. the individual score of ``from`` times ``amount``.
+Our functional unit is one unit of ``A``. Before starting any calculations, we need to set up our data structures. First, we have an empty list of **edges**. We also have a **heap**, a list which is automatically sorted (see documentation on priority queue below), and keeps track of the **nodes** we need to examine. **nodes** are identified by their row index in the *technosphere matrix*. Finally, we have a dictionary of **nodes**, which looks up nodes by their id numbers.
+ nodes, edges, heap = {}, [], []
+We create a special node, the functional unit, and insert it into the nodes dictionary:
+ 'cum': total_lca_score,
+ 'ind': 1e-6 * total_lca_score
+The *cumulative LCA impact score* is obviously the total LCA score; we set the *individual LCA score* to some small but non-zero value so that it isn't deleted in graph simplification later on.
+We next start building our list of edges. We start with all the inputs to the *functional unit*, which in this case is only one unit of ``A``. Note that the functional unit can have multiple inputs.
+ for node_id, amount in functional_unit:
+ "to": -1, # Special id of functional unit
+ "amount": amount, # By definition
+ "exc_amount": amount, # By definition
+ "impact": LCA(node_id, amount).score, # Evaluate LCA impact score for this node_id and amount
+Finally, we push each node to the **heap**:
+ for node_id, amount in functional_unit:
+ heappush(heap, (abs(1 / LCA(node_id, amount).score), node_id))
+This is not so easy to understand at first glance. What is ``1 / LCA(node_id, amount).score``? Why the absolute value? What is this ``heappush`` thing?
+We want one *divided by* the LCA impact score for node ``A`` because our **heap** is sorted in ascending order, and we want the highest score to be first.
+We take the absolute value because we are interested in the magnitude of node scores in deciding which node to process next, not the sign of the score - leaving out the absolute value would put all negative scores at the top of the heap (which is sorted in ascending order).
+``heappush`` is just a call to push something on to the **heap**, which is our automatically sorted list of nodes to examine.
+After this first iteration, we have the following nodes and edges in our graph traversal:
+.. image:: images/gt-step-1.png
+ nodes = {-1: {'amount': 1, 'cum': some number, 'ind': some small number}}
+ 'from': 0, # Assuming A is 0
+ heap = [(some number, 0)]
+After this is it rather simple: pull off the next node from the *heap*, add it to the list of nodes, construct its edges, and add its inputs to the heap. Iterate until no new nodes are found.
+.. image:: images/gt-step-2.png
+There are two more things to keep in mind:
+* We use a cutoff criteria to stop traversing the supply chain - any node whose cumulative LCA impact score is too small is not added to the heap.
+* We only visit each node once. The is functionality in ``bw2analyzer`` to "unroll" the supply chain so that each process can occur more than once.