# Lea / Lea3_Tutorial_3

Table of Content

<< Tutorial [2/3]

# Introduction

The present page is the third part of the tutorial for the Lea library. It assumes that you have installed Lea 3 and that you are familiar with the techniques presented in the first part of the tutorial. The main topics are mostly independent, so you could choose to jump to your preferred subject without much trouble (see TOC above).

# Other functions and features

## Plotting histograms (histo, plot methods)

### The plot method (matplotlib)

If you've installed the matplotlib library, you can very easily plot the probability mass function of a given Lea instance. We have already seen that all you have to do is invoking the plot() method on any Lea instance. By default the plot() method draws the pmf in a dedicated window. Several arguments can be specified to customize this behavior, including saving the chart in an image file. Here is the full signature, with default values:

your_lea_instance.plot(title=None,fname=None,savefig_args=dict(),**bar_args)

argument description
title a string giving the title to display (no title if None)
fname if not None, then the chart is saved in a file specified by given fname, as described in matplotlib.pyplot.savefig function
savefig_args (used only if fname is not None) a dictionary relayed to matplotlib.pyplot.savefig function and containing named arguments expected by this function
bar_args named arguments relayed to matplotlib.pyplot.bar function

Example:

from lea.leaf import dice
dice(3,6).plot(title='Three fair dice',fname='pmf_3d6.png',savefig_args=dict(bbox_inches='tight'),color='lightblue')


### The histo and as_string methods

If you haven't installed the matplotlib library, there are alternative ways to plot histograms:

• use another plotting package: this requires to extract pmf data, which is the subject of the next section,
• or display text-oriented histograms, directly on your console: this is the subject of the present section.

The histo method returns a string that allows displaying the pmf as a textual histogram:

>>> print(dice(3,6).histo())
3 :
4 :  -
5 :  ---
6 :  -----
7 :  -------
8 :  ----------
9 :  ------------
10 :  -------------
11 :  -------------
12 :  ------------
13 :  ----------
14 :  -------
15 :  -----
16 :  ---
17 :  -
18 :


If you want to scale this histogram you can pass a percentage argument: for instance .histo(200) shall double the default size.

For advanced display control, there exists a more powerful formatting method: as_string(…). It allows you formatting together the probability numbers and the histogram. The first argument is a string of 1 or 2 characters: the first character is '.' for float, '%' for percentage or '/' for fraction format. If you append '-', then you get the histogram just on the right of the number. Also, there are two more optional arguments for specifying the number of decimals required (nb_decimals with default: 6) and the scale of the histogram (histo_size with default: 100). Here are some examples of what you can do.

>>> print (dice(3,6).as_string('%',nb_decimals=2))
3 :   0.46 %
4 :   1.39 %
5 :   2.78 %
6 :   4.63 %
7 :   6.94 %
8 :   9.72 %
9 :  11.57 %
10 :  12.50 %
11 :  12.50 %
12 :  11.57 %
13 :   9.72 %
14 :   6.94 %
15 :   4.63 %
16 :   2.78 %
17 :   1.39 %
18 :   0.46 %
>>> print (dice(3,6,'r').as_string('/-',histo_size=200))
3 :  1/216 -
4 :  3/216 ---
5 :  6/216 ------
6 : 10/216 ---------
7 : 15/216 --------------
8 : 21/216 -------------------
9 : 25/216 -----------------------
10 : 27/216 -------------------------
11 : 27/216 -------------------------
12 : 25/216 -----------------------
13 : 21/216 -------------------
14 : 15/216 --------------
15 : 10/216 ---------
16 :  6/216 ------
17 :  3/216 ---
18 :  1/216 -
>>> print (dice(3,6).as_string('.-',nb_decimals=6,histo_size=300))
3 : 0.004630 -
4 : 0.013889 ----
5 : 0.027778 --------
6 : 0.046296 --------------
7 : 0.069444 ---------------------
8 : 0.097222 -----------------------------
9 : 0.115741 -----------------------------------
10 : 0.125000 --------------------------------------
11 : 0.125000 --------------------------------------
12 : 0.115741 -----------------------------------
13 : 0.097222 -----------------------------
14 : 0.069444 ---------------------
15 : 0.046296 --------------
16 : 0.027778 --------
17 : 0.013889 ----
18 : 0.004630 -


## Mode and maximum a posteriori estimation (MAP)

In some situations, you are just interested in the values that have the highest probability in a given probability distribution. This is traditionally known under the term "mode". On Lea instances, you can get these values as a tuple, using the mode attribute. Here are some example showing the mode associated to throws of 1, 2 and 3 dice:

>>> d1 = interval(1,6)
>>> d1.mode
(1, 2, 3, 4, 5, 6)
>>> d1.times(2).mode
(7,)
>>> d1.times(3).mode
(10, 11)


The mode is useful in particular to determine a maximum a posteriori estimation (MAP), which consists in finding the most likely assignments of a given set of variables given some evidence. Consider for example the Bayesian network presented earlier. A MAP query could be: "What are the most likely states for burglary, earthquake and john_calls variables, given that we know that Mary called?"

>>> lea.joint(burglary,earthquake,john_calls).given(mary_calls).mode
((False, False, False),)


The answer is: burglary = earthquake = john_calls = False, which means in clear that Mary actually called without worrying reasons, just to take news. Here are some other MAP queries for the same three variables, given other evidences:

>>> lea.joint(burglary,earthquake,john_calls).given(burglary|earthquake).mode
((False, True, False),)
>>> lea.joint(burglary,earthquake,john_calls).given(mary_calls,burglary|earthquake).mode
((True, False, True),)


## Drawing with/without replacement (draw method)

In several situations, we are interested in a sequence of values obtained from the same random source: lottery, card drawing, dice throws, etc. These sequences can be defined using different protocols:

• with or without replacement: whether the drawn value is withdrawn from the remaining values (no duplicates)
• sorted or not: whether each sequence shall be sorted or not (i.e. order of drawings is irrelevant or not).

The draw method addresses such need, supporting the different protocols through two Boolean arguments. The signature is as follows:

leaInstance.draw(n,sorted=False,replacement=False)

argument description
n the number of objects to draw, each value of the resulting distribution will be a n-tuple
sorted boolean: True means that the order of drawing is irrelevant and the tuples are arbitrarily sorted by increasing order (default: False)
replacement boolean: True means that the drawing is made WITH replacement so the same element may occur several times in each tuple (default: False)

The two boolean arguments sorted and replacement can be used independently each from each other, leading to four cases. The best way to understand the meaning of these arguments is to show the result obtained for each of the four combinations. Let's define a non-uniform distribution with 3 objects:

>>> d = lea.pmf({'A': '3/6', 'B': '2/6', 'C': '1/6'})
>>> d
A : 3/6
B : 2/6
C : 1/6


case 1. No sorting, without replacement

>>> d.draw(2)
('A', 'B') : 20/60
('A', 'C') : 10/60
('B', 'A') : 15/60
('B', 'C') :  5/60
('C', 'A') :  6/60
('C', 'B') :  4/60


case 2. No sorting, with replacement

>>> d.draw(2,replacement=True)
('A', 'A') : 9/36
('A', 'B') : 6/36
('A', 'C') : 3/36
('B', 'A') : 6/36
('B', 'B') : 4/36
('B', 'C') : 2/36
('C', 'A') : 3/36
('C', 'B') : 2/36
('C', 'C') : 1/36


case 3. Sorting, without replacement

>>> d.draw(2,sorted=True)
('A', 'B') : 35/60
('A', 'C') : 16/60
('B', 'C') :  9/60


case 4. Sorting, with replacement

>>> d.draw(2,sorted=True,replacement=True)
('A', 'A') :  9/36
('A', 'B') : 12/36
('A', 'C') :  6/36
('B', 'B') :  4/36
('B', 'C') :  4/36
('C', 'C') :  1/36


Note: The processing of case 3 and 4 (sorted=True) is usually more efficient than the other cases thanks to a special combinatorial algorithm due to Paul Moore. Note however that, in case 3 (replacement=False), this special algorithm will be used ONLY if the probability distribution is uniform (actually, the most efficient algorithm is chosen by the draw method, which checks whether the distribution is uniform or not); this is not the case in the example above but it applies fortunately in most "fair" games.

## Revised probability distributions (given_prob method)

The calculation of conditional probabilities using the given method is fine when the information expressed in the condition is 100% sure. In some situations however, this information is uncertain but it can be associated to a precise probability value p. The goal is then to calculate a revised probability distribution so that the given condition has a probability p to occur. Another interpretation of this goal could be that you want to favor some occurrences of the prior probability distribution by re-balancing the probabilities.

The method given_prob(cond,p) serves this purpose, where cond is a condition expressed as a boolean distribution (exactly as given method), p is the probability that cond is true.

As an example, suppose that from a die assumed to be fair initially, we note that it gives eventually the values 5 or 6, with a probability of 1/2. The revised distribution is calculated as follows:

>>> die = lea.interval(1,6,prob_type='r')
>>> revised_die = die1.given_prob(die >= 5,'1/2')
>>> die
1 : 1/8
2 : 1/8
3 : 1/8
4 : 1/8
5 : 2/8
6 : 2/8


The probability that has been set can be verified easily by evaluating the given condition on the new distribution:

>>> P(revised_die >= 5)
1/2


Note that the given_prob method is a generalization of the given method:

• d.given(cond) is equivalent to d.given_prob(cond,1)
• d.given(~cond) is equivalent to d.given_prob(cond,0)

## Min / max functions (min_of, max_of functions)

In the "Applying functions" section, we have seen how to build a distribution taking the maximum (or minimum) values from other distributions. Lea provides the min_of and max_of methods, which do the same job:

>>> die1 = lea.interval(1,6,prob_type='r')
>>> die2 = die1.new()
>>> lea.min_of(die1,die2)
1 : 11/36
2 :  9/36
3 :  7/36
4 :  5/36
5 :  3/36
6 :  1/36
>>> lea.max_of(die1,die2)
1 :  1/36
2 :  3/36
3 :  5/36
4 :  7/36
5 :  9/36
6 : 11/36


Note however that the performance dramatically degrades when adding more arguments (exponential complexity). Consider for example the following statements that take the minimum value among 7 dice:

>>> dice = [die1.new() for i in range(7)]
>>> lea.min_of(*dice)
1 : 201811/279936
2 :  61741/279936
3 :  14197/279936
4 :   2059/279936
5 :    127/279936
6 :      1/279936


This takes about 10 seconds to calculate (on a "decent" CPU bought in 2013).

Fortunately, there is a way to avoid this by enabling a more efficient algorithm (linear complexity), which is due to Nicky van Foreest (see Nicky's "Stochastic Makespans" page). Simply add the fast=True argument:

>>> lea.min_of(*dice,fast=True)


The result is the same... but it is produced in the blink of an eye! However, there ain't no such thing as a free lunch... Putting fast=True present a small drawback. Unlike most of Lea methods, the probability distributions returned by these two methods lose any dependency with given arguments; in, other words, they do not not comply with referential consistency. This could be important when using expressions with joint or conditional probabilities. For instance, the following results are WRONG:

>>> lea.min_of(die1,die2,fast=True).given(die1 > 4)
1 : 11/36
2 :  9/36
3 :  7/36
4 :  5/36
5 :  3/36
6 :  1/36
>>> P((lea.min_of(die1,die2,fast=True) == 1).given(die1 > 4))
11/36


since each die1 occurrence is considered as an independent event, e.g. in the first expression, the given information die1 > 4 is ignored. To get the right results, simply remove the fast argument or set it to False.

To summarize, use fast=True for best performance but take care to use it in simple expressions not requiring referential consistency, i.e. without other occurrences of inner arguments.

## Indexing and slicing

When working with Python's strings, tuples or lists, it is common to extract subparts of these objects by using indexing or slicing operations (i.e. the brackets syntax). When such operation is operated on a Lea instance, it is propagated on the inner values, passing the given (slice) index. The result is a new probability distribution containing the indexed or sliced values.

To give an example, consider the following Lord of the Rings' characters micro-database:

>>> character = lea.vals(('Gimli','dwarf'),('Bilbo','hobbit'),('Frodo','hobbit'),('Meriadoc','hobbit'),
('Pippin','hobbit'),('Gandalf','wizard'),('Boromir','man'),('Faramir','man'),prob_type='r')
>>> character
('Bilbo'   , 'hobbit') : 1/8
('Boromir' , 'man'   ) : 1/8
('Faramir' , 'man'   ) : 1/8
('Frodo'   , 'hobbit') : 1/8
('Gandalf' , 'wizard') : 1/8
('Gimli'   , 'dwarf' ) : 1/8
('Pippin'  , 'hobbit') : 1/8


We can extract the distribution of names and races as follows (remind that first item has index 0 in Python) :

>>> name = character[0]
>>> name
Bilbo    : 1/8
Boromir  : 1/8
Faramir  : 1/8
Frodo    : 1/8
Gandalf  : 1/8
Gimli    : 1/8
Pippin   : 1/8
>>> race = character[1]
>>> race
dwarf  : 1/8
hobbit : 4/8
man    : 2/8
wizard : 1/8


Other examples (aka "SQL, my precious! ") :

• What is the distribution of the last letter of names?
>>> name[-1]
c : 1/8
f : 1/8
i : 1/8
n : 1/8
o : 2/8
r : 2/8

• Who are the hobbits? (note how name and race are interdependent variables!)
>>> name.given(race == 'hobbit')
Bilbo    : 1/4
Frodo    : 1/4
Pippin   : 1/4

• What is the race of those having a name ending with "ir"?
>>> race.given(name[-2:] == 'ir')
man : 1

• Who is who?
>>> name + ' is a ' + race + '.'
Bilbo is a hobbit.    : 1/8
Boromir is a man.     : 1/8
Faramir is a man.     : 1/8
Frodo is a hobbit.    : 1/8
Gandalf is a wizard.  : 1/8
Gimli is a dwarf.     : 1/8
Meriadoc is a hobbit. : 1/8
Pippin is a hobbit.   : 1/8

• Who is the Chosen One?
>>> name.random() + ' is the Chosen One.'
'Frodo is the Chosen One.'


Note the character distribution is nothing else than a joint probability distribution (JPD). The indexing technique demonstrated here could be used also to extract specific attributes from the inner tuples. Revisiting the example given in "Defining joint probability distributions, you could skip the definition of jpd variable using the as_joint method and do instead:

toothache = _jpd[0]
catch = _jpd[1]
cavity = _jpd[2]


These three variables shall then be interdependent; these could be used as demonstrated in "Defining joint probability distributions.

## Object attributes and methods

The micro-DB example above can be redone using user-defined objects instead of tuples.

class C(object):

def __init__(self,name,race):
self.name = name
self.race = race

def __str__(self):
return "%s (%s)"%(self.name,self.race)

def health_points(self):
return 1 + hash(self.name+self.race) % 10

C('Pippin','hobbit'),C('Gandalf','wizard'),C('Boromir','man'),C('Faramir','man'),prob_type='r')


You get here a distribution having C objects as domain, instead of tuples.

>>> character2
Faramir (man)     : 1/8
Gimli (dwarf)     : 1/8
Pippin (hobbit)   : 1/8
Gandalf (wizard)  : 1/8
Bilbo (hobbit)    : 1/8
Boromir (man)     : 1/8
Frodo (hobbit)    : 1/8


Then, attributes can be extracted using the dot notation directly on the probability distribution.

>>> character.name
Bilbo    : 1/8
Boromir  : 1/8
Faramir  : 1/8
Frodo    : 1/8
Gandalf  : 1/8
Gimli    : 1/8
Pippin   : 1/8
>>> character.race
dwarf  : 1/8
hobbit : 4/8
man    : 2/8
wizard : 1/8


Also, method invocation is feasible; this returns a new probability distribution with the returned results.

>>> character.health_points()
1 : 1/8
3 : 1/8
7 : 2/8
8 : 3/8
9 : 1/8
>>> character.name + ', with HP = ' + character.health_points().map(str)
Bilbo, with HP = 10   : 1/8
Boromir, with HP = 4  : 1/8
Faramir, with HP = 1  : 1/8
Frodo, with HP = 5    : 1/8
Gandalf, with HP = 10 : 1/8
Gimli, with HP = 8    : 1/8
Meriadoc, with HP = 6 : 1/8
Pippin, with HP = 9   : 1/8


Now, the most watchful readers may have noticed an issue with this simple syntax. How Lea can distinguish between his own attributes/methods (like mean, stdev, given, etc) and those to be relayed to the inner objects (like name, race, etc)? Well, the rule is simple: if the attribute/method name is unknown in Lea API, then it is propagated in the inner objects. Of course, this may cause worries in the case of homonyms, i.e. the inner objects have, by mischance, a name defined in Lea API; such attribute is not accessible with the simple technique since it is "hidden" by the Lea API. Should such situation occurs, there exists a trick using Lea's map method and Python getattr function. Assume for example, that the class C above has an attribute called mean. Here is how to get the distribution of this attribute.

>>> character2.map(getattr,('mean',))


The ability to easily relay methods on inner objects a given probability distribution can be used for builtin Python objects. String's upper method provides a trivial example:

>>> lea.set_prob_type('r')
>>> lea.vals(*"alea").upper()
A : 2/4
E : 1/4
L : 1/4


For a more useful example, suppose you throw 3 fair dice and you want to know the probability to get at least two 6's. We have seen how to use the draw method to model the three throws:

>>> die = lea.interval(1,6)
>>> three_dice = die.draw(3,replacement=True)


three_dice contains 6x6x6 = 216 tuples, each containing 3 elements. To get the result, you can directly use the count method defined on Python's tuples:

>>> count_6 = three_dice.count(6)
>>> count_6
0 : 125/216
1 :  75/216
2 :  15/216
3 :   1/216
>>> P(count_6 >= 2)
2/27


Once you are familiar with all these techniques, you could get such result in a one-liner:

>>> P(lea.interval(1,6).draw(3,replacement=True).count(6) >= 2)
2/27


## Reducing sequences of probability distribution (reduce_all method)

Lea instances can be put in a Python container type, like lists and tuples. then, these can be "reduced" using a given 2-ary function to produce a new Lea instance. Consider for example the following tuple containing three dices with 2, 4 and 6 faces:

>>> dice = tuple(lea.interval(1,n,prob_type='r') for n in (2,4,6))
>>> dice
(1 : 1/2
2 : 1/2, 1 : 1/4
2 : 1/4
3 : 1/4
4 : 1/4, 1 : 1/6
2 : 1/6
3 : 1/6
4 : 1/6
5 : 1/6
6 : 1/6)


The display is not very clean because the tuple's display format mixes with Lea's. Now, suppose we throw the three dice together. To get the distribution of the sum of their values, the standard Python's sum function can be used:

>>> sum(dice)
3 : 1/48
4 : 3/48
5 : 5/48
6 : 7/48
7 : 8/48
8 : 8/48
9 : 7/48
10 : 5/48
11 : 3/48
12 : 1/48


Fine,… but what if we want to get the product of the values of the three dice? There is no builtin function for this. You could do the job through a loop or by using the builtin reduce function, which is only available in Python 2. To meet this need, Lea provides a dedicate function, reduce_all, which treats the elements of an iterable of Lea instances according to a given 2-ary function.

>>> import operator
>>> Lea.reduce_all(operator.mul,dice)
1 : 1/48
2 : 3/48
3 : 2/48
4 : 5/48
5 : 1/48
6 : 5/48
8 : 5/48
9 : 1/48
10 : 2/48
12 : 6/48
15 : 1/48
16 : 3/48
18 : 2/48
20 : 2/48
24 : 4/48
30 : 1/48
32 : 1/48
36 : 1/48
40 : 1/48
48 : 1/48


The first argument of reduce_all is typically a standard 2-ary operator but any user-defined 2-ary function is valid also, provided that it applies to the values of Lea instances present in the sequence.

Note that reduce_all admits the optional argument absorber (default: None): you can specify the absorber value for the given function; for instance, we could set absorber=0 for the multiplication case above. The effect is that it activates a more efficient algorithm able to stop combinatorial as soon as the absorber is met. This can accelerate things dramatically. For instance, the following calculation takes several seconds:

>>> z = tuple(lea.bernoulli('1/4') for _ in range(16))
>>> Lea.reduce_all(operator.mul,z,absorber=0)
0 : 65535/65536
1 :     1/65536


In contrast, if you take care to specify the absorber value:

>>> Lea.reduce_all(operator.mul,z,absorber=0)


then you get the same result in a blink of an eye! Actually, the absorber is super-efficient on this special example because there is a lot of zeroes: among the theoretical 2**16=65536 cases, it's able to remove everything except 2 cases.

## Advanced calculation options (calc method)

Suppose that you have defined a Lea instance x with several dependencies to other instances. Remember that Lea works with lazy evaluation: when you execute statements like print (x), print (x.p(4))) or avg = x.mean, a calculation occurs behind the scene to evaluate the pmf of x, before doing the specific required action. This calculation is done implicitly, in order to ease your life. You can however ask to perform this calculation explicitly at any time by making this call:

x.calc()


This performs the required calculations and returns a new instance with the calculated pmf. This instance is independent of any other variables, so it does the same as x.new(). As such, this call is not very useful except if you want to control finely when the calculation is done. Now, the real interest is that calc has three optional arguments that let you make some customization of the calculation algorithm. Before describing these arguments, beware that these are meant for "advanced" use: you have to handle them with care if you want to avoid any misinterpretation of the delivered results.

The full signature of calc, with the default values, is:

calc(self,prob_type=-1,sorting=True,normalization=True,bindings=None,memoization=True)


The first two arguments (prob_type and sorting) are described in the section dedicated to advanced constructor arguments. The three remaining arguments (normalization, bindings, and memoization) are described below. Note that normalization has been presented earlier but it requires a special coverage here.

### Avoiding normalization

The optional normalization argument has the same purpose as explained for the [normalization argument in the pmf method] (Lea3_Tutorial_1#markdown-header-more-on-normalization). By calling calc(normalization=False), the probabilities that are collected are NOT divided by their sum. This means that you can get a "pseudo" probability distribution, for which the probability sum is different from 1. This occurs in particular when you deal with conditional probabilities (given method): the probability total you get is less than 1; the missing denominator is actually the probability of the given condition: in mathematical term: instead of getting P(A|B) = P(A&B)/P(B), you get P(A&B).

This feature may seem uninteresting but, in some cases, it could be used to have explanations on calculated probabilities. Consider the BN example given earlier. Imagine you want to know the probability that BOTH John AND Mary calls. This is easy to calculate:

>>> P(john_calls & mary_calls)
0.002084100239


however, it's difficult to explain this calculation or to redo it manually. We have seen that a way to understand such result is to display the 16 atomic events (or "possible worlds") by using the joint method:

lea.joint(earthquake,burglary,alarm,john_calls,mary_calls)


From this full joint probability distribution, you can now add the 8 probabilities where the last two variables are true; you should then obtain the same value as above. Since this treatment is cumbersome, you could use the given method to make the filtering for you:

>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given(john_calls & mary_calls)
(False, False, False, True, True) : 0.23895323731595242
(False, False, True , True, True) : 0.30138246147957953
(False, True , False, True, True) : 1.4365911696438337e-05
(False, True , True , True, True) : 0.28358309688769245
(True , False, False, True, True) : 0.00034033391807504135
(True , False, True , True, True) : 0.17515213192200016
(True , True , False, True, True) : 2.3991168497726016e-08
(True , True , True , True, True) : 0.0005743485738355601


Unfortunately, the probabilities are not the same as in the previous joint table: the new joint table has been normalized to balance the entries that have been filtered out. Here comes the normalization=False trick:

>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given((john_calls & mary_calls)).calc(normalization=False)
(False, False, False, True, True) : 0.0004980024990000001
(False, False, True , True, True) : 0.0006281112599999999
(False, True , False, True, True) : 2.994000000000003e-08
(False, True , True , True, True) : 0.0005910156
(True , False, False, True, True) : 7.0929e-07
(True , False, True , True, True) : 0.0003650346
(True , True , False, True, True) : 5.0000000000000054e-11
(True , True , True , True, True) : 1.197e-06


What you get is NOT a true probability distribution; it's just a truncated probability distribution! If you want now to sum the 8 probabilities, simply append .p_sum:

>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given((john_calls & mary_calls)).calc(normalization=False).p_sum
0.0020841002389999997


This sum is then the probability that John and Mary calls together, as calculated before. So, to summarize, what we have done here is mimicking the internal calculations made internally by Lea's algorithm.

### Making explicit bindings

In many cases, the evidences passed to the given method are equalities. This is sometimes called "observations": variable x has been observed to be a given value u, variable y has been observed to be a given value v, etc. For example, consider this situation which shows the probability distributioon of the sum of two dice given the observation of one of these die:

>>> d1 = lea.interval(1,6)
>>> d2 = lea.interval(1,6)
>>> d = d1 + d2
>>> d.given(d1 == 1)
2 : 0.16666666666666663
3 : 0.16666666666666663
4 : 0.16666666666666663
5 : 0.16666666666666663
6 : 0.16666666666666663
7 : 0.16666666666666663


Actually, since the Lea algorithm is designed to cover any evidence condition, it shall for enumerate all the 36 possible combination of values of (d1,d2) for the variables and evaluates for each of them whether the condition d1 == 1 is true or false. This works fine but there is some waste of CPU to make this work, especially if the number of the observed variables is getting big (remember also that their domain can be non-boolean, with unlimited number of elements). Fortunately, this waste can be avoided for elementary variables (those defined by a known prior pmf, like d1 and d2). The technique is called explicit bindings: this consists in specifying, in the bindings argument, a dictionary associating the observed variables with the their values. This is demonstrated in the following expression:

>>> d.calc(bindings={d1:1})


This returns the same result as previously but by browsing only 6 cases, which are the possible values of d2. Note that the specified binding holds for the query only, so d1 is unbound as soon as the result has been calculated. A more extreme example is to bind all elementary variables:

>>> d.calc(bindings={d1:1,d2:4})
5 : 1.0


In such case, the enumeration is limited to one single case.

Such technique could be efficient when the number of cases to examine is large. Remember however that it can be used only for elementary variables; an exception is raised if you try to bind a derived variable like d.

TO BE WRITTEN

# Machine Learning with Lea

So far, most of the examples shown demonstrate various ways to calculate probability distributions from known probability distributions. However, apart trivial cases of uniform distributions found in gambling, it could be difficult to determine prior probability distributions for a given phenomenon. If no quantitative data are available for the phenomenon at hand, you have to make assumptions with some theoretical model, assuming this is not too complex. On the other hand, if quantitative data are available, then it is possible, with dedicated algorithms, to calculate prior probability distributions that tend to fit the given data. This last approach belongs to the field of Machine Learning (ML). In the present section, we shall present a couple of basic techniques to "learn" probabilities from the data.

## Frequentist inference

Actually, we have already presented a very basic ML approach. Counting occurrences of a given phenomenon and derive probabilities as calculated frequencies is indeed a way to "learn" from the data.

For instance, frequencies of English letters can be learned from text data:

>>> lea.set_prob_type('r')
>>> text_data = "to be or not to be: that is the question"
>>> freq1 = lea.vals(*text_data)
>>> freq1
: 9/40
: : 1/40
a : 1/40
b : 2/40
e : 4/40
h : 2/40
i : 2/40
n : 2/40
o : 5/40
q : 1/40
r : 1/40
s : 2/40
t : 7/40
u : 1/40


Generally speaking, the larger the data, the more sensible the result will be (this is the assumption made with "Big Data"!). We could now easily produce random sample strings from that distribution.

>>> ''.join(freq1.random(100))
'oshheeo a ttt  ubt  ee o      ontettisboe t: hb nis    t bhoo nee bs o ttahsbhtne ttoa  o  oth   ete'


We get something silly, although it tends to respect letter/punctuation frequencies.

Frequencies of group of two letters can be easily calculated also, using the pairwise function (see implementation in Python’s "Itertools Recipes":

>>> freq2 = lea.vals(*pairwise(text_data))


Then, you could examine the frequencies of transitions from a 't':

>>> freq2.given(freq2[0] == 't')
('t', ' ') : 2/7
('t', 'h') : 2/7
('t', 'i') : 1/7
('t', 'o') : 2/7


... or to a 't':

>>> freq2.given(freq2[1] == 't')
(' ', 't') : 3/6
('a', 't') : 1/6
('o', 't') : 1/6
('s', 't') : 1/6


To produce again random samples, we have first to transform the tuple pairs into 2-letters strings (e.g. ('t', 'h') -> 'th'). Here is the trick to do that:

>>> freq2s = freq2.map(lambda v: ''.join(v))


Then we are able to produce a random sample as before:

>>> ''.join(freq2s.random(50))
't  b ns thuest: ue t nhais tt tht  tth bhaquioio t t tate: bnoe:uebeo :  t: r e heorbes titht e:uet '


This is still silly, but you can see that things improved a bit: for instance, you have no more repetitions of more than two whitespaces or more than two 't'.

A better approach to study transitions between letters is to build a Markov chain from the given data, as already explained here:

>>> mc = markov.chain_from_seq(text_data)
>>> mc

-> b : 2/9
-> i : 1/9
-> n : 1/9
-> o : 1/9
-> q : 1/9
-> t : 3/9
:
->   : 1
a
-> t : 1
b
-> e : 1
e
->   : 2/4
-> : : 1/4
-> s : 1/4
h
-> a : 1/2
-> e : 1/2
i
-> o : 1/2
-> s : 1/2
n
-> o : 1
o
->   : 2/5
-> n : 1/5
-> r : 1/5
-> t : 1/5
q
-> u : 1
r
->   : 11
s
->   : 1/2
-> t : 1/2
t
->   : 2/7
-> h : 2/7
-> i : 1/7
-> o : 2/7
u
-> e : 1


Note the difference from the previous case: freq2 captures probabilities of two consecutive letters (e.g. P('es') = 1/39), while mc captures conditional probabilities of transitions from one letter to another (e.g. P('s'|'e') = 1/4). The two approaches are then very different. Let's now generate a random string from this Markov chain (we arbitrarily start from an 'a'):

>>> ''.join(mc.get_state('a').random_seq(100))
'thato be be tistis nonononor is nor the: or t tiot to que be that thest to t t no tonor que no be qu'


You can see that this is quite different from what has been produced before; it's now something more like a baby who's babbling alone. The system modestly learned how to babble from the Shakespearian sentence that we gave. We see, in particular, that it knows that a colon is always followed by a whitespace; we see also that, due to the small size of input text sample, it suffers limitations due to over-generalization such that following any 'b' by a 'e', etc.

Should you like that kind of linguistic experience, then you are kindly invited to have a look at the Bullshit Generator on the Examples page.

## Building a Bayesian network from a joint table

If you reconsider the examples of data mining on a joint table, now interpreting the frequencies as probabilities, then you get another example of ML through frequentist approach. Admittedly, such simple statistical approach could be qualified as "Machine Learning for the layman"! A more difficult goal is to calculate the probabilities involved in a Bayesian network. Lea provides a method for this purpose; it can calculate the CPT entries provided that you define the dependency graph of the BN. This method, named Lea.build_bn_from_joint, makes the following assumptions:

1. you study a set of given N discrete random variables, which are possibly interdependent and for which the pmf are unknown;
2. you have got a joint table with S samples given the values of each of the N variables (i.e. a table of S rows and N columns); we assume that the these data are rich enough to discover all values of the domain of each random variable;
3. you have defined the dependencies of the BN, by means of a direct acyclic graph linking the random variables.

Let's illustrate these assumptions on an example that reuses the patient joint table:

patient = lea.read_csv_file('patients.csv',col_names=('givenName','surname','gender','title','birthday','blood_type','weight{f}','height{f}','smoker{b}') )


Let's now consider the three assumptions above one by one:

1. we shall focus on the five random variables gender, blood_type, smoker, height and weight (N = 5);
2. these 5 data are present in the patient joint table, which contains fifty records (S = 50);
3. we assume a BN with the following dependencies between the five random elements:
                   gender  blood_type
|  \      /
|   \    /
height  |   smoker
\   |    /
\  |   /
weight


The build_bn_from_joint method allows "learning" the different CPT of this BN from the data present in the patient table. Each argument of Lea.build_bn_from_joint is a tuple giving the names of antecedent nodes and the name of the dependent node:

bn = patient.build_bn_from_joint(
( ('gender','blood_type')     , 'smoker' ),
( ('gender','height','smoker'), 'weight' ))


Now, bn is a full-fledged BN on which you can make queries, such a calculating the gender probability of a person given that its weight is between 110 and 120 (not included):

>>> bn.gender.given(110 <= bn.weight, bn.weight < 120)
female : 0.21234352561454817
male   : 0.7876564743854518
>>> P((bn.gender=='male').given(110 <= bn.weight, bn.weight < 120))
0.7876564743854518


You may wonder: What is the gain of building a BN, while the initial joint table already offers the ability to make the very same query? There are two answers to this question. The first one concerns the data size: the space required to store BN data is far smaller than the space required to store the full joint table; to some extent, the BN makes a compression of the joint table. The second answer is that the joint table stores actual raw data while the BN captures the dependency patterns. This phenomenon, known as overfitting, is blatant if we redo the query seen before on the patient joint table instead of the built-up BN:

>>> P((patient.gender=='male').given(110 <= patient.weight, patient.weight < 120))
1.0


Here, with the joint table, the given evidence on patient's weight makes the gender certain. This result is dubious but it is in line with the joint table:

>>> patient.given(110 <= patient.weight, patient.weight < 120)
givenName, surname   , gender, title, birthday    , blood_type, weight, height, smoker
('Jimmie' , 'Martinez', 'male', 'Mr.', '03-12-1930', 'B+'      ,  114.0,  171.0, False ) : 1.0


There is only one patient in the given weight range, who is incidentally a man. This is a trivial instance of overfitting. In this present case, the BN provides a more nuanced answer, which sounds more sensible.

### Improving the Bayesian network

Of course, the merits of BN are directly linked to the quality of the dependency graph designed by the human expert who models the phenomenon. In this regard, the BN graph given above could for sure be improved. To illustrate some advanced techniques, let us revisit the BN by including the patient's age: it seems sensible to add age as height's influencer, as shown in the following graph:

         age       gender  blood_type
\      /  |  \      /
\    /   |   \    /
height   |   smoker
\    |    /
\   |   /
weight


Now, the problem is that age is not an attribute of the patient joint table. Only birthday is available, which is a string, hence not adequate for a BN variable. As seen previously, assuming the year of the study is 2015, the age could be approximated with the following definition:

age = 2015 – patient.birthday[-4:].map(int)


Based on this definition, the joint table can now be augmented with the new column. Since each patient row is a named tuple, the trick is to convert age as a tuple through the map method, to append it to the patient tuples and then to use the as_Joint method to redefine the column names. This could be done through the following one-liner:

patient2 = (patient + age.map(lambda x:(x,))).as_joint('givenName','surname','gender','title','birthday','blood_type','weight','height','smoker','age')


You could check that each row of patient2 joint table contains a new column age, which is in line with birthday. Now that we have all the data needed, we can build the new BN, as sketched above:

bn2 = patient2.build_bn_from_joint(
( ('gender','age')            , 'height'),
( ('gender','blood_type')     , 'smoker'),
( ('gender','height','smoker'), 'weight'))


This new BN gives different results than the previous BN, which ignored the dependency on age:

>>> P((bn2.gender=='male').given(110 <= bn2.weight, bn2.weight < 120))
0.8397314844204974


We then see the impact of the added dependency of age to height, even if these two variables are absent from the query.

# Information Theory

Lea allows calculating several measures defined by the information theory. These could provide new insights on your probability distributions.

## Information

The most basic concept is the information (in bits) carried by a given event. As a basic example, consider two coins, a fair one and an unfair one :

>>> flip = lea.vals('head','tail')
>>> flip_u = lea.pmf({'head': 1/4, 'tail': 3/4})


Here is how to get the information carried by each possible result, after flipping the fair coin :

>>> flip.information_of('head')
1.0
>>> flip.information_of('tail')
1.0


Each result carries an information of 1 bit, as expected by the theory. Now, here are the information values for the unfair coin :

>>> flip_u.information_of('head')
2.0
>>> flip_u.information_of('tail')
0.4150374992788437


We see that the information carried by a 'head' result is higher (2 bits) than the information carried by 'tail'. This is sensible since a 'head' is less likely to occur (prob. 1/4) than a 'tail' (prob. 3/4), hence it carries more information.

For boolean probability distributions, note that .information_of(True) can be replaced by .information, which is a convenient shortcut. Example:

>>> rain = lea.event(1/8)
>>> rain.information
3.0
2.0


## Entropy

If you take the average information carried by all events modeled by a given probability distribution, using probabilities of these events as weights, you get the entropy of this distribution. The entropy is one of the most important concept of information theory. It captures the "degree of randomness" of a given probability distribution.

Here is how to get the entropy of each defined coin:

>>> flip.entropy
1.0
>>> flip_u.entropy
0.8112781244591328


The entropy of the fair coin is 1 bit, which is not surprising since we have seen that each possible outcome brings 1 bit of information. The entropy of the unfair coin is about 0.81 bits; this is less than 1 bit, which is consistent with the fact that the unfair coin is less random than the fair one. Note that the theory states that 1 bit is the maximum entropy for a distribution with two values; it happens when the two values are equally likely.

## Entropy and conditional probabilities

The information and entropy can be calculated on any Lea instance, whether defined explicitly as above or resulting of calculations, as explained throughout the tutorials. In particular, these indicators can be calculated on conditional probability distributions.

Let us take an example. There are 64 balls in an urn. 62 of these balls are blue and marked with a "x"; 2 of these balls are red: one is marked with an "x", the other with "y". Here is a short model of the distribution of these balls:

>>> ball = lea.pmf({ 'Bx': 62, 'Rx': 1, 'Ry': 1 },prob_type='r')
>>> ball
Bx : 62/64
Rx :  1/64
Ry :  1/64


Note that it's easy to extract the color code or the mark, using the expression ball[0] or ball[1] respectively (see "Indexing and slicing"). Let us create two new variables to ease the work:

>>> color = ball[0]
>>> mark = ball[1]
>>> color
B : 31/32
R :  1/32
>>> mark
x : 63/64
y :  1/64


Let us calculate the entropy of ball:

>>> ball.entropy
0.23187232431271465


It is quite low since it's almost certain that a drawn ball is Blue, with a "x" mark.

Suppose now that someone draws a ball and tell us some information on its color or mark (remember that the three variables ball, color and mark are interdependent due to referential consistency). Let us calculate the new entropy in the following cases:

• What is the entropy if the ball's mark is "x"?
>>> ball.given(mark=='x').entropy
0.11759466565886476

• What is the entropy if the ball's mark is "y"?
>>> ball.given(mark=='y').entropy
0.0

• What is the entropy if the ball's color is blue?
>>> ball.given(color=='B').entropy
0.0

• What is the entropy if the ball's color is red?
>>> ball.given(color=='R').entropy
1.0


We see in the three first cases that the given information lower the entropy, considering the initial value of 0.23 bits. In two cases, the entropy becomes null: this happens because the new information dissolve any uncertainty on the mark and color. In the last case however, the information that the ball is red increases the entropy to 1 bit. This value is consistent with the uncertainty remaining on the mark of the ball, which is equally likely "x" or "y". It may sound odd that new information makes the entropy increase. Actually, there's nothing wrong here, it's simply unusual. The initial unbalanced distribution is such that the reception of unlikely information ("the ball is red") can reduce the prior beliefs we had.

## Mutual information, joint and conditional entropy

When dealing with two dependent random variables, it's interesting to know the amount of information they share. This is the purpose of the mutual information, aka transinformation. The examples below use the ball, color and mark interdependent distributions of our last case.

>>> lea.mutual_information(color,mark)
0.08486507530476972
>>> lea.mutual_information(color,ball)
0.20062232431271465
>>> lea.mutual_information(mark,ball)
0.11611507530476972


Note that the mutual information is expressed in bits.

As a stupid example, we can verify that our ball lottery and coin flipping do not share anything:

>>> lea.mutual_information(ball,flip)
0.0


Note that the mutual information is symmetric, so the method call is unaffected by the order of its arguments.

Another interesting measure is the joint entropy of a set of random variables. It gives the entropy of the joint of these random variables, taking into account possible interdependence between them. Here are two basic examples.

>>> lea.joint_entropy(mark,color)
0.23187232431271465
>>> lea.joint_entropy(mark,flip)
1.1161150753047697


The first result shows that the joint entropy of mark and color is the same as the basic entropy of ball; this is not surprising: a given ball is completely described by its mark and color, so there has exactly the same level information that the couple of these two characteristics. In the second example, we join mark and flip, since these are independent events, we get the sum of the individual entropies (0.1161 bits + 1.0 bits). You can specify any number of arguments in lea.joint_entropy. Note that if you specify only one argument, then you get the basic entropy of this argument.

A last measure is the conditional entropy: what is the entropy of x given that y is true? This is also known as equivocation of x about y. Here are two basic examples of usage:

>>> ball.cond_entropy(mark)
0.11575724900794494
>>> mark.cond_entropy(ball)
0.0


The first result shows that the (unconditional) entropy of ball, previously calculated as 0.2319 bits, decreases to 0.1158 bits given that we know the mark on the ball. The second result shows that the entropy of the mark is null given that we know the drawn ball: this is consistent since there is no uncertainty on the mark given a ball. Note that specifying True (or any relationship that is certainly true) as argument of cond_entropy boils down to calculate the basic entropy.

## Likelihood ratio (LR)

The likelihood ratio of an evidence E given an hypothesis H is defined as

        P(E | H)
LR = --------------
P(E | not H)


It indicates how far the given evidence E confirms the given hypothesis H. It is traditionally used in medical research and forensic sciences.

As a toy problem, imagine that a die has been thrown but we have very limited ways to gather information on the result. Let us assume that our hypothesis is "the die result is even". After investigations, we get an evidence: the result is not 1. What is the LR of this evidence for our hypothesis? The lea.lr(E,H) syntax makes this calculation:

>>> die = lea.interval(1,6)
>>> parity = die.map(lambda x: 'even' if x%2==0 else 'odd')
>>> lea.lr(die != 1, parity == 'even')
1.5


You could check this result, by translating the LR definition above in other Lea primitives:

>>> P((die != 1).given(parity == 'even')) / P((die != 1).given(parity != 'even'))
1.5


Since LR > 1, the given evidence is indeed somehow relevant for the hypothesis (as our intuition suggests!).

Notes: the following rules are enforced, in this sequence:

1. if E is impossible, then an exception is raised (LR = 0/0)
2. if H is impossible, then an exception is raised (P(E | H) undefined)
3. if H is certain, then an exception is raised (P(E | not H) undefined)
4. if H is impossible given E, then LR = 0.0
5. if H is certain given E, then LR = +infinity (Python's math.inf)

Examples:

>>> lea.lr(die == 3, parity == 'even')
0.0
>>> lea.lr(die == 2, parity == 'even')
inf


There is another syntax to calculate the LR: E.given(H).lr(); this flavor emphasizes the numerator of the LR expression and provides same result as the syntax seen above. It may be handier to express an hypothesis that is a conjunction of several sub-hypothesis, which can be put as arguments of given(…):

>>> (die >= 2).given(parity == 'even').lr()
1.5
>>> (die >= 2).given(parity == 'even', die <= 5).lr()
1.3333333333333333


# Symbolic computation with probabilities

We have seen that Lea can handle probabilities expressed as variable names (p, q, …), provided that the SymPy library is installed. This approach is commonly referred as symbolic computation. All the techniques for probabilistic calculation seen above in the tutorial can be revisited with probability variables in place of probability figures.

Such approach is useful when the same query is made over and over on the same complex model, with only varying prior probability values. In such case, the query result may be prepared "offline" by Lea, once for all, into an arithmetic expression; this may take maybe a long processing time; then, this expression can be copy-pasted in a dedicated program to be evaluated many times using fast arithmetic computations, with different input probability values. This dedicated program could be written in any programming language: Python, C, C++, Java, etc, without the need of any probability library.

Let's start with basic examples: let's call p the probability that number of heads obtained by flipping one coin and q the probability that it shall rain tomorrow:

>>> flip = lea.bernoulli('p')
>>> rain = lea.event('q')
>>> flip
1 : p
0 : -p + 1
>>> rain
True  : q
False : -q + 1


We can now combine these two independent variables to calculate probabilities of events, defined by conjunction or disjunctions:

>>> (flip == 0) & rain
False : p*q - q + 1
True  : -q*(p - 1)
>>> (flip == 0) | rain
False : -p*(q - 1)
True  : p*q - p + 1


Since these probability distributions have a Boolean domain, we can get the probability of True by using the P function.

>>> P = lea.P
>>> P((flip == 0) | rain)
p*q - p + 1


The object returned by P is a SymPy expression (more precisely, an instance of some subclass of SymPy's Expr). You can then apply methods provided by SymPy, like substituting some variable(s) using the subs method:

>>> P((flip == 0) | rain).subs('p',0.5)
0.5*q + 0.5
>>> P((flip == 0) | rain).subs((('p',0.5),('q',0.2)))
0.600000000000000
>>> P((flip == 0) | rain).subs('p','q')
q**2 - q + 1


Beside booleans, we can also calculate probability distributions.

>>> flip - lea.bernoulli('q')
-1 : -q*(p - 1)
0 : 2*p*q - p - q + 1
1 : -p*(q - 1)
>>> flip4 = flip.times(4)
>>> flip4
0 : (p - 1)**4
1 : -4*p*(p - 1)**3
2 : 6*p**2*(p - 1)**2
3 : -4*p**3*(p - 1)
4 : p**4


Remind that flip4 probability distribution is also known as the binomial distribution, which you can build using a dedicated function:

>>> flip4 = lea.binom(4,'p')


You can then evaluate the probability of some conditions using the usual Lea syntax:

>>> P((1 <= flip4) & (flip4 < 4))
-2*p*(p - 1)*(p**2 - p + 2)
>>> P(flip4.is_any_of(1,2,3))
-2*p*(p - 1)*(p**2 - p + 2)


and obtain standard indicators:

>>> flip4.mean
4*p
>>> flip4.var
-4*p*(p - 1)
>>> flip4.entropy
(-p**4*log(p**4) + 4*p**3*(p - 1)*log(-4*p**3*(p - 1)) - 6*p**2*(p - 1)**2*log(6*p**2*(p - 1)**2) + 4*p*(p - 1)**3*log(-4*p*(p - 1)**3) - (p - 1)**4*log((p - 1)**4))/log(2)


Note that all these expressions are calculated, using the same algorithms than those used for usual numerical probabilities. Lea just calls the sympy.factor function as a post-treatment in order to simplify the resulting expressions as much as possible. You can notice that the calculated mean 4*p agrees with the theoretical formula for the mean of the binomial.

To make variable substitution in a probability distribution that is non-boolean, you simply have to call subs method on that distribution: this call is propagated, with the given arguments, to all inner probability expressions:

>>> flip4.subs('p',0.25)
0 : 0.316406250000000
1 : 0.421875000000000
2 : 0.210937500000000
3 : 0.0468750000000000
4 : 0.00390625000000000


Actually, you could check that this last distribution boils to the same distribution as

>>> lea.binom(4,0.25)


The sole difference is that the probability type is slightly different between the two distributions: it's SymPy.Float for the former while it's Python's float for the latter.

So far, we have defined symbolic probabilities by means of strings of characters. Behind the scene, these strings are converted to SymPy symbols thanks to sympy.symbols function. In order to have better control on these variables, you could also build these symbols yourself and pass them to Lea constructors in place of strings. Here is an example:

>>> import sympy
>>> (p,q) = sympy.symbols('p q')
>>> lea.binom(2,p) + lea.binom(2,q)
0 : (p - 1)**2*(q - 1)**2
1 : -2*(p - 1)*(q - 1)*(2*p*q - p - q)
2 : 6*p**2*q**2 - 6*p**2*q + p**2 - 6*p*q**2 + 4*p*q + q**2
3 : -2*p*q*(2*p*q - p - q)
4 : p**2*q**2


So far, we have not yet defined a probability distribution using the pmf function. Let's give it a try:

>>> lea.pmf({"A": 'pA', "B": 'pB', "C": 'pC'})
A : pA/(pA + pB + pC)
B : pB/(pA + pB + pC)
C : pC/(pA + pB + pC)


You can see that the probability variables have been normalized to ensure that the sum equals 1. This is somehow unpleasant because there is actually 2 degrees of freedom, not 3. You could then prefer to remove pC as it can be derived from pA and pB. As explained, in the section dedicated to normalization, replacing pC by None will do the job:

>>> lea.pmf({"A": 'pA', "B": 'pB', "C": None})
A : pA
B : pB
C : -pA - pB + 1


Note that you could achieve the same result using a more explicit approach:

>>> (pA,pB) = sympy.symbols('pA pB')
>>> lea.pmf({"A": pA, "B": pB, "C": 1-(pA+pB)})


Symbolic computation can handle more complex problems like conditional probabilities and Bayes reasoning. Let us revisit our first example of Bayes network, replacing all the 6 probability values by variables:

burglary   = lea.event('a')
earthquake = lea.event('b')
alarm = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event('c'),
(True ,False) : lea.event('d'),
(False,True ) : lea.event('e'),
(False,False) : lea.event('f') })
john_calls = alarm.switch({ True  : lea.event('g'),
False : lea.event('h') })
mary_calls = alarm.switch({ True  : lea.event('i'),
False : lea.event('j') })


We can now query the BN as before, getting formulas instead of numbers:

>>> pb = P(burglary.given(mary_calls & john_calls))
>>> pb
a*(b*c*g*i - b*c*h*j - b*d*g*i + b*d*h*j + d*g*i - d*h*j + h*j)/(a*b*c*g*i - a*b*c*h*j - a*b*d*g*i + a*b*d*h*j - a*b*e*g*i + a*b*e*h*j + a*b*f*g*i - a*b*f*h*j + a*d*g*i - a*d*h*j - a*f*g*i + a*f*h*j + b*e*g*i - b*e*h*j - b*f*g*i + b*f*h*j + f*g*i - f*h*j + h*j)


To gain confidence in this long formula, you could substitute the 6 variables by their actual values given earlier and check the result:

>>> values = (('a',0.001),('b',0.002),('c',0.95),('d',0.94),('e',0.29),('f',0.001),('g',0.9),('h',0.05),('i',0.7),('j',0.01))
>>> pb.subs(values)
0.284171835364393


You can similarly check results provided for the other queries, by simply appending .subs(values) on them.

As you can expect it from a symbolic computation system, you are free to fix only a subset of variables: you will get then a new formula with the remaining free variables:

>>> pb.subs((('a',0.001),('b',0.002),('c',0.95),('d',0.94),('e',0.29),('f',0.001)))
0.001*(0.94002*g*i + 0.05998*h*j)/(0.002516442*g*i + 0.997483558*h*j)


As explained in the introduction, the returned formula could be copy-pasted in a program, which is completely unaware of Bayes networks but which is fast for number-crunching. Note also that SymPy allows reformatting such formula into various formats, like LaTeX:

 >>> print(latex(pb))
\frac{a \left(b c g i - b c h j - b d g i + b d h j + d g i - d h j + h j\right)}{a b c g i - a b c h j - a b d g i + a b d h j - a b e g i + a b e h j + a b f g i - a b f h j + a d g i - a d h j - a f g i + a f h j + b e g i - b e h j - b f g i + b f h j + f g i - f h j + h j}


You can then easily embed nice formula in your papers:

# Monte-Carlo sampling

## Random generation through MC (random_mc method)

We have seen in the first part of the tutorial how to generate random samples with the .random(…) method. An important point must be known on the way this method works. Consider a set of 9 different "dice", with number of faces ranging from 2 to 10. Let us make 20 throws of these 9 dice and get the sum:

>>> dice = tuple(lea.interval(1,n) for n in range(2,11))
>>> sum_dice = sum(dice)
>>> sum_dice.random(20)
(32, 26, 33, 31, 28, 39, 32, 31, 38, 26, 26, 33, 33, 27, 26, 27, 33, 28, 47, 25)


You will note that the generation of the random sample takes some times. Why? Actually, what happens is that, behind the scene, the actual probability distribution of sum_dice is calculated (a heavy combinatorial process), which produces a resulting probability distribution; then, from this instance, the 20 random samples are obtained. So, what actually happens is equivalent to this:

>>> _alea1 = sum_dice.get_alea()
>>> _alea1.random(20)
(32, 26, 33, 31, 28, 39, 32, 31, 38, 26, 26, 33, 33, 27, 26, 27, 33, 28, 47, 25)


The heavy calculation is done by the get_alea() calls. Note that, because of a cache mechanism, any further call to sum_dice.random(...) shall be very fast.

In some cases, using this approach just to get random samples is intractable because of combinatorial explosion. The random_mc(…) method proposes an alternative implementation, which tackles this limitation. Let's revisiting the same example:

>>> sum_dice2 = sum(dice)
>>> sum_dice2.random_mc(20)
(37, 35, 38, 34, 29, 31, 32, 21, 31, 30, 35, 39, 35, 40, 31, 22, 30, 29, 39, 31)


This now produces random samples without any latency. How does it work? random_mc does not try to calculate the final probability distribution; it just make random sample on each elementary distribution, then it aggregates the results together. In the current case, for each of the 20 trials, it throws the 9 dice and it sums up the results. This is closer to the real process since it's basically a simulation of this process.

Take care that this example does NOT entail that the random_mc method is the real McCoy. Both random and random_mc methods are complimentary:

• random tends to be better when the end values result from long calculations and/or when the combinatorial is affordable and/or the random sample to produce is large;
• random_mc tends to be better when the end values result from short calculations and/or when the combinatorial is voluminous and/or the random sample to produce is small.

## Estimating probability distribution by MC sampling (estimate_mc method)

The random_mc(…) method may serve another purpose than pure random sample generation. In the case of calculation on large combinatorial, it may be used to estimate the resulting probability distribution based on random sample with a given size. It offers then an alternative to the exact algorithm used so far, which maybe prohibitively long.

For instance, here is how to estimate the probability distribution of the sum of 9 dice seen above, based on a sample of 10000 trials:

>>> estimated_sum_dice = lea.vals(*sum_dice2.random_mc(10000))
>>> estimated_sum_dice
13 : 0.0002
14 : 0.0002
15 : 0.0007
16 : 0.0016
17 : 0.0017
18 : 0.0039
19 : 0.007
…


One of the interest is that you can tune the sample size as you wish according to the precision you want. Since this approach is very useful, Lea provides a convenience method to do the job more easily, namely .estimate_mc(…). The following

>>> estimated_sum_dice  = sum_dice2.estimate_mc(10000)


produces an estimation similar as the one shown above.

This method of estimating a probability distribution by MC sampling is very useful in many situations having large combinatorial size, like querying a complex Bayesian network.

# How to optimize calculations

TO BE WRITTEN

• use floating-point representation
• use symbolic computation
• use times method
• use multiple arguments in given method
• explicit bindings
• replace cpt by switch or if_ method
• calculate and save independent results
• use MC sampling

# The Statues algorithm

The calculation of probabilities in Lea is made by an original algorithm, named the Statues algorithm. Should you need more details, references are provided at the end of the section.

Note that the name "Statues" is borrowed from the popular Statues game; other names include "Red Light, Green Light" (US), "Grandmother's Footsteps" (UK), "1-2-3, SOLEIL!" (France) and "1-2-3, PIANO!" (Belgium).

## The What…

The Statues algorithm performs exact probabilistic inference, by opposition to approximate algorithms like the MC sampling seen above. This algorithm is limited strictly to discrete probabilities.

Contrasting to several well-established exact algorithms, the Statues algorithm is able to deal with several probabilistic modeling approaches: as seen in the present tutorial, it can handle in particular probabilistic arithmetic, conditional probabilities, JPD, CPT, BN, Markov chains. Also, different modeling approaches can be seamlessly mixed together to provide more expressive models. For instance, assuming two BN variables x, y represent numbers, a new variable z = x + y can be created to represent their sum; this new variable z could then be queried to get its probability distribution given some evidence; also, z could be used itself to express some evidence. So, the Statues algorithm has the ability to unify several probabilistic modeling approaches.

Before going further, it's important to understand what are the input and output of the algorithm.

• In input, it requires a given node from a given direct acyclic graph (DAG); this DAG defines how each variable depends from the others, up to "atomic" variables that are defined by simple explicit pmf.

• In output, the algorithm returns the pmf of the given node.

The dependency relationships expressed in the input DAG can be arithmetic operators, logical operators, user-defined functions, CPT constructs, etc. This DAG is constructed on-the-fly by Lea, starting from prior pmf and following the different techniques shown in the present tutorial. Note that the DAG construction is out of the scope of the Statues algorithm; it's just a pretreatment process that builds a data structure, following recursive compositional rules. Sometimes, this is referred as "lazy evaluation" because the actual evaluation is done later, when needed.

If any node has no more than one parent, then the DAG is actually a tree, where the leaf nodes are explicit prior pmf. In such case, a recursive algorithm building a pmf from the pmfs calculated on each child nodes can do the job. Such case is specific to situations where any variable occur only at one place and influence only one other variable. Note that the very earliest prototype of Lea worked with such algorithm, which was fine for basic probabilistic arithmetic. In the general situations (e.g. given conditions), this constraint is not respected and the recursive algorithm mentioned above give wrong results because it does not enforce referential consistency: one given node referenced by two parent nodes will be seen as two independent events, although they share the same pmf. Here comes the "Statues" algorithm...

## The How…

The Statues algorithm actually starts when calculating the pmf of a given node is required. This happens when you ask to display a variable or expression; this happens also when you call the P function on such variable or expression.

How does it work?

Here are the most important features. The Statues algorithm uses generators instead of classical functions - in Python, a generator is, roughly speaking, a function containing yield statement(s) -. Instead of asking children to calculate and return their pmf, the nodes ask their children to browse the possible variable bindings one by one and to generate the associated probabilities, ensuring the consistency at any stage. More precisely, any node requests its children nodes to generate atoms: for a given node X, the generated atoms are (v,p) pairs such that the sum of all p for a given v equals the probability P(X = v). If all children obey this rule, then any node can in turn apply simple rules to generate its own atoms (u,q) from the atoms it receives. In particular,

• a node representing the addition X+Y shall get any atom (x,p) from X and any atom (y,q) from Y and generates in turn the atom (x+y,pq);
• a node representing X | Y (i.e. X given Y) shall get any atom (y, q) from Y; then, only if y is true, it shall get the next atom (x,p) of X and generate the atom (x,pq);
• etc.

Finally, the (u,q) atoms received by the top-level node (input query) are aggregated by summing the probabilities value by value, producing the expected pmf. The generators created recursively on the nodes allows keeping the binding of any variable consistent. In contrast with usual functions, the created generators are alive during the whole calculation and work cooperatively to delivers individual atoms to the queried node, which is the sole node to calculate a pmf.

## References

Admittedly, the explanations above just give you, at the best, a rough idea of the process. If you want a true understanding of the Statues algorithm, you can:

# What's next?

Thank you for reading the present tutorial pages! We hope that you found it enough clear and informative.

You should have now comprehensive information on what Lea can do. You should be able to use it in your projects, researches or as a toolkit for teaching probability theory and probabilistic programming. You can find more examples in the Examples page.

For enhancements or bugs, please create issues on bitbucket Lea page.

Feel free also to send an e-mail to pie.denis@skynet.be (English or French spoken). Use cases of Lea in your projects/researches are especially appreciated! A "references" page listing the projects using Lea could greatly improve the reputation of the library.

Updated