Lea / Lea3_Tutorial_3

Introduction

The present page is the third part of the tutorial for the Lea library. It assumes that you have installed Lea 3 and that you are familiar with the techniques presented in the first part of the tutorial. The main topics are mostly independent, so you can jump directly to your preferred subject without much trouble.

Other functions and features

Plotting histograms (histo, plot methods)

The plot method (matplotlib)

If you've installed the matplotlib library, you can very easily plot the probability mass function of a given Lea instance. We have already seen that all you have to do is invoke the plot() method on any Lea instance. By default, the plot() method draws the pmf in a dedicated window. Several arguments can be specified to customize this behavior, including saving the chart to an image file. Here is the full signature, with default values:

your_lea_instance.plot(title=None,fname=None,savefig_args=dict(),**bar_args)
  • title: a string giving the title to display (no title if None)
  • fname: if not None, the chart is saved in a file specified by the given fname, as described for the matplotlib.pyplot.savefig function
  • savefig_args: (used only if fname is not None) a dictionary of named arguments relayed to the matplotlib.pyplot.savefig function
  • bar_args: named arguments relayed to the matplotlib.pyplot.bar function

Example:

from lea.leaf import dice
dice(3,6).plot(title='Three fair dice',fname='pmf_3d6.png',savefig_args=dict(bbox_inches='tight'),color='lightblue')

pmf_3d6.png

The histo and as_string methods

If you haven't installed the matplotlib library, there are alternative ways to plot histograms:

  • use another plotting package: this requires extracting pmf data, which is the subject of the next section,
  • or display text-oriented histograms, directly on your console: this is the subject of the present section.

The histo method returns a string that allows displaying the pmf as a textual histogram:

>>> print(dice(3,6).histo())
 3 :
 4 :  -
 5 :  ---
 6 :  -----
 7 :  -------
 8 :  ----------
 9 :  ------------
10 :  -------------
11 :  -------------
12 :  ------------
13 :  ----------
14 :  -------
15 :  -----
16 :  ---
17 :  -
18 :

If you want to scale this histogram, you can pass a percentage argument: for instance, .histo(200) will double the default size.

For advanced display control, there exists a more powerful formatting method: as_string(…). It lets you format the probability numbers and the histogram together. The first argument is a string of 1 or 2 characters: the first character is '.' for float format, '%' for percentage or '/' for fraction; if you append '-', the histogram is displayed just to the right of the numbers. There are also two optional arguments specifying the number of decimals (nb_decimals, default: 6) and the scale of the histogram (histo_size, default: 100). Here are some examples of what you can do.

>>> print (dice(3,6).as_string('%',nb_decimals=2))
 3 :   0.46 %
 4 :   1.39 %
 5 :   2.78 %
 6 :   4.63 %
 7 :   6.94 %
 8 :   9.72 %
 9 :  11.57 %
10 :  12.50 %
11 :  12.50 %
12 :  11.57 %
13 :   9.72 %
14 :   6.94 %
15 :   4.63 %
16 :   2.78 %
17 :   1.39 %
18 :   0.46 %
>>> print (dice(3,6,'r').as_string('/-',histo_size=200))
 3 :  1/216 -
 4 :  3/216 ---
 5 :  6/216 ------
 6 : 10/216 ---------
 7 : 15/216 --------------
 8 : 21/216 -------------------
 9 : 25/216 -----------------------
10 : 27/216 -------------------------
11 : 27/216 -------------------------
12 : 25/216 -----------------------
13 : 21/216 -------------------
14 : 15/216 --------------
15 : 10/216 ---------
16 :  6/216 ------
17 :  3/216 ---
18 :  1/216 -
>>> print (dice(3,6).as_string('.-',nb_decimals=6,histo_size=300))
 3 : 0.004630 -
 4 : 0.013889 ----
 5 : 0.027778 --------
 6 : 0.046296 --------------
 7 : 0.069444 ---------------------
 8 : 0.097222 -----------------------------
 9 : 0.115741 -----------------------------------
10 : 0.125000 --------------------------------------
11 : 0.125000 --------------------------------------
12 : 0.115741 -----------------------------------
13 : 0.097222 -----------------------------
14 : 0.069444 ---------------------
15 : 0.046296 --------------
16 : 0.027778 --------
17 : 0.013889 ----
18 : 0.004630 -

Mode and maximum a posteriori estimation (MAP)

In some situations, you are just interested in the values that have the highest probability in a given probability distribution. This set of values is traditionally known as the "mode". On Lea instances, you can get these values as a tuple, using the mode attribute. Here are some examples showing the mode associated with throws of 1, 2 and 3 dice:

>>> d1 = interval(1,6)
>>> d1.mode
(1, 2, 3, 4, 5, 6)
>>> d1.times(2).mode
(7,)
>>> d1.times(3).mode
(10, 11)
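
As a cross-check, the mode of the sum of three dice can be recovered by brute-force enumeration in plain Python (a sketch independent of Lea):

```python
from collections import Counter
from itertools import product

# Count the 6**3 = 216 equally likely outcomes of three fair dice.
counts = Counter(sum(t) for t in product(range(1, 7), repeat=3))
# The mode is the set of values reaching the maximum count.
top = max(counts.values())
mode = tuple(sorted(v for v, c in counts.items() if c == top))
print(mode)   # (10, 11)
```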

The mode is useful in particular to determine a maximum a posteriori estimation (MAP), which consists in finding the most likely assignments of a given set of variables given some evidence. Consider for example the Bayesian network presented earlier. A MAP query could be: "What are the most likely states for burglary, earthquake and john_calls variables, given that we know that Mary called?"

>>> lea.joint(burglary,earthquake,john_calls).given(mary_calls).mode
((False, False, False),)

The answer is: burglary = earthquake = john_calls = False, which means that Mary actually called for no worrying reason, just to give some news. Here are some other MAP queries for the same three variables, given other evidence:

>>> lea.joint(burglary,earthquake,john_calls).given(burglary|earthquake).mode
((False, True, False),)
>>> lea.joint(burglary,earthquake,john_calls).given(mary_calls,burglary|earthquake).mode
((True, False, True),)

Drawing with/without replacement (draw method)

In several situations, we are interested in a sequence of values obtained from the same random source: lottery, card drawing, dice throws, etc. These sequences can be defined using different protocols:

  • with or without replacement: whether the drawn value is withdrawn from the remaining values (no duplicates)
  • sorted or not: whether each sequence shall be sorted or not (i.e. order of drawings is irrelevant or not).

The draw method addresses this need, supporting the different protocols through two Boolean arguments. The signature is as follows:

leaInstance.draw(n,sorted=False,replacement=False)
  • n: the number of objects to draw; each value of the resulting distribution will be an n-tuple
  • sorted: if True, the order of drawing is irrelevant and the tuples are sorted in increasing order (default: False)
  • replacement: if True, the drawing is made WITH replacement, so the same element may occur several times in each tuple (default: False)

The two boolean arguments sorted and replacement can be used independently of each other, leading to four cases. The best way to understand the meaning of these arguments is to show the result obtained for each of the four combinations. Let's define a non-uniform distribution with 3 objects:

>>> d = lea.pmf({'A': '3/6', 'B': '2/6', 'C': '1/6'})
>>> d
A : 3/6
B : 2/6
C : 1/6

case 1. No sorting, without replacement

>>> d.draw(2)
('A', 'B') : 20/60
('A', 'C') : 10/60
('B', 'A') : 15/60
('B', 'C') :  5/60
('C', 'A') :  6/60
('C', 'B') :  4/60

case 2. No sorting, with replacement

>>> d.draw(2,replacement=True)
('A', 'A') : 9/36
('A', 'B') : 6/36
('A', 'C') : 3/36
('B', 'A') : 6/36
('B', 'B') : 4/36
('B', 'C') : 2/36
('C', 'A') : 3/36
('C', 'B') : 2/36
('C', 'C') : 1/36

case 3. Sorting, without replacement

>>> d.draw(2,sorted=True)
('A', 'B') : 35/60
('A', 'C') : 16/60
('B', 'C') :  9/60

case 4. Sorting, with replacement

>>> d.draw(2,sorted=True,replacement=True)
('A', 'A') :  9/36
('A', 'B') : 12/36
('A', 'C') :  6/36
('B', 'B') :  4/36
('B', 'C') :  4/36
('C', 'C') :  1/36
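
The probabilities above can be checked by hand with the chain rule. For case 1 (no sorting, without replacement), drawing x then y has probability P(x)·P(y)/(1-P(x)). A plain-Python sketch, independent of Lea:

```python
from fractions import Fraction as F
from itertools import permutations

pmf = {'A': F(3, 6), 'B': F(2, 6), 'C': F(1, 6)}
# Chain rule for two draws without replacement:
# P(x, y) = P(x) * P(y) / (1 - P(x)), for x != y.
draw2 = {(x, y): pmf[x] * pmf[y] / (1 - pmf[x])
         for x, y in permutations(pmf, 2)}
print(draw2[('A', 'B')], draw2[('B', 'A')])   # 1/3 1/4, i.e. 20/60 and 15/60
```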

Note: Cases 3 and 4 (sorted=True) are usually processed more efficiently than the other cases, thanks to a special combinatorial algorithm due to Paul Moore. Note however that, in case 3 (replacement=False), this special algorithm is used ONLY if the probability distribution is uniform (the draw method checks whether the distribution is uniform and chooses the most efficient algorithm accordingly); this is not the case in the example above, but it fortunately applies in most "fair" games.

Revised probability distributions (given_prob method)

The calculation of conditional probabilities using the given method is fine when the information expressed in the condition is 100% sure. In some situations, however, this information is uncertain but can be associated with a precise probability value p. The goal is then to calculate a revised probability distribution such that the given condition has probability p of occurring. Another interpretation of this goal is that you want to favor some occurrences of the prior probability distribution by re-balancing the probabilities.

The method given_prob(cond,p) serves this purpose, where cond is a condition expressed as a boolean distribution (exactly as for the given method) and p is the probability that cond is true.

As an example, suppose that a die, initially assumed to be fair, is eventually observed to give the values 5 or 6 with a probability of 1/2. The revised distribution is calculated as follows:

>>> die = lea.interval(1,6,prob_type='r')
>>> revised_die = die.given_prob(die >= 5,'1/2')
>>> revised_die
1 : 1/8
2 : 1/8
3 : 1/8
4 : 1/8
5 : 2/8
6 : 2/8

The probability that has been set can be verified easily by evaluating the given condition on the new distribution:

>>> P(revised_die >= 5)
1/2

Note that the given_prob method is a generalization of the given method:

  • d.given(cond) is equivalent to d.given_prob(cond,1)
  • d.given(~cond) is equivalent to d.given_prob(cond,0)
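
The revision rule itself is simple to reproduce by hand: the probabilities of values satisfying the condition are rescaled so that their total becomes p, the others so that their total becomes 1-p. Here is a minimal plain-Python sketch of this rule (independent of Lea; the given_prob helper below is a hand-rolled equivalent, not Lea's own code):

```python
from fractions import Fraction as F

def given_prob(pmf, cond, p):
    # Rescale probabilities so that the condition's total mass becomes p:
    # values satisfying cond are scaled by p / P(cond),
    # the others by (1 - p) / (1 - P(cond)).
    p_cond = sum(pr for v, pr in pmf.items() if cond(v))
    return {v: pr * (p / p_cond if cond(v) else (1 - p) / (1 - p_cond))
            for v, pr in pmf.items()}

die = {v: F(1, 6) for v in range(1, 7)}
revised = given_prob(die, lambda v: v >= 5, F(1, 2))
print(revised[5], revised[1])   # 1/4 1/8, matching revised_die above
```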

Min / max functions (min_of, max_of functions)

In the "Applying functions" section, we have seen how to build a distribution taking the maximum (or minimum) values from other distributions. Lea provides the min_of and max_of methods, which do the same job:

>>> die1 = lea.interval(1,6,prob_type='r')
>>> die2 = die1.new()
>>> lea.min_of(die1,die2)
1 : 11/36
2 :  9/36
3 :  7/36
4 :  5/36
5 :  3/36
6 :  1/36
>>> lea.max_of(die1,die2)
1 :  1/36
2 :  3/36
3 :  5/36
4 :  7/36
5 :  9/36
6 : 11/36
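
These figures are easy to verify by brute force over the 36 equally likely pairs (a plain-Python check, independent of Lea):

```python
from fractions import Fraction as F
from itertools import product

# Enumerate the 36 (die1, die2) pairs and accumulate the pmf of the minimum.
pmf_min = {}
for a, b in product(range(1, 7), repeat=2):
    m = min(a, b)
    pmf_min[m] = pmf_min.get(m, 0) + F(1, 36)
print(pmf_min[1], pmf_min[6])   # 11/36 1/36
```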

Note however that the performance dramatically degrades when adding more arguments (exponential complexity). Consider for example the following statements that take the minimum value among 7 dice:

>>> dice = [die1.new() for i in range(7)]
>>> lea.min_of(*dice)
1 : 201811/279936
2 :  61741/279936
3 :  14197/279936
4 :   2059/279936
5 :    127/279936
6 :      1/279936

This takes about 10 seconds to calculate (on a "decent" CPU bought in 2013).

Fortunately, there is a way to avoid this by enabling a more efficient algorithm (linear complexity), which is due to Nicky van Foreest (see Nicky's "Stochastic Makespans" page). Simply add the fast=True argument:

>>> lea.min_of(*dice,fast=True)

The result is the same... but it is produced in the blink of an eye! However, there ain't no such thing as a free lunch: fast=True comes with a small drawback. Unlike most Lea methods, the probability distributions returned with this option lose any dependency on the given arguments; in other words, they do not comply with referential consistency. This can matter when using expressions with joint or conditional probabilities. For instance, the following results are WRONG:

>>> lea.min_of(die1,die2,fast=True).given(die1 > 4)
1 : 11/36
2 :  9/36
3 :  7/36
4 :  5/36
5 :  3/36
6 :  1/36
>>> P((lea.min_of(die1,die2,fast=True) == 1).given(die1 > 4))
11/36

since each die1 occurrence is considered as an independent event: in the first expression, the given information die1 > 4 is simply ignored. To get the right results, remove the fast argument or set it to False.
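
For comparison, the correct conditional distribution can be obtained by direct enumeration, restricting die1 to {5, 6} as the evidence requires (a plain-Python check, independent of Lea):

```python
from fractions import Fraction as F
from itertools import product

# With the evidence die1 > 4, only die1 in {5, 6} remains possible:
# 12 equally likely (die1, die2) pairs instead of 36.
pmf = {}
for d1, d2 in product((5, 6), range(1, 7)):
    m = min(d1, d2)
    pmf[m] = pmf.get(m, 0) + F(1, 12)
print(pmf[1])   # 1/6, quite different from the 11/36 returned with fast=True
```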

To summarize, use fast=True for best performance but take care to use it in simple expressions not requiring referential consistency, i.e. without other occurrences of inner arguments.

Indexing and slicing

When working with Python strings, tuples or lists, it is common to extract subparts of these objects using indexing or slicing operations (i.e. the bracket syntax). When such an operation is applied to a Lea instance, it is propagated to the inner values, passing the given index or slice. The result is a new probability distribution containing the indexed or sliced values.

To give an example, consider the following Lord of the Rings' characters micro-database:

>>> character = lea.vals(('Gimli','dwarf'),('Bilbo','hobbit'),('Frodo','hobbit'),('Meriadoc','hobbit'),
                         ('Pippin','hobbit'),('Gandalf','wizard'),('Boromir','man'),('Faramir','man'),prob_type='r')
>>> character
('Bilbo'   , 'hobbit') : 1/8
('Boromir' , 'man'   ) : 1/8
('Faramir' , 'man'   ) : 1/8
('Frodo'   , 'hobbit') : 1/8
('Gandalf' , 'wizard') : 1/8
('Gimli'   , 'dwarf' ) : 1/8
('Meriadoc', 'hobbit') : 1/8
('Pippin'  , 'hobbit') : 1/8

We can extract the distributions of names and races as follows (remember that the first item has index 0 in Python):

>>> name = character[0]
>>> name
Bilbo    : 1/8
Boromir  : 1/8
Faramir  : 1/8
Frodo    : 1/8
Gandalf  : 1/8
Gimli    : 1/8
Meriadoc : 1/8
Pippin   : 1/8
>>> race = character[1]
>>> race
dwarf  : 1/8
hobbit : 4/8
man    : 2/8
wizard : 1/8

Other examples (aka "SQL, my precious! ") :

  • What is the distribution of the last letter of names?
>>> name[-1]
c : 1/8
f : 1/8
i : 1/8
n : 1/8
o : 2/8
r : 2/8
  • Who are the hobbits? (note how name and race are interdependent variables!)
>>> name.given(race == 'hobbit')
Bilbo    : 1/4
Frodo    : 1/4
Meriadoc : 1/4
Pippin   : 1/4
  • What is the race of those having a name ending with "ir"?
>>> race.given(name[-2:] == 'ir')
man : 1
  • Who is who?
>>> name + ' is a ' + race + '.'
Bilbo is a hobbit.    : 1/8
Boromir is a man.     : 1/8
Faramir is a man.     : 1/8
Frodo is a hobbit.    : 1/8
Gandalf is a wizard.  : 1/8
Gimli is a dwarf.     : 1/8
Meriadoc is a hobbit. : 1/8
Pippin is a hobbit.   : 1/8
  • Who is the Chosen One?
>>> name.random() + ' is the Chosen One.'
'Frodo is the Chosen One.'

Note that the character distribution is nothing else than a joint probability distribution (JPD). The indexing technique demonstrated here can also be used to extract specific attributes from the inner tuples. Revisiting the example given in "Defining joint probability distributions", you could skip the definition of the jpd variable using the as_joint method and do instead:

toothache = _jpd[0]
catch = _jpd[1]
cavity = _jpd[2]

These three variables are then interdependent; they can be used as demonstrated in "Defining joint probability distributions".

Object attributes and methods

The micro-DB example above can be redone using user-defined objects instead of tuples.

class C(object):

    def __init__(self,name,race):
        self.name = name
        self.race = race

    def __str__(self):
        return "%s (%s)"%(self.name,self.race)

    def health_points(self):
        return 1 + hash(self.name+self.race) % 10

character2 = lea.vals(C('Gimli','dwarf'),C('Bilbo','hobbit'),C('Frodo','hobbit'),C('Meriadoc','hobbit'),
                      C('Pippin','hobbit'),C('Gandalf','wizard'),C('Boromir','man'),C('Faramir','man'),prob_type='r')

You get here a distribution having C objects as domain, instead of tuples.

>>> character2
Faramir (man)     : 1/8
Meriadoc (hobbit) : 1/8
Gimli (dwarf)     : 1/8
Pippin (hobbit)   : 1/8
Gandalf (wizard)  : 1/8
Bilbo (hobbit)    : 1/8
Boromir (man)     : 1/8
Frodo (hobbit)    : 1/8

Then, attributes can be extracted using the dot notation directly on the probability distribution.

>>> character2.name
Bilbo    : 1/8
Boromir  : 1/8
Faramir  : 1/8
Frodo    : 1/8
Gandalf  : 1/8
Gimli    : 1/8
Meriadoc : 1/8
Pippin   : 1/8
>>> character2.race
dwarf  : 1/8
hobbit : 4/8
man    : 2/8
wizard : 1/8

Also, method invocation is feasible; this returns a new probability distribution with the returned results.

>>> character2.health_points()
1 : 1/8
3 : 1/8
7 : 2/8
8 : 3/8
9 : 1/8
>>> character2.name + ', with HP = ' + character2.health_points().map(str)
Bilbo, with HP = 10   : 1/8
Boromir, with HP = 4  : 1/8
Faramir, with HP = 1  : 1/8
Frodo, with HP = 5    : 1/8
Gandalf, with HP = 10 : 1/8
Gimli, with HP = 8    : 1/8
Meriadoc, with HP = 6 : 1/8
Pippin, with HP = 9   : 1/8

Now, the most watchful readers may have noticed an issue with this simple syntax: how can Lea distinguish between its own attributes/methods (like mean, stdev, given, etc.) and those to be relayed to the inner objects (like name, race, etc.)? The rule is simple: if the attribute/method name is unknown in the Lea API, then it is propagated to the inner objects. Of course, this may cause trouble in the case of homonyms, i.e. when the inner objects have, by mischance, an attribute name that is defined in the Lea API; such an attribute is not accessible with the simple technique since it is "hidden" by the Lea API. Should such a situation occur, there is a trick using Lea's map method and Python's getattr function. Assume for example that the class C above has an attribute called mean. Here is how to get the distribution of this attribute:

>>> character2.map(getattr,('mean',))

The ability to easily relay methods to the inner objects of a given probability distribution can also be used with builtin Python objects. The string upper method provides a trivial example:

>>> lea.set_prob_type('r')
>>> lea.vals(*"alea").upper()
A : 2/4
E : 1/4
L : 1/4

For a more useful example, suppose you throw 3 fair dice and you want to know the probability of getting at least two 6's. We have seen how to use the draw method to model the three throws:

>>> die = lea.interval(1,6)
>>> three_dice = die.draw(3,replacement=True)

three_dice contains 6x6x6 = 216 tuples, each containing 3 elements. To get the result, you can directly use the count method defined on Python's tuples:

>>> count_6 = three_dice.count(6)
>>> count_6
0 : 125/216
1 :  75/216
2 :  15/216
3 :   1/216
>>> P(count_6 >= 2)
2/27

Once you are familiar with all these techniques, you could get such result in a one-liner:

>>> P(lea.interval(1,6).draw(3,replacement=True).count(6) >= 2)
2/27
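
The 2/27 figure agrees with the binomial formula: P(at least two 6's) = C(3,2)·(1/6)²·(5/6) + (1/6)³. A quick plain-Python cross-check:

```python
from fractions import Fraction as F
from math import comb

p = F(1, 6)   # probability of a 6 on one die
# Sum the binomial terms for exactly 2 and exactly 3 sixes.
prob = sum(comb(3, k) * p**k * (1 - p)**(3 - k) for k in (2, 3))
print(prob)   # 2/27
```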

Reducing sequences of probability distribution (reduce_all method)

Lea instances can be put in a Python container type, like a list or a tuple. Then, such a container can be "reduced" using a given 2-ary function to produce a new Lea instance. Consider for example the following tuple containing three dice with 2, 4 and 6 faces:

>>> dice = tuple(lea.interval(1,n,prob_type='r') for n in (2,4,6))
>>> dice
(1 : 1/2
2 : 1/2, 1 : 1/4
2 : 1/4
3 : 1/4
4 : 1/4, 1 : 1/6
2 : 1/6
3 : 1/6
4 : 1/6
5 : 1/6
6 : 1/6)

The display is not very clean because the tuple's display format mixes with Lea's. Now, suppose we throw the three dice together. To get the distribution of the sum of their values, the standard Python's sum function can be used:

>>> sum(dice)
 3 : 1/48
 4 : 3/48
 5 : 5/48
 6 : 7/48
 7 : 8/48
 8 : 8/48
 9 : 7/48
10 : 5/48
11 : 3/48
12 : 1/48

Fine,… but what if we want to get the product of the values of the three dice? There is no builtin function for this. You could do the job with a loop or with the reduce function (a builtin in Python 2, moved to functools.reduce in Python 3). To meet this need, Lea provides a dedicated function, reduce_all, which combines the elements of an iterable of Lea instances according to a given 2-ary function.

>>> import operator
>>> Lea.reduce_all(operator.mul,dice)
 1 : 1/48
 2 : 3/48
 3 : 2/48
 4 : 5/48
 5 : 1/48
 6 : 5/48
 8 : 5/48
 9 : 1/48
10 : 2/48
12 : 6/48
15 : 1/48
16 : 3/48
18 : 2/48
20 : 2/48
24 : 4/48
30 : 1/48
32 : 1/48
36 : 1/48
40 : 1/48
48 : 1/48

The first argument of reduce_all is typically a standard 2-ary operator but any user-defined 2-ary function is valid also, provided that it applies to the values of Lea instances present in the sequence.
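
What reduce_all computes can be emulated in plain Python by convolving the pmfs two at a time with the given operator. The sketch below is independent of Lea (the combine helper is hand-rolled, not Lea's own code) and reproduces the product distribution above:

```python
from fractions import Fraction as F
from functools import reduce
from itertools import product
import operator

def combine(f, pa, pb):
    # Combine two pmfs value-wise with the 2-ary function f,
    # accumulating the probabilities of equal results.
    out = {}
    for (a, wa), (b, wb) in product(pa.items(), pb.items()):
        out[f(a, b)] = out.get(f(a, b), 0) + wa * wb
    return out

dice = [{v: F(1, n) for v in range(1, n + 1)} for n in (2, 4, 6)]
prod_pmf = reduce(lambda pa, pb: combine(operator.mul, pa, pb), dice)
print(prod_pmf[12])   # 1/8, i.e. the 6/48 shown in the table above
```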

Note that reduce_all accepts an optional absorber argument (default: None), letting you specify the absorbing value of the given function; for instance, we could set absorber=0 for the multiplication case above. This activates a more efficient algorithm able to stop the combinatorial enumeration as soon as the absorber is met, which can accelerate things dramatically. For instance, the following calculation takes several seconds:

>>> z = tuple(lea.bernoulli('1/4') for _ in range(16))
>>> Lea.reduce_all(operator.mul,z)
0 : 65535/65536
1 :     1/65536

In contrast, if you take care to specify the absorber value:

>>> Lea.reduce_all(operator.mul,z,absorber=0)

then you get the same result in the blink of an eye! The absorber is super-efficient in this particular example because there are a lot of zeroes: among the 2**16 = 65536 theoretical cases, it is able to prune everything but 2 cases.

Advanced calculation options (calc method)

Suppose that you have defined a Lea instance x with several dependencies on other instances. Remember that Lea works with lazy evaluation: when you execute statements like print(x), print(x.p(4)) or avg = x.mean, a calculation occurs behind the scenes to evaluate the pmf of x before doing the specific required action. This calculation is done implicitly, in order to ease your life. You can however perform it explicitly at any time by making this call:

x.calc()

This performs the required calculations and returns a new instance with the calculated pmf. This instance is independent of any other variables, so the call does the same as x.new(). As such, it is not very useful unless you want fine control over when the calculation happens. The real interest is that calc has several optional arguments that let you customize the calculation algorithm. Before describing these arguments, beware that they are meant for "advanced" use: handle them with care to avoid any misinterpretation of the delivered results.

The full signature of the calc method is:

calc(prob_type=-1,sorting=True,normalization=True,bindings=None,memoization=True,
     algo=EXACT,nb_samples=None,nb_subsamples=None,nb_tries=None,exact_vars=None)

The first two arguments (prob_type and sorting) are described in the section dedicated to advanced constructor arguments. The next three arguments (normalization, bindings and memoization) are described below; note that normalization has been presented earlier but deserves special coverage here. The last five arguments (algo, nb_samples, nb_subsamples, nb_tries and exact_vars) are covered in the section dedicated to Monte-Carlo algorithms.

Avoiding normalization

The optional normalization argument has the same purpose as the normalization argument of the pmf method. By calling calc(normalization=False), the probabilities that are collected are NOT divided by their sum. This means that you can get a "pseudo" probability distribution, for which the probabilities sum to something other than 1. This occurs in particular when you deal with conditional probabilities (given method): the probability total you get is less than 1; the missing denominator is actually the probability of the given condition. In mathematical terms, instead of getting P(A|B) = P(A&B)/P(B), you get P(A&B).

This feature may seem uninteresting but, in some cases, it can be used to explain calculated probabilities. Consider the BN example given earlier. Imagine you want to know the probability that BOTH John AND Mary call. This is easy to calculate:

>>> P(john_calls & mary_calls)
0.002084100239

however, it's difficult to explain this calculation or to redo it manually. We have seen that one way to understand such a result is to display the 32 atomic events (or "possible worlds") using the joint method:

lea.joint(earthquake,burglary,alarm,john_calls,mary_calls)

From this full joint probability distribution, you can add up the 8 probabilities where the last two variables are true; you should then obtain the same value as above. Since this treatment is cumbersome, you can use the given method to do the filtering for you:

>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given(john_calls & mary_calls)
(False, False, False, True, True) : 0.23895323731595242
(False, False, True , True, True) : 0.30138246147957953
(False, True , False, True, True) : 1.4365911696438337e-05
(False, True , True , True, True) : 0.28358309688769245
(True , False, False, True, True) : 0.00034033391807504135
(True , False, True , True, True) : 0.17515213192200016
(True , True , False, True, True) : 2.3991168497726016e-08
(True , True , True , True, True) : 0.0005743485738355601

Unfortunately, the probabilities are not the same as in the previous joint table: the new joint table has been normalized to compensate for the entries that have been filtered out. Here comes the normalization=False trick:

>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given((john_calls & mary_calls)).calc(normalization=False)
(False, False, False, True, True) : 0.0004980024990000001
(False, False, True , True, True) : 0.0006281112599999999
(False, True , False, True, True) : 2.994000000000003e-08
(False, True , True , True, True) : 0.0005910156
(True , False, False, True, True) : 7.0929e-07
(True , False, True , True, True) : 0.0003650346
(True , True , False, True, True) : 5.0000000000000054e-11
(True , True , True , True, True) : 1.197e-06

What you get is NOT a true probability distribution; it's just a truncated one! If you now want to sum the 8 probabilities, simply append .p_sum:

>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given((john_calls & mary_calls)).calc(normalization=False).p_sum
0.0020841002389999997

This sum is the probability that John and Mary call together, as calculated before. So, to summarize, what we have done here is mimic the calculations made internally by Lea's algorithm.

Making explicit bindings

In many cases, the evidence passed to the given method consists of equalities. These are sometimes called "observations": variable x has been observed to have value u, variable y value v, etc. For example, consider this situation, which shows the probability distribution of the sum of two dice given the observation of one of these dice:

>>> d1 = lea.interval(1,6)
>>> d2 = lea.interval(1,6)
>>> d = d1 + d2
>>> d.given(d1 == 1)
2 : 0.16666666666666663
3 : 0.16666666666666663
4 : 0.16666666666666663
5 : 0.16666666666666663
6 : 0.16666666666666663
7 : 0.16666666666666663

Actually, since the Lea algorithm is designed to cover any evidence condition, it enumerates all 36 possible combinations of values of (d1,d2) and evaluates for each of them whether the condition d1 == 1 is true or false. This works fine but wastes some CPU, especially as the number of observed variables grows (remember also that their domains can be non-boolean, with an unlimited number of elements). Fortunately, this waste can be avoided for elementary variables (those defined by a known prior pmf, like d1 and d2). The technique, called explicit bindings, consists in specifying in the bindings argument a dictionary associating the observed variables with their values. This is demonstrated in the following expression:

>>> d.calc(bindings={d1:1})

This returns the same result as previously but by browsing only 6 cases, which are the possible values of d2. Note that the specified binding holds for the query only, so d1 is unbound as soon as the result has been calculated. A more extreme example is to bind all elementary variables:

>>> d.calc(bindings={d1:1,d2:4})
5 : 1.0

In such case, the enumeration is limited to one single case.

This technique can be efficient when the number of cases to examine is large. Remember however that it can be used only for elementary variables; an exception is raised if you try to bind a derived variable like d.
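
The gain is easy to picture in plain Python (a sketch independent of Lea): with d1 bound to 1, only the 6 values of d2 are browsed, instead of filtering all 36 (d1, d2) pairs against the condition.

```python
from fractions import Fraction as F

d1 = 1                       # the bound value of the first die
pmf_sum = {}
for d2 in range(1, 7):       # only 6 cases enumerated
    s = d1 + d2
    pmf_sum[s] = pmf_sum.get(s, 0) + F(1, 6)
print(sorted(pmf_sum))   # [2, 3, 4, 5, 6, 7], each with probability 1/6
```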

Disabling memoization

TO BE WRITTEN

Information Theory

Lea allows calculating several measures defined by information theory. These could provide new insights on your probability distributions.

Information

The most basic concept is the information carried by a given event. It is expressed in bits. As a basic example, consider two coins, a fair one and an unfair one:

>>> flip = lea.vals('head','tail')
>>> flip_u = lea.pmf({'head': 1/4, 'tail': 3/4})

Here is how to get the information carried by each possible result, after flipping the fair coin:

>>> flip.information_of('head')
1.0
>>> flip.information_of('tail')
1.0

Each result carries an information of 1 bit, as expected by the theory. Now, here are the information values for the unfair coin:

>>> flip_u.information_of('head')
2.0
>>> flip_u.information_of('tail')
0.4150374992788437

We see that the information carried by a 'head' result is higher (2 bits) than the information carried by 'tail'. This is sensible since a 'head' is less likely to occur (prob. 1/4) than a 'tail' (prob. 3/4), hence it carries more information.

For boolean probability distributions, note that .information_of(True) can be replaced by .information, which is a convenient shortcut. Example:

>>> rain = lea.event(1/8)
>>> rain.information
3.0
>>> (flip_u=='head').information
2.0
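
Behind these numbers is the classical definition: the information of an event with probability p is -log2(p) bits. A one-line plain-Python check (independent of Lea):

```python
from math import log2

def information(p):
    # Information (in bits) carried by an event of probability p.
    return -log2(p)

print(information(1/4), information(1/8))   # 2.0 3.0
```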

Entropy

If you take the average information carried by all events modeled by a given probability distribution, using the probabilities of these events as weights, you get the entropy of this distribution. The entropy is one of the most important concepts of information theory. It captures the "degree of randomness" of a given probability distribution.

Here is how to get the entropy of each defined coin:

>>> flip.entropy
1.0
>>> flip_u.entropy
0.8112781244591328

The entropy of the fair coin is 1 bit, which is not surprising since we have seen that each possible outcome brings 1 bit of information. The entropy of the unfair coin is about 0.81 bits; this is less than 1 bit, which is consistent with the fact that the unfair coin is less random than the fair one. Note that the theory states that 1 bit is the maximum entropy for a distribution with two values; it happens when the two values are equally likely.
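
These values follow directly from the definition H = -Σ p·log2(p); a plain-Python check, independent of Lea:

```python
from math import log2

def entropy(pmf):
    # Entropy in bits: the expected information of the distribution.
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

print(entropy({'head': 0.5, 'tail': 0.5}))    # 1.0
print(entropy({'head': 0.25, 'tail': 0.75}))  # ~0.8113
```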

Entropy and conditional probabilities

The information and entropy can be calculated on any Lea instance, whether defined explicitly as above or resulting from calculations, as explained throughout the tutorials. In particular, these indicators can be calculated on conditional probability distributions.

Let us take an example. There are 64 balls in an urn: 62 of them are blue and marked with an "x"; the 2 remaining balls are red, one marked with an "x", the other with a "y". Here is a short model of the distribution of these balls:

>>> ball = lea.pmf({ 'Bx': 62, 'Rx': 1, 'Ry': 1 },prob_type='r')
>>> ball
Bx : 62/64
Rx :  1/64
Ry :  1/64

Note that it's easy to extract the color code or the mark, using the expression ball[0] or ball[1] respectively (see "Indexing and slicing"). Let us create two new variables to ease the work:

>>> color = ball[0]
>>> mark = ball[1]
>>> color
B : 31/32
R :  1/32
>>> mark
x : 63/64
y :  1/64

Let us calculate the entropy of ball:

>>> ball.entropy
0.23187232431271465

It is quite low, since it's almost certain that a drawn ball is blue with an "x" mark.

Suppose now that someone draws a ball and tells us some information about its color or mark (remember that the three variables ball, color and mark are interdependent thanks to referential consistency). Let us calculate the new entropy in the following cases:

  • What is the entropy if the ball's mark is "x"?
>>> ball.given(mark=='x').entropy
0.11759466565886476
  • What is the entropy if the ball's mark is "y"?
>>> ball.given(mark=='y').entropy
0.0
  • What is the entropy if the ball's color is blue?
>>> ball.given(color=='B').entropy
0.0
  • What is the entropy if the ball's color is red?
>>> ball.given(color=='R').entropy
1.0

We see in the first three cases that the given information lowers the entropy, compared to the initial value of 0.23 bits. In two cases, the entropy becomes null: this happens because the new information dissolves any uncertainty about the mark and color. In the last case however, the information that the ball is red increases the entropy to 1 bit. This value is consistent with the uncertainty remaining on the mark of the ball, which is equally likely to be "x" or "y". It may sound odd that new information makes the entropy increase. Actually, there's nothing wrong here, it's simply unusual: the initial distribution is so unbalanced that receiving unlikely information ("the ball is red") can overturn our prior beliefs.
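These conditional entropies are easy to check by hand: conditioning just restricts and renormalizes the pmf. Here is a pure-Python verification, independent of Lea:

```python
import math

def entropy(probs):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

# urn model: 62 blue-x balls, 1 red-x, 1 red-y
h_ball = entropy([62/64, 1/64, 1/64])
h_given_x = entropy([62/63, 1/63])   # ball given mark == 'x' (renormalized)
h_given_red = entropy([1/2, 1/2])    # ball given color == 'R'

print(h_ball)       # ~0.2319
print(h_given_x)    # ~0.1176
print(h_given_red)  # 1.0
```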

Mutual information, joint entropy and conditional entropy

When dealing with two dependent random variables, it's interesting to know the amount of information they share. This is the purpose of the mutual information, aka transinformation. The examples below use the interdependent distributions ball, color and mark from the previous section.

>>> lea.mutual_information(color,mark)
0.08486507530476972
>>> lea.mutual_information(color,ball)
0.20062232431271465
>>> lea.mutual_information(mark,ball)
0.11611507530476972

Note that the mutual information is expressed in bits.

As a trivial example, we can verify that our ball lottery and coin flipping do not share anything:

>>> lea.mutual_information(ball,flip)
0.0

Note that the mutual information is symmetric, so the result is unaffected by the order of the arguments.

Another interesting measure is the joint entropy of a set of random variables. It gives the entropy of the joint of these random variables, taking into account possible interdependence between them. Here are two basic examples.

>>> lea.joint_entropy(mark,color)
0.23187232431271465
>>> lea.joint_entropy(mark,flip)
1.1161150753047697

The first result shows that the joint entropy of mark and color is the same as the basic entropy of ball; this is not surprising: a given ball is completely described by its mark and color, so it carries exactly the same information as the couple of these two characteristics. In the second example, we join mark and flip; since these are independent events, we get the sum of the individual entropies (0.1161 bits + 1.0 bits). You can specify any number of arguments in lea.joint_entropy. Note that if you specify only one argument, you get the basic entropy of this argument.
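You can check these figures with the classical identity I(X;Y) = H(X) + H(Y) - H(X,Y), using plain Python independent of Lea:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_color = entropy([31/32, 1/32])
h_mark  = entropy([63/64, 1/64])
h_ball  = entropy([62/64, 1/64, 1/64])   # joint of (color, mark) is the ball itself

mutual_info = h_color + h_mark - h_ball  # I(color; mark)
joint_mark_flip = h_mark + 1.0           # flip is independent, with 1 bit of entropy

print(mutual_info)      # ~0.0849
print(joint_mark_flip)  # ~1.1161
```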

Another measure is the conditional entropy of x given y, also known as the equivocation of x about y: it gives the entropy that remains on x once the value of y is known. Here are two basic examples of usage:

>>> ball.cond_entropy(mark)
0.11575724900794494
>>> mark.cond_entropy(ball)
0.0

The first result shows that the (unconditional) entropy of ball, previously calculated as 0.2319 bits, decreases to 0.1158 bits given that we know the mark on the ball. The second result shows that the entropy of the mark is null given that we know the drawn ball: this is consistent, since there is no uncertainty on the mark given a ball. Note that specifying True (or any relationship that is certainly true) as argument of cond_entropy boils down to calculating the basic entropy.
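The first value can be verified with the chain rule H(X|Y) = H(X,Y) - H(Y), in plain Python independent of Lea; here H(ball, mark) = H(ball), since the ball fully determines its mark:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_ball = entropy([62/64, 1/64, 1/64])
h_mark = entropy([63/64, 1/64])

# chain rule: H(ball | mark) = H(ball, mark) - H(mark) = H(ball) - H(mark)
cond_ent = h_ball - h_mark
print(cond_ent)  # ~0.1158
```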

Cross entropy and Kullback–Leibler divergence

WARNING: requires Lea 3.2 at least

To complete the topic of entropy-related measures in Lea, we shall introduce two more indicators that are especially useful for machine learning: the cross entropy and the Kullback–Leibler divergence (also known as relative entropy). Both indicators compare two probability distributions having the same support, by providing a measure of how far one fits the other. Assuming that d is the actual observed distribution and e is an estimated distribution, these are measures of how well e fits d.

Basically, the cross entropy of d with e averages the information given by e, using the probabilities of d as weights. For example, we could examine how well given probability distributions fit the color distribution defined above ({B: 31/32, R: 1/32}):

>>> estimated_color1 = lea.pmf({ 'B': 20, 'R': 12 },prob_type='r')
>>> color.cross_entropy(estimated_color1)
0.7011020799303316
>>> estimated_color2 = lea.pmf({ 'B': 30, 'R': 2 },prob_type='r')
>>> color.cross_entropy(estimated_color2)
0.2151997355042477

Note that the cross entropy is not symmetric, i.e. the cross entropy of d with e is generally not the same as the cross entropy of e with d. When e is exactly the same as d, the cross entropy is minimal and equal to the entropy of d. Therefore, the cross entropy can be seen as a generalization of the entropy, since the entropy of a given probability distribution is the same as the cross entropy of this distribution with itself.

>>> color.cross_entropy(color)
0.20062232431271465
>>> color.entropy
0.20062232431271465

Since the cross entropy tends to the entropy as the estimated distribution becomes closer to its target, it's tempting to calculate the difference between these two indicators. This is precisely what the Kullback–Leibler divergence does; it is accessible in Lea through the kl_divergence method.

>>> color.kl_divergence(estimated_color1)
0.5004797556176169
>>> color.kl_divergence(estimated_color2)
0.014577411191533052

We see that the KL divergence (aka relative entropy) offers a more objective measure of fitness than the cross entropy, since it tends towards 0 as the estimation becomes closer to its target, whatever the entropy of the target.
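Both indicators are easy to recompute by hand from their definitions; here is a pure-Python check, independent of Lea, on the color example:

```python
import math

def cross_entropy(d, e):
    # -sum over d's support of p_d(v) * log2(p_e(v)), in bits
    return -sum(p * math.log2(e[v]) for v, p in d.items() if p > 0)

color = {'B': 31/32, 'R': 1/32}
est1  = {'B': 20/32, 'R': 12/32}

ce = cross_entropy(color, est1)
kl = ce - cross_entropy(color, color)   # KL divergence = cross entropy - entropy

print(ce)  # ~0.7011
print(kl)  # ~0.5005
```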

For more practical examples of usage, you may have a look at the section dedicated to Machine Learning using the EM algorithm - case 3.

Likelihood ratio (LR)

The likelihood ratio (LR) of evidence E given a hypothesis H is defined as

        P(E | H)
  LR = --------------
        P(E | not H)

It indicates how far the given evidence E confirms the given hypothesis H. It is traditionally used in medical research and forensic sciences.

As a toy problem, imagine that a die has been thrown but we have very limited ways to gather information on the result. Let us assume that our hypothesis is "the die result is even". After investigation, we get a piece of evidence: the result is not 1. What is the LR of this evidence for our hypothesis? The lea.lr(E,H) syntax makes this calculation:

>>> die = lea.interval(1,6)
>>> parity = die.map(lambda x: 'even' if x%2==0 else 'odd')
>>> lea.lr(die != 1, parity == 'even')
1.5

You can check this result by translating the LR definition above into other Lea primitives:

>>> P((die != 1).given(parity == 'even')) / P((die != 1).given(parity != 'even'))
1.5

Since LR > 1, the given evidence is indeed somehow relevant for the hypothesis (as our intuition suggests!).
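The same result can also be verified by direct counting on the sample space, in plain Python independent of Lea:

```python
from fractions import Fraction

die = set(range(1, 7))
H = {d for d in die if d % 2 == 0}     # hypothesis: result is even
E = {d for d in die if d != 1}         # evidence: result is not 1

p_e_given_h = Fraction(len(E & H), len(H))          # P(E | H) = 3/3
p_e_given_not_h = Fraction(len(E - H), len(die - H))  # P(E | not H) = 2/3
lr = p_e_given_h / p_e_given_not_h
print(lr)  # 3/2
```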

Notes: the following rules are enforced, in this sequence:

  1. if E is impossible, then an exception is raised (LR = 0/0)
  2. if H is impossible, then an exception is raised (P(E | H) undefined)
  3. if H is certain, then an exception is raised (P(E | not H) undefined)
  4. if H is impossible given E, then LR = 0.0
  5. if H is certain given E, then LR = +infinity (Python's math.inf)

Examples:

>>> lea.lr(die == 3, parity == 'even')
0.0
>>> lea.lr(die == 2, parity == 'even')
inf

There is another syntax to calculate the LR: E.given(H).lr(); this flavor emphasizes the numerator of the LR expression and provides the same result as the syntax seen above. It may be handier to express a hypothesis that is a conjunction of several sub-hypotheses, which can be put as arguments of given(…):

>>> (die >= 2).given(parity == 'even').lr()
1.5
>>> (die >= 2).given(parity == 'even', die <= 5).lr()
1.3333333333333333

Symbolic computation with probabilities

We have seen that Lea can handle probabilities expressed as variable names (p, q, …), provided that the SymPy library is installed. This approach is commonly referred to as symbolic computation. All the techniques for probabilistic calculation seen above in the tutorial can be revisited with probability variables in place of probability figures.

Such an approach is useful when the same query is made over and over on the same complex model, with only the prior probability values varying. In this case, the query result may be prepared "offline" by Lea, once and for all, as an arithmetic expression; this may take a long processing time. Then, this expression can be copy-pasted into a dedicated program to be evaluated many times using fast arithmetic computations, with different input probability values. This dedicated program could be written in any programming language: Python, C, C++, Java, etc., without the need of any probability library.

Let's start with basic examples: let p be the probability of getting a head (value 1) when flipping one coin, and q the probability that it shall rain tomorrow:

>>> flip = lea.bernoulli('p')
>>> rain = lea.event('q')
>>> flip
1 : p
0 : -p + 1
>>> rain
True  : q
False : -q + 1

We can now combine these two independent variables to calculate the probabilities of events defined by conjunctions or disjunctions:

>>> (flip == 0) & rain
False : p*q - q + 1
True  : -q*(p - 1)
>>> (flip == 0) | rain
False : -p*(q - 1)
True  : p*q - p + 1 

Since these probability distributions have a Boolean domain, we can get the probability of True by using the P function.

>>> P = lea.P
>>> P((flip == 0) | rain)
p*q - p + 1 

The object returned by P is a SymPy expression (more precisely, an instance of some subclass of SymPy's Expr). You can then apply methods provided by SymPy, like substituting some variable(s) using the subs method:

>>> P((flip == 0) | rain).subs('p',0.5)
0.5*q + 0.5
>>> P((flip == 0) | rain).subs((('p',0.5),('q',0.2)))
0.600000000000000
>>> P((flip == 0) | rain).subs('p','q')
q**2 - q + 1

Besides Booleans, we can also calculate full probability distributions.

>>> flip - lea.bernoulli('q')
-1 : -q*(p - 1)
 0 : 2*p*q - p - q + 1
 1 : -p*(q - 1)
>>> flip4 = flip.times(4)
>>> flip4
0 : (p - 1)**4
1 : -4*p*(p - 1)**3
2 : 6*p**2*(p - 1)**2
3 : -4*p**3*(p - 1)
4 : p**4

Recall that the flip4 probability distribution is also known as the binomial distribution, which you can build using a dedicated function:

>>> flip4 = lea.binom(4,'p')

You can then evaluate the probability of some conditions using the usual Lea syntax:

>>> P((1 <= flip4) & (flip4 < 4))
-2*p*(p - 1)*(p**2 - p + 2)
>>> P(flip4.is_any_of(1,2,3))
-2*p*(p - 1)*(p**2 - p + 2)

and obtain standard indicators:

>>> flip4.mean
4*p
>>> flip4.var
-4*p*(p - 1)
>>> flip4.entropy
(-p**4*log(p**4) + 4*p**3*(p - 1)*log(-4*p**3*(p - 1)) - 6*p**2*(p - 1)**2*log(6*p**2*(p - 1)**2) + 4*p*(p - 1)**3*log(-4*p*(p - 1)**3) - (p - 1)**4*log((p - 1)**4))/log(2)

Note that all these expressions are calculated using the same algorithms as those used for usual numerical probabilities. Lea just calls the sympy.factor function as a post-processing step, in order to simplify the resulting expressions as much as possible. You can notice that the calculated mean 4*p agrees with the theoretical formula for the mean of the binomial.

To make variable substitutions in a probability distribution that is non-Boolean, you simply call the subs method on that distribution: the call is propagated, with the given arguments, to all inner probability expressions:

>>> flip4.subs('p',0.25)
0 : 0.316406250000000
1 : 0.421875000000000
2 : 0.210937500000000
3 : 0.0468750000000000
4 : 0.00390625000000000

Actually, you can check that this last distribution boils down to the same distribution as

>>> lea.binom(4,0.25)

The sole difference is the probability type: it's sympy.Float for the former, while it's Python's float for the latter.
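If you want an independent check of these substituted values, the binomial pmf can be recomputed with just the standard library (math.comb); this is only a sanity check, not how Lea computes it:

```python
import math

def binom_pmf(n, p):
    # binomial pmf: C(n,k) * p^k * (1-p)^(n-k) for k = 0..n
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

pmf = binom_pmf(4, 0.25)
mean = sum(k * pk for k, pk in enumerate(pmf))

print(pmf)   # [0.31640625, 0.421875, 0.2109375, 0.046875, 0.00390625]
print(mean)  # 4*p = 1.0
```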

So far, we have defined symbolic probabilities by means of strings of characters. Behind the scenes, these strings are converted to SymPy symbols thanks to the sympy.symbols function. In order to have better control over these symbols, you can also build them yourself and pass them to Lea constructors in place of strings. Here is an example:

>>> import sympy
>>> (p,q) = sympy.symbols('p q')
>>> lea.binom(2,p) + lea.binom(2,q)
0 : (p - 1)**2*(q - 1)**2
1 : -2*(p - 1)*(q - 1)*(2*p*q - p - q)
2 : 6*p**2*q**2 - 6*p**2*q + p**2 - 6*p*q**2 + 4*p*q + q**2
3 : -2*p*q*(2*p*q - p - q)
4 : p**2*q**2

So far, we have not yet defined a probability distribution using the pmf function. Let's give it a try:

>>> lea.pmf({"A": 'pA', "B": 'pB', "C": 'pC'})
A : pA/(pA + pB + pC)
B : pB/(pA + pB + pC)
C : pC/(pA + pB + pC)

You can see that the probability variables have been normalized to ensure that the sum equals 1. This is somewhat unpleasant because there are actually 2 degrees of freedom, not 3. You may then prefer to remove pC, as it can be derived from pA and pB. As explained in the section dedicated to normalization, replacing pC by None will do the job:

>>> lea.pmf({"A": 'pA', "B": 'pB', "C": None})
A : pA
B : pB
C : -pA - pB + 1

Note that you could achieve the same result using a more explicit approach:

>>> (pA,pB) = sympy.symbols('pA pB')
>>> lea.pmf({"A": pA, "B": pB, "C": 1-(pA+pB)})

Symbolic computation can handle more complex problems, like conditional probabilities and Bayesian reasoning. Let us revisit our first example of a Bayesian network, replacing all 6 probability values by variables:

burglary   = lea.event('a')
earthquake = lea.event('b')
alarm = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event('c'),
                                                (True ,False) : lea.event('d'),
                                                (False,True ) : lea.event('e'),
                                                (False,False) : lea.event('f') })
john_calls = alarm.switch({ True  : lea.event('g'),
                            False : lea.event('h') })
mary_calls = alarm.switch({ True  : lea.event('i'),
                            False : lea.event('j') })

We can now query the BN as before, getting formulas instead of numbers:

>>> pb = P(burglary.given(mary_calls & john_calls))
>>> pb
a*(b*c*g*i - b*c*h*j - b*d*g*i + b*d*h*j + d*g*i - d*h*j + h*j)/(a*b*c*g*i - a*b*c*h*j - a*b*d*g*i + a*b*d*h*j - a*b*e*g*i + a*b*e*h*j + a*b*f*g*i - a*b*f*h*j + a*d*g*i - a*d*h*j - a*f*g*i + a*f*h*j + b*e*g*i - b*e*h*j - b*f*g*i + b*f*h*j + f*g*i - f*h*j + h*j)

To gain confidence in this long formula, you could substitute the 6 variables by their actual values given earlier and check the result:

>>> values = (('a',0.001),('b',0.002),('c',0.95),('d',0.94),('e',0.29),('f',0.001),('g',0.9),('h',0.05),('i',0.7),('j',0.01))
>>> pb.subs(values)
0.284171835364393

You can similarly check the results provided for the other queries, by simply appending .subs(values) to them.
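As an independent sanity check of the substituted result, the same posterior can be computed in pure Python by brute-force enumeration of the 8 combinations of (burglary, earthquake, alarm), using the numeric values of the symbols a..j:

```python
from itertools import product

a, b = 0.001, 0.002                          # P(burglary), P(earthquake)
p_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
g, h = 0.9, 0.05                             # P(john_calls | alarm / no alarm)
i, j = 0.7, 0.01                             # P(mary_calls | alarm / no alarm)

num = den = 0.0
for burglary, earthquake, alarm in product((True, False), repeat=3):
    w = ((a if burglary else 1 - a)
         * (b if earthquake else 1 - b)
         * (p_alarm[burglary, earthquake] if alarm
            else 1 - p_alarm[burglary, earthquake])
         * (g if alarm else h)               # evidence: John calls
         * (i if alarm else j))              # evidence: Mary calls
    den += w
    if burglary:
        num += w

pb_value = num / den
print(pb_value)  # ~0.2842
```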

As you would expect from a symbolic computation system, you are free to fix only a subset of the variables: you will then get a new formula with the remaining free variables:

>>> pb.subs((('a',0.001),('b',0.002),('c',0.95),('d',0.94),('e',0.29),('f',0.001)))
0.001*(0.94002*g*i + 0.05998*h*j)/(0.002516442*g*i + 0.997483558*h*j)

As explained in the introduction, the returned formula can be copy-pasted into a program that is completely unaware of Bayesian networks but fast at number-crunching. Note also that SymPy allows reformatting such a formula into various formats, like LaTeX:

>>> print(sympy.latex(pb))
\frac{a \left(b c g i - b c h j - b d g i + b d h j + d g i - d h j + h j\right)}{a b c g i - a b c h j - a b d g i + a b d h j - a b e g i + a b e h j + a b f g i - a b f h j + a d g i - a d h j - a f g i + a f h j + b e g i - b e h j - b f g i + b f h j + f g i - f h j + h j}

You can then easily embed nice formulas in your papers:

formula.gif

Monte-Carlo sampling

Random generation through MC (random_mc method)

We have seen in the first part of the tutorial how to generate random samples with the random(…) method. An important point must be understood about the way this method works. Consider a set of 9 different "dice", with numbers of faces ranging from 2 to 10. Let us make 20 throws of these 9 dice and get the sum:

dice = tuple(lea.interval(1,n) for n in range(2,11))
sum_dice = sum(dice)
print (sum_dice.random(20))
# -> (32, 26, 33, 31, 28, 39, 32, 31, 38, 26, 26, 33, 33, 27, 26, 27, 33, 28, 47, 25)

You will note that the generation of the random sample takes some time. Why? What happens is that, behind the scenes, the actual probability distribution of sum_dice is calculated (a heavy combinatorial process); then, from this resulting distribution, the 20 random samples are drawn. So, what actually happens is equivalent to this:

calculated_sum_dice = sum_dice.calc()
print (calculated_sum_dice.random(20))
# -> (32, 26, 33, 31, 28, 39, 32, 31, 38, 26, 26, 33, 33, 27, 26, 27, 33, 28, 47, 25)

The heavy calculation is done by the calc() call (see below). Note that, because of a cache mechanism, any further call to sum_dice.random(...) shall be very fast (as fast as calculated_sum_dice.random(...), actually).

In several cases, using this approach just to get random samples is intractable because of combinatorial explosion. The random_mc(…) method proposes an alternative implementation, which tackles this limitation. Let's revisit the same example:

sum_dice2 = sum(dice)
sum_dice2.random_mc(20)
# -> (37, 35, 38, 34, 29, 31, 32, 21, 31, 30, 35, 39, 35, 40, 31, 22, 30, 29, 39, 31)

This now produces random samples without any latency. How does it work? random_mc does not try to calculate the final probability distribution; it just makes a random sample from each elementary distribution, then aggregates the results together. In the present case, for each of the 20 trials, it throws the 9 dice and sums up the results. This is closer to the real process, since it's basically a simulation of this process.

random_mc can handle any type of Lea construct, including those specifying a condition:

print (sum_dice2.given(sum_dice2 <= 24).random_mc(20))
# -> (24, 18, 18, 20, 21, 23, 24, 21, 23, 24, 22, 22, 24, 24, 24, 23, 24, 21, 22, 24)

Take care that these examples do NOT imply that the random_mc method is the real McCoy. The random and random_mc methods actually have complementary usages:

  • random tends to be better when the end values result from long calculations and/or when the combinatorial is tractable and/or the random sample to produce is large;
  • random_mc tends to be better when the end values result from short calculations and/or when the combinatorial is intractable and/or the random sample to produce is small.
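To make the contrast concrete, here is a pure-Python sketch of what random_mc does for the 9-dice example, together with a Counter-based estimation of the pmf from the sample; this mimics, but does not use, Lea's implementation:

```python
import random
from collections import Counter

random.seed(42)

def throw_sum():
    # one trial: throw the 9 dice (2 to 10 faces) and sum the results
    return sum(random.randint(1, n) for n in range(2, 11))

sample = [throw_sum() for _ in range(10000)]
estimated_pmf = {v: c / len(sample) for v, c in sorted(Counter(sample).items())}
sample_mean = sum(sample) / len(sample)

print(min(sample), max(sample))  # always within [9, 54]
print(round(sample_mean, 2))     # close to the exact mean 31.5
```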

Approximate inference by Monte-Carlo approaches

The random_mc(…) method may serve another purpose than pure random sample generation. It may be used to estimate the resulting probability distribution based on random samples. By default, Lea calculates exact probability distributions; this works fine for all examples shown in the previous tutorial pages. However, many real-world probabilistic problems have a large sample space and require prohibitive time to be solved in an exact way. This is especially the case for large or densely connected Bayesian networks. For such intractable problems, Lea provides several Monte-Carlo algorithms that are able to produce approximate results in a reasonable time, with precision tunable through the number of random samples.

Let's start by using the random_mc(…) tool you have just learned. You may estimate the probability distribution of the sum of 9 dice seen above, based on a sample of 10,000 trials:

estimated_sum_dice = lea.vals(*sum_dice2.random_mc(10000))
print (estimated_sum_dice)
# -> 13 : 0.0002
     14 : 0.0002
     15 : 0.0007
     16 : 0.0016
     17 : 0.0017
     18 : 0.0039
     19 : 0.007
     …

One advantage is that you can tune the sample size as you wish, according to the precision you want. Since this approach is very useful, Lea provides a convenience method to do the job more easily, namely estimate_mc(…). The following statement

estimated_sum_dice = sum_dice2.estimate_mc(10000)

is equivalent to the one shown above.

This method of estimating a probability distribution by MC sampling is very useful in many situations with a large sample space, like querying a complex Bayesian network. The next subsection elaborates on this theme, providing more algorithms and options.

The calc method and Monte-Carlo algorithms

The estimate_mc method seen above is actually a convenience method, which is simple to use. Lea actually provides several probabilistic inference algorithms, which are accessible through the x.calc(...) method, where x is any Lea instance. Before covering the MC algorithms themselves, some explanations are needed on this calc method. Suppose that you have defined a Lea instance x with several dependencies on other instances. When you execute statements like print (x), print (x.p(4)), avg = x.mean or sample = x.random(1000), a calculation occurs behind the scenes to evaluate the pmf of x, before doing the specific required action. This calculation is done implicitly, in order to ease your life. You can however perform this calculation explicitly at any time by making this call:

x.calc()

This performs the required calculations and returns a new instance with the calculated pmf. This instance is independent of any other variables, so it does the same as x.new(). As such, this call is not very useful, except if you want to finely control when the calculation is done. Now, the real interest is that calc has several optional arguments that let you finely customize the calculation algorithm.

Let us consider the Bayesian network example from Wikipedia.

from lea import *
rain = event(0.20)
sprinkler = if_(rain, event(0.01),
                      event(0.40))
grass_wet = joint(sprinkler,rain).switch({ (False,False): False,
                                           (False,True ): event(0.80),
                                           (True ,False): event(0.90),
                                           (True ,True ): event(0.99)})

We shall define the following query, which represents the conditional probability distribution P(rain | grass wet):

query = rain.given(grass_wet)

At this stage, nothing is calculated yet. The following three statements are equivalent and produce the exact result:

print (P(query))
print (P(query.calc()))
print (P(query.calc(algo=EXACT)))
# -> 0.35768767563227616

Now, let's imagine that calculating the exact probability of query is intractable (i.e. the calculation takes ages to terminate). As a fallback, you want to obtain an approximate result using a Monte-Carlo algorithm. The easiest option is to use the rejection sampling algorithm. Here is how to proceed for 10,000 samples:

WARNING: requires Lea 3.3 at least

print (P(query.calc(algo=MCRS, nb_samples=10000)))
# -> 0.3585

Note: This statement is actually equivalent to P(query.estimate_mc(10000)), in a more pedantic style.
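To see what MCRS does under the hood, here is a pure-Python sketch of rejection sampling for this network; this is an illustration of the principle, not Lea's actual code:

```python
import random

random.seed(0)

def sample_world():
    # forward-sample the sprinkler Bayesian network
    rain = random.random() < 0.20
    sprinkler = random.random() < (0.01 if rain else 0.40)
    p_wet = {(False, False): 0.0, (False, True): 0.80,
             (True, False): 0.90, (True, True): 0.99}[sprinkler, rain]
    return rain, random.random() < p_wet

accepted = rain_count = 0
while accepted < 10000:
    rain, grass_wet = sample_world()
    if grass_wet:             # rejection: keep only samples matching the evidence
        accepted += 1
        rain_count += rain

estimate = rain_count / accepted
print(estimate)  # close to the exact 0.3577
```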

Actually, rejection sampling has the drawback of being inefficient if the given condition (here, grass_wet) has a small probability of occurring: the algorithm spends most of the processing time finding bindings that satisfy the condition. The following statements, with other argument options, provide useful alternatives.

This variant generates 10 sub-samples each time a binding satisfying the condition grass_wet is found (speeding up the calculation):

print (P(query.calc(algo=MCRS, nb_samples=10000, nb_subsamples=10)))
# -> 0.3595

This variant uses the likelihood weighting algorithm, using the exact algorithm for the condition (grass_wet) and MC for the conditioned part (rain):

print (P(query.calc(algo=MCLW, nb_subsamples=10000)))
# -> 0.357687675632294

The last statement is actually a bit absurd for the present query: since rain is referred to in grass_wet, it is embedded in the exact calculation, so this calculation boils down to the EXACT algorithm. Setting nb_subsamples=1 is enough to get the exact result (without the rounding errors caused by summing 10,000 probabilities). The likelihood weighting algorithm would rather make sense for queries such as:

print (P(grass_wet.given(rain).calc(algo=MCLW, nb_subsamples=10000)))
# -> 0.8066000000000276

What happens here is that the MC is performed in two parts: 1) grass_wet is randomly sampled with rain fixed to True (probability weight 0.2); 2) grass_wet is randomly sampled with rain fixed to False (probability weight 0.8).
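For comparison, here is a pure-Python sketch of textbook likelihood weighting for the earlier query rain.given(grass_wet): the parents are sampled, and each sample is weighted by the probability of the evidence instead of being accepted or rejected. This illustrates the principle behind MCLW, not Lea's actual code:

```python
import random

random.seed(0)

P_WET = {(False, False): 0.0, (False, True): 0.80,
         (True, False): 0.90, (True, True): 0.99}

num = den = 0.0
for _ in range(10000):
    rain = random.random() < 0.20
    sprinkler = random.random() < (0.01 if rain else 0.40)
    w = P_WET[sprinkler, rain]   # weight = P(grass_wet=True | sampled parents)
    den += w
    if rain:
        num += w

estimate = num / den
print(estimate)  # close to the exact 0.3577
```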

The full signature of the calc method is:

calc(prob_type=-1,sorting=True,normalization=True,bindings=None,memoization=True,
     algo=EXACT,nb_samples=None,nb_subsamples=None,nb_tries=None,exact_vars=None)

We only cover here the last five arguments, viz. algo, nb_samples, nb_subsamples, nb_tries, exact_vars; the other arguments can safely keep their default values for the present needs. Note that the first two arguments (prob_type and sorting) are described in the section dedicated to advanced constructor arguments; the three following arguments normalization, bindings, and memoization are described in this section.

  • algo (default: EXACT): four algorithms are available (note that these are available as variables of the lea module, so you have to add the prefix lea. if you import just the module):
    • EXACT: calculates the exact probability distribution using the Statues algorithm; for such choice, the remaining arguments shall not be used;
    • MCRS (Monte-Carlo Rejection Sampling): calculates an approximate probability distribution following the MC rejection sampling algorithm on nb_samples random samples; if self is an Ilea instance, i.e. evaluating a conditional probability x.given(e), then the algorithm may be sped up by specifying the nb_subsamples argument, which shall be a divisor of nb_samples; each time the condition e is satisfied, nb_subsamples random samples are taken on the conditioned part x instead of a single one; specifying nb_subsamples is especially valuable if the condition has a small probability;
    • MCLW (Monte-Carlo Likelihood Weighting): this requires that self is an Ilea instance, i.e. a conditional probability x.given(e); this algorithm calculates an approximate probability distribution by first making an exact evaluation of the condition e using the Statues algorithm; then, for each binding that verifies the condition with some probability p, it makes nb_subsamples random samples on the conditioned part x, assigning a weight p to these samples; this algorithm is especially valuable if the condition has a small probability while its exact evaluation is tractable; this algorithm also accepts an optional exact_vars argument, to include given variables in the exact evaluation, beyond those already referred to in the condition (see MCEV);
    • MCEV (Monte-Carlo Exact Variables): calculates an approximate probability distribution by first making an exact evaluation of the variables given in exact_vars using the Statues algorithm; for each binding found with some probability p, it makes nb_subsamples random samples on the remaining (unbound) variables, assigning a weight p to these samples; the MCEV algorithm cannot handle expressions under a condition, i.e. x.given(e); MCLW shall be used instead;
  • nb_samples (default: None): number of random samples made for MCRS algorithm;
  • nb_subsamples (default: None): only for MCRS and MCLW algorithms and if self is an Ilea instance, i.e. a conditional probability x.given(e); it specifies the number of random samples made on x for each binding verifying the condition e; for MCRS, nb_subsamples is optional, if specified it shall be a divisor of nb_samples; for MCLW, nb_subsamples is mandatory;
  • nb_tries (default: None): if not None, defines the maximum number of trials in case a random value is incompatible with a condition; this happens only if the current Lea instance is an Ilea instance x.given(e) or is referring to such an instance; for the MCLW algorithm on x.given(e), it only applies to x, should it refer to Ilea instances, since e is evaluated using the exact algorithm; if a condition cannot be satisfied after nb_tries tries, then an error exception is raised; WARNING: if nb_tries is None, any infeasible condition shall cause an infinite loop;
  • exact_vars (default: None): only for MCEV algorithm: an iterable giving the variables referred in self that shall be evaluated by using the exact algorithm, the other ones being subject to random sampling;

On choosing the right algorithm and options...

EXACT is the default algorithm; it is the recommended algorithm for all tractable problems; in particular, it allows working with probability fractions and symbols. For intractable problems, the three other algorithms offer fallback options.

The MCRS algorithm with the sole nb_samples argument is the easiest option; choosing the value of nb_samples is a matter of trade-off between result accuracy and execution time. If the evaluated expression contains conditions having low probabilities, then the MCRS algorithm may be inefficient: as a rejection sampling algorithm, it may spend most of the processing time finding bindings that satisfy the condition. To improve efficiency, the nb_subsamples argument can be used: this allows making multiple random samples each time the condition is met, instead of a single one; samples are generated until the condition has been satisfied n times, with n = nb_samples/nb_subsamples. For a given nb_samples, increasing nb_subsamples speeds up the calculation; however, the result accuracy may tend to decrease if the condition is not visited enough (e.g. choosing nb_subsamples = nb_samples will satisfy the condition with only one binding).

The MCLW algorithm, with the mandatory nb_subsamples argument, is the best choice if the evaluation of the conditioned part is intractable while the condition is tractable (whatever its probability); every binding verifying the condition is covered and, for each one, nb_subsamples random samples are generated, weighted by the binding's probability.

The MCEV algorithm is suited for intractable problems in which a subset of variables v1, …, vn can be evaluated in a reasonable time or, in other words, for which joint(v1, …, vn) is tractable; if this set of variables is specified in the exact_vars argument, then all their value combinations are browsed systematically by the exact algorithm, while random sampling is done for the other variables.

How to optimize probabilistic inference?

TO BE WRITTEN

  • use floating-point representation
  • use symbolic computation
  • use times method
  • use multiple arguments in given method
  • explicit bindings
  • replace cpt by switch or if_ method
  • cascaded CPT (contextual independence)
  • calculate and save independent results
  • use MC sampling

The Statues algorithm

The exact calculation of probabilities in Lea is made by an original algorithm, named the Statues algorithm. The present section provides a short introduction to this algorithm. For more in-depth coverage, references are provided at the end.

The What…

The Statues algorithm performs exact probabilistic inference, in contrast to approximate algorithms like the Monte-Carlo approaches seen above. This algorithm is strictly limited to discrete probabilities. The name "Statues" is borrowed from the popular Statues game; other names include "Red Light, Green Light" (US), "Grandmother's Footsteps" (UK), "1-2-3, SOLEIL!" (France) and "1-2-3, PIANO!" (Belgium).

In contrast to several well-established exact algorithms, the Statues algorithm is able to deal with several probabilistic modeling approaches: as seen in the present tutorial, it can handle in particular probabilistic arithmetic, conditional probabilities, JPDs, CPTs, BNs and Markov chains. Also, different modeling approaches can be seamlessly mixed together to provide more expressive models. For instance, assuming two BN variables x, y represent numbers, a new variable z = x + y can be created to represent their sum; this new variable z can then be queried to get its probability distribution given some evidence; also, z can itself be used to express some evidence. So, the Statues algorithm has the ability to unify several probabilistic modeling approaches.

Before going further, it's important to understand what the input and output of the algorithm are.

  • As input, it requires a given node from a given directed acyclic graph (DAG); this DAG defines how each variable depends on the others, down to "atomic" variables that are defined by simple explicit pmfs.

  • As output, the algorithm returns the pmf of the given node.

The dependency relationships expressed in the input DAG can be arithmetic operators, logical operators, user-defined functions, CPT constructs, etc. This DAG is constructed on the fly by Lea, starting from prior pmfs and following the different techniques shown in the present tutorial. Note that the DAG construction is out of the scope of the Statues algorithm; it's just a pretreatment process that builds a data structure, following recursive compositional rules. Sometimes this is referred to as "lazy evaluation", because the actual evaluation is done later, when needed.

If no node has more than one parent, then the DAG is actually a tree, where the leaf nodes are explicit prior pmfs. In such a case, a recursive algorithm building a pmf from the pmfs calculated on each child node can do the job. This case is specific to situations where every variable occurs in only one place and influences only one other variable. Note that the very earliest prototype of Lea worked with such an algorithm, which was fine for basic probabilistic arithmetic. In general situations (e.g. given conditions), this constraint is not respected and the recursive algorithm mentioned above gives wrong results because it does not enforce referential consistency: a given node referenced by two parent nodes will be seen as two independent events, even though both references share the same pmf. Here comes the "Statues" algorithm...
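The referential consistency problem can be shown in a few lines of plain Python (an illustration, not Lea code). A naive recursion combines the pmfs of the two operands as if they were independent, so evaluating coin + coin, where both operands are the very same variable, yields the wrong answer:

```python
from fractions import Fraction

def add_pmfs(pmf_x, pmf_y):
    """Naive recursive combination: treats X and Y as independent."""
    out = {}
    for xv, xp in pmf_x.items():
        for yv, yp in pmf_y.items():
            out[xv + yv] = out.get(xv + yv, Fraction(0)) + xp * yp
    return out

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
# 'coin + coin' with BOTH operands being the same variable:
# the true pmf is {0: 1/2, 2: 1/2}, but the naive recursion treats the
# shared node as two independent coins and returns {0: 1/4, 1: 1/2, 2: 1/4}
wrong = add_pmfs(coin, coin)
```

The Statues algorithm described below avoids this by keeping a single, consistent binding for each variable throughout the calculation.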

The How…

The Statues algorithm actually starts when the pmf of a given node is requested. This happens when you ask to display a variable or expression; this happens also when you call the P function on such a variable or expression.

How does it work?

Here are the most important features. The Statues algorithm uses generators instead of classical functions. In Python, a generator is, roughly speaking, a special routine that can halt at given places, deliver results to its caller, then resume when instructed by the caller. In Python source code, generators are easily identified by the presence of yield statement(s).

Instead of asking its children to calculate and return their pmf, each node asks them to browse the possible variable bindings one by one and to generate the associated probabilities, ensuring consistency at every stage. More precisely, each node requests its child nodes to generate atoms: for a given node X, the generated atoms are (v,p) pairs such that the sum of all p for a given v equals the probability P(X = v). If all children obey this rule, then any node can in turn apply simple rules to generate its own atoms (u,q) from the atoms it receives. In particular,

  • a node representing the addition X+Y gets any atom (x,p) from X and any atom (y,q) from Y and generates in turn the atom (x+y,pq);
  • a node representing X | Y (i.e. X given Y) gets any atom (y,q) from Y; then, only if y is true, it gets the next atom (x,p) from X and generates the atom (x,pq);
  • etc.

Finally, the (u,q) atoms received by the top-level node (the one queried) are aggregated by summing the probabilities value by value, producing the expected pmf. The generators created recursively on the nodes allow keeping the binding of any variable consistent. In contrast with usual functions, the created generators stay alive during the whole calculation and work cooperatively to deliver individual atoms to the queried node, which is the sole node to calculate a pmf.
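The whole mechanism can be sketched in a few dozen lines of plain Python. This is a toy reconstruction of the ideas above, not Lea's actual implementation: each node exposes a gen_atoms generator, a shared bindings dictionary keeps every variable consistently bound while a branch is explored, and the queried node aggregates the atoms into a pmf (the sketch assumes every generator is run to exhaustion):

```python
from fractions import Fraction
import operator

class Alea:
    """Atomic node with an explicit pmf {value: probability}."""
    def __init__(self, pmf):
        self.pmf = pmf

    def gen_atoms(self, bindings):
        if self in bindings:
            # already bound higher up: stay consistent with that binding
            yield (bindings[self], Fraction(1))
        else:
            for v, p in self.pmf.items():
                bindings[self] = v     # bind while this branch is explored
                yield (v, p)
            del bindings[self]         # unbind once all values are browsed

class Op:
    """Inner node applying a binary function to two child nodes."""
    def __init__(self, f, x, y):
        self.f, self.x, self.y = f, x, y

    def gen_atoms(self, bindings):
        for xv, xp in self.x.gen_atoms(bindings):
            for yv, yp in self.y.gen_atoms(bindings):
                yield (self.f(xv, yv), xp * yp)

class Given:
    """Conditional node: atoms of x, only under bindings where cond is true."""
    def __init__(self, x, cond):
        self.x, self.cond = x, cond

    def gen_atoms(self, bindings):
        for cv, cp in self.cond.gen_atoms(bindings):
            if cv:
                for xv, xp in self.x.gen_atoms(bindings):
                    yield (xv, xp * cp)

def calc_pmf(node):
    """Aggregate the atoms received at the queried node, then normalize."""
    acc = {}
    for v, p in node.gen_atoms({}):
        acc[v] = acc.get(v, Fraction(0)) + p
    total = sum(acc.values())
    return {v: p / total for v, p in acc.items()}

die = Alea({v: Fraction(1, 6) for v in range(1, 7)})
# referential consistency: die + die is 2*die, NOT the sum of two dice,
# because both references hit the same bound node
twice = calc_pmf(Op(operator.add, die, die))    # {2: 1/6, 4: 1/6, ..., 12: 1/6}

d1 = Alea({v: Fraction(1, 6) for v in range(1, 7)})
d2 = Alea({v: Fraction(1, 6) for v in range(1, 7)})
total_node = Op(operator.add, d1, d2)
cond = Op(operator.lt, d1, Alea({3: Fraction(1)}))   # d1 < 3
conditioned = calc_pmf(Given(total_node, cond))      # conditioned[7] = 1/6
```

Note how the Given node works exactly as described in the bullet list: while its condition generator is suspended, d1 stays bound in the shared dictionary, so the conditioned expression sees the very value that satisfied the condition.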

References

Admittedly, the explanations above give you, at best, a rough idea of the process. If you want a true understanding of the Statues algorithm, you can:

What's next?

Thank you for reading the present tutorial pages! We hope that you found them clear and informative enough. If you haven't done so yet, you may have a look at the Examples page.

If you are interested in machine learning, note that there is an extra tutorial page on this topic. It covers maximum likelihood and EM algorithms.

For enhancements or bugs, please create an issue on the Lea Bitbucket page.

Feel free also to send an e-mail to pie.denis@skynet.be (English or French spoken). Use cases of Lea in your projects/research are especially appreciated! A "references" page listing the projects using Lea could greatly improve the reputation of the library.
