Lea3_Tutorial_3
Table of Contents
 Introduction
 Other functions and features
 Plotting histograms (histo, plot methods)
 Mode and maximum a posteriori estimation (MAP)
 Drawing with/without replacement (draw method)
 Revised probability distributions (given_prob method)
 Min / max functions (min_of, max_of functions)
 Indexing and slicing
 Object attributes and methods
 Reducing sequences of probability distributions (reduce_all method)
 Advanced calculation options (calc method)
 Information Theory
 Symbolic computation with probabilities
 Monte-Carlo sampling
 How to optimize probabilistic inference?
 The Statues algorithm
 What's next?
Introduction
The present page is the third part of the tutorial for the Lea library. It assumes that you have installed Lea 3 and that you are familiar with the techniques presented in the first part of the tutorial. The main topics are mostly independent, so you can jump straight to your preferred subject without much trouble (see TOC above).
Other functions and features
Plotting histograms (histo, plot methods)
The plot method (matplotlib)
If you have installed the matplotlib library, you can very easily plot the probability mass function of a given Lea instance. We have already seen that all you have to do is to invoke the plot() method on any Lea instance. By default, the plot() method draws the pmf in a dedicated window. Several arguments can be specified to customize this behavior, including saving the chart in an image file. Here is the full signature, with default values:
your_lea_instance.plot(title=None,fname=None,savefig_args=dict(),**bar_args)
| argument | description |
|---|---|
| title | a string giving the title to display (no title if None) |
| fname | if not None, the chart is saved in a file specified by the given fname, as described for the matplotlib.pyplot.savefig function |
| savefig_args | (used only if fname is not None) a dictionary relayed to the matplotlib.pyplot.savefig function, containing named arguments expected by this function |
| bar_args | named arguments relayed to the matplotlib.pyplot.bar function |
Example:
from lea.leaf import dice
dice(3,6).plot(title='Three fair dice',fname='pmf_3d6.png',savefig_args=dict(bbox_inches='tight'),color='lightblue')
The histo and as_string methods
If you haven't installed the matplotlib library, there are alternative ways to plot histograms:
- use another plotting package: this requires extracting the pmf data, which is the subject of the next section,
- or display text-oriented histograms directly on your console: this is the subject of the present section.
The histo method returns a string that allows displaying the pmf as a textual histogram:
>>> print(dice(3,6).histo())
 3 : 
 4 : -
 5 : ---
 6 : -----
 7 : -------
 8 : ----------
 9 : ------------
10 : -------------
11 : -------------
12 : ------------
13 : ----------
14 : -------
15 : -----
16 : ---
17 : -
18 : 
If you want to scale this histogram, you can pass a percentage argument: for instance, .histo(200) will double the default size.
For advanced display control, there exists a more powerful formatting method: as_string(…). It lets you format the probability numbers and the histogram together. The first argument is a string of 1 or 2 characters: the first character is '.' for float format, '%' for percentage format or '/' for fraction format. If you append '-', you get the histogram just to the right of the numbers. There are also two more optional arguments for specifying the number of decimals required (nb_decimals, default: 6) and the scale of the histogram (histo_size, default: 100). Here are some examples of what you can do.
>>> print (dice(3,6).as_string('%',nb_decimals=2))
 3 :  0.46 %
 4 :  1.39 %
 5 :  2.78 %
 6 :  4.63 %
 7 :  6.94 %
 8 :  9.72 %
 9 : 11.57 %
10 : 12.50 %
11 : 12.50 %
12 : 11.57 %
13 :  9.72 %
14 :  6.94 %
15 :  4.63 %
16 :  2.78 %
17 :  1.39 %
18 :  0.46 %
>>> print (dice(3,6,'r').as_string('/-',histo_size=200))
 3 :  1/216 -
 4 :  3/216 ---
 5 :  6/216 ------
 6 : 10/216 ---------
 7 : 15/216 --------------
 8 : 21/216 -------------------
 9 : 25/216 -----------------------
10 : 27/216 -------------------------
11 : 27/216 -------------------------
12 : 25/216 -----------------------
13 : 21/216 -------------------
14 : 15/216 --------------
15 : 10/216 ---------
16 :  6/216 ------
17 :  3/216 ---
18 :  1/216 -
>>> print (dice(3,6).as_string('.-',nb_decimals=6,histo_size=300))
 3 : 0.004630 -
 4 : 0.013889 ----
 5 : 0.027778 --------
 6 : 0.046296 --------------
 7 : 0.069444 ---------------------
 8 : 0.097222 -----------------------------
 9 : 0.115741 -----------------------------------
10 : 0.125000 --------------------------------------
11 : 0.125000 --------------------------------------
12 : 0.115741 -----------------------------------
13 : 0.097222 -----------------------------
14 : 0.069444 ---------------------
15 : 0.046296 --------------
16 : 0.027778 --------
17 : 0.013889 ----
18 : 0.004630 -
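Independently of Lea, the mechanics of such a textual histogram are easy to reproduce with the standard library alone. The sketch below is purely illustrative; the names pmf_3d6 and as_histo_string are made up for the occasion and are not part of the Lea API:

```python
from fractions import Fraction
from itertools import product

# pmf of the sum of three fair dice, built by exhaustive enumeration
pmf_3d6 = {}
for faces in product(range(1, 7), repeat=3):
    s = sum(faces)
    pmf_3d6[s] = pmf_3d6.get(s, Fraction(0)) + Fraction(1, 6**3)

def as_histo_string(pmf, histo_size=100):
    """Render a pmf as a text histogram, one '-' per percent by default."""
    width = max(len(str(v)) for v in pmf)
    lines = []
    for v in sorted(pmf):
        bar = '-' * round(float(pmf[v]) * histo_size)
        lines.append("%*s : %s" % (width, v, bar))
    return "\n".join(lines)

print(as_histo_string(pmf_3d6))
```

As with Lea's histo(200), passing histo_size=200 simply doubles the length of every bar.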
Mode and maximum a posteriori estimation (MAP)
In some situations, you are just interested in the values that have the highest probability in a given probability distribution; this set of values is traditionally known as the "mode". On Lea instances, you can get these values as a tuple, using the mode attribute. Here are some examples showing the mode associated with throws of 1, 2 and 3 dice:
>>> d1 = lea.interval(1,6)
>>> d1.mode
(1, 2, 3, 4, 5, 6)
>>> d1.times(2).mode
(7,)
>>> d1.times(3).mode
(10, 11)
The mode is useful in particular to determine a maximum a posteriori estimation (MAP), which consists in finding the most likely assignment of a given set of variables given some evidence. Consider for example the Bayesian network presented earlier. A MAP query could be: "What are the most likely states for the burglary, earthquake and john_calls variables, given that we know that Mary called?"
>>> lea.joint(burglary,earthquake,john_calls).given(mary_calls).mode
((False, False, False),)
The answer is burglary = earthquake = john_calls = False, which means that Mary actually called without any worrying reason, just to give some news. Here are some other MAP queries for the same three variables, given other evidence:
>>> lea.joint(burglary,earthquake,john_calls).given(burglary | earthquake).mode
((False, True, False),)
>>> lea.joint(burglary,earthquake,john_calls).given(mary_calls, burglary | earthquake).mode
((True, False, True),)
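The notion of mode is easy to restate without Lea: it is just the set of values whose probability equals the maximum of the pmf. Here is a minimal stand-alone sketch (the helper mode below is a made-up name, not Lea's implementation):

```python
from fractions import Fraction
from itertools import product

# pmf of the sum of three fair dice, by exhaustive enumeration
pmf = {}
for faces in product(range(1, 7), repeat=3):
    s = sum(faces)
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 216)

def mode(pmf):
    """Return, as a sorted tuple, the values having the highest probability."""
    max_p = max(pmf.values())
    return tuple(sorted(v for (v, p) in pmf.items() if p == max_p))

print(mode(pmf))   # most likely sums of three dice: (10, 11)
```

A MAP query is then nothing more than the mode of a conditional joint distribution, i.e. this same function applied to the pmf obtained after filtering by the evidence.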
Drawing with/without replacement (draw method)
In several situations, we are interested in a sequence of values obtained from the same random source: lottery, card drawing, dice throws, etc. Such sequences can be defined using different protocols:
- with or without replacement: whether each drawn value is withdrawn from the remaining values (no duplicates),
- sorted or not: whether each sequence is sorted, i.e. whether the order of drawing is irrelevant.
The draw method addresses such needs, supporting the different protocols through two Boolean arguments. The signature is as follows:
your_lea_instance.draw(n,sorted=False,replacement=False)
| argument | description |
|---|---|
| n | the number of objects to draw; each value of the resulting distribution is an n-tuple |
| sorted | boolean: True means that the order of drawing is irrelevant and the tuples are sorted in increasing order (default: False) |
| replacement | boolean: True means that the drawing is made WITH replacement, so the same element may occur several times in each tuple (default: False) |
The two boolean arguments sorted and replacement can be used independently of each other, leading to four cases. The best way to understand the meaning of these arguments is to show the result obtained for each of the four combinations. Let's define a non-uniform distribution with 3 objects:
>>> d = lea.pmf({'A': '3/6', 'B': '2/6', 'C': '1/6'})
>>> d
A : 3/6
B : 2/6
C : 1/6
case 1. No sorting, without replacement
>>> d.draw(2)
('A', 'B') : 20/60
('A', 'C') : 10/60
('B', 'A') : 15/60
('B', 'C') : 5/60
('C', 'A') : 6/60
('C', 'B') : 4/60
case 2. No sorting, with replacement
>>> d.draw(2,replacement=True)
('A', 'A') : 9/36
('A', 'B') : 6/36
('A', 'C') : 3/36
('B', 'A') : 6/36
('B', 'B') : 4/36
('B', 'C') : 2/36
('C', 'A') : 3/36
('C', 'B') : 2/36
('C', 'C') : 1/36
case 3. Sorting, without replacement
>>> d.draw(2,sorted=True)
('A', 'B') : 35/60
('A', 'C') : 16/60
('B', 'C') : 9/60
case 4. Sorting, with replacement
>>> d.draw(2,sorted=True,replacement=True)
('A', 'A') : 9/36
('A', 'B') : 12/36
('A', 'C') : 6/36
('B', 'B') : 4/36
('B', 'C') : 4/36
('C', 'C') : 1/36
Note: The processing of cases 3 and 4 (sorted=True) is usually more efficient than the other cases, thanks to a special combinatorial algorithm due to Paul Moore. Note however that, in case 3 (replacement=False), this special algorithm is used ONLY if the probability distribution is uniform (the draw method checks whether the distribution is uniform or not and chooses the most efficient algorithm); this is not the case in the example above, but it fortunately applies in most "fair" games.
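The probabilities shown in case 1 can be checked with a short stand-alone enumeration. The recursive helper below (a hypothetical name, not Lea's code) renormalizes the remaining values after each drawing:

```python
from fractions import Fraction

def draw_wo_replacement(pmf, n):
    """pmf of ordered n-tuples drawn without replacement from pmf."""
    if n == 0:
        return {(): Fraction(1)}
    result = {}
    for v, p in pmf.items():
        # withdraw v and renormalize the remaining values
        rest = {w: q / (1 - p) for (w, q) in pmf.items() if w != v}
        for tail, q in draw_wo_replacement(rest, n - 1).items():
            t = (v,) + tail
            result[t] = result.get(t, Fraction(0)) + p * q
    return result

d = {'A': Fraction(3, 6), 'B': Fraction(2, 6), 'C': Fraction(1, 6)}
print(draw_wo_replacement(d, 2))
```

For instance, the probability of ('A', 'B') is 3/6 × (2/6)/(3/6) = 20/60, matching case 1 above.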
Revised probability distributions (given_prob method)
The calculation of conditional probabilities using the given method is fine when the information expressed in the condition is 100% sure. In some situations however, this information is uncertain but can be associated with a precise probability value p. The goal is then to calculate a revised probability distribution such that the given condition has probability p to occur. Another interpretation of this goal is that you want to favor some occurrences of the prior probability distribution by rebalancing the probabilities.
The method given_prob(cond,p) serves this purpose, where cond is a condition expressed as a boolean distribution (exactly as for the given method) and p is the probability that cond is true.
As an example, suppose that a die, assumed to be fair initially, is eventually noted to give the values 5 or 6 with a probability of 1/2. The revised distribution is calculated as follows:
>>> die = lea.interval(1,6,prob_type='r')
>>> revised_die = die.given_prob(die >= 5,'1/2')
>>> revised_die
1 : 1/8
2 : 1/8
3 : 1/8
4 : 1/8
5 : 2/8
6 : 2/8
The probability that has been set can be verified easily by evaluating the given condition on the new distribution:
>>> P(revised_die >= 5)
1/2
Note that the given_prob method is a generalization of the given method:
- d.given(cond) is equivalent to d.given_prob(cond,1)
- d.given(~cond) is equivalent to d.given_prob(cond,0)
Min / max functions (min_of, max_of functions)
In the "Applying functions" section, we have seen how to build a distribution taking the maximum (or minimum) value of other distributions. Lea provides the min_of and max_of functions, which do the same job:
>>> die1 = lea.interval(1,6,prob_type='r')
>>> die2 = die1.new()
>>> lea.min_of(die1,die2)
1 : 11/36
2 : 9/36
3 : 7/36
4 : 5/36
5 : 3/36
6 : 1/36
>>> lea.max_of(die1,die2)
1 : 1/36
2 : 3/36
3 : 5/36
4 : 7/36
5 : 9/36
6 : 11/36
Note however that the performance degrades dramatically when adding more arguments (exponential complexity). Consider for example the following statements, which take the minimum value among 7 dice:
>>> dice = [die1.new() for i in range(7)]
>>> lea.min_of(*dice)
1 : 201811/279936
2 : 61741/279936
3 : 14197/279936
4 : 2059/279936
5 : 127/279936
6 : 1/279936
This takes about 10 seconds to calculate (on a "decent" CPU bought in 2013).
Fortunately, there is a way to avoid this by enabling a more efficient algorithm (linear complexity), due to Nicky van Foreest (see Nicky's "Stochastic Makespans" page). Simply add the fast=True argument:
>>> lea.min_of(*dice,fast=True)
The result is the same... but it is produced in the blink of an eye! However, there ain't no such thing as a free lunch... fast=True presents a small drawback: unlike most Lea methods, the probability distributions returned by these two functions lose any dependency on the given arguments; in other words, they do not comply with referential consistency. This could be important when using expressions with joint or conditional probabilities. For instance, the following results are WRONG:
>>> lea.min_of(die1,die2,fast=True).given(die1 > 4)
1 : 11/36
2 : 9/36
3 : 7/36
4 : 5/36
5 : 3/36
6 : 1/36
>>> P((lea.min_of(die1,die2,fast=True) == 1).given(die1 > 4))
11/36
since each die1 occurrence is considered as an independent event; e.g. in the first expression, the given information die1 > 4 is ignored. To get the right results, simply remove the fast argument or set it to False.
To summarize, use fast=True for best performance, but take care to use it only in simple expressions not requiring referential consistency, i.e. without other occurrences of inner arguments.
Indexing and slicing
When working with Python's strings, tuples or lists, it is common to extract subparts of these objects by using indexing or slicing operations (i.e. the bracket syntax). When such an operation is applied to a Lea instance, it is propagated to the inner values, passing the given index or slice. The result is a new probability distribution containing the indexed or sliced values.
To give an example, consider the following micro-database of Lord of the Rings characters:
>>> character = lea.vals(('Gimli','dwarf'),('Bilbo','hobbit'),('Frodo','hobbit'),('Meriadoc','hobbit'),
...                      ('Pippin','hobbit'),('Gandalf','wizard'),('Boromir','man'),('Faramir','man'),prob_type='r')
>>> character
('Bilbo'   , 'hobbit') : 1/8
('Boromir' , 'man'   ) : 1/8
('Faramir' , 'man'   ) : 1/8
('Frodo'   , 'hobbit') : 1/8
('Gandalf' , 'wizard') : 1/8
('Gimli'   , 'dwarf' ) : 1/8
('Meriadoc', 'hobbit') : 1/8
('Pippin'  , 'hobbit') : 1/8
We can extract the distributions of names and races as follows (remember that the first item has index 0 in Python):
>>> name = character[0]
>>> name
Bilbo    : 1/8
Boromir  : 1/8
Faramir  : 1/8
Frodo    : 1/8
Gandalf  : 1/8
Gimli    : 1/8
Meriadoc : 1/8
Pippin   : 1/8
>>> race = character[1]
>>> race
dwarf  : 1/8
hobbit : 4/8
man    : 2/8
wizard : 1/8
Other examples (aka "SQL, my precious!"):
- What is the distribution of the last letter of names?
>>> name[-1]
c : 1/8
f : 1/8
i : 1/8
n : 1/8
o : 2/8
r : 2/8
- Who are the hobbits? (note how name and race are interdependent variables!)
>>> name.given(race == 'hobbit')
Bilbo    : 1/4
Frodo    : 1/4
Meriadoc : 1/4
Pippin   : 1/4
- What is the race of those having a name ending with "ir"?
>>> race.given(name[-2:] == 'ir')
man : 1
- Who is who?
>>> name + ' is a ' + race + '.'
Bilbo is a hobbit.    : 1/8
Boromir is a man.     : 1/8
Faramir is a man.     : 1/8
Frodo is a hobbit.    : 1/8
Gandalf is a wizard.  : 1/8
Gimli is a dwarf.     : 1/8
Meriadoc is a hobbit. : 1/8
Pippin is a hobbit.   : 1/8
- Who is the Chosen One?
>>> name.random() + ' is the Chosen One.'
'Frodo is the Chosen One.'
Note that the character distribution is nothing else than a joint probability distribution (JPD). The indexing technique demonstrated here can also be used to extract specific attributes from the inner tuples. Revisiting the example given in "Defining joint probability distributions", you could skip the definition of the jpd variable built with the as_joint method and do instead:
toothache = _jpd[0]
catch = _jpd[1]
cavity = _jpd[2]
These three variables are then interdependent; they can be used as demonstrated in "Defining joint probability distributions".
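For a tuple-based pmf, this indexing operation boils down to a marginalization: the probabilities of tuples sharing the same indexed component are added up. A stand-alone sketch (index_pmf is a made-up helper, not Lea's API):

```python
from fractions import Fraction

characters = {
    ('Gimli', 'dwarf'): Fraction(1, 8), ('Bilbo', 'hobbit'): Fraction(1, 8),
    ('Frodo', 'hobbit'): Fraction(1, 8), ('Meriadoc', 'hobbit'): Fraction(1, 8),
    ('Pippin', 'hobbit'): Fraction(1, 8), ('Gandalf', 'wizard'): Fraction(1, 8),
    ('Boromir', 'man'): Fraction(1, 8), ('Faramir', 'man'): Fraction(1, 8),
}

def index_pmf(pmf, i):
    """Propagate an indexing operation to the inner values of a pmf."""
    result = {}
    for v, p in pmf.items():
        result[v[i]] = result.get(v[i], Fraction(0)) + p
    return result

race = index_pmf(characters, 1)
print(race)   # hobbit gets 4/8, man gets 2/8, dwarf and wizard 1/8 each
```

Contrary to Lea, the marginals obtained this way are plain dictionaries: they lose the interdependence with the original distribution (no referential consistency), so queries like name.given(race == 'hobbit') are out of reach of this sketch.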
Object attributes and methods
The micro-database example above can be redone using user-defined objects instead of tuples.
class C(object):

    def __init__(self,name,race):
        self.name = name
        self.race = race

    def __str__(self):
        return "%s (%s)" % (self.name,self.race)

    def health_points(self):
        return 1 + hash(self.name+self.race) % 10

character2 = lea.vals(C('Gimli','dwarf'),C('Bilbo','hobbit'),C('Frodo','hobbit'),C('Meriadoc','hobbit'),
                      C('Pippin','hobbit'),C('Gandalf','wizard'),C('Boromir','man'),C('Faramir','man'),prob_type='r')
You get here a distribution having C objects as domain, instead of tuples.
>>> character2
Faramir (man)     : 1/8
Meriadoc (hobbit) : 1/8
Gimli (dwarf)     : 1/8
Pippin (hobbit)   : 1/8
Gandalf (wizard)  : 1/8
Bilbo (hobbit)    : 1/8
Boromir (man)     : 1/8
Frodo (hobbit)    : 1/8
Then, attributes can be extracted using the dot notation directly on the probability distribution.
>>> character2.name
Bilbo    : 1/8
Boromir  : 1/8
Faramir  : 1/8
Frodo    : 1/8
Gandalf  : 1/8
Gimli    : 1/8
Meriadoc : 1/8
Pippin   : 1/8
>>> character2.race
dwarf  : 1/8
hobbit : 4/8
man    : 2/8
wizard : 1/8
Also, method invocation is feasible; this returns a new probability distribution with the returned results.
>>> character2.health_points()
1 : 1/8
3 : 1/8
7 : 2/8
8 : 3/8
9 : 1/8
>>> character2.name + ', with HP = ' + character2.health_points().map(str)
Bilbo, with HP = 10   : 1/8
Boromir, with HP = 4  : 1/8
Faramir, with HP = 1  : 1/8
Frodo, with HP = 5    : 1/8
Gandalf, with HP = 10 : 1/8
Gimli, with HP = 8    : 1/8
Meriadoc, with HP = 6 : 1/8
Pippin, with HP = 9   : 1/8
Now, the most watchful readers may have noticed an issue with this simple syntax. How can Lea distinguish between its own attributes/methods (like mean, stdev, given, etc.) and those to be relayed to the inner objects (like name, race, etc.)? Well, the rule is simple: if the attribute/method name is unknown in the Lea API, then it is propagated to the inner objects. Of course, this may cause worries in the case of homonyms, i.e. when the inner objects have, by mischance, an attribute name defined in the Lea API; such an attribute is not accessible with the simple technique, since it is "hidden" by the Lea API. Should such a situation occur, there is a trick using Lea's map method and Python's getattr function. Assume for example that the class C above has an attribute called mean. Here is how to get the distribution of this attribute:
>>> character2.map(getattr,('mean',))
The ability to easily relay methods to the inner objects of a given probability distribution can be used with built-in Python objects. The string upper method provides a trivial example:
>>> lea.set_prob_type('r')
>>> lea.vals(*"alea").upper()
A : 2/4
E : 1/4
L : 1/4
For a more useful example, suppose you throw 3 fair dice and you want to know the probability of getting at least two 6's. We have seen how to use the draw method to model the three throws:
>>> die = lea.interval(1,6)
>>> three_dice = die.draw(3,replacement=True)
three_dice contains 6x6x6 = 216 tuples, each containing 3 elements. To get the result, you can directly use the count method defined on Python's tuples:
>>> count_6 = three_dice.count(6)
>>> count_6
0 : 125/216
1 : 75/216
2 : 15/216
3 : 1/216
>>> P(count_6 >= 2)
2/27
Once you are familiar with all these techniques, you can get such a result in a one-liner:
>>> P(lea.interval(1,6).draw(3,replacement=True).count(6) >= 2)
2/27
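The 2/27 figure can be double-checked by brute-force enumeration of the 216 equiprobable outcomes, with the standard library only:

```python
from fractions import Fraction
from itertools import product

# probability of getting at least two 6's among three fair dice
p_two_sixes = sum(Fraction(1, 216)
                  for throw in product(range(1, 7), repeat=3)
                  if throw.count(6) >= 2)
print(p_two_sixes)   # 2/27
```

There are 3×5 = 15 outcomes with exactly two 6's and one with three, hence 16/216 = 2/27.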
Reducing sequences of probability distributions (reduce_all method)
Lea instances can be put in a Python container type, like a list or a tuple. Then, these can be "reduced" using a given 2-ary function to produce a new Lea instance. Consider for example the following tuple containing three dice with 2, 4 and 6 faces:
>>> dice = tuple(lea.interval(1,n,prob_type='r') for n in (2,4,6))
>>> dice
(1 : 1/2 2 : 1/2, 1 : 1/4 2 : 1/4 3 : 1/4 4 : 1/4, 1 : 1/6 2 : 1/6 3 : 1/6 4 : 1/6 5 : 1/6 6 : 1/6)
The display is not very clean because the tuple's display format mixes with Lea's. Now, suppose we throw the three dice together. To get the distribution of the sum of their values, the standard Python sum function can be used:
>>> sum(dice)
3 : 1/48
4 : 3/48
5 : 5/48
6 : 7/48
7 : 8/48
8 : 8/48
9 : 7/48
10 : 5/48
11 : 3/48
12 : 1/48
Fine… but what if we want to get the product of the values of the three dice? There is no built-in function for this. You could do the job with a loop or with the reduce function, which is a built-in in Python 2 only (it moved to functools in Python 3). To meet this need, Lea provides a dedicated function, reduce_all, which combines the elements of an iterable of Lea instances using a given 2-ary function.
>>> import operator
>>> Lea.reduce_all(operator.mul,dice)
1 : 1/48
2 : 3/48
3 : 2/48
4 : 5/48
5 : 1/48
6 : 5/48
8 : 5/48
9 : 1/48
10 : 2/48
12 : 6/48
15 : 1/48
16 : 3/48
18 : 2/48
20 : 2/48
24 : 4/48
30 : 1/48
32 : 1/48
36 : 1/48
40 : 1/48
48 : 1/48
The first argument of reduce_all is typically a standard 2-ary operator, but any user-defined 2-ary function is valid too, provided that it applies to the values of the Lea instances present in the sequence.
Note that reduce_all admits the optional argument absorber (default: None): you can specify the absorber value of the given function; for instance, we could set absorber=0 for the multiplication case above. The effect is that it activates a more efficient algorithm, able to stop the combinatorial enumeration as soon as the absorber is met. This can accelerate things dramatically. For instance, the following calculation takes several seconds:
>>> z = tuple(lea.bernoulli('1/4') for _ in range(16))
>>> Lea.reduce_all(operator.mul,z)
0 : 65535/65536
1 : 1/65536
In contrast, if you take care to specify the absorber value:
>>> Lea.reduce_all(operator.mul,z,absorber=0)
then you get the same result in the blink of an eye! Actually, the absorber is super-efficient on this particular example because there are a lot of zeroes: among the theoretical 2**16 = 65536 cases, it is able to prune everything except 2 cases.
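The effect of the absorber can be sketched in plain Python: when combining two pmfs pairwise, a branch already equal to the absorber needs no further enumeration. The helpers below are illustrative only, and use Bernoulli variables with p = 1/2, an assumption made just for this sketch:

```python
from fractions import Fraction
from functools import reduce
import operator

def combine(op, pmf1, pmf2, absorber=None):
    """pmf of op(x,y) for independent x ~ pmf1 and y ~ pmf2."""
    result = {}
    for v1, p1 in pmf1.items():
        if absorber is not None and v1 == absorber:
            # op(absorber, y) == absorber whatever y: no need to browse pmf2
            result[absorber] = result.get(absorber, Fraction(0)) + p1
            continue
        for v2, p2 in pmf2.items():
            v = op(v1, v2)
            result[v] = result.get(v, Fraction(0)) + p1 * p2
    return result

def reduce_all_pmf(op, pmfs, absorber=None):
    return reduce(lambda a, b: combine(op, a, b, absorber), pmfs)

bern = {0: Fraction(1, 2), 1: Fraction(1, 2)}
prod = reduce_all_pmf(operator.mul, [bern] * 16, absorber=0)
print(prod[1])   # 1/65536
```

Without the absorber argument, every intermediate zero value is still combined with all values of the next pmf; with it, the zero branch is carried over in one step, which is where the speed-up comes from.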
Advanced calculation options (calc method)
Suppose that you have defined a Lea instance x with several dependencies on other instances. Remember that Lea works with lazy evaluation: when you execute statements like print(x), print(x.p(4)) or avg = x.mean, a calculation occurs behind the scenes to evaluate the pmf of x, before doing the specific required action. This calculation is done implicitly, in order to ease your life. You can however ask to perform this calculation explicitly at any time by making this call:
x.calc()
This performs the required calculations and returns a new instance with the calculated pmf. This instance is independent of any other variables, so this does the same as x.new(). As such, this call is not very useful, except if you want to control finely when the calculation is done. Now, the real interest is that calc has several optional arguments that let you customize the calculation algorithm. Before describing these arguments, beware that they are meant for "advanced" use: you have to handle them with care if you want to avoid any misinterpretation of the delivered results.
The full signature of the calc
method is:
calc(prob_type=1,sorting=True,normalization=True,bindings=None,memoization=True, algo=EXACT,nb_samples=None,nb_subsamples=None,nb_tries=None,exact_vars=None)
The first two arguments (prob_type and sorting) are described in the section dedicated to advanced constructor arguments. The three following arguments (normalization, bindings and memoization) are described below. Note that normalization has been presented earlier, but it requires special coverage here. The last five arguments (algo, nb_samples, nb_subsamples, nb_tries and exact_vars) are covered in the section dedicated to Monte-Carlo algorithms.
Avoiding normalization
The optional normalization argument has the same purpose as the normalization argument of the pmf method. By calling calc(normalization=False), the probabilities that are collected are NOT divided by their sum. This means that you can get a "pseudo" probability distribution, for which the probability sum is different from 1. This occurs in particular when you deal with conditional probabilities (given method): the probability total you get is less than 1; the missing denominator is actually the probability of the given condition. In mathematical terms: instead of getting P(A|B) = P(A&B)/P(B), you get P(A&B).
This feature may seem uninteresting but, in some cases, it can be used to get explanations of calculated probabilities. Consider the BN example given earlier. Imagine you want to know the probability that BOTH John AND Mary call. This is easy to calculate:
>>> P(john_calls & mary_calls)
0.002084100239
However, it's difficult to explain this calculation or to redo it manually. We have seen that a way to understand such a result is to display the 32 atomic events (or "possible worlds") by using the joint method:
lea.joint(earthquake,burglary,alarm,john_calls,mary_calls)
From this full joint probability distribution, you can add the 8 probabilities where the last two variables are true; you should then obtain the same value as above. Since this treatment is cumbersome, you can use the given method to make the filtering for you:
>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given(john_calls & mary_calls)
(False, False, False, True, True) : 0.23895323731595242
(False, False, True , True, True) : 0.30138246147957953
(False, True , False, True, True) : 1.4365911696438337e-05
(False, True , True , True, True) : 0.28358309688769245
(True , False, False, True, True) : 0.00034033391807504135
(True , False, True , True, True) : 0.17515213192200016
(True , True , False, True, True) : 2.3991168497726016e-08
(True , True , True , True, True) : 0.0005743485738355601
Unfortunately, the probabilities are not the same as in the previous joint table: the new joint table has been normalized to balance the entries that have been filtered out. Here comes the normalization=False trick:
>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given(john_calls & mary_calls).calc(normalization=False)
(False, False, False, True, True) : 0.0004980024990000001
(False, False, True , True, True) : 0.0006281112599999999
(False, True , False, True, True) : 2.994000000000003e-08
(False, True , True , True, True) : 0.0005910156
(True , False, False, True, True) : 7.0929e-07
(True , False, True , True, True) : 0.0003650346
(True , True , False, True, True) : 5.0000000000000054e-11
(True , True , True , True, True) : 1.197e-06
What you get is NOT a true probability distribution; it's just a truncated probability distribution! If you now want to sum the 8 probabilities, simply append .p_sum:
>>> lea.joint(earthquake,burglary,alarm,john_calls,mary_calls).given(john_calls & mary_calls).calc(normalization=False).p_sum
0.0020841002389999997
This sum is the probability that John and Mary call together, as calculated before. So, to summarize, what we have done here is mimic the calculations made internally by Lea's algorithm.
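The difference between the normalized and unnormalized results can be reproduced on a small stand-alone example, using two fair dice rather than the alarm network; given_pmf is a made-up helper, not Lea's implementation:

```python
from fractions import Fraction
from itertools import product

# joint pmf of two fair dice
joint = {pair: Fraction(1, 36) for pair in product(range(1, 7), repeat=2)}

def given_pmf(pmf, cond, normalization=True):
    """Keep the values satisfying cond; optionally skip the renormalization."""
    filtered = {v: p for (v, p) in pmf.items() if cond(v)}
    if not normalization:
        return filtered        # "pseudo" pmf: probabilities sum to P(cond)
    p_sum = sum(filtered.values())
    return {v: p / p_sum for (v, p) in filtered.items()}

pseudo = given_pmf(joint, lambda v: v[0] >= 5 and v[1] >= 5, normalization=False)
print(sum(pseudo.values()))    # 1/9: the probability of the condition itself
```

As in Lea, the sum of the truncated probabilities is exactly the probability of the evidence, while the normalized version sums to 1.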
Making explicit bindings
In many cases, the evidence passed to the given method consists of equalities. These are sometimes called "observations": variable x has been observed to have a given value u, variable y has been observed to have a given value v, etc. For example, consider this situation, which shows the probability distribution of the sum of two dice given the observation of one of these dice:
>>> d1 = lea.interval(1,6)
>>> d2 = lea.interval(1,6)
>>> d = d1 + d2
>>> d.given(d1 == 1)
2 : 0.16666666666666663
3 : 0.16666666666666663
4 : 0.16666666666666663
5 : 0.16666666666666663
6 : 0.16666666666666663
7 : 0.16666666666666663
Actually, since the Lea algorithm is designed to cover any evidence condition, it enumerates all 36 possible combinations of values of (d1,d2) and evaluates, for each of them, whether the condition d1 == 1 is true or false. This works fine, but there is some waste of CPU in doing so, especially when the number of observed variables gets big (remember also that their domains can be non-boolean, with an unlimited number of elements). Fortunately, this waste can be avoided for elementary variables (those defined by a known prior pmf, like d1 and d2). The technique is called explicit bindings: it consists in specifying, in the bindings argument, a dictionary associating the observed variables with their values. This is demonstrated in the following expression:
>>> d.calc(bindings={d1:1})
This returns the same result as previously, but by browsing only 6 cases, which are the possible values of d2. Note that the specified binding holds for the query only, so d1 is unbound as soon as the result has been calculated. A more extreme example is to bind all elementary variables:
>>> d.calc(bindings={d1:1,d2:4})
5 : 1.0
In such a case, the enumeration is limited to one single case.
This technique can be efficient when the number of cases to examine is large. Remember however that it can be used only for elementary variables; an exception is raised if you try to bind a derived variable like d.
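The saving can be illustrated with a stand-alone enumeration: binding a variable simply removes it from the browsed cases. Everything below (sum_pmf, the bindings dictionary keyed by strings) is hypothetical, a toy model of what calc(bindings=...) does:

```python
from fractions import Fraction
from itertools import product

def sum_pmf(bindings=None):
    """pmf of d1+d2 for two fair dice; bound variables are not enumerated."""
    bindings = bindings or {}
    d1_values = (bindings['d1'],) if 'd1' in bindings else range(1, 7)
    d2_values = (bindings['d2'],) if 'd2' in bindings else range(1, 7)
    cases = list(product(d1_values, d2_values))   # 36, 6 or 1 cases browsed
    pmf = {}
    for (d1, d2) in cases:
        pmf[d1 + d2] = pmf.get(d1 + d2, Fraction(0)) + Fraction(1, len(cases))
    return pmf

print(sum_pmf({'d1': 1}))            # 6 cases only: sums 2..7, each 1/6
print(sum_pmf({'d1': 1, 'd2': 4}))   # a single case: {5: Fraction(1, 1)}
```

The general given mechanism, in contrast, would evaluate the equality condition over all 36 cases before filtering.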
Disabling memoization
TO BE WRITTEN
Information Theory
Lea allows calculating several measures defined by information theory. These could provide new insights on your probability distributions.
Information
The most basic concept is the information carried by a given event, expressed in bits. As a basic example, consider two coins, a fair one and an unfair one:
>>> flip = lea.vals('head','tail')
>>> flip_u = lea.pmf({'head': 1/4, 'tail': 3/4})
Here is how to get the information carried by each possible result after flipping the fair coin:
>>> flip.information_of('head')
1.0
>>> flip.information_of('tail')
1.0
Each result carries an information of 1 bit, as expected by the theory. Now, here are the information values for the unfair coin:
>>> flip_u.information_of('head')
2.0
>>> flip_u.information_of('tail')
0.4150374992788437
We see that the information carried by a 'head' result is higher (2 bits) than the information carried by 'tail'. This is sensible since a 'head' is less likely to occur (prob. 1/4) than a 'tail' (prob. 3/4), hence it carries more information.
For boolean probability distributions, note that .information_of(True) can be replaced by .information, which is a convenient shortcut. Example:
>>> rain = lea.event(1/8)
>>> rain.information
3.0
>>> (flip_u=='head').information
2.0
Entropy
If you take the average information carried by all events modeled by a given probability distribution, using the probabilities of these events as weights, you get the entropy of this distribution. Entropy is one of the most important concepts of information theory. It captures the "degree of randomness" of a given probability distribution.
Here is how to get the entropy of each coin defined above:
>>> flip.entropy
1.0
>>> flip_u.entropy
0.8112781244591328
The entropy of the fair coin is 1 bit, which is not surprising since we have seen that each possible outcome brings 1 bit of information. The entropy of the unfair coin is about 0.81 bits; this is less than 1 bit, which is consistent with the fact that the unfair coin is less random than the fair one. Note that the theory states that 1 bit is the maximum entropy for a distribution with two values; it happens when the two values are equally likely.
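Both measures follow directly from their textbook definitions and can be checked with math.log2; the function names below are made up, merely mirroring Lea's information_of and entropy:

```python
from math import log2

def information_of(pmf, v):
    """Information, in bits, carried by the observation of value v."""
    return -log2(pmf[v])

def entropy(pmf):
    """Average information of the outcomes, weighted by their probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p)

flip = {'head': 0.5, 'tail': 0.5}
flip_u = {'head': 0.25, 'tail': 0.75}
print(information_of(flip_u, 'head'))   # 2.0
print(entropy(flip))                    # 1.0
print(entropy(flip_u))                  # ~0.811 bits
```

Note that entropy skips zero-probability values, consistent with the convention that 0·log2(0) = 0.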
Entropy and conditional probabilities
The information and entropy can be calculated on any Lea instance, whether defined explicitly as above or resulting from calculations, as explained throughout the tutorials. In particular, these indicators can be calculated on conditional probability distributions.
Let us take an example. There are 64 balls in an urn: 62 of them are blue and marked with an "x"; the 2 others are red, one marked with an "x", the other with a "y". Here is a short model of the distribution of these balls:
>>> ball = lea.pmf({ 'Bx': 62, 'Rx': 1, 'Ry': 1 },prob_type='r')
>>> ball
Bx : 62/64
Rx : 1/64
Ry : 1/64
Note that it's easy to extract the color code or the mark, using the expressions ball[0] and ball[1] respectively (see "Indexing and slicing"). Let us create two new variables to ease the work:
>>> color = ball[0]
>>> mark = ball[1]
>>> color
B : 31/32
R : 1/32
>>> mark
x : 63/64
y : 1/64
Let us calculate the entropy of ball:
>>> ball.entropy
0.23187232431271465
It is quite low, since it's almost certain that a drawn ball is blue with an "x" mark.
Suppose now that someone draws a ball and tells us some information on its color or mark (remember that the three variables ball, color and mark are interdependent, due to referential consistency). Let us calculate the new entropy in the following cases:
- What is the entropy if the ball's mark is "x"?
>>> ball.given(mark=='x').entropy
0.11759466565886476
- What is the entropy if the ball's mark is "y"?
>>> ball.given(mark=='y').entropy
0.0
- What is the entropy if the ball's color is blue?
>>> ball.given(color=='B').entropy
0.0
- What is the entropy if the ball's color is red?
>>> ball.given(color=='R').entropy
1.0
We see in the first three cases that the given information lowers the entropy, compared to the initial value of 0.23 bits. In two cases, the entropy becomes null: this happens because the new information dissolves any uncertainty on the mark and color. In the last case however, the information that the ball is red increases the entropy to 1 bit. This value is consistent with the uncertainty remaining on the mark of the ball, which is equally likely to be "x" or "y". It may sound odd that new information makes the entropy increase. Actually, there's nothing wrong here, it's simply unusual: the initial distribution is so unbalanced that the reception of unlikely information ("the ball is red") can undermine the strong prior beliefs we had.
Mutual information, joint entropy and conditional entropy
When dealing with two dependent random variables, it's interesting to know the amount of information they share. This is the purpose of the mutual information, aka transinformation. The examples below use the `ball`, `color` and `mark` interdependent distributions of our last case.
```
>>> lea.mutual_information(color,mark)
0.08486507530476972
>>> lea.mutual_information(color,ball)
0.20062232431271465
>>> lea.mutual_information(mark,ball)
0.11611507530476972
```
Note that the mutual information is expressed in bits.
As a stupid example, we can verify that our ball lottery and coin flipping do not share anything:
```
>>> lea.mutual_information(ball,flip)
0.0
```
Note that the mutual information is symmetric, so the function call is unaffected by the order of its arguments.
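You can cross-check these values through the classical identity I(X;Y) = H(X) + H(Y) - H(X,Y), again with the standard library only:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# the joint distribution of (color, mark) is the ball distribution: Bx, Rx, Ry
h_joint = entropy([62/64, 1/64, 1/64])
h_color = entropy([62/64, 2/64])       # {B: 31/32, R: 1/32}
h_mark  = entropy([63/64, 1/64])       # {x: 63/64, y: 1/64}

# mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y)
mi = h_color + h_mark - h_joint        # ~0.08487 bits, as given by Lea
```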
Another interesting measure is the joint entropy of a set of random variables. It gives the entropy of the joint of these random variables, taking into account possible interdependence between them. Here are two basic examples.
```
>>> lea.joint_entropy(mark,color)
0.23187232431271465
>>> lea.joint_entropy(mark,flip)
1.1161150753047697
```
The first result shows that the joint entropy of `mark` and `color` is the same as the basic entropy of `ball`; this is not surprising: a given ball is completely described by its mark and color, so it carries exactly the same level of information as the pair of these two characteristics. In the second example, we join `mark` and `flip`; since these are independent events, we get the sum of the individual entropies (0.1161 bits + 1.0 bits). You can specify any number of arguments in `lea.joint_entropy`. Note that if you specify only one argument, then you get the basic entropy of this argument.
Another measure is the conditional entropy: what entropy remains on x once y is known? This is also known as the equivocation of x about y. Here are two basic examples of usage:
```
>>> ball.cond_entropy(mark)
0.11575724900794494
>>> mark.cond_entropy(ball)
0.0
```
The first result shows that the (unconditional) entropy of ball, previously calculated as 0.2319 bits, decreases to 0.1158 bits given that we know the mark on the ball. The second result shows that the entropy of the mark is null given that we know the drawn ball: this is consistent since there is no uncertainty on the mark given a ball. Note that specifying `True` (or any relationship that is certainly true) as argument of `cond_entropy` boils down to calculating the basic entropy.
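The conditional entropy obeys the chain rule H(X|Y) = H(X,Y) - H(Y), which gives a quick way to verify the first result above; since a ball fully determines its mark, H(ball, mark) = H(ball):

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

h_ball = entropy([62/64, 1/64, 1/64])  # = H(ball, mark), as ball determines mark
h_mark = entropy([63/64, 1/64])

# chain rule: H(ball | mark) = H(ball, mark) - H(mark)
cond = h_ball - h_mark                 # ~0.1158 bits, as ball.cond_entropy(mark)
```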
Cross entropy and Kullback–Leibler divergence
WARNING: requires Lea 3.2 at least
To complete the topic of entropy-related measures in Lea, we shall introduce two more indicators that are especially useful for machine learning: the cross entropy and the Kullback–Leibler divergence (also known as relative entropy). Both indicators compare two probability distributions having the same support, by providing a measure of how far one fits the other. Assuming that d is the actual observed distribution and e is an estimated distribution, these are measures of how well e fits d.
Basically, the cross entropy of d with e averages the information given by e, using the probabilities of d as weights. For example, we could examine how given probability distributions fit the `color` distribution defined above ({B: 31/32, R: 1/32}):
```
>>> estimated_color1 = lea.pmf({ 'B': 20, 'R': 12 },prob_type='r')
>>> color.cross_entropy(estimated_color1)
0.7011020799303316
>>> estimated_color2 = lea.pmf({ 'B': 30, 'R': 2 },prob_type='r')
>>> color.cross_entropy(estimated_color2)
0.2151997355042477
```
Note that the cross entropy is not symmetric, i.e. the cross entropy of d with e is generally not the same as the cross entropy of e with d. When e is exactly the same as d, the cross entropy is minimal and equals the entropy of d. The cross entropy can therefore be seen as a generalization of the entropy, since the entropy of a given probability distribution is the same as the cross entropy of this distribution with itself.
```
>>> color.cross_entropy(color)
0.20062232431271465
>>> color.entropy
0.20062232431271465
```
Since the cross entropy tends to the entropy as the estimated distribution gets closer to its target, it's tempting to calculate the difference between these two indicators. This is precisely what the Kullback–Leibler divergence does; it is accessible in Lea through the `kl_divergence` method.
```
>>> color.kl_divergence(estimated_color1)
0.5004797556176169
>>> estimated_color2 = lea.pmf({ 'B': 30, 'R': 2 },prob_type='r')
>>> color.kl_divergence(estimated_color2)
0.014577411191533052
```
We see that the KL divergence (aka relative entropy) offers a more objective measure of fitness than the cross entropy, since it tends towards 0 as the estimation gets closer to its target, whatever the entropy of the target.
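Both indicators are easy to recompute from their definitions: the cross entropy of d with e is -Σ d(v)·log2 e(v), and the KL divergence is that cross entropy minus the entropy of d. A standard-library check on the color example:

```python
from math import log2

# cross entropy (in bits) of actual distribution d with estimated distribution e
def cross_entropy(d, e):
    return -sum(pd * log2(pe) for pd, pe in zip(d, e) if pd > 0)

color = [31/32, 1/32]                  # actual distribution {B, R}
est2  = [30/32, 2/32]                  # estimated_color2

ce = cross_entropy(color, est2)        # ~0.21520 bits
kl = ce - cross_entropy(color, color)  # KL = cross entropy - entropy of d
```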
For more practical examples of usage, you may have a look at the section dedicated to Machine Learning using the EM algorithm, case 3.
Likelihood ratio (LR)
The likelihood ratio (LR) of evidence E given a hypothesis H is defined as

```
     P(E | H)
LR = ------------
     P(E | not H)
```
It indicates how far the given evidence E confirms the given hypothesis H. It is traditionally used in medical research and forensic sciences.
As a toy problem, imagine that a die has been thrown but we have very limited ways to gather information on the result. Let us assume that our hypothesis is "the die result is even". After investigations, we get one piece of evidence: the result is not 1. What is the LR of this evidence for our hypothesis? The `lea.lr(E,H)` syntax makes this calculation:
```
>>> die = lea.interval(1,6)
>>> parity = die.map(lambda x: 'even' if x%2==0 else 'odd')
>>> lea.lr(die != 1, parity == 'even')
1.5
```
You can check this result by translating the LR definition above into other Lea primitives:
```
>>> P((die != 1).given(parity == 'even')) / P((die != 1).given(parity != 'even'))
1.5
```
Since LR > 1, the given evidence is indeed somehow relevant for the hypothesis (as our intuition suggests!).
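Since the die is uniform, the LR can also be verified by direct counting over the six outcomes, using exact fractions:

```python
from fractions import Fraction

# E = (die != 1), H = (die result is even), die uniform on 1..6
evens = [2, 4, 6]
odds  = [1, 3, 5]
p_e_given_h     = Fraction(sum(1 for d in evens if d != 1), len(evens))  # 3/3 = 1
p_e_given_not_h = Fraction(sum(1 for d in odds if d != 1), len(odds))    # 2/3
lr = p_e_given_h / p_e_given_not_h     # 3/2, i.e. 1.5
```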
Notes: the following rules are enforced, in this sequence:

- if E is impossible, then an exception is raised (LR = 0/0)
- if H is impossible, then an exception is raised (P(E | H) undefined)
- if H is certain, then an exception is raised (P(E | not H) undefined)
- if H is impossible given E, then LR = 0.0
- if H is certain given E, then LR = +infinity (Python's `math.inf`)
Examples:
```
>>> lea.lr(die == 3, parity == 'even')
0.0
>>> lea.lr(die == 2, parity == 'even')
inf
```
There is another syntax to calculate the LR: `E.given(H).lr()`; this flavor emphasizes the numerator of the LR expression and provides the same result as the syntax seen above. It may be handier to express a hypothesis that is a conjunction of several sub-hypotheses, which can be put as arguments of `given(…)`:
```
>>> (die >= 2).given(parity == 'even').lr()
1.5
>>> (die >= 2).given(parity == 'even', die <= 5).lr()
1.3333333333333333
```
Symbolic computation with probabilities
We have seen that Lea can handle probabilities expressed as variable names (p, q, …), provided that the SymPy library is installed. This approach is commonly referred to as symbolic computation. All the techniques for probabilistic calculation seen above in the tutorial can be revisited with probability variables in place of probability figures.
Such an approach is useful when the same query is made over and over on the same complex model, with only the prior probability values varying. In such a case, the query result may be prepared "offline" by Lea, once and for all, as an arithmetic expression; this may take a long processing time. Then, this expression can be copy-pasted into a dedicated program, to be evaluated many times using fast arithmetic computations with different input probability values. This dedicated program could be written in any programming language: Python, C, C++, Java, etc., without the need of any probability library.
Let's start with basic examples: let p be the probability of getting a head when flipping a given coin and q the probability that it shall rain tomorrow:
```
>>> flip = lea.bernoulli('p')
>>> rain = lea.event('q')
>>> flip
1 : p
0 : -p + 1
>>> rain
True  : q
False : -q + 1
```
We can now combine these two independent variables to calculate probabilities of events defined by conjunction or disjunction:
```
>>> (flip == 0) & rain
False : p*q - q + 1
True  : -q*(p - 1)
>>> (flip == 0) | rain
False : -p*(q - 1)
True  : p*q - p + 1
```
Since these probability distributions have a Boolean domain, we can get the probability of `True` by using the `P` function.
```
>>> P = lea.P
>>> P((flip == 0) | rain)
p*q - p + 1
```
The object returned by `P` is a SymPy expression (more precisely, an instance of some subclass of SymPy's `Expr`). You can then apply methods provided by SymPy, like substituting some variable(s) using the `subs` method:
```
>>> P((flip == 0) | rain).subs('p',0.5)
0.5*q + 0.5
>>> P((flip == 0) | rain).subs((('p',0.5),('q',0.2)))
0.600000000000000
>>> P((flip == 0) | rain).subs('p','q')
q**2 - q + 1
```
Besides booleans, we can also calculate probability distributions.
```
>>> flip - lea.bernoulli('q')
-1 : -q*(p - 1)
 0 : 2*p*q - p - q + 1
 1 : -p*(q - 1)
>>> flip4 = flip.times(4)
>>> flip4
0 : (p - 1)**4
1 : -4*p*(p - 1)**3
2 : 6*p**2*(p - 1)**2
3 : -4*p**3*(p - 1)
4 : p**4
```
Remember that the `flip4` probability distribution is also known as the binomial distribution, which you can build using a dedicated function:

```
>>> flip4 = lea.binom(4,'p')
```
You can then evaluate the probability of some conditions using the usual Lea syntax:
```
>>> P((1 <= flip4) & (flip4 < 4))
-2*p*(p - 1)*(p**2 - p + 2)
>>> P(flip4.is_any_of(1,2,3))
-2*p*(p - 1)*(p**2 - p + 2)
```
and obtain standard indicators:
```
>>> flip4.mean
4*p
>>> flip4.var
-4*p*(p - 1)
>>> flip4.entropy
(-p**4*log(p**4) + 4*p**3*(p - 1)*log(-4*p**3*(p - 1)) - 6*p**2*(p - 1)**2*log(6*p**2*(p - 1)**2) + 4*p*(p - 1)**3*log(-4*p*(p - 1)**3) - (p - 1)**4*log((p - 1)**4))/log(2)
```
Note that all these expressions are calculated using the same algorithms as those used for usual numerical probabilities. Lea just calls the `sympy.factor` function as a post-processing step, in order to simplify the resulting expressions as much as possible. You can notice that the calculated mean `4*p` agrees with the theoretical formula for the mean of the binomial distribution.
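As a numerical sanity check of these symbolic results, you can evaluate the closed-form binomial pmf at a sample value of p and compare the resulting mean and variance with 4*p and -4*p*(p - 1):

```python
from math import comb

p = 0.3                                # any sample value in [0, 1]
# closed-form binomial pmf, as given symbolically for flip4 above
pmf = [comb(4, k) * p**k * (1 - p)**(4 - k) for k in range(5)]

mean = sum(k * q for k, q in enumerate(pmf))              # should equal 4*p
var  = sum((k - mean)**2 * q for k, q in enumerate(pmf))  # should equal 4*p*(1-p)
```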
To make variable substitutions in a probability distribution that is non-boolean, you simply have to call the `subs` method on that distribution: the call is propagated, with the given arguments, to all inner probability expressions:
```
>>> flip4.subs('p',0.25)
0 : 0.316406250000000
1 : 0.421875000000000
2 : 0.210937500000000
3 : 0.0468750000000000
4 : 0.00390625000000000
```
Actually, you can check that this last distribution boils down to the same distribution as

```
>>> lea.binom(4,0.25)
```

The sole difference is that the probability type differs slightly between the two distributions: it's SymPy's `Float` for the former while it's Python's `float` for the latter.
So far, we have defined symbolic probabilities by means of strings of characters. Behind the scenes, these strings are converted to SymPy symbols through the `sympy.symbols` function. In order to have better control over these variables, you could also build these symbols yourself and pass them to Lea constructors in place of strings. Here is an example:
```
>>> import sympy
>>> (p,q) = sympy.symbols('p q')
>>> lea.binom(2,p) + lea.binom(2,q)
0 : (p - 1)**2*(q - 1)**2
1 : -2*(p - 1)*(q - 1)*(2*p*q - p - q)
2 : 6*p**2*q**2 - 6*p**2*q + p**2 - 6*p*q**2 + 4*p*q + q**2
3 : -2*p*q*(2*p*q - p - q)
4 : p**2*q**2
```
So far, we have not yet defined a probability distribution using the `pmf` function. Let's give it a try:
```
>>> lea.pmf({"A": 'pA', "B": 'pB', "C": 'pC'})
A : pA/(pA + pB + pC)
B : pB/(pA + pB + pC)
C : pC/(pA + pB + pC)
```
You can see that the probability variables have been normalized to ensure that the sum equals 1. This is somewhat unpleasant because there are actually 2 degrees of freedom here, not 3. You could then prefer to remove `pC`, as it can be derived from `pA` and `pB`. As explained in the section dedicated to normalization, replacing `pC` by `None` will do the job:
```
>>> lea.pmf({"A": 'pA', "B": 'pB', "C": None})
A : pA
B : pB
C : -pA - pB + 1
```
Note that you could achieve the same result using a more explicit approach:
```
>>> (pA,pB) = sympy.symbols('pA pB')
>>> lea.pmf({"A": pA, "B": pB, "C": 1-(pA+pB)})
```
Symbolic computation can handle more complex problems like conditional probabilities and Bayesian reasoning. Let us revisit our first example of a Bayesian network, replacing all the probability values by variables:
```
burglary = lea.event('a')
earthquake = lea.event('b')
alarm = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event('c'),
                                                (True ,False) : lea.event('d'),
                                                (False,True ) : lea.event('e'),
                                                (False,False) : lea.event('f') })
john_calls = alarm.switch({ True  : lea.event('g'),
                            False : lea.event('h') })
mary_calls = alarm.switch({ True  : lea.event('i'),
                            False : lea.event('j') })
```
We can now query the BN as before, getting formulas instead of numbers:
```
>>> pb = P(burglary.given(mary_calls & john_calls))
>>> pb
a*(b*c*g*i - b*c*h*j - b*d*g*i + b*d*h*j + d*g*i - d*h*j + h*j)/(a*b*c*g*i - a*b*c*h*j - a*b*d*g*i + a*b*d*h*j - a*b*e*g*i + a*b*e*h*j + a*b*f*g*i - a*b*f*h*j + a*d*g*i - a*d*h*j - a*f*g*i + a*f*h*j + b*e*g*i - b*e*h*j - b*f*g*i + b*f*h*j + f*g*i - f*h*j + h*j)
```
To gain confidence in this long formula, you can substitute the variables with the actual values given earlier and check the result:
```
>>> values = (('a',0.001),('b',0.002),('c',0.95),('d',0.94),('e',0.29),('f',0.001),('g',0.9),('h',0.05),('i',0.7),('j',0.01))
>>> pb.subs(values)
0.284171835364393
```
You can similarly check the results provided for the other queries, by simply appending `.subs(values)` to them.
As you would expect from a symbolic computation system, you are free to fix only a subset of the variables: you will then get a new formula with the remaining free variables:
```
>>> pb.subs((('a',0.001),('b',0.002),('c',0.95),('d',0.94),('e',0.29),('f',0.001)))
0.001*(0.94002*g*i + 0.05998*h*j)/(0.002516442*g*i + 0.997483558*h*j)
```
As explained in the introduction, the returned formula could be copy-pasted into a program which is completely unaware of Bayesian networks but fast at number-crunching. Note also that SymPy allows reformatting such formulas into various formats, like LaTeX:
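For instance, the partially substituted formula above can be pasted verbatim into a plain Python function; at evaluation time, neither Lea nor SymPy is needed anymore:

```python
# the formula returned by pb.subs(...) above, pasted as a plain function
def p_burglary(g, h, i, j):
    return 0.001*(0.94002*g*i + 0.05998*h*j) / (0.002516442*g*i + 0.997483558*h*j)

# evaluating it with the remaining values g=0.9, h=0.05, i=0.7, j=0.01
# recovers the full result computed earlier (~0.28417)
result = p_burglary(0.9, 0.05, 0.7, 0.01)
```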
```
>>> print(sympy.latex(pb))
\frac{a \left(b c g i - b c h j - b d g i + b d h j + d g i - d h j + h j\right)}{a b c g i - a b c h j - a b d g i + a b d h j - a b e g i + a b e h j + a b f g i - a b f h j + a d g i - a d h j - a f g i + a f h j + b e g i - b e h j - b f g i + b f h j + f g i - f h j + h j}
```
You can then easily embed nice formulas in your papers.
MonteCarlo sampling
Random generation through MC (`random_mc` method)
We have seen in the first part of the tutorial how to generate random samples with the `random(…)` method. An important point must be understood about the way this method works. Consider a set of 9 different "dice", with numbers of faces ranging from 2 to 10. Let us make 20 throws of these 9 dice and get the sum:
```
dice = tuple(lea.interval(1,n) for n in range(2,11))
sum_dice = sum(dice)
print (sum_dice.random(20))
# -> (32, 26, 33, 31, 28, 39, 32, 31, 38, 26, 26, 33, 33, 27, 26, 27, 33, 28, 47, 25)
```
You will note that the generation of the random sample takes some time. Why? Actually, behind the scenes, the actual probability distribution of `sum_dice` is calculated (a heavy combinatorial process); then, from this resulting distribution, the 20 random samples are drawn. So, what actually happens is equivalent to this:
```
calculated_sum_dice = sum_dice.calc()
print (calculated_sum_dice.random(20))
# -> (32, 26, 33, 31, 28, 39, 32, 31, 38, 26, 26, 33, 33, 27, 26, 27, 33, 28, 47, 25)
```
The heavy calculation is done by the `calc()` call (see below). Note that, because of a cache mechanism, any further call to `sum_dice.random(...)` shall be very fast (as fast as `calculated_sum_dice.random(...)`, actually).
In several cases, using this approach just to get random samples is intractable because of combinatorial explosion. The `random_mc(…)` method proposes an alternative implementation, which tackles this limitation. Let's revisit the same example:
```
sum_dice2 = sum(dice)
sum_dice2.random_mc(20)
# -> (37, 35, 38, 34, 29, 31, 32, 21, 31, 30, 35, 39, 35, 40, 31, 22, 30, 29, 39, 31)
```
This now produces random samples without any latency. How does it work? `random_mc` does not try to calculate the final probability distribution; it just makes a random sample on each elementary distribution, then aggregates the results together. In the present case, for each of the 20 trials, it throws the 9 dice and sums up the results. This is closer to the real process, since it's basically a simulation of this process.
`random_mc` can handle any type of Lea construct, including those specifying a condition:
```
print (sum_dice2.given(sum_dice2 <= 24).random_mc(20))
# -> (24, 18, 18, 20, 21, 23, 24, 21, 23, 24, 22, 22, 24, 24, 24, 23, 24, 21, 22, 24)
```
Take care that these examples do NOT imply that the `random_mc` method is the real McCoy. The `random` and `random_mc` methods actually have complementary usages:

- `random` tends to be better when the end values result from long calculations and/or the combinatorial is tractable and/or the random sample to produce is large;
- `random_mc` tends to be better when the end values result from short calculations and/or the combinatorial is intractable and/or the random sample to produce is small.
Approximate inference by MonteCarlo approaches
The `random_mc(…)` method may serve another purpose than pure random sample generation. It may be used to estimate the resulting probability distribution based on random samples. By default, Lea calculates exact probability distributions; this works very well for all the examples shown in the previous tutorial pages. However, many real-world probabilistic problems have a large sample space and require a prohibitive time to be solved in an exact way. This is especially the case for large or densely connected Bayesian networks. For such intractable problems, Lea provides several Monte-Carlo algorithms that are able to produce approximate results in a reasonable time, with a precision tunable by the number of random samples.
Let's start by using the `random_mc(…)` tool you have just learned. You may estimate the probability distribution of the sum of 9 dice seen above, based on a sample of 10,000 trials:
```
estimated_sum_dice = lea.vals(*sum_dice2.random_mc(10000))
print (estimated_sum_dice)
# -> 13 : 0.0002
#    14 : 0.0002
#    15 : 0.0007
#    16 : 0.0016
#    17 : 0.0017
#    18 : 0.0039
#    19 : 0.007
#    …
```
One of the interests is that you can tune the sample size as you wish, according to the precision you want. Since this approach is very useful, Lea provides a convenience method to do the job more easily, namely `estimate_mc(…)`. The following statement

```
estimated_sum_dice = sum_dice2.estimate_mc(10000)
```

is equivalent to the one shown above.
This method of estimating a probability distribution by MC sampling is very useful in many situations having a large sample space, like querying a complex Bayesian network. The next subsection elaborates on this theme, providing more algorithms and options.
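The principle behind `estimate_mc` is nothing more than frequency counting on MC samples. Here is a standard-library sketch for the 9-dice example, using Python's `random` module in place of Lea's machinery:

```python
import random
from collections import Counter

rnd = random.Random(42)                # fixed seed for reproducibility
n = 10_000

# one MC sample = throw the 9 dice (2 to 10 faces) and sum the results
samples = [sum(rnd.randint(1, faces) for faces in range(2, 11)) for _ in range(n)]

# turn the observed frequencies into an estimated pmf
estimated_pmf = {v: c / n for v, c in sorted(Counter(samples).items())}

# sanity check: the estimated mean should be close to the exact mean,
# sum((1 + f) / 2 for f in 2..10) = 31.5
estimated_mean = sum(v * p for v, p in estimated_pmf.items())
```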
The `calc` method and Monte-Carlo algorithms
The `estimate_mc` seen above is actually a convenience method, which is simple to use. Lea actually provides several probabilistic inference algorithms, which are accessible through the `x.calc(...)` method, where `x` is any Lea instance. Before covering the MC algorithms themselves, some explanations are needed on this `calc` method. Suppose that you have defined a Lea instance `x` with several dependencies to other instances. When you execute statements like `print (x)`, `print (x.p(4))`, `avg = x.mean` or `sample = x.random(1000)`, a calculation occurs behind the scenes to evaluate the pmf of `x`, before doing the specific required action. This calculation is done implicitly, in order to ease your life. You can however ask to perform this calculation explicitly at any time by making this call:
```
x.calc()
```
This performs the required calculations and returns a new instance with the calculated pmf. This instance is independent of any other variables, so this does the same as `x.new()`. As such, this call is not very useful, except if you want to control finely when the calculation is done. Now, the real interest is that `calc` has several optional arguments that let you finely customize the calculation algorithm.
Let us consider the example of the Bayesian network on Wikipedia.
```
from lea import *
rain = event(0.20)
sprinkler = if_(rain, event(0.01),
                      event(0.40))
grass_wet = joint(sprinkler,rain).switch({ (False,False): False,
                                           (False,True ): event(0.80),
                                           (True ,False): event(0.90),
                                           (True ,True ): event(0.99)})
```
We shall define the following query, which represents the conditional probability distribution P(rain | grass wet):

```
query = rain.given(grass_wet)
```
At this stage, nothing is calculated yet. The following three statements are equivalent and produce the exact result:
```
print (P(query))
print (P(query.calc()))
print (P(query.calc(algo=EXACT)))
# -> 0.35768767563227616
```
Now, let's imagine that calculating the exact probability of `query` is intractable (i.e. the calculation takes ages to terminate). As a fallback, you want to obtain an approximate result using a Monte-Carlo algorithm. The easiest option is to use the rejection sampling algorithm. Here is how to proceed for 10,000 samples:
WARNING: requires Lea 3.3 at least
```
print (P(query.calc(algo=MCRS, nb_samples=10000)))
# -> 0.3585
```
Note: this statement is actually equivalent to `P(query.estimate_mc(10000))`, in a more pedantic style.
Actually, rejection sampling has the drawback of being inefficient if the given condition (here, `grass_wet`) has a small probability of occurring: the algorithm spends most of the processing time finding bindings that satisfy the condition. The following statements, with other argument options, provide useful alternatives.
This variant generates 10 subsamples each time a binding satisfying the condition `grass_wet` is found (speeding up the calculation):
```
print (P(query.calc(algo=MCRS, nb_samples=10000, nb_subsamples=10)))
# -> 0.3595
```
This variant uses the likelihood weighting algorithm, using the exact algorithm for the condition (`grass_wet`) and MC for the conditioned part (`rain`):
```
print (P(query.calc(algo=MCLW, nb_subsamples=10000)))
# -> 0.357687675632294
```
Actually, the last statement is a bit absurd for the present query. Since `rain` is referred to in `grass_wet`, it is embedded in the exact calculation, so this calculation boils down to the EXACT algorithm. Setting `nb_subsamples=1` is enough to get the exact result (without the rounding errors caused by the summation of 10,000 probabilities). The likelihood weighting algorithm would rather make sense for queries like:
```
print (P(grass_wet.given(rain).calc(algo=lea.MCLW, nb_subsamples=10000)))
# -> 0.8066000000000276
```
What happens here is that the MC is performed in two parts: 1° random samples of `grass_wet` with fixed rain=True (probability weight 0.2), 2° random samples of `grass_wet` with fixed rain=False (probability weight 0.8).
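The likelihood weighting idea can be sketched from scratch for P(grass_wet | rain) with the standard library (a simplified illustration, not Lea's actual code): the bindings of the condition are enumerated exactly with their probability weights, and only the conditioned part is sampled:

```python
import random

rnd = random.Random(7)
nb_subsamples = 10_000

# exact pass over the condition: enumerate the bindings of rain with their
# probabilities; only the binding rain=True satisfies the condition `rain`
condition_bindings = [(True, 0.20), (False, 0.80)]

acc = total_weight = 0.0
for rain, weight in condition_bindings:
    if not rain:                        # binding violates the condition: skip it
        continue
    total_weight += weight
    for _ in range(nb_subsamples):      # sample the conditioned part only
        sprinkler = rnd.random() < (0.01 if rain else 0.40)
        p_wet = {(False, False): 0.0, (False, True ): 0.80,
                 (True , False): 0.90, (True , True ): 0.99}[(sprinkler, rain)]
        acc += weight * (rnd.random() < p_wet)

p_wet_given_rain = acc / (total_weight * nb_subsamples)   # ~0.80 (exact: 0.8019)
```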
The full signature of the `calc` method is:
```
calc(prob_type=1, sorting=True, normalization=True, bindings=None, memoization=True,
     algo=EXACT, nb_samples=None, nb_subsamples=None, nb_tries=None, exact_vars=None)
```
We only cover here the last five arguments, viz. `algo`, `nb_samples`, `nb_subsamples`, `nb_tries` and `exact_vars`; the other arguments can safely keep their default values for the present needs. Note that the first two arguments (`prob_type` and `sorting`) are described in the section dedicated to advanced constructor arguments; the three following arguments `normalization`, `bindings` and `memoization` are described in this section.
- `algo` (default: `EXACT`): four algorithms are available (note that these are available as variables in the `lea` module, so you have to add the prefix `lea.` if you import no more than the module):
    - `EXACT`: calculates the exact probability distribution using the Statues algorithm; for such choice, the remaining arguments shall not be used;
    - `MCRS` (Monte-Carlo Rejection Sampling): calculates an approximate probability distribution following the MC rejection sampling algorithm on `nb_samples` random samples; if self is an Ilea instance, i.e. evaluating a conditional probability `x.given(e)`, then the algorithm may be sped up by specifying the `nb_subsamples` argument, which shall be a divisor of `nb_samples`; each time the condition `e` is satisfied, `nb_subsamples` random samples are taken on the conditioned part `x` instead of a single one; specifying `nb_subsamples` is especially valuable if the condition has a small probability;
    - `MCLW` (Monte-Carlo Likelihood Weighting): this requires that `self` is an Ilea instance, i.e. a conditional probability `x.given(e)`; this algorithm calculates an approximate probability distribution by making first an exact evaluation of the condition `e` using the Statues algorithm; then, for each binding that verifies the condition with some probability p, it makes `nb_subsamples` random samples on the conditioned part `x`, assigning a weight p to these samples; this algorithm is especially valuable if the condition has a small probability while its exact evaluation is tractable; this algorithm also accepts an optional `exact_vars` argument, to include given variables in the exact evaluation, beyond those already referred to in the condition (see MCEV);
    - `MCEV` (Monte-Carlo Exact Variables): calculates an approximate probability distribution by making first an exact evaluation of the variables given in `exact_vars` using the Statues algorithm; for each binding found with some probability p, it makes `nb_subsamples` random samples on the remaining (unbound) variables, assigning a weight p to these samples; the MCEV algorithm cannot handle expressions under condition, i.e. `x.given(e)`; MCLW shall be used instead;
- `nb_samples` (default: `None`): number of random samples made for the MCRS algorithm;
- `nb_subsamples` (default: `None`): only for the MCRS and MCLW algorithms and if self is an Ilea instance, i.e. a conditional probability `x.given(e)`; it specifies the number of random samples made on `x` for each binding verifying the condition `e`; for MCRS, `nb_subsamples` is optional; if specified, it shall be a divisor of `nb_samples`; for MCLW, `nb_subsamples` is mandatory;
- `nb_tries` (default: `None`): if not `None`, defines the maximum number of trials in case a random value is incompatible with a condition; this happens only if the current Lea instance is an Ilea instance `x.given(e)` or is referring to such an instance; for the MCLW algorithm on `x.given(e)`, it only applies to `x`, should it refer to Ilea instances, since `e` is evaluated using the exact algorithm; if a condition cannot be satisfied after `nb_tries` tries, then an error exception is raised; WARNING: if `nb_tries` is `None`, any infeasible condition shall cause an infinite loop;
- `exact_vars` (default: `None`): only for the MCEV algorithm: an iterable giving the variables referred to in `self` that shall be evaluated using the exact algorithm, the other ones being subject to random sampling.
On choosing the right algorithm and options...
EXACT is the default algorithm; it is the recommended algorithm for all tractable problems; it allows in particular working with probability fractions and symbols. For intractable problems, the three other algorithms offer fallback options.

The MCRS algorithm with the sole `nb_samples` argument is the easiest option; choosing the value for `nb_samples` is a matter of trade-off between result accuracy and execution time. If the evaluated expression contains conditions having low probabilities, then the MCRS algorithm may be inefficient: as a rejection sampling algorithm, it may spend most of the processing time finding bindings satisfying the condition. For improving the efficiency, the `nb_subsamples` argument can be used: this allows making multiple random samples each time the condition is met, instead of a single one; the samples are generated until the condition has been satisfied n times, with n = `nb_samples` / `nb_subsamples`. For a given `nb_samples`, increasing `nb_subsamples` shall speed up the calculation; however, the result accuracy may tend to decrease if the condition is not visited enough (e.g. choosing `nb_subsamples` = `nb_samples` will satisfy the condition with only one binding).

The MCLW algorithm with the mandatory `nb_subsamples` argument is the best choice if the evaluation of the conditioned part is intractable while the condition is tractable (whatever its probability); every binding verifying the condition is covered and, for each one, `nb_subsamples` random samples are generated, weighted by the binding's probability.

The MCEV algorithm is suited for intractable problems in which a set of variables v1, …, vn can be evaluated in a reasonable time or, in other words, for which `joint(v1, ..., vn)` is tractable; if this set of variables is specified in the `exact_vars` argument, then all their value combinations are browsed systematically by the exact algorithm, while random sampling is done for the other variables.
How to optimize probabilistic inference?
TO BE WRITTEN

- use floating-point representation
- use symbolic computation
- use `times` method
- use multiple arguments in `given` method
- explicit bindings
- replace `cpt` by `switch` or `if_` method
- cascaded CPT (contextual independence)
- calculate and save independent results
- use MC sampling
The Statues algorithm
The exact calculation of probabilities in Lea is made by an original algorithm, named the Statues algorithm. The present section provides a short introduction to this algorithm. For more in-depth coverage, references are provided at the end.
The What…
The Statues algorithm performs exact probabilistic inference, as opposed to approximate algorithms like the Monte-Carlo approaches seen above. This algorithm is strictly limited to discrete probabilities. The name "Statues" is borrowed from the popular Statues game; other names include "Red Light, Green Light" (US), "Grandmother's Footsteps" (UK), "1-2-3, SOLEIL!" (France) and "1-2-3, PIANO!" (Belgium).
In contrast with several well-established exact algorithms, the Statues algorithm is able to deal with several probabilistic modeling approaches: as seen in the present tutorial, it can handle in particular probabilistic arithmetic, conditional probabilities, JPD, CPT, BN and Markov chains. Also, different modeling approaches can be seamlessly mixed together to provide more expressive models. For instance, assuming two BN variables x, y represent numbers, a new variable z = x + y can be created to represent their sum; this new variable z could then be queried to get its probability distribution given some evidence; also, z could itself be used to express some evidence. So, the Statues algorithm has the ability to unify several probabilistic modeling approaches.
Before going further, it's important to understand what the input and output of the algorithm are.

- In input, it requires a given node of a given directed acyclic graph (DAG); this DAG defines how each variable depends on the others, up to "atomic" variables that are defined by simple explicit pmf.
- In output, the algorithm returns the pmf of the given node.
The dependency relationships expressed in the input DAG can be arithmetic operators, logical operators, user-defined functions, CPT constructs, etc. This DAG is constructed on the fly by Lea, starting from the prior pmf and following the different techniques shown in the present tutorial. Note that the DAG construction is out of the scope of the Statues algorithm; it's just a preprocessing step that builds a data structure, following recursive compositional rules. Sometimes, this is referred to as "lazy evaluation", because the actual evaluation is done later, when needed.
If every node has no more than one parent, then the DAG is actually a tree, whose leaf nodes are explicit prior pmf. In such a case, a recursive algorithm building a pmf from the pmf calculated on each child node can do the job. Such a case is specific to situations where every variable occurs in only one place and influences only one other variable. Note that the very earliest prototype of Lea worked with such an algorithm, which was fine for basic probabilistic arithmetic. In general situations (e.g. given conditions), this constraint is not respected and the recursive algorithm mentioned above gives wrong results, because it does not enforce referential consistency: a given node referenced by two parent nodes will be seen as two independent events, although they share the same pmf. Here comes the "Statues" algorithm…
The How…
The Statues algorithm actually starts when the pmf of a given node is required. This happens when you ask to display a variable or expression; it happens also when you call the P function on such a variable or expression.
How does it work?
Here are the most important features. The Statues algorithm uses generators instead of classical functions. In Python, a generator is, roughly speaking, a special routine that can halt at given places, deliver results to its caller, then resume when the caller requests it. In Python source code, generators are easily identified by the presence of yield statement(s) in a function body. Instead of asking its children to calculate and return their pmfs, a node asks its children to browse the possible variable bindings one by one and to generate the associated probabilities, ensuring consistency at every stage. More precisely, any node requests its child nodes to generate atoms: for a given node X, the generated atoms are (v, p) pairs such that the sum of all p for a given v equals the probability P(X = v). If all children obey this rule, then any node can in turn apply simple rules to generate its own atoms (u, q) from the atoms it receives. In particular,
 a node representing the addition X + Y shall get any atom (x, p) from X and any atom (y, q) from Y and generate in turn the atom (x+y, p·q);
 a node representing X | Y (i.e. X given Y) shall get any atom (y, q) from Y; then, only if y is true, it shall get the next atom (x, p) from X and generate the atom (x, p·q);
 etc.
Finally, the (u, q) atoms received by the top-level node (the queried one) are aggregated by summing the probabilities value by value, producing the expected pmf. The generators created recursively on the nodes allow keeping the binding of every variable consistent. In contrast with usual functions, the created generators stay alive during the whole calculation and work cooperatively to deliver individual atoms to the queried node, which is the sole node to calculate a pmf.
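To make these ideas concrete, here is a heavily simplified pure-Python sketch of the atom-generation scheme. The class names (`Leaf`, `Func`) and the binding mechanism are a simplification for illustration; Lea's real implementation is far more complete:

```python
class Leaf:
    """Atomic variable with an explicit pmf; remembers its current binding."""
    def __init__(self, pmf):
        self.pmf = pmf
        self._bound = None          # (value, prob) while bound, else None

    def gen_atoms(self):
        if self._bound is not None:
            # already bound by another reference in the same traversal:
            # yield the same value, with probability 1 (already accounted for)
            yield self._bound
        else:
            for v, p in self.pmf.items():
                self._bound = (v, 1.0)
                yield (v, p)
            self._bound = None      # unbind when all values have been browsed

class Func:
    """Inner node applying a function f to its child nodes."""
    def __init__(self, f, *children):
        self.f = f
        self.children = children

    def gen_atoms(self):
        def browse(i, vals, prob):
            if i == len(self.children):
                yield (self.f(*vals), prob)
            else:
                for v, p in self.children[i].gen_atoms():
                    yield from browse(i + 1, vals + [v], prob * p)
        yield from browse(0, [], 1.0)

def calc_pmf(node):
    """Top-level query: aggregate the received atoms value by value."""
    pmf = {}
    for v, p in node.gen_atoms():
        pmf[v] = pmf.get(v, 0.0) + p
    return pmf

coin = Leaf({0: 0.5, 1: 0.5})
other = Leaf({0: 0.5, 1: 0.5})
print(calc_pmf(Func(lambda x, y: x + y, coin, coin)))    # {0: 0.5, 2: 0.5}
print(calc_pmf(Func(lambda x, y: x + y, coin, other)))   # {0: 0.25, 1: 0.5, 2: 0.25}
```

The two queries show the point of the binding mechanism: coin + coin correctly yields only even values because the second reference to `coin` sees the value already bound by the first one, while coin + other, built from two distinct leaves, yields the usual convolution.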
References
Admittedly, the explanations above give you, at best, a rough idea of the process. If you want a true understanding of the Statues algorithm, you can:
 read the paper "Probabilistic inference using generators - the Statues algorithm",
 have a look at MicroLea, a light Python implementation of the Statues algorithm for educational purposes.
What's next?
Thank you for reading the present tutorial pages! We hope that you found them clear and informative. If you have not done so yet, you may have a look at the Examples page.
If you are interested in Machine Learning, note that there is an extra tutorial page on this topic. It covers maximum-likelihood and EM algorithms.
For enhancements or bug reports, please create issues on the Lea Bitbucket page.
Feel free also to send an email to pie.denis@skynet.be (English or French spoken). Use cases of Lea in your projects/research are especially appreciated! A "references" page listing the projects using Lea could greatly improve the reputation of the library.