Wiki
Clone wikiLea / Lea3_Tutorial_3
< Tutorial [2/3]
Wiki Home
Machine Learning >
Table of Content
- Introduction
- Display options (set_display_options and as_string methods)
- Plotting histograms (plot method)
- Mode and maximum a posteriori estimation (MAP)
- Drawing with/without replacement (draw method)
- Evidence contexts (evidence, add_evidence methods)
- Revised probability distributions (given_prob method)
- Min / max functions (min_of, max_of functions)
- Indexing and slicing
- Object attributes and methods
- Advanced Boolean functions (all_different, all_true, all_increasing,... methods)
- Reducing sequences of probability distribution (reduce_all method)
- Information Theory
- Symbolic computation with probabilities
- Monte-Carlo sampling
- Advanced calculation options (calc method)
- How to optimize probabilistic inference?
- The Statues algorithm
- What's next?
Introduction
The present page is the third part of the tutorial for the Lea library. It assumes that you have installed Lea and that you are familiar with the techniques presented in the first part of the tutorial. The main topics are mostly independent, so you could choose to jump to your preferred subject without much trouble (see TOC above).
Display options (set_display_options
and as_string
methods)
The set_display_options
function allows you changing different Lea’s display options on a permanent basis, that is at the level of your Python session. Here are the possible (optional) arguments.
argument | default | description |
---|---|---|
kind |
None |
define how the probabilities are displayed, as None or 1 or 2 characters code: |
- if None : as these have been defined |
||
- if kind[0] is '/' : rational numbers "n/d" or "0" or "1" |
||
- if kind[0] is '.' : decimals with given nb_decimals digits |
||
- if kind[0] is '%' : percentage decimals with given nb_decimals digits |
||
- if kind[0] is '-' : textual histogram bar made up of repeated - , such that a bar length of chart_size represents a probability 1 |
||
if kind[1] is '-' , the textual histogram bars with - are appended after numerical representation of probabilities; |
||
nb_decimals |
6 |
number of decimals displayed for probabilities displayed as floating-point or decimal |
chart_size |
100 |
maximum bar length, as number of - character representing a probability 1 |
tabular |
True |
if True and if values are tuples of same length, then these are represented in a tabular format (fixed column width); in the specific cases of named tuple, a header line is prepended with the field names; |
one_line |
False |
if True , then the values and probabilities are displayed on one single line, separated by commas |
Here are some examples.
>>> die = lea.interval(1, 6, 'r') >>> lea.set_display_options(one_line=True) >>> die { 1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6 } >>> lea.set_display_options(kind='.') >>> die { 1: 0.166667, 2: 0.166667, 3: 0.166667, 4: 0.166667, 5: 0.166667, 6: 0.166667 } >>> lea.set_display_options(kind='%', nb_decimals=2) >>> die { 1: 16.67 %, 2: 16.67 %, 3: 16.67 %, 4: 16.67 %, 5: 16.67 %, 6: 16.67 % } >>> lea.set_display_options(one_line=False) >>> die 1 : 16.67 % 2 : 16.67 % 3 : 16.67 % 4 : 16.67 % 5 : 16.67 % 6 : 16.67 % >>> lea.set_display_options(kind='/-') >>> die 1 : 1/6 ----------------- 2 : 1/6 ----------------- 3 : 1/6 ----------------- 4 : 1/6 ----------------- 5 : 1/6 ----------------- 6 : 1/6 -----------------
as_string
instance method:
>>> print(die.as_string(kind = '/', one_line=True)) { 1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6 } >>> print (die.times(3).as_string('/-', chart_size=200)) 3 : 1/216 - 4 : 3/216 --- 5 : 6/216 ------ 6 : 10/216 --------- 7 : 15/216 -------------- 8 : 21/216 ------------------- 9 : 25/216 ----------------------- 10 : 27/216 ------------------------- 11 : 27/216 ------------------------- 12 : 25/216 ----------------------- 13 : 21/216 ------------------- 14 : 15/216 -------------- 15 : 10/216 --------- 16 : 6/216 ------ 17 : 3/216 --- 18 : 1/216 -
Plotting histograms (plot
method)
If you've installed the matplotlib library, you can very easily plot the probability mass function of a given Lea instance. We have already seen that all you have to do is invoking the plot()
method on any Lea instance. By default the plot()
method draws the pmf in a dedicated window. Several arguments can be specified to customize this behavior, including saving the chart in an image file. Here is the full signature, with default values:
your_lea_instance.plot(title=None, fname=None, savefig_args=dict(), **bar_args)
argument | description |
---|---|
title |
a string giving the title to display (no title if None) |
fname |
if not None, then the chart is saved in a file specified by given fname , as described in matplotlib.pyplot.savefig function |
savefig_args |
(used only if fname is not None) a dictionary relayed to matplotlib.pyplot.savefig function and containing named arguments expected by this function |
bar_args |
named arguments relayed to matplotlib.pyplot.bar function |
Example:
from lea.leaf import dice dice(3, 6).plot(title='Three fair dice', fname='pmf_3d6.png', savefig_args=dict(bbox_inches='tight'), color='lightblue')
Mode and maximum a posteriori estimation (MAP)
In some situations, you are just interested in the values that have the highest probability in a given probability distribution. This is traditionally known under the term "mode". On Lea instances, you can get these values as a tuple, using the mode
method. Here are some example showing the mode associated to throws of 1, 2 and 3 dice:
>>> d1 = interval(1, 6) >>> d1.mode (1, 2, 3, 4, 5, 6) >>> d1.times(2).mode() (7,) >>> d1.times(3).mode() (10, 11)
burglary
, earthquake
and john_calls
variables, given that we know that Mary called?"
>>> lea.joint(burglary,earthquake,john_calls).given(mary_calls).mode() ((False, False, False),)
burglary = earthquake = john_calls = False
, which means in clear that Mary actually called without worrying reasons, just to take news. Here are some other MAP queries for the same three variables, given other evidences:
>>> lea.joint(burglary, earthquake, john_calls).given(burglary|earthquake).mode() ((False, True, False),) >>> lea.joint(burglary, earthquake, john_calls).given(mary_calls, burglary|earthquake).mode() ((True, False, True),)
Drawing with/without replacement (draw
method)
In several situations, we are interested in a sequence of values obtained from the same random source: lottery, card drawing, dice throws, etc. These sequences can be defined using different protocols:
- with or without replacement: whether the drawn value is withdrawn from the remaining values (no duplicates)
- sorted or not: whether each sequence shall be sorted or not (i.e. order of drawings is irrelevant or not).
The draw method addresses such need, supporting the different protocols through two Boolean arguments. The signature is as follows:
leaInstance.draw(n,sorted=False,replacement=False)
argument | description |
---|---|
n |
the number of objects to draw, each value of the resulting distribution will be a n -tuple |
sorted |
boolean: True means that the order of drawing is irrelevant and the tuples are arbitrarily sorted by increasing order (default: False) |
replacement |
boolean: True means that the drawing is made WITH replacement so the same element may occur several times in each tuple (default: False) |
The two boolean arguments sorted
and replacement
can be used independently each from each other, leading to four cases. The best way to understand the meaning of these arguments is to show the result obtained for each of the four combinations. Let's define a non-uniform distribution with 3 objects:
>>> d = lea.pmf({'A': '3/6', 'B': '2/6', 'C': '1/6'}) >>> d A : 3/6 B : 2/6 C : 1/6
case 1. No sorting, without replacement
>>> d.draw(2) ('A', 'B') : 20/60 ('A', 'C') : 10/60 ('B', 'A') : 15/60 ('B', 'C') : 5/60 ('C', 'A') : 6/60 ('C', 'B') : 4/60
case 2. No sorting, with replacement
>>> d.draw(2, replacement=True) ('A', 'A') : 9/36 ('A', 'B') : 6/36 ('A', 'C') : 3/36 ('B', 'A') : 6/36 ('B', 'B') : 4/36 ('B', 'C') : 2/36 ('C', 'A') : 3/36 ('C', 'B') : 2/36 ('C', 'C') : 1/36
case 3. Sorting, without replacement
>>> d.draw(2, sorted=True) ('A', 'B') : 35/60 ('A', 'C') : 16/60 ('B', 'C') : 9/60
case 4. Sorting, with replacement
>>> d.draw(2, sorted=True, replacement=True) ('A', 'A') : 9/36 ('A', 'B') : 12/36 ('A', 'C') : 6/36 ('B', 'B') : 4/36 ('B', 'C') : 4/36 ('C', 'C') : 1/36
sorted=True
) is usually more efficient than the other cases thanks to a special combinatorial algorithm due to Paul Moore. Note however that, in case 3 (replacement=False
), this special algorithm will be used ONLY if the probability distribution is uniform (actually, the most efficient algorithm is chosen by the draw
method, which checks whether the distribution is uniform or not); this is not the case in the example above but it applies fortunately in most "fair" games.
Evidence contexts (evidence
, add_evidence
methods)
In "Bayesian networks", we have expressed conditions/evidences using the given
method:
P(burglary.given(mary_calls, john_calls)) # -> 0.28417183536439294 P(earthquake.given(mary_calls, john_calls)) # -> 0.17606683840507922
For factoring out such conditions by declaring evidences once for all, Lea provides evidence contexts. Two constructs are available.
1- Evidence context managers, using the with
keyword of Python:
with lea.evidence(john_calls, mary_calls): P(burglary) P(earthquake) # -> 0.28417183536439294 # 0.17606683840507925
with
block and are removed as soon as the this block is exited. These context managers can be embedded: with lea.evidence(john_calls): P(burglary) P(earthquake) with lea.evidence(mary_calls): P(burglary) P(earthquake) # -> 0.016283729946769937 # 0.01139496877381118 # 0.28417183536439294 # 0.17606683840507925
lea.add_evidence(john_calls, mary_calls) P(burglary) P(earthquake) # -> 0.28417183536439294 # 0.17606683840507925
lea.clear_evidence()
lea.add_evidence(john_calls) lea.add_evidence(mary_calls) P(burglary) P(earthquake) # -> 0.28417183536439294 # 0.17606683840507925
Revised probability distributions (given_prob
method)
The calculation of conditional probabilities using the given
method is fine when the information expressed in the condition is 100% sure. In some situations however, this information is uncertain but it can be associated to a precise probability value p. The goal is then to calculate a revised probability distribution so that the given condition has a probability p to occur. Another interpretation of this goal could be that you want to favor some occurrences of the prior probability distribution by re-balancing the probabilities.
The method given_prob(cond,p)
serves this purpose, where cond
is a condition expressed as a boolean distribution (exactly as given
method), p
is the probability that cond
is true.
As an example, suppose that from a die assumed to be fair initially, we note that it gives eventually the values 5 or 6, with a probability of 1/2. The revised distribution is calculated as follows:
>>> die = lea.interval(1, 6, prob_type='r') >>> revised_die = die.given_prob(die >= 5, p='1/2') >>> revised_die 1 : 1/8 2 : 1/8 3 : 1/8 4 : 1/8 5 : 2/8 6 : 2/8
>>> P(revised_die >= 5) 1/2
given_prob
method is a generalization of the given
method:
d.given(cond)
is equivalent tod.given_prob(cond, p=1)
d.given(~cond)
is equivalent tod.given_prob(cond, p=0)
Min / max functions (min_of
, max_of
functions)
In the "Applying functions" section, we have seen how to build a distribution taking the maximum (or minimum) values from other distributions. Lea provides the min_of
and max_of
methods, which do the same job:
>>> die1 = lea.interval(1, 6, prob_type='r') >>> die2 = die1.new() >>> lea.min_of(die1, die2) 1 : 11/36 2 : 9/36 3 : 7/36 4 : 5/36 5 : 3/36 6 : 1/36 >>> lea.max_of(die1, die2) 1 : 1/36 2 : 3/36 3 : 5/36 4 : 7/36 5 : 9/36 6 : 11/36
Indexing and slicing
When working with Python's strings, tuples or lists, it is common to extract subparts of these objects by using indexing or slicing operations (i.e. the brackets syntax). When such operation is operated on a Lea instance, it is propagated on the inner values, passing the given (slice) index. The result is a new probability distribution containing the indexed or sliced values.
To give an example, consider the following Lord of the Rings' characters micro-database:
>>> character = lea.vals(('Gimli', 'dwarf'), ('Bilbo', 'hobbit'), ('Frodo', 'hobbit'), ('Meriadoc', 'hobbit'), ('Pippin', 'hobbit'), ('Gandalf', 'wizard'), ('Boromir', 'man'), 'Faramir', 'man'), prob_type='r') >>> character ('Bilbo' , 'hobbit') : 1/8 ('Boromir' , 'man' ) : 1/8 ('Faramir' , 'man' ) : 1/8 ('Frodo' , 'hobbit') : 1/8 ('Gandalf' , 'wizard') : 1/8 ('Gimli' , 'dwarf' ) : 1/8 ('Meriadoc', 'hobbit') : 1/8 ('Pippin' , 'hobbit') : 1/8
>>> name = character[0] >>> name Bilbo : 1/8 Boromir : 1/8 Faramir : 1/8 Frodo : 1/8 Gandalf : 1/8 Gimli : 1/8 Meriadoc : 1/8 Pippin : 1/8 >>> race = character[1] >>> race dwarf : 1/8 hobbit : 4/8 man : 2/8 wizard : 1/8
- What is the distribution of the last letter of names?
>>> name[-1] c : 1/8 f : 1/8 i : 1/8 n : 1/8 o : 2/8 r : 2/8
- Who are the hobbits? (note how
name
andrace
are interdependent variables!)>>> name.given(race == 'hobbit') Bilbo : 1/4 Frodo : 1/4 Meriadoc : 1/4 Pippin : 1/4
- What is the race of those having a name ending with "ir"?
>>> race.given(name[-2:] == 'ir') man : 1
- Who is who?
>>> name + ' is a ' + race + '.' Bilbo is a hobbit. : 1/8 Boromir is a man. : 1/8 Faramir is a man. : 1/8 Frodo is a hobbit. : 1/8 Gandalf is a wizard. : 1/8 Gimli is a dwarf. : 1/8 Meriadoc is a hobbit. : 1/8 Pippin is a hobbit. : 1/8
- Who is the Chosen One?
Note the
>>> name.random() + ' is the Chosen One.' 'Frodo is the Chosen One.'
character
distribution is nothing else than a joint probability distribution (JPD). The indexing technique demonstrated here could be used also to extract specific attributes from the inner tuples. Revisiting the example given in "Defining joint probability distributions, you could skip the definition ofjpd
variable using theas_joint
method and do instead:These three variables shall then be interdependent; these could be used as demonstrated in "Defining joint probability distributions.toothache = _jpd[0] catch = _jpd[1] cavity = _jpd[2]
Object attributes and methods
The micro-DB example above can be redone using user-defined objects instead of tuples.
class C(object): def __init__(self,name,race): self.name = name self.race = race def __str__(self): return "%s (%s)"%(self.name,self.race) def health_points(self): return 1 + hash(self.name+self.race) % 10 character = lea.vals(C('Gimli','dwarf'),C('Bilbo','hobbit'),C('Frodo','hobbit'),C('Meriadoc','hobbit'), C('Pippin','hobbit'),C('Gandalf','wizard'),C('Boromir','man'),C('Faramir','man'),prob_type='r')
C
objects as domain, instead of tuples.
>>> character Faramir (man) : 1/8 Meriadoc (hobbit) : 1/8 Gimli (dwarf) : 1/8 Pippin (hobbit) : 1/8 Gandalf (wizard) : 1/8 Bilbo (hobbit) : 1/8 Boromir (man) : 1/8 Frodo (hobbit) : 1/8
>>> character.name Bilbo : 1/8 Boromir : 1/8 Faramir : 1/8 Frodo : 1/8 Gandalf : 1/8 Gimli : 1/8 Meriadoc : 1/8 Pippin : 1/8 >>> character.race dwarf : 1/8 hobbit : 4/8 man : 2/8 wizard : 1/8
>>> character.health_points() 1 : 1/8 3 : 1/8 7 : 2/8 8 : 3/8 9 : 1/8 >>> character.name + ', with HP = ' + character.health_points().map(str) Bilbo, with HP = 10 : 1/8 Boromir, with HP = 4 : 1/8 Faramir, with HP = 1 : 1/8 Frodo, with HP = 5 : 1/8 Gandalf, with HP = 10 : 1/8 Gimli, with HP = 8 : 1/8 Meriadoc, with HP = 6 : 1/8 Pippin, with HP = 9 : 1/8
mean
, stdev
, given
, etc) and those to be relayed to the inner objects (like name
, race
, etc)? Well, the rule is simple: if the attribute/method name is unknown in Lea API, then it is propagated in the inner objects. Of course, this may cause worries in the case of homonyms, i.e. the inner objects have, by mischance, a name defined in Lea API; such attribute is not accessible with the simple technique since it is "hidden" by the Lea API. Should such situation occurs, there exists a trick using Lea's map
method and Python getattr
function. Assume for example, that the class C above has an attribute called mean
. Here is how to get the distribution of this attribute (in place of the probabilistic mean):
>>> character.map(getattr,('mean',))
The ability to easily relay methods on inner objects a given probability distribution can be used for builtin Python objects. String's upper
method provides a trivial example:
>>> lea.set_prob_type('r') >>> lea.vals(*"alea").upper() A : 2/4 E : 1/4 L : 1/4
draw
method to model the three throws:
>>> die = lea.interval(1, 6) >>> three_dice = die.draw(3, replacement=True)
three_dice
contains 6x6x6 = 216 tuples, each containing 3 elements. To get the result, you can directly use the count
method defined on Python's tuples:
>>> count_6 = three_dice.count(6) >>> count_6 0 : 125/216 1 : 75/216 2 : 15/216 3 : 1/216 >>> P(count_6 >= 2) 2/27
>>> P(lea.interval(1,6).draw(3,replacement=True).count(6) >= 2) 2/27
Advanced Boolean functions (all_different
, all_true
, all_increasing
,... methods)
Lea provides several advanced Boolean functions, that take variable number of arguments and return Boolean probability distribution. These not only improve expressiveness but also speed up evaluation by using short-circuit evaluation.
Such functions may be used for calculating conditional probabilities using the given
method, mimicking somehow the Constraint Programming style (CP).
For instance, all_different
is True when values yielded by all arguments are different each from each other.
>>> (a, b, c) = interval(1, 3, 'r').new(3) >>> P(all_different(a, b, c)) 2/9 >>> joint(a, b, c).given(all_different(a, b, c)) (1, 2, 3) : 1/6 (1, 3, 2) : 1/6 (2, 1, 3) : 1/6 (2, 3, 1) : 1/6 (3, 1, 2) : 1/6 (3, 2, 1) : 1/6
The following functions are provided in the lea
module. The expected arguments are Lea instances; if not, these are coerced to Lea instance having probability 1.
function | Probability that... |
---|---|
all_different(*args) |
values yielded by all arguments are different each from each other |
all_equals(*args) |
all values yielded by all arguments are equal |
all_true(*args) |
values yielded by all arguments are all true |
all_false(*args) |
values yielded by all arguments are all false |
any_true(*args) |
at least one value yielded by some argument is true |
any_false(*args) |
at least one value yielded by some argument is false |
all_increasing(*args) |
values yielded by arguments are in non-strict increasing order |
all_strict_increasing(*args) |
values yielded by arguments are in strict increasing order |
all_decreasing(*args) |
values yielded by arguments are in non-strict decreasing order |
all_strict_decreasing(*args) |
values yielded by arguments are in strict increasing order |
all_verify(arg1s,op,arg2) |
all values arg1 yielded by args1 arguments verify op(arg1,arg2), where op is a 2-ary Boolean function |
any_verify(arg1s,op,arg2) |
at least one value arg1 yielded by args1 arguments verifies op(arg1,arg2), where op is a 2-ary Boolean function |
all_pairwise_verify(args,op) |
all pairs (arg1,arg2) yielded by subsequent elements of args verify op(arg1,arg2), where op is a 2-ary Boolean function |
Here are some examples:
>>> joint(a, b, c).given(all_equal(a, b, c)) (1, 1, 1) : 1/3 (2, 2, 2) : 1/3 (3, 3, 3) : 1/3 >>> joint(a, b, c).given(all_true(a<=2, b==3, a<=c)) (1, 3, 1) : 1/5 (1, 3, 2) : 1/5 (1, 3, 3) : 1/5 (2, 3, 2) : 1/5 (2, 3, 3) : 1/5 >>> joint(a, b, c).given(any_true(a+b==2, a-b==1)) (1, 1, 1) : 1/9 (1, 1, 2) : 1/9 (1, 1, 3) : 1/9 (2, 1, 1) : 1/9 (2, 1, 2) : 1/9 (2, 1, 3) : 1/9 (3, 2, 1) : 1/9 (3, 2, 2) : 1/9 (3, 2, 3) : 1/9 >>> joint(a, b, c).given(all_false(a<=2, b==3, a<=c)) (3, 1, 1) : 1/4 (3, 1, 2) : 1/4 (3, 2, 1) : 1/4 (3, 2, 2) : 1/4 >>> joint(a, b).given(any_false(a+b==2, a-b==1)) (1, 1) : 1/9 (1, 2) : 1/9 (1, 3) : 1/9 (2, 1) : 1/9 (2, 2) : 1/9 (2, 3) : 1/9 (3, 1) : 1/9 (3, 2) : 1/9 (3, 3) : 1/9 >>> joint(a, b).given(all_increasing(a, b)) (1, 1) : 1/6 (1, 2) : 1/6 (1, 3) : 1/6 (2, 2) : 1/6 (2, 3) : 1/6 (3, 3) : 1/6 >>> joint(a, b).given(all_decreasing(a, b)) (1, 1) : 1/6 (2, 1) : 1/6 (2, 2) : 1/6 (3, 1) : 1/6 (3, 2) : 1/6 (3, 3) : 1/6 >>> joint(a, b).given(all_strict_increasing(a, b)) (1, 2) : 1/3 (1, 3) : 1/3 (2, 3) : 1/3 >>> import operator >>> joint(a, b, c).given(all_verify((a,b,c), operator.le, 2)) (1, 1, 1) : 1/8 (1, 1, 2) : 1/8 (1, 2, 1) : 1/8 (1, 2, 2) : 1/8 (2, 1, 1) : 1/8 (2, 1, 2) : 1/8 (2, 2, 1) : 1/8 (2, 2, 2) : 1/8 >>> joint(a, b, c).given(all_pairwise_verify((a, b, c), operator.ne)) (1, 2, 1) : 1/12 (1, 2, 3) : 1/12 (1, 3, 1) : 1/12 (1, 3, 2) : 1/12 (2, 1, 2) : 1/12 (2, 1, 3) : 1/12 (2, 3, 1) : 1/12 (2, 3, 2) : 1/12 (3, 1, 2) : 1/12 (3, 1, 3) : 1/12 (3, 2, 1) : 1/12 (3, 2, 3) : 1/12
given
's arguments are treated as all_true
's, so
joint(a, b, c).given(a<=2, b==3, a<=c)
joint(a, b, c).given(all_true(a<=2, b==3, a<=c))
all_true
is not much useful as argument of given
.
Reducing sequences of probability distribution (reduce_all
method)
Lea instances can be put in a Python container type, like lists and tuples. then, these can be "reduced" using a given 2-ary function to produce a new Lea instance. Consider for example the following tuple containing three dices with 2, 4 and 6 faces:
>>> dice = tuple(lea.interval(1, n, prob_type='r') for n in (2, 4, 6)) >>> dice (1 : 1/2 2 : 1/2, 1 : 1/4 2 : 1/4 3 : 1/4 4 : 1/4, 1 : 1/6 2 : 1/6 3 : 1/6 4 : 1/6 5 : 1/6 6 : 1/6)
sum
function can be used:
>>> sum(dice) 3 : 1/48 4 : 3/48 5 : 5/48 6 : 7/48 7 : 8/48 8 : 8/48 9 : 7/48 10 : 5/48 11 : 3/48 12 : 1/48
reduce
function, which is only available in Python 2. To meet this need, Lea provides a dedicate function, reduce_all
, which treats the elements of an iterable of Lea instances according to a given 2-ary function.
>>> from lea import * >>> import operator >>> reduce_all(operator.mul, dice) 1 : 1/48 2 : 3/48 3 : 2/48 4 : 5/48 5 : 1/48 6 : 5/48 8 : 5/48 9 : 1/48 10 : 2/48 12 : 6/48 15 : 1/48 16 : 3/48 18 : 2/48 20 : 2/48 24 : 4/48 30 : 1/48 32 : 1/48 36 : 1/48 40 : 1/48 48 : 1/48
reduce_all
is typically a standard 2-ary operator but any user-defined 2-ary function is valid also, provided that it applies to the values of Lea instances present in the sequence.
Note that reduce_all
admits the optional argument absorber
(default: None
): you can specify the absorber value for the given function; for instance, we could set absorber=0
for the multiplication case above. The effect is that it activates a more efficient algorithm able to stop combinatorial search as soon as the absorber is met. This can accelerate things dramatically. For instance, the following symbolic calculation that multiplies 10 [0,1] flips takes several seconds:
>>> z = tuple(bernoulli('p') for _ in range(10)) >>> reduce_all(operator.mul, z) 0 : 1 - p**10 1 : p**10
>>> reduce_all(operator.mul, z, absorber=0)
all_true
, any_true
, etc. see above.
Information Theory
Lea allows calculating several measures defined by information theory. These could provide new insights on your probability distributions.
Information
The most basic concept is the information carried by a given event. It is expressed in bits. As a basic example, consider two coins, a fair one and an unfair one:
>>> flip = lea.vals('head', 'tail') >>> flip_u = lea.pmf({'head': 1./4., 'tail': 3./4.})
Here is how to get the information carried by each possible result, after flipping the fair coin :
>>> flip.information_of('head') 1.0 >>> flip.information_of('tail') 1.0
>>> flip_u.information_of('head') 2.0 >>> flip_u.information_of('tail') 0.4150374992788437
For boolean probability distributions, note that .information_of(True)
can be replaced by .information
, which is a convenient shortcut. Example:
>>> rain = lea.event(1./8.) >>> rain.information() 3.0 >>> (flip_u=='head').information() 2.0
Entropy
If you take the average information carried by all events modeled by a given probability distribution, using probabilities of these events as weights, you get the entropy of this distribution. The entropy is one of the most important concept of information theory. It captures the "degree of randomness" of a given probability distribution.
Here is how to get the entropy of each defined coin:
>>> flip.entropy() 1.0 >>> flip_u.entropy() 0.8112781244591328
Entropy and conditional probabilities
The information and entropy can be calculated on any Lea instance, whether defined explicitly as above or resulting of calculations, as explained throughout the tutorials. In particular, these indicators can be calculated on conditional probability distributions.
Let us take an example. There are 64 balls in an urn. 62 of these balls are blue and marked with a "x"; 2 of these balls are red: one is marked with an "x", the other with "y". Here is a short model of the distribution of these balls:
>>> ball = lea.pmf({ 'Bx': 62, 'Rx': 1, 'Ry': 1 }, prob_type='r') >>> ball Bx : 62/64 Rx : 1/64 Ry : 1/64
ball[0]
or ball[1]
respectively (see "Indexing and slicing"). Let us create two new variables to ease the work:
>>> color = ball[0] >>> mark = ball[1] >>> color B : 31/32 R : 1/32 >>> mark x : 63/64 y : 1/64
ball
:
>>> ball.entropy() 0.23187232431271465
Suppose now that someone draws a ball and tell us some information on its color or mark (remember that the three variables ball
, color
and mark
are interdependent due to referential consistency). Let us calculate the new entropy in the following cases:
- What is the entropy if the ball's mark is "x"?
>>> ball.given(mark=='x').entropy() 0.11759466565886476
- What is the entropy if the ball's mark is "y"?
>>> ball.given(mark=='y').entropy() 0.0
- What is the entropy if the ball's color is blue?
>>> ball.given(color=='B').entropy() 0.0
- What is the entropy if the ball's color is red?
>>> ball.given(color=='R').entropy() 1.0
We see in the three first cases that the given information lower the entropy, considering the initial value of 0.23 bits. In two cases, the entropy becomes null: this happens because the new information dissolve any uncertainty on the mark and color. In the last case however, the information that the ball is red increases the entropy to 1 bit. This value is consistent with the uncertainty remaining on the mark of the ball, which is equally likely "x" or "y". It may sound odd that new information makes the entropy increase. Actually, there's nothing wrong here, it's simply unusual. The initial unbalanced distribution is such that the reception of unlikely information ("the ball is red") can reduce the prior beliefs we had.
Mutual information, joint entropy and conditional entropy
When dealing with two dependent random variables, it's interesting to know the amount of information they share. This is the purpose of the mutual information, aka transinformation. The examples below use the ball
, color
and mark
interdependent distributions of our last case.
>>> lea.mutual_information(color, mark) 0.08486507530476972 >>> lea.mutual_information(color, ball) 0.20062232431271465 >>> lea.mutual_information(mark, ball) 0.11611507530476972
As a stupid example, we can verify that our ball lottery and coin flipping do not share anything:
>>> lea.mutual_information(ball, flip) 0.0
Another interesting measure is the joint entropy of a set of random variables. It gives the entropy of the joint of these random variables, taking into account possible interdependence between them. Here are two basic examples.
>>> lea.joint_entropy(mark, color) 0.23187232431271465 >>> lea.joint_entropy(mark, flip) 1.1161150753047697
mark
and color
is the same as the basic entropy of ball
; this is not surprising: a given ball is completely described by its mark and color, so there has exactly the same level information that the couple of these two characteristics. In the second example, we join mark
and flip
, since these are independent events, we get the sum of the individual entropies (0.1161 bits + 1.0 bits). You can specify any number of arguments in lea.joint_entropy
. Note that if you specify only one argument, then you get the basic entropy of this argument.
Another measure is the conditional entropy: what is the entropy of x given that y is true? This is also known as equivocation of x about y. Here are two basic examples of usage:
>>> ball.cond_entropy(mark) 0.11575724900794494 >>> mark.cond_entropy(ball) 0.0
True
(or any relationship that is certainly true) as argument of cond_entropy
boils down to calculate the basic entropy.
Cross entropy and Kullback–Leibler divergence
For completing the topic of entropy-related measures in Lea, we shall introduce two more indicators that are specially useful for Machine Learning. These are the cross-entropy and the Kullback–Leibler divergence (also known as relative entropy). Both indicators allow to compare two probability distributions having same support by providing a measure of how far one fits the other. Assuming that d is the actual observed distribution and e is an estimated distribution, these are measures of how good e fits to d.
Basically, the cross entropy of d with e makes an average of information given by e using as weights the probabilities of d. For example, we could examine how given probability distributions fits the color
distribution defined above ({B: 31/32, R: 1/32}):
>>> estimated_color1 = lea.pmf({ 'B': 20, 'R': 12 }, prob_type='r') >>> color.cross_entropy(estimated_color1) 0.7011020799303316 >>> estimated_color2 = lea.pmf({ 'B': 30, 'R': 2 }, prob_type='r') >>> color.cross_entropy(estimated_color2) 0.2151997355042477
>>> color.cross_entropy(color) 0.20062232431271465 >>> color.entropy 0.20062232431271465
kl_divergence
method.
>>> color.kl_divergence(estimated_color1) 0.5004797556176169 >>> estimated_color2 = lea.pmf({ 'B': 30, 'R': 2 }, prob_type='r') >>> color.kl_divergence(estimated_color2) 0.014577411191533052
For more practical examples of usage, you may have a look on section dedicated to Machine Learning using EM algorithm - case 3.
Likelihood ratio (LR)
The likelihood ratio (LR) of an evidence E given an hypothesis H is defined as
P(E | H) LR = -------------- P(E | not H)
As a toy problem, imagine that a die has been thrown but we have very limited ways to gather information on the result. Let us assume that our hypothesis is "the die result is even". After investigations, we get an evidence: the result is not 1. What is the LR of this evidence for our hypothesis? The lea.lr(E,H)
syntax makes this calculation:
>>> die = lea.interval(1,6) >>> parity = die.map(lambda x: 'even' if x%2==0 else 'odd') >>> lea.lr(die != 1, parity == 'even') 1.5
>>> P((die != 1).given(parity == 'even')) / P((die != 1).given(parity != 'even')) 1.5
Notes: the following rules are enforced, in this sequence:
- if E is impossible, then an exception is raised (LR = 0/0)
- if H is impossible, then an exception is raised (P(E | H) undefined)
- if H is certain, then an exception is raised (P(E | not H) undefined)
- if H is impossible given E, then LR = 0.0
- if H is certain given E, then LR = +infinity (Python's
math.inf
)
Examples:
>>> lea.lr(die == 3, parity == 'even') 0.0 >>> lea.lr(die == 2, parity == 'even') inf
E.given(H).lr()
; this flavor emphasizes the numerator of the LR expression and provides same result as the syntax seen above. It may be handier to express an hypothesis that is a conjunction of several sub-hypothesis, which can be put as arguments of given(…)
:
>>> (die >= 2).given(parity == 'even').lr() 1.5 >>> (die >= 2).given(parity == 'even', die <= 5).lr() 1.3333333333333333
Symbolic computation with probabilities
We have seen that Lea can handle probabilities expressed as variable names (p, q, …), provided that the SymPy library is installed. This approach is commonly referred as symbolic computation. All the techniques for probabilistic calculation seen above in the tutorial can be revisited with probability variables in place of probability figures.
Such approach is useful when the same query is made over and over on the same complex model, with only varying prior probability values. In such case, the query result may be prepared "offline" by Lea, once for all, into an arithmetic expression; this may take maybe a long processing time; then, this expression can be copy-pasted in a dedicated program to be evaluated many times using fast arithmetic computations, with different input probability values. This dedicated program could be written in any programming language: Python, C, C++, Java, etc, without the need of any probability library.
Let's start with basic examples: let's call p the probability that number of heads obtained by flipping one coin and q the probability that it shall rain tomorrow:
>>> flip = lea.bernoulli('p') >>> rain = lea.event('q') >>> flip 1 : p 0 : -p + 1 >>> rain True : q False : -q + 1
>>> (flip == 0) & rain False : p*q - q + 1 True : -q*(p - 1) >>> (flip == 0) | rain False : -p*(q - 1) True : p*q - p + 1
True
by using the P
function.
>>> P = lea.P >>> P((flip == 0) | rain) p*q - p + 1
P
is a SymPy expression (more precisely, an instance of some subclass of SymPy's Expr
). You can then apply methods provided by SymPy, like substituting some variable(s) using the subs
method:
>>> P((flip == 0) | rain).subs('p', 0.5) 0.5*q + 0.5 >>> P((flip == 0) | rain).subs((('p', 0.5), ('q', 0.2))) 0.600000000000000 >>> P((flip == 0) | rain).subs('p', 'q') q**2 - q + 1
>>> flip - lea.bernoulli('q') -1 : -q*(p - 1) 0 : 2*p*q - p - q + 1 1 : -p*(q - 1) >>> flip4 = flip.times(4) >>> flip4 0 : (p - 1)**4 1 : -4*p*(p - 1)**3 2 : 6*p**2*(p - 1)**2 3 : -4*p**3*(p - 1) 4 : p**4
flip4
probability distribution is also known as the binomial distribution, which you can build using a dedicated function:
>>> flip4 = lea.binom(4, 'p')
>>> P((1 <= flip4) & (flip4 < 4)) -2*p*(p - 1)*(p**2 - p + 2) >>> P(flip4.is_any_of(1,2,3)) -2*p*(p - 1)*(p**2 - p + 2)
>>> flip4.mean() 4*p >>> flip4.var() -4*p*(p - 1) >>> flip4.entropy() (-p**4*log(p**4) + 4*p**3*(p - 1)*log(-4*p**3*(p - 1)) - 6*p**2*(p - 1)**2*log(6*p**2*(p - 1)**2) + 4*p*(p - 1)**3*log(-4*p*(p - 1)**3) - (p - 1)**4*log((p - 1)**4))/log(2)
sympy.factor
function as a post-treatment in order to simplify the resulting expressions as much as possible. You can notice that the calculated mean 4*p
agrees with the theoretical formula for the mean of the binomial.
To make variable substitution in a probability distribution that is non-boolean, you simply have to call subs
method on that distribution: this call is propagated, with the given arguments, to all inner probability expressions:
>>> flip4.subs('p', 0.25) 0 : 0.316406250000000 1 : 0.421875000000000 2 : 0.210937500000000 3 : 0.0468750000000000 4 : 0.00390625000000000
>>> lea.binom(4, 0.25)
SymPy.Float
for the former while it's Python's float
for the latter.
So far, we have defined symbolic probabilities by means of strings of characters. Behind the scene, these strings are converted to SymPy symbols thanks to sympy.symbols
function. In order to have better control on these variables, you could also build these symbols yourself and pass them to Lea constructors in place of strings. Here is an example:
>>> import sympy >>> (p,q) = sympy.symbols('p q') >>> lea.binom(2, p) + lea.binom(2, q) 0 : (p - 1)**2*(q - 1)**2 1 : -2*(p - 1)*(q - 1)*(2*p*q - p - q) 2 : 6*p**2*q**2 - 6*p**2*q + p**2 - 6*p*q**2 + 4*p*q + q**2 3 : -2*p*q*(2*p*q - p - q) 4 : p**2*q**2
pmf
function. Let's give it a try:
>>> lea.pmf({"A": 'pA', "B": 'pB', "C": 'pC'}) A : pA/(pA + pB + pC) B : pB/(pA + pB + pC) C : pC/(pA + pB + pC)
pC
as it can be derived from pA
and pB
. As explained, in the section dedicated to normalization, replacing pC
by None
will do the job:
>>> lea.pmf({"A": 'pA', "B": 'pB', "C": None}) A : pA B : pB C : -pA - pB + 1
>>> (pA,pB) = sympy.symbols('pA pB') >>> lea.pmf({"A": pA, "B": pB, "C": 1-(pA+pB)})
burglary = lea.event('a') earthquake = lea.event('b') alarm = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event('c'), (True ,False) : lea.event('d'), (False,True ) : lea.event('e'), (False,False) : lea.event('f') }) john_calls = alarm.switch({ True : lea.event('g'), False : lea.event('h') }) mary_calls = alarm.switch({ True : lea.event('i'), False : lea.event('j') })
>>> pb = P(burglary.given(mary_calls & john_calls)) >>> pb a*(b*c*g*i - b*c*h*j - b*d*g*i + b*d*h*j + d*g*i - d*h*j + h*j)/(a*b*c*g*i - a*b*c*h*j - a*b*d*g*i + a*b*d*h*j - a*b*e*g*i + a*b*e*h*j + a*b*f*g*i - a*b*f*h*j + a*d*g*i - a*d*h*j - a*f*g*i + a*f*h*j + b*e*g*i - b*e*h*j - b*f*g*i + b*f*h*j + f*g*i - f*h*j + h*j)
>>> values = (('a',0.001),('b',0.002),('c',0.95),('d',0.94),('e',0.29),('f',0.001),('g',0.9),('h',0.05),('i',0.7),('j',0.01)) >>> pb.subs(values) 0.284171835364393
.subs(values)
on them.
As you can expect it from a symbolic computation system, you are free to fix only a subset of variables: you will get then a new formula with the remaining free variables:
>>> pb.subs((('a',0.001),('b',0.002),('c',0.95),('d',0.94),('e',0.29),('f',0.001))) 0.001*(0.94002*g*i + 0.05998*h*j)/(0.002516442*g*i + 0.997483558*h*j)
As explained in the introduction, the returned formula could be copy-pasted in a program, which is completely unaware of Bayes networks but which is fast for number-crunching. Note also that SymPy allows reformatting such formula into various formats, like LaTeX:
>>> print(latex(pb)) \frac{a \left(b c g i - b c h j - b d g i + b d h j + d g i - d h j + h j\right)}{a b c g i - a b c h j - a b d g i + a b d h j - a b e g i + a b e h j + a b f g i - a b f h j + a d g i - a d h j - a f g i + a f h j + b e g i - b e h j - b f g i + b f h j + f g i - f h j + h j}
Monte-Carlo sampling
Random generation through MC (random_mc
method)
We have seen in the first part of the tutorial how to generate random samples with the random(…)
method. An important point must be known on the way this method works. Consider a set of 9 different "dice", with number of faces ranging from 2 to 10. Let us make 20 throws of these 9 dice and get the sum:
dice = tuple(lea.interval(1,n) for n in range(2,11)) sum_dice = sum(dice) print (sum_dice.random(20)) # -> (32, 26, 33, 31, 28, 39, 32, 31, 38, 26, 26, 33, 33, 27, 26, 27, 33, 28, 47, 25)
sum_dice
is calculated (a heavy combinatorial process), which produces a resulting probability distribution; then, from this instance, the 20 random samples are obtained. So, what actually happens is equivalent to this:
calculated_sum_dice = sum_dice.calc() print (calculated_sum_dice.random(20)) # -> (32, 26, 33, 31, 28, 39, 32, 31, 38, 26, 26, 33, 33, 27, 26, 27, 33, 28, 47, 25)
calc()
call (see below). Note that, because of a cache mechanism, any further call to sum_dice.random(...)
shall be very fast (as fast as calculated_sum_dice.random(...)
, actually).
In several cases, using this approach just to get random samples is intractable because of combinatorial explosion. The random_mc(…)
method proposes an alternative implementation, which tackles this limitation. Let's revisiting the same example:
sum_dice2 = sum(dice) sum_dice2.random_mc(20) # -> (37, 35, 38, 34, 29, 31, 32, 21, 31, 30, 35, 39, 35, 40, 31, 22, 30, 29, 39, 31)
random_mc
does not try to calculate the final probability distribution; it just make random sample on each elementary distribution, then it aggregates the results together. In the current case, for each of the 20 trials, it throws the 9 dice and it sums up the results. This is closer to the real process since it's basically a simulation of this process.
random_mc
can handle any type of Lea construct, including those specifying a condition:
print (sum_dice2.given(sum_dice2 <= 24).random_mc(20)) # -> (24, 18, 18, 20, 21, 23, 24, 21, 23, 24, 22, 22, 24, 24, 24, 23, 24, 21, 22, 24)
random_mc
method is the real McCoy. The random
and random_mc
methods have actually a complimentary usage:
random
tends to be better when the end values result from long calculations and/or when the combinatorial is tractable and/or the random sample to produce is large;random_mc
tends to be better when the end values result from short calculations and/or when the combinatorial is intractable and/or the random sample to produce is small.
Approximate inference by Monte-Carlo approaches
The random_mc(…)
method may serve another purpose than pure random sample generation. It may be used to estimate the resulting probability distribution based on random samples. By default, Lea calculates exact probability distributions; this works very fine for all examples shown in previous tutorial pages. However, many real-world probabilistic problems have a large sample space and require a prohibitive time to be solved in an exact way. This is especially the case for large or densely connected Bayesian networks. For such intractable problems, Lea provides several Monte-Carlo algorithms that are able to produce approximate results in a reasonable time, with precision tunable by giving a number of random samples.
Let's start by using the random_mc(…)
tool you have just learned. You may estimate the probability distribution of the sum of 9 dice seen above, based on a sample of 10,000 trials:
estimated_sum_dice = lea.vals(*sum_dice2.random_mc(10000)) print (estimated_sum_dice) # -> 13 : 0.0002 14 : 0.0002 15 : 0.0007 16 : 0.0016 17 : 0.0017 18 : 0.0039 19 : 0.007 …
estimate_mc(…)
. The following statement
estimated_sum_dice = sum_dice2.estimate_mc(10000)
This method of estimating a probability distribution by MC sampling is very useful in many situations having large sample space size, like querying a complex Bayesian network. The next subsection elaborates this theme, providing more algorithms and options.
The calc
method and Monte-Carlo algorithms
The estimate_mc
seen above is actually a convenience method, which is simple to use. Lea provides actually several probabilistic inference algorithms, which are accessible through the x.calc(...)
method, where x
is any Lea instance. Before coveing the MC algorithms themselves, some explanations are needed on this calc
method. Suppose that you have defined a Lea instance x
with several dependencies to other instances. When you execute statements like print (x)
, print (x.p(4)))
, avg = x.mean()
or sample = x.random(1000)
, a calculation occurs behind the scene to evaluate the pmf of x
, before doing the specific required action. This calculation is done implicitly, in order to ease your life. You can however ask to perform this calculation explicitly at any time by making this call:
x.calc()
x.new()
. As such, this call is not very useful except if you want to control finely when the calculation is done. Now, the real interest is that calc
has several optional arguments that let you make fine customization of the calculation algorithm.
Let us consider the example of Bayesian network on Wikipedia.
from lea import * rain = event(0.20) sprinkler = if_(rain, event(0.01), event(0.40)) grass_wet = joint(sprinkler,rain).switch({ (False,False): False, (False,True ): event(0.80), (True ,False): event(0.90), (True ,True ): event(0.99)})
query = rain.given(grass_wet)
print (P(query)) print (P(query.calc())) print (P(query.calc(algo=EXACT))) # -> 0.35768767563227616
query
is intractable (i.e. the calculation takes ages to terminate). As a fallback, you want to obtain an approximate result using a Monte-Carlo algorithm. The easiest option is to use the rejection sampling algorithm. Here is how to proceed for 10,000 samples:
print (P(query.calc(algo=MCRS, nb_samples=10000))) # -> 0.3585
P(query.estimate_mc(10000))
, in a more pedantic style.
Actually, the rejection sampling has the drawback to be inefficient if the given condition (here, grass_wet
) has a small probability to occur: the algorithm uses most of the processing time to find bindings satisfying the condition. The following statements, with other argument options, provide useful alternatives.
This variant generates 10 sub-samples each time a binding satisfying the condition grass_wet
is satisfied (speeding up the calculation):
print (P(query.calc(algo=MCRS, nb_samples=10000, nb_subsamples=10))) # -> 0.3595
grass_wet
) and MC for the conditioned part (rain
):
print (P(query.calc(algo=MCLW, nb_subsamples=10000))) # -> 0.357687675632294
rain
is referred in grass_wet
, it is embedded in the exact calculation, so this calculation boils down to the EXACT algorithm. Actually, setting nb_subsamples=1
is enough to get the exact result (without rounding errors caused by summation of 10,000 probabilities). Actually, the weighting likelihood algorithm would make sense for queries as:
print (P(grass_wet.given(rain).calc(algo=lea.MCLW, nb_subsamples=10000))) # -> 0.8066000000000276
grass_wet
with fixed rain=True
(probability weight 0.2), 2° random samples grass_wet
with fixed rain=False
(probability weight 0.8)
The full signature of the calc
method is:
calc(prob_type=-1, sorting=True, normalization=True, bindings=None,memoization=True, algo=EXACT, nb_samples=None, nb_subsamples=None, nb_tries=None, exact_vars=None)
algo
, nb_samples
, nb_subsamples
, nb_tries
, exact_vars
; the other arguments can safely keep their default values for the present needs. Note that the first two arguments (prob_type
and sorting
) are described in the section dedicated to advanced constructor arguments; the three following arguments normalization
, bindings
, and memoization
are described in this section.
algo
(default:EXACT
): four algorithms are available (note that these are available as variables in thelea
module, so you have to add the prefixlea.
if you import no more than the module):EXACT
: calculates the exact probability distribution using the Statues algorithm; for such choice, the remaining arguments shall not be used;MCRS
(Monte-Carlo Rejection Sampling): calculates an approximate probability distribution following the MC rejection sampling algorithm onnb_samples
random samples; if self is an Ilea instance, i.e. evaluating a conditional probabilityx.given(e)
, then the algorithm may be speed up by specifying thenb_subsamples
argument, which shall be a divisor ofnb_samples
; each time the conditione
is satisfied,nb_subsamples
random samples are taken on the conditioned partx
instead of a single one; specifyingnb_subsamples
is especially valuable if the condition has a small probability;MCLW
(Monte-Carlo Likelihood Weighting): this requires thatself
is an Ilea instance, i.e. a conditional probabilityx.given(e)
; this algorithm calculates an approximate probability distribution by making first an exact evaluation of the conditione
using the Statues algorithm; then, for each binding that verifies the condition with some probability p, it makesnb_subsamples
random samples on the conditioned partx
, assigning a weight p to these samples; this algorithm is especially valuable if the condition has a small probability while its exact evaluation is tractable; this algorithm accepts also an optionalexact_vars
argument, to include given variables in the exact evaluation, beyond these already referred in the condition (see MCEV);MCEV
(Monte-Carlo Exact Variables): calculates an approximate probability distribution by making first an exact evaluation of the variables given inexact_vars
using the Statues algorithm; for each binding found with some probability p, it makesnb_subsamples
random samples on remaining (unbound) variables, assigning a weight p to these samples; MCEV algorithm cannot handle expressions under condition, i.e.x.given(e)
; MCLW shall be used instead;
nb_samples
(default:None
): number of random samples made for MCRS algorithm;nb_subsamples
(default:None
): only for MCRS and MCLW algorithms and if self is an Ilea instance, i.e. a conditional probabilityx.given(e)
; it specifies the number of random samples made onx
for each binding verifying the conditione
; for MCRS,nb_subsamples
is optional, if specified it shall be a divisor ofnb_samples
; for MCLW,nb_subsamples
is mandatory;nb_tries
(default:None
): if notNone
, defines the maximum number of trials in case a random value is incompatible with a condition; this happens only if the current Lea instance is an Ilea instancex.given(e)
or is referring to such instance; for MCLW algorithm onx.given(e)
, it only applies onx
, should it refers to Ilea instances, sincee
is evaluated using the exact algorithm; if a condition cannot be satisfied afternb_tries
tries, then an error exception is raised; WARNING: ifnb_tries
isNone
, any infeasible condition shall cause an infinite loop;exact_vars
(default:None
): only for MCEV algorithm: an iterable giving the variables referred inself
that shall be evaluated by using the exact algorithm, the other ones being subject to random sampling;
On choosing the right algorithm and options...
EXACT is the default algorithm; it is the recommended algorithm for all tractable problems; it allows in particular to work with probability fractions and symbols. For untractable problems, the three other algorithms offer fallback options. MCRS algorithm with sole nb_samples
argument is the easiest option; choosing the value for nb_samples
is a matter of trade-off between result accuracy and execution time. If the evaluated expression contains conditions having low probabilities, then the MCRS algorithm may be inefficient: as a rejection sampling algorithm, it may use most of processing time to find bindings satisfying the condition. For improving the efficiency, the nb_subsamples
argument can be used: this allows making multiple random samples each time the condition is met, instead of a single one; the samples are generated until the condition has been satisfied n times, with n = nb_samples
/nb_subsamples
. For a given nb_samples
, increasing nb_subsamples
shall speed up the calculation; however, the result accuracy may tend to decrease if the condition is not visited enough (e.g. choosing nb_subsamples
= nb_samples
will satisfy the condition with only one binding). MCLW algorithm with mandatory nb_subsamples
argument is the best choice if the evaluation of the conditioned part is untractable meanwhile the condition is tractable (whatever its probability); every binding verifying the condition is covered and, for each one, nb_subsamples
random samples are generated, weighted by the binding's probability. MCEV algorithm is suited for untractable problems, from which a set of variables v1, ... , vn can be evaluated in a reasonable time or, in other words, if joint(
v1 ,
... ,
vn)
is tractable; if this set of variables is specified in exact_vars
argument, then all their value combinations are browsed systematically by the exact algorithm while random sampling is done for other variables.
Advanced calculation options (calc
method)
Suppose that you have defined a Lea instance x
with several dependencies to other instances. Remember that Lea works with lazy evaluation: when you execute statements like print (x)
, print (x.p(4)))
or avg = x.mean()
, a calculation occurs behind the scene to evaluate the pmf of x
, before doing the specific required action. This calculation is done implicitly, in order to ease your life. You can however ask to perform this calculation explicitly at any time by making this call:
x.calc()
x.new()
. As such, this call is not very useful except if you want to control finely when the calculation is done. Now, the real interest is that calc
has three optional arguments that let you make some customization of the calculation algorithm. Before describing these arguments, beware that these are meant for "advanced" use: you have to handle them with care if you want to avoid any misinterpretation of the delivered results.
The full signature of the calc
method is:
calc(prob_type=-1, sorting=True, normalization=True, bindings=None, memoization=True, algo=EXACT, nb_samples=None, nb_subsamples=None, nb_tries=None, exact_vars=None)
prob_type
and sorting
) are described in the section dedicated to advanced constructor arguments. The three following arguments (normalization
, bindings
and memoization
) are described below. Note that normalization
has been presented earlier but it requires a special coverage here. The last five arguments (algo
, nb_samples
, nb_subsamples
, nb_tries
and exact_vars
) are covered in the section dedicated to Monte-Carlo algorithms.
Avoiding normalization
The optional normalization
argument has the same purpose as explained for the normalization
argument in the pmf
method. By calling calc(normalization=False)
, the probabilities that are collected are NOT divided by their sum. This means that you can get a "pseudo" probability distribution, for which the probability sum is different from 1. This occurs in particular when you deal with conditional probabilities (given
method): the probability total you get is less than 1; the missing denominator is actually the probability of the given condition: in mathematical term: instead of getting P(A|B) = P(A&B)/P(B), you get P(A&B).
This feature may seem uninteresting but, in some cases, it could be used to have explanations on calculated probabilities. Consider the BN example given earlier. Imagine you want to know the probability that BOTH John AND Mary calls. This is easy to calculate:
>>> P(john_calls & mary_calls) 0.002084100239
joint
method:
lea.joint(earthquake, burglary, alarm, john_calls, mary_calls)
given
method to make the filtering for you:
>>> lea.joint(earthquake, burglary, alarm, john_calls, mary_calls).given(john_calls & mary_calls) (False, False, False, True, True) : 0.23895323731595242 (False, False, True , True, True) : 0.30138246147957953 (False, True , False, True, True) : 1.4365911696438337e-05 (False, True , True , True, True) : 0.28358309688769245 (True , False, False, True, True) : 0.00034033391807504135 (True , False, True , True, True) : 0.17515213192200016 (True , True , False, True, True) : 2.3991168497726016e-08 (True , True , True , True, True) : 0.0005743485738355601
normalization=False
trick:
>>> lea.joint(earthquake, burglary, alarm, john_calls, mary_calls).given((john_calls & mary_calls)).calc(normalization=False) (False, False, False, True, True) : 0.0004980024990000001 (False, False, True , True, True) : 0.0006281112599999999 (False, True , False, True, True) : 2.994000000000003e-08 (False, True , True , True, True) : 0.0005910156 (True , False, False, True, True) : 7.0929e-07 (True , False, True , True, True) : 0.0003650346 (True , True , False, True, True) : 5.0000000000000054e-11 (True , True , True , True, True) : 1.197e-06
.p_sum
:
>>> lea.joint(earthquake, burglary, alarm, john_calls, mary_calls).given((john_calls & mary_calls)).calc(normalization=False).p_sum 0.0020841002389999997
Making explicit bindings
In many cases, the evidences passed to the given method are equalities. This is sometimes called "observations": variable x
has been observed to be a given value u
, variable y
has been observed to be a given value v
, etc. For example, consider this situation which shows the probability distributioon of the sum of two dice given the observation of one of these die:
>>> d1 = lea.interval(1, 6) >>> d2 = lea.interval(1, 6) >>> d = d1 + d2 >>> d.given(d1 == 1) 2 : 0.16666666666666663 3 : 0.16666666666666663 4 : 0.16666666666666663 5 : 0.16666666666666663 6 : 0.16666666666666663 7 : 0.16666666666666663
(d1,d2)
for the variables and evaluates for each of them whether the condition d1 == 1
is true or false. This works fine but there is some waste of CPU to make this work, especially if the number of the observed variables is getting big (remember also that their domain can be non-boolean, with unlimited number of elements). Fortunately, this waste can be avoided for elementary variables (those defined by a known prior pmf, like d1
and d2
). The technique is called explicit bindings: this consists in specifying, in the bindings
argument, a dictionary associating the observed variables with the their values. This is demonstrated in the following expression:
>>> d.calc(bindings={d1: 1})
d2
. Note that the specified binding holds for the query only, so d1
is unbound as soon as the result has been calculated. A more extreme example is to bind all elementary variables:
>>> d.calc(bindings={d1: 1, d2: 4}) 5 : 1.0
Such technique could be efficient when the number of cases to examine is large. Remember however that it can be used only for elementary variables; an exception is raised if you try to bind a derived variable like d
.
Disabling memoization
TO BE WRITTEN
How to optimize probabilistic inference?
TO BE WRITTEN
- use floating-point representation
- use symbolic computation
- use
times
method - use multiple arguments in
given
method - explicit bindings
- replace
cpt
byswitch
orif_
method - cascaded CPT (contextual independence)
- calculate and save independent results
- use MC sampling
The Statues algorithm
The exact calculation of probabilities in Lea is made by an original algorithm, named the Statues algorithm. The present section provides a short introduction on this algorithm. For more in-depth coverage, references are provided at the end.
The What…
The Statues algorithm performs exact probabilistic inference, by opposition to approximate algorithms like the Monte-Carlos approaches seen above. This algorithm is limited strictly to discrete probabilities. The name "Statues" is borrowed from the popular Statues game; other names include "Red Light, Green Light" (US), "Grandmother's Footsteps" (UK), "1-2-3, SOLEIL!" (France) and "1-2-3, PIANO!" (Belgium).
Contrasting to several well-established exact algorithms, the Statues algorithm is able to deal with several probabilistic modeling approaches: as seen in the present tutorial, it can handle in particular probabilistic arithmetic, conditional probabilities, JPD, CPT, BN, Markov chains. Also, different modeling approaches can be seamlessly mixed together to provide more expressive models. For instance, assuming two BN variables x, y represent numbers, a new variable z = x + y can be created to represent their sum; this new variable z could then be queried to get its probability distribution given some evidence; also, z could be used itself to express some evidence. So, the Statues algorithm has the ability to unify several probabilistic modeling approaches.
Before going further, it's important to understand what are the input and output of the algorithm.
-
In input, it requires a given node from a given direct acyclic graph (DAG); this DAG defines how each variable depends from the others, up to "atomic" variables that are defined by simple explicit pmf.
-
In output, the algorithm returns the pmf of the given node.
The dependency relationships expressed in the input DAG can be arithmetic operators, logical operators, user-defined functions, CPT constructs, etc. This DAG is constructed on-the-fly by Lea, starting from prior pmf and following the different techniques shown in the present tutorial. Note that the DAG construction is out of the scope of the Statues algorithm; it's just a pretreatment process that builds a data structure, following recursive compositional rules. Sometimes, this is referred as "lazy evaluation" because the actual evaluation is done later, when needed.
If any node has no more than one parent, then the DAG is actually a tree, where the leaf nodes are explicit prior pmf. In such case, a recursive algorithm building a pmf from the pmfs calculated on each child nodes can do the job. Such case is specific to situations where any variable occur only at one place and influence only one other variable. Note that the very earliest prototype of Lea worked with such algorithm, which was fine for basic probabilistic arithmetic. In the general situations (e.g. given conditions), this constraint is not respected and the recursive algorithm mentioned above give wrong results because it does not enforce referential consistency: one given node referenced by two parent nodes will be seen as two independent events, although they share the same pmf. Here comes the "Statues" algorithm...
The How…
The Statues algorithm actually starts when calculating the pmf of a given node is required. This happens when you ask to display a variable or expression; this happens also when you call the P
function on such variable or expression.
How does it work?
Here are the most important features. The Statues algorithm uses generators instead of classical functions. In Python, a generator is, roughly speaking, a special routine that can halt at given places, delivers results to its caller, then resumes when commanded so by the caller. In Python source code, generators are easily identified in functions by the presence of yield
statement(s). Instead of asking children to calculate and return their pmf, the nodes ask their children to browse the possible variable bindings one by one and to generate the associated probabilities, ensuring the consistency at any stage. More precisely, any node requests its children nodes to generate atoms: for a given node X, the generated atoms are (v,p) pairs such that the sum of all p for a given v equals the probability P(X = v). If all children obey this rule, then any node can in turn apply simple rules to generate its own atoms (u,q) from the atoms it receives. In particular,
- a node representing the addition X+Y shall get any atom (x,p) from X and any atom (y,q) from Y and generates in turn the atom (x+y,pq);
- a node representing X | Y (i.e. X given Y) shall get any atom (y, q) from Y; then, only if y is true, it shall get the next atom (x,p) of X and generate the atom (x,pq);
- etc.
Finally, the (u,q) atoms received by the top-level node (input query) are aggregated by summing the probabilities value by value, producing the expected pmf. The generators created recursively on the nodes allows keeping the binding of any variable consistent. In contrast with usual functions, the created generators are alive during the whole calculation and work cooperatively to delivers individual atoms to the queried node, which is the sole node to calculate a pmf.
References
Admittedly, the explanations above just give you, at the best, a rough idea of the process. If you want a true understanding of the Statues algorithm, you can:
- read the paper "Probabilistic inference using generators - the Statues algorithm",
- have a look at MicroLea, a light Python implementation of the Statues algorithm for educational purpose.
What's next?
Thank you for reading the present tutorial pages! We hope that you found it enough clear and informative. If not done yet, you may have a look on Examples page.
If you have interests in Machine Learning, note that there is an extra tutorial page on this topic. It covers maximum likelihood and EM algorithms.
For enhancements or bugs, please create issues on bitbucket Lea page.
Feel free also to send an e-mail to pie.denis@skynet.be (English or French spoken). Use cases of Lea in your projects/researches are especially appreciated! A "references" page listing the projects using Lea could greatly improve the reputation of the library.
Updated