Wiki
Clone wikiLea / Lea3_Tutorial_2
< Tutorial [1/3]
Wiki Home
Tutorial [3/3] >
Table of Content
 Introduction
 Standard probability distributions
 Advanced constructor arguments
 Changing probability representation
 Joint probability distributions
 Bayesian reasoning
 Markov chains
 What's next?
Introduction
The present page is the second part of the tutorial for the Lea library. Along with the third part, it covers advanced topics in probability. It assumes that you have installed Lea 3 and that you are familiar with the techniques presented in the first part of the tutorial. The main topics are mostly independent, so you could choose to jump to your preferred subject without much trouble (see TOC above).
Note: You may also practice the present tutorial online in the Jupyter Notebook Tutorial #2 (no installation required)!
In some sections, we shall use some examples found in the excellent "Artificial Intelligence: A Modern Approach" book of Stuart Russell and Peter Norvig (second edition). The reader is invited to refer to this book, named the "AIMA book" in the following, for more details on presented methods and results. These examples are reproduced with the kind permissions of Stuart Russell and Peter Norvig.
Standard probability distributions
Beside generalpurpose functions presented in the first part of the tutorial, Lea provides several functions for defining standard probability distributions, as well as gambling devices.
Note that all these functions admit an optional prob_type
argument for setting a specific probability representation.
Bernoulli distribution
The method bernoulli(p)
defines a Bernoulli distribution: it gives the value 1 with probability p and 0 with probability 1p.
>>> lea.bernoulli(0.25) 1 : 0.25 0 : 0.75
Binomial distribution
The method binomial(n,p)
defines a binomial distribution: it gives the number of successes among a number n of independent experiments, each having probability p of success. For example, suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1, … or 6 heads after six tosses?
>>> lea.binom(6,0.3) 0 : 0.11764899999999995 1 : 0.3025259999999999 2 : 0.324135 3 : 0.18522 4 : 0.059535 5 : 0.010206000000000002 6 : 0.000729
binomial(1,p)
is equivalent to bernoulli(p)
. Also, this distribution can be obtained also by using the times(n)
method: the binomial distribution above is exactly the same as
>>> lea.bernoulli(0.3).times(6)
Poisson distribution
The method poisson(m)
defines a Poisson distribution having mean m.
>>> lea.poisson(2) 0 : 0.1353352832366127 1 : 0.2706705664732254 2 : 0.2706705664732254 3 : 0.1804470443154836 4 : 0.0902235221577418 5 : 0.03608940886309672 6 : 0.012029802954365574 7 : 0.0034370865583901638 8 : 0.0008592716395975409 9 : 0.00019094925324389798 10 : 3.8189850648779595e05
poisson(2,1e30)
. Note that, since the highest values are dropped, the remaining probabilities are all slightly overestimated in order to keep a probability total equal to 1.
Convenience functions and objects
Beside the standard probability distributions seen above, Lea provides a couple of convenience functions and prebuilt distributions related to gambling. These functions and objects are not loaded by default when importing lea; these are in a separated module called lea.leaf
that you must import:
from lea import leaf
function  equivalent to  description 

leaf.die(nb_faces=6) 
lea.interval(1,nb_faces) 
a fair die with faces marked from 1 to nb_faces (default: 6) 
leaf.dice(nb_dice,nb_faces=6) 
lea.interval(1,nb_faces).times(nb_dice) 
total of nb_dice independent fair dice with faces marked from 1 to nb_faces (default: 6) 
leaf.dice_seq(nb_dice,nb_faces=6,sorted=True) 
lea.interval(1,nb_faces).draw(nb_dice,sorted=sorted,replacement=True) 
tuple with nb_dice elements showing a throw of nb_dice independent fair dice with faces marked from 1 to nb_faces (default: 6); if sorted is True (default), then the combinations of dice which are the same apart from their order of throw are considered equal; the particular value used is chosen to be in order from smallest to largest value; if sorted is False , then all nb_dice**nb_faces combinations are produced, with equal probabilities 
leaf.D6 
lea.interval(1,6,prob_type='r') 
a fair die with 6 faces (fraction probability type) 
leaf.flip 
lea.event('1/2',prob_type='r') 
a True/False variable with 5050 chances (fraction probability type) 
leaf.card_suite 
lea.vals(*'SHDC',prob_type='r') 
random onecharacter symbol representing a card suite among Spades, Hearts, Diamonds and Clubs (fraction probability type) 
leaf.card_rank 
lea.vals(*'A23456789TJQK',prob_type='r') 
random onecharacter symbol representing a card rank among Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen and King (fraction probability type) 
leaf.card 
leaf.card_rank + leaf.card_suite 
random twocharacters symbol representing a card chosen in a deck of 52 cards, leaf.card is the concatenation of leaf.card_suite and leaf.card_rank (fraction probability type) 
Note that the prebuilt objects use fractions for probabilities. On the other side, the three function work with the default probability type; they also admit an optional prob_type
argument, which allows choosing a probability type different from the default one. For example, here is how you could see the probabilities of throwing two fair dice, with fractions or with floatingpoint numbers:
>>> leaf.dice(2,prob_type='r') 2 : 1/36 3 : 2/36 4 : 3/36 5 : 4/36 6 : 5/36 7 : 6/36 8 : 5/36 9 : 4/36 10 : 3/36 11 : 2/36 12 : 1/36
>>> card.random_draw(13) ('3S', 'KH', '3H', 'AC', '6D', '9D', '3C', 'QH', 'KC', '8D', 'KD', '2S', 'TD')
leaf.card
, leaf.card_suite
and leaf.card_rank
are interdependent:
>>> card.given(card_suite=='S',card_rank.is_none_of(*'JQKA')) 2S : 1/9 3S : 1/9 4S : 1/9 5S : 1/9 6S : 1/9 7S : 1/9 8S : 1/9 9S : 1/9 TS : 1/9 >>> card_rank.given(card.is_any_of('8S','KS','8H','KC')) 8 : 1/2 K : 1/2 >>> card_suite.given(card.is_any_of('8S','KS','8H','KC')) C : 1/4 H : 1/4 S : 2/4
Advanced constructor arguments
The pmf
and new
method accept several optional arguments, which may be useful in some occasions. The following table provides the description of these arguments
argument  default  description 

sorting 
True 
if True , then the values for displaying the distribution or getting the values will be sorted if possible (i.e. no exception on sort); otherwise, the order of values is unspecified unless ordered=True 
ordered 
False 
if True , then the values for displaying the distribution or getting the values will follow the order in which you provide them (requires that sorting=False and that the given pmf is an iterable or a collections.OrderedDict) 
normalization 
True 
if True , then each element of the given ps is divided by the sum of all ps before being stored (in such case, it's not mandatory to provide true probabilities; these could be simple counters, for example) 
prob_type 
1 
allows converting the given probabilities: 1: no conversion, other codes here 
check 
True 
if True , then the constructor may make some checks on the given data and report errors, if any (exception); if you guarantee the given dat, you can make a slight optimization by setting the check=False 
Changing probability representation
In the Lea 3 first tutorial, we have seen that probabilities are stored and displayed as floatingpoint numbers. This is a classical/conservative representation, which is used in the majority of statistical/probability packages. Other representations may however be more suitable. Lea proposes actually four choices for representing probabilities, namely floats, decimals, fractions and symbols:
 float: based on Python builtin float type
 decimal: based on Python builtin Decimal class
 fraction: based on Python builtin Fraction class
 symbol: based on SymPy’s classes (requires installation of the SymPy, a Python library for symbolic mathematics).
The representation as floatingpoint numbers is clearly the most standard/conservative representation; it's easy to use and it integrates seamlessly with many functions and libraries. However, floatingpoint numbers present several drawbacks due to its rounding errors. In particular, it's not the most suited when dealing with probability distributions calculated by combinatorial (e.g. cards and dice games). The three other probability representations provide alternatives that can better fit your type of problems.
There are several ways to specify the type used for representing probability in a given distribution: by automatic detection, by passing declaration, by permanent declaration.
Setting probability type by automatic detection
Lea is able to choose the probability representation based on the way you express it. This is probably the simplest way to work when you use Lea in an interactive session. The rules are quite simple:
 If the probabilities are written as number literals or expressions, then the float type is used
 If the probabilities are written as strings, then conversions are tempted until success in the following order: Decimal, Fraction, SymPy's Symbol.
Here is a table giving examples of the different possible cases:
if you type  you get 

0.333 
0.333 as a float instance 
'0.333' 
0.333 as a Decimal instance 
1/3 
0.333333... as a float instance 
'1/3' 
1/3 as a Fraction instance 
'p' 
p as a Sympy Symbol instance 
To illustrate this, here are some samples showing how a Boolean probability distribution and a probability distribution with the 3 possible outcomes "A", "B" and "C", can be represented.
Probabilities as floatingpoint numbers
>>> lea.event(2/3) True : 0.6666666666666666 False : 0.33333333333333337 >>> lea.pmf({"A":1/6, "B":1/3, "C":1/2}) A : 0.16666666666666666 B : 0.3333333333333333 C : 0.5
>>> lea.event(0.6666666666666666) >>> lea.pmf({"A":1, "B":2, "C":3}) >>> lea.pmf({"A":0.16666666666666666, "B":0.33333333333333334, "C":0.5})
Probabilities as decimal numbers
>>> lea.event('0.666666666666666666666') True : 0.666666666666666666666 False : 0.333333333333333333334 >>> lea.pmf({"A":'0.1666666666666666666', "B":'0.3333333333333333334', "C":'0.5'}) A : 0.1666666666666666666 B : 0.3333333333333333334 C : 0.5
Probabilities as fractions
>>> lea.event('2/3') True : 2/3 False : 1/3 >>> lea.pmf({"A":'1/6', "B":'1/3', "C":'1/2'}) A : 1/6 B : 2/6 C : 3/6
Note that the probabilities are displayed using the common denominator, so as to ease comparisons; when extracting a single probability value however, the fraction is reduced.
Probabilities as variables (symbolic representation)
Reminder: the following requires the SymPy library.
>>> lea.event('p') True : p False : p + 1 >>> lea.pmf({"A":'pA', "B":'pB', "C":'pC'}) A : pA/(pA + pB + pC) B : pB/(pA + pB + pC) C : pC/(pA + pB + pC)
Setting probability type by passing declaration
As alternative to automatic detection that we have seen above, you can specify the required type explicitly in the constructor. This is done by specifying the optional prob_type
argument on Lea constructors; the simplest way to define this argument is to specify a onecharacter code, among 'f'
, 'd'
, 'r'
, 's'
and 'x'
.
prob_type 
probability type 

'f' 
float 
'd' 
Decimal 
'r' 
Fraction 
's' 
Symbol 
'x' 
automatic (default) 
The 'x'
code is the default: it makes automatic probability type detection, as explained before. Here are some examples, using the other codes:
>>> lea.event(1/3,prob_type='d') True : 0.333333333333333314829616256247390992939472198486328125 False : 0.6666666666666666851703837438 >>> lea.event(1/3,prob_type='r') True : 6004799503160661/18014398509481984 False : 12009599006321323/18014398509481984 >>> lea.event('0.6666666666666667',prob_type='r') True : 6666666666666667/10000000000000000 False : 3333333333333333/10000000000000000 >>> lea.pmf({"A":1/6, "B":1/3, "C":1/2},prob_type='r') A : 6004799503160661/36028797018963967 B : 12009599006321322/36028797018963967 C : 18014398509481984/36028797018963967 >>> lea.pmf({"A":1, "B":2, "C":3},prob_type='r') A : 1/6 B : 2/6 C : 3/6
prob_type
specifies is the target probability type, towards which the given probability values will be converted. If you provide float numbers in input, as in the examples above, you get results values of the target type that reveal the inner float representation (which can be somehow surprising!).
The prob_type
argument is also required if you want to change the probability type of a distribution that is specified by a list of values (since you don't provide any probability explicitly):
>>> lea.vals("A","B","B","C","C","C",prob_type='r') A : 1/6 B : 2/6 C : 3/6
>>> lea.event(1/3,prob_type='s') True : 0.3333333333333333 False : 0.6666666666666667 >>> lea.event('1/3',prob_type='s') True : (1/3) False : (1/3) + 1
1/3
is a float, not a string, so it is kept as a float. In the second case, '1/3'
is a string, then it is converted to a symbol; note that it has been surrounded by parentheses, in order to avoid any misinterpretations in expression (this happens only if the string is not a valid variable name, according to Python's isidentifier
method).
Setting probability type by permanent declaration
If you want to avoid specifying prob_type
over and over, then you may invoke lea.set_prob_type(…)
static method to set it once for all (well, more precisely: until the next call to lea.set_prob_type
or until the end of your program/session). This sets the default probability type, the argument being interpreted as seen before for the prob_type
argument.
Suppose that you want to work only with probability fractions. You'll do this:
>>> lea.set_prob_type('r')
>>> lea.pmf({"A":1, "B":2, "C":3}) A : 1/6 B : 2/6 C : 3/6 >>> lea.vals("A","B","B","C","C","C") A : 1/6 B : 2/6 C : 3/6
'x'
code:
>>> lea.set_prob_type('x')
Converting from one probability type to another
It's sometimes useful to convert the probabilities of given probability distribution to a given type. Typically, you will like to convert fractions to floatingpoint numbers in order to verify results found in the literature. This is easy to do: just invoke the new(prob_type='_t_')
method, where argument _t_
is the type code define in the table above, and you'll get a cloned probability distribution with the required type.
Here is an example of conversion from fractions to floats:
>>> flip = lea.vals('Head','Tail',prob_type='r') >>> flip Head : 1/2 Tail : 1/2 >>> flip.new(prob_type='f') Head : 0.5 Tail : 0.5
 conversions of symbolic probabilities,
 conversions from fractions to decimals.
For the last case, a workaround consists in chaining a conversion to float with a conversion to decimal, i.e. .new(prob_type='f').new(prob_type='d')
.
If you need to convert an individual probability, as returned for example by the P
function, you have to call builtin Python functions or types:
>>> lea.P(flip == 'Tail') 1/2 >>> float(lea.P(flip == 'Tail')) 0.5
Pf
, which performs this conversion directly on the result of P
:
>>> lea.Pf(flip == 'Tail') 0.5
Joint probability distributions
Calculating joint probability distributions (joint
method)
Given a set of Lea probability distributions, the joint(d1,d2,...)
method allows you to build a new distribution that represents the joint distribution of the given probability distributions d1
, d2
, .... Basically, it builds up tuples giving all possible combinations of values of given distributions, with the associated probability. If the arguments are independent variables, then the returned joint distribution is a cartesian product, which gives all possible combinations. Here is a first example:
>>> die1 = lea.interval(1,6,prob_type='r') >>> die2 = die1.new() >>> lea.joint(die1,die2) (1, 1) : 1/36 (1, 2) : 1/36 (1, 3) : 1/36 (1, 4) : 1/36 (1, 5) : 1/36 (1, 6) : 1/36 (2, 1) : 1/36 (2, 2) : 1/36 (2, 3) : 1/36 ... (6, 3) : 1/36 (6, 4) : 1/36 (6, 5) : 1/36 (6, 6) : 1/36
lea.joint
method has unlimited number of arguments. If there are some dependencies between the given arguments, the tuples are built up consistently, such that they contain only possible combinations:
>>> lea.joint(die1,die1) (1, 1) : 1/6 (2, 2) : 1/6 (3, 3) : 1/6 (4, 4) : 1/6 (5, 5) : 1/6 (6, 6) : 1/6 >>> lea.joint(die1,7die1) (1, 6) : 1/6 (2, 5) : 1/6 (3, 4) : 1/6 (4, 3) : 1/6 (5, 2) : 1/6 (6, 1) : 1/6
Calculating joint probability distribution may seem uninteresting for most of real use cases but, actually, it is used behind the scene for most of Lea's calculations. Also, it could be helpful, for a user point of view, in order to understand (or verify) how Lea calculates a probability distribution from atomic events.
For instance, imagine you have executed the following query to give the probability distribution of the first die, knowing the fact that the total of the two dice does not exceed 4:
>>> dice = die1 + die2 >>> die1.given(dice <= 4) 1 : 3/6 2 : 2/6 3 : 1/6
>>> lea.joint(die1,die2,dice).given(dice <= 4) (1, 1, 2) : 1/6 (1, 2, 3) : 1/6 (1, 3, 4) : 1/6 (2, 1, 3) : 1/6 (2, 2, 4) : 1/6 (3, 1, 4) : 1/6
die1
is the first element of the tuples, so you can now easily check the given results by adding atomic probabilities 1/6 together.
Defining joint probability distributions
So far, we have seen how a joint probability distribution can be calculated. In the present section, we shall see how a joint probability distribution (JPD) can be constructed from scratch.
Let us briefly explain the motivation for constructing a JPD. So far, we have seen different techniques to build random variables mutually dependent, by defining arithmetic/logical rules explaining how each variable is derived from others. This assumes that the rules are known and are expressible in mathematical formula. As we have seen, this works fine in many "simple" situations, like those found in gambling. However, our world is usually more complex than that: in many situations, the rules are unknown or too complex to model for a human being. In such case, one approach is collecting data and trying to find correlations in these data. This approach, studied in statistics field, is probably one of the first steps towards the now popular research field called "Machine Learning". Constructing a JPD from collected data is a basic way to express dependencies between random variables: the idea is to identify a set of variables and build a table thereof giving the probability of each possible combination of these variables. When the actual probabilities are missing (which is often the case), these are replaced by frequencies of occurrences; of course, providing as many data as possible should tend to improve the model (at least, this explains the hype with that "Big Data" thing!).
To illustrate the definition of JPD on a simple example, we shall use the example of 3 boolean variables Toothache / Cavity / Catch, given in the AIMA book referred in the introduction (section 13.4 in the second edition). We assume that there are some dependencies between these three variables, but in contrast to all previous examples, we assume that it is impossible to express these dependencies as formulas with logical operators. What we have however is a table indicating the 2 x 2 x 2 = 8 probabilities of each True/False combinations. We shall first make a temporary JPD with 3tuples: for instance (True,False,False)
shall mean "toothache AND no Cavity AND no Catch". For this purpose, we call the usual pmf
function:
>>> (T,F) = (True,False) >>> _jpd = lea.pmf({ (T,T,T): '0.108', (T,F,T): '0.012', (F,T,T): '0.072', (F,F,T): '0.008', ... (T,T,F): '0.016', (T,F,F): '0.064', (F,T,F): '0.144', (F,F,F): '0.576' }) >>> _jpd (False, False, False) : 0.576 (False, False, True ) : 0.008 (False, True , False) : 0.144 (False, True , True ) : 0.072 (True , False, False) : 0.064 (True , False, True ) : 0.012 (True , True , False) : 0.016 (True , True , True ) : 0.108
Incidentally, we have put the given probabilities between quotes in order to use decimal numbers (this is not mandatory, this is just to avoid unpleasant rounding errors occurring with floatingpoint numbers, as explained here). The _jpd
distribution captures the given JPD data but it misses the name of the underlying variables (Toothache / Cavity / Catch). The method as_Joint(…)
allows defining a new JPD, which is aware of the variable names:
>>> jpd = _jpd.as_joint('toothache','catch','cavity') toothache, catch, cavity (False , False, False ) : 0.576 (False , False, True ) : 0.008 …
as_joint(…)
does is converting Python usual tuples into named tuples. If you want to see the true inner representation of these named tuple, you can do this:
>>> print (jpd.as_string(tabular=False)) _(toothache=False, catch=False, cavity=False) : 0.576 _(toothache=False, catch=False, cavity=True) : 0.008 …
Now, jpd
is a JPD ready to be used: it can be queried to get probabilities of given variables or logical combination of variables.
>>> P(jpd.cavity) 0.2 >>> P(jpd.cavity  jpd.toothache) 0.28 >>> P(jpd.cavity & ~jpd.toothache) 0.08
You can check that the variables defined in the joint distribution are actually dependant, as P(catch AND cavity) is different from P(catch) x P(cavity):
>>> P(jpd.catch & jpd.cavity) 0.18 >>> P(jpd.catch) * P(jpd.cavity) 0.068
 the probability of cavity for a patient having toothache,
 the probability of cavity for a patient having toothache and a probe being caught.
>>> P(jpd.cavity.given(jpd.toothache)) 0.6 >>> P(jpd.cavity.given(jpd.toothache & jpd.catch)) 0.8709677419354838709677419355
The conditional probability method can be used also to produce "reduced" or "filtered" joint distributions from the initial joint distribution, on the basis of known facts or assumptions:
>>> jpd.given(jpd.cavity, ~jpd.toothache) toothache, catch, cavity (False , False, True ) : 0.1 (False , True , True ) : 0.9 >>> jpd.given(jpd.cavity) toothache, catch, cavity (False , False, True ) : 0.04 (False , True , True ) : 0.36 (True , False, True ) : 0.06 (True , True , True ) : 0.54
>>> P(jpd.catch.given(jpd.cavity, ~jpd.toothache)) 0.9 >>> lea.joint(jpd.toothache,jpd.catch).given(jpd.cavity).as_joint('toothache','catch') toothache, catch (False , False) : 0.04 (False , True ) : 0.36 (True , False) : 0.06 (True , True ) : 0.54
jpd.
over and over, there's some good news! You can assign to your own variables and rely to the lazy evaluation of Lea. For instance, if you make the following assignments:
>>> toothache = jpd.toothache >>> catch = jpd.catch >>> cavity = jpd.cavity
jpd.
prefix. This assignment trick is actually something that can be used for any Lea expression. For the special case of joint distributions, the convenience function make_vars
is handy to avoid typing boilerplate statements. Here is how you shall call it:
lea.make_vars(jpd,globals())
>>> lea.make_vars(jpd,globals(),'_','2') >>> P(_toothache2) 0.2
Beyond boolean random variables, JPD can also be used for numerical values or other Python objects, for which you can evaluate expressions containing usual comparison operators. We shall now elaborate this topic, showing how Lea can be used to read larger set of data and get statistics on them. First, we shall see how to build a JPD from a given CSV file or from a given pandas dataframe. Then, we shall show advanced techniques to extract data from the JPD, including filtering, marginalisation, grouping, etc.
Building JPD from dataset
Building JPD from a CSV file (read_csv_file
method)
For the example, we shall assume that we have a database of 50 patients within a CSV file (the fields are: given name, surname, gender, title, birthday, blood_type, weight in kg, height in cm, smoker (Y/N)):
patients.csv
Elaine,McLaughlin,female,Ms.,12011984,O+,44.9,141.0,Y Alan,Music,male,Mr.,30121966,A,61.3,181.0,Y Debra,Roberts,female,Mrs.,01121927,O+,79.6,168.0,N Martine,Albright,female,Mrs.,05011975,O+,46.0,156.0,N Terry,Jordan,male,Mr.,28121963,A,96.9,181.0,N Joshua,Spinney,male,Mr.,19121952,O+,106.4,183.0,N Tawanna,Short,female,Ms.,02022012,O+,93.4,175.0,N Frances,Moore,female,Ms.,07011978,B+,91.6,164.0,N Jon,Overbey,male,Mr.,12011985,A+,60.9,164.0,N Milton,Silva,male,Mr.,04121932,O+,88.9,177.0,N Damon,Kiser,male,Mr.,09121938,A+,97.2,181.0,N Ruby,Nunez,female,Mrs.,30121966,O,23.6,126.0,Y David,Seguin,male,Mr.,21011996,A+,28.8,118.0,N Lillian,Wooley,female,Ms.,28121964,A,54.5,156.0,N Lena,Sharp,female,Ms.,02011971,AB+,58.4,170.0,N Felecia,Decker,female,Mrs.,20121953,O+,95.8,172.0,N Belva,Dry,female,Mrs.,07121936,AB+,76.9,151.0,N Lawrence,McGovern,male,Mr.,06011976,A+,16.8,91.0,Y Shelley,Bailey,female,Ms.,14121945,A+,91.2,157.0,N Mildred,Heaps,female,Mrs.,01011970,O+,67.8,162.0,N Melinda,Edie,female,Ms.,20121953,A+,34.5,145.0,N Angela,Vaz,female,Mrs.,11121941,AB+,30.3,126.0,N Gilberto,Schultz,male,Mr.,02011971,B+,63.5,175.0,N Dianne,Zubia,female,Ms.,28012006,O+,69.1,157.0,N Truman,Conner,male,Mr.,10121940,O,105.4,176.0,N Dorothy,Keegan,female,Mrs.,14121945,A+,54.7,174.0,N Pamela,Guzman,female,Ms.,11121941,B,70.9,154.0,Y John,Davenport,male,Mr.,04011974,B+,96.9,177.0,N Dorothy,Christen,female,Ms.,05121933,B+,98.0,171.0,N Mary,Baird,female,Mrs.,26012003,O+,61.3,170.0,N Luis,Boyett,male,Mr.,17011991,O+,103.4,170.0,N Tricia,Lombardi,female,Mrs.,13121944,A+,50.5,155.0,Y Joshua,West,male,Mr.,17011991,A+,14.9,93.0,Y Jimmie,Martinez,male,Mr.,03121930,B+,114.0,171.0,N Carla,Moore,female,Ms.,24121958,B,61.5,154.0,N Megan,Johanson,female,Mrs.,16121948,B+,85.3,159.0,N Jenna,Tipton,female,Ms.,06011977,O+,80.1,155.0,Y Goldie,Maestas,female,Mrs.,06121934,O,83.4,161.0,N Jeffrey,Evatt,male,Mr.,26121961,O+,99.6,183.0,N Lloyd,Carroll,male,Mr.,16121947,B,98.6,189.0,N Dennis,Akins,male,Mr.,07011978,B+,66.2,174.0,N Naomi,Paulus,female,Mrs.,04022015,B+,73.4,167.0,N Dallas,Iverson,male,Mr.,03011973,A+,66.5,177.0,N Thomas,Boren,male,Mr.,10121939,A+,67.9,188.0,N Daniel,Helms,male,Mr.,22121956,A+,87.0,174.0,N Muriel,Labrecque,female,Ms.,30012009,B+,48.3,155.0,Y Bernice,Humphries,female,Mrs.,26012003,O+,23.2,123.0,N Kimberly,Victory,female,Ms.,10011982,A+,100.8,167.0,N Pam,Beach,female,Mrs.,12121942,O+,79.8,155.0,N Rita,Hines,female,Ms.,06011977,A+,51.5,166.0,N
Here is how you can read this file and build a joint probability distribution from it:
>>> patient = lea.read_csv_file('patients.csv',col_names=('given_name','surname','gender','title','birthday','blood_type','weight{f}','height{f}','smoker{b}'),prob_type='d')
suffix  conversion type 

{s} 
string (note that this suffix can be omitted) 
{i} 
integer 
{f} 
float 
{b} 
boolean 
For converting string to boolean, the following rules are applied (case insensitive):
 '1', 't', 'true', 'y', 'yes' > Python's
True
,  '0', 'f', 'false', 'n', 'no' > Python's
False
As we shall see, putting {f} on weight and height shall allow to make comparisons and calculations and these attributes; putting {b} on smoker shall ease the queries by removing references to 'Y'/'N' (see below).
Now, patient
is a JPD with named attributes, which you can display in tabular format
>>> patient given_name , surname , gender , title , birthday , blood_type, weight, height, smoker ('Alan' , 'Music' , 'male' , 'Mr.' , '30121966', 'A' , 61.3, 181.0, True ) : 0.02 ('Angela' , 'Vaz' , 'female', 'Mrs.', '11121941', 'AB+' , 30.3, 126.0, False ) : 0.02 ('Belva' , 'Dry' , 'female', 'Mrs.', '07121936', 'AB+' , 76.9, 151.0, False ) : 0.02 ('Bernice' , 'Humphries' , 'female', 'Mrs.', '26012003', 'O+' , 23.2, 123.0, False ) : 0.02 ('Carla' , 'Moore' , 'female', 'Ms.' , '24121958', 'B' , 61.5, 154.0, False ) : 0.02 …
>>> patient.gender female : 0.6 male : 0.4 >>> patient.title Mr. : 0.4 Mrs. : 0.3 Ms. : 0.3
Handling of CSV formats
The read_csv_file
method as used above will work fine provided that your CSV file as a format similar to the format of our patient.csv file example. Here are indications on how to handle other CSV format.
Some CSV files contain a first line to be interpreted as a header containing field names.
patients2.csv
given_name,surname,gender,title,birthday,blood_type,weight,height,smoker Elaine,McLaughlin,female,Ms.,12011984,O+,44.9,141.0,Y Alan,Music,male,Mr.,30121966,A,61.3,181.0,Y …
col_names
argument in the call to read_csv_file
:
>>> patient = lea.read_csv_file('patients2.csv',prob_type='d')
Another thing to know is that read_csv_file
relies on Python's standard csv module and it supports the same arguments (with same default) as csv.reader
function, namely dialect
and **fmtparams
. So, assuming for instance that your CSV file is tabdelimited, you can use:
>>> myData = lea.read_csv_file('data.csv',colNames=…,delimiter='\t')
Lea.read_csv_file
(instead of a filename). For example, in Python 3,
>>> with open('exoticData.csv','r',encoding='utf8') as f ... myExoticData = Lea.fromCSVFile(f,…)
Building JPD from a pandas dataframe
If you do data analysis using pandas library, you would be tempted to build a Lea JPD from a pandas "dataframe". To do this, you could manage to extract raw tuples from the data frames and use Lea from_joint
method. Actually, Lea provides a dedicated method to do that, Lea.from_pandas_df
. In the following example, we build first a pandas dataframe by reading the abovementioned patients2.csv file (this time using pandas method, not Lea's).
>>> import pandas as pd >>> df = pd.read_csv('patients2.csv')
read_pandas_df
function.
>>> patient2 = lea.read_pandas_df(df,prob_type='d')
Data mining on JPD
As explained in Defining joint probability distributions, all JPD attributes are accessed by using the standard dot syntax. Remember that, to work with simpler expressions, you could assign variables:
>>> given_name = patient.given_name >>> surname = patient.surname >>> gender = patient.gender >>> …
lea.make_vars(patient,globals())
patient
's attributes, without the patient.
prefix. That's what we'll assume in the following section. Should you prefer to be safer with your namespace, simply keep prefixing these attribute variables with patient.
.
We can now start to query the JPD data. In all examples below, the returned probabilities have to be interpreted primarily as frequencies in the given 50 patients dataset; if the sample is large enough, and enough representative of the whole population, then these figures could be interpreted as probabilities.
Let's calculate the distribution of genders and titles.
>>> gender female : 0.6 male : 0.4 >>> title Mr. : 0.4 Mrs. : 0.3 Ms. : 0.3
>>> lea.joint(gender,title) ('female', 'Mrs.') : 0.3 ('female', 'Ms.' ) : 0.3 ('male' , 'Mr.' ) : 0.4
>>> height_by_ten = (height//10) * 10 >>> height_by_ten 90.0 : 0.04 110.0 : 0.02 120.0 : 0.06 140.0 : 0.04 150.0 : 0.24 160.0 : 0.16 170.0 : 0.3 180.0 : 0.14
>>> height_cat = height_by_ten.map(lambda h: 'H%03d%03d'%(h,h+9)) >>> height_cat H090099 : 0.04 H110119 : 0.02 H120129 : 0.06 H140149 : 0.04 H150159 : 0.24 H160169 : 0.16 H170179 : 0.3 H180189 : 0.14
>>> (given_name+' is '+(height/100.).map(str)+' m high').given(height_cat=='H120129') Angela is 1.26 m high : 0.3333333333333333333333333333 Bernice is 1.23 m high : 0.3333333333333333333333333333 Ruby is 1.26 m high : 0.3333333333333333333333333333
>>> female = gender=='female' >>> male = gender=='male' >>> height_cat.given(female) H120129 : 0.1 H140149 : 0.06666666666666666666666666667 H150159 : 0.4 H160169 : 0.2333333333333333333333333333 H170179 : 0.2 >>> height_cat.given(male) H090099 : 0.1 H110119 : 0.05 H160169 : 0.05 H170179 : 0.45 H180189 : 0.35
height_cat
, male
and female
variables are interdependent. Another approach, more integrated, is to use the joint operation:
>>> lea.joint(gender,height_cat) ('female', 'H120129') : 0.06 ('female', 'H140149') : 0.04 ('female', 'H150159') : 0.24 ('female', 'H160169') : 0.14 ('female', 'H170179') : 0.12 ('male' , 'H090099') : 0.04 ('male' , 'H110119') : 0.02 ('male' , 'H160169') : 0.02 ('male' , 'H170179') : 0.18 ('male' , 'H180189') : 0.14
>>> bmi = (weight/(height/100.)**2).map(round) >>> bmi = (weight/(height/100.)**2).map(round) >>> bmi 15 : 0.04 16 : 0.02 17 : 0.02 18 : 0.02 19 : 0.1 20 : 0.06 21 : 0.1 22 : 0.04 23 : 0.04 26 : 0.06 28 : 0.08 29 : 0.02 30 : 0.1 31 : 0.02 32 : 0.06 33 : 0.04 34 : 0.1 36 : 0.04 37 : 0.02 39 : 0.02 >>> bmi.mean_f 26.22
>>> underweight = bmi < 18.5 >>> obese = bmi >= 30 >>> normal = ~underweight & ~obese >>> P(underweight) 0.1 >>> P(obese) 0.4 >>> P(normal) 0.5
cat_bmi
, mapping the BMI values into strings, using if
/ elif
construct or, simpler, the Python's bisect
module).
As a final example, let's select the female patients who are underweight and aged 18 years or more; for these patients, we want to display the full name, age, height, weight and BMI. Let's first build the full name and age; to ease things, we shall calculate the approximate age, for the year 2018, without considering the birth day and month.
>>> full_name = title + ' ' + given_name + ' ' + surname >>> age = 2018  birthday[4:].map(int)
>>> lea.joint(full_name,age,height,weight,bmi).given(female,underweight,age>=18) ('Mrs. Dorothy Keegan', 73, 174.0, 54.7, 18) : 0.3333333333333333333333333333 ('Mrs. Ruby Nunez' , 52, 126.0, 23.6, 15) : 0.3333333333333333333333333333 ('Ms. Melinda Edie' , 65, 145.0, 34.5, 16) : 0.3333333333333333333333333333
lea.joint
function to reorder attributes as needed. An alternative to avoid this consists in using the sort_by
method, which takes any number of attributes as arguments. For example here is how you can order underweight patients by their blood type, then their gender, then their BMI:
>>> patient.given(underweight).sort_by(blood_type,gender,bmi) given_name, surname , gender , title , birthday , blood_type, weight, height, smoker ('Melinda', 'Edie' , 'female', 'Ms.' , '20121953', 'A+' , 34.5, 145.0, False ) : 0.2 ('Dorothy', 'Keegan' , 'female', 'Mrs.', '14121945', 'A+' , 54.7, 174.0, False ) : 0.2 ('Joshua' , 'West' , 'male' , 'Mr.' , '17011991', 'A+' , 14.9, 93.0, True ) : 0.2 ('Bernice', 'Humphries', 'female', 'Mrs.', '26012003', 'O+' , 23.2, 123.0, False ) : 0.2 ('Ruby' , 'Nunez' , 'female', 'Mrs.', '30121966', 'O' , 23.6, 126.0, True ) : 0.2
WARNING: The sort_by
method returns a distribution that has lost any dependency relationship. It has to be the last call before displaying the results. As a counterexample, the following construct patient.sort_by(blood_type,gender,bmi).given(underweight)
will display the full list of patients, ignoring the filtering condition on underweight
.
Bayesian reasoning
So far, we have seen that the given
method allows calculating conditional probabilities. However, there is a whole class of probability problems where conditional probabilities P(AB) are given. The general formula
P(A  B) . P(B) = P(A and B)
allows you, when applied with care and patience, answering most problems. We have seen how to define dependant random variables through joint distributions. Basically, this boils down to define P(A and B and ...) for each conjunction of 'atomic' events A, B, ... . This method allows modeling very accurately the dependences and to answer any query, including conditional probabilities. However, it requires the knowledge of the probability of each atomic combination, which could be difficult to know. Also, it has the drawback that combinatorial explosion entails the need to enter huge buckets of probability values.
As alternative to joint distributions, Lea provides a more effective approach using conditional probability tables (CPT). The idea is to reduce the degrees of freedom of joint distributions by modeling only causal relationships. In the following, we shall see that this approach is general and powerful when dealing with problems where conditional probabilities are given. We will see that the CPT technique is the cornerstone for Bayes reasoning.
Introducing CPT with a beetle
We shall start with the "beetle" example (Wikipedia):
"An entomologist spots what might be a rare subspecies of beetle, due to the pattern on its back. In the rare subspecies, 98% have the pattern, or P(PatternRare) = 98%. In the common subspecies, 5% have the pattern. The rare subspecies accounts for only 0.1% of the population. How likely is the beetle having the pattern to be rare, or what is P(RarePattern)?"
The random variables and their dependency can be represented by the following graphical model.
To solve this, we need to define two boolean probability distributions, namely rare
and pattern
. The first one is unconditional and easy to define:
rare = lea.event(0.001)
For pattern
, we cannot (without manual calculations) set it so easily. We need to set two conditional probabilities
* P(Pattern  Rare) = 98%
* P(Pattern  ~Rare) = 5%
Lea provides a special construct to set these probabilities (requires Lea 2.3+):
pattern_if_rare = lea.event(0.98) pattern_if_not_rare = lea.event(0.05) pattern = rare.switch({ True : pattern_if_rare, False : pattern_if_not_rare })
This switch
method defines a special Lea instance, which represents a CPT. The idea is that pattern
probability shall depend on the value of rare
, according to the given dictionary: the keys of this dictionary shall be the possible values of rare
, namely True
and False
. Let us check that our definition is in line with what we need:
 What is the probability of having pattern for a rare beetle?
>>> P(pattern.given(rare)) 0.98

What is the probability of having pattern for a common beetle?
OK,>>> pattern.given(~rare) False : 19/20 True : 1/20 >>> Pf(pattern.given(~rare)) 0.05
pattern
gives back the data we put in it; this does not bring any new information, it is just a sanity test. Now, let us come back to the initial question: 
What is the probability to be rare for a beetle having the pattern?
This result is really what Bayes inference is about. It can be checked by manual calculations using the Bayes theorem.>>> P(rare.given(pattern)) 0.01924209699587669
Once the CPT is defined, other calculations can be done easily:

What is the probability to be rare for a beetle NOT having the pattern?
>>> P(rare.given(~pattern)) 2.107326119253587e05

What is the probability for any beetle to have the pattern?
>>> P(pattern) 0.05093

What is the probability for a beetle to be rare AND to have the pattern?
>>> P(rare & pattern) 0.0009800000000000002
It is even possible to build a joint distribution giving all possible conjunctions (AND), by using the joint operation:
>>> lea.joint(rare,pattern) (False, False) : 0.94905 (False, True ) : 0.04995 (True , False) : 2.000000000000002e05 (True , True ) : 0.00098
CPT with nonboolean probability distributions
The previous example use boolean probability distributions, which is common with conditional probabilities. However, depending on the problem at hand, other types of distribution can be handled. To illustrate this point we shall revisit the previous problem with 2 variables, kind
and aspect
, which refer to string probability distributions.
kind = lea.pmf({ 'rare' : 0.001, 'common' : 0.999 }) aspect_if_rare = lea.pmf({ 'pattern' : 0.98, 'no pattern' : 0.02 }) aspect_if_common = lea.pmf({ 'pattern' : 0.05, 'no pattern' : 0.95 }) aspect = kind.switch({ 'rare' : aspect_if_rare, 'common' : aspect_if_common })
cpt
method).
Now, aspect
is a new CPT that gives probability distribution of 'pattern' vs 'no pattern', according to the value of kind
. Now, the question "what is the probability to be rare for a beetle having the pattern?" can be restated in one of the following manners:
>>> kind.given(aspect == 'pattern') common : 0.9807579030041232 rare : 0.01924209699587669 >>> (kind == 'rare').given(aspect == 'pattern') False : 0.9807579030041232 True : 0.01924209699587669 >>> P((kind == 'rare').given(aspect == 'pattern')) 0.01924209699587669
kind2 = lea.pmf({ 'rare' : 0.001, 'common_A' : 0.342, 'common_B' : 0.657 })
aspect2_if_rare = lea.pmf({ 'stripped' : 0.78, 'mottled' : 0.20, 'no pattern' : 0.02 }) aspect2_if_common_A = lea.pmf({ 'stripped' : 0.01, 'mottled' : 0.04, 'no pattern' : 0.95 }) aspect2_if_common_B = lea.pmf({ 'stripped' : 0.03, 'mottled' : 0.02, 'no pattern' : 0.95 }) aspect2 = kind2.switch({ 'rare' : aspect2_if_rare, 'common_A' : aspect2_if_common_A, 'common_B' : aspect2_if_common_B } )
>>> aspect2 mottled : 0.027020000000000002 no pattern : 0.94907 stripped : 0.02391 >>> kind2.given(aspect2 == 'stripped') common_A : 0.14303638644918445 common_B : 0.8243412797992472 rare : 0.03262233375156839 >>> kind2.given(aspect2 != 'no pattern') common_A : 0.3357549577851953 common_B : 0.6450029452189279 rare : 0.019242096995876694 >>> P((kind2[:3] == 'com').given(aspect2 != 'no pattern')) 0.9807579030041232 >>> P((kind2 == 'rare').given(aspect2 != 'no pattern')) 0.019242096995876694
>>> P((kind == 'rare').given(aspect == 'pattern')) 0.01924209699587669
P(common) = P(common_A) + P(common_B) and P(pattern) = P(stripped) + P(mottled).
Bayesian networks
The technique to build CPT can be used to define Bayesian networks (BN), which model causality chains between uncertain events. There is no new syntax here but you shall see how multiple CPT can be connected together to define complex networks.
We shall use the wellknown "burglary Bayesian network" of J. Pearl, explained in AIMA book. You can find also good descriptions of this network on the Internet with the following keywords: burglar, Bayes (note that that the classical, yet simpler, example of "rainsprinklergrass" is covered here, in Lea examples page).
Here is the graphical model of this BN.
Here is how to model this Bayesian network in Lea:
burglary = lea.event(0.001) earthquake = lea.event(0.002) alarm = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event(0.950), (True ,False) : lea.event(0.940), (False,True ) : lea.event(0.290), (False,False) : lea.event(0.001) }) john_calls = alarm.switch({ True : lea.event(0.90), False : lea.event(0.05) }) mary_calls = alarm.switch({ True : lea.event(0.70), False : lea.event(0.01) })
We have not done more than building three CPT, using the syntax explained in the previous sections. The sole new technique used here is for the alarm
CPT, which depends on two variables, viz. burglary
and earthquake
: the joint
method is used to join these two variables so as to produce the four combinations as tuples containing booleans.
Now, we are ready to query the network. Let us first make "forward" derivations (i.e from causes to effects).

What is the probability that Mary calls if the alarm is triggered? (this is a given!)
>>> P(mary_calls.given(alarm)) 0.7

What is the probability that Mary calls if there is a burglary?
>>> P(mary_calls.given(burglary)) 0.6586138
Now, we can also query the Bayes network in a "backward" manner (i.e. from effects to cause)

What is the probability that there is a burglary if alarm is triggered?
>>> P(burglary.given(alarm)) 0.373551228281836

What is the probability that the alarm is triggered if Mary calls?
>>> P(alarm.given(mary_calls)) 0.1500901177497596

What is the probability that there is a burglary if Mary OR John calls (or both)?
>>> P(burglary.given(mary_calls  john_calls)) 0.014814211524985793

What is the probability that there is a burglary if Mary AND John call?
>>> P(burglary.given(mary_calls & john_calls)) 0.28417183536439294
It is also possible to get unconditional probabilities of events or conjunction of events

What is the probability that the alarm is triggered?
>>> P(alarm) 0.002516442

What is the probability that there is a burglary and Mary calls?
>>> P(burglary & mary_calls) 0.0006586138000000001

What is the probability that there is neither burglary nor earthquake, the alarm triggers and both John and Mary call?
>>> P(~burglary & ~earthquake & alarm & john_calls & mary_calls) 0.00062811126
Note that the last result can be checked in the AIMA book.
As an academic exercise, you can very easily "flatten" the network to build a joint table giving the probabilities of each conjunction of events. This boils down to calculate the joint probability distribution between the 5 variables.
>>> lea.joint(burglary,earthquake,alarm,john_calls,mary_calls) (False, False, False, False, False) : 0.9367427006190001 (False, False, False, False, True ) : 0.009462047481000001 (False, False, False, True , False) : 0.04930224740100002 (False, False, False, True , True ) : 0.0004980024990000002 (False, False, True , False, False) : 2.9910060000000004e05 (False, False, True , False, True ) : 6.979013999999999e05 (False, False, True , True , False) : 0.00026919054000000005 (False, False, True , True , True ) : 0.00062811126 (False, True , False, False, False) : 0.0013341744900000002 (False, True , False, False, True ) : 1.3476510000000005e05 (False, True , False, True , False) : 7.021971000000001e05 (False, True , False, True , True ) : 7.092900000000001e07 (False, True , True , False, False) : 1.7382600000000002e05 (False, True , True , False, True ) : 4.0559399999999997e05 (False, True , True , True , False) : 0.00015644340000000006 (False, True , True , True , True ) : 0.00036503460000000007 (True , False, False, False, False) : 5.631714000000006e05 (True , False, False, False, True ) : 5.688600000000006e07 (True , False, False, True , False) : 2.9640600000000033e06 (True , False, False, True , True ) : 2.9940000000000035e08 (True , False, True , False, False) : 2.8143600000000003e05 (True , False, True , False, True ) : 6.56684e05 (True , False, True , True , False) : 0.0002532924000000001 (True , False, True , True , True ) : 0.0005910156000000001 (True , True , False, False, False) : 9.40500000000001e08 (True , True , False, False, True ) : 9.50000000000001e10 (True , True , False, True , False) : 4.9500000000000054e09 (True , True , False, True , True ) : 5.0000000000000066e11 (True , True , True , False, False) : 5.7e08 (True , True , True , False, True ) : 1.3299999999999996e07 (True , True , True , True , False) : 5.130000000000002e07 (True , True , True , True , True ) : 1.1970000000000001e06
Reading BIF files
BIF (Bayesian Interchange Format) is a standard file format for defining Markov networks. A repository of BIF files can be found on the bnlearn site.
Lea is able to read BIF files and create the Markov networks in one statement. For example, here is how to load the earthquake.bif
file found on the abovementioned page.
Requires Lea 3.4.1+
lea.read_bif_file("earthquake.bif",globals()) # > ('Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls')
globals()
allows directly injecting these variables in your current Python session. The names of created variables are returned in a tuple. You may then query the BN by using the techniques seen above.
Burglary # > True : 0.01 False : 0.99 P(MaryCalls.given(Alarm)) # > 0.7 P(Burglary.given(JohnCalls)) # > 0.13331382432504352
earthquake.bif
is defined as 1%, while it was 0.1% in the previous example.
Advanced features for Bayesian networks
One of the drawbacks of Bayesian networks using the simple CPT technique presented is that it requires specifying probability data for each combination of influencing variables. Since the number of cases grows exponentially with the number of influencing variables, this could become difficult to specify.
The present section presents several ways to reduce the number of cases to specify in CPT, by means of factorization.
Note that the special topic of calculating CPT from data shall be covered later, in a specific section dedicated to Machine Learning.
Specifying a default entry
One of the simplest ways to reduce the enumeration of cases in CPT is to have a default entry that groups all the cases that have not been explicitly given.
Let us revisit the burglary network. Consider a new model of alarm device for which the two cases "burglary without earthquake" and "earthquake without burglary" cannot be distinguished, these being associated to a probability equal to 0.9. Instead of writing this:
alarm2 = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event(0.950), (True ,False) : lea.event(0.900), (False,True ) : lea.event(0.900), (False,False) : lea.event(0.001) })
switch
method, just after the dictionary:
alarm2 = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event(0.950), (False,False) : lea.event(0.001) }, lea.event(0.900) )
Building CPT with one condition (if_
method)
For variables depending on one Boolean variable, the switch method can be abbreviated using the if_
convenience function: it mimics the ifthenelse constructions found in programming languages (the underscore is put to avoid a clash with Python’s if
keyword). For instance, the definition
john_calls = alarm.switch({ True : lea.event(0.90), False : lea.event(0.05) })
john_calls = lea.if_( alarm, lea.event(0.90), lea.event(0.05) )
if_
construct builds a CPT in a somehow more expressive way than the switch
method.
Reducing the size of CPT
All the examples seen above use the switch
method to build up CPT. This allows specifying a set of clauses, each of which is triggered when the decision variables are equal to a specific combination of values. Although the switch
method works fine and is efficient (through Python dictionaries), it is limited in the way the conditions are expressed since it requires an enumeration of all possible values. This could be a burden when the decision variable has a large domain, for example if it joins multiple variables or if the domain covers a big range of values. More importantly, such explicit CPT may be intractable due to memory usage. In several cases however, it happens that the CPT
 returns the same probability distribution for different values (see examples below), possibly if some condition is verified (contextspecific independence),
 or returns probability distributions that can be calculated by a deterministic function (e.g. noisyor, noisymax models).
In such situations, it's possible to avoid the definition of an explicit CPT by exploiting the properties/assumptions of the probabilistic model. For this purpose, we present below three techniques, all available in Lea:
 CPT defined by function
 CPT with Boolean expressions
 Cascaded CPT
CPT defined by function (switch_func
method)
Note: the technique shown in the present section requires Lea 3.1 at least.
Let's imagine a world where the temperatures are distributed uniformly from 10°C to 29°C, as integer numbers:
temperature = lea.interval(10,29)
ind_if_temperature_below_0 = lea.pmf({ 'COLD': 0.75, 'MEDIUM': 0.25 }) ind_if_temperature_between_0_and_19 = lea.pmf({ 'COLD': 0.25, 'MEDIUM': 0.60, 'HOT': 0.15 }) ind_if_temperature_above_20 = lea.pmf({ 'MEDIUM': 0.20, 'HOT': 0.80 })
switch
technique seen above, there are 40 entries to specify:
indicator = temperature.switch({ 10: ind_if_temperature_below_0, … 1: ind_if_temperature_below_0, 0: ind_if_temperature_between_0_and_19, … 19: ind_if_temperature_between_0_and_19, 20: ind_if_temperature_above_20, … 29: ind_if_temperature_above_20 })
The switch_func
method offers a way to remove the need to provide such redundant CPT. The idea is to provide a method that emulates the CPT lookup on the fly. So instead of relying on a dictionary as above, you provide a Python function that returns the probability distribution of the indicator for any temperature given in argument.
temperature = lea.interval(10,29) def indicator_func(temperature_value): if temperature_value < 0: return ind_if_temperature_below_0 if temperature_value < 20: return ind_if_temperature_between_0_and_19 return ind_if_temperature_above_20 indicator = temperature.switch_func(indicator_func)
>>> indicator.given(temperature == 5) COLD : 0.7500000000000001 MEDIUM : 0.25 >>> indicator.given(temperature <= 0) COLD : 0.7045454545454545 HOT : 0.013636363636363634 MEDIUM : 0.28181818181818175 >>> indicator COLD : 0.3124999999999998 HOT : 0.27500000000000013 MEDIUM : 0.4125000000000001
>>> P((temperature < 0).given(indicator == 'COLD')) 0.6 >>> P((temperature < 0).given(indicator == 'MEDIUM')) 0.15151515151515144 >>> P((temperature < 0).given(indicator == 'HOT')) 0.0 >>> P((temperature <= 0).given(indicator == 'HOT')) 0.013636363636363629 >>> P((temperature > 0).given(indicator == 'HOT')) 0.9863636363636364 >>> P((temperature <= 0).given(indicator != 'COLD')) 0.11818181818181817 >>> P((temperature <= 0).given(indicator.is_any_of('MEDIUM','HOT'))) 0.11818181818181817
>>> lea.joint(indicator,temperature) ('COLD' , 10) : 0.01875000000000001 ('COLD' , 9) : 0.01875000000000001 ('COLD' , 8) : 0.01875000000000001 ('COLD' , 7) : 0.01875000000000001 …
switch_func
construct, it's important to note that, internally, the indicator_func
function is stored in place of the CPT. The probabilistic inference algorithm is smart enough to call this function when needed, instead of expanding the full CPT. So, this construct is very efficient in term of memory usage.
WARNING: the function you pass to switch_func
shall return Lea instances that are defined by given elementary pmf, e.g. as returnd by lea.pmf(...)
or lea.vals(...)
calls. Raw values are accepted also: these are automatically coerced to a pmf with a single value having probability 1. If the passed function returns more complex Lea instances (functions, arithmetic expressions, joints, CPT, etc.), then an exception is raised at evaluation.
We shall see later how the switch_func
can be used to build up noisyor / noisymax models (see here). Also, it can be used to solve the intriguing Monty Hall problem, more easily than with the switch
method.
CPT with Boolean expressions (cpt
method)
The cpt
function offers an alternate solution to the avoid CPT redundancies. The idea is to specify a list of entries made up of Boolean conditions with associated Lea instances. It's a bit like you would do in Python with a sequence of if
constructs (see indicator_func
above). Revisiting the example given in the previous section, here is the syntax to build the indicator
variable using cpt
:
indicator = lea.cpt( (temperature < 0 , ind_if_temperature_below_0 ), ((0 <= temperature) & (temperature < 20), ind_if_temperature_between_0_and_19 ), (20 <= temperature , ind_if_temperature_above_20 ))
Note that the method expects that the given conditions cover all possible cases and are mutually exclusive (so, it’s not exactly comparable to an if
–elif
–else
construct). If this is not respected, then an exception is raised.
Similarly to the switch
method, the cpt
method accepts a default clause: it is represented by condition set to None
. This special value is a placeholder for any case not covered by the other condition(s). To illustrate this, here is a definition of indicator
that is exactly equivalent to the previous one.
indicator = lea.cpt( (temperature < 0 , ind_if_temperature_below_0 ), (None , ind_if_temperature_between_0_and_19 ), (20 <= temperature, ind_if_temperature_above_20 ))
None
can be specified at any place of lea.cpt
’s arguments, not specially the last one. Of course, no more than one None
is authorized (an exception is raised, otherwise).
Cascaded CPT
In several CPT presenting contextspecific independence, it's possible to remove redundancies by building CPT "in cascade". Here is a very basic example, where z
is a CPT depending of x
and y
:
x = lea.event('1/3') y = lea.event('1/4') z = lea.joint(x,y).switch({ (False, False): lea.event('1/5'), (False, True ): lea.event('1/7'), (True , False): lea.event('1/2'), (True , True ): lea.event('1/2') })
x
is True then z
is a 5050 distribution, which does not depend of y
. The redundancy can then be removed by embedding a CPT governed by y
only within a CPT governerd by x
only, as follows:
z1 = x.switch({ False: y.switch({ False: lea.event('1/5'), True : lea.event('1/7')}), True : lea.event('1/2') })
cpt
:
z2 = lea.cpt( ( ~x, lea.cpt( (~y, lea.event('1/5')), ( y, lea.event('1/7')))), ( x, lea.event('1/2')) )
z
, z1
and z2
represent the same logical CPT. For instance,
print (P(z.given(y))) # > 11/42 print (P(z1.given(y))) # > 11/42 print (P(z2.given(y))) # > 11/42
Of course, all these examples are quite contrived and the gain is negligible; you can anyway extrapolate the benefit of these techniques on bigger CPT, having prohibitive combinatorial.
Noisyor / noisymax models
The noisyor / noisymax models provide practical alternatives to naive CPT, provided that some assumptions can be made on the causal dependencies (we won't detail these here). Lea provides several means to build up such models. For the example, we shall define a symptom boolean random variable (fever
), which is modeled as a noisyor of disease boolean random variables (cold
, flu
and malaria
) (see figures and explanations in AIMA book or in this presentation.
Let's first define the input data, which are the conditional probabilities of fever if the occurrence of one single disease:
prob_fever_if_sole_cold = 0.4 prob_fever_if_sole_flu = 0.8 prob_fever_if_sole_malaria = 0.9
cold = lea.event(0.082) flu = lea.event(0.025) malaria = lea.event(0.001)
fever
variable as a noisyor of the three diseases:fever_by_cold = lea.if_(cold , lea.event(prob_fever_if_sole_cold) , False) fever_by_flu = lea.if_(flu , lea.event(prob_fever_if_sole_flu) , False) fever_by_malaria = lea.if_(malaria, lea.event(prob_fever_if_sole_malaria), False) fever = fever_by_cold  fever_by_flu  fever_by_malaria
fever_by_cold
variable can be interpreted as "having fever due to a cold": in case of cold, it has a probability 0.4 to be true, if there is no cold, it is surely false (i.e. either there is no fever or the fever is not caused by cold). The fever
is obtained by ORing the three 'noised' variables. This uses the dedicated Lea's construct, which simply uses the classical OR to combine all possible values. Note that this noisyor model is very different to:
light_fever = cold  flu  malaria
prob_fever_if_sole_cold
= prob_fever_if_sole_flu
= prob_fever_if_sole_malaria
= 1. In such special case, the noisyor boils down to a classical or.
One way to check that fever
is correctly defined is by checking one by one each entry of the equivalent CPT, which can be calculated easily from the 3 given probabilities. Here is how this check can be done:
diseases = lea.joint(cold,flu,malaria) for (c,f,m) in diseases.support: print (c, f, m, lea.P(fever.given(cold==c,flu==f,malaria==m))) # > False False False 0.0 # False False True 0.9 # False True False 0.8 # False True True 0.98 # True False False 0.4 # True False True 0.94 # True True False 0.88 # True True True 0.9879999999999999
You may then execute queries, as described earlier in the tutorial:
print(P(fever.given(~cold,flu,malaria))) # > 0.98 print(P(malaria.given(fever))) # > 0.017080461111676846 print(P(flu.given(~fever,cold))) # > 0.005102040816326529
Noisymax models may be define in a similar way, using lea.max_of
method. These may use nonboolean random variables, provided that their values are ordered. You may note that the definition of fever
above could be replaced by
fever = lea.max_of(fever_by_cold, fever_by_flu, fever_by_malaria)
False < True
). This tends to confirm that noisymax is a generalization of noisyor.
Before concluding the section, let's mention that there exists an alternate way to make the noisyor / noisymax models, which relies on the switch_func method. The idea is to define a function that mimics the CPT, by calculating the probability distributions expected for values of causing variables passed as argument. This function shall of course use the assumptions made for the noisy model at hand. Here is how to proceed to redo the fever model:
Note: the technique shown below requires Lea 3.1 at least.
def fever_event(disease_val): (cold_val, fever_val, malaria_val) = disease_val inv_prob = 1. if cold_val: inv_prob *= 1.  prob_fever_if_sole_cold if fever_val: inv_prob *= 1.  prob_fever_if_sole_flu if malaria_val: inv_prob *= 1.  prob_fever_if_sole_malaria return lea.event(1.inv_prob) diseases = lea.joint(cold,flu,malaria) fever2 = diseases.switch_func(fever_event)
fever
by fever2
in the query above give the very same results.
To conclude, all the models seen here do not necessitate the definition of large CPT. Furthermore, the inference algorithm used does not require to calculate such CPT internally, so these models are very efficient in memory usage.
Prior probabilities (revised_with_cpt
method)
The methods seen so far for building CPT miss an important class of problems of conditional probability: those mixing a conditional probability and a prior, unconditional, probability.
We shall take here an example of the AIMA book. It models a causal dependency between the meningitis and the stiff neck. The assumptions are the following:
 the probability that a patient has meningitis is 1/50,000 (prior probability)
 the probability that a patient has stiff neck is 1/20 (prior probability)
 the meningitis causes stiff neck 50 % of the times (conditional probability)
OK, the probability of meningitis is easily tackled:
meningitis = lea.event(1/50000)
if_
syntax, we are stuck on the "else" argument:
stiffneck = Lea.if_( meningitis, lea.event(1/2), ?? WHAT HERE ?? )
prior_lea
argument for this:
Note: requires Lea 3.2.2 at least.
stiffneck = lea.if_( meningitis, lea.event(1/2), prior_lea=lea.event(1/20))
revised_with_cpt
:
stiffneck = lea.event(1/20).revised_with_cpt((meningitis,lea.event(1/2)))
Let us check these statements:
>>> P(stiffneck) 0.05 >>> P(stiffneck.given(meningitis)) 0.5
stiffneck
object is a true CPT. Behind the scene, Lea has calculated the probability of stiff neck if no meningitis; then, it was able to fill in the missing data. You can get this value:
>>> P(stiffneck.given(~meningitis)) 0.0499909998199964
0.0499909998199964
and get the same CPT. Of course, this is not needed since Lea has made the work for you!
Now, we are able to compare the probability of meningitis, prior and after stiff neck information:
>>> P(meningitis) 2e05 >>> P(meningitis.given(stiffneck)) 0.0002
P(meningitis  stiffneck) = P(stiffneck  meningitis) x P(meningitis) / P(stiffneck)
= (1/2) . (1/50000) / (1/20) = 1/5000 = 0.0002
We can check that also using Lea:
>>> P(stiffneck.given(meningitis)) * P(meningitis) / P(stiffneck) 0.0002
You should take care that some problems have no solutions. In such cases, Lea raises an exception with an error message giving the possible range for the prior probability. Example:
stiffneck2 = lea.event(1/200000).revised_with_cpt((meningitis,lea.event(1/2))) lea.lea.Error: prior probability of 'False' is 0.999995, outside the range [ 1e05 , 0.9999899999999999 ]
Prior probabilities work also with nonboolean distributions. Let us revisit the beetle example modeled with attribute. Imagine that the problem is restated as follows: the prior probability of a beetle with pattern is 5.093 %, in the rare subspecies, 98 % have the pattern; the rare subspecies accounts for only 0.1 % of the population. Note that, considering the initial problem, we have removed one data (conditional probability of pattern for common beetle) and added one data (prior probability pattern).
Here are the statements to model these data:
aspect0 = lea.pmf({ 'pattern' : 0.05093, 'no pattern' : 0.94907 }) aspect1 = aspect0.revised_with_cpt(( kind == 'rare', lea.pmf({ 'pattern' : 0.98, 'no pattern' : 0.02 })))
>>> aspect1.given(kind == 'common') no pattern : 0.95 pattern : 0.05
aspect
, not aspect1
):
>>> aspect no pattern : 0.94907 pattern : 0.05093
Markov chains
Lea allows defining Markov chains. We shall show here how to model the stock market Markov chain example given on Wikipedia.
To execute the examples shown in this section, you have to import the markov
module present in the Lea package:
from lea import markov
Definition from transition probability matrix
The states defining the Markov chain can be any Python object. We shall use here the most simple option : strings. The three market states of the example shall be named "BULL", "BEAR" and "STAG". Here is how you define these states in Lea, along with the probability transition matrix (note that we omit the prompts, so you can copypaste easily the multline statement):
market = markov.chain_from_matrix( ( 'BULL', 'BEAR', 'STAG' ), ( 'BULL', ( 0.900 , 0.075 , 0.025 )), ( 'BEAR', ( 0.150 , 0.800 , 0.050 )), ( 'STAG', ( 0.250 , 0.250 , 0.500 )))
We have here a Markov chain, named market
, which captures the set of states and all probabilities of transition from state to state. These probabilities can be displayed by using the same methods as shown for simple distributions:
>>> market BULL > BULL : 0.9 > BEAR : 0.075 > STAG : 0.025 BEAR > BULL : 0.15 > BEAR : 0.8 > STAG : 0.05 STAG > BULL : 0.25 > BEAR : 0.25 > STAG : 0.5
market
is NOT a Lea instance (actually, it's a markov.Chain
). Apart displaying itself, the operations you can do on this object are minimal: getting its inner states or given probability distributions of these states. We shall see this in the following.
Querying the Markov chain
To make queries, we need to be able to define an initial state. Here is how to proceed:
>>> (bull_state,bear_state,stag_state) = market.get_states()
bear_state
object represents the state BEAR:
>>> bear_state BEAR : 1.0
Note: In Lea version previous to 3.1, write _state
instead of state
.
>>> market.state BULL : 0.3333333333333333 BEAR : 0.3333333333333333 STAG : 0.3333333333333333
next_state()
method:
>>> market.state.next_state() BULL : 0.4333333333333333 BEAR : 0.375 STAG : 0.19166666666666665 >>> bear_state.next_state() BULL : 0.15 BEAR : 0.8 STAG : 0.05
next_state(…)
is a Lea instance, on which you can apply usual techniques seen in the tutorial.
To get the state probability distribution after n steps, instead of chaining n times the next_state()
method, you can simply pass n as argument:
>>> bear_state.next_state(3) BULL : 0.3575 BEAR : 0.56825 STAG : 0.07425
To define an initial nonuniform probability distribution, you need to pass the state probability distribution to the .get_state
method :
>>> fp_state = market.get_state(lea.pmf({ 'BULL': 0.6250, 'BEAR': 0.3125, 'STAG': 0.0625 })) >>> fp_state BULL : 0.625 BEAR : 0.3125 STAG : 0.0625
>>> fp_state.next_state() BULL : 0.6250000000000001 BEAR : 0.3125 STAG : 0.0625
bear_state.next_state(1000)
).
Another way to define a nontrivial probability distribution is by specifying a condition on the state:
>>> market.state_given(market.state != 'STAG') BULL : 0.5 BEAR : 0.5 >>> market.state_given(market.state[0] == 'B') BULL : 0.5 BEAR : 0.5
Definition from a sequence of states
There exists another way to define a Markov chain: the empirical way. Instead of setting a probability matrix, which could be hard to define, it is possible to define a Markov chain by providing a sequence of states supposed to be long enough to be representative of all transitions. Typically, this could be a sample sequence of observed states for a given system.
This sort of definition is done through the markov.chain_from_seq
method. Here is an example, using a very short sequence:
>>> market2 = markov.chain_from_seq(('BEAR','BEAR','BULL','BULL','BULL','BEAR','STAG','BULL','STAG','BEAR')) >>> market2 BEAR > BEAR : 0.3333333333333333 > BULL : 0.3333333333333333 > STAG : 0.3333333333333333 BULL > BEAR : 0.25 > BULL : 0.5 > STAG : 0.25 STAG > BEAR : 0.5 > BULL : 0.5
Generating random sequence of states
Starting from a given state of a given Markov chain, it is possible to generate a random sequence of states, obeying the transition probabilities of that Markov chain. This is done by calling the random_seq(n)
method where n
is the size of the sample sequence. Example:
>>> bear_state.random_seq(40) ('BEAR', 'BEAR', 'BEAR', 'BULL', 'BULL', 'BULL', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'STAG', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BULL', 'BULL', 'BULL', 'BULL')
markov.chain_from_seq
seen above, which can be seen also as a transition frequency calculator. Let's use a random sequence of 100,000 states this time:
>>> state_seq_sample = bear_state.random_seq(100000) >>> markov.chain_from_seq(state_seq_sample) BEAR > BEAR : 0.8024547965846309 > BULL : 0.14929683576092417 > STAG : 0.04824836765444501 BULL > BEAR : 0.0768908443327048 > BULL : 0.8988863640026431 > STAG : 0.02422279166465213 STAG > BEAR : 0.2497538562520512 > BULL : 0.2490974729241877 > STAG : 0.501148670823761
We see that the frequencies of transition are close to the probabilities defined for the market
Markov chain, as expected.
Matrix extraction / absorbing Markov chains
Note: what follows requires Lea 3.1 at least.
For advanced treatments on Markov chains, you may desire to obtain characteristic data as matrices.
The probability transition matrix can be retrieved by using the matrix
method. If you have installed NumPy package, then you can obtain this matrix as a numpy array using the as_array
optional argument:
print (market.matrix()) # > ((0.9, 0.075, 0.025), (0.15, 0.8, 0.05), (0.25, 0.25, 0.5)) >>> market.matrix(as_array=True) # > array([[ 0.9 , 0.075, 0.025], [ 0.15 , 0.8 , 0.05 ], [ 0.25 , 0.25 , 0.5 ]])
absorbing_mc_info()
serves this purpose, it returns a 6tuple:
(is_absorbing,transient_states,absorbing_states,q_matrix,r_matrix,n_matrix) = market.absorbing_mc_info() print (is_absorbing,transient_states,absorbing_states,q_matrix,r_matrix,n_matrix) # > (False, (), (), (), (), None)
is_absorbing
is False
and the other elements give no interesting data. Consider now the following MC:
dw = markov.chain_from_matrix( ( 'S0', 'S1', 'S2', 'S3', 'S4'), ( 'S0', ( 1.0 , 0.0 , 0.0 , 0.0 , 0.0 )), ( 'S1', ( 0.5 , 0.0 , 0.5 , 0.0 , 0.0 )), ( 'S2', ( 0.0 , 0.5 , 0.0 , 0.5 , 0.0 )), ( 'S3', ( 0.0 , 0.0 , 0.5 , 0.0 , 0.5 )), ( 'S4', ( 0.0 , 0.0 , 0.0 , 0.0 , 1.0 )))
(is_absorbing,transient_states,absorbing_states,q_matrix,r_matrix,n_matrix) = dw.absorbing_mc_info() print (is_absorbing) # > True print (transient_states) # > ('S1', 'S2', 'S3') print (absorbing_states) # > ('S0', 'S4') print (q_matrix) # > ((0.0, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.0)) print (r_matrix) # > ((0.5, 0.0), (0.0, 0.0), (0.0, 0.5)) print (n_matrix) # > None
n_matrix
is supposed to contain the fundamental matrix N = inv(IQ). To be actually calculated, you need NumPy package and you have to put the as_array
optional argument: then, the 3 matrices Q, R and calculated N are returned as numpy arrays:
(is_absorbing,transient_states,absorbing_states,q_matrix,r_matrix,n_matrix) = dw.absorbing_mc_info(as_array=True) print (q_matrix) # > array([[ 0. , 0.5, 0. ], [ 0.5, 0. , 0.5], [ 0. , 0.5, 0. ]]) print (r_matrix) # > array([[ 0.5, 0. ], [ 0. , 0. ], [ 0. , 0.5]]) print (n_matrix) # > array([[ 1.5, 1. , 0.5], [ 1. , 2. , 1. ], [ 0.5, 1. , 1.5]])
What's next?
Thank you for reading the present tutorial page! We hope that you found it enough clear and informative.
To learn more, you are invited to jump to
 tutorial [3/3]  plotting, drawing without replacement, machine learning, information theory, MC estimation, symbolic calculation, …
 Examples page
For enhancements or bugs, please create issues on bitbucket Lea page.
Feel free also to send an email to pie.denis@skynet.be (English or French spoken). Use cases of Lea in your projects/researches are especially appreciated! A "references" page listing the projects using Lea could greatly improve the reputation of the library.
Updated