Lea / Lea3_Tutorial_2

leas3_title_2.png

< Tutorial [1/3] Wiki Home Tutorial [3/3] >

Table of Contents


Introduction

The present page is the second part of the tutorial for the Lea library. Along with the third part, it covers advanced topics in probability. It assumes that you have installed Lea and that you are familiar with the techniques presented in the first part of the tutorial. The main topics are mostly independent, so you could choose to jump to your preferred subject without much trouble (see TOC above).

Note: You may also practice the present tutorial online in the Jupyter Notebook Tutorial #2 (no installation required)!

In some sections, we shall use examples found in the excellent "Artificial Intelligence: A Modern Approach" book by Stuart Russell and Peter Norvig (second edition). The reader is invited to refer to this book, named the "AIMA book" in the following, for more details on the presented methods and results. These examples are reproduced with the kind permission of Stuart Russell and Peter Norvig.

Standard probability distributions

Besides the general-purpose functions presented in the first part of the tutorial, Lea provides several functions for defining standard probability distributions, as well as common gambling devices.

Note that all these functions admit an optional prob_type argument for setting a specific probability representation.
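For instance, here is a minimal sketch passing such an argument (the 'r' code, which selects fractions, is explained in the section on probability representations below):
lea.bernoulli(0.25, prob_type='r')
# -> 1 : 1/4
#    0 : 3/4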

Bernoulli distribution

The method bernoulli(p) defines a Bernoulli distribution: it gives the value 1 with probability p and 0 with probability 1-p.

>>> lea.bernoulli(0.25)
1 : 0.25
0 : 0.75

Binomial distribution

The method binom(n, p) defines a binomial distribution: it gives the number of successes among a number n of independent experiments, each having probability p of success. For example, suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1, … or 6 heads after six tosses?

>>> lea.binom(6,0.3)
0 : 0.11764899999999995
1 : 0.3025259999999999
2 : 0.324135
3 : 0.18522
4 : 0.059535
5 : 0.010206000000000002
6 : 0.000729
Note that the binomial distribution is a generalization of the Bernoulli distribution, since binom(1, p) is equivalent to bernoulli(p). This distribution can also be obtained with the times(n) method: the binomial distribution above is exactly the same as
>>> lea.bernoulli(0.3).times(6)
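As a quick sanity check (a minimal sketch), you can display both constructions and verify that they produce the same probabilities as shown above:
import lea
print(lea.binom(6, 0.3))             # binomial built directly
print(lea.bernoulli(0.3).times(6))   # binomial built from 6 Bernoulli trials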

Poisson distribution

The method poisson(m) defines a Poisson distribution having mean m.

>>> lea.poisson(2)
 0 : 0.1353352832366127
 1 : 0.2706705664732254
 2 : 0.2706705664732254
 3 : 0.1804470443154836
 4 : 0.0902235221577418
 5 : 0.03608940886309672
 6 : 0.012029802954365574
 7 : 0.0034370865583901638
 8 : 0.0008592716395975409
 9 : 0.00019094925324389798
10 : 3.8189850648779595e-05
Because the Poisson distribution has, in theory, an infinite number of possible values, Lea provides an approximation where values with a probability below 1e-20 are dropped. The threshold can be changed by passing the needed precision as a second argument, e.g. poisson(2,1e-30). Note that, since the highest values are dropped, the remaining probabilities are all slightly overestimated in order to keep a probability total equal to 1.
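For example, here is a sketch using a tighter threshold and querying a tail probability (the comparison syntax is the same as the one used for dice later in this tutorial):
poisson2 = lea.poisson(2, 1e-30)   # keep values down to probability 1e-30
lea.P(poisson2 >= 10)              # probability of observing 10 or more events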

Convenience functions and objects

Besides the standard probability distributions seen above, Lea provides a couple of convenience functions and prebuilt distributions related to gambling. These functions and objects are not loaded by default when importing lea; they live in a separate module called lea.leaf that you must import:

from lea import leaf
Here is a list detailing the functions and objects provided in this module, together with their equivalent general-purpose constructions:

  • leaf.die(nb_faces=6)
    equivalent to lea.interval(1, nb_faces)
    a fair die with faces marked from 1 to nb_faces (default: 6)
  • leaf.dice(nb_dice, nb_faces=6)
    equivalent to lea.interval(1, nb_faces).times(nb_dice)
    the total of nb_dice independent fair dice with faces marked from 1 to nb_faces (default: 6)
  • leaf.dice_seq(nb_dice, nb_faces=6, sorted=True)
    equivalent to lea.interval(1, nb_faces).draw(nb_dice, sorted=sorted, replacement=True)
    a tuple of nb_dice elements showing a throw of nb_dice independent fair dice with faces marked from 1 to nb_faces (default: 6); if sorted is True (default), the combinations of dice that are the same apart from their order of throw are considered equal and the value kept is in increasing order; if sorted is False, all nb_faces**nb_dice combinations are produced, with equal probabilities
  • leaf.D6
    equivalent to lea.interval(1, 6, prob_type='r')
    a fair die with 6 faces (fraction probability type)
  • leaf.flip
    equivalent to lea.event('1/2', prob_type='r')
    a True/False variable with 50-50 chances (fraction probability type)
  • leaf.card_suite
    equivalent to lea.vals(*'SHDC', prob_type='r')
    a random one-character symbol representing a card suite among Spades, Hearts, Diamonds and Clubs (fraction probability type)
  • leaf.card_rank
    equivalent to lea.vals(*'A23456789TJQK', prob_type='r')
    a random one-character symbol representing a card rank among Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen and King (fraction probability type)
  • leaf.card
    equivalent to leaf.card_rank + leaf.card_suite
    a random two-character symbol representing a card chosen in a deck of 52 cards; leaf.card is the concatenation of leaf.card_rank and leaf.card_suite (fraction probability type)

Note that the prebuilt objects use fractions for probabilities. On the other hand, the three functions work with the default probability type; they also admit an optional prob_type argument, which allows choosing a probability type different from the default one. For example, here is how you could see the probabilities of throwing two fair dice, with fractions or with floating-point numbers:

>>> leaf.dice(2, prob_type='r')
 2 : 1/36
 3 : 2/36
 4 : 3/36
 5 : 4/36
 6 : 5/36
 7 : 6/36
 8 : 5/36
 9 : 4/36
10 : 3/36
11 : 2/36
12 : 1/36
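The card examples below drop the leaf. prefix; a minimal way to set this up (just one possible assumption about the session) is:
from lea.leaf import card, card_suite, card_rank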
Here is how you could draw 13 cards from a deck of cards:
>>> card.random_draw(13)
('3S', 'KH', '3H', 'AC', '6D', '9D', '3C', 'QH', 'KC', '8D', 'KD', '2S', 'TD')
Note also that leaf.card, leaf.card_suite and leaf.card_rank are interdependent:
>>> card.given(card_suite=='S', card_rank.is_none_of(*'JQKA'))
2S : 1/9
3S : 1/9
4S : 1/9
5S : 1/9
6S : 1/9
7S : 1/9
8S : 1/9
9S : 1/9
TS : 1/9
>>> card_rank.given(card.is_any_of('8S', 'KS', '8H', 'KC'))
8 : 1/2
K : 1/2
>>> card_suite.given(card.is_any_of('8S', 'KS', '8H', 'KC'))
C : 1/4
H : 1/4
S : 2/4

Advanced constructor arguments

The pmf and new methods accept several optional arguments, which may be useful on some occasions. The following table describes these arguments:

  • sorting (default: True)
    if True, the values used for displaying the distribution or getting the values are sorted whenever possible (i.e. no exception on sort); otherwise, the order of values is unspecified unless ordered=True
  • ordered (default: False)
    if True, the values used for displaying the distribution or getting the values follow the order in which you provide them (requires sorting=False and that the given pmf is an iterable or a collections.OrderedDict)
  • normalization (default: True)
    if True, each element of the given ps is divided by the sum of all ps before being stored (in such a case, it's not mandatory to provide true probabilities; these could be simple counters, for example)
  • prob_type (default: -1)
    allows converting the given probabilities; -1 means no conversion, other codes here
  • check (default: True)
    if True, the constructor may make some checks on the given data and report errors, if any (exception); if you guarantee the given data, you can make a slight optimization by setting check=False
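For illustration, here is a minimal sketch combining several of these arguments (the values and counts are made up):
grade = lea.pmf([("low", 2), ("medium", 5), ("high", 3)],
                sorting=False, ordered=True,   # keep the given order for display
                normalization=True,            # 2, 5, 3 are counts, normalized to 2/10, 5/10, 3/10
                prob_type='r')                 # store the probabilities as fractions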

Changing probability representation

In the first part of the tutorial, we have seen that probabilities are stored and displayed as floating-point numbers. This is a classical/conservative representation, used in the majority of statistical/probability packages. Other representations may however be more suitable. Lea actually offers four choices for representing probabilities, namely floats, decimals, fractions and symbols:

  • float: based on Python builtin float type
  • decimal: based on Python built-in Decimal class
  • fraction: based on Python built-in Fraction class
  • symbol: based on SymPy’s classes (requires installation of SymPy, a Python library for symbolic mathematics).

The representation as floating-point numbers is clearly the most standard/conservative representation; it's easy to use and it integrates seamlessly with many functions and libraries. However, floating-point numbers present several drawbacks due to their rounding errors. In particular, they are not the best suited when dealing with probability distributions obtained by combinatorics (e.g. card and dice games). The three other probability representations provide alternatives that can better fit your type of problem.

There are several ways to specify the type used for representing probabilities in a given distribution: by automatic detection, by passing a declaration in the constructor, or by a permanent declaration.

Setting probability type by automatic detection

Lea is able to choose the probability representation based on the way you express it. This is probably the simplest way to work when you use Lea in an interactive session. The rules are quite simple:

  • If the probabilities are written as number literals or expressions, then the float type is used
  • If the probabilities are written as strings, then conversions are attempted until success, in the following order: Decimal, Fraction, SymPy's Symbol.

Here is a table giving examples of the different possible cases:

Here is how the different possible cases are handled:

  • 0.333 gives 0.333 as a float instance
  • '0.333' gives 0.333 as a Decimal instance
  • 1/3 gives 0.333333... as a float instance
  • '1/3' gives 1/3 as a Fraction instance
  • 'p' gives p as a SymPy Symbol instance

To illustrate this, here are some samples showing how a Boolean probability distribution and a probability distribution with the three possible outcomes "A", "B" and "C" can be represented.

Probabilities as floating-point numbers

>>> lea.event(2/3)
True  : 0.6666666666666666
False : 0.33333333333333337
>>> lea.pmf({"A": 1/6, "B": 1/3, "C": 1/2})
A : 0.16666666666666666
B : 0.3333333333333333
C : 0.5
Note that the same probability distributions can also be obtained with the following constructions:
>>> lea.event(0.6666666666666666)
>>> lea.pmf({"A": 1, "B": 2, "C": 3})
>>> lea.pmf({"A":0.16666666666666666, "B":0.33333333333333334, "C":0.5})

Probabilities as decimal numbers

>>> lea.event('0.666666666666666666666')
True  : 0.666666666666666666666
False : 0.333333333333333333334
>>> lea.pmf({"A":'0.1666666666666666666', "B":'0.3333333333333333334', "C":'0.5'})
A : 0.1666666666666666666
B : 0.3333333333333333334
C : 0.5
Note that we intentionally increased the number of decimals to highlight the difference with floating-point numbers. The decimal representation allows representing probability values precisely, keeping the given decimals unaltered and controlling rounding errors (see details in Python's Decimal class doc).

Probabilities as fractions

>>> lea.event('2/3')
True  : 2/3
False : 1/3
>>> lea.pmf({"A": '1/6', "B": '1/3', "C": '1/2'})
A : 1/6
B : 2/6
C : 3/6
The fraction representation is another way to avoid the rounding errors appearing with floating-point numbers. It is respectful of the very origins of probability theory: combinatorics and … gambling! Also, contrary to floats and decimals, fractions offer unlimited precision (see details in Python's Fraction class doc).

Note that the probabilities are displayed using the common denominator, so as to ease comparisons; when extracting a single probability value however, the fraction is reduced.
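For instance, here is a small sketch of this behavior, reusing the distribution defined just above:
h = lea.pmf({"A": '1/6', "B": '1/3', "C": '1/2'})
lea.P(h == "B")
# -> 1/3 (reduced, although the display above shows 2/6)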

Probabilities as variables (symbolic representation)

Reminder: the following requires the SymPy library.

>>> lea.event('p')
True  : p
False : -p + 1
>>> lea.pmf({"A": 'pA', "B": 'pB', "C": 'pC'})
A : pA/(pA + pB + pC)
B : pB/(pA + pB + pC)
C : pC/(pA + pB + pC)
You can see that the probability variables have been normalized, to ensure that the sum equals 1. Other ways to construct symbolic probability distributions, without requiring normalization, will be presented in the section dedicated to symbolic computation.

If you are not pleased with this, you may use None for one of the probabilities:

>>> lea.pmf({"A": 'pA', "B": 'pB', "C": None})
A : pA
B : pB
C : -pA - pB + 1
or disable normalization:
>>> lea.pmf({"A": 'pA', "B": 'pB', "C": 'pC'}, normalization=False)
A : pA
B : pB
C : pC
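As a small glimpse of what symbolic probabilities enable (just a sketch; the exact rendering of the result depends on SymPy), independent symbolic events can be combined and queried like any other Lea instances:
x = lea.event('p')
y = lea.event('q')
lea.P(x & y)
# -> p*q   (x and y are independent here, so their probabilities multiply)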

Setting probability type by passing declaration

As an alternative to the automatic detection seen above, you can specify the required type explicitly in the constructor. This is done by specifying the optional prob_type argument on Lea constructors; the simplest way to define this argument is to specify a one-character code among 'f', 'd', 'r', 's' and 'x'.

  • 'f': float
  • 'd': Decimal
  • 'r': Fraction
  • 's': Symbol
  • 'x': automatic detection (default)

The 'x' code is the default: it makes automatic probability type detection, as explained before. Here are some examples, using the other codes:

>>> lea.event(1/3, prob_type='d')
True  : 0.333333333333333314829616256247390992939472198486328125
False : 0.6666666666666666851703837438
>>> lea.event(1/3, prob_type='r')
True  :  6004799503160661/18014398509481984
False : 12009599006321323/18014398509481984
>>> lea.event('0.6666666666666667', prob_type='r')
True  : 6666666666666667/10000000000000000
False : 3333333333333333/10000000000000000
>>> lea.pmf({"A": 1/6, "B": 1/3, "C": 1/2}, prob_type='r')
A :  6004799503160661/36028797018963967
B : 12009599006321322/36028797018963967
C : 18014398509481984/36028797018963967
>>> lea.pmf({"A": 1, "B": 2, "C": 3}, prob_type='r')
A : 1/6
B : 2/6
C : 3/6
Note that what prob_type specifies is the target probability type, towards which the given probability values will be converted. If you provide float numbers as input, as in the examples above, you get result values of the target type that reveal the inner float representation (which can be somewhat surprising!).

The prob_type argument is also required if you want to change the probability type of a distribution that is specified by a list of values (since you don't provide any probability explicitly):

>>> lea.vals("A", "B", "B", "C", "C", "C", prob_type='r')
A : 1/6
B : 2/6
C : 3/6
For symbolic type, the conversion to a symbol is applied only if the probability is provided as a string. For example:
>>> lea.event(1/3, prob_type='s')
True  : 0.3333333333333333
False : 0.6666666666666667
>>> lea.event('1/3', prob_type='s')
True  : (1/3)
False : -(1/3) + 1
In the first case, 1/3 is a float, not a string, so it is kept as a float. In the second case, '1/3' is a string, so it is converted to a symbol; note that it has been surrounded by parentheses, in order to avoid any misinterpretation in expressions (this happens only if the string is not a valid variable name, according to Python's isidentifier method).

Setting probability type by permanent declaration

If you want to avoid specifying prob_type over and over, then you may invoke the lea.set_prob_type(…) static method to set it once and for all (well, more precisely: until the next call to lea.set_prob_type or until the end of your program/session). This sets the default probability type, the argument being interpreted as seen before for the prob_type argument.

Suppose that you want to work only with probability fractions. You'll do this:

>>> lea.set_prob_type('r')
Then, every Lea constructor shall use fractions:
>>> lea.pmf({"A": 1, "B": 2, "C": 3})
A : 1/6
B : 2/6
C : 3/6
>>> lea.vals("A", "B", "B", "C", "C", "C")
A : 1/6
B : 2/6
C : 3/6
To set back the default behavior (automatic type detection), set the 'x' code:
>>> lea.set_prob_type('x')

Converting from one probability type to another

It's sometimes useful to convert the probabilities of a given probability distribution to another type. Typically, you may want to convert fractions to floating-point numbers in order to verify results found in the literature. This is easy to do: just invoke the new(prob_type='_t_') method, where the argument _t_ is a type code defined in the table above, and you'll get a cloned probability distribution with the required type.

Here is an example of conversion from fractions to floats:

>>> flip = lea.vals('Head', 'Tail', prob_type='r')
>>> flip
Head : 1/2
Tail : 1/2
>>> flip.new(prob_type='f')
Head : 0.5
Tail : 0.5
All conversions are feasible, except:

  • conversions of symbolic probabilities,
  • conversions from fractions to decimals.

For the latter case, a workaround consists in chaining a conversion to float with a conversion to decimal, i.e. .new(prob_type='f').new(prob_type='d').
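For instance, a sketch of this workaround applied to the flip distribution defined above:
flip_decimal = flip.new(prob_type='f').new(prob_type='d')   # fractions -> floats -> decimals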

If you need to convert an individual probability, as returned for example by the P function, you have to call builtin Python functions or types:

>>> lea.P(flip == 'Tail')
1/2
>>> float(lea.P(flip == 'Tail'))
0.5
Since such a conversion to a floating-point number is quite common, Lea provides a convenience function, Pf, which performs this conversion directly on the result of P:
>>> lea.Pf(flip == 'Tail')
0.5

Joint probability distributions

Calculating joint probability distributions (joint method)

Given a set of Lea probability distributions, the joint(d1,d2,...) method allows you to build a new distribution that represents the joint distribution of the given probability distributions d1, d2, .... Basically, it builds up tuples giving all possible combinations of values of the given distributions, each with its associated probability. If the arguments are independent variables, then the returned joint distribution is a Cartesian product giving all possible combinations. Here is a first example:

>>> die1 = lea.interval(1, 6, prob_type='r')
>>> die2 = die1.new()
>>> lea.joint(die1,die2)
(1, 1) : 1/36
(1, 2) : 1/36
(1, 3) : 1/36
(1, 4) : 1/36
(1, 5) : 1/36
(1, 6) : 1/36
(2, 1) : 1/36
(2, 2) : 1/36
(2, 3) : 1/36
...
(6, 3) : 1/36
(6, 4) : 1/36
(6, 5) : 1/36
(6, 6) : 1/36
The lea.joint method accepts any number of arguments. If there are some dependencies between the given arguments, the tuples are built up consistently, so that they contain only possible combinations:
>>> lea.joint(die1, die1)
(1, 1) : 1/6
(2, 2) : 1/6
(3, 3) : 1/6
(4, 4) : 1/6
(5, 5) : 1/6
(6, 6) : 1/6
>>> lea.joint(die1, 7-die1)
(1, 6) : 1/6
(2, 5) : 1/6
(3, 4) : 1/6
(4, 3) : 1/6
(5, 2) : 1/6
(6, 1) : 1/6
These results are again a direct application of the concept of referential consistency, which is ubiquitous in Lea.

Calculating a joint probability distribution may seem uninteresting for real use cases but, actually, it is used behind the scenes for most of Lea's calculations. Also, it can be helpful, from a user's point of view, to understand (or verify) how Lea calculates a probability distribution from atomic events.

For instance, imagine you have executed the following query to give the probability distribution of the first die, knowing the fact that the total of the two dice does not exceed 4:

>>> dice = die1 + die2
>>> die1.given(dice <= 4)
1 : 3/6
2 : 2/6
3 : 1/6
Now, if you want to understand why you get this distribution, then the following joint will provide some clues:
>>> lea.joint(die1, die2, dice).given(dice <= 4)
(1, 1, 2) : 1/6
(1, 2, 3) : 1/6
(1, 3, 4) : 1/6
(2, 1, 3) : 1/6
(2, 2, 4) : 1/6
(3, 1, 4) : 1/6
This joint distribution shows all combinations of dice consistent with the condition. The value of die1 is the first element of the tuples, so you can now easily check the given results by adding atomic probabilities 1/6 together.

Defining joint probability distributions

So far, we have seen how a joint probability distribution can be calculated. In the present section, we shall see how a joint probability distribution (JPD) can be constructed from scratch.

Let us briefly explain the motivation for constructing a JPD. So far, we have seen different techniques to build mutually dependent random variables, by defining arithmetic/logical rules explaining how each variable is derived from the others. This assumes that the rules are known and expressible as mathematical formulas. As we have seen, this works fine in many "simple" situations, like those found in gambling. However, our world is usually more complex than that: in many situations, the rules are unknown or too complex for a human being to model. In such cases, one approach is to collect data and to try to find correlations in these data. This approach, studied in the field of statistics, is probably one of the first steps towards the now popular research field called "Machine Learning". Constructing a JPD from collected data is a basic way to express dependencies between random variables: the idea is to identify a set of variables and to build a table giving the probability of each possible combination of their values. When the actual probabilities are missing (which is often the case), they are replaced by frequencies of occurrence; of course, providing as much data as possible should tend to improve the model (at least, this explains the hype around that "Big Data" thing!).

To illustrate the definition of a JPD on a simple example, we shall use the 3 Boolean variables Toothache / Cavity / Catch, given in the AIMA book referred to in the introduction (section 13.4 in the second edition). We assume that there are some dependencies between these three variables but, in contrast to all previous examples, we assume that it is impossible to express these dependencies as formulas with logical operators. What we have, however, is a table indicating the 2 x 2 x 2 = 8 probabilities of the True/False combinations. We shall first make a temporary JPD with 3-tuples ordered as (toothache, catch, cavity): for instance, (True,False,False) shall mean "toothache AND no catch AND no cavity". For this purpose, we call the usual pmf function:

>>> (T,F) = (True,False)
>>> _jpd = lea.pmf({ (T,T,T): '0.108', (T,F,T): '0.012', (F,T,T): '0.072', (F,F,T): '0.008',
...                  (T,T,F): '0.016', (T,F,F): '0.064', (F,T,F): '0.144', (F,F,F): '0.576' })
>>> _jpd
(False, False, False) : 0.576
(False, False, True ) : 0.008
(False, True , False) : 0.144
(False, True , True ) : 0.072
(True , False, False) : 0.064
(True , False, True ) : 0.012
(True , True , False) : 0.016
(True , True , True ) : 0.108
Incidentally, we have put the given probabilities in quotes in order to use decimal numbers (this is not mandatory, it is just to avoid the unpleasant rounding errors occurring with floating-point numbers, as explained here). The _jpd distribution captures the given JPD data but it misses the names of the underlying variables (Toothache / Cavity / Catch). The method as_joint(…) allows defining a new JPD, which is aware of the variable names:
>>> jpd = _jpd.as_joint('toothache', 'catch', 'cavity')
>>> jpd
 toothache, catch, cavity
(False    , False, False ) : 0.576
(False    , False, True  ) : 0.008
...

Note that a header with the field names is now displayed at the top. Technically, what as_joint(…) does is convert the usual Python tuples into named tuples. If you want to see the true inner representation of these named tuples, you can do this:
>>> print (jpd.as_string(tabular=False))
_(toothache=False, catch=False, cavity=False) : 0.576
_(toothache=False, catch=False, cavity=True)  : 0.008
...

Note: in "Indexing and slicing", we present another way to proceed; the technique we use here is detailed in "Object attributes and methods" section.

Now, jpd is a JPD ready to be used: it can be queried to get probabilities of given variables or logical combinations of variables.
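In the queries below, P is used without the lea. prefix; this assumes (just one possible setup) that it has been imported into the session:
from lea import P, Pf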

>>> P(jpd.cavity)
0.2
>>> P(jpd.cavity | jpd.toothache)
0.28
>>> P(jpd.cavity & ~jpd.toothache)
0.08
The resulting distributions are calculated by summing the atomic probabilities given in the joint distribution. This kind of calculation is referred to as a marginalisation.

You can check that the variables defined in the joint distribution are actually dependent, as P(catch AND cavity) is different from P(catch) x P(cavity):

>>> P(jpd.catch & jpd.cavity)
0.18
>>> P(jpd.catch) * P(jpd.cavity)
0.068
Since variables are dependent, the calculation of conditional probabilities provides useful results like

  • the probability of cavity for a patient having toothache,
  • the probability of cavity for a patient having toothache and a probe being caught.

>>> P(jpd.cavity.given(jpd.toothache))
0.6
>>> P(jpd.cavity.given(jpd.toothache & jpd.catch))
0.8709677419354838709677419355
You can verify the correctness of these values in the AIMA book. The correctness of these calculations results again from the referential consistency enforced in Lea.

The conditional probability method can also be used to produce "reduced" or "filtered" joint distributions from the initial joint distribution, on the basis of known facts or assumptions:

>>> jpd.given(jpd.cavity, ~jpd.toothache)
 toothache, catch, cavity
(False    , False, True  ) : 0.1
(False    , True , True  ) : 0.9
>>> jpd.given(jpd.cavity)
 toothache, catch, cavity
(False    , False, True  ) : 0.04
(False    , True , True  ) : 0.36
(True     , False, True  ) : 0.06
(True     , True , True  ) : 0.54
In order to remove the 'fixed' attributes, the following techniques can be used:
>>> P(jpd.catch.given(jpd.cavity, ~jpd.toothache))
0.9
>>> lea.joint(jpd.toothache,jpd.catch).given(jpd.cavity).as_joint('toothache','catch')
 toothache, catch
(False    , False) : 0.04
(False    , True ) : 0.36
(True     , False) : 0.06
(True     , True ) : 0.54
Now, for those who are tired of typing jpd. over and over, there's some good news! You can assign the attributes to your own variables and rely on the lazy evaluation of Lea. For instance, if you make the following assignments:
>>> toothache = jpd.toothache
>>> catch = jpd.catch
>>> cavity = jpd.cavity
then, you should be able to redo ALL the examples given above WITHOUT any jpd. prefix. This assignment trick is actually something that can be used for any Lea expression. For the special case of joint distributions, you may call the as_joint method with the argument create_vars=True, which is handy to avoid typing boilerplate statements. The sole prerequisite is to "declare your namespace", so that you grant the method the right to create variables on the fly. Here is how you should proceed:
>>> lea.declare_namespace(globals())
>>> jpd = lea.pmf({ (T,T,T): '0.108', (T,F,T): '0.012', (F,T,T): '0.072', (F,F,T): '0.008',
...                 (T,T,F): '0.016', (T,F,F): '0.064', (F,T,F): '0.144', (F,F,F): '0.576' }) \
...       .as_joint('toothache', 'catch', 'cavity', create_vars=True)
This call does the same as all the assignments seen above, creating the interdependent random variables toothache, catch and cavity in your session (take care, however, of possible variable overwriting!).

The technique of joint distributions is very flexible, allowing you to define any possible dependencies between variables. However, it requires defining many probability values and it does not scale well as the number of variables grows. We shall see later in this tutorial a more efficient alternative, namely Bayesian networks.

Beyond Boolean random variables, JPDs can also be used for numerical values or other Python objects, on which you can evaluate expressions containing the usual comparison operators. We shall now elaborate on this topic, showing how Lea can be used to read larger sets of data and get statistics on them. First, we shall see how to build a JPD from a given CSV file or from a given pandas dataframe. Then, we shall show advanced techniques to extract data from the JPD, including filtering, marginalisation, grouping, etc.

Building JPD from dataset

Building JPD from a CSV file (read_csv_file method)

For the example, we shall assume that we have a database of 50 patients within a CSV file (the fields are: given name, surname, gender, title, birthday, blood_type, weight in kg, height in cm, smoker (Y/N)):

patients.csv

Elaine,McLaughlin,female,Ms.,12-01-1984,O+,44.9,141.0,Y
Alan,Music,male,Mr.,30-12-1966,A-,61.3,181.0,Y
Debra,Roberts,female,Mrs.,01-12-1927,O+,79.6,168.0,N
Martine,Albright,female,Mrs.,05-01-1975,O+,46.0,156.0,N
Terry,Jordan,male,Mr.,28-12-1963,A-,96.9,181.0,N
Joshua,Spinney,male,Mr.,19-12-1952,O+,106.4,183.0,N
Tawanna,Short,female,Ms.,02-02-2012,O+,93.4,175.0,N
Frances,Moore,female,Ms.,07-01-1978,B+,91.6,164.0,N
Jon,Overbey,male,Mr.,12-01-1985,A+,60.9,164.0,N
Milton,Silva,male,Mr.,04-12-1932,O+,88.9,177.0,N
Damon,Kiser,male,Mr.,09-12-1938,A+,97.2,181.0,N
Ruby,Nunez,female,Mrs.,30-12-1966,O-,23.6,126.0,Y
David,Seguin,male,Mr.,21-01-1996,A+,28.8,118.0,N
Lillian,Wooley,female,Ms.,28-12-1964,A-,54.5,156.0,N
Lena,Sharp,female,Ms.,02-01-1971,AB+,58.4,170.0,N
Felecia,Decker,female,Mrs.,20-12-1953,O+,95.8,172.0,N
Belva,Dry,female,Mrs.,07-12-1936,AB+,76.9,151.0,N
Lawrence,McGovern,male,Mr.,06-01-1976,A+,16.8,91.0,Y
Shelley,Bailey,female,Ms.,14-12-1945,A+,91.2,157.0,N
Mildred,Heaps,female,Mrs.,01-01-1970,O+,67.8,162.0,N
Melinda,Edie,female,Ms.,20-12-1953,A+,34.5,145.0,N
Angela,Vaz,female,Mrs.,11-12-1941,AB+,30.3,126.0,N
Gilberto,Schultz,male,Mr.,02-01-1971,B+,63.5,175.0,N
Dianne,Zubia,female,Ms.,28-01-2006,O+,69.1,157.0,N
Truman,Conner,male,Mr.,10-12-1940,O-,105.4,176.0,N
Dorothy,Keegan,female,Mrs.,14-12-1945,A+,54.7,174.0,N
Pamela,Guzman,female,Ms.,11-12-1941,B-,70.9,154.0,Y
John,Davenport,male,Mr.,04-01-1974,B+,96.9,177.0,N
Dorothy,Christen,female,Ms.,05-12-1933,B+,98.0,171.0,N
Mary,Baird,female,Mrs.,26-01-2003,O+,61.3,170.0,N
Luis,Boyett,male,Mr.,17-01-1991,O+,103.4,170.0,N
Tricia,Lombardi,female,Mrs.,13-12-1944,A+,50.5,155.0,Y
Joshua,West,male,Mr.,17-01-1991,A+,14.9,93.0,Y
Jimmie,Martinez,male,Mr.,03-12-1930,B+,114.0,171.0,N
Carla,Moore,female,Ms.,24-12-1958,B-,61.5,154.0,N
Megan,Johanson,female,Mrs.,16-12-1948,B+,85.3,159.0,N
Jenna,Tipton,female,Ms.,06-01-1977,O+,80.1,155.0,Y
Goldie,Maestas,female,Mrs.,06-12-1934,O-,83.4,161.0,N
Jeffrey,Evatt,male,Mr.,26-12-1961,O+,99.6,183.0,N
Lloyd,Carroll,male,Mr.,16-12-1947,B-,98.6,189.0,N
Dennis,Akins,male,Mr.,07-01-1978,B+,66.2,174.0,N
Naomi,Paulus,female,Mrs.,04-02-2015,B+,73.4,167.0,N
Dallas,Iverson,male,Mr.,03-01-1973,A+,66.5,177.0,N
Thomas,Boren,male,Mr.,10-12-1939,A+,67.9,188.0,N
Daniel,Helms,male,Mr.,22-12-1956,A+,87.0,174.0,N
Muriel,Labrecque,female,Ms.,30-01-2009,B+,48.3,155.0,Y
Bernice,Humphries,female,Mrs.,26-01-2003,O+,23.2,123.0,N
Kimberly,Victory,female,Ms.,10-01-1982,A+,100.8,167.0,N
Pam,Beach,female,Mrs.,12-12-1942,O+,79.8,155.0,N
Rita,Hines,female,Ms.,06-01-1977,A+,51.5,166.0,N
To be able to execute the statements below, you should create such a file by copy-pasting the data above. Note that these data are fictional; they have been created thanks to the excellent online tool Fake Name Generator.

Here is how you can read this file and build a joint probability distribution from it:

>>> patient = lea.read_csv_file('patients.csv', col_names=('given_name', 'surname', 'gender', 'title', 'birthday', 'blood_type', 'weight{f}', 'height{f}', 'smoker{b}'), prob_type='d')
Note that the given column names shall be valid Python identifiers (start with a letter, no blanks, etc.). The special curly-brace suffixes used for the last three names are NOT part of the names. By default (no suffix), all read values are stored as strings in the built JPD; a curly-brace suffix indicates that the corresponding read values shall be converted as follows:

  • {s} : string (this suffix can be omitted)
  • {i} : integer
  • {f} : float
  • {b} : boolean

For converting string to boolean, the following rules are applied (case insensitive):

  • '1', 't', 'true', 'y', 'yes' --> Python's True,
  • '0', 'f', 'false', 'n', 'no' --> Python's False

As we shall see, putting {f} on weight and height allows making comparisons and calculations on these attributes; putting {b} on smoker eases the queries by removing references to 'Y'/'N' (see below).

Now, patient is a JPD with named attributes, which you can display in tabular format:

>>> patient
 given_name , surname     , gender  , title , birthday    , blood_type, weight, height, smoker
('Alan'     , 'Music'     , 'male'  , 'Mr.' , '30-12-1966', 'A-'      ,   61.3,  181.0, True  ) : 0.02
('Angela'   , 'Vaz'       , 'female', 'Mrs.', '11-12-1941', 'AB+'     ,   30.3,  126.0, False ) : 0.02
('Belva'    , 'Dry'       , 'female', 'Mrs.', '07-12-1936', 'AB+'     ,   76.9,  151.0, False ) : 0.02
('Bernice'  , 'Humphries' , 'female', 'Mrs.', '26-01-2003', 'O+'      ,   23.2,  123.0, False ) : 0.02
('Carla'    , 'Moore'     , 'female', 'Ms.' , '24-12-1958', 'B-'      ,   61.5,  154.0, False ) : 0.02
...

You can access the attributes using standard Python syntax, as shown in the previous section. For example, to get the distributions of genders and titles (marginalisation):
>>> patient.gender
female : 0.6
male   : 0.4
>>> patient.title
Mr.  : 0.4
Mrs. : 0.3
Ms.  : 0.3
We'll see more elaborate queries in a later section.

Handling of CSV formats

The read_csv_file method as used above will work fine provided that your CSV file has a format similar to that of our patients.csv example. Here are some indications on how to handle other CSV formats.

Some CSV files contain a first line to be interpreted as a header containing field names.

patients2.csv

given_name,surname,gender,title,birthday,blood_type,weight,height,smoker
Elaine,McLaughlin,female,Ms.,12-01-1984,O+,44.9,141.0,Y
Alan,Music,male,Mr.,30-12-1966,A-,61.3,181.0,Y

In such a case, all you have to do is omit the col_names argument in the call to read_csv_file:
>>> patient = lea.read_csv_file('patients2.csv', prob_type='d')
Then, the names found in the first row are used as the attribute names. Note that the curly-brace suffixes can be added in the CSV file, with the same semantics as described above.
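For instance, a sketch of what such a header line could look like with the conversion suffixes included:
given_name,surname,gender,title,birthday,blood_type,weight{f},height{f},smoker{b}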

Another thing to know is that read_csv_file relies on Python's standard csv module and supports the same arguments (with the same defaults) as the csv.reader function, namely dialect and **fmtparams. So, assuming for instance that your CSV file is tab-delimited, you can use:

>>> myData = lea.read_csv_file('data.csv', col_names=…, delimiter='\t')
A last point concerns possible problems with the CSV file's encoding. If such a problem occurs, you'll have to open the CSV file yourself with the right encoding and pass the file object to lea.read_csv_file (instead of a filename). For example, in Python 3,
>>> with open('exoticData.csv', 'r', encoding='utf-8') as f:
...     myExoticData = lea.read_csv_file(f, …)

Building JPD from a pandas dataframe

If you do data analysis using the pandas library, you may want to build a Lea JPD from a pandas "dataframe". To do this, you could manage to extract raw tuples from the dataframe and build a JPD from them; actually, Lea provides a dedicated function to do just that, lea.read_pandas_df. In the following example, we first build a pandas dataframe by reading the above-mentioned patients2.csv file (this time using a pandas method, not Lea's).

>>> import pandas as pd
>>> df = pd.read_csv('patients2.csv')
You could of course create or transform such a dataframe using other pandas functions. We then build a Lea JPD from that dataframe, using the read_pandas_df function:
>>> patient2 = lea.read_pandas_df(df, prob_type='d')
The resulting JPD is the same as before, except that all the fields of the JPD are strings, as this is the field format used in the present dataframe. For other dataframe formats, the fields in the Lea JPD will mimic the dataframe's.

Data mining on JPD

As explained in Defining joint probability distributions, all JPD attributes are accessed by using the standard dot syntax. Remember that, to work with simpler expressions, you could assign variables:

>>> given_name = patient.given_name
>>> surname = patient.surname
>>> gender = patient.gender
>>> …
or do these assignments automatically in one shot with:
lea.make_vars(patient,globals())
From now on, you can use the patient's attributes without the patient. prefix; that's what we'll assume in the following sections. Should you prefer to keep your namespace safe, simply keep prefixing these attribute variables with patient..

We can now start to query the JPD data. In all examples below, the returned probabilities have to be interpreted primarily as frequencies in the given dataset of 50 patients; if the sample is large enough, and representative enough of the whole population, then these figures could be interpreted as probabilities.

Let's calculate the distribution of genders and titles.

>>> gender
female : 0.6
male   : 0.4
>>> title
Mr.  : 0.4
Mrs. : 0.3
Ms.  : 0.3
By using the joint operation, we can see that genders and titles are dependent on each other, as we expect:
>>> lea.joint(gender, title)
('female', 'Mrs.') : 0.3
('female', 'Ms.' ) : 0.3
('male'  , 'Mr.' ) : 0.4
Let's calculate the distribution of heights. To avoid an overly long display, we shall group heights with a precision of 10 cm, using Python's integer division.
>>> height_by_ten = (height//10) * 10
>>> height_by_ten
 90.0 : 0.04
110.0 : 0.02
120.0 : 0.06
140.0 : 0.04
150.0 : 0.24
160.0 : 0.16
170.0 : 0.3
180.0 : 0.14
Then, let's map these values into category labels:
>>> height_cat = height_by_ten.map(lambda h: 'H-%03d-%03d'%(h, h+9))
>>> height_cat
H-090-099 : 0.04
H-110-119 : 0.02
H-120-129 : 0.06
H-140-149 : 0.04
H-150-159 : 0.24
H-160-169 : 0.16
H-170-179 : 0.3
H-180-189 : 0.14
We could check who is in the category 'H-120-129' and how tall they are (using English sentences, as an academic exercise!):
>>> (given_name+' is '+(height/100.).map(str)+' m high').given(height_cat=='H-120-129')
Angela is 1.26 m high  : 0.3333333333333333333333333333
Bernice is 1.23 m high : 0.3333333333333333333333333333
Ruby is 1.26 m high    : 0.3333333333333333333333333333
Note that there seem to be only women in this category... This raises another question: how are heights distributed by gender?
>>> female = gender=='female'
>>> male = gender=='male'
>>> height_cat.given(female)
H-120-129 : 0.1
H-140-149 : 0.06666666666666666666666666667
H-150-159 : 0.4
H-160-169 : 0.2333333333333333333333333333
H-170-179 : 0.2
>>> height_cat.given(male)
H-090-099 : 0.1
H-110-119 : 0.05
H-160-169 : 0.05
H-170-179 : 0.45
H-180-189 : 0.35
You can note in this example how the height_cat, male and female variables are interdependent. Another, more integrated, approach is to use the joint operation:
>>> lea.joint(gender, height_cat)
('female', 'H-120-129') : 0.06
('female', 'H-140-149') : 0.04
('female', 'H-150-159') : 0.24
('female', 'H-160-169') : 0.14
('female', 'H-170-179') : 0.12
('male'  , 'H-090-099') : 0.04
('male'  , 'H-110-119') : 0.02
('male'  , 'H-160-169') : 0.02
('male'  , 'H-170-179') : 0.18
('male'  , 'H-180-189') : 0.14
Imagine now that you have to calculate the body mass index (BMI) of the patients. Lea's arithmetic shall do the job quite easily (note that we round the BMI to limit the number of distinct values):
>>> bmi = (weight/(height/100.)**2).map(round)
>>> bmi
15 : 0.04
16 : 0.02
17 : 0.02
18 : 0.02
19 : 0.1
20 : 0.06
21 : 0.1
22 : 0.04
23 : 0.04
26 : 0.06
28 : 0.08
29 : 0.02
30 : 0.1
31 : 0.02
32 : 0.06
33 : 0.04
34 : 0.1
36 : 0.04
37 : 0.02
39 : 0.02
>>> bmi.mean_f()
26.22
We want to split patients into "underweight", "normal" and "obese" categories. For this purpose, we shall define 3 boolean variables, which are mutually exclusive.
>>> underweight = bmi < 18.5
>>> obese = bmi >= 30
>>> normal = ~underweight & ~obese
>>> P(underweight)
0.1
>>> P(obese)
0.4
>>> P(normal)
0.5
Note that we could also have defined a variable cat_bmi, mapping the BMI values to strings, using an if/elif construct or, more simply, Python's bisect module.
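Here is a minimal sketch of that idea (cat_bmi and the category boundaries mirror the underweight/normal/obese thresholds used above):
from bisect import bisect
# map a BMI value to a label: < 18.5 -> underweight, < 30 -> normal, otherwise obese
cat_bmi = bmi.map(lambda b: ('underweight', 'normal', 'obese')[bisect((18.5, 30), b)])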

As a final example, let's select the female patients who are underweight and aged 18 years or more; for these patients, we want to display the full name, age, height, weight and BMI. Let's first build the full name and age; to ease things, we shall calculate the approximate age, for the year 2018, without considering the birth day and month.

>>> full_name = title + ' ' + given_name + ' ' + surname
>>> age = 2018 - birthday[-4:].map(int)
Now, using these two new variables, we can answer the initial request:
>>> lea.joint(full_name, age, height, weight, bmi).given(female, underweight, age>=18)
('Mrs. Dorothy Keegan', 73, 174.0, 54.7, 18) : 0.3333333333333333333333333333
('Mrs. Ruby Nunez'    , 52, 126.0, 23.6, 15) : 0.3333333333333333333333333333
('Ms. Melinda Edie'   , 65, 145.0, 34.5, 16) : 0.3333333333333333333333333333
To conclude this section, let us mention that it's possible to change the order of the displayed rows according to some attribute(s). By default, the rows are sorted as Python tuples: the first column is the main sort key, the second one is the secondary key, and so on. If you want to change the order, you may then use the lea.joint function to reorder the attributes as needed. An alternative that avoids this consists in using the sort_by method, which takes any number of attributes as arguments. For example, here is how you can order the underweight patients by their blood type, then their gender, then their BMI:
>>> patient.given(underweight).sort_by(blood_type,gender,bmi)
 given_name, surname    , gender  , title , birthday    , blood_type, weight, height, smoker
('Melinda', 'Edie'     , 'female', 'Ms.' , '20-12-1953', 'A+'      ,   34.5,  145.0, False ) : 0.2
('Dorothy', 'Keegan'   , 'female', 'Mrs.', '14-12-1945', 'A+'      ,   54.7,  174.0, False ) : 0.2
('Joshua' , 'West'     , 'male'  , 'Mr.' , '17-01-1991', 'A+'      ,   14.9,   93.0, True  ) : 0.2
('Bernice', 'Humphries', 'female', 'Mrs.', '26-01-2003', 'O+'      ,   23.2,  123.0, False ) : 0.2
('Ruby'   , 'Nunez'    , 'female', 'Mrs.', '30-12-1966', 'O-'      ,   23.6,  126.0, True  ) : 0.2
You may note that the method can use any attribute, whether displayed or not, whether direct or derived.

WARNING: The sort_by method returns a distribution that has lost any dependency relationship. It has to be the last call before displaying the results. As a counter-example, the following construct patient.sort_by(blood_type,gender,bmi).given(underweight) will display the full list of patients, ignoring the filtering condition on underweight.

Bayesian reasoning

So far, we have seen that the given method allows calculating conditional probabilities. However, there is a whole class of probability problems where conditional probabilities P(A|B) are given. The general formula

P(A | B) . P(B) = P(A and B)

allows you, when applied with care and patience, to answer most problems. We have seen how to define dependent random variables through joint distributions. Basically, this boils down to defining P(A and B and ...) for each conjunction of 'atomic' events A, B, ... . This method allows modeling the dependencies very accurately and answering any query, including conditional probabilities. However, it requires knowing the probability of each atomic combination, which could be difficult to obtain. Also, it has the drawback that combinatorial explosion entails the need to enter huge numbers of probability values.

As an alternative to joint distributions, Lea provides a more effective approach using conditional probability tables (CPT). The idea is to reduce the degrees of freedom of joint distributions by modeling only causal relationships. In the following, we shall see that this approach is general and powerful when dealing with problems where conditional probabilities are given. We will see that the CPT technique is the cornerstone of Bayesian reasoning.

Introducing CPT with a beetle

We shall start with the "beetle" example (Wikipedia):

"An entomologist spots what might be a rare subspecies of beetle, due to the pattern on its back. In the rare subspecies, 98% have the pattern, or P(Pattern|Rare) = 98%. In the common subspecies, 5% have the pattern. The rare subspecies accounts for only 0.1% of the population. How likely is the beetle having the pattern to be rare, or what is P(Rare|Pattern)?"

The random variables and their dependency can be represented by the following graphical model.

bn1.png

To solve this, we need to define two boolean probability distributions, namely rare and pattern. The first one is unconditional and easy to define:

rare = lea.event(0.001)

For pattern, we cannot (without manual calculations) set it so easily. We need to set two conditional probabilities:

  • P(Pattern | Rare) = 98%
  • P(Pattern | ~Rare) = 5%

Lea provides a special construct to set these probabilities (requires Lea 2.3+):

pattern_if_rare     = lea.event(0.98)
pattern_if_not_rare = lea.event(0.05)
pattern = rare.switch({ True  : pattern_if_rare,
                        False : pattern_if_not_rare })

This switch method defines a special Lea instance, which represents a CPT. The idea is that the pattern probability depends on the value of rare, according to the given dictionary: the keys of this dictionary are the possible values of rare, namely True and False. Let us check that our definition is in line with what we need:

  • What is the probability of having pattern for a rare beetle?
    >>> P(pattern.given(rare))
    0.98
    
  • What is the probability of having pattern for a common beetle?

    >>> pattern.given(~rare)
    False : 19/20
    True  :  1/20
    >>> Pf(pattern.given(~rare))
    0.05
    
    OK, pattern gives back the data we put in it; this does not bring any new information, it is just a sanity test. Now, let us come back to the initial question:

  • What is the probability to be rare for a beetle having the pattern?

    >>> P(rare.given(pattern))
    0.01924209699587669
    
    This result is really what Bayes inference is about. It can be checked by manual calculations using the Bayes theorem.

Once the CPT is defined, other calculations can be done easily:

  • What is the probability to be rare for a beetle NOT having the pattern?

    >>> P(rare.given(~pattern))
    2.107326119253587e-05
    

  • What is the probability for any beetle to have the pattern?

    >>> P(pattern)
    0.05093
    

  • What is the probability for a beetle to be rare AND to have the pattern?

    >>> P(rare & pattern)
    0.0009800000000000002
    

It is even possible to build a joint distribution giving all possible conjunctions (AND), by using the joint operation:

>>> lea.joint(rare,pattern)
(False, False) : 0.94905
(False, True ) : 0.04995
(True , False) : 2.000000000000002e-05
(True , True ) : 0.00098
This first example shows you the general process: you set up the input probabilities, conditional or not, in a declarative manner. Then, you type queries in a natural way and Lea applies the conditional probability rules behind the scenes. This technique can be used to solve the intriguing Monty Hall problem.

CPT with non-boolean probability distributions

The previous example uses Boolean probability distributions, which is common with conditional probabilities. However, depending on the problem at hand, other types of distributions can be handled. To illustrate this point, we shall revisit the previous problem with two variables, kind and aspect, which are string probability distributions.

bn2.png

kind = lea.pmf({ 'rare'   : 0.001,
                 'common' : 0.999 })
aspect_if_rare   = lea.pmf({ 'pattern'    : 0.98,
                             'no pattern' : 0.02 }) 
aspect_if_common = lea.pmf({ 'pattern'    : 0.05,
                             'no pattern' : 0.95 })
aspect = kind.switch({ 'rare'   : aspect_if_rare,
                       'common' : aspect_if_common })
Note: an alternative, more general, construction shall be presented later (see cpt method).

Now, aspect is a new CPT that gives the probability distribution of 'pattern' vs 'no pattern', according to the value of kind. The question "what is the probability of being rare for a beetle having the pattern?" can now be restated in one of the following manners:

>>> kind.given(aspect == 'pattern')
common : 0.9807579030041232
rare   : 0.01924209699587669
>>> (kind == 'rare').given(aspect == 'pattern')
False : 0.9807579030041232
True  : 0.01924209699587669
>>> P((kind == 'rare').given(aspect == 'pattern'))
0.01924209699587669
In the present example, using Boolean or string attributes to model the problem is a matter of taste. However, in many situations, models go beyond binary choices and cannot be represented by Boolean distributions. For example, imagine that the entomologist wants to split the kind 'common' into 'common_A' and 'common_B' as follows:
kind2 = lea.pmf({ 'rare'      : 0.001,
                  'common_A'  : 0.342,
                  'common_B'  : 0.657 })
Also, he wants to split the aspect 'pattern' into 'stripped' and 'mottled', with given conditional probabilities:
aspect2_if_rare     = lea.pmf({ 'stripped'   : 0.78,
                                'mottled'    : 0.20,
                                'no pattern' : 0.02 })
aspect2_if_common_A = lea.pmf({ 'stripped'   : 0.01,
                                'mottled'    : 0.04,
                                'no pattern' : 0.95 })
aspect2_if_common_B = lea.pmf({ 'stripped'   : 0.03,
                                'mottled'    : 0.02,
                                'no pattern' : 0.95 })
aspect2 = kind2.switch({ 'rare'     : aspect2_if_rare,
                         'common_A' : aspect2_if_common_A,
                         'common_B' : aspect2_if_common_B } )
Here are some examples of queries.
>>> aspect2
mottled    : 0.027020000000000002
no pattern : 0.94907
stripped   : 0.02391
>>> kind2.given(aspect2 == 'stripped')
common_A : 0.14303638644918445
common_B : 0.8243412797992472
rare     : 0.03262233375156839
>>> kind2.given(aspect2 != 'no pattern')
common_A : 0.3357549577851953
common_B : 0.6450029452189279
rare     : 0.019242096995876694
>>> P((kind2[:3] == 'com').given(aspect2 != 'no pattern'))
0.9807579030041232
>>> P((kind2 == 'rare').given(aspect2 != 'no pattern'))
0.019242096995876694
Note the consistency of the last result with the first beetle model:
>>> P((kind == 'rare').given(aspect == 'pattern'))
0.01924209699587669
This consistency is due to the fact that the entomologist has carefully made the model refinement so that (see above):

P(common) = P(common_A) + P(common_B) and P(pattern) = P(stripped) + P(mottled).
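Numerically, with the figures given above: P(common) = 0.342 + 0.657 = 0.999, P(pattern | rare) = 0.78 + 0.20 = 0.98, and P(pattern | common_A) = 0.01 + 0.04 = 0.05 = 0.03 + 0.02 = P(pattern | common_B).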

Bayesian networks

The technique used to build CPTs can be used to define Bayesian networks (BN), which model causality chains between uncertain events. There is no new syntax here, but you shall see how multiple CPTs can be connected together to define complex networks.

We shall use the well-known "burglary Bayesian network" of J. Pearl, explained in the AIMA book. You can also find good descriptions of this network on the Internet with the following keywords: burglar, Bayes (note that the classical, yet simpler, "rain-sprinkler-grass" example is covered here, in the Lea examples page).

Here is the graphical model of this BN.

bn3.png

Here is how to model this Bayesian network in Lea:

burglary   = lea.event(0.001)
earthquake = lea.event(0.002)
alarm = lea.joint(burglary, earthquake).switch({ (True ,True ) : lea.event(0.950),
                                                 (True ,False) : lea.event(0.940),
                                                 (False,True ) : lea.event(0.290),
                                                 (False,False) : lea.event(0.001) })
john_calls = alarm.switch({ True  : lea.event(0.90),
                            False : lea.event(0.05) })
mary_calls = alarm.switch({ True  : lea.event(0.70),
                            False : lea.event(0.01) })

We have not done more than building three CPTs, using the syntax explained in the previous sections. The sole new technique used here is for the alarm CPT, which depends on two variables, viz. burglary and earthquake: the joint method is used to join these two variables so as to produce the four combinations as tuples of Booleans.

Now, we are ready to query the network. Let us first make "forward" derivations (i.e. from causes to effects).

  • What is the probability that Mary calls if the alarm is triggered? (this is a given!)

    >>> P(mary_calls.given(alarm))
    0.7
    

  • What is the probability that Mary calls if there is a burglary?

    >>> P(mary_calls.given(burglary))
    0.6586138
    

Now, we can also query the Bayes network in a "backward" manner (i.e. from effects to cause)

  • What is the probability that there is a burglary if alarm is triggered?

    >>> P(burglary.given(alarm))
    0.373551228281836
    

  • What is the probability that the alarm is triggered if Mary calls?

    >>> P(alarm.given(mary_calls))
    0.1500901177497596
    

  • What is the probability that there is a burglary if Mary OR John calls (or both)?

    >>> P(burglary.given(mary_calls | john_calls))
    0.014814211524985793
    

  • What is the probability that there is a burglary if Mary AND John call?

    >>> P(burglary.given(mary_calls & john_calls))
    0.28417183536439294
    

It is also possible to get unconditional probabilities of events or conjunction of events

  • What is the probability that the alarm is triggered?

    >>> P(alarm)
    0.002516442
    

  • What is the probability that there is a burglary and Mary calls?

    >>> P(burglary & mary_calls)
    0.0006586138000000001
    

  • What is the probability that there is neither burglary nor earthquake, the alarm triggers and both John and Mary call?

    >>> P(~burglary & ~earthquake & alarm & john_calls & mary_calls)
    0.00062811126
    

Note that the last result can be checked in the AIMA book.

As an academic exercise, you can very easily "flatten" the network to build a joint table giving the probabilities of each conjunction of events. This boils down to calculating the joint probability distribution of the 5 variables.

>>> lea.joint(burglary, earthquake, alarm, john_calls, mary_calls)
(False, False, False, False, False) : 0.9367427006190001
(False, False, False, False, True ) : 0.009462047481000001
(False, False, False, True , False) : 0.04930224740100002
(False, False, False, True , True ) : 0.0004980024990000002
(False, False, True , False, False) : 2.9910060000000004e-05
(False, False, True , False, True ) : 6.979013999999999e-05
(False, False, True , True , False) : 0.00026919054000000005
(False, False, True , True , True ) : 0.00062811126
(False, True , False, False, False) : 0.0013341744900000002
(False, True , False, False, True ) : 1.3476510000000005e-05
(False, True , False, True , False) : 7.021971000000001e-05
(False, True , False, True , True ) : 7.092900000000001e-07
(False, True , True , False, False) : 1.7382600000000002e-05
(False, True , True , False, True ) : 4.0559399999999997e-05
(False, True , True , True , False) : 0.00015644340000000006
(False, True , True , True , True ) : 0.00036503460000000007
(True , False, False, False, False) : 5.631714000000006e-05
(True , False, False, False, True ) : 5.688600000000006e-07
(True , False, False, True , False) : 2.9640600000000033e-06
(True , False, False, True , True ) : 2.9940000000000035e-08
(True , False, True , False, False) : 2.8143600000000003e-05
(True , False, True , False, True ) : 6.56684e-05
(True , False, True , True , False) : 0.0002532924000000001
(True , False, True , True , True ) : 0.0005910156000000001
(True , True , False, False, False) : 9.40500000000001e-08
(True , True , False, False, True ) : 9.50000000000001e-10
(True , True , False, True , False) : 4.9500000000000054e-09
(True , True , False, True , True ) : 5.0000000000000066e-11
(True , True , True , False, False) : 5.7e-08
(True , True , True , False, True ) : 1.3299999999999996e-07
(True , True , True , True , False) : 5.130000000000002e-07
(True , True , True , True , True ) : 1.1970000000000001e-06
We see here the interest of using Bayesian networks, defined with only 10 probability values for the causal dependencies, instead of 32 for the joint table. Let's stress that this calculated table takes into account the dependencies between the 5 variables. It may be used to verify or explain the query results given above (marginalization).

Reading BIF files

BIF (Bayesian Interchange Format) is a standard file format for defining Bayesian networks. A repository of BIF files can be found on the bnlearn site.

Lea is able to read BIF files and create the corresponding Bayesian networks in one statement. For example, here is how to load the earthquake.bif file found on the above-mentioned page.

earthquake_bn = lea.read_bif_file("earthquake.bif")
The result is a dictionary giving the created Lea instances representing the read BN, with their names as keys:
earthquake_bn.keys()
# -> dict_keys(['Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls'])
You may then query the BN by using the techniques seen above.
P(earthquake_bn["Burglary"])
# -> 0.01
To ease your work, it is possible to create the variables directly in your current session:
lea.declare_namespace(globals())
earthquake_bn = lea.read_bif_file("earthquake.bif", create_vars=True)
This injects the Burglary, Earthquake, Alarm, JohnCalls and MaryCalls variables into your current namespace:
P(Burglary)
# -> 0.01
P(MaryCalls.given(Alarm))
# -> 0.7
P(Burglary.given(JohnCalls))
# -> 0.13331382432504352
Take care that several results here are different from those seen earlier because the probability of burglary in earthquake.bif is defined as 1%, while it was 0.1% in the previous example.

Advanced features for Bayesian networks

One of the drawbacks of Bayesian networks built with the simple CPT technique presented above is that they require specifying probability data for each combination of influencing variables. Since the number of cases grows exponentially with the number of influencing variables, this can quickly become difficult to specify.

The present section shows several ways to reduce the number of cases to specify in a CPT, by means of factorization.

Note that the special topic of calculating CPT from data shall be covered later, in a specific section dedicated to Machine Learning.

Specifying a default entry

One of the simplest ways to reduce the enumeration of cases in CPT is to have a default entry that groups all the cases that have not been explicitly given.

Let us revisit the burglary network. Consider a new model of alarm device for which the two cases "burglary without earthquake" and "earthquake without burglary" cannot be distinguished, both being associated with a probability of 0.9. Instead of writing this:

alarm2 = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event(0.950),
                                                 (True ,False) : lea.event(0.900),
                                                 (False,True ) : lea.event(0.900),
                                                 (False,False) : lea.event(0.001) })
you could put the common 0.900 probability as a second argument of the switch method, just after the dictionary:
alarm2 = lea.joint(burglary,earthquake).switch({ (True ,True ) : lea.event(0.950),
                                                 (False,False) : lea.event(0.001) },
                                                                 lea.event(0.900)  )
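As a quick sanity check (a sketch; the expected values follow directly from the CPT defined above), both definitions of alarm2 should agree on every combination of causes:

print (P(alarm2.given(burglary, ~earthquake)))
# -> 0.9 (taken from the default entry)
print (P(alarm2.given(~burglary, earthquake)))
# -> 0.9
print (P(alarm2.given(~burglary, ~earthquake)))
# -> 0.001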

Building CPT with one condition (if_ method)

For variables depending on a single Boolean variable, the switch method can be abbreviated using the if_ convenience function: it mimics the if-then-else construct found in programming languages (the underscore avoids a clash with Python's if keyword). For instance, the definition

john_calls = alarm.switch({ True  : lea.event(0.90),
                            False : lea.event(0.05) })
can be rewritten as follows:
john_calls = lea.if_( alarm, lea.event(0.90),
                             lea.event(0.05) )
This is no more than syntactic sugar: the if_ construct builds a CPT in a somewhat more expressive way than the switch method.
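For instance, the mary_calls CPT of the same network could be written in the same way (a sketch reusing the AIMA probabilities 0.70 and 0.01; the name mary_calls2 is only illustrative):

mary_calls2 = lea.if_( alarm, lea.event(0.70),
                              lea.event(0.01) )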

Reducing the size of CPT

All the examples seen above use the switch method to build up a CPT. This allows specifying a set of clauses, each of which is triggered when the decision variables are equal to a specific combination of values. Although the switch method works fine and is efficient (thanks to Python dictionaries), it is limited in the way the conditions are expressed, since it requires an enumeration of all possible values. This can be a burden when the decision variable has a large domain, for example if it joins multiple variables or if the domain covers a big range of values. More importantly, such an explicit CPT may be intractable due to memory usage. In several cases however, it happens that the CPT

  • returns the same probability distribution for different values (see examples below), possibly only when some condition holds (context-specific independence),
  • or returns probability distributions that can be calculated by a deterministic function (e.g. noisy-or, noisy-max models).

In such situations, it's possible to avoid the definition of an explicit CPT by exploiting the properties/assumptions of the probabilistic model. For this purpose, we present below four techniques, all available in Lea:

  1. CPT defined by a function
  2. CPT defined by constraints
  3. CPT with Boolean expressions
  4. Cascaded CPT

CPT defined by function (switch_func method)

Let's imagine a world where the temperatures are distributed uniformly from -10°C to 29°C, as integer numbers:

temperature = lea.interval(-10,29)
To measure the temperature in this world, in place of a precise thermometer, we've got a device that indicates COLD, MEDIUM or HOT depending on the temperature. Unfortunately, this device is very imprecise and the best that we can do is to model its indicator as three probability distributions, which depend on the range of temperatures:
ind_if_temperature_below_0          = lea.pmf({ 'COLD': 0.75, 'MEDIUM': 0.25               })
ind_if_temperature_between_0_and_19 = lea.pmf({ 'COLD': 0.25, 'MEDIUM': 0.60,  'HOT': 0.15 })
ind_if_temperature_above_20         = lea.pmf({               'MEDIUM': 0.20,  'HOT': 0.80 })
So, we see that the indicator roughly follows the temperature value, with some randomness. The indicator shall then be specified as a CPT linking any temperature to one of the three above-specified distributions. If we use the switch technique seen above, there are 40 entries to specify:
indicator = temperature.switch({ -10: ind_if_temperature_below_0,
                                 # ... (one entry per temperature from -9 to -2)
                                  -1: ind_if_temperature_below_0,
                                   0: ind_if_temperature_between_0_and_19,
                                 # ... (one entry per temperature from 1 to 18)
                                  19: ind_if_temperature_between_0_and_19,
                                  20: ind_if_temperature_above_20,
                                 # ... (one entry per temperature from 21 to 28)
                                  29: ind_if_temperature_above_20 })
There is much redundancy here; it's a case of context-specific independence: the indicator is independent of the temperature as soon as the actual temperature has determined which of the three ranges applies. Things could be improved a bit by turning one of the three ranges into a default entry, but it remains unsatisfactory from the modeler's point of view.

The switch_func method offers a way to remove the need for such a redundant CPT. The idea is to provide a function that emulates the CPT lookup on the fly. So, instead of relying on a dictionary as above, you provide a Python function that returns the probability distribution of the indicator for any temperature given as argument.

temperature = lea.interval(-10,29)
def indicator_func(temperature_value):
    if temperature_value < 0:
        return ind_if_temperature_below_0
    if temperature_value < 20:
        return ind_if_temperature_between_0_and_19
    return ind_if_temperature_above_20
indicator = temperature.switch_func(indicator_func)
Here are some queries, which check the forward dependency from the temperature to the indicator:
>>> indicator.given(temperature == -5)
COLD   : 0.7500000000000001
MEDIUM : 0.25
>>> indicator.given(temperature <= 0)
COLD   : 0.7045454545454545
HOT    : 0.013636363636363634
MEDIUM : 0.28181818181818175
>>> indicator
COLD   : 0.3124999999999998
HOT    : 0.27500000000000013
MEDIUM : 0.4125000000000001 
And here are queries that infer the temperature from the indicator:
>>> P((temperature < 0).given(indicator == 'COLD'))
0.6
>>> P((temperature < 0).given(indicator == 'MEDIUM'))
0.15151515151515144
>>> P((temperature < 0).given(indicator == 'HOT'))
0.0
>>> P((temperature <= 0).given(indicator == 'HOT'))
0.013636363636363629
>>> P((temperature > 0).given(indicator == 'HOT'))
0.9863636363636364
>>> P((temperature <= 0).given(indicator != 'COLD'))
0.11818181818181817
>>> P((temperature <= 0).given(indicator.is_any_of('MEDIUM','HOT')))
0.11818181818181817
As explained before, you can gain some insight into the underlying calculations by building the joint distribution of the variables:
>>> lea.joint(indicator,temperature)
('COLD'  , -10) : 0.01875000000000001
('COLD'  ,  -9) : 0.01875000000000001
('COLD'  ,  -8) : 0.01875000000000001
('COLD'  ,  -7) : 0.01875000000000001
...

For this switch_func construct, it's important to note that, internally, the indicator_func function is stored in place of the CPT. The probabilistic inference algorithm is smart enough to call this function when needed, instead of expanding the full CPT. So, this construct is very efficient in terms of memory usage.
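To give a feeling of this point, here is a sketch with arbitrary bounds (only meant as an illustration): the very same function handles a much larger temperature domain without ever building the corresponding 4,000-entry table.

big_temperature = lea.interval(-1000, 2999)   # 4,000 equiprobable values
big_indicator = big_temperature.switch_func(indicator_func)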

WARNING: the function you pass to switch_func shall return Lea instances defined by a given elementary pmf, e.g. as returned by lea.pmf(...) or lea.vals(...) calls. Raw values are also accepted: these are automatically coerced to a pmf with a single value having probability 1. If the passed function returns more complex Lea instances (functions, arithmetic expressions, joints, CPT, etc.), then an exception is raised at evaluation time.
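As an illustration of this coercion rule (a sketch; the name sign is only illustrative), a function returning raw values builds a valid CPT where each returned value gets probability 1:

sign = temperature.switch_func(lambda t: 'NEGATIVE' if t < 0 else 'NON-NEGATIVE')
print (P(sign == 'NEGATIVE'))
# -> 0.25 (10 negative temperatures out of the 40 equiprobable ones), up to rounding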

We shall see later how the switch_func can be used to build up noisy-or / noisy-max models (see here).

CPT defined by constraints (such_that method)

In some situations, we want to define a CPT by means of constraints to enforce. The methods shown above may become cumbersome, whether through an explicit table definition (switch) or through the definition of a function (switch_func).

A well-known use case is the Monty Hall problem on Wikipedia. To model this problem in Lea, we first define the door hiding the prize and the door chosen by the player, as two independent random variables:

lea.set_prob_type('r')
door = "Door " + lea.vals(*"123")
prize = door.new()
choice1 = door.new()
Then we need to define the door revealing a goat, such that it is neither the door chosen by the player nor the door with the prize. To achieve this, we need a CPT. With the techniques seen above, we could use switch, expanding the 3x3 cases, or use switch_func, providing a door selection function. But these are a bit tedious to write (see the end of the present subsection). For building the CPT, the such_that method provides a far simpler definition:
goat = door.such_that(door != choice1, door != prize)
This definition is quite natural and "declarative", since it expresses the constraints that the CPT shall enforce. Then, the new choice of the player who changes his initial choice can be modelled similarly:
choice2 = door.such_that(door != choice1, door != goat)
Then, the probability of winning is
P(choice2 == prize)
# -> 2/3
which is the correct, although counterintuitive, result.
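For comparison (a small addition to the original example), the player who keeps the initial choice wins only one time out of three:

P(choice1 == prize)
# -> 1/3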

This result can be explained by doing the following joint:

lea.joint(prize, choice1, goat, choice2, choice2==prize)
# -> ('Door 1', 'Door 1', 'Door 2', 'Door 3', False) : 1/18
     ('Door 1', 'Door 1', 'Door 3', 'Door 2', False) : 1/18
     ('Door 1', 'Door 2', 'Door 3', 'Door 1', True ) : 2/18
     ('Door 1', 'Door 3', 'Door 2', 'Door 1', True ) : 2/18
     ('Door 2', 'Door 1', 'Door 3', 'Door 2', True ) : 2/18
     ('Door 2', 'Door 2', 'Door 1', 'Door 3', False) : 1/18
     ('Door 2', 'Door 2', 'Door 3', 'Door 1', False) : 1/18
     ('Door 2', 'Door 3', 'Door 1', 'Door 2', True ) : 2/18
     ('Door 3', 'Door 1', 'Door 2', 'Door 3', True ) : 2/18
     ('Door 3', 'Door 2', 'Door 1', 'Door 3', True ) : 2/18
     ('Door 3', 'Door 3', 'Door 1', 'Door 2', False) : 1/18
     ('Door 3', 'Door 3', 'Door 2', 'Door 1', False) : 1/18
For your information, here are the equivalent (but less pleasant) flavours for defining goat:

  • switch requires a long enumeration
    goat = lea.joint(  prize  ,  choice1).switch(
                  { ( 'Door 1', 'Door 1') : lea.vals('Door 2','Door 3'),
                    ( 'Door 1', 'Door 2') : 'Door 3',
                    ( 'Door 1', 'Door 3') : 'Door 2',
                    ( 'Door 2', 'Door 2') : lea.vals('Door 1','Door 3'),
                    ( 'Door 2', 'Door 1') : 'Door 3',
                    ( 'Door 2', 'Door 3') : 'Door 1',
                    ( 'Door 3', 'Door 3') : lea.vals('Door 1','Door 2'),
                    ( 'Door 3', 'Door 1') : 'Door 2',
                    ( 'Door 3', 'Door 2') : 'Door 1' })
    
  • switch_func requires defining a tricky construct:
    doors = door.support()
    def selected_door(eliminated_doors):
        return lea.vals(*(d for d in doors if d not in eliminated_doors))
    goat = lea.joint(prize, choice1).switch_func(selected_door)
    

CPT with Boolean expressions (cpt method)

The cpt function offers an alternative way to avoid CPT redundancies. The idea is to specify a list of entries made up of Boolean conditions with associated Lea instances. It's a bit like what you would do in Python with a sequence of if constructs (see indicator_func above). Revisiting the example given in the previous section, here is the syntax to build the indicator variable using cpt:

indicator = lea.cpt( (temperature < 0                        , ind_if_temperature_below_0          ),
                     ((0 <= temperature) & (temperature < 20), ind_if_temperature_between_0_and_19 ),
                     (20 <= temperature                      , ind_if_temperature_above_20         ))
With this construct, the example queries given in the previous section produce the very same results.

Note that the method expects the given conditions to cover all possible cases and to be mutually exclusive (so, it's not exactly comparable to an if-elif-else construct). If this is not respected, then an exception is raised.

Similarly to the switch method, the cpt method accepts a default clause: it is represented by a condition set to None. This special value is a placeholder for any case not covered by the other condition(s). To illustrate this, here is a definition of indicator that is exactly equivalent to the previous one.

indicator = lea.cpt( (temperature < 0  , ind_if_temperature_below_0          ),
                     (None             , ind_if_temperature_between_0_and_19 ),
                     (20 <= temperature, ind_if_temperature_above_20         ))
As you can see, None can be specified at any position among lea.cpt's arguments, not necessarily the last one. Of course, no more than one None is allowed (otherwise, an exception is raised).

Cascaded CPT

In several CPTs presenting context-specific independence, it's possible to remove redundancies by building CPTs "in cascade". Here is a very basic example, where z is a CPT depending on x and y:

x = lea.event('1/3')
y = lea.event('1/4')
z = lea.joint(x,y).switch({ (False, False): lea.event('1/5'),
                            (False, True ): lea.event('1/7'),
                            (True , False): lea.event('1/2'),
                            (True , True ): lea.event('1/2') })
You see that if x is True then z is a 50-50 distribution, which does not depend on y. The redundancy can then be removed by embedding a CPT governed by y only within a CPT governed by x only, as follows:
z1 = x.switch({ False: y.switch({ False: lea.event('1/5'),
                                  True : lea.event('1/7')}),
                True : lea.event('1/2') })
Equivalently, a similar construct can be done using cpt:
z2 = lea.cpt( ( ~x, lea.cpt( (~y, lea.event('1/5')),
                             ( y, lea.event('1/7')))),
              (  x, lea.event('1/2')) )
You can now check by various queries that z, z1 and z2 represent the same logical CPT. For instance,
print (P(z.given(y)))
# -> 11/42
print (P(z1.given(y)))
# -> 11/42
print (P(z2.given(y)))
# -> 11/42
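As another spot check (a sketch), conditioning on the causes directly should recover the entries of the original table, for the three definitions alike:

print (P(z.given(~x,~y)), P(z1.given(~x,~y)), P(z2.given(~x,~y)))
# -> 1/5 1/5 1/5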

Of course, all these examples are quite contrived and the gain is negligible; you can nevertheless extrapolate the benefit of these techniques to bigger CPTs with prohibitive combinatorics.

Noisy-or / noisy-max models

The noisy-or / noisy-max models provide practical alternatives to naive CPT, provided that some assumptions can be made on the causal dependencies (we won't detail these here). Lea provides several means to build up such models. As an example, we shall define a symptom Boolean random variable (fever), which is modeled as a noisy-or of disease Boolean random variables (cold, flu and malaria) (see figures and explanations in the AIMA book or in this presentation).

Let's first define the input data, which are the conditional probabilities of fever given the occurrence of one single disease:

prob_fever_if_sole_cold    = 0.4
prob_fever_if_sole_flu     = 0.8
prob_fever_if_sole_malaria = 0.9
then, the prior probability distributions of these diseases (arbitrary values, not found in the afore-mentioned reference):
cold    = lea.event(0.082)
flu     = lea.event(0.025)
malaria = lea.event(0.001)
Now, the following construct defines the fever variable as a noisy-or of the three diseases:
fever_by_cold    = lea.if_(cold   , lea.event(prob_fever_if_sole_cold)   , False)
fever_by_flu     = lea.if_(flu    , lea.event(prob_fever_if_sole_flu)    , False)
fever_by_malaria = lea.if_(malaria, lea.event(prob_fever_if_sole_malaria), False)
fever = fever_by_cold | fever_by_flu | fever_by_malaria
The fever_by_cold variable can be interpreted as "having fever due to a cold": in case of cold, it has probability 0.4 of being true; if there is no cold, it is surely false (i.e. either there is no fever or the fever is not caused by the cold). The fever is obtained by ORing the three 'noised' variables with Lea's | operator, which simply combines all the possible values with the classical OR. Note that this noisy-or model is very different from:
light_fever = cold | flu | malaria
which models a symptom that certainly occurs (probability 1) as soon as at least one disease occurs, e.g. a light fever. Note that you would get the same result if you had set prob_fever_if_sole_cold = prob_fever_if_sole_flu = prob_fever_if_sole_malaria = 1. In such a special case, the noisy-or boils down to a classical or.

One way to check that fever is correctly defined is to verify, one by one, each entry of the equivalent CPT, which can easily be calculated from the 3 given probabilities. Here is how this check can be done:

diseases = lea.joint(cold,flu,malaria)
for (c,f,m) in diseases.support:
    print (c, f, m, lea.P(fever.given(cold==c,flu==f,malaria==m)))
# -> False False False 0.0
#    False False True 0.9
#    False True False 0.8
#    False True True 0.98
#    True False False 0.4
#    True False True 0.94
#    True True False 0.88
#    True True True 0.9879999999999999
You can check that values match the ones found in the afore-mentioned references.

You may then execute queries, as described earlier in the tutorial:

P(fever.given(~cold,flu,malaria))
# -> 0.98
P(malaria.given(fever))
# -> 0.017080461111676846
P(flu.given(~fever,cold))
# -> 0.005102040816326529
Noisy-max models may be defined in a similar way, using the lea.max_of method. These may use non-boolean random variables, provided that their values are ordered. You may note that the definition of fever above could be replaced by
fever = lea.max_of(fever_by_cold, fever_by_flu, fever_by_malaria) 
without changing anything since Python booleans are ordered (False < True). This tends to confirm that noisy-max is a generalization of noisy-or.
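Here is a quick check of this claim (a sketch; the names fever_or and fever_max are only illustrative):

fever_or  = fever_by_cold | fever_by_flu | fever_by_malaria
fever_max = lea.max_of(fever_by_cold, fever_by_flu, fever_by_malaria)
print (P(fever_or), P(fever_max))
# -> the two values should be identical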

Before concluding the section, let's mention that there exists an alternative way to build noisy-or / noisy-max models, which relies on the switch_func method. The idea is to define a function that mimics the CPT, by calculating the probability distribution expected for the values of the causing variables passed as argument. This function shall of course embody the assumptions made for the noisy model at hand. Here is how to proceed to redo the fever model:

def fever_event(disease_val):
    (cold_val, flu_val, malaria_val) = disease_val
    inv_prob = 1.
    if cold_val:
        inv_prob *= 1. - prob_fever_if_sole_cold
    if flu_val:
        inv_prob *= 1. - prob_fever_if_sole_flu
    if malaria_val:
        inv_prob *= 1. - prob_fever_if_sole_malaria
    return lea.event(1.-inv_prob)
diseases = lea.joint(cold,flu,malaria)
fever2 = diseases.switch_func(fever_event)
You may check that replacing fever by fever2 in the queries above gives the very same results.
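For instance (a sketch of the check suggested above):

print (P(fever2.given(~cold, flu, malaria)))
# -> 0.98, the same value as obtained with fever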

To conclude, none of the models seen here requires the definition of a large CPT. Furthermore, the inference algorithm does not need to calculate such a CPT internally, so these models are very efficient in memory usage.

Prior probabilities (revised_with_cpt method)

The methods seen so far for building CPT miss an important class of problems of conditional probability: those mixing a conditional probability and a prior, unconditional, probability.

We shall take here an example from the AIMA book. It models a causal dependency between meningitis and stiff neck. The assumptions are the following:

  • the probability that a patient has meningitis is 1/50,000 (prior probability)
  • the probability that a patient has stiff neck is 1/20 (prior probability)
  • meningitis causes stiff neck 50 % of the time (conditional probability)

OK, the probability of meningitis is easily tackled:

meningitis = lea.event(1/50000)
For stiff neck however, should we try the if_ syntax, we are stuck on the "else" argument:
stiffneck = lea.if_( meningitis, lea.event(1/2),
                                 ?? WHAT HERE ??  )
The problem is that the probability of stiff neck in the absence of meningitis is unknown, so we cannot fill in the second part of the CPT definition. By careful calculations, we could try to find this value such that the unconditional probability of stiff neck is equal to 1/20. Fortunately, this is not needed: you may use the prior_lea argument for this:

stiffneck = lea.if_( meningitis, lea.event(1/2),
                                 prior_lea=lea.event(1/20))
Note: for previous Lea versions, you have to use the dedicated method revised_with_cpt:
stiffneck = lea.event(1/20).revised_with_cpt((meningitis,lea.event(1/2)))
which could be interpreted as: "Without any information, the probability of stiff neck for a given patient is 1/20; however, if the patient has got meningitis, then this probability becomes 1/2".

Let us check these statements:

>>> P(stiffneck)
0.05
>>> P(stiffneck.given(meningitis))
0.5
Note that the stiffneck object is a true CPT. Behind the scenes, Lea has calculated the probability of stiff neck if no meningitis; it was then able to fill in the missing data. You can get this value:
>>> P(stiffneck.given(~meningitis))
0.0499909998199964
In the first statement, you could replace the "?? WHAT HERE ??" by 0.0499909998199964 and get the same CPT. Of course, this is not needed since Lea has done the work for you!
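Here is a sketch of that check (the name stiffneck_check is only illustrative):

stiffneck_check = lea.if_( meningitis, lea.event(1/2),
                                       lea.event(0.0499909998199964))
print (P(stiffneck_check))
# -> 0.05 (up to rounding)
print (P(stiffneck_check.given(meningitis)))
# -> 0.5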

Now, we are able to compare the probability of meningitis, before and after the stiff neck information:

>>> P(meningitis)
2e-05
>>> P(meningitis.given(stiffneck))
0.0002
We see that the given information has multiplied the probability of meningitis by ten. This result is in line with Bayes' rule, since

P(meningitis | stiffneck) = P(stiffneck | meningitis) × P(meningitis) / P(stiffneck)
                          = (1/2) × (1/50000) / (1/20) = 1/5000 = 0.0002

We can check that also using Lea:

>>> P(stiffneck.given(meningitis)) * P(meningitis) / P(stiffneck)
0.0002
As noted before, the previous expression is NOT the way the conditional probability is computed in Lea: a more general algorithm is used, which is also more efficient.

You should be aware that some problems have no solution. In such cases, Lea raises an exception with an error message giving the acceptable range for the prior probability. Example:

stiffneck2 = lea.event(1/200000).revised_with_cpt((meningitis,lea.event(1/2)))
lea.lea.Error: prior probability of 'False' is 0.999995, outside the range [ 1e-05 , 0.9999899999999999 ]

Prior probabilities also work with non-boolean distributions. Let us revisit the beetle example modeled with attribute. Imagine that the problem is restated as follows: the prior probability of a beetle with pattern is 5.093 %; in the rare subspecies, 98 % have the pattern; the rare subspecies accounts for only 0.1 % of the population. Note that, compared to the initial problem, we have removed one piece of data (the conditional probability of pattern for the common beetle) and added one (the prior probability of pattern).

Here are the statements to model these data:

aspect0 = lea.pmf({ 'pattern'    : 0.05093,
                    'no pattern' : 0.94907 })
aspect1 = aspect0.revised_with_cpt(( kind == 'rare', lea.pmf({ 'pattern'    : 0.98,
                                                               'no pattern' : 0.02 })))
Now, what is the probability of pattern in the common subspecies?
>>> aspect1.given(kind == 'common')
no pattern : 0.95
pattern    : 0.05
This is striking: we get back the same probability of 5 % as in the initial problem! Why is it so? The answer lies in the value 0.05093 given for the prior probability of pattern. Actually, this value has been chosen on purpose, based on the unconditional probability of pattern calculated in the initial problem (i.e. aspect, not aspect1):
>>> aspect
no pattern : 0.94907
pattern    : 0.05093
This demonstrates the consistency between the two methods to define CPT.

Markov chains

Lea allows defining Markov chains. We shall show here how to model the stock market Markov chain example given on Wikipedia.

To execute the examples shown in this section, you have to import the markov module present in the Lea package:

from lea import markov

Definition from transition probability matrix

The states defining the Markov chain can be any Python objects. We shall use here the simplest option: strings. The three market states of the example shall be named "BULL", "BEAR" and "STAG". Here is how you define these states in Lea, along with the transition probability matrix (note that we omit the prompts, so you can easily copy-paste the multiline statement):

market = markov.chain_from_matrix(
                  ( 'BULL', 'BEAR', 'STAG'  ),
        ( 'BULL', ( 0.900 , 0.075 , 0.025  )),
        ( 'BEAR', ( 0.150 , 0.800 , 0.050  )),
        ( 'STAG', ( 0.250 , 0.250 , 0.500  )))
Note that the rows of the matrix represent the from states and the columns represent the to states. For instance, the first row indicates the probabilities of transition starting from "BULL": 90 % to stay "BULL", 7.5 % to become "BEAR" and 2.5 % to become "STAG".

We have here a Markov chain, named market, which captures the set of states and all probabilities of transition from state to state. These probabilities can be displayed by using the same methods as shown for simple distributions:

>>> market
BULL
  -> BULL : 0.9
  -> BEAR : 0.075
  -> STAG : 0.025
BEAR
  -> BULL : 0.15
  -> BEAR : 0.8
  -> STAG : 0.05
STAG
  -> BULL : 0.25
  -> BEAR : 0.25
  -> STAG : 0.5
Take care however that market is NOT a Lea instance (actually, it's a markov.Chain). Apart from displaying itself, the operations you can do on this object are minimal: getting its inner states or getting probability distributions over these states. We shall see this in the following.

Querying the Markov chain

To make queries, we need to be able to define an initial state. Here is how to proceed:

>>> (bull_state,bear_state,stag_state) = market.get_states()
Each of these 3 variables is a Lea instance representing a certain distribution, i.e. having a probability 1. For instance, the bear_state object represents the state BEAR:
>>> bear_state
BEAR : 1.0
You may also use the state attribute, which defines a uniform probability distribution over the states:

Note: In Lea versions previous to 3.1, write _state instead of state.

>>> market.state
BULL : 0.3333333333333333
BEAR : 0.3333333333333333
STAG : 0.3333333333333333
To get the "next state" distribution (i.e. the state at next week, in the example), use the next_state() method:
>>> market.state.next_state()
BULL : 0.4333333333333333
BEAR : 0.375
STAG : 0.19166666666666665
>>> bear_state.next_state()
BULL : 0.15
BEAR : 0.8
STAG : 0.05
You can check (again) on this last example that the given transition probability matrix has been set up correctly. Note that the object returned by next_state(…) is a Lea instance, on which you can apply the usual techniques seen in the tutorial.

To get the state probability distribution after n steps, instead of chaining the next_state() method n times, you can simply pass n as an argument:

>>> bear_state.next_state(3)
BULL : 0.3575
BEAR : 0.56825
STAG : 0.07425
You can check these results with Wikipedia's.

To define a non-uniform initial probability distribution, you need to pass the state probability distribution to the get_state method:

>>> fp_state = market.get_state(lea.pmf({ 'BULL': 0.6250,
                                          'BEAR': 0.3125,
                                          'STAG': 0.0625 }))
>>> fp_state
BULL : 0.625
BEAR : 0.3125
STAG : 0.0625
In the present case, this distribution is a bit special: it is actually the "fixed point"; you can verify that it does not change at the next step (if you ignore rounding errors!):
>>> fp_state.next_state()
BULL : 0.6250000000000001
BEAR : 0.3125
STAG : 0.0625
Note that you can also reach this distribution, by starting from any state and making a large number of transitions (e.g. bear_state.next_state(1000)).
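As an illustration of this remark (a sketch):

print (bear_state.next_state(1000))
# -> approximately the fixed-point distribution shown above (BULL 0.625, BEAR 0.3125, STAG 0.0625)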

Another way to define a non-trivial probability distribution is by specifying a condition on the state:

>>> market.state_given(market.state != 'STAG')
BULL : 0.5
BEAR : 0.5
>>> market.state_given(market.state[0] == 'B')
BULL : 0.5
BEAR : 0.5

Definition from a sequence of states

There exists another way to define a Markov chain: the empirical way. Instead of setting a probability matrix, which could be hard to define, it is possible to define a Markov chain by providing a sequence of states supposed to be long enough to be representative of all transitions. Typically, this could be a sample sequence of observed states for a given system.

This sort of definition is done through the markov.chain_from_seq method. Here is an example, using a very short sequence:

>>> market2 = markov.chain_from_seq(('BEAR','BEAR','BULL','BULL','BULL','BEAR','STAG','BULL','STAG','BEAR'))
>>> market2
BEAR
  -> BEAR : 0.3333333333333333
  -> BULL : 0.3333333333333333
  -> STAG : 0.3333333333333333
BULL
  -> BEAR : 0.25
  -> BULL : 0.5
  -> STAG : 0.25
STAG
  -> BEAR : 0.5
  -> BULL : 0.5
You can verify that the transition probabilities are exactly the frequencies of transition in the given sequence. In particular, there is no transition from "STAG" to "STAG".

Generating random sequence of states

Starting from a given state of a given Markov chain, it is possible to generate a random sequence of states, obeying the transition probabilities of that Markov chain. This is done by calling the random_seq(n) method where n is the size of the sample sequence. Example:

>>> bear_state.random_seq(40)
('BEAR', 'BEAR', 'BEAR', 'BULL', 'BULL', 'BULL', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'STAG', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BULL', 'BULL', 'BULL', 'BULL')
Now, to check the likelihood of the generated sequence, we can use the markov.chain_from_seq method seen above, which can also be seen as a transition frequency calculator. Let's use a random sequence of 100,000 states this time:
>>> state_seq_sample = bear_state.random_seq(100000)
>>> markov.chain_from_seq(state_seq_sample)
BEAR
  -> BEAR : 0.8024547965846309
  -> BULL : 0.14929683576092417
  -> STAG : 0.04824836765444501
BULL
  -> BEAR : 0.0768908443327048
  -> BULL : 0.8988863640026431
  -> STAG : 0.02422279166465213
STAG
  -> BEAR : 0.2497538562520512
  -> BULL : 0.2490974729241877
  -> STAG : 0.501148670823761

We see that the frequencies of transition are close to the probabilities defined for the market Markov chain, as expected.

Matrix extraction / absorbing Markov chains

For advanced treatments on Markov chains, you may desire to obtain characteristic data as matrices.

The transition probability matrix can be retrieved by using the matrix method. If you have installed the NumPy package, then you can obtain this matrix as a numpy array using the as_array optional argument:

print (market.matrix())
# -> ((0.9, 0.075, 0.025), (0.15, 0.8, 0.05), (0.25, 0.25, 0.5))
>>> market.matrix(as_array=True)
# -> array([[ 0.9  ,  0.075,  0.025],
            [ 0.15 ,  0.8  ,  0.05 ],
            [ 0.25 ,  0.25 ,  0.5  ]])
If your Markov chain is absorbing, then you may want to get various pieces of information about it. The method absorbing_mc_info() serves this purpose: it returns a 6-tuple:
(is_absorbing,transient_states,absorbing_states,q_matrix,r_matrix,n_matrix) = market.absorbing_mc_info()
print (is_absorbing,transient_states,absorbing_states,q_matrix,r_matrix,n_matrix)
# -> (False, (), (), (), (), None)
Since the market MC is not absorbing, the returned is_absorbing is False and the other elements give no interesting data. Consider now the following MC:
dw = markov.chain_from_matrix(
                ( 'S0', 'S1', 'S2', 'S3', 'S4'),
        ( 'S0', ( 1.0 , 0.0 , 0.0 , 0.0 , 0.0 )),
        ( 'S1', ( 0.5 , 0.0 , 0.5 , 0.0 , 0.0 )),
        ( 'S2', ( 0.0 , 0.5 , 0.0 , 0.5 , 0.0 )),
        ( 'S3', ( 0.0 , 0.0 , 0.5 , 0.0 , 0.5 )),
        ( 'S4', ( 0.0 , 0.0 , 0.0 , 0.0 , 1.0 )))
This MC is absorbing because 'S0' and 'S4' are absorbing and because the three other states can reach these two absorbing states. This information, as well as the Q and R matrices, can be retrieved as follows:
(is_absorbing,transient_states,absorbing_states,q_matrix,r_matrix,n_matrix) = dw.absorbing_mc_info()
print (is_absorbing)
# -> True
print (transient_states)
# -> ('S1', 'S2', 'S3')
print (absorbing_states)
# -> ('S0', 'S4')
print (q_matrix)
# -> ((0.0, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.0))
print (r_matrix)
# -> ((0.5, 0.0), (0.0, 0.0), (0.0, 0.5))
print (n_matrix)
# -> None
The last element, n_matrix, is meant to contain the fundamental matrix N = inv(I-Q). To have it actually calculated, you need the NumPy package and you have to set the as_array optional argument: then, the 3 matrices Q, R and the calculated N are returned as numpy arrays:
(is_absorbing,transient_states,absorbing_states,q_matrix,r_matrix,n_matrix) = dw.absorbing_mc_info(as_array=True)
print (q_matrix)
# -> array([[ 0. ,  0.5,  0. ],
            [ 0.5,  0. ,  0.5],
            [ 0. ,  0.5,  0. ]])
print (r_matrix)
# -> array([[ 0.5,  0. ],
            [ 0. ,  0. ],
            [ 0. ,  0.5]])
print (n_matrix)
# -> array([[ 1.5,  1. ,  0.5],
            [ 1. ,  2. ,  1. ],
            [ 0.5,  1. ,  1.5]])
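You can cross-check the fundamental matrix yourself with NumPy (a sketch using the arrays obtained above):

import numpy as np
print (np.linalg.inv(np.eye(3) - q_matrix))
# -> the same array as n_matrix above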

What's next?

Thank you for reading the present tutorial page! We hope that you found it clear and informative enough.

To learn more, you are invited to jump to

  • tutorial [3/3] - plotting, drawing without replacement, machine learning, information theory, MC estimation, symbolic calculation, …
  • Examples page

For enhancements or bugs, please create issues on the Lea Bitbucket page.

Feel free also to send an e-mail to pie.denis@skynet.be (English or French spoken). Use cases of Lea in your projects/research are especially appreciated! A "references" page listing the projects using Lea could greatly improve the reputation of the library.
