
Lea Advanced Tutorial - Part 2/3

Introduction

WARNING: Lea 2 is no longer maintained. If you are new to Lea, you are invited to use Lea 3, which has a comprehensive tutorial.

The present page is the second part of the advanced tutorial for the Lea 2 package. It covers conditional probabilities and Bayes inference. It assumes that you have installed the latest version of Lea and that you are familiar with the techniques presented in the tutorial and in the section about conditional probabilities of advanced tutorial: part 1.

The present tutorial uses the Python language. If you prefer a more convenient yet less standard syntax, you can jump to the Leapp translation of the present tutorial.

Note: In the present wiki page, we promote the switch method introduced in Lea 2.3. Should you need it, you can still access the previous version of the present wiki page, which presents (only) the Lea.buildCPT method.

Note: In the present wiki page, we shall use the convenience functions introduced in Lea 2.2.

References

In some sections below, we shall use examples found in the excellent book "Artificial Intelligence: A Modern Approach" by Stuart Russell and Peter Norvig (second edition). The reader is invited to refer to this book (simply called the "AIMA book" in the following) for more details on the methods and results presented.

The examples below are reproduced with the kind permission of Stuart Russell and Peter Norvig.

Reminder on conditional probabilities

In the first part of the advanced tutorial, you have seen how the given method allows you to calculate conditional probabilities from dependent probability distributions. For example, here is how to calculate the sum of two dice when we already know that the result of one die is 1 or 2:

>>> dice.given(die1 <= 2)
2 : 1/12
3 : 2/12
4 : 2/12
5 : 2/12
6 : 2/12
7 : 2/12
8 : 1/12

OK, we have here a simple syntax to calculate conditional probabilities. However, there is a whole class of probability problems where conditional probabilities P(A|B) are given. The general formula

P(A | B) . P(B) = P(A and B)

allows you, when applied with care and patience, to answer most problems. In the first part of the advanced tutorial, we have seen how to define dependent random variables through joint distributions. Basically, this boils down to defining P(A and B and ...) for each conjunction of 'atomic' events A, B, ... . This method models the dependencies very accurately and allows answering any query, including conditional probabilities. However, it requires knowing the probability of each atomic combination, which could be difficult to obtain. It also has the drawback that combinatorial explosion entails entering huge numbers of probability values.
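To make the formula above concrete, here is an independent check in plain Python (standard library only, no Lea) on the two-dice example from the previous section:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# P(sum = 7 and die1 <= 2) and P(die1 <= 2), as exact fractions
p_joint = Fraction(sum(1 for d1, d2 in outcomes if d1 + d2 == 7 and d1 <= 2), 36)
p_given = Fraction(sum(1 for d1, d2 in outcomes if d1 <= 2), 36)

# P(sum = 7 | die1 <= 2) = P(sum = 7 and die1 <= 2) / P(die1 <= 2)
p = p_joint / p_given
print(p)  # 1/6, i.e. the 2/12 displayed above for the value 7
```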

As an alternative to joint distributions, Lea provides a more effective approach using conditional probability tables (CPTs). The idea is to reduce the degrees of freedom of joint distributions by modeling only the causal relationships. In the following, we shall see that this approach is general and powerful when dealing with problems where conditional probabilities are given. We shall also see that the CPT technique is the cornerstone of Bayes inference.

Introducing CPT with a beetle

We shall start with the "beetle" example (Wikipedia) :

"An entomologist spots what might be a rare subspecies of beetle, due to the pattern on its back. In the rare subspecies, 98% have the pattern, or P(Pattern|Rare) = 98%. In the common subspecies, 5% have the pattern. The rare subspecies accounts for only 0.1% of the population. How likely is the beetle having the pattern to be rare, or what is P(Rare|Pattern)?"

To solve this, we need to define two boolean probability distributions, namely rare and pattern. The first one is unconditional and easy to define:

rare = Lea.boolProb(1,1000)

For pattern, we cannot (without manual calculations) set it so easily. We need to set two conditional probabilities:

  • P(Pattern | Rare) = 98%
  • P(Pattern | ~Rare) = 5%

Lea provides a special construct to set these probabilities (requires Lea 2.3+):

patternIfRare    = Lea.boolProb(98,100)
patternIfNotRare = Lea.boolProb( 5,100)
pattern = rare.switch({ True  : patternIfRare,
                        False : patternIfNotRare })

The switch method defines a special Lea instance, which represents a CPT. The idea is that the probability of pattern depends on the value of rare, according to the given dictionary: the keys of this dictionary are the possible values of rare, namely True and False. Let us check that our definition is in line with what we need:

  • What is the probability of having pattern for a rare beetle?
>>> pattern.given(rare)
False :  1/50
 True : 49/50
>>> P(pattern.given(rare))
49/50
>>> Pf(pattern.given(rare))
0.98
  • What is the probability of having pattern for a common beetle?
>>> pattern.given(~rare)
False : 19/20
 True :  1/20
>>> Pf(pattern.given(~rare))
0.05

OK, pattern gives back the data we put in it; this does not bring any new information, it is just a sanity test. Now, let us come back to the initial question:

  • What is the probability to be rare for a beetle having the pattern?
>>> Pf(rare.given(pattern))
0.019242096995876694

This result is really what Bayes inference is about. It can be checked by manual calculations using the Bayes theorem.
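Such a manual check with the Bayes theorem can be sketched in plain Python, using only the standard fractions module (no Lea needed):

```python
from fractions import Fraction

p_rare = Fraction(1, 1000)               # P(Rare)
p_pattern_if_rare = Fraction(98, 100)    # P(Pattern | Rare)
p_pattern_if_common = Fraction(5, 100)   # P(Pattern | ~Rare)

# Total probability of the pattern:
p_pattern = p_pattern_if_rare * p_rare + p_pattern_if_common * (1 - p_rare)

# Bayes theorem: P(Rare | Pattern) = P(Pattern | Rare) * P(Rare) / P(Pattern)
p_rare_if_pattern = p_pattern_if_rare * p_rare / p_pattern
print(p_rare_if_pattern)         # 98/5093
print(float(p_rare_if_pattern))  # 0.019242096995876694, as above
```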

Once the CPT is defined, other calculations can be done easily:

  • What is the probability to be rare for a beetle NOT having the pattern?
>>> Pf(rare.given(~pattern))
2.107326119253585e-05
  • What is the probability for any beetle to have the pattern?
>>> Pf(pattern)
0.05093
  • What is the probability for a beetle to be rare AND to have the pattern?
>>> Pf(rare & pattern)
0.00098

It is even possible to build a joint distribution giving all possible conjunctions (AND), by using the cartesian product:

>>> Lea.cprod(rare,pattern)
(False, False) : 94905/100000
 (False, True) :  4995/100000
 (True, False) :     2/100000
  (True, True) :    98/100000

This first example shows you the general process: you set up the input probabilities, conditional or not, in a declarative manner. Then, you type queries in a natural way and Lea applies the conditional probability rules behind the scenes.

CPT with non-boolean probability distributions

The previous example uses boolean probability distributions, which is common with conditional probabilities. However, depending on the problem at hand, other types of distributions can be handled. To illustrate this point, we shall re-model the previous problem with two variables, kind and aspect, which refer to string probability distributions:

kind = Lea.fromValFreqs(('rare',1),('common',999))
aspectIfRare   = Lea.fromValFreqs(('pattern',49),('no pattern', 1)) 
aspectIfCommon = Lea.fromValFreqs(('pattern', 1),('no pattern',19)) 
aspect = kind.switch({ 'rare'   : aspectIfRare,
                       'common' : aspectIfCommon })

Note: an alternative construction, more general but less efficient, shall be presented later (see section on buildCPT).

Now, aspect is a new CPT that gives the probability distribution of 'pattern' vs 'no pattern', according to the value of kind. The question "what is the probability to be rare for a beetle having the pattern?" can be restated in one of the following manners:

>>> kind.given(aspect == 'pattern')
common : 4995/5093
  rare :   98/5093
>>> (kind == 'rare').given(aspect == 'pattern')
False : 4995/5093
 True :   98/5093
>>> Pf((kind == 'rare').given(aspect == 'pattern'))
0.019242096995876694

In the present example, using boolean or string attributes to model the problem is a matter of taste. However, in many situations, models go beyond binary choices and cannot be represented by boolean distributions. For example, imagine that the entomologist wants to split the kind 'common' into 'common_A' and 'common_B' as follows:

kind2 = Lea.fromValFreqs(('rare',1),('common_A',342),('common_B',657))

Also, he wants to split the aspect 'pattern' into 'stripped' and 'mottled', with given conditional probabilities:

aspect2IfRare     = Lea.fromValFreqs(('stripped',39),('mottled',10),('no pattern', 1))
aspect2IfCommon_A = Lea.fromValFreqs(('stripped', 1),('mottled', 4),('no pattern',95))
aspect2IfCommon_B = Lea.fromValFreqs(('stripped', 3),('mottled', 2),('no pattern',95))
aspect2 = kind2.switch( { 'rare'     : aspect2IfRare,
                          'common_A' : aspect2IfCommon_A,
                          'common_B' : aspect2IfCommon_B } )

Here are some examples of queries.

>>> aspect2
   mottled :  2702/100000
no pattern : 94907/100000
  stripped :  2391/100000
>>> kind2.given(aspect2 == 'stripped')
common_A : 114/797
common_B : 657/797
    rare :  26/797
>>> kind2.given(aspect2 != 'no pattern')
common_A : 1710/5093
common_B : 3285/5093
    rare :   98/5093
>>> Pf((kind2[:3] == 'com').given(aspect2 != 'no pattern'))
0.9807579030041234
>>> Pf((kind2 == 'rare').given(aspect2 != 'no pattern'))
0.019242096995876694

Note the consistency of the last result with the first beetle model:

>>> Pf((kind == 'rare').given(aspect == 'pattern'))
0.019242096995876694

This consistency is due to the fact that the entomologist has carefully refined the model so that (see above): P(common) = P(common_A) + P(common_B) and P(pattern) = P(stripped) + P(mottled).
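These two constraints can be verified numerically with plain Python fractions (independent of Lea):

```python
from fractions import Fraction

# Refined kinds with their prior probabilities (out of 1000), as given above
p_kind2 = {'rare': Fraction(1, 1000),
           'common_A': Fraction(342, 1000),
           'common_B': Fraction(657, 1000)}

# Conditional probabilities of the two pattern aspects, per kind
p_aspect2 = {'rare':     {'stripped': Fraction(39, 50),  'mottled': Fraction(10, 50)},
             'common_A': {'stripped': Fraction(1, 100),  'mottled': Fraction(4, 100)},
             'common_B': {'stripped': Fraction(3, 100),  'mottled': Fraction(2, 100)}}

# P(common) = P(common_A) + P(common_B)
assert p_kind2['common_A'] + p_kind2['common_B'] == Fraction(999, 1000)

# P(pattern) = P(stripped) + P(mottled), by total probability over the kinds
p_pattern = sum(p_kind2[k] * (p_aspect2[k]['stripped'] + p_aspect2[k]['mottled'])
                for k in p_kind2)
print(p_pattern)  # 5093/100000, matching P(pattern) in the first model
```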

Bayesian networks

The technique used to build CPTs can be used to define Bayesian networks (BN), which model causality chains between uncertain events. There is no new syntax here, but you shall see how multiple CPTs can be connected together to define complex networks.

We shall use the well-known "burglary Bayesian network" of J. Pearl, explained in the AIMA book. You can also find good descriptions of this network on the Internet with the following keywords: burglar, Bayes (note that the classical, yet simpler, "rain-sprinkler-grass" example is covered here, in the Lea examples page).

Here is a basic sketch showing the dependency graph of this BN:

      burglary    earthquake
            \      /
             \    /
             alarm
             /    \
            /      \
     johnCalls    maryCalls

and here is how to model this Bayesian network in Lea:

burglary   = Lea.boolProb(1,1000)
earthquake = Lea.boolProb(2,1000)
alarm = Lea.cprod(burglary,earthquake).switch({ (True ,True ) : Lea.boolProb(950,1000),
                                                (True ,False) : Lea.boolProb(940,1000),
                                                (False,True ) : Lea.boolProb(290,1000),
                                                (False,False) : Lea.boolProb(  1,1000) })
johnCalls = alarm.switch( { True  : Lea.boolProb(90,100),
                            False : Lea.boolProb( 5,100) })
maryCalls = alarm.switch( { True  : Lea.boolProb(70,100),
                            False : Lea.boolProb( 1,100) })

We have done no more than build three CPTs, using the syntax explained in the previous sections. The sole new technique used here is for the alarm CPT, which depends on two variables, viz. burglary and earthquake: the Lea.cprod method is used to join these two variables so as to produce the four combinations as tuples of booleans.

Now, we are ready to query the network. Let us first make "forward" derivations (i.e. from causes to effects).

  • What is the probability that Mary calls if the alarm is triggered? (this is a given!)
>>> Pf(maryCalls.given(alarm))
0.7
  • What is the probability that Mary calls if there is a burglary?
>>> Pf(maryCalls.given(burglary))
0.6586138

Now, we can also query the Bayes network in a "backward" manner (i.e. from effects to causes):

  • What is the probability that there is a burglary if alarm is triggered?
>>> Pf(burglary.given(alarm))
0.373551228281836
  • What is the probability that the alarm is triggered if Mary calls?
>>> Pf(alarm.given(maryCalls))
0.1500901177497596
  • What is the probability that there is a burglary if Mary calls?
>>> Pf(burglary.given(maryCalls))
0.05611745403891493
  • What is the probability that there is a burglary if Mary OR John calls (or both)?
>>> Pf(burglary.given(maryCalls | johnCalls))
0.014814211524985795
  • What is the probability that there is a burglary if Mary AND John call?
>>> Pf(burglary.given(maryCalls & johnCalls))
0.28417183536439294
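These backward queries can be cross-checked by brute-force enumeration in plain Python, without Lea, using the chain rule of the network: P(b,e,a,j,m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a). Here is such a check for the last query:

```python
from fractions import Fraction
from itertools import product

F = Fraction
p_b = F(1, 1000)                        # P(burglary)
p_e = F(2, 1000)                        # P(earthquake)
p_a = {(True, True):  F(950, 1000),     # P(alarm | burglary, earthquake)
       (True, False): F(940, 1000),
       (False, True): F(290, 1000),
       (False, False): F(1, 1000)}
p_j = {True: F(90, 100), False: F(5, 100)}   # P(johnCalls | alarm)
p_m = {True: F(70, 100), False: F(1, 100)}   # P(maryCalls | alarm)

def joint(b, e, a, j, m):
    """Probability of one full assignment, by the chain rule of the BN."""
    p = (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e)
    p *= p_a[(b, e)] if a else 1 - p_a[(b, e)]
    p *= p_j[a] if j else 1 - p_j[a]
    p *= p_m[a] if m else 1 - p_m[a]
    return p

# P(burglary | johnCalls & maryCalls), summing out the hidden variables
num = sum(joint(True, e, a, True, True) for e, a in product((True, False), repeat=2))
den = sum(joint(b, e, a, True, True) for b, e, a in product((True, False), repeat=3))
print(float(num / den))  # 0.28417183536439294
```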

It is also possible to get unconditional probabilities of events or conjunctions of events:

  • What is the probability that the alarm is triggered?
>>> Pf(alarm)
0.002516442
  • What is the probability that Mary calls?
>>> Pf(maryCalls)
0.01173634498
  • What is the probability that there is a burglary and Mary calls?
>>> Pf(burglary & maryCalls)
0.0006586138
  • What is the probability that there is a burglary, no earthquake, the alarm triggers and both John and Mary call?
>>> Pf(burglary & ~earthquake & alarm & johnCalls & maryCalls)
0.0005910156
  • What is the probability that there is neither burglary nor earthquake, the alarm triggers and both John and Mary call?
>>> Pf(~burglary & ~earthquake & alarm & johnCalls & maryCalls)
0.00062811126

Note that the last result can be checked in the AIMA book.

As an academic exercise, you can very easily "flatten" the network to build a joint table giving the probabilities of each conjunction of events. This boils down to calculating the cartesian product of the 5 variables.

>>> Lea.cprod(burglary,earthquake,alarm,johnCalls,maryCalls)
(False, False, False, False, False) : 936742700619/1000000000000
 (False, False, False, False, True) :   9462047481/1000000000000
 (False, False, False, True, False) :  49302247401/1000000000000
  (False, False, False, True, True) :    498002499/1000000000000
 (False, False, True, False, False) :     29910060/1000000000000
  (False, False, True, False, True) :     69790140/1000000000000
  (False, False, True, True, False) :    269190540/1000000000000
   (False, False, True, True, True) :    628111260/1000000000000
 (False, True, False, False, False) :   1334174490/1000000000000
  (False, True, False, False, True) :     13476510/1000000000000
  (False, True, False, True, False) :     70219710/1000000000000
   (False, True, False, True, True) :       709290/1000000000000
  (False, True, True, False, False) :     17382600/1000000000000
   (False, True, True, False, True) :     40559400/1000000000000
   (False, True, True, True, False) :    156443400/1000000000000
    (False, True, True, True, True) :    365034600/1000000000000
 (True, False, False, False, False) :     56317140/1000000000000
  (True, False, False, False, True) :       568860/1000000000000
  (True, False, False, True, False) :      2964060/1000000000000
   (True, False, False, True, True) :        29940/1000000000000
  (True, False, True, False, False) :     28143600/1000000000000
   (True, False, True, False, True) :     65668400/1000000000000
   (True, False, True, True, False) :    253292400/1000000000000
    (True, False, True, True, True) :    591015600/1000000000000
  (True, True, False, False, False) :        94050/1000000000000
   (True, True, False, False, True) :          950/1000000000000
   (True, True, False, True, False) :         4950/1000000000000
    (True, True, False, True, True) :           50/1000000000000
   (True, True, True, False, False) :        57000/1000000000000
    (True, True, True, False, True) :       133000/1000000000000
    (True, True, True, True, False) :       513000/1000000000000
     (True, True, True, True, True) :      1197000/1000000000000

We see here the interest of using Bayesian networks: the causal dependencies are defined with only 10 probability values, instead of the 32 needed for the joint table.

Advanced features for Bayesian networks

One of the drawbacks of Bayesian networks using the simple CPT technique presented above is that probability data must be specified for each combination of the influencing variables. Since the number of cases grows exponentially with the number of influencing variables, this can quickly become impractical.

The present section presents several ways to reduce the number of cases to specify in CPT, by means of factorization.

Note that the special topic of calculating CPT from data shall be covered later, in a specific section dedicated to Machine Learning.

Specifying a default entry

One of the simplest ways to reduce the enumeration of cases in CPT is to have a default entry that groups all the cases that have not been explicitly given.

Let us revisit the burglary network. Consider a new model of alarm device for which the two cases "burglary without earthquake" and "earthquake without burglary" cannot be distinguished, both being associated with a probability of 0.9. Instead of writing this:

alarm2 = Lea.cprod(burglary,earthquake).switch({ (True ,True ) : Lea.boolProb(950,1000),
                                                 (True ,False) : Lea.boolProb(900,1000),
                                                 (False,True ) : Lea.boolProb(900,1000),
                                                 (False,False) : Lea.boolProb(  1,1000) })

you could put the common 0.9 probability as a second argument of the switch method, just after the dictionary:

alarm2 = Lea.cprod(burglary,earthquake).switch({ (True ,True ) : Lea.boolProb(950,1000),
                                                 (False,False) : Lea.boolProb(  1,1000)},
                                                                 Lea.boolProb(900,1000))

The if_ method

For variables depending on one boolean variable, the switch method can be abbreviated using the if_ convenience method: it mimics the if-then-else constructs found in programming languages (the underscore avoids a clash with Python's if keyword). For instance, the definition

johnCalls = alarm.switch( { True  : Lea.boolProb(90,100),
                            False : Lea.boolProb( 5,100) })

can be rewritten as follows:

johnCalls = Lea.if_( alarm, Lea.boolProb(90,100),
                            Lea.boolProb( 5,100) )

This is no more than syntactic sugar.

The buildCPT method

All the examples seen above use the switch method to build up CPTs. This allows specifying a set of clauses, each of which is triggered when the decision variable equals a specific combination of values. Although the switch method works fine and is efficient, it is limited in the way conditions are expressed, since it requires an enumeration of values. This could be a burden when the decision variable has a large domain, for example if it joins multiple variables or if the domain covers a big range of values.

Consider for example a toy world where the temperatures are distributed uniformly from -10°C to 29°C, as integer numbers:

temp = V(*range(-10,30))

To measure the temperature in this world, in place of a precise thermometer, we've got a device that indicates COLD, MEDIUM or HOT depending on the temperature. Unfortunately, this device is very imprecise and the best we can do is to model its indicator as three probability distributions, which depend on the range of temperatures:

indIfTempBelow0        = VP( ('COLD',75), ('MEDIUM',25)            )
indIfTempBetween0And19 = VP( ('COLD',25), ('MEDIUM',60), ('HOT',15))
indIfTempAbove20       = VP(              ('MEDIUM',20), ('HOT',80))

So, we see that the indicator roughly follows the temperature value, with some randomness. The indicator shall then be specified as a CPT linking each temperature to one of the three distributions specified above. If we use the switch technique seen above, there are 40 entries to specify:

ind = temp.switch({ -10: indIfTempBelow0,
                     …
                     -1: indIfTempBelow0,
                      0: indIfTempBetween0And19,
                     …
                     19: indIfTempBetween0And19,
                     20: indIfTempAbove20,
                     …
                     29: indIfTempAbove20 })

There is much redundancy here. Things could be improved a bit by grouping one of the three ranges into a default entry, but this remains unsatisfactory from the modeler's point of view.

The Lea.buildCPT method addresses this problem. It allows you to specify boolean conditions and to associate a specific Lea instance with each of them, as you would do in Python with a sequence of if statements. Here is the syntax to build the indicator variable:

ind = Lea.buildCPT((temp < 0                 , indIfTempBelow0       ),
                   ((0 <= temp) & (temp < 20), indIfTempBetween0And19),
                   (20 <= temp               , indIfTempAbove20      ))

Note that the method expects the given conditions to cover all possible cases and to be mutually exclusive (so it is not exactly comparable to an if-elif-else construct). If this is not respected, an exception is raised.

Here are some queries, which check the forward dependency from the temperature to the indicator:

>>> ind.given(temp==-5)
  COLD : 3/4
MEDIUM : 1/4
>>> ind.given(temp<0)
  COLD : 3/4
MEDIUM : 1/4
>>> ind.given(temp<=0)
  COLD : 155/220
   HOT :   3/220
MEDIUM :  62/220
>>> ind
  COLD : 25/80
   HOT : 22/80
MEDIUM : 33/80
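The unconditional distribution of ind displayed above can be recomputed by hand with the total probability rule; here is a plain-Python sketch (standard library only, no Lea):

```python
from fractions import Fraction

# Temperature prior: 40 equiprobable integer values from -10 to 29
temps = range(-10, 30)

def ind_dist(t):
    """The three conditional indicator distributions, as defined above."""
    if t < 0:
        return {'COLD': Fraction(75, 100), 'MEDIUM': Fraction(25, 100)}
    elif t < 20:
        return {'COLD': Fraction(25, 100), 'MEDIUM': Fraction(60, 100),
                'HOT': Fraction(15, 100)}
    else:
        return {'MEDIUM': Fraction(20, 100), 'HOT': Fraction(80, 100)}

# Unconditional distribution of the indicator, by total probability
p_ind = {'COLD': Fraction(0), 'MEDIUM': Fraction(0), 'HOT': Fraction(0)}
for t in temps:
    for v, p in ind_dist(t).items():
        p_ind[v] += Fraction(1, 40) * p

# Same values as displayed above: COLD 25/80, MEDIUM 33/80, HOT 22/80
assert p_ind['COLD'] == Fraction(25, 80)
assert p_ind['MEDIUM'] == Fraction(33, 80)
assert p_ind['HOT'] == Fraction(22, 80)
```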

And here are queries that infer the temperature from the indicator:

>>> P((temp<0).given(ind=='COLD'))
3/5
>>> P((temp<0).given(ind=='MEDIUM'))
5/33
>>> P((temp<0).given(ind=='HOT'))
0
>>> P((temp<=0).given(ind=='HOT'))
3/220
>>> P((temp>0).given(ind=='HOT'))
217/220
>>> P((temp<=0).given(ind!='COLD'))
13/110
>>> P((temp<=0).given(ind.isAnyOf('MEDIUM','HOT')))
13/110

Note that, similarly to the switch method, the Lea.buildCPT method accepts a default clause: it is represented by a condition set to None. This special value is a placeholder for any case not covered by the other conditions. To illustrate this, here is a definition of the indicator that is exactly equivalent to the previous one.

ind = Lea.buildCPT((temp < 0  , indIfTempBelow0       ),
                   (None      , indIfTempBetween0And19),
                   (20 <= temp, indIfTempAbove20      ))

As you can see, None can be specified at any position among Lea.buildCPT's arguments, not necessarily the last one. Of course, at most one None is allowed (an exception is raised otherwise).

Context-specific independence

TO BE WRITTEN

Prior probabilities

The syntax to build CPTs seen so far does not cover an important class of conditional probability problems: those mixing a conditional probability with a prior, unconditional, probability.

We shall take here an example from the AIMA book, referred to in the introduction. It models a causal probability dependency between meningitis and stiff neck. The assumptions are the following:

  • the probability that a patient has meningitis is 1/50,000 (prior probability)
  • the probability that a patient has stiff neck is 1/20 (prior probability)
  • meningitis causes stiff neck 50 % of the time (conditional probability)

OK, the probability of meningitis is easily tackled:

meningitis = Lea.boolProb(1,50000)

For stiff neck, however, should we try the if_ syntax, we are stuck on the “else” argument:

stiffneck = Lea.if_( meningitis, Lea.boolProb(1,2),
                                 ?? WHAT HERE ??  )

The problem is that the probability of stiff neck if NO meningitis is unknown, so we cannot fill in the second part of the CPT definition. By careful calculations, we could try to find the value such that the unconditional probability of stiff neck equals 1/20. Fortunately, this is not needed: Lea provides a special method, revisedWithCPT, to define a CPT taking a prior probability into account. Here is how to define stiffneck from the given data:

stiffneck = Lea.boolProb(1,20).revisedWithCPT((meningitis,Lea.boolProb(1,2)))

This says: "Without any information, the probability of stiff neck for a given patient is 1/20; however, if the patient has meningitis, then this probability becomes 1/2". Let us check these statements:

>>> P(stiffneck)
1/20
>>> P(stiffneck.given(meningitis))
1/2

Note that the stiffneck object is a true CPT. Behind the scenes, Lea has calculated the probability of stiff neck if no meningitis; it was thus able to fill in the missing data. You can get this value:

>>> P(stiffneck.given(~meningitis))
4999/99998

In the if_ statement above, you could replace the "?? WHAT HERE ??" by Lea.boolProb(4999,99998) and get exactly the same CPT. Of course, this is not needed, since Lea has done the work for you!
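The calculation Lea performs behind the scenes can be reproduced by solving the total probability formula for the missing term; here is a plain-Python sketch (no Lea):

```python
from fractions import Fraction

p_m = Fraction(1, 50000)      # prior P(meningitis)
p_s = Fraction(1, 20)         # prior P(stiffneck)
p_s_if_m = Fraction(1, 2)     # P(stiffneck | meningitis)

# Total probability: P(s) = P(s|m) * P(m) + P(s|~m) * P(~m),
# solved for the unknown P(s|~m):
p_s_if_not_m = (p_s - p_s_if_m * p_m) / (1 - p_m)
print(p_s_if_not_m)  # 4999/99998, as computed by Lea
```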

Now, we are able to compare the probability of meningitis before and after the stiff neck information:

>>> P(meningitis)
1/50000
>>> P(meningitis.given(stiffneck))
1/5000

This result is in line with the Bayes rule, since

P(meningitis | stiffneck) = P(stiffneck | meningitis) x P(meningitis) / P(stiffneck)

= (1/2) . (1/50000) / (1/20) = 1/5000 = 0.0002

We can check that also using Lea:

>>> P(stiffneck.given(meningitis)) * P(meningitis) / P(stiffneck)
1/5000

Note that some problems have no solution. In such cases, Lea raises an exception with an error message giving the possible range for the prior probability. Example:

stiffneck2 = Lea.boolProb(1,200000).revisedWithCPT((meningitis,Lea.boolProb(1,2)))
Lea Error: prior probability of 'False' is 199999/200000, outside the range [ 1/100000 , 99999/100000 ]

Prior probabilities also work with non-boolean distributions. Let us revisit the beetle example modeled with string attributes. Imagine that the problem is restated as follows: the prior probability of a beetle with pattern is 5.093 %; in the rare subspecies, 98 % have the pattern; the rare subspecies accounts for only 0.1 % of the population. Note that, compared with the initial problem, we have removed one piece of data (the conditional probability of pattern for common beetles) and added another (the prior probability of pattern).

Here are the statements to model these data:

aspect0 = Lea.fromValFreqs(( 'pattern',5093),( 'no pattern',94907))
aspect1 = aspect0.revisedWithCPT(( kind == 'rare', Lea.fromValFreqs(('pattern',98),('no pattern',2))))

Now, what is the probability of pattern in the common subspecies?

>>> aspect1.given(kind == 'common')
no pattern : 19/20
   pattern :  1/20

This is striking: we get back the same probability of 5 % as in the initial problem! Why is it so? The answer lies in the value 5.093 % given as the prior probability of pattern. Actually, this value has been chosen on purpose, based on the unconditional probability of pattern as calculated in the initial problem (i.e. aspect, not aspect1):

>>> aspect
no pattern : 94907/100000
   pattern :  5093/100000

This shows the consistency between the two methods of defining CPTs.

Markov chains

Lea allows defining Markov chains. We shall show here how to model the stock market Markov chain example given on Wikipedia.

To execute the examples shown in this section, you have to import the markov module present in the Lea package:

from lea import markov

Definition from transition probability matrix

The states defining the Markov chain can be any Python objects. We shall use here the simplest option: strings. The three market states of the example shall be named "BULL", "BEAR" and "STAG". Here is how to define these states in Lea, along with the transition probability matrix (note that we omit the prompts, so that you can easily copy-paste the multiline statement):

market = markov.Chain.fromMatrix(
                  ( 'BULL', 'BEAR', 'STAG'  ),
        ( 'BULL', (   900 ,    75 ,    25  )), 
        ( 'BEAR', (   150 ,   800 ,    50  )),
        ( 'STAG', (   250 ,   250 ,   500  )))

Note that the rows of the matrix represent the from states and the columns represent the to states. For instance, the first row indicates the probabilities of transitions starting from "BULL": 90 % to stay "BULL", 7.5 % to become "BEAR" and 2.5 % to become "STAG".

We now have a Markov chain, named market, which captures the set of states and all the transition probabilities from state to state. These probabilities can be displayed using the same methods as shown for simple distributions:

>>> print (market.asFloat())
BULL
  -> BEAR : 0.075000
  -> BULL : 0.900000
  -> STAG : 0.025000
BEAR
  -> BEAR : 0.800000
  -> BULL : 0.150000
  -> STAG : 0.050000
STAG
  -> BEAR : 0.250000
  -> BULL : 0.250000
  -> STAG : 0.500000

Take care, however: market is a markov.Chain instance, NOT a Lea instance. Apart from displaying itself, the operations you can do on a markov.Chain are minimal: getting its inner states or probability distributions over these states. We shall see this in the following.

Querying the Markov chain

To make queries, we need to be able to define an initial state. Here is how to proceed:

>>> (bullState,bearState,stagState) = market.getStates()

Each of these three variables is a Lea instance representing a certain distribution, i.e. a distribution giving probability 1 to a single state. For instance, the bearState object represents the state BEAR:

>>> bearState
BEAR : 1

To get the "next state" distribution (i.e. the state at next week, in the example), use the nextState() method:

>>> bearState.nextState()
BEAR : 16/20
BULL :  3/20
STAG :  1/20

The object returned by nextState(…) is a Lea instance, on which you can apply the usual techniques seen in the tutorials. For instance, using the decimal display format, we can check (again) that the given transition probability matrix has been set up correctly:

>>> print (bearState.nextState().asFloat())
BEAR : 0.800000
BULL : 0.150000
STAG : 0.050000

To get the state probability distribution after n steps, instead of chaining the nextState() method n times, you can simply pass n as an argument:

>>> print ((bearState.nextState(3)).asFloat())
BEAR : 0.568250
BULL : 0.357500
STAG : 0.074250

Note that you can check these results with Wikipedia's.
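You can also verify the 3-step result independently with a small matrix-vector iteration in plain Python (exact fractions, no Lea):

```python
from fractions import Fraction

states = ('BULL', 'BEAR', 'STAG')
# Transition probabilities, as in the matrix above (rows: from, columns: to)
T = {'BULL': {'BULL': Fraction(900, 1000), 'BEAR': Fraction(75, 1000), 'STAG': Fraction(25, 1000)},
     'BEAR': {'BULL': Fraction(150, 1000), 'BEAR': Fraction(800, 1000), 'STAG': Fraction(50, 1000)},
     'STAG': {'BULL': Fraction(250, 1000), 'BEAR': Fraction(250, 1000), 'STAG': Fraction(500, 1000)}}

def step(dist):
    """One transition: new_dist[t] = sum over s of dist[s] * T[s][t]."""
    return {t: sum(dist[s] * T[s][t] for s in states) for t in states}

dist = {'BULL': Fraction(0), 'BEAR': Fraction(1), 'STAG': Fraction(0)}  # start in BEAR
for _ in range(3):
    dist = step(dist)
# dist is now: BULL 0.3575, BEAR 0.56825, STAG 0.07425, as displayed above
```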

In the example above, we started from a certain state, namely "BEAR". To define an initial distribution mixing multiple states, you need to pass a state probability distribution to the getState method:

>>> fpState = market.getState(Lea.fromValFreqs(('BULL',10),('BEAR',5),('STAG',1)))
>>> fpState
BEAR :  5/16
BULL : 10/16
STAG :  1/16

In the present case, this distribution is a bit special: it is actually the "fixed point"; you can verify that it does not change at the next step:

>>> fpState.nextState()
BEAR :  5/16
BULL : 10/16
STAG :  1/16
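That this distribution is indeed a fixed point (a stationary distribution, satisfying pi . T = pi) can be checked with one matrix-vector product in plain Python:

```python
from fractions import Fraction

states = ('BULL', 'BEAR', 'STAG')
# Transition probabilities, as in the matrix defined above
T = {'BULL': {'BULL': Fraction(900, 1000), 'BEAR': Fraction(75, 1000), 'STAG': Fraction(25, 1000)},
     'BEAR': {'BULL': Fraction(150, 1000), 'BEAR': Fraction(800, 1000), 'STAG': Fraction(50, 1000)},
     'STAG': {'BULL': Fraction(250, 1000), 'BEAR': Fraction(250, 1000), 'STAG': Fraction(500, 1000)}}

# The candidate fixed-point distribution displayed above
pi = {'BULL': Fraction(10, 16), 'BEAR': Fraction(5, 16), 'STAG': Fraction(1, 16)}

# One step of the chain leaves pi unchanged: pi . T = pi
next_pi = {t: sum(pi[s] * T[s][t] for s in states) for t in states}
assert next_pi == pi
```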

Definition from a sequence of states

There exists another way to define a Markov chain: the empirical way. Instead of setting a probability matrix, which can be hard to define, it is possible to define a Markov chain by providing a sequence of states that is supposed to be long enough to be representative of all transitions. Typically, this could be a sample sequence of observed states for a given system.

This sort of definition is done through the markov.Chain.fromSeq method. Here is an example, using a very short sequence:

>>> market2 = markov.Chain.fromSeq(('BEAR','BEAR','BULL','BULL','BULL','BEAR','STAG','BULL','STAG','BEAR'))
>>> market2
BEAR
  -> BEAR : 1/3
  -> BULL : 1/3
  -> STAG : 1/3
BULL
  -> BEAR : 1/4
  -> BULL : 2/4
  -> STAG : 1/4
STAG
  -> BEAR : 1/2
  -> BULL : 1/2

You can verify that the transition probabilities are exactly the transition frequencies in the given sequence. In particular, there is no transition from "STAG" to "STAG".
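The frequency counting that fromSeq performs can be sketched in plain Python with collections.Counter (an independent check, not Lea's implementation):

```python
from collections import Counter
from fractions import Fraction

seq = ('BEAR', 'BEAR', 'BULL', 'BULL', 'BULL', 'BEAR', 'STAG', 'BULL', 'STAG', 'BEAR')

# Count the observed transitions (consecutive pairs) and the departures per state
trans = Counter(zip(seq, seq[1:]))
from_counts = Counter(seq[:-1])

# Empirical transition probabilities: frequency of each observed transition
probs = {(s, t): Fraction(n, from_counts[s]) for (s, t), n in trans.items()}
print(probs[('BULL', 'BULL')])  # 1/2, i.e. the 2/4 displayed above
print(probs[('BEAR', 'STAG')])  # 1/3
```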

Generating random sequence of states

Starting from a given state of a given Markov chain, it is possible to generate a random sequence of states obeying the transition probabilities of that Markov chain. This is done by calling the randomSeq(n) method, where n is the size of the sequence. Example:

>>> bearState.randomSeq(40)
('BEAR', 'BEAR', 'BEAR', 'BULL', 'BULL', 'BULL', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'STAG', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BULL', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BEAR', 'BULL', 'BULL', 'BULL', 'BULL')

Now, to check the likelihood of the generated sequence, we can use the markov.Chain.fromSeq method seen above, which can also be seen as a transition frequency calculator. Let's use a random sequence of 10,000 states this time:

>>> stateSeqSample = bearState.randomSeq(10000)
>>> print (markov.Chain.fromSeq(stateSeqSample).asFloat())
BEAR
  -> BEAR : 0.797604
  -> BULL : 0.148640
  -> STAG : 0.053756
BULL
  -> BEAR : 0.073104
  -> BULL : 0.898136
  -> STAG : 0.028760
STAG
  -> BEAR : 0.247453
  -> BULL : 0.254731
  -> STAG : 0.497817

We see that the frequencies of transition are close to the probabilities defined for the market Markov chain, as expected.

What's next?

Thank you for reading the present advanced tutorial!

We hope that you enjoyed it and that you are now willing to experiment with these techniques to solve your own probability problems.

You can find more examples in the Examples page of the Wiki.

To learn more, you are invited to follow Lea advanced tutorial: Part 3, which presents useful methods introduced in Lea 2.2: convenience functions, conditional probabilities with multiple evidences, Monte-Carlo sampling, building joint probability tables from files, data mining and more.

Please send your comments, critiques, suggestions, bug reports, … by e-mail to pie.denis@skynet.be.
