# Wiki

Clone wiki# Lea / Lea3_Tutorial_4

**Table of Content**

# Introduction

The present page is a special tutorial dedicated on Machine Learning using Lea library. It assumes that you have installed Lea 3 and that you are familiar at least with the techniques presented in the first part of Lea tutorial.

So far, most of the examples shown demonstrate various ways to calculate probability distributions from known probability distributions. However, apart trivial cases of uniform distributions found in gambling, it could be difficult to determine prior probability distributions for a given phenomenon. If no quantitative data are available for the phenomenon at hand, you have to make assumptions with some theoretical model, assuming this is not too complex. On the other hand, if quantitative data are available, then it is possible, with dedicated algorithms, to calculate prior probability distributions that tend to fit the given data. This last approach belongs to the field of *Machine Learning* (ML).

In the present page, we shall present a couple of techniques to "learn" probabilities from observed data. There are four sections:

**Frequentist inference**: basic techniques using observed frequencies and Markov chains**Bayesian learning**: basic Bayesian reasoning to update probability distributions from observed data**Maximum-likelihood algorithm**: learning parameters of a BN where dependencies are defined, without hidden variables**EM algorithm**: learning parameters of a probabilistic model, with possibly hidden variables

These sections are mostly independent: you may jump directly to any one of them.

# Frequentist inference

Actually, we have already presented a very basic ML approach. Counting occurrences of a given phenomenon and derive probabilities as calculated frequencies is indeed a way to "learn" from the data.

For instance, frequencies of English letters can be learned from text data:

lea.set_prob_type('r') text_data = "to be or not to be: that is the question" freq1 = lea.vals(*text_data) print (freq1) # -> : 9/40 # : : 1/40 # a : 1/40 # b : 2/40 # e : 4/40 # h : 2/40 # i : 2/40 # n : 2/40 # o : 5/40 # q : 1/40 # r : 1/40 # s : 2/40 # t : 7/40 # u : 1/40

print (''.join(freq1.random(100))) # -> 'oshheeo a ttt ubt ee o ontettisboe t: hb nis t bhoo nee bs o ttahsbhtne ttoa o oth ete'

Frequencies of group of two letters can be easily calculated also, using the `pairwise`

function (see implementation in Python’s "Itertools Recipes":

```
freq2 = lea.vals(*pairwise(text_data))
```

*from*a 't':

print (freq2.given(freq2[0] == 't')) # -> ('t', ' ') : 2/7 # ('t', 'h') : 2/7 # ('t', 'i') : 1/7 # ('t', 'o') : 2/7

*to*a 't':

print (freq2.given(freq2[1] == 't')) # -> (' ', 't') : 3/6 # ('a', 't') : 1/6 # ('o', 't') : 1/6 # ('s', 't') : 1/6

`('t', 'h')`

-> `'th'`

). Here is the trick to do that:
```
freq2s = freq2.map(lambda v: ''.join(v))
```

print (''.join(freq2s.random(50))) # -> 't b ns thuest: ue t nhais tt tht tth bhaquioio t t tate: bnoe:uebeo : t: r e heorbes titht e:uet '

A better approach to study transitions between letters is to build a Markov chain from the given data, as already explained here:

mc = markov.chain_from_seq(text_data) print (mc) # -> # -> b : 2/9 # -> i : 1/9 # -> n : 1/9 # -> o : 1/9 # -> q : 1/9 # -> t : 3/9 # : # -> : 1 # a # -> t : 1 # b # -> e : 1 # e # -> : 2/4 # -> : : 1/4 # -> s : 1/4 # h # -> a : 1/2 # -> e : 1/2 # i # -> o : 1/2 # -> s : 1/2 # n # -> o : 1 # o # -> : 2/5 # -> n : 1/5 # -> r : 1/5 # -> t : 1/5 # q # -> u : 1 # r # -> : 11 # s # -> : 1/2 # -> t : 1/2 # t # -> : 2/7 # -> h : 2/7 # -> i : 1/7 # -> o : 2/7 # u # -> e : 1

`freq2`

captures probabilities of two consecutive letters (e.g. P(`'es'`

) = 1/39), while `mc`

captures conditional probabilities of transitions from one letter to another (e.g. P(`'s'`

|`'e'`

) = 1/4). The two approaches are then very different. Let's now generate a random string from this Markov chain (we arbitrarily start from an `'a'`

):
print (''.join(mc.get_state('a').random_seq(100))) # -> 'thato be be tistis nonononor is nor the: or t tiot to que be that thest to t t no tonor que no be qu'

Should you like that kind of linguistic experience, then you are kindly invited to have a look at the *Bullshit Generator* on the Examples page.

# Bayesian learning

Beside frequentist approach seen above, Bayesian reasonning can be used also for machine learning. Let's consider the "surprise candy" example given in pp.712-715 of the book Artificial Intelligence: a Modern Approach (2nd edition), from S. Russel and P. Norvig. We'll merely provide a summary of the case, focusing on translation in Lea. You're invited to read the explanations given in the book to get a thorough understanding.

There are five bags of candies. A bag is chosen randomly and then a sample of several candies are drawn randomly from this bag. The candy's flavor is either cherry or lime. The distribution of flavors in a bag depend of the bag in a known way (see below). Although the chosen bag is unknown, the flavor of drawn candies can be tasted. In such situation, the chosen bag is an unknown variable while the flavor is an observed variable. Following AIMA book, we'll interpret the chosen bag as an hypothesis that can be any of the five identifiers h1, h2, h3, h4 and h5. Here is the model in Lea, based on data given by the candy manufacturer:

hyp_0 = pmf({'h1': 0.1, 'h2': 0.2, 'h3': 0.4,'h4': 0.2, 'h5': 0.1}) flavor_by_hyp = { 'h1': 'cherry', 'h2': pmf({'cherry': 0.75, 'lime': 0.25}), 'h3': pmf({'cherry': 0.50, 'lime': 0.50}), 'h4': pmf({'cherry': 0.25, 'lime': 0.75}), 'h5': 'lime' } flavor = hyp_0.switch(flavor_by_hyp)

`hyp_0`

is the a priori probability distribution of the hypothesis, considering that there is no evidence yet. `flavor`

is defined as a CPT depending of `hyp_0`

.
Let's now draw 10 candies and let's assume that they all have lime flavor. On the basis of these observations, what is the evolution of the probability of hypothesis h5 (bag with lime candies only)? For sure, we expect that such probability, initially 0.1 shall grow as new lime candies are drawn. Let's calculate this with Lea.

A first approach consists in adding observations one by one in a list of evidences and evaluating the conditional probability of h5, given these evidences:

*WARNING: requires Lea 3.2 at least ( clone method)*

P = lea.P evidences = [] for n in range(11): print ("%2d: %f" % (n,P((hyp_0=='h5').given(*evidences)))) flavor_x = flavor.clone(shared=(hyp_0,)) evidences.append(flavor_x=='lime') # -> 0: 0.100000 # 1: 0.200000 # 2: 0.307692 # 3: 0.421053 # 4: 0.528926 # 5: 0.624390 # 6: 0.704749 # 7: 0.770214 # 8: 0.822449 # 9: 0.863566 # 10: 0.895628

`clone`

method to create 10 new events, independent one from each other. The argument `shared=(hyp_0,)`

is meant to express that all these events share the dependency to the same hypothesis; in other words: all drawn candies come from the same bag. Also, based on this sample of 10 lime candies, we can easily get the probability that next drawn candy has also lime flavor:
print (P((flavor=='lime').given(*evidences))) # -> 0.9796631050967436

`n=10`

in `clone`

method.flavors = flavor.clone(n=10,shared=(hyp_0,)) evidences = [flavor_x=='lime' for flavor_x in flavors] print (hyp_0.given(*evidences)) # -> h2 : 1.7082745402179077e-06 # h3 : 0.003498546258366275 # h4 : 0.10087190332532722 # h5 : 0.8956278421417664 print (P((flavor=='lime').given(*evidences))) # -> 0.9730314698335799

`flavor_x`

) depending on the same initial hypotheses distribution (`hyp_0`

). Instead of that, we could build these events one by one, based on hypothesis distributions updated from latest observations. Here is how to proceed:
hyp_x = hyp_0 for n in range(11): print ("%2d: %f"%(n,P((hyp_x=='h5')))) flavor_x = hyp_x.switch(flavor_by_hyp) hyp_x = hyp_x.given(flavor_x=='lime').calc() # -> 0: 0.100000 # 1: 0.200000 # 2: 0.307692 # 3: 0.421053 # 4: 0.528926 # 5: 0.624390 # 6: 0.704749 # 7: 0.770214 # 8: 0.822449 # 9: 0.863566 # 10: 0.895628

`calc()`

to calculate the new hypothesis pmf while getting rid of any dependency with previous hypotheses distribution. To get the probability of lime flavor for the next draw, we simply can use `flavor_x`

, without calling `.given(...)`

since this variable is based on up-to-date hypotheses distribution:
print (P((flavor_x=='lime'))) # -> 0.9730314698335799

# Maximum-likelihood algorithm for BN

If you reconsider the examples of data mining on a joint table, now interpreting the frequencies as probabilities, then you get another example of ML through frequentist approach. Admittedly, such simple statistical approach could be qualified as "Machine Learning for the layman"! A more difficult goal is to calculate the probabilities involved in a Bayesian network. Lea provides a method for this purpose; it can calculate the CPT entries provided that you define the dependency graph of the BN. This method, named `Lea.build_bn_from_joint`

, makes the following assumptions:

- you study a set of given
*N*discrete random variables, which are possibly interdependent and for which the pmf are unknown; - you have got a joint table with
*S*samples given the values of each of the*N*variables (i.e. a table of*S*rows and*N*columns); we assume that the these data are rich enough to discover all values of the domain of each random variable; - you have defined the dependencies of the BN, by means of a direct acyclic graph linking the random variables.

Note that the second point means that *there shall be no hidden variable*. For dealing with probabilistic model having hidden variables, a special algorithm is required (see EM Algorithm section) below.

Let's illustrate these assumptions on an example reusing the `patient`

joint table:

```
patient = lea.read_csv_file('patients.csv',col_names=('given_name','surname','gender','title','birthday','blood_type','weight{f}','height{f}','smoker{b}') )
```

- we shall focus on the five random variables
`gender`

,`blood_type`

,`smoker`

,`height`

and`weight`

(*N*= 5); - these 5 data are present in the patient joint table, which contains fifty records (
*S*= 50); - we assume a BN with the following dependencies between the five random variables:

The `build_bn_from_joint`

method allows "learning" the different CPT of this BN. It uses a maximum-likelihood algorithm, calculating each CPT from the observed data (here, the `patient`

joint table). Each argument of `Lea.build_bn_from_joint`

is a tuple giving the names of antecedent nodes and the name of the dependent node:

bn = patient.build_bn_from_joint( ( ('gender','blood_type') , 'smoker' ), ( ('gender','height','smoker'), 'weight' ))

`bn`

is a full-fledged BN on which you can make queries, such a calculating the gender probability of a person given that its weight is between 110 and 120 (not included):
print (bn.gender.given(110 <= bn.weight, bn.weight < 120)) female : 0.21234352561454817 male : 0.7876564743854518 print (P((bn.gender=='male').given(110 <= bn.weight, bn.weight < 120))) 0.7876564743854518

*overfitting*. It is blatant if we redo the query seen before on the

`patient`

joint table instead of the built-up BN:
print (P((patient.gender=='male').given(110 <= patient.weight, patient.weight < 120))) 1.0

print (patient.given(110 <= patient.weight, patient.weight < 120)) given_name, surname , gender, title, birthday , blood_type, weight, height, smoker ('Jimmie' , 'Martinez', 'male', 'Mr.', '03-12-1930', 'B+' , 114.0, 171.0, False ) : 1.0

## Improving the Bayesian network

Of course, the merits of BN are directly linked to the quality of the dependency graph designed by the human expert who models the phenomenon. In this regard, the BN graph given above could for sure be improved. To illustrate some advanced techniques, let us revisit the BN by including the patient's `age`

: it seems sensible to add `age`

as `height`

's influencer, as shown in the following graph:

Now, the problem is that `age`

is not an attribute of the `patient`

joint table. Only `birthday`

is available, which is a string, hence not adequate for a BN variable. As seen previously, assuming the year of the study is 2015, the age could be approximated with the following definition:

```
age = 2015 – patient.birthday[-4:].map(int)
```

`map`

method, to append it to the patient tuples and then to use the `as_Joint`

method to redefine the column names. This could be done through the following one-liner:
```
patient2 = (patient + age.map(lambda x:(x,))).as_joint('givenName','surname','gender','title','birthday','blood_type','weight','height','smoker','age')
```

`patient2`

joint table contains a new column `age`

, which is in line with `birthday`

. Now that we have all the data needed, we can build the new BN, as sketched above:
bn2 = patient2.build_bn_from_joint( ( ('gender','age') , 'height'), ( ('gender','blood_type') , 'smoker'), ( ('gender','height','smoker'), 'weight'))

print (P((bn2.gender=='male').given(110 <= bn2.weight, bn2.weight < 120))) 0.8397314844204974

`age`

to `height`

, even if these two variables are absent from the query.
# The EM Algorithm

*WARNING: requires Lea 3.2 at least*

So far, we have assumed that all variables of our probabilistic model are observable. In many cases however, models have *hidden* variables: these cannot be observed directly but they influence other variables, which are observable. The maximum-likelihood algorithm used in the previous section is unable to learn parameters of such models.

For machine learning on such models presenting hidden variable, Lea provides an implementation of the Expectation-Maximization algorithm (EM): the `learn_by_em`

method. This method is powerful but also quite hard to master; in particular, it provides many optional arguments. Before giving examples of usage, let's sketch the general process of using EM algorithm (without detailing how EM algorithm works internally):

- You get some observation data
*obs_x*, by means of a Lea instance representing a frequency table. If there are several variables involved, you shall use a joint probability table. - You define a probabilistic model
*x0*that is capable of producing the observed data*obs_x*, without considering yet the fitting of probabilities; this model*x0*is a Lea instance that uses techniques seen so far: CPT, BN, arithmetic, functions, etc. This model involves several random variables*v1, ..., vn*, observed or hidden, with arbitrary parameters (or best guesses). - You call
`x0.learn_by_em(obs_x, ...)`

with some extra arguments (see later). - You obtain a new model
*x1*, which has similar variables and structure as*x0*but with new parameters that better matches observed data*obs_x*.

We shall demonstrate this process on three use cases. Note that the last two ones can be found in literature, so you're able to cross-check the calculated results.

## Case 1 - Learning Bernoulli probability distributions

Let's start with a basic example. Ron is an experimenter in some Las-Vegas laboratory. He patiently has repeated 10,000 times the following experience: throwing two coins, keeping track of the number of heads obtained (0, 1 or 2). Eventually, he gets the following counts:

0: 5,558 1: 3,850 2: 592

The probabilistic model is:

```
X = A + B
```

where *X* is the random variable giving the observed count while *A* and *B* are random variables giving 0 (Tail) or 1 (Head) for each coin. *A* and *B* are hidden variables, with unknown a priori parameters.

So, EM algorithm is the right tool here! Let's first define the observed frequency distribution:

obs_x = lea.pmf({0: 5558, 1: 3850, 2: 592}) print(obs_x) # -> 0 : 0.5558 # 1 : 0.385 # 2 : 0.0592

a0 = lea.bernoulli(0.5) b0 = lea.bernoulli(0.4) x0 = a0 + b0

`0`

" suffix to recall that this model is just a first guess, setting initial values on parameters (here probabilities 0.5 and 0.4). Now, we are almost ready to execute EM algorithm by calling `learn_by_em`

. Because this algorithm is iterative, hopefully converging to the solution, you need to define a halting condition. Several conditions are expressible by means of optional arguments; we shall use here the most simple one, by fixing the number of algorithm steps to 100. Here is the call:
```
model_dict = x0.learn_by_em(obs_x,nb_steps=100)
```

`model_dict`

is a dictionary giving access to the variables of the new probabilistic model. Since the probabilistic model is made up of several variables, the dictionary is meant to provide a look-up table to retrieve each of the variables of the new model (remember that, by design, each Lea instance is immutable, so the internal parameters of `a0`

, `b0`

, `x0`

cannot be altered). Here is how you proceed to get variables of the optimized model:
a1 = model_dict[a0] b1 = model_dict[b0] x1 = model_dict[x0]

`x1`

could be defined also as `a1 + b1`

, which is equivalent. Here are the probability distributions of these variables:
print (a1) # -> 0 : 0.6845898600927648 1 : 0.31541013990723527 print (b1) # -> 0 : 0.8120101399072353 1 : 0.1879898600927648 print (x1) # -> 0 : 0.5558939080730004 1 : 0.38481218385399896 2 : 0.059293908073000515

`x1`

is close to `obs_x`

, as expected. Here is how you may interpret these results: given the observations, the most likely hypothesis is that one coin has 31.5% chances to show Head and the other coin has 18.8% such chances. Since you have now access to all variables of the new model, whether observed or hidden, you may make query to have explanations on new observed results, to make forecasts,etc. For instance:
print (lea.joint(a1,b1).given(x1==1)) # -> (0, 1) : 0.33443835049825915 (1, 0) : 0.6655616495017408

## Case 2 - Learning binomial probability distributions

We shall now see a more elaborate example, which will introduce other arguments of `learn_by_em`

method. This example is detailed in a course of Stefanos Zafeiriou on Advanced Statistical Machine Learning

We have two coins, A and B, possibly biased. We want to determine the bias of each coin. We choose randomly 5 times one of the coin and, for each choice, we toss the chosen coin 10 times and note the number of heads obtained. Each value can then go form 0 to 10 heads. The observed results are: 5, 9, 8, 4, 7. We assume that the two coins cannot be distinguished, so the choice made for each experiment is a hidden random variable. Let's first build the frequency table of observed number of heads:

```
obs_nb_heads = lea.vals(5,9,8,4,7)
```

*n*= 10 and an arbitrary bias probability :

ha0 = 0.6 hb0 = 0.5 nb_heads_A0 = lea.binom(10, ha0) nb_heads_B0 = lea.binom(10, hb0)

coin_type0 = lea.vals("A","B") nb_heads0 = coin_type0.switch( { "A" : nb_heads_A0, "B" : nb_heads_B0})

```
model_dict = nb_heads0.learn_by_em(obs_nb_heads, fixed_vars=(coin_type0,), nb_steps=1)
```

`fixed_vars`

argument. By default, `learn_by_em`

method shall optimize all variables of the given model. Here, because we control the process of selecting the coin (50-50 probabilities), we want that EM algorithm keeps this variable unchanged in the new model. The `fixed_vars=(coin_type0,)`

expresses this constraint. The argument is any iterable so, in more complex model, several variables could be declared as fixed.
Now, we can display the new bias probability for each of the coins:

nb_heads_A1 = model_dict[nb_heads_A0] nb_heads_B1 = model_dict[nb_heads_B0] print (nb_heads_A1.p) # -> 0.7130122354005161 print (nb_heads_B1.p) # -> 0.5813393083136627

## Case 3 - Learning Bayesian network

The last example shows how to learn a simple Bayesian network. It's taken from the AIMA book - pp 727-730. We'll merely provide a summary of the case, focusing on translation in Lea. You're invited to read the explanations given in the book to get a thorough understanding.

Two bags containing candies have been mixed together. Candies are described by three features: wrapper (red/green), hole (true/false) and flavor (cherry/lime). A sample of 1,000 candies is drawn and the occurrences of the eight combinations of features are counted. Here are the raw data obtained, expressed as a Lea JPD:

obs_candy = lea.pmf({ ('red' ,True,'cherry'): 273, ('red' ,False,'cherry'): 93, ('green',True,'cherry'): 104, ('green',False,'cherry'): 90, ('red' ,True,'lime' ): 79, ('red' ,False,'lime' ): 100, ('green',True,'lime' ): 94, ('green',False,'lime' ): 167})

- the three candy features are independent given the origin bag,
- the conditional probability distribution for each feature depends of the bag.

Such kind of assumptions define actually a so-called *naive* Bayes model. Here is the initial model, using arbitrary parameters values presented in the AIMA book (refer to Bayesian Reasoning section of the present tutorial).

Note: `w`

stands for "wrapper", `h`

for "hole", `f`

for flavor, `wb1`

for "wrapper if candy comes from bag 1", etc.

bag0 = lea.pmf({1: 0.6, 2: 0.4}) wb1_0 = lea.pmf({'red': 0.6, 'green': 0.4}) wb2_0 = lea.pmf({'red': 0.4, 'green': 0.6}) w0 = bag0.switch({1: wb1_0, 2: wb2_0}) hb1_0 = lea.event(0.6) hb2_0 = lea.event(0.4) h0 = bag0.switch({1: hb1_0, 2: hb2_0}) fb1_0 = lea.pmf({'cherry': 0.6, 'lime': 0.4}) fb2_0 = lea.pmf({'cherry': 0.4, 'lime': 0.6}) f0 = bag0.switch({1: fb1_0, 2: fb2_0}) candy0 = lea.joint(w0,h0,f0)

`candy0`

is a probability distribution comparable to `obs_candy`

, associating a probability for each of the eight types of candy (you may display the two distributions to convince yourself). To evaluate the degree of likelihood of the model with the given data, there are several possible indicators, which are closely interlinked: the cross-entropy, the log-likelihood and Kullback-Leibler divergence (see here for more details).
print (obs_candy.cross_entropy(candy0)) # -> 2.949244290266771 print (-1000 * obs_candy.cross_entropy(candy0) * math.log(2)) # -> -2044.260364580929 print (obs_candy.kl_divergence(candy0)) # -> 0.09363125084952317

print (obs_candy.kl_divergence(obs_candy)) # -> 0.0 print (obs_candy.cross_entropy(obs_candy)) # -> 2.8556130394172476

We shall now execute one step of the EM algorithm:

```
model_dict = candy0.learn_by_em(obs_candy,nb_steps=1)
```

(candy1,bag1,w1,h1,f1,wb1_1,wb2_1,hb1_1,hb2_1,fb1_1,fb2_1) = \ tuple(model_dict[k] for k in (candy0,bag0,w0,h0,f0,wb1_0,wb2_0,hb1_0,hb2_0,fb1_0,fb2_0))

P = lea.P print (P(bag1==1)) # -> 0.6124306106264868 print (P(fb1_1=='cherry')) # -> 0.6684082742546373 print (P(wb1_1=='red')) # -> 0.6483118060276456 print (P(hb1_1)) # -> 0.6558479816127674 print (P(fb2_1=='cherry')) # -> 0.3886950739168244 print (P(wb2_1=='red')) # -> 0.38174842702951567 print (P(hb2_1)) # -> 0.38274080515627396

`candy0`

model to `candy1`

model:
print (-1000 * obs_candy.cross_entropy(candy1) * math.log(2)) # -> -2021.0262390280009 print (obs_candy.kl_divergence(candy1)) # -> 0.060111493134922256

model_dict = candy0.learn_by_em(obs_candy,nb_steps=10) candy10 = model_dict[candy0] print (-1000 * obs_candy.cross_entropy(candy10) * math.log(2)) # -> -1982.0177851139413 print (obs_candy.kl_divergence(candy10)) # -> 0.0038341901203629014

`max_kld`

argument allows specifying a goal by means of a maximum KL divergence acceptable for the delivered model. For instance: model_dict = candy0.learn_by_em(obs_candy,max_kld=0.0001) candyx = model_dict[candy0] print (obs_candy.kl_divergence(candyx)) # -> 9.809727643439459e-05

`nb_steps`

argument as a safeguard. For instance,
```
model_dict = candy0.learn_by_em(obs_candy,max_kld=0.0001,nb_steps=10000)
```

`max_delta_kld`

, which specifies the maximum difference of KL divergence between two consecutive iterations. This allows you to specify a "stall" condition that detects the moment at which subsequent iterations wouldn't bring much added value.
In any case, whatever the arguments specified, the execution shall stop as soon as any of specified halting conditions is verified.

## Controlling / monitoring the EM algorithm

`learn_by_em`

is meant to be a high-level method: it performs the job until specified halting condition is met. You cannot however monitor what's happening inside, like the evolution of log-likelihood or the evolution of some parameters of the model. You cannot control the algorithm execution (pause, step, resume, etc.). For such needs, Lea provides an alternative method, `gen_em_steps`

, which is a generator. This method actually generates an infinite sequence of steps, that you can control.

Here is an example of usage, revisiting the candy use case to show the evolution of KL divergence on the first 10 steps.

candy_gen_steps = candy0.gen_em_steps(obs_candy) for _ in range(10): model_dict = next(candy_gen_steps) candyx = model_dict[candy0] print (obs_candy.kl_divergence(candyx)) # -> 0.060111493134922256 0.034141267160715305 0.016746236124274727 0.009082121028439083 0.006414953854867456 0.005401348869327194 0.004855251952482931 0.004458178493857989 0.0041265818693028145 0.0038341901203629014

`gen_em_steps`

is that it provides no argument for halting, you have to program the halting condition yourself.
# What's next?

Thank you for reading the present tutorial pages! We hope that you found it enough clear and informative.

You should have now comprehensive information on what Lea can do. You should be able to use it in your projects, researches or as a toolkit for teaching probability theory and probabilistic programming. You can find more examples in the Examples page.

For enhancements or bugs, please create issues on bitbucket Lea page.

Feel free also to send an e-mail to pie.denis@skynet.be (English or French spoken). Use cases of Lea in *your* projects/researches are especially appreciated! A "references" page listing the projects using Lea could greatly improve the reputation of the library.

Updated