Wiki

Clone wiki

Lea / Lea3_Examples

leas3_examples.png

Table of Content

Introduction

The following page provides several examples using Lea 3.

Unless otherwise specified, all the examples just require Lea 3 module on standard Python 2.6+ or 3 installation. These examples can be executed in a Python console or by saving them in a .py files to be executed. It's assumed that the following import has been done:

import lea

Most of the examples presented here just require basic knowledge of the Python language. If some Lea constructs are unclear, then you should refer to the tutorial pages (1, 2 or 3).

Sicherman dice

I discovered the strange Sicherman dice in the Scientific American's column of Martin Gardner.

Here is how you can verify that they result in the same probability distribution as two normal dice.

lea.set_prob_type('r')
die_s1 = lea.vals(1,2,2,3,3,4)
die_s2 = lea.vals(1,3,4,5,6,8)
print (die_s1 + die_s2)
# ->  2 : 1/36
#     3 : 2/36
#     4 : 3/36
#     5 : 4/36
#     6 : 5/36
#     7 : 6/36
#     8 : 5/36
#     9 : 4/36
#    10 : 3/36
#    11 : 2/36
#    12 : 1/36

Nontransitive dice

Nontransitive dice are an intriguing mathematical curiosity. It is another counterintuitive fact in the domain of probabilities.

In the following example, we consider the 3 special dice A, B and C (see face labels below).

lea.set_prob_type('r')
die_a = lea.vals(3,3,5,5,7,7)
die_b = lea.vals(2,2,4,4,9,9)
die_c = lea.vals(1,1,6,6,8,8)
The calculated probabilities associated to inequalities show that

  • die A beats die B, with probability 5/9
  • die B beats die C, with probability 5/9
  • die C beats die A, with probability 5/9

print (P(die_a > die_b))
# -> 5/9
print (P(die_b > die_c))
# -> 5/9
print (P(die_c > die_a))
# -> 5/9
This forms a strange cycle where, on the average, each die beats another die while it is itself beaten by the remaining die.

Other examples (Grime's dice), with thorough explanation and video, can be found here.

Name generator

For the fun, let us generate random (hype?) company/product names...

prefix = lea.vals('Alpha', 'Binary', 'Bit', 'Blue', 'Brain', 'Cool', 'Cosmo', 'Cross', 'Cyber', 'Deep', 'Delta', 'Dual', 
                  'Easy', 'Eye', 'Face', 'Fast', 'Free', 'Good', 'High', 'Hot', 'Inner', 'Key', 'Life', 'Light', 'Lotus',
                  'Lucky', 'Magic', 'Master', 'Mech', 'Metal', 'Micro', 'Mini', 'Mister', 'Nano', 'Net', 'Plane', 'Pop',
                  'Power', 'Purple', 'Py', 'Red', 'Small', 'Soft', 'Source', 'Space', 'Spicy', 'Star', 'Steel', 'Sunny',
                  'Tele', 'Thin', 'Tiny', 'Token', 'Top', 'Turbo', 'Uber', 'Watch', 'Well', 'Wild', 'X-', 'X-', 'Zero')
suffix = lea.vals('-Z', 'X', 'ball', 'beat', 'bird', 'bit', 'boo', 'book', 'bot', 'box', 'cat', 'cell', 'cloud', 'club',
                  'core', 'data', 'doh', 'draw', 'drive', 'finger', 'fish', 'flow', 'fly', 'foo', 'food', 'force', 'forge',
                  'frame', 'free', 'fun', 'head', 'house', 'hub', 'lea', 'line', 'lith', 'look', 'love', 'magic', 'mango',
                  'mount', 'mountain', 'node', 'note', 'one', 'phone', 'pix', 'plane', 'planet', 'ray', 'ring', 'robot',
                  'room', 'scape', 'side', 'sky', 'soft', 'source', 'space', 'star', 'strike', 'sun', 'tag', 'talk',
                  'tals', 'token', 'vibe', 'view', 'vision', 'ware', 'word', 'work', 'world', 'zen')
name = prefix + suffix
Here is a sample of 100 names.
print (name.random_mc(100))
# -> 'Crossframe', 'Magicmountain', 'Bittag', 'Hotfoo', 'Spacebit', 'Thinfinger', 'Facemango', 'Eyeside', 'Powerball', 'Powerbeat', 'Innermount', 'Mistertag', 'Hotfinger', 'Tokenfly', 'Starside', 'Dualsoft', 'Turbolove', 'Lifepix', 'X-planet', 'Alphabox', 'Spacevibe', 'Lightlith', 'Binarystrike', 'Magicscape', 'Highforce', 'Thinscape', 'Wildpix', 'Innerrobot', 'Hotcat', 'Spicydrive', 'Dualbeat', 'Deepmount', 'Watchnote', 'Tokenmango', 'Microdraw', 'Bitworld', 'Crossdrive', 'Coolfish', 'Nettag', 'Softfood', 'X-star', 'Uberfood', 'Masterscape', 'Ubersoft', 'Keyplane', 'Keystrike', 'Goodfood', 'Bluefun', 'Eyevision', 'Popview', 'Cosmosource', 'Spacespace', 'Lotusray', 'Coolsource', 'Purplemango', 'Dualdrive', 'Hotcell', 'Watchcat', 'Steelbeat', 'Coolcat', 'Lifehub', 'Blueware', 'Topsky', 'Planepix', 'Thincell', 'Fastlove', 'CyberX', 'Steelbird', 'Lighthub', 'Starphone', 'Cyberfun', 'Mechware', 'Luckydoh', 'Keylove', 'Uberhouse', 'Mechcore', 'Keystrike', 'Masterrobot', 'Microlook', 'Easyhub', 'Brainvibe', 'Sunnyboo', 'Crosssky', 'Cyberplane', 'Dualwork', 'Lifemount', 'Deepfoo', 'Keyring', 'Innerclub', 'Metaltalk', 'Innerbox', 'Starwork', 'Thinview', 'Tokenbeat', 'Starsky', 'Lotusdraw', 'Softbox', 'Zerobeat', 'Tinytag', 'Facebox')

By the way, do you know why we preferred random_mc over random?

The answer is here.

A slight variation consists in capitalizing the first character of suffix.

cap_name = prefix + suffix.title()
print (cap_name.random_mc(100))
# -> ('FreeBall', 'EasyMango', 'SpicyBot', 'LuckyFinger', 'X-Box', 'SoftLea', 'MechStrike', 'SunnyLine', 'InnerX', 'PowerSpace', 'PowerHead', 'WatchStar', 'GoodRing', 'SpicyTals', 'TokenFrame', 'SmallMagic', 'ThinSky', 'NanoRing', 'X-Planet', 'CosmoView', 'MiniToken', 'UberCat', 'WatchSide', 'BrainPhone', 'InnerCore', 'NetSoft', 'LuckyMango', 'TokenLove', 'CosmoVibe', 'BitCell', 'FreeHouse', 'HighLove', 'WildView', 'NetWord', 'SteelCell', 'TurboFly', 'CyberHub', 'AlphaBall', 'X-Ware', 'MasterPix', 'MechClub', 'MechWork', 'FastPlanet', 'ThinTals', 'BlueFly', 'SpaceBird', 'SmallFlow', 'BinaryStrike', 'BrainHead', 'NetCell', 'WildRoom', 'DualBoo', 'TeleLine', 'DeltaSoft', 'LotusFrame', 'FastSpace', 'AlphaNote', 'UberNote', 'SmallClub', 'PowerZen', 'MisterSpace', 'SunnyRing', 'BitLook', 'WellBoo', 'NetCell', 'MicroLine', 'StarTals', 'NetFrame', 'WellPlane', 'TeleBird', 'SunnyMount', 'MechHouse', 'MetalFly', 'HotPlanet', 'EyeView', 'BlueMagic', 'Face-Z', 'MetalBook', 'PopVision', 'EyeLea', 'RedBird', 'LotusOne', 'FreeSource', 'FastLith', 'FaceCloud', 'FreeClub', 'MasterLook', 'InnerBook', 'AlphaPlane', 'MasterToken', 'SpaceClub', 'KeyWare', 'BinaryBot', 'SoftFrame', 'HighTals', 'DualMount', 'MetalRoom', 'PyClub', 'EyeFlow', 'InnerBox')

If you enjoy such kind of random verbalization, you should have a look at a far more elaborated example, namely the bullshit generator.

The Monty Hall problem

The Monty Hall Problem is one of the greatest examples of counter-intuitive probability result. Let's start by defining door_with_prize as the unknown door with a prize and door_chosen_1st as the initial door chosen by the player.

lea.set_prob_type('r')
doors = ('Door 1', 'Door 2', 'Door 3')
door_with_prize = lea.vals(*doors)
door_chosen_1st = lea.vals(*doors)
So far, without much surprise, the probability of win is 1/3:
print (P(door_chosen_1st==door_with_prize))
# -> 1/3
Now we shall define the door opened by Monty Hall, which is neither the door with the prize, nor the door chosen by the player. This can be done by a Conditional Probability Table (CPT), using the switch method:
door_open_empty = lea.joint( door_with_prize, door_chosen_1st).switch(
                         { ( 'Door 1'       , 'Door 1') : lea.vals('Door 2','Door 3'),
                           ( 'Door 1'       , 'Door 2') : 'Door 3',
                           ( 'Door 1'       , 'Door 3') : 'Door 2',
                           ( 'Door 2'       , 'Door 2') : lea.vals('Door 1','Door 3'),
                           ( 'Door 2'       , 'Door 1') : 'Door 3',
                           ( 'Door 2'       , 'Door 3') : 'Door 1',
                           ( 'Door 3'       , 'Door 3') : lea.vals('Door 1','Door 2'),
                           ( 'Door 3'       , 'Door 1') : 'Door 2',
                           ( 'Door 3'       , 'Door 2') : 'Door 1' })
Below, we'll provide an equivalent definition that is much shorter.

Now, we are ready to evaluate probability of winning by changing the initial choice. Suppose for example that player has chosen Door 1 and Monty Hall has opened Door 2:

print (door_with_prize.given(door_chosen_1st=='Door 1', door_open_empty=='Door 2'))
# -> Door 1 : 1/3
#    Door 3 : 2/3
The player has more chance to win by changing his initial choice and opening Door 3.

As of Lea 3.1, there is a better approach for creating door_open_empty, which avoids the burden of creating a CPT. It uses the switch_func method, to which we pass a function that expresses the way Monty Hall selects the door to open:

def selected_door(eliminated_doors):
    return lea.vals(*(d for d in doors if d not in eliminated_doors))

door_open_empty = lea.joint(door_with_prize,door_chosen_1st).switch_func(selected_door)
This alternative definition gives the same results as the previous one. You could have used the selected_door function to create the dictionary passed to switch method. Note however that switch_funcis called on-the-fly, without expanding the CPT; so it's more memory-efficient than the switch method (this my be interesting for larger problems).

Job scheduling - 1 (integration with datetime)

Suppose you have to do 2 tasks T1 and T2, which can last uncertain durations. After careful evaluations, you estimate the following probability distributions for the duration of each task:

  • T1 : 2 days (20%), 3 days (60%), 4 days (20%)
  • T2 : 3 days (20%), 4 days (80%)

To handle time calculation, we shall use the handy datetime built-in Python module. For convenience, we define the constant DAY representing the duration of one day:

import datetime
lea.set_prob_type('f')
DAY = datetime.timedelta(1)
t1 = lea.pmf({2: 0.2, 3: 0.6, 4: 0.2}) * DAY
t2 = lea.pmf({3: 0.2, 4: 0.8}) * DAY
print (t1)
# -> 2 days, 0:00:00 : 0.2
#    3 days, 0:00:00 : 0.6
#    4 days, 0:00:00 : 0.2
print (t2)
# -> 3 days, 0:00:00 : 0.2
#    4 days, 0:00:00 : 0.8
Now assuming that T1 could start on 28th September 2020 and that T1 shall be completed before starting T2, what is the distribution of the end time?
start_time = datetime.datetime(2020,9,28)
end_time = start_time + t1 + t2
print (end_time)
# -> 2020-10-03 00:00:00 : 0.04000000000000001
#    2020-10-04 00:00:00 : 0.28
#    2020-10-05 00:00:00 : 0.52
#    2020-10-06 00:00:00 : 0.16000000000000003
For an "executive summary" point of view, the mean time of completion is
print (end_time.mean)
# -> 2020-10-04 19:12:00
Note (for the curious): How can Lea calculate a mean time, although it is impossible (and meaningless) to add dates together? Actually, the implementation of Lea's mean method works by calculating the mean of the differences between each values and the first value; then this mean difference is added to the first value. With numbers, this provides the same results as the simple algorithm, with the advantage of avoiding any risk of overflow; in the case of datetime objects, it sums actually timedelta objects (i.e. durations), thus avoiding a raised exception.

Here is how to produce figures on the overall duration of the tasks:

print (end_time - start_time)
# -> 5 days, 0:00:00 : 0.04000000000000001
#    6 days, 0:00:00 : 0.28
#    7 days, 0:00:00 : 0.52
#    8 days, 0:00:00 : 0.16000000000000003
print ((end_time - start_time).mean)
# -> 6 days, 19:12:00
Assume now that there is no dependency between T1 and T2 and that you have enough resources to start T1 and T2 in parallel. Then, the distribution of the overall duration is no more T1+T2 but it is max(T1,T2), i.e. it is the duration of the longest task. Unfortunately, you cannot write such simple expression because the Python max(...) function does not know how to handle Lea instances (actually, it produces a wrong result) ; you have to use the lea.max_of(…) method:
end_time_p = start_time + lea.max_of(t1,t2)
print (end_time_p)
# -> 2020-10-01 00:00:00 : 0.16
#    2020-10-02 00:00:00 : 0.8400000000000001
print (end_time_p.mean)
# -> 2020-10-01 20:09:36
We see, as expected, that the mean end time here is earliest than in the sequential scenario.

Job scheduling - 2 (Bayesian reasoning)

We present here another job scheduling problem with tasks having uncertain durations. In contrast to Job scheduling #1, the focus is now on Bayesian reasoning.

There are 3 tasks to schedule: A, B and C. There is only one precedence constraint: task B shall not be started before the end of task A; we assume that there is enough resources to execute two tasks in parallel.

job_scheduling.png

Durations of tasks A and B are characterized by known pmf (see below); duration of task C is conditioned by three possible scenarii, viz. CONSERVATIVE / EVOLUTIVE / DISRUPTIVE, each having a known probability to happen; the duration of task C is then modeled as a CPT giving a specific pmf for each scenario. We want then to calculate the shortest makespan and the total effort spent to complete the 3 tasks.

d_A = lea.pmf({ 3: '0.10', 4: '0.80', 5: '0.10' })
d_B = lea.pmf({ 2: '0.50', 3: '0.50' })
s = lea.pmf({'CONSERVATIVE': '0.60', 'EVOLUTIVE': '0.30', 'DISRUPTIVE': '0.10'})
d_C = s.switch({ 'CONSERVATIVE': lea.pmf({ 2: '0.70', 3: '0.30'} ),
                 'EVOLUTIVE'   : lea.pmf({ 3: '0.50', 4: '0.50'} ),
                 'DISRUPTIVE'  : lea.pmf({ 7: '0.20', 8: '0.70', 9: '0.10' }) })
makespan = lea.max_of(d_A+d_B,d_C)
efforts = d_A + d_B + d_C
The makespan has been defined by evaluating the duration of the critical path; this uses the lea.max_of function on the two possible paths. From this probabilistic job scheduling model, one can now make several queries to calculate probability distributions of makespan and efforts, possibly integrating new information, assumptions or constraints.
print (makespan)
#    5 : 0.045
#    6 : 0.405
#    7 : 0.424
#    8 : 0.116
#    9 : 0.01
print (makespan.given(s == 'CONSERVATIVE'))
# -> 5 : 0.05
#    6 : 0.45
#    7 : 0.45
#    8 : 0.05
print (makespan.given(s != 'CONSERVATIVE'))
# -> 5 : 0.0375
#    6 : 0.3375
#    7 : 0.385
#    8 : 0.215
#    9 : 0.025
print (efforts.given((s == 'DISRUPTIVE') & (efforts <= 14)))
# -> 12 : 0.01834862385321100917431192661
#    13 : 0.2293577981651376146788990826
#    14 : 0.7522935779816513761467889908
print (makespan.given(efforts == 8))
# -> 5 : 0.08029197080291970802919708029
#    6 : 0.9197080291970802919708029197
Although not conventional, backward reasoning may be done also to infer explanations from posterior measures, assuming that some causal variables (namely, the scenario and/or specific task durations) remain uncertain.
print (s.given((makespan <= 7) & (efforts <= 9)))
# -> CONSERVATIVE : 0.8556430446194225721784776903
#    EVOLUTIVE    : 0.1443569553805774278215223097
print (s.given(makespan == 9))
# -> DISRUPTIVE : 1
print (lea.joint(d_A,d_B,d_C).given((makespan == 6) & (s == 'CONSERVATIVE')))
# -> (3, 3, 2) : 0.07777777777777777777777777778
#    (3, 3, 3) : 0.03333333333333333333333333333
#    (4, 2, 2) : 0.6222222222222222222222222222
#    (4, 2, 3) : 0.2666666666666666666666666667
print (lea.joint(d_A,d_B,d_C).given((makespan == 5) & (s == 'CONSERVATIVE')))
# -> (3, 2, 2) : 0.7
#    (3, 2, 3) : 0.3

The 3 coins problem

(taken from Visual Explanations of Probabilistic Reasoning)

Three coins are flipped. Given that two of the coins have come up heads, what is the probability that the third has come up tails? Many people respond that the probability is 50% but...

lea.set_prob_type('r')
flip = lea.vals('Head','Tail')
flip3 = flip.draw(3,replacement=True)
nb_heads = flip3.count('Head')
print (Pf((nb_heads == 2).given(nb_heads >= 2)))
# -> 0.75
... it's in fact 75%!

In order to convince yourself that this result is right, you can make several queries to get clues on intermediate steps.

print (flip3)
# -> ('Head', 'Head', 'Head') : 1/8
#    ('Head', 'Head', 'Tail') : 1/8
#    ('Head', 'Tail', 'Head') : 1/8
#    ('Head', 'Tail', 'Tail') : 1/8
#    ('Tail', 'Head', 'Head') : 1/8
#    ('Tail', 'Head', 'Tail') : 1/8
#    ('Tail', 'Tail', 'Head') : 1/8
#    ('Tail', 'Tail', 'Tail') : 1/8
print (lea.joint(nb_heads,flip3))
# -> (0, ('Tail', 'Tail', 'Tail')) : 1/8
#    (1, ('Head', 'Tail', 'Tail')) : 1/8
#    (1, ('Tail', 'Head', 'Tail')) : 1/8
#    (1, ('Tail', 'Tail', 'Head')) : 1/8
#    (2, ('Head', 'Head', 'Tail')) : 1/8
#    (2, ('Head', 'Tail', 'Head')) : 1/8
#    (2, ('Tail', 'Head', 'Head')) : 1/8
#    (3, ('Head', 'Head', 'Head')) : 1/8
print (lea.joint(nb_heads,flip3).given(nb_heads >= 2))
# -> (2, ('Head', 'Head', 'Tail')) : 1/4
#    (2, ('Head', 'Tail', 'Head')) : 1/4
#    (2, ('Tail', 'Head', 'Head')) : 1/4
#    (3, ('Head', 'Head', 'Head')) : 1/4
print (nb_heads.given(nb_heads >= 2))
# -> 2 : 3/4
#    3 : 1/4
print (P((nb_heads == 2).given(nb_heads >= 2)))
# -> 3/4

Lottery

lottery = lea.interval(1,46)
lottery.random_draw(6)
# -> (13, 24, 25, 10, 1, 26)

Character frequency counter

In the following example, we count the number of occurrences of each letter in a famous Latin sentence. This shows how a frequency analysis can be done.

lea.set_prob_type('r')
freq_latin = lea.vals(*"ALEA JACTA EST")
print (freq_latin)
# ->   : 2/14
#    A : 4/14
#    C : 1/14
#    E : 2/14
#    J : 1/14
#    L : 1/14
#    S : 1/14
#    T : 2/14
From this basic frequency table, we can generate a random message that tends to obey the same frequency.
print (''.join(freq_latin.random(20)))
# -> 'AJ  LTC TAJA ATLEALJ'
You can notice how the 'A' occurs more than the other letters.

The lifetime problem

Suppose we have a device that has 1% to break every time it is switched on. Let u represents the probability that it is not broken after one switch on:

lea.set_prob_type('f')
u = lea.event(0.99)
print (P(u))
# -> 0.99
To determine the probability that the device is OK after two switches on, you may write:
P(u & u.new())
# -> 0.9801
To determine the probability that it is OK after ten switches on, you may write:
import operator
>>> print (P(u.times(10,operator.and_)))
# -> 0.9043820750088044
You can verify the result as follows:
print (0.99 ** 10)
# -> 0.9043820750088044

2D random walk

We can use Python complex numbers to model a body (e.g. a drunk rabbit) that moves randomly in a plane : in a complex plane, the North direction is the imaginary 1j value, the East direction is the value 1, etc. Let us assume that the body moves at each time step East, North, West or South, with equal probabilities:

lea.set_prob_type('r')
b = lea.vals(1,1j,-1,-1j)
print (b)
# ->   1 : 1/4
#    -1j : 1/4
#     1j : 1/4
#     -1 : 1/4
Then, assuming that the body is initially located at the origin of the complex plane, the probability distribution of the body's position after three time steps is given by
print (b.times(3))
# ->       1 : 9/64
#         1j : 9/64
#    (-1+2j) : 3/64
#     (2+1j) : 3/64
#     (1+2j) : 3/64
#         3j : 1/64
#    (-2-1j) : 3/64
#          3 : 1/64
#    (-2+1j) : 3/64
#    (-0-3j) : 1/64
#    (-1-2j) : 3/64
#        -1j : 9/64
#     (1-2j) : 3/64
#     (2-1j) : 3/64
#         -3 : 1/64
#         -1 : 9/64
Here is how to get the probability that, after three time steps, the body is inside a circle of radius 2.5, centered on the origin (remind that abs(c) gives the modulus of the complex number c):
print (P(abs(b.times(3)) <= 2.5))
# -> 15/16
To get the full distribution of the body's distance from the origin:
print (abs(b.times(3)))
# ->                1 : 9/16
#    2.23606797749979 : 6/16
#                 3.0 : 1/16
You can check the consistency of the previous result since 15/16 = 9/16 + 6/16.

RPG combat

Imagine a Role-Playing Game combat situation opposing you, a warrior, against a giant troll. We shall use here a simplified version of the "d20 System" (Open Game License). To know whether you can hit the troll, you have to calculate your "attack roll" (AR) and the "armour class" (AC) of the troll; if your AR is greater or equal to the AC of the troll, then you hit him and causes him some "damage".

Let us first define the dice we need (using RPG standards: "D6", "2D6" and "D20"):

lea.set_prob_type('r')
D6 = lea.interval(1,6)
T2D6 = D6.times(2)
D20 = lea.interval(1,20)
On a given combat round, you hit the troll if your attack roll (D20 + bonus of 8) is greater or equal to the troll's armor class (24):
target_armor_class = 24
attack_roll = D20 + 8
hit0 = attack_roll >= target_armor_class
print (P(hit0))
# -> 1/4
Well, … but we forget exceptions to the the rule! It says that if the D20 throw is 1, then there is a certain miss; also, if the D20 throw is 20, then there is a certain hit. Let's make the correction with a CPT (Conditional Probability Table):
hit = D20.switch( { 1: False,
                   20: True  },
                       hit0   )
Let us see the new odds:
print (P(hit))
# -> 1/4
Actually, that does not change anything compared to hit0 but it could have on a more unbalanced combat.

Note that you could model the same rule using the cpt method:

hit = lea.cpt( ( D20 ==  1, False ),
               ( D20 == 20, True  ),
               ( None     , hit0  ))
or using pure logical expressions (no CPT):
hit = (hit0 & (D20 != 1)) | (D20 == 20)

Now, if you can hit the troll, your magic sword shall cause him a damage of 2D6 + a bonus of 3:

damage_if_hit = T2D6 + 3
print (damage_if_hit)
# ->  5 : 1/36
#     6 : 2/36
#     7 : 3/36
#     8 : 4/36
#     9 : 5/36
#    10 : 6/36
#    11 : 5/36
#    12 : 4/36
#    13 : 3/36
#    14 : 2/36
#    15 : 1/36
print (damage_if_hit.mean_f)
# -> 10.0
If you are lucky to hit the troll, then the damage average is 10 points.

If you want to know the overall damage distribution considering that the hit is uncertain, then we could define a CPT as above or use simply the lea.if_ convenience method:

damage = lea.if_( hit, damage_if_hit, 0)
Consider this as the probabilistic version of an if-then-else! The resulting distribution of damage is as follows:
print (damage)
# ->  0 : 108/144
#     5 :   1/144
#     6 :   2/144
#     7 :   3/144
#     8 :   4/144
#     9 :   5/144
#    10 :   6/144
#    11 :   5/144
#    12 :   4/144
#    13 :   3/144
#    14 :   2/144
#    15 :   1/144
print (damage.mean_f)
# -> 2.5
We see that the damage average has lowered to 2.5 points, since the chance to hit is 1/4. Let us evaluate total damage distribution after 2 rounds of combat (with no reaction from the troll!):
print (damage.times(2))
# ->  0 : 11664/20736
#     5 :   216/20736
#     6 :   432/20736
#     7 :   648/20736
#     8 :   864/20736
#     9 :  1080/20736
#    10 :  1297/20736
#    11 :  1084/20736
#    12 :   874/20736
#    13 :   668/20736
#    14 :   467/20736
#    15 :   272/20736
#    16 :    80/20736
#    17 :   104/20736
#    18 :   125/20736
#    19 :   140/20736
#    20 :   146/20736
#    21 :   140/20736
#    22 :   125/20736
#    23 :   104/20736
#    24 :    80/20736
#    25 :    56/20736
#    26 :    35/20736
#    27 :    20/20736
#    28 :    10/20736
#    29 :     4/20736
#    30 :     1/20736
To convert the same data in floating-point format, do this:
print (damage.times(2).new('f'))
# ->  0 : 0.5624999999999999
#     5 : 0.010416666666666664
#     6 : 0.02083333333333333
 
If current troll's health point is 4, the probability to kill him in the next two rounds is calculated as follows:
health_points = 4
print (P(health_points - damage.times(2) <= 0))
# -> 7/16
To end, let's simulate 20 "real" rounds of combat:
print (damage.random(20))
# -> (0, 0, 0, 0, 11, 0, 9, 7, 0, 0, 0, 10, 0, 0, 13, 0, 0, 9, 0, 0)
You can develop the case to calculate troll's attacks, to evaluate health points evolution after damages, simulate combats until dead, etc.

Bayesian network - the Rain-Sprinkler-Grass example

We refer here to the example of bayesian network on Wikipedia. We show hereafter one possible translation of this model in Lea:

# -- rsg_net.py --------

import lea

rain = lea.event(0.20)

sprinkler = lea.if_(rain, lea.event(0.01),
                          lea.event(0.40))

grassWet = lea.joint(sprinkler,rain).switch({ (False,False): False,
                                              (False,True ): lea.event(0.80),
                                              (True ,False): lea.event(0.90),
                                              (True ,True ): lea.event(0.99)})
Then, the model can be queried in different ways.
from rsg_net import *
print (P(grassWet & sprinkler & rain))
# -> 0.0019799999999999996
print (P(rain.given(grassWet)))
# -> 0.35768767563227616
print (P(grassWet.given(rain)))
# -> 0.8019000000000001
print (P(grassWet.given(sprinkler & ~rain)))
# -> 0.9000000000000001
print (P(grassWet.given(~sprinkler & ~rain)))
# -> 0.0
You can check the first resulting probabilities (floating-point values) on the afore-mentioned Wikipedia page. Note that the results are exact (i.e. no MCMC estimation and the likes).

A murder case

This one is a simple fictional example of using Lea's bayesian reasoning for forensic science.

Here are the facts, which are certain:

  • Dr. Black has been murdered last night.
  • There are three suspects: Colonel Mustard, Mrs. White and Mrs. Peacock. The killer is one of these and only one.

Let us assume that we have the following respective prior probabilities for each: 60%, 20%, 10%. Let us define then the killer variable.

killer = lea.pmf({ "Colonel Mustard": 0.60,
                   "Mrs. White"     : 0.30,
                   "Mrs. Peacock"   : 0.10 })
print (killer.as_pct())
# -> Colonel Mustard :  60.0 %
#    Mrs. Peacock    :  10.0 %
#    Mrs. White      :  30.0 %
Assume now that investigations allow us establish the following three probabilistic rules:

if Mrs. White is the killer,
then she’ll be absent with prob. 90 %
else she’ll be absent with prob. 20 %

if Mrs. Peacock is innocent,
then she knows who’s the killer with prob. 75 %

if Mrs. Peacock is the killer or if she knows who’s the killer
then she’ll be absent with prob. 99 %
else she’ll be absent with prob. 10 %

mrs_white_is_absent = \
      lea.if_( killer == "Mrs. White",
               lea.event(0.90),
               lea.event(0.20))

mrs_peacock_knows_who_is_killer = \
      lea.if_( killer != "Mrs. Peacock",
               lea.event(0.75),
               True           )

mrs_peacock_is_absent = \
      lea.if_( (killer == "Mrs. Peacock") | mrs_peacock_knows_who_is_killer,
               lea.event(0.99),
               lea.event(0.10))

Note how we state with the True value, that Mrs. Peacock CERTAINLY knows who the killer is assuming she's HERSELF the killer (an assumption of sanity, missing in the given rule)!

Let's imagine we get now new evidences, one by one. Let us examine how the culpability probability distribution evolves as the evidences are added.

EVIDENCE 1: Mrs. Peacock is absent.

evidences = [mrs_peacock_is_absent]
print (killer.given(*evidences).as_pct())
# -> Colonel Mustard :  58.3 %
#    Mrs. Peacock    :  12.5 %
#    Mrs. White      :  29.2 %

EVIDENCE 2: Mrs. White is present.

mrs_white_is_present = ~mrs_white_is_absent
evidences.append(mrs_white_is_present)
print (killer.given(*evidences).as_pct())
# -> Colonel Mustard :  78.3 %
#    Mrs. Peacock    :  16.8 %
#    Mrs. White      :   4.9 %

EVIDENCE 3: The killer is a woman.

killer_is_woman = killer[:4] == "Mrs."
evidences.append(killer_is_woman)
print (killer.given(*evidences).as_pct())
# -> Mrs. Peacock :  77.5 %
#    Mrs. White   :  22.5 %
We see on this example how new evidences, added to known evidences, can dramatically change the prior probabilities.

This example may sound simplistic, far from true cases. Be aware however, that even simpler reasoning have been mistakenly used to famous cases in the past, leading to wrong results and tragic consequences (see Dreyfus affair and, more recently, Sally Clarke affair).

The problem of Chevalier de Méré

The problem of points is at the very root of probability theory, with the famous developments of Blaise Pascal starting from a question posed by Chevalier de Méré in 1654. This problem and resolution is discussed in details in the paper of P. Gorroochurn.

Let's assume that

  • chances of both players to win at any round are equal (1/2)
  • the first player having a total of 6 rounds wins a prize
  • at the time the game stops, player one has won 5 rounds and player two has won 3 rounds

How the prize should be be divided between the players?

Here is, translated in Lea, the method of recursion invented by Pascal, for two players with given probabilities to win a round.

def player_1_wins(prob_player_1_wins_one_round, player_1_rounds_short, player_2_rounds_short):
    if player_1_rounds_short == 0:
        return True
    if player_2_rounds_short == 0:
        return False
    return lea.if_(lea.event(prob_player_1_wins_one_round),
                   player_1_wins(prob_player_1_wins_one_round, player_1_rounds_short-1, player_2_rounds_short),
                   player_1_wins(prob_player_1_wins_one_round, player_1_rounds_short, player_2_rounds_short-1))
Noting that player one is 1 round short and player two is 3 rounds short, probability that player 1 wins the game is obtained as follows:
print (player_1_wins('1/2',1,3))
# -> False : 1/8
#    True  : 7/8
This means that 7/8 of the prize should be given to player one and 1/8 of the prize should be given to player two.

If you have installed the SymPy library, you can obtain a general formula based on probability p for player 1 to win at any round:

print (player_1_wins('p',1,3))
# -> False : -(p - 1)**3
#    True  : p*(p**2 - 3*p + 3)

For generalizing to n players, each having a given probability to win at each round, the translation is Lea is a bit more involved:

def winner(prob_player_wins_one_round, player_rounds_short):
    for (player,rounds_short) in player_rounds_short.items():
        if rounds_short == 0:
            return player
    def winner_if_player_wins_one_round(player):
        new_player_rounds_short = player_rounds_short.copy()
        new_player_rounds_short[player] -= 1
        return winner(prob_player_wins_one_round,new_player_rounds_short)
    return lea.pmf(prob_player_wins_one_round) \
           .switch_func(winner_if_player_wins_one_round).calc()
Here is the example above revisited:
print (winner({'A': '1/2', 'B': '1/2'}, {'A': 1, 'B': 3}))
# -> A : 7/8
#    B : 1/8
and here is an example with 3 players and unbalanced probabilities of win:
print (winner({'A': '1/6', 'B': '1/3', 'C': '1/2'}, {'A': 2, 'B': 3, 'C': 3}))
# -> A : 29/144
#    B : 36/144
#    C : 79/144

Rock-paper-scissors game

Let's play the rock-paper-scissors game! First, let us define a class that allows us to represent the three objects and the nontransitive win relationship:

class RPS(object):

    _cur_val = 0

    def __init__(self,name):
        object.__init__(self)
        self.name = name
        self.val = RPS._cur_val
        RPS._cur_val += 1

    def __str__(self):
        return self.name

    __repr__ = __str__

    def __hash__(self):
        return hash(self.name)

    def _cmp(self,other):
        delta = other.val - self.val
        if delta == 0:
            return 0
        if delta == 1 or delta == -2:
            return -1
        return +1

    def __eq__(self,other):
        return self._cmp(other) == 0

    def __ne__(self, other):
        return self._cmp(other) != 0

    def __gt__(self, other):
        return self._cmp(other) == 1

    def __ge__(self, other):
        return self._cmp(other) >= 0

    def __lt__(self,other):
        return self._cmp(other) == -1

    def __le__(self, other):
        return self._cmp(other) <= 0

r = RPS('rock')
p = RPS('paper')
s = RPS('scissors')
We can verify the win conditions by comparing the objects together:
print ((r,p,s))
# -> (rock, paper, scissors)
print (r < p)
# -> True
print (p < s)
# -> True
print (s < r)
# -> True
So far, Lea has not yet been used. We have just defined 3 objects that model the rules of the game and which contain no uncertainties. Now, assuming that players follow a random, independent choice at each turn of play, let us define 3 behaviors:

  1. balanced : the 3 objects have equal probabilities
  2. morerock : the rock has twice more chances to be chosen than the others
  3. morepaper : the paper has twice more chances to be chosen than the others

lea.set_prob_type('r')
balanced  = lea.pmf(((r,1), (p,1), (s,1)), ordered=True)
morerock  = lea.pmf(((r,2), (p,1), (s,1)), ordered=True)
morepaper = lea.pmf(((r,1), (p,2), (s,1)), ordered=True)
print (balanced)
# -> rock     : 1/3
#    paper    : 1/3
#    scissors : 1/3
print (morerock)
# -> rock     : 2/4
#    paper    : 1/4
#    scissors : 1/4
print (morepaper)
# -> rock     : 1/4
#    paper    : 2/4
#    scissors : 1/4
Note the use of ordered argument, to keep the order of the displayed values the same as the order of input (the sorting made by default would be a bit fuzzy considering the nontransitive order defined!).

Now, we are able to compare these behaviors to get the probabilities of win. The easier approach is to use the usual comparison operators. For example, the chances of win of morepaper over morerock are given by

print (P(morepaper > morerock))
# -> 3/8
The advantage of morepaper is not blatant because this result does not show the probability of tied game. This can be found as follows
print (P(morepaper == morerock))
# -> 5/16
A better approach consists in using the cmp(a,b) method, defined in Python 2.x : it returns -1, 0 or +1 depending whether a is less than, equal to or greater than b, respectively. For Python 3.x, since cmp is not defined, you may define it yourself or use the following trick:
cmp = RPS._cmp
The new approach requires to use the map method to be able to apply the cmp function on morepaper's values, providing morerock's values as second argument. We can then have the full picture of the results at once:
lea_cmp = lea.func_wrapper(cmp)
odds = lea_cmp(morepaper,morerock)
print (odds)
# -> -1 : 5/16
#     0 : 5/16
#     1 : 6/16
We see now clearly the advantage of choosing more rocks against an opponent that chooses more papers. In the present case, it wins with probability 6/16 and loose with probability 5/16.

The skeptical reader can use the joint method to verify this result, by consulting each possible draws of the game and add the atomic probabilities - provided that such reader is not skeptical on the joint method! -:

print (lea.joint(morepaper,morerock))
# -> (scissors, paper   ) : 1/16
#    (scissors, scissors) : 1/16
#    (scissors, rock    ) : 2/16
#    (rock    , rock    ) : 2/16
#    (rock    , paper   ) : 1/16
#    (rock    , scissors) : 1/16
#    (paper   , scissors) : 2/16
#    (paper   , rock    ) : 4/16
#    (paper   , paper   ) : 2/16
An even more practical way to check atomic results consists in letting Lea provides directly game results by injecting odds in the joint probability distribution (as first position, so the ordering makes a smart grouping):
print (lea.joint(odds,morepaper,morerock))
# -> (-1, rock    , paper   ) : 1/16
#    (-1, paper   , scissors) : 2/16
#    (-1, scissors, rock    ) : 2/16
#    ( 0, scissors, scissors) : 1/16
#    ( 0, rock    , rock    ) : 2/16
#    ( 0, paper   , paper   ) : 2/16
#    ( 1, paper   , rock    ) : 4/16
#    ( 1, scissors, paper   ) : 1/16
#    ( 1, rock    , scissors) : 1/16
Since the odds provides -1,0,+1 values, it is easy to get the distribution of fails, ties and wins after a given number of games by using the times(n) method. For example, here are the scores after 3 games, where values indicates the difference won games - lost games from the "morerock" player's point of view:
print (odds.times(3))
# -> -3 :  125/4096
#    -2 :  375/4096
#    -1 :  825/4096
#     0 : 1025/4096
#     1 :  990/4096
#     2 :  540/4096
#     3 :  216/4096 
Now, are there better strategies against morepaper?

Yes! In particular,

  • choose paper with 8 more chances than the others:
evenmorepaper = lea.pmf(((r,1), (p,8), (s,1)), ordered=True)
print (lea_cmp(evenmorepaper,morerock))
# -> -1 : 11/40
#     0 : 11/40
#     1 : 18/40
  • choose always paper:
alwayspaper = lea.vals(p)
print (lea_cmp(alwayspaper,morerock))
# -> -1 : 1/4
#     0 : 1/4
#     1 : 2/4

Now, what about the balanced behavior?

  • against morerock:
print (lea_cmp(balanced,morerock))
# -> -1 : 1/3
#     0 : 1/3
#     1 : 1/3
  • against a fixed choice (e.g. paper):
print (lea_cmp(balanced,alwayspaper))
# -> -1 : 1/3
#     0 : 1/3
#     1 : 1/3
  • against itself:

print (lea_cmp(balanced,balanced.new()))
# -> -1 : 1/3
#     0 : 1/3
#     1 : 1/3
We demonstrate in these last few examples the known (and proven) fact that the balanced strategy gives a tied game on the long term, whatever opponent's behavior. It is actually the best strategy when nothing is known about the opponent's behavior.

Note: you may have noticed the call .new() in the last case. Why? The reason is simply the referential consistency of Lea: the following construct

print (lea_cmp(balanced,balanced))
# -> 0 : 1
does not model two independent balanced opponents but an absurd game where the two players, by some bizarre entanglement, make always the same choice. The result is then obviously a tie.

von Neumann extractor

Imagine you have a physical device able to produce a "true" random stream of bits. Great! Unfortunately, this device has a bias : the probability of having 1 or 0, although constant in time, is not 1/2; furthermore, this probability is unknown.

Question: Can you find a mechanism to remove this bias and produce a stream of true random bits, with balanced probabilities (1/2)?

John von Neumann has invented a simple yet awesome trick to do the job. See von Neumann extractor.

Let us demonstrate the algorithm of von Neumann. Since we don't have the physical random device at our disposal, we shall first simulate the biased stream of bits. We use Lea to build an infinite random bits iterator, assuming a probability of 80% to have a 0:

biased_random_iter = lea.pmf({0: 0.80, 1: 0.20}).random_iter()
Note that the previous statement is equivalent to this:
def gen_biased_random():
    source = lea.pmf({0: 0.80, 1: 0.20})
    while True:
        yield source.random()
biased_random_iter = gen_biased_random()
We can make a trial by taking a sample of 500 bits on this biased random source and control the frequency:
from itertools import islice
biased_sample = tuple(islice(biased_random_iter,500))
print (biased_sample)
# -> (0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0)
print (lea.vals(*biased_sample).as_pct())
# -> 0 :  79.40 %
#    1 :  20.60 %
We see that the frequency of 0 is close to 80 %, as expected.

Now, we build the von Neumann extractor, as an infinite random bits iterator that takes the biased source as input:

from itertools import islice

def gen_von_neumann_extractor(random_iter):
    while True:
        (b0,b1) = islice(random_iter,2)
        if b0 != b1:
            yield b0

von_neumann_extractor_iter = gen_von_neumann_extractor(biased_random_iter)
Let us make a try, taking a sample of 500 bits from this device (note that the process shall likely consume a lot more bits on the biased source!) :
unbiased_sample = tuple(islice(von_neumann_extractor_iter,500))
print (unbiased_sample)
# -> (0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1)
print (lea.vals(*unbiased_sample).as_pct())
# -> 0 :  47.2 %
#    1 :  52.8 %
We can see that the frequencies are now close to 50%, as expected.

Thank you, Doctor Miraculis !

Central limit law

a contribution of Nicky van Foreest

Requires: SciPy, numpy and matplotlib (see SciPy Stack)

From theory we know that the distribution of the sum of identical and independent random variables converges to the normal distribution. We can test this with Lea.

First we need some basic inputs.

import numpy as np
from scipy.stats import norm
Now make a uniform die with four sides and take the 5 fold sum.
die = lea.vals(0, 1, 2, 3)
dieN = die.times(5)
dieN.plot()
dieN.png

We need the support and the cumulative distribution of dieN.

T = np.array(tuple(zip(*dieN.cdf_tuple)))
# the support is stored in T[0,:]
supp = T[0,:]
# and the distribution in T[1,:]
F = T[1,:]
The last step is to compare F to the cumulative distribution of the normal distribution. We need a continuity correction, for otherwise the normal distribution seems to be shifted to the right.
mu = dieN.mean
std = dieN.stdev
FN = norm(loc = mu , scale = std).cdf(supp + 0.5)
Let's compare.
print (np.max(np.abs(F-FN)))
# -> 0.0049414764166033631
This is truly astonishing. A 5-fold convolution is already this good.

Bullshit generator

Some years ago, out of desperation, I wrote a bullshit generator in Python.

Here is a typical sample of what it produces:


As a matter of fact, the engine is provided by the interface where the actor on top of a peer-to-peer interoperability under the specification derivation rule is performed by the available SSO. The timing thread is generated by a full XML when the Web 2.0 that executes the issuer is generated by the artifact. The world-leading Web browser orchestrates a SOAP message. An aware issuer is envisioned in the generic design operator, which competes with the factory, in the aggregator that is unaffected. The authorized entity provides an interface to the fat client. The engine updates the integration. The algorithm of the model that is efficient sent to a technology thin client, which triggers a cluster, is provided by the version. The approach is repeatedly required by the action timing security under a space when the ancillary owner generates the XML-based UML model.


The program works with a a small dictionary of words and a simplified English grammar (very close to the one wired in my brain, since I am not a native English speaker!). The words are classified in categories -nouns, verbs, etc-; to simplify, these words are chosen with equal probability, which was easily done with the built-in random.choice of Python (see TerminalNode class). For the grammar rules however, in order to avoid too long sentences, I put a probability weight on each derivation rule, so to make the simplest constructions the likeliest. Part of the complexity of the program was then in the algorithm to make the random choices according to the probability weights (see Node class).

In what follows, I show how to use Lea to greatly simplify the Node and TerminalNode classes. The statements to define English words and grammars are unchanged from the original version, excepting the swapping of probability weights and grammar rules to accommodate the Lea.pmf method. Note that Python's random module is no longer imported (although used by Lea internally).

'''python
======================================================================
 Bullshit Generator 
    by Pierre Denis, 2009, 2014
======================================================================
'''

# --------------------------------------------------
# grammar engine
# --------------------------------------------------

import lea

class Node(object):

    def setTermsChoices(self,*termsChoices):
        self.termsChoices = lea.pmf(termsChoices)

    def getWords(self):
        terms = self.termsChoices.random()
        for term in terms:
            if isinstance(term,str):
                yield term
            else:
                for word in term.getWords():
                    yield word

    def getString(self):
        res = " ".join(self.getWords())
        res = ", ".join(w.strip() for w in res.split(",") if w.strip())
        if res.endswith(", "):
            res = res[:-2]
        return res[0].upper() + res[1:] + "."


class TerminalNode(object):

    def __init__(self,*words):
        self.words = lea.vals(*words)

    def getWords(self):
        yield self.words.random()

# --------------------------------------------------
# grammar
# --------------------------------------------------

verb = TerminalNode(
    "accesses", "activates", "administrates", "aggregates", "builds",
    "calculates", "checks", "competes with", "completes", "complies with",
    "controls", "covers", "delivers", "dispatches", "eases", "encapsulates",
    "encompasses", "executes", "extracts", "features",
    "generates", "gets", "governs", "guides", "has", "increases",
    "inherits from", "is", "keeps track of", "leverages", "makes",
    "manages",
    "manages", "maximizes", "mitigates", "monitors", "must have", "needs",
    "offers", "opens", "operates on", "optimizes", "orchestrates",
    "overwrites", "performs", "populates", "precludes", "provides",
    "provides",
    "provides an interface to", "reads", "receives", "reduces",
    "reduces the need of", "registers", "regulates", "relies on",
    "requires",
    "resides on", "resides within", "retrieves", "retrieves the data in",
    "runs on",
    "schedules", "integrates with", "sends", "shall be",
    "shall have", "should be", "should have", "starts", "stores",
    "streamlines", "subscribes to", "subscribes to", "supersedes", "takes",
    "targets", "triggers", "updates", "validates", "writes")

passiveVerb = TerminalNode(
    "accessed by", "achieved by", "aggregated by", "applicable for",
    "asserted by", "authorized by",
    "based upon", "built from", "built upon", "collected by",
    "controlled by",
    "dedicated to", "deployed on", "derived from", "dispatched by",
    "driven by", "eased by", "enabled by", "envisioned in",
    "extracted from", "generated by", "in the scope of", "installed on",
    "integrated in",
    "located in", "managed by", "maximized by", "monitored by", "opened by",
    "optimized by", "orchestrated by", "packaged in", "performed by",
    "populated by", "processed by", "provided by", "provided by",
    "received by", "refreshed by", "registered in", "related to",
    "required by",
    "responsible for", "scheduled by", "sent to", "serialized by",
    "serialized in", "started in", "stored by", "stored in", "stored on",
    "the interface of", "updated by", "validated by")

aSimpleName = TerminalNode(
    "COTS", "GRID processing",
    "Java program", "LDAP registry", "Portal", "RSS feed", "SAML token",
    "SOAP message", "SSO", "TCP/IP", "UML model", "URL",
    "W3C", "Web", "Web 2.0", "Web browser", "Web page",
    "Web service", "back-end", "backbone", "bandwidth", "bean",
    "bridge", "browser", "bus", "business", "business model", "call",
    "catalogue", "class", "client", "cluster", "collection",
    "communication", "component", "compression",
    "concept", "conceptualization", "connection", "console", "content",
    "context", "cookie", "customization", "data", "database",
    "datastore", "deployment",
    "derivation rule", "design", "development", "device", "directory",
    "discovery", "dispatcher", "document", "domain", "factory",
    "fat client", "feature", "file", "form", "frame", "framework",
    "function", "gateway", "genericity", "geomanagement", "goal",
    "governance", "granularity", "guideline", "header", "key", "layer",
    "leader", "library", "link", "list", "log file", "logic",
    "look-and-feel",
    "manager", "market", "mechanism", "message", "meta-model",
    "metadata", "model", "modeling", "module", "network", "performance",
    "persistence", "personalization", "plug-in", "policy", "port",
    "portal", "practice",
    "presentation layer", "privacy", "private key", "procedure",
    "process", "processor", "processing", "product", "protocol",
    "recommendation",
    "registration", "registry", "relationship", "resource",
    "responsibility", "role",
    "rule", "scenario", "scenario", "scheduler", "schema", "security",
    "server", "service", "service provider", "servlet", "session",
    "skeleton", "software", "solution", "source", "space",
    "specification", "standard", "state", "statement", "streaming",
    "style sheet", "subscriber", "subsystem", "system", "system",
    "table", "target", "task", "taxonomy", "technique", "technology",
    "template", "thin client", "thread", "throughput", "timing", "tool",
    "toolkit", "topic", "unit", "usage", "use case", "user",
    "user experience", "validation", "value", "version", "vision", "work",
    "workflow")

anSimpleName = TerminalNode(
    "API", "IP address", "Internet", "UDDI", "XML", "XML file",
    "abstraction", "access", "acknowledgment", "action", "actor",
    "administrator", "aggregator", "algorithm", "application", "approach",
    "architecture", "artifact", "aspect", "authentication", "availability",
    "encapsulation", "end-point", "engine", "engine", "entity",
    "entity", "environment", "event", "identifier", "information",
    "integration", "interface", "interoperability", "issuer", "object",
    "ontology", "operation", "operator", "operator", "opportunity",
    "orchestration", "owner")

aAdjective = TerminalNode(
    "BPEL",  "DOM", "DTD", "GRID", "HTML", "J2EE",
    "Java", "Java-based", "Java-based", "UML", "SAX", "WFS", "WSDL",
    "basic", "broad", "bug-free",
    "business-driven", "client-side", "coarse", "coherent", "compatible",
    "complete", "compliant", "comprehensive", "conceptual", "consistent",
    "control", "controller", "cost-effective",
    "custom", "data-driven", "dedicated", "distributed", 
    "dynamic", "encrypted", "event-driven", "fine-grained", "first-class",
    "free", "full",
    "generic", "geo-referenced", "global", "global", "graphical",
    "high-resolution", "high-level", "individual", "invulnerable",
    "just-in-time", "key",
    "layered", "leading", "lightweight", "logical", "main", "major",
    "message-based",
    "most important", "multi-tiers", "narrow", "native", "next",
    "next-generation",
    "normal", "password-protected", "operational", "peer-to-peer",
    "performant", "physical",
    "point-to-point", "polymorphic", "portable", "primary", "prime",
    "private", "proven", "public", "raw", "real-time", "registered",
    "reliable", "remote",
    "respective", "right", "robust", "rule-based", "scalable", "seamless",
    "secondary", "semantic", 
    "server-side", "service-based", "service-oriented", "simple", "sole",
    "specific", "state-of-the-art", "stateless", "storage", "sufficient",
    "technical", "thread-safe", "uniform", "unique", "used", "useful",
    "user-friendly", "virtual", "visual", "web-based", "web-centric",
    "well-documented", "wireless", "world-leading", "zero-default")

anAdjective = TerminalNode(
    "AJAX", "OO", "XML-based", "abstract", "ancillary", "asynchronous",
    "authenticated", "authorized", "auto-regulated", "available", "aware",
    "efficient",
    "international", "interoperable", "off-line", "official", "online",
    "open", "operational",
    "other", "own", "unaffected", "up-to-date")

adverb = TerminalNode(
    "basically", "comprehensively", "conceptually", "consistently",
    "definitely", "dramatically",
    "dynamically", "expectedly", "fully", "generally", "generically",
    "globally", "greatly", "individually", "locally", "logically",
    "mainly", "mostly", "natively",
    "officially", "physically", "practically", "primarily",
    "repeatedly", "roughly", "sequentially", "simply", "specifically", 
    "surely", "technically", "undoubtly", "usefully", "virtually")

sentenceHead = TerminalNode(
    "actually", "as a matter of fact", "as said before", "as stated before",
    "basically", "before all", "besides this", "beyond that point",
    "clearly",
    "conversely", "despite these facts", "for this reason",
    "generally speaking",
    "if needed", "in essence", "in other words", "in our opinion",
    "in the long term", "in the short term", "in this case", "incidentally",
    "moreover", "nevertheless", "now", "otherwise", "periodically",
    "roughly speaking", "that being said", "then", "therefore",
    "to summarize", "up to here", "up to now", "when this happens")

(name, aName, anName, nameTail, adjective, nameGroup,
 simpleNameGroup, verbalGroup, simpleVerbalGroup, sentence,
 sentenceTail) = [Node() for i in range(11)]

aName.setTermsChoices(
    (( aSimpleName,      ), 50 ),
    (( aSimpleName, name ),  5 ),
    (( aSimpleName, name ),  8 ),
    (( aName, nameTail   ),  5 ))

anName.setTermsChoices(
    (( anSimpleName,      ), 50 ),
    (( anSimpleName, name ),  8 ),
    (( anName, nameTail   ),  5 ))

nameTail.setTermsChoices(
    (( "of", nameGroup        ), 2 ),
    (( "from", nameGroup      ), 2 ),
    (( "under", nameGroup     ), 1 ),
    (( "on top of", nameGroup ), 1 ))

name.setTermsChoices(
    (( aName,  ), 1 ),
    (( anName, ), 1 ))

adjective.setTermsChoices(
    (( aAdjective,  ), 1 ),
    (( anAdjective, ), 1 ))

nameGroup.setTermsChoices(
    (( simpleNameGroup,                                   ), 10 ),
    (( simpleNameGroup, passiveVerb, nameGroup            ),  1 ),
    (( simpleNameGroup, "that", simpleVerbalGroup         ),  1 ),
    (( simpleNameGroup, ", which", simpleVerbalGroup, "," ),  1 ))

simpleNameGroup.setTermsChoices(
    (( "the", name             ), 40 ),
    (( "the", adjective, name  ), 20 ),
    (( "a", aName              ), 10 ),
    (( "an", anName            ), 10 ),
    (( "a", aAdjective, name   ),  5 ),                
    (( "an", anAdjective, name ),  5 ))  

verbalGroup.setTermsChoices(
    (( verb, nameGroup                      ), 10 ),
    (( adverb, verb, nameGroup              ),  1 ),
    (( "is", passiveVerb, nameGroup         ), 10 ),
    (( "is", adverb, passiveVerb, nameGroup ),  1 ),
    (( "is", adjective                      ),  1 ),
    (( "is", adverb, adjective              ),  1 ))

simpleVerbalGroup.setTermsChoices(
    (( verb, simpleNameGroup ), 2 ),
    (( "is", adjective       ), 1 ))

sentence.setTermsChoices(
    (( nameGroup, verbalGroup                     ), 20 ),
    (( sentenceHead, "," , nameGroup, verbalGroup ),  4 ),
    (( sentence, sentenceTail                     ),  4 ))

sentenceTail.setTermsChoices(
    (( "in", nameGroup                 ), 12 ),
    (( "within", nameGroup             ),  5 ),
    (( "where", nameGroup, verbalGroup ),  5 ),
    (( "when", nameGroup, verbalGroup  ),  5 ),
    (( "because it", verbalGroup       ),  2 ),
    (( "; that's why it", verbalGroup  ),  1 ))

# --------------------------------------------------
# main program
# --------------------------------------------------
try:
    import win32com.client
    voice = win32com.client.Dispatch("sapi.SPVoice")
except:
    voice = None

print ("press <enter> to resume, 'q'+<enter> to quit\n")

while True:
    for i in range(8):
        generatedSentence = sentence.getString()
        print (generatedSentence)
        if voice:
            voice.speak(generatedSentence)
    # PY2: cmd = raw_input()        
    cmd = input()
    if cmd.strip().lower() == "q":
        break

Note: for Python 2 users, just replace the statement marked with # PY2:

You can run this program and verify that it produces the same kind of gobbledygook than the original version. Let him the last word:


The official operator is performed by the most important actor. The interoperability is off-line. The environment is authorized by the rule-based identifier that needs the availability thread. The W3C is located in a high-resolution style sheet because it is surely point-to-point. A session sends the use case owner that guides the asynchronous content. The approach aggregator, which shall have the algorithm, shall be an identifier. The OO link administrates a controller information that executes an identifier when the thread-safe opportunity engine user experience that populates a bridge roughly extracts the most important API. The environment is derived from the dispatcher from a unit acknowledgment of an interface.


Updated