Add a function like "map", but where the called function returns a distribution
As an example, consider rolling a die, and then rolling N dice (where N is the number you rolled on the first die). You want to know the total of that second roll. (This example is inspired by the rules of a game called "To Court the King").
So you want is, in effect
first = die(6)
result = dice(first,6)
but this doesn't work, of course, because first
is a distribution, not a number.
I have been using the following function:
def mapdist(dist, fn):
"""Call fn on each value of a distribution, and merge the results.
fn should take a single value as argument, and return a distribution."""
vps = []
for v, p in dist.vps():
for v1, p1 in fn(v).vps():
vps.append((v1, p * p1))
return Lea.fromValFreqs(*vps)
I'm not entirely sure I like the name, but the idea seems sound. It seems to work on this example, at least, although a more complex example I'm working on:
dist = Lea.if_((hits == N) || (left == 0), sofar + hits,
mapdist(left, lambda l: rest(Lea.binom(l, 1, 6), l, sofar + hits))
is failing with an error that Lea.binom()
needs a strictly positive integer (which means that this case wasn't filtered out by the first part of the if_
. I'm not sure if that's the fault of my implementation (and it should work) or if I'm musunderstanding how if_
works.
Regardless of these details, I think the mapdist
function would be a useful addition to the library.
Comments (11)
-
reporter -
repo owner Yes, indeed! I put hereafter the answer I was preparing just before I saw your last comment.
I understand the need but take care that your implementation of
mapdist
is wrong! You can see that by modelling "To Court the King" asmapdist(die(6),lambda v: dice(v,6))
With this distribution, you'll see that the probability to get 1 is 1/55986. This is wrong since the right value is actually 1/36. The reason of this error lies in the fact that, in Lea, probabilities are stored as integer weights, not floating-point numbers. Your implementation would be right with classical float probabilities but it is faulty here because you mix distributions having different total probability weights. For example, the total weight of one die is 6 while it is 36 for two dice. Adding these probability weights without a correcting factor entails a bias in favor of the distributions having the biggest total weights.
OK, now the good news... There exists already a method in Lea for this need. This method, named
flat
, is not documented however in the wiki pages (very, very sorry for that!). This method works on a prob. distribution that contains itself prob. distributions as values. For example, you can model "To Court the King" as a bag containing 6 boxes of equal size, each box containing 1, 2, ..., 6 dice. Then you chose randomly a box in the bag and you throw the dice it contains. Here is the translation in Lea (the "bag" is the object on whichflat()
applies):Lea.fromSeq(dice(n,6) for n in range (1,7)).flat()
The result is a distribution of numbers from 1 to 36, with the expected probabilities.
Now, if you prefer
mapdist
function, here is how you can implement it the right way:def mapdist(dist, fn): return dist.map(fn).getAlea().flat()
I'll try to answer the second point in my next comment... Stay tuned!
-
repo owner - changed status to open
-
repo owner I come back to the second issue you mention (
dist
variable). As a minor remark, the||
in your snippet should be|
; It's invalid syntax for Python so I presume it's just a typo when you wrote your post.OK, I'm happy that you now know the dangers of
getAlea()
method, as you wrote in issue#31. This is again the troublemaker here. In your implementation ofmapdist
, independently of the problem I mention above, there is another issue: you usevps()
method, which itself calls...getAlea()
! Also, theLea.fromValFreqs
returns an Alea instance. That being said, my implementation ofmapdist
(see above) will fail also since it containsgetAlea()
(this is needed to avoid an exception, a bit surprising for me). So, all these methods, yours and mine, are doomed to failure for yourLea.if_
expression.I am working on that : I'm updating the Rlea class (that implements
flat()
) so asmapdist
can be implemented without anygetAlea()
call; this would hopefully solve your issue. I'll let you know when this is ready (Lea 2.2.0.beta-8). -
repo owner -
assigned issue to
-
assigned issue to
-
repo owner Simplify Rlea class implementation to put correcting factors in a tuple instead of a dictionary; this allows calling 'flat()' method on non-Alea instances (refs
#33)→ <<cset 1bb4e1a8f2d3>>
-
reporter Thank you for the explanations. As usual, they are really helpful and give me lots of food for thought. I really appreciate your patience and help.
-
repo owner You're welcome. It's a great pleasure for me to get such interesting and challenging feedback! :)
Now, some details on last Lea version 2.2.0-beta.8 (available on PyPI). As announced, I've changed a bit the way
flat
is implemented. You can now replace your callsmapdist(dist, lambda ...)
by this:
dist.map(lambda ...).flat()
There is no
getAlea
hidden in this expression; so the advanced constructions (e.g. usinggiven
orif_
methods) should work now. -
repo owner - changed status to resolved
-
repo owner - changed status to closed
-
repo owner Simplify Rlea class implementation to put correcting factors in a tuple instead of a dictionary; this allows calling 'flat()' method on non-Alea instances (refs
#33)→ <<cset ff2830a78041>>
- Log in to comment
Hmm, reading the code, is this what
dist.map(f).flat()
would do?