Add a function like "map", but where the called function returns a distribution

Issue #33 closed
Paul Moore created an issue

As an example, consider rolling a die, and then rolling N dice (where N is the number you rolled on the first die). You want to know the total of that second roll. (This example is inspired by the rules of a game called "To Court the King").

So you want is, in effect

first = die(6)
result = dice(first,6)

but this doesn't work, of course, because first is a distribution, not a number.

I have been using the following function:

def mapdist(dist, fn):
    """Call fn on each value of a distribution, and merge the results.

    fn should take a single value as argument, and return a distribution."""
    vps = []
    for v, p in dist.vps():
        for v1, p1 in fn(v).vps():
            vps.append((v1, p * p1))
    return Lea.fromValFreqs(*vps)

I'm not entirely sure I like the name, but the idea seems sound. It seems to work on this example, at least, although a more complex example I'm working on:

dist = Lea.if_((hits == N) || (left == 0), sofar + hits,
                    mapdist(left, lambda l: rest(Lea.binom(l, 1, 6), l, sofar + hits))

is failing with an error that Lea.binom() needs a strictly positive integer (which means that this case wasn't filtered out by the first part of the if_. I'm not sure if that's the fault of my implementation (and it should work) or if I'm musunderstanding how if_ works.

Regardless of these details, I think the mapdist function would be a useful addition to the library.

Comments (11)

  1. Pierre Denis repo owner

    Yes, indeed! I put hereafter the answer I was preparing just before I saw your last comment.

    I understand the need but take care that your implementation of mapdist is wrong! You can see that by modelling "To Court the King" as

    mapdist(die(6),lambda v: dice(v,6))
    

    With this distribution, you'll see that the probability to get 1 is 1/55986. This is wrong since the right value is actually 1/36. The reason of this error lies in the fact that, in Lea, probabilities are stored as integer weights, not floating-point numbers. Your implementation would be right with classical float probabilities but it is faulty here because you mix distributions having different total probability weights. For example, the total weight of one die is 6 while it is 36 for two dice. Adding these probability weights without a correcting factor entails a bias in favor of the distributions having the biggest total weights.

    OK, now the good news... There exists already a method in Lea for this need. This method, named flat, is not documented however in the wiki pages (very, very sorry for that!). This method works on a prob. distribution that contains itself prob. distributions as values. For example, you can model "To Court the King" as a bag containing 6 boxes of equal size, each box containing 1, 2, ..., 6 dice. Then you chose randomly a box in the bag and you throw the dice it contains. Here is the translation in Lea (the "bag" is the object on which flat() applies):

    Lea.fromSeq(dice(n,6) for n in range (1,7)).flat()
    

    The result is a distribution of numbers from 1 to 36, with the expected probabilities.

    Now, if you prefer mapdist function, here is how you can implement it the right way:

    def mapdist(dist, fn):
        return dist.map(fn).getAlea().flat()
    

    I'll try to answer the second point in my next comment... Stay tuned!

  2. Pierre Denis repo owner

    I come back to the second issue you mention (dist variable). As a minor remark, the || in your snippet should be |; It's invalid syntax for Python so I presume it's just a typo when you wrote your post.

    OK, I'm happy that you now know the dangers of getAlea() method, as you wrote in issue #31. This is again the troublemaker here. In your implementation of mapdist, independently of the problem I mention above, there is another issue: you use vps()method, which itself calls... getAlea() ! Also, the Lea.fromValFreqs returns an Alea instance. That being said, my implementation of mapdist (see above) will fail also since it contains getAlea() (this is needed to avoid an exception, a bit surprising for me). So, all these methods, yours and mine, are doomed to failure for your Lea.if_ expression.

    I am working on that : I'm updating the Rlea class (that implements flat()) so as mapdist can be implemented without any getAlea() call; this would hopefully solve your issue. I'll let you know when this is ready (Lea 2.2.0.beta-8).

  3. Pierre Denis repo owner

    Simplify Rlea class implementation to put correcting factors in a tuple instead of a dictionary; this allows calling 'flat()' method on non-Alea instances (refs #33)

    → <<cset 1bb4e1a8f2d3>>

  4. Paul Moore reporter

    Thank you for the explanations. As usual, they are really helpful and give me lots of food for thought. I really appreciate your patience and help.

  5. Pierre Denis repo owner

    You're welcome. It's a great pleasure for me to get such interesting and challenging feedback! :)

    Now, some details on last Lea version 2.2.0-beta.8 (available on PyPI). As announced, I've changed a bit the way flat is implemented. You can now replace your calls

    mapdist(dist, lambda ...)
    

    by this:

    dist.map(lambda ...).flat()
    

    There is no getAlea hidden in this expression; so the advanced constructions (e.g. using given or if_ methods) should work now.

  6. Pierre Denis repo owner

    Simplify Rlea class implementation to put correcting factors in a tuple instead of a dictionary; this allows calling 'flat()' method on non-Alea instances (refs #33)

    → <<cset ff2830a78041>>

  7. Log in to comment