# Add a function like "map", but where the called function returns a distribution

Issue #33 closed
Paul Moore
created an issue

As an example, consider rolling a die, and then rolling N dice (where N is the number you rolled on the first die). You want to know the total of that second roll. (This example is inspired by the rules of a game called "To Court the King").

So you want is, in effect

```first = die(6)
result = dice(first,6)
```

but this doesn't work, of course, because `first` is a distribution, not a number.

I have been using the following function:

```def mapdist(dist, fn):
"""Call fn on each value of a distribution, and merge the results.

fn should take a single value as argument, and return a distribution."""
vps = []
for v, p in dist.vps():
for v1, p1 in fn(v).vps():
vps.append((v1, p * p1))
return Lea.fromValFreqs(*vps)
```

I'm not entirely sure I like the name, but the idea seems sound. It seems to work on this example, at least, although a more complex example I'm working on:

```dist = Lea.if_((hits == N) || (left == 0), sofar + hits,
mapdist(left, lambda l: rest(Lea.binom(l, 1, 6), l, sofar + hits))
```

is failing with an error that `Lea.binom()` needs a strictly positive integer (which means that this case wasn't filtered out by the first part of the `if_`. I'm not sure if that's the fault of my implementation (and it should work) or if I'm musunderstanding how `if_` works.

Regardless of these details, I think the `mapdist` function would be a useful addition to the library.

1. reporter

Hmm, reading the code, is this what `dist.map(f).flat()` would do?

2. repo owner

Yes, indeed! I put hereafter the answer I was preparing just before I saw your last comment.

I understand the need but take care that your implementation of `mapdist` is wrong! You can see that by modelling "To Court the King" as

```mapdist(die(6),lambda v: dice(v,6))
```

With this distribution, you'll see that the probability to get 1 is 1/55986. This is wrong since the right value is actually 1/36. The reason of this error lies in the fact that, in Lea, probabilities are stored as integer weights, not floating-point numbers. Your implementation would be right with classical float probabilities but it is faulty here because you mix distributions having different total probability weights. For example, the total weight of one die is 6 while it is 36 for two dice. Adding these probability weights without a correcting factor entails a bias in favor of the distributions having the biggest total weights.

OK, now the good news... There exists already a method in Lea for this need. This method, named `flat`, is not documented however in the wiki pages (very, very sorry for that!). This method works on a prob. distribution that contains itself prob. distributions as values. For example, you can model "To Court the King" as a bag containing 6 boxes of equal size, each box containing 1, 2, ..., 6 dice. Then you chose randomly a box in the bag and you throw the dice it contains. Here is the translation in Lea (the "bag" is the object on which `flat()` applies):

```Lea.fromSeq(dice(n,6) for n in range (1,7)).flat()
```

The result is a distribution of numbers from 1 to 36, with the expected probabilities.

Now, if you prefer `mapdist` function, here is how you can implement it the right way:

```def mapdist(dist, fn):
return dist.map(fn).getAlea().flat()
```

I'll try to answer the second point in my next comment... Stay tuned!

3. repo owner

I come back to the second issue you mention (`dist` variable). As a minor remark, the `||` in your snippet should be `|`; It's invalid syntax for Python so I presume it's just a typo when you wrote your post.

OK, I'm happy that you now know the dangers of `getAlea()` method, as you wrote in issue #31. This is again the troublemaker here. In your implementation of `mapdist`, independently of the problem I mention above, there is another issue: you use `vps()`method, which itself calls... `getAlea()` ! Also, the `Lea.fromValFreqs` returns an Alea instance. That being said, my implementation of `mapdist` (see above) will fail also since it contains `getAlea()` (this is needed to avoid an exception, a bit surprising for me). So, all these methods, yours and mine, are doomed to failure for your `Lea.if_` expression.

I am working on that : I'm updating the Rlea class (that implements `flat()`) so as `mapdist` can be implemented without any `getAlea()` call; this would hopefully solve your issue. I'll let you know when this is ready (Lea 2.2.0.beta-8).

4. repo owner

Simplify Rlea class implementation to put correcting factors in a tuple instead of a dictionary; this allows calling 'flat()' method on non-Alea instances (refs #33)

→ <<cset 1bb4e1a8f2d3>>

5. reporter

Thank you for the explanations. As usual, they are really helpful and give me lots of food for thought. I really appreciate your patience and help.

6. repo owner

You're welcome. It's a great pleasure for me to get such interesting and challenging feedback! :)

Now, some details on last Lea version 2.2.0-beta.8 (available on PyPI). As announced, I've changed a bit the way `flat` is implemented. You can now replace your calls

```mapdist(dist, lambda ...)
```

by this:

```dist.map(lambda ...).flat()
```

There is no `getAlea` hidden in this expression; so the advanced constructions (e.g. using `given` or `if_` methods) should work now.