Add a means to generate additional random variables "on the fly"
In many situations simulating games, I need to (for example) roll extra dice depending on what I've already rolled. (For example, in Monopoly, you roll 2 dice, and if they are a pair, you roll again). This can be approximated by rolling as many dice as you might ever need and simply ignoring the unwanted ones, but the number of results rapidly gets very large that way, and the calculations get slow.
It would be helpful to be able to only generate results as needed. This may already be possible, but I couldn't find a way from looking through the documentation (so if it is possible, could an example be added to the docs? :-))
Comments (17)
-
repo owner -
reporter Ah, right - that's how you can use
if_
for something like this. I wondered if you could, but couldn't quite work out how. I'm going to have to ponder on it to understand completely. For example, why do you returndSum.getAlea()
rather than justdSum
?But regardless, thanks for the example.
-
repo owner You're welcome!
Well, some explanations on
getAlea()
are given here. The thing is that, in the present function, you could remove the.getAlea()
call and get the very same result! The only difference is the processing time, which should then increase dramatically asnbMaxThrows
grows. Why is it so? Let me try to explain on the examplerecDice(3)
:- without the
.getAlea()
call, we have "lazy evaluation"; firstly, a tree of depth 3 is built, with all the thrown dice as leaves; then, when the result is displayed, this tree is explored to get the resulting distribution : complexity = O(6**(#dice)) - with the
.getAlea()
call, therecDice(1)
,recDice(2)
,recDice(3)
are true distributions calculated in sequence, each one from the previous one; there are less cases to explore because the collapsing effect of summing dice (e.g. 2 dice -> 36 cases become 11 cases).
Sorry if all this sounds a bit convoluted but this is actually the very heart of Lea!
- without the
-
reporter Not at all, that's very helpful. I'll go and read the explanation on
getAlea
fully as well. I hadn't read through to the 3rd tutorial yet, as I had (mistakenly!) thought it was covering topics I wasn't yet interested in. -
repo owner - changed status to open
-
repo owner -
assigned issue to
-
assigned issue to
-
reporter I've been thinking some more about this. In general terms, I think that something like your
recDice
function, is probably sufficient in practice. The problem is that it's hard to write such functions. Specifically, my real use case with the Monopoly example needs to return a special result, "go to Gaol" if you roll three pairs in a row. It's not difficult to modify the function to do this, but it's quite hard to formulate it in a way that doesn't then try to add the special value into the sum. I managed it, as follows:def recMonopolyDice(nbMaxThrows): d1 = Lea.interval(1,6) d2 = d1.clone() dSum = d1 + d2 if nbMaxThrows == 1: dSum = Lea.if_(d1==d2,"Gaol",dSum) else: dSum = Lea.if_(d1==d2,(lambda x:Lea.if_(x=="Gaol",x,dSum+x))(recMonopolyDice(nbMaxThrows-1)),dSum) return dSum.getAlea()
but it's not pretty.
What I'm currently looking at is whether it's possible to write "helper" functions that would make writing this type of calculation more easily. It feels to me that the trick is to think in a "functional programming" style, rather than in imperative "do this N times, if such a thing happens then ..." terms. But I've not done much functional programming, so I'll have to do some research to work out how they would express a problem like this (I think they'd use a "right fold" somehow, but I need to work out what that means :-))
-
repo owner Nice! That's clever. I don't see a better way to propagate "Gaol".
Now, since "Gaol" is a kind of absorbing element for addition, I had the idea of another implementation with -infinity acting as "Gaol" :
__GAOL = -float("inf") def __recMonopolyDiceG(nbMaxThrows): d1 = Lea.interval(1,6) d2 = d1.clone() dSum = d1 + d2 if nbMaxThrows == 1: dSum = Lea.if_(d1==d2,__GAOL,dSum) else: dSum += Lea.if_(d1==d2,__recMonopolyDiceG(nbMaxThrows-1),0) return dSum.getAlea() def recMonopolyDiceG(nbMaxThrows): return __recMonopolyDiceG(nbMaxThrows).map(lambda v: 'Gaol' if v==__GAOL else v)
The
recMonopolyDiceG
is just facade function converting -infinity into the string "Gaol"; I checked that this function gives the same result as yours. For your information, you can use theequiv
method to make such check, e.g.recMonopolyDice(3).equiv(recMonopolyDiceG(3))
shall returnTrue
.OK, this alternative implementation is just a trick that moves the complexity. Having to program such function remains too complex for simple rules that any child can understand. I'm looking forward for your ideas.
-
reporter I've had a revelation, that I'd not really understood about
getAlea()
, even with your explanation above. The key problem is thatgetAlea()
creates an independent result, so (as a simple example)Lea.if_(d1 == d2, 'Gaol', (d1+d2).getAlea()).p(12)
is 5/216, rather than 0.As a result, if you're trying to trim the number of cases being calculated by using
getAlea()
you have to be very careful not to end up getting the wrong answer.You can get correct answers using
Lea.if_(d1 == d2, 'Gaol', (d1+d2).given(d1 != d2).getAlea())
but I can't work out (yet!) how to turn that into a usable shortcutting loop :-) -
repo owner Yes, you are on the right path to enlightenment :).
getAlea()
is something to use with caution because it breaks irreversibly any dependencies with other variables. Your examples are very good indeed. ThegetAlea()
method historically results from many design refactoring ; also, initially, I thought it was useful only as an internal "protected" method. Admittedly, this name is far to be self-explanatory. Better names could besolve
,fix
,settle
, ... -
reporter I'm now thinking that the problem is actually one of optimisation. Writing the "naive" function that builds the distribution without worrying about "rolling too many dice" is not that hard. But the O(6**nDice) time complexity is unacceptable.
Using a recursive function and
getAlea
is basically just trying to optimise the final calculation, but it gets into trouble when we (accidentally) callgetAlea
on something that isn't independent of other parts of the expression. I think (although I've not managed to write complete enough code to test this yet) that it's always safe to callgetAlea
on a subexpression that shares no variables with the rest of the main expression. So we could, in theory, write an optimiser that did this automatically.I'm trying to write a proof of concept of this. But I'm struggling at the moment to understand how the
Blea
class works, and in particular the role of the_ctxClea
attribute. This seems to be a reference to all of the variables used in theIlea
objects making up theBlea
- is that right? If so, I think it would be possible to look for individualIlea
children that used a subset of variables that weren't used elsewhere in theBlea
, and callgetAlea
on those to remove the dependency. I think I then need to fix up the_ctxClea
list to reflect that we no longer depend on those variables.Does that seem like a possible improvement to make? If so, am I right about how
_ctxClea
works? -
repo owner This is an original idea and, seemingly, a good idea! I'm a bit embarrassed because these are hard questions and I'm lacking time. I thought a bit on it today, I've several points in mind but I need more time to give sensible and understandable answers. In the meantime, I can try (!) to explain the story of Blea and
_ctxClea
(OK, you've already seen thatBlea.build
is, by far, the most tricky method of the package – well, it’s monstrous actually!).The job of Blea is to build conditional prob. table (CPT) and evaluate them under given conditions (or "evidences" or "assumptions" or …). These are used for Bayesian nets (BN) and for
Lea.if_
method that you know (which is a sort of minimal BN). It is build up from clauses <if C1 then X1>, <if C2 then X2>, etc. where Ci and Xi are Lea instances that could have interdependences. The evaluation algorithm browses the combination of values of independent variable (Alea instances) one by one, check for each combination which Ci is True, collects value-probability pairs of corresponding Xi and aggregates all this pairs. In many CPT, this produces the good results. However, if some variable Z is referred in some clause (say C1 or X1) but not in other clauses (say C2 or X2), then there is an imbalance in the probability weights and the results are wrong (see#9and#11– Leapp is just there as a syntactic sugar to replaceBlea.build
). Here comes_ctxClea
… It’s there just to fix the potential problem by gathering any such Z variable and be sure its probability weights are multiplied in all clauses. Now, that’s not yet the end of the story : a user doing complex BN was complaining about the very bad performance to build his BN. One of the reason was linked to this_ctxClea
thing, which was anyway superfluous in the way the BN was built. Then, I’ve added this optionalctxType
argument to let the caller have a lever to accelerate the BN building by removing the_ctxClea
treatment. Another lever was thecheck
argument that allows skipping expensive checks. OK, it’s a long story… let’s stop here!Now, if you want to really understand all this so as to develop your idea, I do advise you to have a look first in
Alea._genVPs
method; there, you can understand how Lea manages the dependencies between variables, thanks to Python generators (the "Statues" algorithm). It's short, well documented and I think that you'll find there the very idea of Lea!I'll try to give answers to your other points later.
-
repo owner Addendum: if you look in
Alea._genVPs
doc, please replace"(...) the method calls the _genVPs method implemented in Lea subclasses;"
which is wrong, by
"(...) the present Alea._genVPs method is called by the _genVPs methods implemented in other Lea subclasses; these methods are themselves called by Lea.new and, indirectly, by Lea.getAlea"
-
reporter I'm a bit embarrassed because these are hard questions and I'm lacking time
Not a problem. I'm very aware that I've been asking a load of questions in my enthusiasm. Your answers are much appreciated. I've been looking through the sources now and that's helping me a lot, so I'll continue doing that and try to keep the questions to a minimum :-)
What I'm hoping to do is to develop at least a prototype optimiser, and I'll see how well it works in practice. (The examples I have are all fairly complex game simulations, based on examples a friend used for a Monte Carlo simulator he wrote, so they are pretty good for stress-testing something like Lea - it's quite probable that some of the examples are too complex for any sort of analytical solution, but I don't intend to give up easily :-)) In the process of doing that I'll probably want to write some tests, so I may also work on putting together at least the basis of a test suite, which I know is one of your open issues. If I do, I'll likely use
tox
andpy.test
as that's the test framework I'm most comfortable with - would that be OK with you? -
repo owner Great. No problem at all!
-
repo owner - changed status to resolved
-
repo owner - changed status to closed
- Log in to comment
Well, the problem is not that easy to model. The first issue is that such special rules can lead to infinite distributions (imagine you roll pairs over and over!). Such infinite distributions cannot be handled by Lea, nor any package. Now, if you accept to put halting conditions, then this can be handled using a loop or a recursive function. For example, here is a function you can use for the Monopoly example, provided that you put a limit on number of throws:
Calling
recDice(3)
shall return the expected distribution if no more than 3 throws is allowed. The crux of the function is the statement withLea.if_(…)
, which takes the current dice result and adds, if the dice show the same value, the result of the same process removing one throw on the max (recursion); if the dice values differ, then the value added is 0.Admittedly, despite the conciseness, such function is not easy to craft! If you think it's helpful, I'll add this in the example page.