The given method interacts weirdly with the times method
The following behaviour seems wrong.
>>> d6 = leaf.die()
>>> d6.given(d6==1)
1 : 1.0
>>> d6.given(d6==1).times(3)
3 : 0.004629629629629629
Note in particular that the total probability of the final expression is not equal to 1!
There is also:
>>> d6.times(3).given(d6==1)
3 : 0.0046296296296296285
4 : 0.013888888888888886
5 : 0.027777777777777773
6 : 0.04629629629629629
7 : 0.06944444444444443
8 : 0.09722222222222221
9 : 0.1157407407407407
10 : 0.12499999999999999
11 : 0.12499999999999999
12 : 0.11574074074074073
13 : 0.09722222222222222
14 : 0.06944444444444443
15 : 0.04629629629629629
16 : 0.027777777777777773
17 : 0.013888888888888886
18 : 0.0046296296296296285
In general, I don't think the times()
method is documented sufficiently to explain what is going on here. I've read the "Statues" paper and the source code, so I know this is logically something like
Condition(d6==1, Function(+, [d6, Function(+, [d6, d6])]))
(I'm ignoring the optimisation in the Flea2
class, although it may well be the cause of the unexpected behaviour...)
Which should (IMO) evaluate as 3: 100% because the d6==1 fixes the d6 RV for the expression.
Whereas d6.given(d6==1).times(3)
evaluates as
Function(+, [
Condition(d6==1, d6),
Function(+, [Condition(d6==1, d6), Condition(d6==1, d6)])
])
which should give the same result, but much less efficiently...
But the docs cover none of this. And the implementation seems to be broken.
As a side note, is there any possibility that you might consider moving Lea to github? It's entirely your choice, as it's your project, but the Bitbucket issue tracker seems to be completely broken on Firefox, as I can't raise issues with that browser. I've no idea why, or even if it's something at my end, but it makes interacting with the project rather frustrating :-(
Comments (15)
-
-
Update - on reflection, it’s obvious that the
d6.times(3).given(d6==1)` case wouldn't work, as `times has to clone its argument in order to do its job. So that's really only a documentation issue, to make that clearer.
Hmm, sorry about the formatting. Bitbucket on Firefox is screwing up again
-
repo owner - changed status to open
Indeed, for the first case
d6.given(d6==1).times(3)
returns an unnormalized distribution, which is admittedly weird (the present "probability" is actually equal to 1/6**3). Actually, this is due to the optional
times
argumentnormalization=False
. The default should beTrue
, definitely! This shall be corrected. In the meantime, the workaround is:>>> d6.given(d6==1).times(3, normalization=True) 3 : 1.0
About the second case, yes, you’ve well understood, the issue. What happens under the hood for evaluating this expression is:
((d6.times(3)).calc()).given(d6==1)
So, there is no binding possible on
d6
after the inner distribution has been evaluated.This is a recurring issue on lea: several functions that optimize calculations, break the referential consistency. The present case should indeed be better documented to avoid bad surprise.
Thanks for reporting these important issues. The code and doc will be fixed in next version.
-
repo owner -
assigned issue to
-
assigned issue to
-
repo owner BTW... Sad to hear all these annoyances with issue mgt on Bitbucket! :( I don't plan to migrate to GitHub, unless absolute necessity. The project already suffered two migrations (GoogleCode -> Bitbucket Hg> Bitbucket Git) and it's a bit painful for me.
Be sure that your contributions are highly appreciated, whatever the glitches you encounter!
-
Oh, that’s a lot simpler than I thought. It hadn’t occurred to me that it was simply unnormalised, I read the result and in my head I thought it meant that 99.6% of results were “something else” that wasn’t specified. Which of course makes no sense now that I think of it.
I wouldn’t have made that mistake with a result that was unnormalised because it had weights rather than fractions - I would have quite happily read “3: 12” as “3 occurring 12 times out of 12”. I suspect that says more about psychology than about maths or computing, though! I’ve been doing work with weighted results, so I’m used to seeing unnormalised PMF-like displays with weights, so I guess I’ve trained myself to see “integer = weight, fraction/decimal = probability”.
I’m glad it’s a simple fix - I tried reading the code to see if I could spot the problem, and I got lost very fast! The normalisation stuff is complicated.
-
Regarding referential consistency, it’s definitely a place where the feature is really powerful, but fitting it into Python syntax can be frustrating. Something like
(d1+d2+d3+d4+d5+d6).given((d1==d2) & (d2 == d3) & (d3 == d4) & (d4 == d5) & (d5 == d6))
really obscures the intention - particularly if you want an arbitraty number of dice. And you can’t usetimes()
as noted above - I wondered about having a way to access the generated RVs, but I couldn’t think of anything that wouldn’t be a horrible abuse of Python’s object model.Maybe using lots of helper functions might make a difference here. A friend has a bunch of probability problems that he’s been solving with a system of his own design. I’m thinking of working through them all with lea, and I think that might give me some ideas on useful higher level abstractions.
I don't plan to migrate to GitHub, unless absolute necessity.
No worries, whatever works for you makes sense. I can work around it.
-
repo owner About your penultimate comment, that’s an interesting analysis for sure. So, the present result, although weird, is not completely absurd. And this is confirmed indeed if you do this:
>>> P(d6.given(d6==1).times(3) == 3) 1.0
-
repo owner About your latest comment, some helper functions are probably still needed to help the expression of some problems. Note that for the precise present example you provide, I invite you to check out some recent features (lea 4.3.2). You can do this:
>>> (d1+d2+d3+d4+d5+d6).given(all_equal(d1,d2,d3,d4,d5,d6))
and even this:
>>> dice = (d1,d2,d3,d4,d5,d6) >>> sum(dice).given(all_equal(*dice))
Beside the convenience for writing, these functions should also accelerate the calculations.
-
repo owner Correct default value of normalization argument of Lea.times method (refs
#68)→ <<cset f25e2c8b85cb>>
-
Nice! And I like that these are static methods, so they are things that could be written by a 3rd party (even if you’d need to understand the underlying classes). One thing that I find it difficult to work out from just reading the documentation, is how to distinguish between what is “core Lea” (the stuff that has to be part of the library) and what is “helpers” (things like the functions in
leaf
, for example). There’s a rather large “grey area” with things liketimes
, which is in theory just a helper, but in practice has the nice “Russian Peasant” divide and conquer algorithm. -
repo owner Improve doc of Lea.times + TU (refs
#68)→ <<cset abe7955397b1>>
-
repo owner Indeed, the core functions are somehow mixed with helper ones, and the frontier is not always clear. I've made MicroLea some years ago, with the goal of showing just the core Statues algorithm, as support to my paper. I doubt however that this can help you here because the names of MicroLea's methods are different from Lea's.
-
repo owner For the rest, I've corrected the last problems and just published Lea 3.4.3.
Also, I've added some warning in the docstring of
times
and on the wiki page Tutoral 1/3 ( at the end of the section ontimes
and at the end of the section ongiven
).Don't hesitate to let me know if you judge that this documentation can be improved.
-
repo owner - changed status to resolved
Fixed in Lea 3.4.3.
- Log in to comment
Grr, on Edge, it looks like Bitbucket won’t recognise that I’m logged in. It’s me again