- edited description
Removing Lea.__getattribute__ (for Lea 4)
As discussed in #70, the Lea.__getattribute__
method entail performance issues for several calculation. It is proposed to remove this method and to adapt the rest of the code accordingly. This method is not part at all of Lea’s core algorithm. It brings just some syntactic sugar for two independent concerns:
- provide access to indicators methods (e.g. mean, var, std, entropy, etc ) without the need of putting parentheses (
x.mean
instead ofx.mean()
, etc.), - easily build prob. distributions based on attributes of objects put as support of a prob. distribution. See for example the expression
character.race
here.
In the two cases, the method calls are hidden for the end-user, who is probably not aware of the calculations that take place.
Now, the added values of these constructs appear disputable, especially if they have noticeable side-effects on performance. So, it is sensible to remove this feature. The adaptation for the end-users of Lea are quite simple:
- put explicit call parentheses for indicator methods. Here is the full list:
P
,Pf
,mean
,mean_f
,var
,var_f
,std
,std_f
,mode
,entropy
,rel_entropy
,redundancy
,information
,support
,p_sum
,ps
,pmf_tuple
,pmf_dict
,cdf_tuple
,cdf_dict
; - replace implicit accesses to “user attributes” by explicit call to a new method (to be added), e.g.
character.get_attr("race")
.
Since these changes are not backward compatible with Lea 3, this evolution should be done in a major version (Lea 4). This requires of course several updates in tutorial pages.
Note that a quick proof-of-concept test on the example provided by Paul Moore (#70) shows that the removal of this Lea.__getattribute__
method provides a significant gain in performance. It is expected to bring a global positive effect for many Lea expressions.
Comments (13)
-
reporter -
reporter - edited description
-
reporter Remove Lea.getattribute method (refs
#72)→ <<cset 2baa91b51d9a>>
-
reporter Improve comments (refs
#72)→ <<cset 77222c0bf48b>>
-
reporter - changed status to open
-
reporter -
assigned issue to
-
assigned issue to
-
One thing that occurred to me - why are you using
__getattribute__
here rather than__getattr__
? The latter is only triggered on access to attributes/methods that don’t exist, which makes it a much better fit for the use case of exposing the underlying object’s attributes and methods. Conversely,__getattribute__
is called for every access, meaning that you have to take special account of when you’re accessing your own attributes.I don’t recall ever seeing
__getattribute__
used in real-world code, whereas__getattr__
is relatively common.I’d suggest that it’s worth considering
__getattr__
before completely removing the existing syntax sugar (which is convenient, and worth keeping if it can be done without excessive overhead). -
reporter A very sensible comment, for sure!
I was not aware of the
__getattr__
method. Python is always richer than I could have imagined!So, the evolution that I've made is maybe too radical. Indeed, it becomes possible to have the best of both worlds. So I can rework a bit my last changes to keep evolution on point 1 (no implicit method calls), while rolling back the evolution on point 2 using
__getattr__
.If I've well understood your suggestion, I can do that very quickly on a new dev_remove_get_attribute2 branch, then (if OK) merging on dev_lea4.
-
reporter Put Lea.getattr method for retrieving previous behaviour without performance loss (refs
#72)→ <<cset 3d7fa47c00b2>>
-
reporter Done on dev_remove_get_attribute2 branch. This works as expected. I note just a marginal slow down on your "Markov 240" test. Here are the approximate figures on my PC:
~3'40'' on initial version Lea 3.4 ~1'08'' on prev version Lea 4, without __getattr__ nor __getattribute__ ~1'15'' on present version Lea 4, with just __getattr__.
In my opinion, the tradeoff useability/performance for the latest version is acceptable.
-
Agreed. The difference between 1’08 and 1’15 is noise IMO. Both far better than 3’40.
-
reporter OK. We definitely agree on the conclusion!
As a side note, note that I've repeated the performance tests several times today and I'm almost sure now that there is a small overhead when putting the
__getattr__
method, even if it is never called (and I've checked it isn't in the present prime_prob.py test). I presume that the Python engine (Python 3.9 in my case) executes few opcodes to test the existence of this method. OK, this is just for completeness! -
reporter - changed status to resolved
- Log in to comment