Removing Lea.__getattribute__ (for Lea 4)

Issue #72 resolved
Pierre Denis repo owner created an issue

As discussed in #70, the Lea.__getattribute__ method entail performance issues for several calculation. It is proposed to remove this method and to adapt the rest of the code accordingly. This method is not part at all of Lea’s core algorithm. It brings just some syntactic sugar for two independent concerns:

  1. provide access to indicators methods (e.g. mean, var, std, entropy, etc ) without the need of putting parentheses (x.mean instead of x.mean(), etc.),
  2. easily build prob. distributions based on attributes of objects put as support of a prob. distribution. See for example the expression character.race here.

In the two cases, the method calls are hidden for the end-user, who is probably not aware of the calculations that take place.

Now, the added values of these constructs appear disputable, especially if they have noticeable side-effects on performance. So, it is sensible to remove this feature. The adaptation for the end-users of Lea are quite simple:

  1. put explicit call parentheses for indicator methods. Here is the full list: P, Pf, mean, mean_f, var, var_f, std, std_f, mode, entropy, rel_entropy, redundancy, information, support, p_sum, ps, pmf_tuple, pmf_dict, cdf_tuple, cdf_dict;
  2. replace implicit accesses to “user attributes” by explicit call to a new method (to be added), e.g. character.get_attr("race").

Since these changes are not backward compatible with Lea 3, this evolution should be done in a major version (Lea 4). This requires of course several updates in tutorial pages.

Note that a quick proof-of-concept test on the example provided by Paul Moore (#70) shows that the removal of this Lea.__getattribute__ method provides a significant gain in performance. It is expected to bring a global positive effect for many Lea expressions.

Comments (13)

  1. Paul Moore

    One thing that occurred to me - why are you using __getattribute__ here rather than __getattr__? The latter is only triggered on access to attributes/methods that don’t exist, which makes it a much better fit for the use case of exposing the underlying object’s attributes and methods. Conversely, __getattribute__ is called for every access, meaning that you have to take special account of when you’re accessing your own attributes.

    I don’t recall ever seeing __getattribute__ used in real-world code, whereas __getattr__ is relatively common.

    I’d suggest that it’s worth considering __getattr__ before completely removing the existing syntax sugar (which is convenient, and worth keeping if it can be done without excessive overhead).

  2. Pierre Denis reporter

    A very sensible comment, for sure!

    I was not aware of the __getattr__ method. Python is always richer than I could have imagined!

    So, the evolution that I've made is maybe too radical. Indeed, it becomes possible to have the best of both worlds. So I can rework a bit my last changes to keep evolution on point 1 (no implicit method calls), while rolling back the evolution on point 2 using __getattr__.

    If I've well understood your suggestion, I can do that very quickly on a new dev_remove_get_attribute2 branch, then (if OK) merging on dev_lea4.

  3. Pierre Denis reporter

    Done on dev_remove_get_attribute2 branch. This works as expected. I note just a marginal slow down on your "Markov 240" test. Here are the approximate figures on my PC:

    ~3'40'' on initial version Lea 3.4
    ~1'08'' on prev version Lea 4, without __getattr__ nor __getattribute__
    ~1'15'' on present version Lea 4, with just __getattr__.
    

    In my opinion, the tradeoff useability/performance for the latest version is acceptable.

  4. Pierre Denis reporter

    OK. We definitely agree on the conclusion!

    As a side note, note that I've repeated the performance tests several times today and I'm almost sure now that there is a small overhead when putting the __getattr__ method, even if it is never called (and I've checked it isn't in the present prime_prob.py test). I presume that the Python engine (Python 3.9 in my case) executes few opcodes to test the existence of this method. OK, this is just for completeness!

  5. Log in to comment