Restructuring the code (for Lea 4)

Issue #71 resolved
Paul Moore created an issue

One of the frustrations I find with Lea is the way the imports work. As someone who works heavily with packaging, maybe I’m over-sensitive to these matters, and I don’t think that in practical terms it matters that much, but I think the following changes would make things a lot cleaner.

  1. Make the various static methods standalone functions - there’s no benefit to them being static methods that I can see, and it makes it a lot harder to “explore” the API via things like dir(lea_instance) when faced with a long list of methods that are exposed (via the aliases in __init__.py) as standalone functions.
  2. Remove the imports of the various Lea subclasses in lea.py. As you note, it makes for nasty circular dependencies, as well as not being very discoverable (no-one thinks to scroll down to the bottom of the file looking for imports). There’s a problem with defining _lea_leaf_classes, but that could be fixed by making it an initially empty list at the top of lea.py, and adding the classes to it as they are defined in their respective modules.
  3. Put the call to set_prob_type, which sets the default, in __init__.py rather than lea.py, as people expect to find defaults there.
  4. Put the public names in each module (which includes any static methods refactored as functions) in an __all__ list, and re-export them in __init__.py using from .alea import *. This avoids needing to maintain an explicit list of aliases in __init__.py
  5. Make alternative constructors like lea.markov.Chain.from_matrix() into classmethods, and use the cls argument to create the resulting object rather than hard-coding Chain. This makes the methods subclass-friendly. (I can’t see why you’d subclass, but someone is bound to come up with a reason!) Also, remove the aliases chain_from_* and make the normal way of creating chains the classmethods (or the normal constructor, which frankly is the one I find most useful) - so c = Chain.from_matrix(...) rather than c = chain_from_matrix(). It’s no more to type, and a much more common pattern.

None of these are critical, but they feel like they would make the codebase more understandable.

I could try to do some of that refactoring, but I have no idea how to create a pull request in bitbucket, and in fact it looks like I don’t even have permission to do so. If it’s useful, I could create a fork of the project in github, and do the refactorings there. But I’m not sure how you would get those changes back into the repo here (although I guess pulling from github locally and merging is possible).

I feel like there are some further improvements that are more a matter of documenting things more precisely, but the docs are meant as a tutorial, and that affects how you present things, plus I’m not even sure how I would go about documenting things differently. At some point I’ll probably write up some “what I learned working with Lea” notes, and if you find them useful, you’re welcome to add them to the docs. Otherwise, I’ll avoid picking fault when I can’t offer any useful suggestions for improvement :-)

Comments (18)

  1. Pierre Denis repo owner
    • changed status to open

    You are for sure more knowledgeable the me on these topics, Paul. Also, as developer, I lack the point of view of Lea’s users. Your comments and propositions are therefore much welcome. Lea 4 is a good opportunity to get rid of these annoyances and odd constructs.

    The five points are very fine (and very convincing) for me. Let’s go for it!

    For the development, the best for me would be that you manage to create a pull request in Bitbucket. I have just a free account on Bitbucket but, in the past, I was able to merge pull requests from other users. Have you an account on Bitbucket (I’ve made searches with your name without success)? If it helps, I’m willing to add you as developer in Lea. Now, if things are becoming too difficult, I think I can do these evolutions myself, with your reviews.

    Anyway, I will create soon a branch dev_lea4, which will be fed also by other forthcoming developments (#72, #73 and maybe others).

    For the documentation, this is an intriguing point. I’m very sensitive to this subject. I’ve passed more time on documenting (especially tutorials) than on strict coding/testing. I’m willing to improve things, no doubt. Don’t hesitate to provide critics even if you have no clear suggestions. At least, I will be more aware…

  2. Paul Moore reporter

    I’ll see what I can do with a pull request - I did use Bitbucket previously (when it was Mercurial based) but I never really liked the interface and I once managed to delete the wrong repo (deleting the main project repo rather than my fork of it, which was really bad) so I removed everything of mine from here so I wouldn’t do that again!

    I have a clone of Lea locally, I’ll try working on that, maybe putting a copy on my github account so I can have a public branch. We can work on merging the changes once they are done.

    And thanks for the kind words - please don’t hesitate to say if you disagree with any of my suggestions, though - it’s your project and you should definitely set the style how you want.

  3. Pierre Denis repo owner

    Sad story about your lost repo!

    That's fine for the rest. We'll manage to do it, whatever the means.

    On my side, I've created a branch dev_lea4. I'll work first on removal of __getatttribute__ (#72).

  4. Paul Moore reporter

    Hi.

    I started having a look at the refactoring process, but I got stuck when I discovered that the existing tox.ini file doesn’t work for me, it was giving errors about failed imports. After a bit of digging, I think the problem is that the source is laid out in a way that confuses pytest - having all the project files in the top level directory is generally considered bad practice these days, as it means tests tend to import from the working code, not the installed version. I also noticed that you are still using distutils (which is getting removed from the standard library, and so will stop working very soon).

    As a result, I updated the packaging to use “modern” setuptools - with a pyproject.toml file and all of the project metadata specified in that, rather than in `setup.py`. I tried to make sure that I didn’t change anything in the resulting source distribution - the files included in that should be essentially the same as before.

    If you’re interested in this, the work is available on my fork of the project on github, at https://github.com/pfmoore/lea - branch packaging. I tried to set up a pull request on bitbucket, but it seems to require me to create my fork on bitbucket, and I’m afraid after my previous experiences I’m not willing to do that. The changes aren’t actually that complex - they are mostly just moving files around. In the worst case, if you can’t pick the changes up via git, I can simply send you the new files and describe the renamings for you to apply manually.

    Now that I have something that can run the test suite successfully, I’ll continue working on the refactoring. It will be based off the packaging branch, but if you don’t want to take that code I can see if I can modify the resulting changes to apply to the existing structure - I don’t want you to feel that I’m pressuring you into taking my packaging changes if you don’t want to!

    (I’ll probably also add some of my standard infrastructure to my github clone - continuous integration via github actions, and things like that. I’ll do that on an independent branch, though, as it probably won’t make much sense for a Bitbucket project - it’s just to make my life easier for testing etc. So don’t worry if there’s extra stuff on my version, I’ll make sure to prepare “clean” branches when I have something for you).

  5. Pierre Denis repo owner

    What you describe is fine for me. Lea 4 is the opportunity to fix such packaging issues and make it more standard (I presume that this addresses also some concerns raised in the long-standing #43).

    So far, looking on your github fork, things seems indeed easy to integrate semi-manually on my side. I will do that ASAP, mimicking your branch on a new "packaging" branch from dev_lea4.

    So, please continue on your way! We'll synchronize when needed :)

  6. Paul Moore reporter

    I’ve just created another branch in my fork on github named “exceptions”, which is based off the “packaging” branch. It moves the exceptions Lea.Error and Lea._FailedRandomMC into a separate file, so they are no longer attributes of the Lea class. This might help to remove some of the circular import issues, and it’s a cleaner structure anyway, reducing the footprint of the Lea class, which should improve modularity.

    Let me know what you think - I’m currently struggling with refactoring the static and class methods, and this might help.

    One other issue with static methods, while I’m here. There are a few functions like lea.joint() which is actually an alias for the non-static joint method of the Lea class. That’s a very strange way of defining this - lea.joint(a,b,c) treats a differently than the other parameters. I can’t actually see any examples in the wiki that use joint as a method of a Lea object, so is there any reason I’m not aware of why this can’t just be a normal function:

    def joint(*args):
        return Clea(*args)
    

    It’s actually just a different name for the Clea class constructor, but I’m not sure I want to go as far as setting joint = Clea or renaming the class…

  7. Pierre Denis repo owner

    I've reviewed your branch "exceptions". This looks very fine for me, especially if it helps removing some odd tricks linked to circular references. Don't hesitate to go in that direction since Lea 4 is the perfect opportunity to fix such oddities (even if it breaks full backward compatibility).

    For the joint topic, you are right. The expression a.joint(...) is essentially the same as lea.joint(a, ...) or lea.Clea(a, ...). If I remember well, my intention was to allow for both notations although I never promoted it! There is not much interest for the former flavor, excepting for method chaining or "fluent interface" style.

    This is for sure a bit tricky (based on Python's method model), so don't hesitate to streamline this! For me, the aliasing joint = Clea is perfectly sensible: it makes the translation of user's universe of discourse ("joint probability") into implementation concept (Clea, which indicates a Lea's subclass)-—sorry to be a bit pedantic here! :)

  8. Pierre Denis repo owner

    Paul, very sorry but I have provisionally deactivated Lea Bitbucket public issue tracker. As of this morning, hundreds of spam issues have been created by a user "Anonymous". This is very annoying. I'll try to find a solution... I'm not sure you'll get this message. You can reach me by e-mail: pie.denis@skynet.be

  9. Pierre Denis repo owner

    I've just reactivated the issue tracker (March 19, 2023), without the option "Anonymous" users can create issues".

    Let me know, whether it's OK for you.

    PS: I have deleted about 100 spam issues, one by one!

  10. Paul Moore reporter

    Hi, I’ve just replied by email, but I’m commenting here just to confirm that I can respond. I’m fine with you disabling anonymous reporting - that seems like a very sensible move!

  11. Pierre Denis repo owner

    Streamline initialization and import's sequence, change static methods to plain functions, change Markov chain constructors - WIP (refs #71)

    → <<cset 56d9418c73d4>>

  12. Pierre Denis repo owner

    OK. I’ve made a massive refactoring, fixing many things (#71, #72, #74) and removing support to Python anterior to v3.5 (#73). The package is clearly neater now! :)

    Almost all of your points (#71) are implemented, I think. I’ve however two remaining points that I cannot easily fix:

    1. I was not able to (re)move the import statements at the middle of lea.py, despite several trials.
    2. (minor) In __init__.py, all imported module names are part of the exported names. To get rid of them, I’ve put a del statement.

    I know that there are several ways to fix these oddities (see here, for example). However, I’ve preferred so far to leave these as-is instead of complexifying things or even, getting errors reported in my IDE (PyCharm).

    Any advice is welcome!

  13. Log in to comment