Update kuibit to 1.4.0

Issue #2696 resolved
Gabriele Bozzola created an issue

A new version of kuibit will be available by the time of the next release. The NEWS file contains a detailed list of improvements (https://github.com/Sbozzolo/kuibit/blob/next/NEWS.md).

Importantly, kuibit 1.4.0 will only support Python 3.8-3.11. Moving the minimum requirement to Python 3.8 is necessary to ensure compatibility with modern versions of NumPy. Python 3.8 was released at the end of 2019.

Comments (23)

  1. Roland Haas

    The Python 3.8 requirement will make kuibit fail to run on Stampede2 (the first cluster I checked), since it has no Python 3.8 available. Unfortunately, for HPC clusters 2019 is “recent”. See:

    $ module avail python
    
    -------------------------------- /opt/apps/intel18/impi18_0/modulefiles --------------------------------
       python2/2.7.15 (L,D)    python3/3.7.0 (D)
    
    ------------------------------------ /opt/apps/intel18/modulefiles -------------------------------------
       python2/2.7.15    python3/3.7.0
    
    ---------------------------------------- /opt/apps/modulefiles -----------------------------------------
       python_cacher/1.0    python_cacher/1.2 (D)
    
      Where:
       L:  Module is loaded
       D:  Default Module
    

  2. Gabriele Bozzola reporter

    In that case, kuibit 1.3.6 will still work.

    Unfortunately, it is pretty much impossible to support Python 3.7 and Python 3.11 at the same time if I want to ensure that all the dependencies are satisfied and known to be compatible.

    I am acutely aware of how clusters are stuck back in time, and I tried hard to see if I could squeeze in at least Python 3.7. Given NEP 29, that meant hard-coding several versions of packages specifically for Python 3.7, and the result was that the dependency solver wouldn’t even converge. (At the moment, kuibit uses features available only in Python > 3.7 minimally or not at all, so the main problem is with the dependencies.)

    We can discuss how to handle this, but from a technical point of view, the only solution is to forgo any check on the dependency tree and cross our fingers that things will install and work (as 80% of the Python world routinely does anyway).

  3. Roland Haas

    Having been poked to comment, here is my comment:

    Oh right, it still says "new version of kuibit". In the call, Gabriele
    suggested either using the old version in the ET in general (not so
    great) or adding a warning for the affected machines (somewhat better).

    If I recall correctly, this is what will happen if one runs, say:

    pip install kuibit==1.4.0
    

    on a system without Python 3.8 loaded (not merely present, but loaded):
    pip will report that no suitable kuibit is found. E.g., on Stampede2
    right now:

    $ module load python3
    Lmod is automatically replacing "python2/2.7.15" with "python3/3.7.0".
    $ pip install kuibit==1.4.0
    Collecting kuibit==1.4.0
    Could not find a version that satisfies the requirement kuibit==1.4.0 (from versions: 1.0.0b0, 1.0.0, 1.1.0, 1.1.1, 1.2.0b0, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6)
    No matching distribution found for kuibit==1.4.0
    

    while leaving out the version number will pick the newest available
    version.

    A workaround that makes the same command work on all clusters is
    to use:

    $ pip install --upgrade 'kuibit<=1.4.0'
    

    which will pick the newest version possible (and update if possible).
    This is better than

    $ pip install --upgrade 'kuibit'
    

    since it avoids going to a newer version that may be released on
    PyPI after an ET release.
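    A minimal POSIX shell sketch of the "add a warning for the affected machines" option combined with the `kuibit<=1.4.0` workaround. It assumes, as in the original proposal, that kuibit 1.4.0 requires Python 3.8, and that pip skips releases whose `Requires-Python` metadata excludes the running interpreter. The function names `version_at_least` and `install_kuibit`, the `MIN_MAJOR`/`MIN_MINOR` variables and the `PYTHON` override are hypothetical, not part of the ET or of kuibit.

```shell
#!/bin/sh
# Sketch: warn when the interpreter is older than the minimum Python
# assumed for kuibit 1.4.0 (3.8), then let pip pick the newest kuibit
# release <= 1.4.0 that is compatible with that interpreter.
# All names below are hypothetical helpers for illustration.

MIN_MAJOR=3
MIN_MINOR=8

# Succeed (exit status 0) if MAJOR.MINOR >= MIN_MAJOR.MIN_MINOR.
version_at_least() {
    major=$1
    minor=$2
    [ "$major" -gt "$MIN_MAJOR" ] ||
        { [ "$major" -eq "$MIN_MAJOR" ] && [ "$minor" -ge "$MIN_MINOR" ]; }
}

# Query the interpreter (python3 unless PYTHON is set), warn if it is too
# old, then install. pip excludes releases whose Requires-Python does not
# match, so an old interpreter gets an older kuibit instead of an error.
install_kuibit() {
    py=${PYTHON:-python3}
    major=$("$py" -c 'import sys; print(sys.version_info[0])') || return 1
    minor=$("$py" -c 'import sys; print(sys.version_info[1])') || return 1
    if ! version_at_least "$major" "$minor"; then
        echo "WARNING: Python $major.$minor is older than $MIN_MAJOR.$MIN_MINOR;" \
             "pip will install an older kuibit release." >&2
    fi
    "$py" -m pip install --upgrade 'kuibit<=1.4.0'
}
```

    Calling `install_kuibit` (optionally as `PYTHON=/path/to/python3 install_kuibit`) runs the check and then the same pip command on every cluster; the warning only makes the fallback visible instead of silent.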

    Let me add this as a discussion item, though really (in my personal opinion) the authors should spend time coming up with solutions. Part of having things
    included in the ET means that they must make some effort to keep things working with the ET. If they want to do whatever they want, then they should not include it in a toolkit that is used by many and comes with a statement of supported clusters. "It works for me" is not good enough anymore; it has to work for others.

  4. Gabriele Bozzola reporter

    > Let me add this as a discussion item, though really (in my personal opinion) the authors should spend time coming up with solutions. Part of having things
    > included in the ET means that they must make some effort to keep things working with the ET. If they want to do whatever they want, then they should not include it in a toolkit that is used by many and comes with a statement of supported clusters. "It works for me" is not good enough anymore; it has to work for others.

    Only two of the three statements {kuibit supports Python 3.6, kuibit supports Python 3.11, the dependency tree of kuibit is vetted and verified to be consistent and compatible} can be true at the same time, so I decided to raise the minimum required version. kuibit depends heavily on the NumPy ecosystem, which has a tight support schedule and sometimes introduces breaking changes that ripple through all the downstream packages (for example, in 1.20, NumPy deprecated the names np.int, np.float, and so on).

    I presented the community with the statement that the next version of kuibit will depend on Python>=3.8, and the problem I raised is “what do we want to do in this situation?”

    Among the options are:

    • We require Python 3.8 and build it when it is not available
    • We require Python 3.8 and claim the clusters that don’t have it “not supported”
    • We maintain both kuibit 1.3.X and 1.4.X
    • We require as minimum ET dependency Python 3.7 and reject this kuibit update.

    Note that the latest Fedora (one of the supported machines) defaults to Python 3.11, so kuibit 1.3.6 cannot be installed there.

    That’s why I feel this is more of a policy issue than a technical one (which I could solve myself).

  5. Samuel Cupp

    @Roland Haas @Gabriele Bozzola
    Hey. I wasn’t at the last ET meeting, but I didn’t see any discussion of this topic. Has a consensus been reached on what the plan is for Kuibit? I don’t want Cheng-Hsin to waste his time reviewing parts that may not be in the release, so I would appreciate getting a clear goal for what is being changed/updated in the ET version of kuibit for the release, if anything. Or, if it’s remaining in its current form, I can tell Cheng-Hsin that he’s off the hook for reviewing it.

  6. tootle
    > Among the options are:
    >
    >     1. We require Python 3.8 and build it when it is not available
    >     2. We require Python 3.8 and claim the clusters that don’t have it “not supported”
    >     3. We maintain both kuibit 1.3.X and 1.4.X
    >     4. We require as minimum ET dependency Python 3.7 and reject this kuibit update.
    

    I'm not up to speed on the latest in Kuibit, but are there critical features that are necessary to run alongside ETK? If not, I would suggest option two and have it just throw a warning during the build. I had a similar issue with the FUKA Python readers: many people were trying to install FUKA for the solvers, and the build procedure for the Python readers would consistently fail due to dependency issues. I’ve since moved all the Python bindings to a separate build location, so building them is now optional, instead of people having to manually comment out the Python bindings from the CMakeLists every time they try to compile on a system without the necessary Python dependencies. Just my $0.02

  7. Gabriele Bozzola reporter

    I managed to add compatibility for Python 3.7 with no inconvenience for users.

    The next version of NumPy will drop support for Python 3.8 as well (but that NumPy version will be required for Python 3.12), so I don’t know how long I can keep supporting 3.7 while also supporting newer versions.

  8. Roland Haas

    Nice. Thank you. I can see this is going to be a continuous struggle with versions of Python / numpy / whatnot. Hiding Python 3.8 in the GNU modules is a mean trick by TACC.

  9. Roland Haas

    Putting this to sleep until after the release; revisit then if there are still any issues or clusters with old Python where the newest kuibit version cannot be used.

  10. Roland Haas

    With 1.5 released should the version in ET be updated? What are minimum and maximum Python versions required?

  11. Gabriele Bozzola reporter

    kuibit 1.5.0 requires Python 3.7 or newer (3.8.1 for development) and works with every currently released version of Python.

  12. Roland Haas

    OK, so that bumps the minimum version from 3.6 to 3.7. Taking a look at what I think may be the oldest cluster in ACCESS-CI (SDSC’s Expanse), it has a module for Python 3.8 and runs Python 3.6 as a system package. So it should (TM) be fine, though presumably one will need to install kuibit in a virtualenv and have pip install all dependencies, while providing a suitable module for, say, libhdf5.

  13. Roland Haas

    Though a simple test using that module is failing so far, with an illegal instruction error, no less. Very odd. OK, I needed a different hash after the version name. It would be nice if SDSC gave some hints on this, but anyway. It seems to work OK with:

    module load cpu/0.17.3b gcc/10.2.0/npcyll4 python/3.8.12/7zdjza7
    python -m venv $PWD
    source bin/activate
    pip install kuibit==1.5.0
    

    and I can create a SimDir object for one of my simulations (with HDF5 files and waveform extraction). Creating it is slow, though, which seems to be partially due to the file system.

  14. Gabriele Bozzola reporter

    kuibit’s discovery mechanisms can be slow on distributed filesystems when there are lots of small files (for ASCII files, kuibit opens every file and does regular-expression matching to find what’s inside). I wrote some recommendations here:

    https://sbozzolo.github.io/kuibit/faq.html#some-of-the-attributes-in-simdir-are-slow

    It is also a good idea to use HDF5 whenever possible (e.g., for waveform and horizon data too).
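    Since discovery cost grows with the number of small ASCII files, one quick way to gauge it before creating a SimDir is to count the output files. A minimal POSIX shell sketch, assuming ASCII output uses the `.asc` suffix and HDF5 output uses `.h5` (the usual CarpetIOASCII/CarpetIOHDF5 suffixes); ASCII outputs with other suffixes are not counted, and `count_outputs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count ASCII (.asc) and HDF5 (.h5) files below a simulation
# directory. kuibit opens every ASCII file during discovery, so a large
# ASCII count hints at slow SimDir attributes on distributed filesystems.
# The suffixes and the helper name are assumptions for illustration.

count_outputs() {
    dir=${1:-.}
    # tr strips the padding that some wc implementations add.
    n_asc=$(find "$dir" -type f -name '*.asc' | wc -l | tr -d ' ')
    n_h5=$(find "$dir" -type f -name '*.h5' | wc -l | tr -d ' ')
    echo "ascii=$n_asc hdf5=$n_h5"
}
```

    For example, `count_outputs /path/to/simulation` prints a line such as `ascii=... hdf5=...`; a large ASCII count relative to the HDF5 count suggests switching those outputs to HDF5, as recommended above.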

    Note also that Python 3.6 reached end of life in 2021. NumPy dropped support for Python 3.6 in 2020 (version 1.20).

  15. Roland Haas

    Sure. I would claim, though, that the NumPy ↔︎ Python compatibility issue does not directly arise on those clusters. Old clusters that only provide an old Python will provide an equally old NumPy that is compatible with that Python. So the issue is rather whether kuibit can support both old and new NumPy versions. From the previous discussion, my understanding is that supporting both is (basically) impossible in kuibit. So, faced with having to drop support for some NumPy versions, dropping the older ones rather than all the newer ones makes sense.

    This has always been an issue with old HPC clusters; they just offer quite old software stacks.
