Using Python package TPOT in parallel on IBM i 7.3

Issue #52 resolved
Clemens Zauchner created an issue

I am using Python 3.6 on IBM i 7.3. The installation of TPOT (https://pypi.org/project/TPOT/) using pip3 install TPOT finished successfully.

Using the package with n_jobs = 1 works without any problem.

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2, n_jobs=1)
pipeline_optimizer.fit(X_train, y_train)
print(pipeline_optimizer.score(X_test, y_test))
pipeline_optimizer.export('tpot_exported_pipeline.py')

Note that there is a warning at the start:

Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
File "/QOpenSys/pkgs/lib/python3.6/multiprocessing/util.py", line 186, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/QOpenSys/pkgs/lib/python3.6/multiprocessing/synchronize.py", line 87, in _cleanup
sem_unlink(name)
FileNotFoundError: [Errno 2] No such file or directory

and at the end:

Exception ignored in: <Finalize object, dead>
Traceback (most recent call last):
File "/QOpenSys/pkgs/lib/python3.6/multiprocessing/util.py", line 186, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/QOpenSys/pkgs/lib/python3.6/multiprocessing/synchronize.py", line 87, in _cleanup
sem_unlink(name)
FileNotFoundError: [Errno 2] No such file or directory
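
Both tracebacks come from multiprocessing/synchronize.py, the module that wraps the semaphores behind multiprocessing locks. A minimal sketch for checking whether such a lock can be created and used at all, independent of TPOT, might look like this. The helper name semaphores_usable is made up for illustration, and a passing check would not by itself rule out the cleanup warning shown above.

```python
import multiprocessing


def semaphores_usable():
    """Return True if a multiprocessing lock can be created and acquired."""
    try:
        # Lock() is backed by a semaphore set up in multiprocessing/synchronize.py.
        lock = multiprocessing.Lock()
        with lock:
            pass
        return True
    except (ImportError, OSError):
        # ImportError: the platform lacks a working semaphore implementation.
        # OSError: creating or using the semaphore failed at runtime.
        return False


if __name__ == "__main__":
    print("multiprocessing semaphores usable:", semaphores_usable())
```

If this returns False, the problem most likely sits below TPOT, in the Python build or the operating system support for semaphores.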

Optimising a pipeline in parallel (n_jobs=-1) fails with an error, which is attached to the issue. This is the code to reproduce the problem:

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2, n_jobs=-1)
pipeline_optimizer.fit(X_train, y_train)
print(pipeline_optimizer.score(X_test, y_test))
pipeline_optimizer.export('tpot_exported_pipeline.py')

Comments (14)

  1. Gavin Gan Zhang Account Deactivated

    The following is what I got on an IBM i 7.3 system. It works for me, though with some warnings. From one warning, you may notice that I am not using xgboost. Did you get this warning? Or do you already have xgboost working on IBM i? In my experimental tests, xgboost does not work on IBM i, and I am still investigating it. I am not sure whether the issue you hit comes from xgboost or not.

    /qopensys/pkgs/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
      from numpy.core.umath_tests import inner1d
    Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.
    Generation 1 - Current best internal CV score: 0.9844391821591708              
    Generation 2 - Current best internal CV score: 0.9859404912419489            
    Generation 3 - Current best internal CV score: 0.9859404912419489               
    Generation 4 - Current best internal CV score: 0.9859404912419489               
    Generation 5 - Current best internal CV score: 0.9859404912419489             
    
    Best pipeline: LogisticRegression(SelectPercentile(PolynomialFeatures(input_matrix, degree=2, include_bias=False, interaction_only=False), percentile=21), C=25.0, dual=False, penalty=l1)
    /qopensys/pkgs/lib/python3.6/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
    0.9888888888888889
    

    I ran the code with n_jobs=-1 and with n_jobs=3. Both work for me.

  2. Clemens Zauchner reporter

    That’s interesting. I have just tried it again and it still does not work.

    Which versions are you using?

    For me it’s:

    Package   Version
    Python    3.6.8
    TPOT      0.10.1
    sklearn   0.20.3
    numpy     1.15.4
    scipy     1.1.0

    And no, I don’t use xgboost on IBMi.
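
When comparing environments like this, it can help to read the versions from the running interpreter, since yum and pip may report different copies of the same package. A minimal sketch, assuming each package exposes a `__version__` attribute; the helper name installed_versions is hypothetical:

```python
import importlib
import platform


def installed_versions(names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure during import is reported as "not available".
            versions[name] = None
            continue
        # Some modules do not define __version__; report that explicitly.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python", platform.python_version())
    for name, version in installed_versions(["tpot", "sklearn", "numpy", "scipy"]).items():
        print(name, version if version is not None else "not available")
```

The output reflects whichever copy Python actually imports, which matters when a pip-installed package shadows an RPM-installed one.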

  3. Gavin Gan Zhang Account Deactivated
    bash-4.4$ yum list installed |grep -i "python3\."
    python3.ppc64                 3.6.8-1       @ibm                                
    bash-4.4$ yum list installed |grep -i "scikit"
    python3-scikit-learn.ppc64    0.19.1-6      @ibm                                
    bash-4.4$ yum list installed |grep -i "numpy"
    python3-numpy.ppc64           1.15.4-0      @ibm                                
    bash-4.4$ yum list installed |grep -i "scipy"
    python3-scipy.ppc64           1.1.0-0       @ibm                                
    bash-4.4$ pip3 list|grep -i tpot
    TPOT               0.10.1  
    

    Here’s mine.

  4. Gavin Gan Zhang Account Deactivated

    BTW, how did you get sklearn 0.20.3 on IBM i? I noticed that the RPM version is only 0.19.1.

  5. Clemens Zauchner reporter

    What’s the output of this in Python?

    import multiprocessing
    multiprocessing.cpu_count()
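
For context, TPOT's documentation describes n_jobs=-1 as using as many cores as are available, so the value returned by cpu_count() presumably decides how many worker processes a parallel run asks for. A rough sketch of that mapping follows; resolve_n_jobs is a hypothetical helper for illustration, not TPOT's actual code.

```python
import multiprocessing


def resolve_n_jobs(n_jobs):
    """Map an n_jobs setting to a worker count (hypothetical helper, not TPOT's API)."""
    if n_jobs == -1:
        # -1 is treated as "use every core that Python reports".
        return multiprocessing.cpu_count()
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


if __name__ == "__main__":
    print("n_jobs=-1 ->", resolve_n_jobs(-1))
    print("n_jobs=3  ->", resolve_n_jobs(3))
```

Comparing this number across two systems shows whether they would even attempt the same degree of parallelism.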
    
  6. Clemens Zauchner reporter

    The multiprocessing module is part of the Python standard library; it ships with Python. It is also the root cause of the problem I am experiencing.

  7. Gavin Gan Zhang Account Deactivated

    Can you show me the output of the following commands on your system?

    yum list installed |grep -i "python3"
    
    yum list installed |grep -i "scikit"  
    
  8. Clemens Zauchner reporter

    yum list installed |grep -i "python3"

    python3.ppc64 3.6.8-1 @ibm
    python3-Pillow.ppc64 5.0.0-4 @ibm
    python3-asn1crypto.noarch 0.24.0-0 @ibm
    python3-bcrypt.ppc64 3.1.4-5 @ibm
    python3-cffi.ppc64 1.11.5-2 @ibm
    python3-cryptography.ppc64 2.2.2-2 @ibm
    python3-cycler.noarch 0.10.0-0 @/python3-cycler-0.10.0-0.ibmi7.2.noarch
    python3-dateutil.noarch 2.7.5-0 @ibm
    python3-devel.ppc64 3.6.8-1 @ibm
    python3-ibm_db.ppc64 2.0.5.9-0 @ibm
    python3-idna.noarch 2.8-0 @ibm
    python3-itoolkit.ppc64 1.6.0-0 @ibm
    python3-kiwisolver.noarch 1.0.1-0 @/python3-kiwisolver-1.0.1-0.ibmi7.2.noarch
    python3-lxml.ppc64 4.2.1-3 @ibm
    python3-matplotlib.ppc64 3.0.2-0 @/python3-matplotlib-3.0.2-0.ibmi7.2.ppc64
    python3-numpy.ppc64 1.15.4-0 @ibm
    python3-pandas.ppc64 0.22.0-4 @ibm
    python3-pip.noarch 9.0.1-2 @ibm
    python3-pycparser.ppc64 2.19-1 @ibm
    python3-pynacl.ppc64 1.2.1-3 @ibm
    python3-pyparsing.noarch 2.3.1-0 @/python3-pyparsing-2.3.1-0.ibmi7.2.noarch
    python3-pyzmq.ppc64 17.1.2-0 @/python3-pyzmq-17.1.2-0.ibmi7.2.ppc64
    python3-rpm.ppc64 4.13.0.1-17 @ibm
    python3-scikit-learn.ppc64 0.19.1-6 @ibm
    python3-scipy.ppc64 1.1.0-0 @ibm
    python3-setuptools.noarch 36.0.1-2 @ibm
    python3-six.noarch 1.10.0-0 @ibm
    python3-tkinter.ppc64 3.6.8-1 @ibm
    python3-wheel.noarch 0.29.0-2 @ibm

    yum list installed |grep -i "scikit"

    python3-scikit-learn.ppc64 0.19.1-6 @ibm

  9. Gavin Gan Zhang Account Deactivated

    Thanks for sharing. I would guess you installed some multiprocessing library on your side. Can you share the output of “yum list all|grep mp”? Thanks.

  10. Clemens Zauchner reporter

    I have ‘downgraded’ and uninstalled the newer version of scikit-learn; now it works for me as well.

    And no, I have not installed it on my side. As mentioned, the multiprocessing library is part of the Python standard library; see https://docs.python.org/3/library/
