Compare pkl files (models)

Issue #633 resolved
Agustin Lobo created an issue

I have two pkl files that I know are different because their sizes differ, but it is proving hard to find the actual differences just by browsing in the EnMapBox GUI. Is there any way to compare them, or at least print them to a text file?

Comments (9)

  1. Andreas Janz

    We plan to have a human-readable version of the PKL files in a future version (most likely as JSON or XML).

    For now, you could restore both PKLs in Python and compare the actual objects. But that may be too complicated, depending on the user.
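A minimal sketch of the "restore both and compare" idea, assuming each file unpickles to a dict, a scikit-learn-style estimator exposing `get_params()`, or a plain object with a `__dict__`. The helper names `load_pickle`, `as_params` and `diff_params` and the file names in the usage comment are hypothetical, not part of EnMAP-Box. Values are compared by their `repr()` so that array-valued settings do not raise on `!=`; this is a shortcut, not an exact equality test.

```python
import pickle

_MISSING = "<missing>"  # placeholder for a setting present in only one object


def load_pickle(path):
    """Restore the object stored in a pickle file (only unpickle trusted files)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def as_params(obj):
    """Best-effort flat view of an object's settings (hypothetical helper)."""
    if isinstance(obj, dict):
        return obj
    if hasattr(obj, "get_params"):  # scikit-learn estimators
        return obj.get_params()
    return getattr(obj, "__dict__", {"value": obj})


def diff_params(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    pa, pb = as_params(a), as_params(b)
    diffs = {}
    for key in sorted(set(pa) | set(pb), key=str):
        va, vb = pa.get(key, _MISSING), pb.get(key, _MISSING)
        if repr(va) != repr(vb):
            diffs[key] = (va, vb)
    return diffs


# Usage (paths are placeholders):
# for name, (va, vb) in diff_params(load_pickle("a.pkl"), load_pickle("b.pkl")).items():
#     print(name, va, "->", vb)
```

This only looks at the top level of each object; differences buried in nested attributes show up as one differing entry rather than being broken down further.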

  2. Andreas Janz

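    It could look something like this:

    import json
    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier()
    # Fall back to each object's __dict__ for anything JSON cannot encode directly
    print(json.dumps(rf, default=lambda x: x.__dict__, indent=2))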
    
    {
        "base_estimator": {
            "criterion": "gini",
            "splitter": "best",
            "max_depth": null,
            "min_samples_split": 2,
            "min_samples_leaf": 1,
            "min_weight_fraction_leaf": 0.0,
            "max_features": null,
            "max_leaf_nodes": null,
            "random_state": null,
            "min_impurity_decrease": 0.0,
            "min_impurity_split": null,
            "class_weight": null,
            "ccp_alpha": 0.0
        },
        "n_estimators": 100,
        "estimator_params": [
            "criterion",
            "max_depth",
            "min_samples_split",
            "min_samples_leaf",
            "min_weight_fraction_leaf",
            "max_features",
            "max_leaf_nodes",
            "min_impurity_decrease",
            "min_impurity_split",
            "random_state",
            "ccp_alpha"
        ],
        "bootstrap": true,
        "oob_score": false,
        "n_jobs": null,
        "random_state": null,
        "verbose": 0,
        "warm_start": false,
        "class_weight": null,
        "max_samples": null,
        "criterion": "gini",
        "max_depth": null,
        "min_samples_split": 2,
        "min_samples_leaf": 1,
        "min_weight_fraction_leaf": 0.0,
        "max_features": "auto",
        "max_leaf_nodes": null,
        "min_impurity_decrease": 0.0,
        "min_impurity_split": null,
        "ccp_alpha": 0.0
    }
    

  3. Agustin Lobo reporter

    Thanks, most useful

    But, assuming a pkl has been loaded into the GUI, how would that model be printed?

    Also, the QGIS Python console does not seem to be aware of the EnMap Box libs:

    rf = RandomForestClassifier()
    Traceback (most recent call last):
    File "C:\PROGRA~1\QGIS3~1.16\apps\Python37\lib\code.py", line 90, in runcode
    exec(code, self.locals)
    File "<input>", line 1, in <module>
    NameError: name 'RandomForestClassifier' is not defined

  4. Andreas Janz

    Try this for now (you get the filename of the PKL from the GUI: right click and “Copy Uri / Path”):

    import json
    import pickle
    import numpy as np
    
    filename = 'c:/vsimem/outClassifier.pkl'
    
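    def default(obj):
        # Called by json.dumps for every object it cannot encode itself
        if isinstance(obj, np.ndarray):
            return obj.tolist()      # arrays become nested lists
        elif hasattr(obj, '__dict__'):
            return obj.__dict__      # objects become their attribute dict
        else:
            return str(obj)          # last resort: string representation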
    
    
    with open(filename, 'rb') as file:
        obj = pickle.load(file)
    print(json.dumps(obj, default=default, indent=2))
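Since the goal is to find what differs between two PKL files, the same kind of dump can be produced for both objects and compared line by line. The following is a sketch using the standard-library `difflib`; the names `dump_lines` and `diff_dumps` and the labels `a.pkl` / `b.pkl` are made up for illustration. To stay self-contained it detects arrays via their `tolist` method instead of importing NumPy.

```python
import difflib
import json


def default(obj):
    """Fallback encoder, mirroring the handler in the snippet above."""
    if hasattr(obj, "tolist"):      # NumPy arrays and scalars
        return obj.tolist()
    if hasattr(obj, "__dict__"):
        return obj.__dict__
    return str(obj)


def dump_lines(obj):
    """JSON dump of an object, split into lines for diffing."""
    return json.dumps(obj, default=default, indent=2).splitlines()


def diff_dumps(obj_a, obj_b, name_a="a.pkl", name_b="b.pkl"):
    """Unified diff of the two dumps; an empty list means no textual difference."""
    return list(difflib.unified_diff(
        dump_lines(obj_a), dump_lines(obj_b),
        fromfile=name_a, tofile=name_b, lineterm="", n=2,
    ))


# Usage, with obj_a and obj_b restored via pickle.load as above:
# print("\n".join(diff_dumps(obj_a, obj_b)))
```

Only the changed lines and two lines of context around each change are printed, which is usually far shorter than either full dump.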
    

  5. Agustin Lobo reporter

    That worked, thanks. I could find the difference. But for the tool you are planning, please note that a file with more than 11,000 lines is not really human-readable. The user needs a simpler file including his/her input search and the resulting parameters.

  6. Andreas Janz

    the user needs a simpler file including his/her input search and the resulting parameters.

    Can you elaborate a bit more? How would the user do that? Given a PKL file, how would the user provide their input search? I am not quite sure what use case you are trying to cover here. The code above gives you a complete JSON dump of all the PKL content. You can then use a normal editor to search for specific content.
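As an alternative to searching in an editor, the dump can also be searched programmatically. This sketch assumes the JSON text from above has been parsed back with `json.loads`, so it consists only of dicts, lists and scalars; `iter_paths` and `search` are hypothetical helper names.

```python
def iter_paths(node, prefix=""):
    """Yield (path, value) for every scalar leaf of nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_paths(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


def search(node, term):
    """Return the leaves whose path contains the search term (case-insensitive)."""
    term = term.lower()
    return [(path, value) for path, value in iter_paths(node) if term in path.lower()]


# Usage: data = json.loads(json_text); print(search(data, "max_depth"))
```

The search matches on the path only, not on the values, so a parameter name that merely appears as a string inside a list is not reported.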

  7. Andreas Janz

    I would recommend keeping the full JSON dump and opening it in a powerful editor like Notepad++, where you can collapse blocks you’re not interested in.
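The collapsing can also be approximated before the file is written. This is a rough sketch, again assuming the dump has been parsed with `json.loads`; the function name `collapse` and the placeholder strings are invented for illustration.

```python
import json


def collapse(node, max_depth):
    """Replace containers nested deeper than max_depth by a short placeholder."""
    if isinstance(node, dict):
        if max_depth == 0:
            return f"<dict with {len(node)} keys>"
        return {key: collapse(value, max_depth - 1) for key, value in node.items()}
    if isinstance(node, list):
        if max_depth == 0:
            return f"<list with {len(node)} items>"
        return [collapse(value, max_depth - 1) for value in node]
    return node  # scalars are kept as they are


# Usage: print(json.dumps(collapse(data, 2), indent=2))
```

With a small `max_depth` the output keeps the top-level parameters and hides large nested blocks, which shortens the dump at the cost of detail.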
