hu-geomatics / enmap-box / issues / #633 - Compare pkl files (models) — Bitbucket

Issue #633 resolved

Agustin Lobo created an issue 2021-03-08

I have two pkl files that I know are different because their sizes differ, but it is hard to find the actual differences just by browsing in the EnMAP-Box GUI. Is there any way to compare them, or at least print them to a text file?

Comments (9)

Andreas Janz
We plan to have a human-readable version of the PKL files in a future version (most likely as JSON or XML).

For now, you could restore both PKLs in Python and compare the actual objects. But that may be too complicated, depending on the user.
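
One way to compare the restored objects is to dump both to JSON text and diff the lines. The following is only a minimal sketch under assumptions: the helper names (`to_jsonable`, `dump_lines`, `diff_objects`, `diff_pickles`) are made up for illustration and are not part of EnMAP-Box, and it assumes both objects serialize cleanly (string dictionary keys, no circular references).

```python
import difflib
import json
import pickle


def to_jsonable(obj):
    """Fallback for json.dumps: arrays -> lists, objects -> attribute dicts."""
    if hasattr(obj, "tolist"):      # NumPy arrays and scalars expose tolist()
        return obj.tolist()
    if hasattr(obj, "__dict__"):    # e.g. estimator objects
        return vars(obj)
    return str(obj)                 # last resort: string representation


def dump_lines(obj):
    """Render an already-restored object as a list of indented JSON lines."""
    return json.dumps(obj, default=to_jsonable, indent=2).splitlines()


def diff_objects(a, b, name_a="a", name_b="b"):
    """Return the unified diff between the JSON dumps of two objects."""
    return list(difflib.unified_diff(dump_lines(a), dump_lines(b),
                                     fromfile=name_a, tofile=name_b,
                                     lineterm=""))


def diff_pickles(path_a, path_b):
    """Restore two pickle files and diff them. Only unpickle trusted files."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        return diff_objects(pickle.load(file_a), pickle.load(file_b),
                            path_a, path_b)
```

Calling `print("\n".join(diff_pickles(path_a, path_b)))` with the two PKL paths would then show only the lines that differ, prefixed with `-` and `+`.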
- 2021-03-08T10:30:28+00:00

Andreas Janz

It could look something like this:

import json
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
print(json.dumps(rf, default=lambda x: x.__dict__, indent=2))

{
    "base_estimator": {
        "criterion": "gini",
        "splitter": "best",
        "max_depth": null,
        "min_samples_split": 2,
        "min_samples_leaf": 1,
        "min_weight_fraction_leaf": 0.0,
        "max_features": null,
        "max_leaf_nodes": null,
        "random_state": null,
        "min_impurity_decrease": 0.0,
        "min_impurity_split": null,
        "class_weight": null,
        "ccp_alpha": 0.0
    },
    "n_estimators": 100,
    "estimator_params": [
        "criterion",
        "max_depth",
        "min_samples_split",
        "min_samples_leaf",
        "min_weight_fraction_leaf",
        "max_features",
        "max_leaf_nodes",
        "min_impurity_decrease",
        "min_impurity_split",
        "random_state",
        "ccp_alpha"
    ],
    "bootstrap": true,
    "oob_score": false,
    "n_jobs": null,
    "random_state": null,
    "verbose": 0,
    "warm_start": false,
    "class_weight": null,
    "max_samples": null,
    "criterion": "gini",
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "min_weight_fraction_leaf": 0.0,
    "max_features": "auto",
    "max_leaf_nodes": null,
    "min_impurity_decrease": 0.0,
    "min_impurity_split": null,
    "ccp_alpha": 0.0
}

2021-03-08T11:43:56+00:00

Agustin Lobo reporter
Thanks, most useful

But, assuming a pkl has been loaded into the GUI, how would that model be printed?

Also, the QGIS Python console does not seem to be aware of the EnMAP-Box libraries:

rf = RandomForestClassifier()
Traceback (most recent call last):
  File "C:\PROGRA~1\QGIS3~1.16\apps\Python37\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
NameError: name 'RandomForestClassifier' is not defined
- 2021-03-08T14:34:15+00:00

Andreas Janz

Try this for now (you get the filename of the PKL from the GUI: right click and “Copy Uri / Path”):

import json
import pickle
import numpy as np

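# Path copied from the GUI ("Copy Uri / Path")
filename = 'c:/vsimem/outClassifier.pkl'

def default(obj):
    # Fallback for objects that json cannot serialize natively
    if isinstance(obj, np.ndarray):
        return obj.tolist()      # arrays become nested lists
    elif hasattr(obj, '__dict__'):
        return obj.__dict__      # objects become attribute dictionaries
    else:
        return str(obj)          # anything else becomes its string form


# Only unpickle files from a trusted source
with open(filename, 'rb') as file:
    obj = pickle.load(file)
print(json.dumps(obj, default=default, indent=2))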

2021-03-08T15:08:19+00:00

Agustin Lobo reporter
Worked, thanks. I could find the difference. But for the tool you are planning, please note that a file with more than 11000 lines is not human-readable in practice. The user needs a simpler file including his/her input search and the resulting parameters.
- 2021-03-08T16:22:33+00:00
Andreas Janz
"The user needs a simpler file including his/her input search and the resulting parameters."

Can you elaborate a bit more? How would the user do that? Given a PKL file, how would the user provide the input search? I am not quite sure what use case you are trying to cover here. The code above gives you a complete JSON dump of all the PKL content. You can then use a normal editor to search for specific content.
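
If a shorter listing is what is wanted, one possible reading is to keep only the top-level scalar settings and drop arrays and nested objects. This is just a sketch under that assumption: `scalar_attributes` and `DemoModel` are hypothetical names used for illustration, not EnMAP-Box functionality.

```python
import json


def scalar_attributes(obj):
    """Return only the top-level attributes that hold plain scalar values."""
    # Accept either a plain dict or any object with an attribute dictionary
    attrs = obj if isinstance(obj, dict) else vars(obj)
    scalars = (str, int, float, bool, type(None))
    return {key: value for key, value in attrs.items()
            if isinstance(value, scalars)}


class DemoModel:
    """Hypothetical stand-in for a model restored from a PKL file."""

    def __init__(self):
        self.n_estimators = 100
        self.criterion = "gini"
        self.estimators_ = [object(), object()]  # bulky content, filtered out


print(json.dumps(scalar_attributes(DemoModel()), indent=2))
```

The same function could be applied to the `obj` restored by the code above in place of `DemoModel()`.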
- 2021-03-08T20:16:31+00:00
Andreas Janz
I would recommend creating the full JSON dump and opening it in a powerful editor like Notepad++, where you can collapse blocks you’re not interested in.
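
To open the dump in an editor it first has to be written to a file rather than printed. A minimal sketch, assuming the same fallback logic as the code above; the function name `pickle_to_json_file` and any output path are illustrative choices only.

```python
import json
import pickle


def default(obj):
    """Fallback for values that json cannot serialize natively."""
    if hasattr(obj, "tolist"):      # NumPy arrays and scalars
        return obj.tolist()
    if hasattr(obj, "__dict__"):    # arbitrary objects: dump their attributes
        return obj.__dict__
    return str(obj)                 # anything else: string form


def pickle_to_json_file(pkl_path, json_path):
    """Restore pkl_path (trusted files only) and write it as indented JSON."""
    with open(pkl_path, "rb") as src:
        obj = pickle.load(src)
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(obj, dst, default=default, indent=2)
```

For example, `pickle_to_json_file('c:/vsimem/outClassifier.pkl', 'outClassifier.json')` would write the dump next to the working directory, where `outClassifier.json` is an assumed output name.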
- 2021-03-10T10:03:28+00:00
Agustin Lobo reporter
OK, I’ll try it and report back.
- 2021-03-10T11:22:06+00:00
Andreas Janz
- changed status to resolved
- 2021-09-22T13:21:29+00:00

Assignee: –

Type: proposal

Priority: major

Status: resolved

Component: Processing

Milestone: –

Version: 3.7

Votes: 0

Watchers: 1
