Compare pkl files (models)
I have 2 PKL files that I know are different, because their sizes differ, but it is hard to find the actual differences just by browsing in the EnMapBox GUI. Is there any way to compare them, or at least print them to TXT?
Comments (9)
-
-
Could look something like that:
rf = RandomForestClassifier()
print(json.dumps(rf, default=lambda x: x.__dict__, indent=2))
{
  "base_estimator": {
    "criterion": "gini",
    "splitter": "best",
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "min_weight_fraction_leaf": 0.0,
    "max_features": null,
    "max_leaf_nodes": null,
    "random_state": null,
    "min_impurity_decrease": 0.0,
    "min_impurity_split": null,
    "class_weight": null,
    "ccp_alpha": 0.0
  },
  "n_estimators": 100,
  "estimator_params": [
    "criterion",
    "max_depth",
    "min_samples_split",
    "min_samples_leaf",
    "min_weight_fraction_leaf",
    "max_features",
    "max_leaf_nodes",
    "min_impurity_decrease",
    "min_impurity_split",
    "random_state",
    "ccp_alpha"
  ],
  "bootstrap": true,
  "oob_score": false,
  "n_jobs": null,
  "random_state": null,
  "verbose": 0,
  "warm_start": false,
  "class_weight": null,
  "max_samples": null,
  "criterion": "gini",
  "max_depth": null,
  "min_samples_split": 2,
  "min_samples_leaf": 1,
  "min_weight_fraction_leaf": 0.0,
  "max_features": "auto",
  "max_leaf_nodes": null,
  "min_impurity_decrease": 0.0,
  "min_impurity_split": null,
  "ccp_alpha": 0.0
}
-
reporter Thanks, most useful
But, assuming a pkl has been loaded into the GUI, how would that model be printed?
Also, the QGIS Python console does not seem to be aware of the EnMAP-Box libraries:
rf = RandomForestClassifier()
Traceback (most recent call last):
File "C:\PROGRA~1\QGIS3~1.16\apps\Python37\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
NameError: name 'RandomForestClassifier' is not defined
-
Try this for now (you get the filename of the PKL from the GUI: right-click and “Copy Uri / Path”):
import json
import pickle

import numpy as np

filename = 'c:/vsimem/outClassifier.pkl'

def default(obj):
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    elif hasattr(obj, '__dict__'):
        return obj.__dict__
    else:
        return str(obj)

with open(filename, 'rb') as file:
    obj = pickle.load(file)

print(json.dumps(obj, default=default, indent=2))
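Since you want to compare two files, the same JSON dump can also be diffed directly in Python instead of by eye. A minimal sketch along those lines (the helper names here are made up for illustration; pass in your own PKL paths):

```python
import difflib
import json
import pickle

# numpy is only needed if the pickle contains arrays; guard the import
# so the sketch also runs without it.
try:
    import numpy as np
except ImportError:
    np = None


def default(obj):
    # Fallbacks for objects json cannot serialize natively.
    if np is not None and isinstance(obj, np.ndarray):
        return obj.tolist()
    if hasattr(obj, '__dict__'):
        return obj.__dict__
    return str(obj)


def dump_pkl(filename):
    # Load a pickle and return its JSON dump as a list of lines.
    with open(filename, 'rb') as file:
        obj = pickle.load(file)
    return json.dumps(obj, default=default, indent=2).splitlines(keepends=True)


def diff_pkls(filename_a, filename_b):
    # Unified diff of the JSON dumps of two PKL files.
    return ''.join(difflib.unified_diff(
        dump_pkl(filename_a), dump_pkl(filename_b),
        fromfile=filename_a, tofile=filename_b))
```

Calling `print(diff_pkls(...))` with the two paths copied from the GUI then prints only the lines that actually differ, in unified-diff format, instead of two full dumps.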
-
reporter Worked, thanks. I could find the difference. But for the tool you are planning, please note that a file with >11,000 lines is not human-readable. The user needs a simpler file including his/her input search and the resulting parameters.
-
the user needs a simpler file including his/her input search and the resulting parameters.
Can you elaborate a bit more? How would the user do that? So, given a PKL file, how would the user provide his input search? I am not quite sure what use case you are trying to cover here. The code above gives you a complete JSON dump of all the PKL content. You can then use a normal editor to search for specific content.
-
I would recommend keeping the full JSON dump and opening it in a powerful editor like Notepad++, where you can collapse the blocks you’re not interested in.
-
reporter ok, I’ll try and report back
-
- changed status to resolved
We plan to have a human-readable version of the PKL files in a future version (most likely as JSON or XML).
For now, you can restore both PKLs in Python and compare the actual objects. But that may be too complicated, depending on the user.
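A sketch of that in-Python comparison, assuming both pickles hold scikit-learn-style estimators (the function names are made up; comparing get_params() covers the hyperparameters only, not the fitted trees):

```python
import pickle


def load_pkl(filename):
    with open(filename, 'rb') as file:
        return pickle.load(file)


def param_diff(estimator_a, estimator_b):
    # Report the hyperparameters that differ between two estimators,
    # as {name: (value_in_a, value_in_b)}.
    params_a = estimator_a.get_params()
    params_b = estimator_b.get_params()
    return {key: (params_a.get(key), params_b.get(key))
            for key in sorted(set(params_a) | set(params_b))
            if params_a.get(key) != params_b.get(key)}
```

`param_diff(load_pkl('a.pkl'), load_pkl('b.pkl'))` would then return something like `{'max_depth': (None, 5)}`, which is exactly the short human-readable summary discussed above.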