Save processing information for reproducible research

Issue #626 resolved

Agustin Lobo created an issue 2021-02-19

It would be important to be able to save all information entered in a panel (e.g. Classification Workflow) to a file, so that it can be kept along with the output for future reference. Ideally, the actual Python script that is run would be saved.

Comments (25)

Agustin Lobo reporter
- changed title to Save processing information for reproducible research
- 2021-02-19T10:37:45+00:00
Andreas Janz
Yes, we plan to do something like that. For each algorithm, we want to log a console command that the user can use to re-execute the algorithm. It’s pretty much how GDAL algorithms do it:

- 2021-02-19T11:22:07+00:00
Agustin Lobo reporter
Great! For which version is this planned?

This will also be very useful for making scripts.
- 2021-02-19T12:12:53+00:00
Agustin Lobo reporter
Anyway, it would be better to record all the info from the input file (e.g., input and output files, the actual process, selected options…) to the *acass.html file
- 2021-02-20T10:52:20+00:00
Andreas Janz
Great! For which version is this planned?

Maybe v3.8 in April 2021, or perhaps v3.9 in November.
- 2021-02-22T07:34:30+00:00
Andreas Janz
Anyway, it would be better to record all the info from the input file (e.g., input and output files, the actual process, selected options…) to the *acass.html file

I would rather not clutter the Accuracy Assessment Report with such information. I think the user will need to keep track of it.
- 2021-02-22T07:37:17+00:00
Andreas Janz
I discussed this with @Benjamin Jakimow and we decided to implement a log writer that redirects all the information from the processing algorithm GUI's log widget into a file placed next to the results on disk. E.g. all this goes to a *.log file:

- 2021-02-22T14:44:41+00:00
Benjamin Jakimow
It shouldn’t be redirection, just a printout of the related log into a text file.
- 2021-02-22T15:07:29+00:00
Agustin Lobo reporter
This is very good for saving all the necessary information. It will not be as useful as an aid for writing scripts, though. My ideal would be to save the Python script that actually executed the task, but perhaps this is not possible or is too costly (I do not know how things are done behind the scenes).

- 2021-02-22T15:44:19+00:00
Benjamin Jakimow
We want to do both:
(i) save the log messages that are generated when running a QgsProcessingAlgorithm. We would like to store them along with its outputs.
(ii) implement the QgsProcessingAlgorithm::asPythonCommand() method. It returns the Python code that shows how to run the algorithm with the current parameterization. It is used e.g. by the QGIS model builder, which allows you to store entire processing models as Python scripts.
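
As a rough illustration of point (ii), the idea of turning an algorithm id plus its current parameters into a re-runnable command can be sketched in plain Python. This is not the QGIS implementation of asPythonCommand(); the helper name as_python_command is hypothetical, and the sketch assumes all parameter values are simple literals (strings, numbers, lists) and all parameter names are valid Python identifiers.

```python
from typing import Any, Dict


def as_python_command(algorithm_id: str, parameters: Dict[str, Any]) -> str:
    """Hypothetical helper: build a processing.run(...) call as a string."""
    # repr() yields valid Python literals for strings, numbers, lists and None.
    # Layer objects or other non-literal values would need their own handling.
    args = ", ".join(f"{key}={value!r}" for key, value in parameters.items())
    return f"processing.run({algorithm_id!r}, dict({args}))"


command = as_python_command(
    "enmapbox:FitRandomforestclassifier",
    {"raster": "/enmap_berlin.bsq", "outclassifier": "/classifier.pkl"},
)
print(command)
```

The resulting string is only meaningful where the processing module is importable, e.g. in the QGIS Python Console.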
- 2021-02-22T15:52:07+00:00
Agustin Lobo reporter
Wonderful then!

- 2021-02-22T16:01:41+00:00
Andreas Janz
It shouldn’t be redirection, just a printout of the related log into a text file.

That’s what I meant. I will also write everything that goes to feedback.pushInfo into the log file.
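
A minimal sketch of that idea, without QGIS: a wrapper forwards each pushInfo call to the original feedback object and also appends the message to a file. LoggingFeedback and PrintFeedback are hypothetical names used only for this illustration, and the real feedback class has more methods than the single one modeled here.

```python
from pathlib import Path
import tempfile


class LoggingFeedback:
    """Hypothetical wrapper: forwards pushInfo and appends each message to a file."""

    def __init__(self, feedback, log_path):
        self._feedback = feedback
        self._log_path = Path(log_path)
        self._log_path.write_text("", encoding="utf-8")  # start with an empty log

    def pushInfo(self, message: str) -> None:
        self._feedback.pushInfo(message)  # the original log still gets the message
        with self._log_path.open("a", encoding="utf-8") as log:
            log.write(message + "\n")


class PrintFeedback:
    """Stand-in for the real feedback object; only pushInfo is modeled."""

    def pushInfo(self, message: str) -> None:
        print(message)


# Placeholder messages, written to a temporary file for demonstration only.
log_file = Path(tempfile.mkdtemp()) / "result.log"
feedback = LoggingFeedback(PrintFeedback(), log_file)
feedback.pushInfo("first message")
feedback.pushInfo("second message")
logged = log_file.read_text(encoding="utf-8")
```

Wrapping rather than replacing the feedback object means the GUI log keeps working as before; the file is just an additional sink.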
- 2021-02-22T16:05:59+00:00
Andreas Janz
Ok, quick update.
1. Algorithms will log a command that can be executed inside the QGIS Python Console, to reproduce the processing:
2. Information from the log will be written to a log file located next to the “main-result”:
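
A small sketch of how the two points could fit together, assuming the log file is named after the main result with ".log" appended. That naming is an assumption made for this example, not something confirmed in this thread, and write_log is a hypothetical helper.

```python
from pathlib import Path
import tempfile
from typing import Iterable


def write_log(main_result: str, command: str, messages: Iterable[str]) -> Path:
    """Write the command and log messages to '<main result>.log' (assumed naming)."""
    result = Path(main_result)
    log_path = result.with_name(result.name + ".log")  # same folder as the result
    log_path.write_text("\n".join([command, *messages]) + "\n", encoding="utf-8")
    return log_path


# Placeholder command and messages; a temporary folder stands in for the output folder.
main_result = Path(tempfile.mkdtemp()) / "classifier.pkl"
log_path = write_log(
    str(main_result),
    "processing.run(...)",
    ["first message", "second message"],
)
log_lines = log_path.read_text(encoding="utf-8").splitlines()
```

Putting the command on the first line keeps it easy to copy back into the QGIS Python Console.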

- 2021-02-24T08:57:28+00:00
Andreas Janz
Existing algorithms need to be updated to reflect these changes. I will keep this issue open until that's done.
- 2021-02-24T08:59:42+00:00
Agustin Lobo reporter
I understand that such a Python command must be run from within the QGIS Python console. Am I wrong?

Is it possible to run the Python script from a regular Python console (outside QGIS)? Which libraries would need to be imported?
- 2021-02-24T13:42:34+00:00

Andreas Janz

Running it from within QGIS python console ensures the correct environment setup.

Running it without the QGIS GUI is definitely possible, but setting everything up correctly depends on your environment (OSGeo4W console, conda console, PyCharm, etc.). If that is taken care of, something like this should work:

import processing
from qgis.core import QgsRasterLayer, QgsVectorLayer
from enmapbox.testing import start_app

qgsApp = start_app()
result = eval("processing.run('enmapbox:FitRandomforestclassifier', dict(raster=QgsRasterLayer(r'C:/source/enmap-box-testdata/enmapboxtestdata/enmap_berlin.bsq'), classification=QgsVectorLayer(r'C:/source/enmap-box-testdata/enmapboxtestdata/landcover_berlin_polygon.shp'), outclassifier='c:/vsimem/classifier.pkl'))")
print(result)

Prints:
{'outclassifier': 'c:/vsimem/classifier.pkl'}

2021-02-24T13:58:51+00:00

Benjamin Jakimow
fixed processing provider flag in metadata.txt addresses ~~#614~~ addresses ~~#626~~

Signed-off-by: Benjamin Jakimow benjamin.jakimow@geo.hu-berlin.de

→ <<cset 9a3b7080ed3b>>
- 2021-02-24T15:39:27+00:00
Andreas Janz
Actually, the eval statement above is overcomplicated; you can just use:
result = processing.run('enmapbox:FitRandomforestclassifier', dict(raster=QgsRasterLayer(r'C:/source/enmap-box-testdata/enmapboxtestdata/enmap_berlin.bsq'), classification=QgsVectorLayer(r'C:/source/enmap-box-testdata/enmapboxtestdata/landcover_berlin_polygon.shp'), outclassifier='c:/vsimem/classifier.pkl'))
- 2021-03-17T16:55:51+00:00
Andreas Janz
- changed status to resolved
- 2021-03-17T16:56:19+00:00

Benjamin Jakimow

File paths can be passed as strings; there is no need to create a QgsMapLayer instance first.

parameters = dict(
  raster=r'/enmap_berlin.bsq',
  classification=r'/landcover_berlin_polygon.shp',
  outclassifier='/classifier.pkl'
)
result = processing.run('enmapbox:FitRandomforestclassifier', parameters)


2021-03-17T18:23:32+00:00

Andreas Janz
@Benjamin Jakimow, in theory yes, but there actually seems to be a bug when using file paths only: processing.run won’t load the default style defined in the *.qml sidecar file. Instead you get a single-band gray renderer. This is a big problem, especially when we expect categorized rasters (aka Classification) as input.
So, for now, I would rather keep it that way.

If the actual renderer is not important, you can skip the class constructors.
- 2021-03-17T18:43:48+00:00
Andreas Janz
@Benjamin Jakimow feel free to create an issue in the QGIS repo and report back if fixed.
- 2021-03-17T18:45:34+00:00
Benjamin Jakimow
If this does not work with plain file paths, users cannot use these algorithms from the CLI.
This affects the classification vector source only, right? Does the algorithm's prepare(…) return False and an error message in that case? (If yes, it will be shown on the CLI.)
- 2021-03-17T18:59:03+00:00

Benjamin Jakimow

@Andreas Janz you just need to load the default style to get the QgsCategorizedSymbolRenderer

It seems that QgsProcessingAlgorithm::parameterAsVectorLayer creates the vector layer from text input without loading the default style.

from enmapboxtestdata import landcover_polygons

from qgis.core import QgsVectorLayer, QgsCategorizedSymbolRenderer

options = QgsVectorLayer.LayerOptions(loadDefaultStyle=False)
lyr = QgsVectorLayer(landcover_polygons, 'my vector layer', options=options)
print(lyr.renderer())
assert not isinstance(lyr.renderer(), QgsCategorizedSymbolRenderer)
lyr.loadDefaultStyle()
assert isinstance(lyr.renderer(), QgsCategorizedSymbolRenderer)
print(lyr.renderer())


2021-03-17T19:07:52+00:00

Andreas Janz

Simple filenames do work now as layer input! I overloaded the parameterAs*Layer methods to take care of the default style, e.g.

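    def parameterAsVectorLayer(
            self, parameters: Dict[str, Any], name: str, context: QgsProcessingContext
    ) -> Optional[QgsVectorLayer]:
        layer = super().parameterAsVectorLayer(parameters, name, context)
        # A layer created from a plain file path has no default style loaded yet.
        # Use .get() so a parameter that falls back to its default raises no KeyError.
        if isinstance(layer, QgsVectorLayer) and isinstance(parameters.get(name), str):
            layer.loadDefaultStyle()
        return layer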
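
The decision logic of that override can be tried out without QGIS. In the sketch below, FakeLayer, BaseAlgorithm and StyledAlgorithm are hypothetical stand-ins for the QGIS layer and algorithm classes, and the method signature is simplified (no context argument). It only illustrates the rule "load the default style when the input was a plain path, leave caller-supplied layer objects alone".

```python
from typing import Any, Dict, Optional


class FakeLayer:
    """Stand-in for a QGIS map layer; records whether its default style was loaded."""

    def __init__(self, source: str):
        self.source = source
        self.style_loaded = False

    def loadDefaultStyle(self) -> None:
        self.style_loaded = True


class BaseAlgorithm:
    """Stand-in base class: turns a path string into a layer without styling it."""

    def parameterAsLayer(self, parameters: Dict[str, Any], name: str) -> Optional[FakeLayer]:
        value = parameters.get(name)
        if isinstance(value, FakeLayer):
            return value
        if isinstance(value, str):
            return FakeLayer(value)
        return None


class StyledAlgorithm(BaseAlgorithm):
    def parameterAsLayer(self, parameters: Dict[str, Any], name: str) -> Optional[FakeLayer]:
        layer = super().parameterAsLayer(parameters, name)
        # Only layers created from a plain path get their default style loaded;
        # a layer object passed in by the caller is returned untouched.
        if layer is not None and isinstance(parameters.get(name), str):
            layer.loadDefaultStyle()
        return layer


alg = StyledAlgorithm()
from_path = alg.parameterAsLayer(
    {"classification": "/landcover_berlin_polygon.shp"}, "classification"
)
existing = FakeLayer("/landcover_berlin_polygon.shp")
from_object = alg.parameterAsLayer({"classification": existing}, "classification")
```

Leaving caller-supplied layers alone matters because they may already carry a renderer that the caller set deliberately.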


2021-03-18T06:32:01+00:00

Assignee: –

Type: enhancement

Priority: major

Status: resolved

Component: Processing

Milestone: –

Version: 3.7

Votes: 0

Watchers: 1