Save processing information for reproducible research

Issue #626 resolved
Agustin Lobo created an issue

It would be important for all the information entered in a panel (e.g. the Classification Workflow) to be saved to a file, so that it can be kept with the output for future reference. Ideally, the actual Python script that is run would be saved.

Comments (25)

  1. Andreas Janz

    Yes, we plan to do something like that. For each algorithm, we want to log a console command that the user can use to re-execute the algorithm, much like GDAL algorithms do.

  2. Agustin Lobo reporter

    Great! For which version is this planned?

    This will also be very useful for writing scripts.

  3. Agustin Lobo reporter

    Anyway, it would be better to record all the info from the input file (e.g., input and output files, the actual process, selected options…) in the *acass.html file.

  4. Andreas Janz

    > Great! For which version is this planned?

    Maybe v3.8 in April 2021, or perhaps v3.9 in November.

  5. Andreas Janz

    > Anyway, it would be better to record all the info from the input file (e.g., input and output files, the actual process, selected options…) in the *acass.html file.

    I would rather not clutter the Accuracy Assessment Report with such information. I think the user will need to keep track of it.

  6. Andreas Janz

    I discussed this with @Benjamin Jakimow, and we decided to implement a log writer that redirects all the information from the processing algorithm GUI log widget into a file placed next to the results on disk. For example, everything shown in the log widget goes to a *.log file.

  7. Agustin Lobo reporter

    This is very good for saving all the necessary information, but it will not be as useful as an aid for writing scripts. Ideally, I would save the Python script that actually executed the task, but perhaps this is not possible or too costly (I do not know how things are done behind the scenes).

  8. Benjamin Jakimow

    We want to do both:
    (i) save the log messages that are generated when running a QgsProcessingAlgorithm, storing them along with its outputs; and
    (ii) implement the QgsProcessingAlgorithm::asPythonCommand() method. It returns the Python code that shows how to run the algorithm with the current parameterization. This method is used, e.g., by the QGIS model builder, which allows you to store entire processing models as Python scripts.
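    asPythonCommand() is QGIS's own implementation; the following is only a minimal pure-Python sketch of the idea behind it, assuming all parameter values are plain Python literals such as file paths, numbers and booleans. The helper name format_processing_command is hypothetical and not part of QGIS or the EnMAP-Box.

```python
from typing import Any, Dict


def format_processing_command(algorithm_id: str, parameters: Dict[str, Any]) -> str:
    """Render a processing.run(...) call that re-executes an algorithm.

    Hypothetical helper for illustration only. Values are written with repr(),
    so only plain literals (str, int, float, bool, None, lists) round-trip;
    parameter names must be valid Python identifiers for the dict(...) syntax.
    """
    args = ", ".join(f"{name}={value!r}" for name, value in parameters.items())
    return f"processing.run({algorithm_id!r}, dict({args}))"


if __name__ == "__main__":
    print(format_processing_command(
        "enmapbox:FitRandomforestclassifier",
        {"raster": "enmap_berlin.bsq", "outclassifier": "classifier.pkl"},
    ))
```

    This prints processing.run('enmapbox:FitRandomforestclassifier', dict(raster='enmap_berlin.bsq', outclassifier='classifier.pkl')), a string of the same form as the processing.run(...) commands discussed further down in this thread. Objects such as QgsRasterLayer instances would need a dedicated rendering, which is one reason the real method takes a processing context.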

  9. Andreas Janz

    > It shouldn’t be a redirection, just a printout of the related log into a text file.

    That’s what I meant. Everything that goes to feedback.pushInfo will also be printed into the log file.
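    A minimal pure-Python sketch of that "print, don't redirect" behaviour. TeeFeedback is a hypothetical stand-in that only mirrors the pushInfo(str) method of QgsProcessingFeedback; in PyQGIS one would presumably subclass QgsProcessingFeedback and call super().pushInfo(info) instead of appending to a list.

```python
import io
from typing import List, TextIO


class TeeFeedback:
    """Hypothetical stand-in for a processing feedback object.

    Keeps every message for the GUI log (a plain list here) and
    additionally prints the same text into an open log file.
    """

    def __init__(self, log_file: TextIO) -> None:
        self.log_file = log_file
        self.messages: List[str] = []  # what the log widget would display

    def pushInfo(self, info: str) -> None:
        self.messages.append(info)       # normal logging is unchanged
        print(info, file=self.log_file)  # the same line also goes to the file


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for the *.log file on disk
    feedback = TeeFeedback(buffer)
    feedback.pushInfo("example message")
    print(buffer.getvalue(), end="")
```

    Because the file is written in addition to the normal log, the GUI behaves exactly as before; the log file is a copy, not a replacement.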

  10. Andreas Janz

    Ok, quick update.

    1. Algorithms will log a command that can be executed inside the QGIS Python Console to reproduce the processing.

    2. Information from the log will be written to a log file located next to the “main result”.
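    The "next to the main result" placement can be sketched as follows. The naming rule (append .log to the main result's file name) and both helper names are assumptions for illustration; the thread does not say which file name the EnMAP-Box actually uses.

```python
from pathlib import Path
from typing import Iterable


def log_path_for(main_result: str) -> Path:
    """Place the log file in the same folder as the main result.

    Assumed naming rule: append '.log' to the full file name, so
    'classifier.pkl' -> 'classifier.pkl.log' (outputs that differ
    only in their extension still get separate logs).
    """
    result = Path(main_result)
    return result.with_name(result.name + ".log")


def write_log(main_result: str, command: str, messages: Iterable[str]) -> Path:
    """Write the re-execution command, then every log message, one per line."""
    path = log_path_for(main_result)
    with open(path, "w", encoding="utf-8") as f:
        f.write(command + "\n")
        for message in messages:
            f.write(message + "\n")
    return path
```

    Putting the command first means the log file doubles as a script snippet: its first line is the call needed to reproduce the result.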

  11. Andreas Janz

    Existing algorithms need to be updated to reflect these changes. I will keep this issue open until that's done.

  12. Agustin Lobo reporter

    I understand that such a Python command must be run from within the QGIS Python console; am I wrong?

    Is it possible to run the Python script from a regular Python console (outside QGIS)? Which libraries should be imported?

  13. Andreas Janz

    Running it from within QGIS python console ensures the correct environment setup.

    Running it without the QGIS GUI is definitely possible, but setting everything up correctly depends on your environment (OSGeo4W console, conda console, PyCharm, etc.). If that is taken care of, something like this should work:

    import processing
    from qgis.core import QgsRasterLayer, QgsVectorLayer
    from enmapbox.testing import start_app
    
    # Start a QGIS application instance (EnMAP-Box testing helper)
    qgsApp = start_app()
    # Evaluate the command string as logged by the algorithm
    result = eval("processing.run('enmapbox:FitRandomforestclassifier', dict(raster=QgsRasterLayer(r'C:/source/enmap-box-testdata/enmapboxtestdata/enmap_berlin.bsq'), classification=QgsVectorLayer(r'C:/source/enmap-box-testdata/enmapboxtestdata/landcover_berlin_polygon.shp'), outclassifier='c:/vsimem/classifier.pkl'))")
    print(result)
    

    Prints:
    {'outclassifier': 'c:/vsimem/classifier.pkl'}

  14. Andreas Janz

    Actually, the eval statement above is unnecessarily complicated; you can just use:
    result = processing.run('enmapbox:FitRandomforestclassifier', dict(raster=QgsRasterLayer(r'C:/source/enmap-box-testdata/enmapboxtestdata/enmap_berlin.bsq'), classification=QgsVectorLayer(r'C:/source/enmap-box-testdata/enmapboxtestdata/landcover_berlin_polygon.shp'), outclassifier='c:/vsimem/classifier.pkl'))

  15. Benjamin Jakimow

    File paths can be passed as strings; there is no need to create a QgsMapLayer instance first.

    import processing

    # Plain file paths instead of QgsRasterLayer/QgsVectorLayer instances
    parameters = dict(
      raster=r'/enmap_berlin.bsq',
      classification=r'/landcover_berlin_polygon.shp',
      outclassifier='/classifier.pkl'
    )
    result = processing.run('enmapbox:FitRandomforestclassifier', parameters)
    

  16. Andreas Janz

    @Benjamin Jakimow, in theory, yes. In practice, there seems to be a bug when using file paths only: processing.run won’t load the default style defined in the *.qml sidecar file. Instead, you get a single-band gray renderer. This is a big problem, especially when we expect categorized rasters (aka classifications) as input.
    So, for now, I would rather keep it that way.

    If the actual renderer is not important, you can skip the class constructors.
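    For reference, a minimal sketch of where such a sidecar style usually lives, assuming the QGIS convention of a *.qml file with the layer's base name in the same folder (e.g. landcover_berlin_polygon.shp → landcover_berlin_polygon.qml). The helper default_style_path is hypothetical; inside QGIS, loadDefaultStyle() performs the actual lookup and loading.

```python
from pathlib import Path
from typing import Optional


def default_style_path(layer_source: str) -> Optional[Path]:
    """Return the sidecar *.qml next to a file-based layer, or None.

    Assumption: the style file shares the layer's base name and only the
    extension is replaced, e.g. enmap_berlin.bsq -> enmap_berlin.qml.
    """
    qml = Path(layer_source).with_suffix(".qml")
    return qml if qml.is_file() else None
```

    If this finds a style file but the layer created by processing.run still shows a single-band gray renderer, the style exists on disk and was simply not applied, which is the bug described above.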

  17. Benjamin Jakimow

    If this does not work with plain file paths, users cannot use these algorithms from the CLI.
    This affects the classification vector source only, right? Does the algorithm's prepare(…) return False with an error message in that case? If so, the message will be shown on the CLI.

  18. Benjamin Jakimow

    @Andreas Janz, you just need to load the default style to get the QgsCategorizedSymbolRenderer.

    It seems that QgsProcessingAlgorithm::parameterAsVectorLayer creates the vector layer from text input without loading the default style.

    from enmapboxtestdata import landcover_polygons
    
    from qgis.core import QgsVectorLayer, QgsCategorizedSymbolRenderer
    
    options = QgsVectorLayer.LayerOptions(loadDefaultStyle=False)
    lyr = QgsVectorLayer(landcover_polygons, 'my vector layer', options=options)
    print(lyr.renderer())
    assert not isinstance(lyr.renderer(), QgsCategorizedSymbolRenderer)
    lyr.loadDefaultStyle()
    assert isinstance(lyr.renderer(), QgsCategorizedSymbolRenderer)
    print(lyr.renderer())
    

  19. Andreas Janz

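    Plain file names now work as layer input! I overrode the parameterAs*Layer methods to take care of the default style, e.g.:

        from typing import Any, Dict, Optional
        from qgis.core import QgsProcessingContext, QgsVectorLayer

        # Method of the algorithm base class (a QgsProcessingAlgorithm subclass)
        def parameterAsVectorLayer(
                self, parameters: Dict[str, Any], name: str, context: QgsProcessingContext
        ) -> Optional[QgsVectorLayer]:
            layer = super().parameterAsVectorLayer(parameters, name, context)
            # Layers created from a plain file path lack the *.qml default style
            if isinstance(layer, QgsVectorLayer) and isinstance(parameters.get(name), str):
                layer.loadDefaultStyle()
            return layer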
    
