All classes have 0 elements sampled in Classification Workflow

Issue #289 resolved
Agustin Lobo created an issue

We are trying to run the Classification Workflow with our own data. We load our shapefile and select the labels column, but the sampling process then selects 0 samples for every class, even though we have selected 100% sample size:

We have made 2 versions of the points data (attached zip files):

test.shp has one feature for each point (“single parts”),

test2 has one feature for each class (“multi part”).

In both cases we select the “level_1_id” field.

In both cases, the points are displayed over the image (ENVI format) in the EnMapBox Map window.

Comments (16)

  1. Agustin Lobo reporter

    I have found that the problem was that the CRS information in the image's .hdr file did not match that in the shapefile's .prj file.

    While the .prj states the info for EPSG:25831, our .hdr had:

    map info = {Arbitrary, 1, 1, 0, 0, 1, 1, 0, North}
    

    Despite being prompted 4 or 5 times for the CRS of the image by EnMapBox, entering epsg:25831 every time, and getting a correct overlay of points in the Map display, it seems that the Classification process is not actually able to extract the image information at the point locations.

    The problem was solved by entering full CRS information in the hdr:

    map info = {UTM, 1.000, 1.000, 0, 0, 1, 1, 31, North, ETRS-89, units=Meters}
    coordinate system string = {PROJCS["ETRS89 / UTM zone 31N",
        GEOGCS["ETRS89",
            DATUM["European_Terrestrial_Reference_System_1989",
                SPHEROID["GRS 1980",6378137,298.257222101,
                    AUTHORITY["EPSG","7019"]],
                TOWGS84[0,0,0,0,0,0,0],
                AUTHORITY["EPSG","6258"]],
            PRIMEM["Greenwich",0,
                AUTHORITY["EPSG","8901"]],
            UNIT["degree",0.0174532925199433,
                AUTHORITY["EPSG","9122"]],
            AUTHORITY["EPSG","4258"]],
        PROJECTION["Transverse_Mercator"],
        PARAMETER["latitude_of_origin",0],
        PARAMETER["central_meridian",3],
        PARAMETER["scale_factor",0.9996],
        PARAMETER["false_easting",500000],
        PARAMETER["false_northing",0],
        UNIT["metre",1,
            AUTHORITY["EPSG","9001"]],
        AXIS["Easting",EAST],
        AXIS["Northing",NORTH],
        AUTHORITY["EPSG","25831"]]}
    

    Note that omitting the coordinate system string results in QGIS reading the image as EPSG:32631 (WGS-84 datum instead of ETRS-89).

    I do not know whether requiring explicit CRS information in the image (and thus ignoring what has been entered by the user when prompted) is intentional or a bug.

    In any case, what is really needed is a clear error message when the CRS information of the image and that of the vector disagree, e.g. “ERROR: CRS information of input layers do not agree”, ideally followed by both CRS definitions as read by the classification process.
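    Such a check could run before any sampling starts. A minimal sketch of the idea (the function name and message format are illustrative, not EnMAP-Box API; CRS are compared as authority strings such as "EPSG:25831"):

```python
def check_crs_agreement(raster_crs: str, vector_crs: str) -> None:
    """Fail early, reporting both CRS definitions, when the inputs disagree."""
    if not raster_crs or not vector_crs:
        raise ValueError("ERROR: one of the input layers has no CRS defined")
    if raster_crs.upper() != vector_crs.upper():
        raise ValueError(
            "ERROR: CRS information of input layers do not agree\n"
            f"  raster CRS: {raster_crs}\n"
            f"  vector CRS: {vector_crs}"
        )

# The silent EPSG:25831 vs EPSG:32631 mismatch described above now fails loudly:
try:
    check_crs_agreement("EPSG:25831", "EPSG:32631")
except ValueError as e:
    print(e)
```

    With such a guard, the 0-samples symptom would be replaced by an explicit message naming both CRS.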

    I thus think that this issue can be closed (I cannot find a way to close it myself), and I will open a request for a clear error message.

  2. Benjamin Jakimow

    map info = {Arbitrary, 1, 1, 0, 0, 1, 1, 0, North}
    

    @Agustin Lobo as you already discovered, the QGIS API requires more than the map info tag to derive a valid CRS definition. For such cases you can specify a default CRS that is used whenever the CRS is unknown (see #274).

    Regarding the classification workflow, I suggest implementing a clear error message for the case where one of the inputs has an invalid CRS.

    If both CRS are valid, the vector reference should be warped on the fly into the raster CRS (this can easily be done in memory).

    In the case of artificial data sets (like the one above) we might use a default projected metric CRS, e.g. EPSG:32626 (Atlantic).

  3. Agustin Lobo reporter

    Actually, I would separate the extraction of spectral values at the points into a previous step. The input to the classification process should be a table with the spectral values for each point, not the points shapefile itself. The user would take care of building that table before running the classification workflow. This would have 2 advantages:

    1. It separates possible geometric problems from the classification process itself.
    2. Most importantly, it lets the user include information from different images (or even other sources) in the table, build the model, and apply it to the input image.

  4. Andreas Janz

    @Agustin Lobo the inclusion of samples from different images can be achieved in the SpectralLibrary View by using the Import profiles from raster + vector sources option multiple times. Unfortunately, the Classification Workflow App is not yet able to use a spectral library as input.

    If you like, you can provide me with some test data and I will implement your use case.

  5. Agustin Lobo reporter

    Andreas,

    Any of the current test data would be enough. The point is being able to classify from training datasets that come from multiple images. I think the most straightforward approach would be an input in the form of a simple CSV file with columns ID, class, band1, band2, band3, …

    Perhaps you have reasons to prefer the spectral library over the CSV file (maybe keeping internal consistency across the package), and that would be fine with me (provided spectral libraries can be built in a non-interactive way from polygon or point vector files). But the current procedure (described in https://enmap-box.readthedocs.io/en/latest/usr_section/usr_cookbook/classification.html), in which the user must convert the training set from vector to raster (in a step with the confusing name of “Classification from Vectorraster”), is certainly very inconvenient.
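    A training table in the suggested form is straightforward to consume. A minimal sketch, assuming exactly the columns ID, class, band1..bandN (pure Python illustration, not EnMAP-Box code):

```python
import csv
import io

def load_training_table(csv_text: str):
    """Split a table with columns ID, class, band1..bandN into features X and labels y."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    band_cols = sorted(
        (c for c in rows[0] if c.startswith("band")),
        key=lambda c: int(c[4:]),  # keep band1, band2, ... in numeric order
    )
    X = [[float(r[c]) for c in band_cols] for r in rows]
    y = [r["class"] for r in rows]
    return X, y

table = "ID,class,band1,band2,band3\n1,water,0.02,0.05,0.01\n2,forest,0.04,0.07,0.30\n"
X, y = load_training_table(table)
print(y)  # ['water', 'forest']
```

    X and y in this shape can be fed directly to any scikit-learn classifier, independent of which images the rows originally came from.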

    Maybe we should move this discussion to a new ticket named “Allow for classification training sets to be built from multiple images”.

  6. Benjamin Jakimow

    +1

    I like the idea of training classifiers from CSV input, as this is probably more convenient for use with other machine learning frameworks. I’ll add an export function to the Spectral Library to export 1 selected label column + n columns for n spectral bands as *.csv
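    The export described here could look roughly like this; a sketch assuming profiles are simple (label, band values) pairs — the actual Spectral Library data model may differ:

```python
import csv
import io

def export_profiles_csv(profiles, label_column="label"):
    """Write one label column plus band1..bandN columns, one row per profile.

    `profiles` is a list of (label, band_values) pairs; all profiles are
    assumed to have the same number of bands.
    """
    n_bands = len(profiles[0][1])
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([label_column] + [f"band{i + 1}" for i in range(n_bands)])
    for label, values in profiles:
        writer.writerow([label] + list(values))
    return buf.getvalue()

csv_text = export_profiles_csv([("water", [0.02, 0.05]), ("forest", [0.04, 0.30])])
print(csv_text)
```

    The resulting file matches the ID/class/band layout proposed above, so the same table could round-trip between the Spectral Library and external tools.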

  7. Benjamin Jakimow

    @Agustin Lobo @Andreas Janz has already implemented several improvements to the Classification Workflow. It should now handle projection differences better and allows different inputs:

    • raster features + raster references
    • raster features + vector references. Actually, a vector layer with a CategorizedSymbolRenderer is required, from which the class info is derived.
    • Spectral Library with a Categorized Symbol Renderer

    We also started to provide “developer” versions through the QGIS Plugin Repository. Go into the Plugin Manager settings and activate “Show also experimental plugins”; then you can install the “experimental” developer/test versions.

  8. Andreas Janz

    @Agustin Lobo we have a new release, v3.7, planned for the end of October. I would recommend waiting for it.

    This release will allow building training datasets from multiple images via the SpectralView.

    Regarding training via CSVs: if your format is compatible with the SpectralView “CSV Table” import, you can again train via the SpectralView.

    Does that make sense?
