Classification Workflow (Raster/Vector): Different Sample size than in previous versions

Issue #630 resolved
Agustin Lobo created an issue

In the Classification Workflow (Raster/Vector), we find that the sample size (with 100% selected) of the same vector is now larger than it used to be in previous versions (the one active in August). I suspect this is a change of criterion (or perhaps a bug fix) in the so-called “Classification from Vector” process. Could anyone let us know the actual change in the criterion?

Comments (14)

  1. Andreas Janz

    Yes, that is true. In previous versions we skipped all pixels with a polygon coverage of less than 75%.

    In the current version we simply use GDAL burn logic, which selects all pixels whose center is covered by a polygon.

    Actually, we are not happy with that either. So in a future version, we will select the proper class by majority voting.

    Take for example this pixel: in the old version, this pixel would be dropped, because the coverage is less than 75%. In the current version the pixel is labeled as tree (green class) because the tree polygon covers the center. In a future version the pixel will be labeled as street (grey class) because the street polygon covers most of the pixel.

    Such details should of course be covered in the docs. The future version will have a description similar to this: Rasterization is done by an Oversampling Majority Voting approach that burns classes at a 10x finer resolution, resulting in 100 classified subpixels per pixel that are used for the final majority vote.
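    To make the three rules discussed above concrete, here is a minimal NumPy sketch. It is not the EnMAP-Box implementation: it assumes the classes have already been burned at 10x finer resolution into an integer array (0 = unlabeled), and the function names, the `factor` and `min_coverage` parameters, and the reading of the old 75% rule as "largest single-class share" are all assumptions made for illustration.

```python
import numpy as np

UNLABELED = 0  # assumed "no class" value in the fine-resolution raster


def class_counts(fine_labels, factor, n_classes):
    """Count subpixels per class inside every coarse pixel.

    Returns an array of shape (rows, cols, n_classes + 1).
    """
    fine = np.asarray(fine_labels)
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("raster shape must be a multiple of factor")
    # blocks[i, a, j, b] == fine[i * factor + a, j * factor + b]
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    return np.stack(
        [(blocks == c).sum(axis=(1, 3)) for c in range(n_classes + 1)], axis=-1
    )


def majority_vote(fine_labels, factor=10, n_classes=2):
    """Label each coarse pixel with the class covering most subpixels."""
    counts = class_counts(fine_labels, factor, n_classes)
    counts[..., UNLABELED] = 0  # unlabeled subpixels do not vote
    # Ties go to the lowest class id; pixels without any class stay 0.
    return counts.argmax(axis=-1)


def center_rule(fine_labels, factor=10):
    """Approximate the pixel-center rule with the subpixel at factor // 2."""
    fine = np.asarray(fine_labels)
    return fine[factor // 2::factor, factor // 2::factor]


def coverage_rule(fine_labels, factor=10, n_classes=2, min_coverage=0.75):
    """Keep a pixel only if one class covers at least min_coverage of it."""
    counts = class_counts(fine_labels, factor, n_classes)
    counts[..., UNLABELED] = 0
    share = counts.max(axis=-1) / (factor * factor)
    return np.where(share >= min_coverage, counts.argmax(axis=-1), UNLABELED)


# One coarse pixel: street (class 2) covers 60 subpixels,
# tree (class 1) covers 40 subpixels, including the center.
pixel = np.full((10, 10), 2)
pixel[4:8, :] = 1

print(coverage_rule(pixel)[0, 0])  # old rule: dropped (0)
print(center_rule(pixel)[0, 0])    # current rule: tree (1)
print(majority_vote(pixel)[0, 0])  # future rule: street (2)
```

    With this toy pixel the three rules give three different answers, which matches the example described in the comment.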

  2. Agustin Lobo reporter

    If the input is a vector of polygons, I think the rasterization should let the user set the percentage of pixel coverage required to label the whole pixel (in many cases I set 100% to avoid mixed pixels in the training set). The case of points is more difficult (and I never use points as a training set); perhaps voting, but with a threshold for the majority (e.g., if a pixel has 21% of 1, 35% of 2 and 44% of 3, you could leave the pixel unlabeled, but if it has 95% of 1 and 5% of 2, label it as 1).

  3. Andreas Janz

    I will include the option “Minimum Pixel Coverage [%]” with default value of 0%.
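    As a rough sketch of how such a threshold could interact with the majority vote, the snippet below labels a single pixel from its per-class coverage percentages. The function name `label_pixel` and its `min_coverage` argument are hypothetical and only mirror the proposed option; the actual behaviour of the option may differ.

```python
def label_pixel(coverage, min_coverage=0.0):
    """Return the winning class id, or None if the pixel stays unlabeled.

    coverage: dict mapping class id -> covered share of the pixel in percent.
    min_coverage: minimum share in percent the winning class must reach
                  (hypothetical stand-in for "Minimum Pixel Coverage [%]").
    """
    if not coverage:
        return None
    best = max(coverage, key=coverage.get)
    share = coverage[best]
    if share <= 0 or share < min_coverage:
        return None
    return best


mixed = {1: 21, 2: 35, 3: 44}
nearly_pure = {1: 95, 2: 5}

print(label_pixel(mixed))                          # default 0%: class 3 wins
print(label_pixel(mixed, min_coverage=50))         # no clear majority: None
print(label_pixel(nearly_pure, min_coverage=50))   # class 1
print(label_pixel(nearly_pure, min_coverage=100))  # only pure pixels kept: None
```

    With the default of 0% every pixel touched by a polygon gets the majority class, while 100% keeps only pure pixels, which is the setting described above for avoiding mixed pixels in the training set.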

    In the case of points, we simply use all the pixels that are covered by points.

  4. Agustin Lobo reporter

    I understood (Issue #631) that this process was going to be named “Rasterize categorized vector layer” (I suggest “Rasterize labelled vector” as another alternative).

  5. Andreas Janz

    BTW - This algorithm, and all other algorithms that create classification datasets, are now available as a shortcut inside the Classification Workflow app and the Fit classifier algorithms:
