Application Tutorial - Regression Techniques

Introduction

The goal of this tutorial is to make you familiar with some important regression approaches which are implemented in the EnMAP-Box. These are Random Forests (imageRF), Support Vector Machines (imageSVM) and Partial Least Squares Regression (autoPLSR).

Data Preparation

  • Select File > Open > EnMAP-Box Test Images. In order to get an idea of the distribution of land cover types in the test images, take a look at the Image Statistics.
  • Select Tools > Image Statistics and select as Input Image the classification file ‘AF_LC’ and Accept.

The least represented class is named ‘soils & manmade’ (1), the three other classes are ‘water’ (2), ‘forest & natural vegetation’ (3) and ‘agriculture’ (4).

image001.png

In the next step you create a stratified random sample from the test image containing Leaf Area Index (LAI) values from all of these classes.

  • Select Tools > Random Sampling.
  • Choose the Input Image named ‘AF_LAI’ and check Stratification. Your Stratification image has to be ‘AF_LC’, then Accept.

image003.png

  • In the new dialog select Equalized Sampling and type in 50, creating a total sample of 200 pixels.
  • Define the Output path of your Random Sample, name it ‘sample_LAI’ and Accept.
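
The idea behind Equalized Sampling can be sketched in a few lines of Python. This is only an illustration, not the EnMAP-Box implementation: the helper `equalized_stratified_sample`, the toy land-cover list, the seed and the convention that label 0 means "unclassified" are all assumptions made for this example. The sketch draws the same number of pixels from every class of a stratification map, which is why 50 pixels per class and four classes give 200 pixels.

```python
import random
from collections import defaultdict


def equalized_stratified_sample(strata, n_per_class, seed=0):
    """Return sorted pixel indices, drawing n_per_class pixels from each class.

    strata: flat sequence of class labels, one per pixel.
    Label 0 is treated as unclassified (an assumption of this sketch).
    Hypothetical helper for illustration, not an EnMAP-Box function.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(strata):
        if label != 0:  # skip unclassified pixels
            by_class[label].append(index)

    selected = []
    for label in sorted(by_class):
        pixels = by_class[label]
        # A class with fewer pixels than requested contributes all of them.
        k = min(n_per_class, len(pixels))
        selected.extend(rng.sample(pixels, k))
    return sorted(selected)


# Toy land-cover map with four classes of very different size.
land_cover = [1] * 60 + [2] * 300 + [3] * 900 + [4] * 1200
sample = equalized_stratified_sample(land_cover, n_per_class=50)
print(len(sample))  # 4 classes x 50 pixels = 200
```

Because every class contributes equally, the small class ‘soils & manmade’ is as well represented in the sample as the large ones.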

image005.png

Your sample will appear in the Image List. These 200 points might represent Leaf Area Index values measured in the field. For a later evaluation of the performance of the models, please divide the sample into a training (70%) and a validation (30%) data set.

  • Select Tools > Random Sampling.
  • Choose the Input Image named ‘sample_LAI’ and Accept.
  • Now select Relative Sampling and type in ‘70’. Define the output path of your Random Sample and name the training data set ‘TrainSample1’, then check Complement, define the output path of your validation data set and name it ‘ValidSample1’. Now two files are created, the first containing 70% randomly chosen pixels of the ‘sample_LAI’ file, the second the remaining 30%.
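
The split into a relative sample and its complement can be pictured with the following minimal Python sketch. It is not how EnMAP-Box performs the split internally; the function name `split_with_complement`, the seed and the use of plain integer indices in place of pixels are assumptions for illustration.

```python
import random


def split_with_complement(indices, fraction, seed=0):
    """Split indices into a random subset of the given fraction and its complement."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_subset = round(len(shuffled) * fraction)
    subset = sorted(shuffled[:n_subset])
    complement = sorted(shuffled[n_subset:])
    return subset, complement


sample_pixels = range(200)  # stand-in for the 200 pixels of 'sample_LAI'
train, valid = split_with_complement(sample_pixels, fraction=0.70)
print(len(train), len(valid))  # 140 60
```

The two sets never share a pixel, so the validation data stay independent of the training data. With 200 sampled pixels, the 30% complement holds 60 pixels.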

image007.png

ImageRF

1) Parameterization

  • Select Applications > Regression > imageRF Regression > Parameterize RF Regression (RFR).
  • The Input Image has to be ‘AF_Image’ and the Reference Area ‘TrainSample1’.
  • Some Parameters, e.g. Number of Trees, are already pre-defined and do not have to be changed. Simply define where to save the Output RFR Model, name it ‘rfrModel1_1’ and Accept.

image009.png

2) Application

  • After the Parameterization has finished, you are asked whether you want to apply the model to an image; answer ‘yes’. In the next dialog the most recent RFR Model and the Image are already selected. Now define where the regression estimation is to be saved and Accept.

image011.png

  • After completion, you can visualize the rfrEstimation in an Image View (drag-and-drop the file onto the view manager). The grey values represent the estimated LAI values.

image013.png

3) Accuracy Assessment

  • Select Applications > Regression > imageRF Regression > Fast Accuracy Assessment.
  • The most recent RFR Model and the Image are again already selected. For the Reference Areas choose ‘ValidSample1’.

image015.png

In your HTML browser several accuracy measures will show up. Leave the browser open for a later comparison of results.

number of samples (n): 60 (masked: 115551 total: 115611)
mean absolute error (MAE): 0.551599
mean squared error (MSE): 1.724486
root mean squared error (RMSE): 1.313197
pearson correlation (r): 0.89
squared pearson correlation (r^2): 0.78
nash-sutcliffe efficiency (NSE) : 0.72
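
To make the reported measures easier to interpret, the following Python sketch computes them from a reference and an estimate using the standard textbook definitions. That EnMAP-Box uses exactly these formulas is an assumption; the function name `regression_accuracy` and the five LAI values are made up for illustration.

```python
import math


def regression_accuracy(reference, estimate):
    """Compute common regression accuracy measures (textbook definitions)."""
    n = len(reference)
    residuals = [e - r for r, e in zip(reference, estimate)]
    sum_sq_err = sum(d * d for d in residuals)

    mae = sum(abs(d) for d in residuals) / n
    mse = sum_sq_err / n
    rmse = math.sqrt(mse)

    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    cov = sum((r - mean_ref) * (e - mean_est) for r, e in zip(reference, estimate))
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_est = sum((e - mean_est) ** 2 for e in estimate)
    r = cov / math.sqrt(ss_ref * ss_est)

    # NSE: 1 minus squared error relative to the spread of the reference.
    nse = 1.0 - sum_sq_err / ss_ref
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "r": r, "r2": r * r, "NSE": nse}


# Made-up reference and estimated LAI values for five validation pixels.
reference_lai = [0.5, 1.2, 2.8, 3.9, 5.1]
estimated_lai = [0.7, 1.0, 3.1, 3.5, 4.6]
metrics = regression_accuracy(reference_lai, estimated_lai)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Two relations are useful when reading the report: RMSE is the square root of MSE (1.313197² ≈ 1.724486 in the report above), and NSE can never exceed r², because r² ignores any bias or scaling error in the estimate while NSE penalizes it.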

image017.png

In the next step, repeat the same procedure to check how much the model varies when it is parameterized again with the same training data. Start again with step 1 (Parameterization) and name the new model ‘rfrModel1_2’, then apply it to the image and run the accuracy assessment again. A second tab with the new result report should open in your HTML browser.

In the last step, run the model once more, this time with a different allocation of training and validation pixels.

  • Select Tools > Random Sampling.
  • Again choose the Input Image named ‘sample_LAI’, then Accept.
  • Select Relative Sampling and type in 70 (%).
  • Define the Output path of your random sample and name it this time ‘TrainSample2’.
  • Check again Complement, define the path and name it ‘ValidSample2’, then Accept. Two new files should appear in the File List.
  • Finally do the three steps again, namely
  1. Parameterization (using TrainSample2, naming the model ‘rfrModel1_3’)
  2. Application
  3. Accuracy Assessment (with ValidSample2).

By comparing the three accuracy reports in your HTML browser, you will notice slightly different results between the three models. Your results may look similar to those in the following example.

        Model 1.1   Model 1.2   Model 1.3
MAE     0.551599    0.550949    0.510775
MSE     1.724486    1.647037    1.002154
RMSE    1.313197    1.283369    1.001076
r       0.89        0.89        0.92
r²      0.78        0.79        0.84
NSE     0.72        0.73        0.79
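
One likely reason why two models trained on the same data differ is that Random Forests are built with random elements, such as a bootstrap resample of the training data for every tree. The Python sketch below illustrates this effect with a deliberately simplified stand-in: a bagged ensemble of 1-nearest-neighbour regressors instead of regression trees. It is not imageRF; the helper names, the synthetic data and the seeds are assumptions for illustration.

```python
import random


def fit_bagged_ensemble(x, y, n_members, seed):
    """Fit a bagged ensemble: each member keeps its own bootstrap resample.

    Simplified stand-in for a Random Forest (hypothetical helper):
    every member is a 1-nearest-neighbour regressor instead of a tree.
    """
    rng = random.Random(seed)
    n = len(x)
    members = []
    for _ in range(n_members):
        # Bootstrap: draw n training pairs with replacement.
        picks = [rng.randrange(n) for _ in range(n)]
        members.append([(x[i], y[i]) for i in picks])
    return members


def predict(members, x_new):
    """Average the predictions of all members for one new value."""
    total = 0.0
    for pairs in members:
        nearest = min(pairs, key=lambda p: abs(p[0] - x_new))
        total += nearest[1]
    return total / len(members)


# Synthetic training data: a noisy linear relation (made up for this sketch).
data_rng = random.Random(42)
x_train = [i / 10 for i in range(50)]
y_train = [1.2 * xi + data_rng.gauss(0, 0.5) for xi in x_train]

model_a = fit_bagged_ensemble(x_train, y_train, n_members=25, seed=1)
model_b = fit_bagged_ensemble(x_train, y_train, n_members=25, seed=2)
model_a_again = fit_bagged_ensemble(x_train, y_train, n_members=25, seed=1)

# Same training data, different random draws: similar but not identical models.
print(predict(model_a, 2.03), predict(model_b, 2.03))
# Same training data and same seed: the result is reproducible.
print(predict(model_a, 2.03) == predict(model_a_again, 2.03))  # True
```

Models 1.1 and 1.2 above correspond to the first case (same data, different random draws), while Model 1.3 additionally changes the training and validation pixels.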

ImageSVM

  • Select Applications > Regression > imageSVM Regression > Parameterize SV Regression (SVR).
  • For the Training Data choose the Image ‘AF_Image’ and as Reference Areas ‘TrainSample1’ from the section Data Preparation, then Accept.
  • Now choose where to save the SVR File and name it ‘svrModel1.svr’, then Accept.

image019.png

  • When the parameterization is completed, select Applications > Regression > imageSVM Regression > Apply SVR to Image.
  • The previous SVR file is already selected, so choose as Image the ‘AF_Image’ and define a path for the regression result and a name for the file. Name it ‘AF_Image_SVR_1’, then Accept.
  • After completion, do an Accuracy Assessment. Select Applications > Regression > imageSVM Regression > Fast Accuracy Assessment.
  • In the first dialog the last SVR file is already selected, simply Accept.
  • As Validation Data select the Image ‘AF_Image’ and as Reference Areas the ‘ValidSample1’. The Accuracy Assessment yields accuracy measures, a scatterplot with histograms and a residuals plot.

image021.png

image023.png

If you repeat the three steps (Parameterization, Application, Accuracy Assessment) once more, you may again notice slightly different results. Finally, as in the section ImageRF, repeat the three steps using ‘TrainSample2’ for the Parameterization and ‘ValidSample2’ for the Accuracy Assessment.

        Model 2.1   Model 2.2   Model 2.3
MAE     0.3693      0.5256      0.4391
RMSE    0.8269      1.4170      1.001
R       0.9526      0.8900      0.9098
R²      0.9074      0.7921      0.8277

autoPLSR

  • Select Applications > Regression > autoPLSR > Calibrate Model.
  • Under the first bullet point, choose as Input Image the ‘AF_Image’ and as target image ‘TrainSample1’.
  • For the Output define where to save the autoPLSR Model and uncheck Show Report as well as Save Report, then Accept.

image025.png

  • After completion, select Applications > Regression > autoPLSR > Apply Model.
  • Choose the Input Model ‘modelPLSR.plsr’, the Input Image ‘AF_Image’ and define the name and path for the Output image, then Accept.

image027.png

  • Now select Applications > Accuracy Assessment > Regression.
  • Choose as Estimation the ‘autoPLSR_Estimation’ and as Reference ‘ValidSample1’, click Accept.

image029.png

As in the case of imageRF, several accuracy measures will show up in your HTML browser. Following the same procedure as before, repeat the three steps (Calibration, Application and Accuracy Assessment). Again the result will differ slightly. Finally, repeat the three steps using ‘TrainSample2’ for the Calibration and ‘ValidSample2’ for the Accuracy Assessment. You should now have accuracy measures for three different models.

        Model 3.1   Model 3.2   Model 3.3
MAE     0.687782    0.655503    0.697173
MSE     1.815288    1.668609    1.518849
RMSE    1.347326    1.291747    1.232416
r       0.86        0.87        0.87
r²      0.74        0.76        0.76
NSE     0.71        0.73        0.75

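Ranges of the accuracy measures for the three approaches (lowest to highest value over the three models of each approach; NSE was not reported for imageSVM).

        imageRF      imageSVM     autoPLSR
MAE     0.51-0.55    0.37-0.53    0.66-0.70
RMSE    1.00-1.31    0.82-1.41    1.23-1.34
R       0.89-0.92    0.89-0.95    0.86-0.87
R^2     0.78-0.84    0.79-0.91    0.74-0.76
NSE     0.72-0.79    -            0.71-0.75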


Conclusions:

  • Even with the same training data and approach the results differ markedly, which is also true for a different allocation of training and validation pixels.
  • Choice of training data with respect to amount and representativeness is important even for more robust regression approaches.
