# enmap-box-idl / Application Tutorial - Regression Techniques

# Introduction

The goal of this tutorial is to make you familiar with some important regression approaches which are implemented in the EnMAP-Box. These are Random Forests (**imageRF**), Support Vector Machines (**imageSVM**) and Partial Least Squares Regression (**autoPLSR**).

# Data Preparation

- Select **File > Open > EnMAP-Box Test Images**. In order to get an idea of the distribution of land cover types in the test images, take a look at the Image Statistics.
- Select **Tools > Image Statistics**, select the classification file ‘**AF_LC**’ as Input Image and **Accept**.

The least represented class is named ‘soils & manmade’ (1), the three other classes are ‘water’ (2), ‘forest & natural vegetation’ (3) and ‘agriculture’ (4).

In the next step you create a stratified random sample from the test image containing Leaf Area Index (LAI) values from all of these classes.

- Select **Tools > Random Sampling**.
- Choose the **Input Image** named ‘**AF_LAI**’ and check **Stratification**. Your Stratification image has to be ‘**AF_LC**’, then **Accept**.
- In the new dialog select **Equalized Sampling** and type in **50**, creating a total sample of 200 pixels.
- Define the Output path of your Random Sample, name it ‘**sample_LAI**’ and **Accept**.
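The idea behind the equalized, stratified sample can be sketched in a few lines of Python. This is only an illustration of the sampling principle, not the EnMAP-Box implementation: the function name `equalized_stratified_sample`, the toy class map and the convention that label 0 means "unclassified" are all assumptions made for this example.

```python
import random
from collections import defaultdict

def equalized_stratified_sample(class_map, n_per_class, seed=0):
    """Draw n_per_class pixel indices at random from every class.

    class_map is a flat list of class labels, one per pixel. Label 0 is
    treated as unclassified and skipped (an assumption of this sketch).
    """
    rng = random.Random(seed)
    pixels_by_class = defaultdict(list)
    for index, label in enumerate(class_map):
        if label != 0:
            pixels_by_class[label].append(index)

    sample = {}
    for label, indices in sorted(pixels_by_class.items()):
        if len(indices) < n_per_class:
            raise ValueError(f"class {label} has only {len(indices)} pixels")
        # Sampling without replacement: no pixel is drawn twice.
        sample[label] = sorted(rng.sample(indices, n_per_class))
    return sample

# Toy land cover map with four unevenly represented classes (made-up data).
toy_rng = random.Random(42)
toy_map = toy_rng.choices([1, 2, 3, 4], weights=[1, 2, 4, 3], k=5000)

sample = equalized_stratified_sample(toy_map, n_per_class=50)
total = sum(len(indices) for indices in sample.values())
print(total)  # 4 classes x 50 pixels = 200
```

Because every class contributes the same number of pixels, the least represented class carries as much weight in the sample as the most frequent one.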

Your sample will appear in the **Image List**. These 200 points might represent Leaf Area Index values measured in the field.
For a later evaluation of the performance of the models, please divide the sample into a training (70%) and a validation (30%) data set.

- Select **Tools > Random Sampling**.
- Choose the **Input Image** named ‘**sample_LAI**’ and **Accept**.
- Now select **Relative Sampling** and type in ‘**70**’. Define the output path of your Random Sample and name the training data set ‘**TrainSample1**’, then check **Complement**, define the output path of your validation data set and name it ‘**ValidSample1**’. Two files are now created: the first contains 70% randomly chosen pixels of the ‘sample_LAI’ file, the second the remaining 30%.
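Relative sampling with the complement option amounts to a random split without overlap. The following minimal sketch shows the principle; the function name `split_with_complement` and the integer pixel IDs are hypothetical and stand in for the 200 pixels of ‘sample_LAI’.

```python
import random

def split_with_complement(pixel_ids, percent, seed=0):
    """Randomly pick `percent` % of the pixels; the rest form the complement."""
    rng = random.Random(seed)
    n_selected = round(len(pixel_ids) * percent / 100)
    selected = set(rng.sample(pixel_ids, n_selected))
    complement = [p for p in pixel_ids if p not in selected]
    return sorted(selected), complement

# 200 sampled pixels, identified here simply by the numbers 0..199.
sample_ids = list(range(200))
train_ids, valid_ids = split_with_complement(sample_ids, 70)
print(len(train_ids), len(valid_ids))  # 140 60
```

Changing `seed` yields a different allocation of training and validation pixels from the same sample.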

# ImageRF

**1) Parameterization**

- Select **Applications > Regression > imageRF Regression > Parameterize RF Regression (RFR)**.
- The **Input Image** has to be ‘**AF_Image**’ and the Reference Area ‘**TrainSample1**’.
- Some **Parameters**, e.g. **Number of Trees**, are already pre-defined and do not have to be changed. Simply define where to save the **Output RFR Model**, name it ‘**rfrModel1_1**’ and **Accept**.

**2) Application**

- After completion of the Parameterization, you are asked whether you want to apply the model to an image; answer ‘yes’. In the next dialog the last **RFR Model** and the **Image** are already selected. Now define where the regression estimation is to be saved and **Accept**.
- After completion, you can visualize the **rfrEstimation** in an Image View (drag-and-drop the file onto the view manager). The grey values represent the estimated LAI values.

**3) Accuracy Assessment**

- Select **Applications > Regression > imageRF Regression > Fast Accuracy Assessment**.
- The last **RFR Model** is again already selected, as is the **Image**. For the Reference Areas choose ‘**ValidSample1**’.

In your HTML browser several accuracy measures will show up. Leave the browser __open__ for a later comparison of results.

```
number of samples (n): 60 (masked: 115551 total: 115611)
mean absolute error (MAE): 0.551599
mean squared error (MSE): 1.724486
root mean squared error (RMSE): 1.313197
pearson correlation (r): 0.89
squared pearson correlation (r^2): 0.78
nash-sutcliffe efficiency (NSE) : 0.72
```
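For orientation, the measures in the report can be reproduced with their usual textbook definitions. The sketch below assumes these standard formulas; the function name `regression_accuracy` and the toy values are made up for illustration, and the EnMAP-Box implementation may differ in detail.

```python
import math

def regression_accuracy(reference, estimate):
    """Common accuracy measures for paired reference and estimated values."""
    n = len(reference)
    errors = [e - r for r, e in zip(reference, estimate)]
    mae = sum(abs(d) for d in errors) / n
    mse = sum(d * d for d in errors) / n
    rmse = math.sqrt(mse)

    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    cov = sum((r - mean_ref) * (e - mean_est) for r, e in zip(reference, estimate))
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_est = sum((e - mean_est) ** 2 for e in estimate)
    r = cov / math.sqrt(ss_ref * ss_est)  # Pearson correlation

    # Nash-Sutcliffe efficiency: 1 - (sum of squared errors / variation of reference)
    nse = 1.0 - sum(d * d for d in errors) / ss_ref
    return {"mae": mae, "mse": mse, "rmse": rmse, "r": r, "r2": r * r, "nse": nse}

# Toy example: the estimate is perfectly correlated but biased by +0.5.
reference = [1.0, 2.0, 3.0, 4.0]
estimate = [1.5, 2.5, 3.5, 4.5]
result = regression_accuracy(reference, estimate)
print({name: round(value, 4) for name, value in result.items()})
# {'mae': 0.5, 'mse': 0.25, 'rmse': 0.5, 'r': 1.0, 'r2': 1.0, 'nse': 0.8}
```

The toy example shows why NSE can be lower than r², as in the report above: the correlation ignores a systematic offset, whereas NSE penalizes it.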

In the next step you are going to follow the same procedure again, in order to check for possible deviations in the parameterization of the model when using the same training data. Start again with step 1 (Parameterization) and name the new model ‘**rfrModel1_2**’, then apply it to the image and do the accuracy assessment again. A second tab with the new result report should now open in your HTML browser.

In the last step your task is to run the model once again, this time with a different allocation of training and validation pixels.

- Select **Tools > Random Sampling**.
- Again choose the **Input Image** named ‘**sample_LAI**’, then **Accept**.
- Select **Relative Sampling** and type in **70** (%).
- Define the Output path of your random sample and this time name it ‘**TrainSample2**’.
- Check **Complement** again, define the path and name it ‘**ValidSample2**’, then **Accept**. Two new files should appear in the **File List**.
- Finally do the three steps again, namely
  - Parameterization (using **TrainSample2**, naming the model ‘**rfrModel1_3**’)
  - Application
  - Accuracy Assessment (with **ValidSample2**).

By comparing the three accuracy reports in your HTML browser, you will notice slightly different results between the three models. Your results may look similar to those in the following example.

| Model 1.1 | Model 1.2 | Model 1.3 |
|---|---|---|
| MAE = 0.551599 | MAE = 0.550949 | MAE = 0.510775 |
| MSE = 1.724486 | MSE = 1.647037 | MSE = 1.002154 |
| RMSE = 1.313197 | RMSE = 1.283369 | RMSE = 1.001076 |
| r = 0.89 | r = 0.89 | r = 0.92 |
| r² = 0.78 | r² = 0.79 | r² = 0.84 |
| NSE = 0.72 | NSE = 0.73 | NSE = 0.79 |

# ImageSVM

- Select **Applications > Regression > imageSVM Regression > Parameterize SV Regression (SVR)**.
- For the Training Data choose the **Image** ‘**AF_Image**’ and as **Reference Areas** the ‘**TrainSample1**’ from section Data Preparation, then **Accept**.
- Now choose where to save the SVR File and name it ‘**svrModel1.svr**’, then **Accept**.
- When the parameterization is completed, select **Applications > Regression > imageSVM Regression > Apply SVR to Image**.
- The previous SVR file is already selected, so choose ‘**AF_Image**’ as Image and define a path and a name for the regression result. Name it ‘**AF_Image_SVR_1**’, then **Accept**.
- After completion, do an Accuracy Assessment. Select **Applications > Regression > imageSVM Regression > Fast Accuracy Assessment**.
- In the first dialog the last SVR file is already selected, simply **Accept**.
- As Validation Data select the Image ‘AF_Image’ and as Reference Areas the ‘ValidSample1’. The Accuracy Assessment yields accuracy measures, a scatterplot with histograms and a residuals plot.

If you repeat the three steps (Parameterization, Application, Accuracy Assessment) once more, you might again notice slightly different results. Finally, as in section ImageRF, repeat the three steps using ‘**TrainSample2**’ for the Parameterization and ‘**ValidSample2**’ for the **Accuracy Assessment**.

| | Model 2.1 | Model 2.2 | Model 2.3 |
|---|---|---|---|
| MAE | 0.3693 | 0.5256 | 0.4391 |
| RMSE | 0.8269 | 1.4170 | 1.001 |
| R | 0.9526 | 0.8900 | 0.9098 |
| R² | 0.9074 | 0.7921 | 0.8277 |

# autoPLSR

- Select **Applications > Regression > autoPLSR > Calibrate Model**.
- Under the first bullet point, choose ‘**AF_Image**’ as Input Image and ‘**TrainSample1**’ as target image.
- For the **Output** define where to save the **autoPLSR Model** and __uncheck__ **Show Report** as well as **Save Report**, then **Accept**.
- After completion, select **Applications > Regression > autoPLSR > Apply Model**.
- Choose the Input Model ‘**modelPLSR.plsr**’, the Input Image ‘**AF_Image**’ and define the name and path for the **Output image**, then **Accept**.
- Now select **Applications > Accuracy Assessment > Regression**.
- Choose as **Estimation** the ‘**autoPLSR_Estimation**’ and as Reference ‘**ValidSample1**’, click **Accept**.

As in the case of imageRF, some accuracy measures will show up in your HTML browser. Following the same procedure as before, do the three steps again (**Calibration**, **Application** and **Accuracy Assessment**). Again the results will differ slightly. Finally, do the three steps once more using ‘**TrainSample2**’ for the Calibration and ‘**ValidSample2**’ for the Accuracy Assessment. You should now have accuracy measures for three different models.

| Model 3.1 | Model 3.2 | Model 3.3 |
|---|---|---|
| MAE = 0.687782 | MAE = 0.655503 | MAE = 0.697173 |
| MSE = 1.815288 | MSE = 1.668609 | MSE = 1.518849 |
| RMSE = 1.347326 | RMSE = 1.291747 | RMSE = 1.232416 |
| r = 0.86 | r = 0.87 | r = 0.87 |
| r² = 0.74 | r² = 0.76 | r² = 0.76 |
| NSE = 0.71 | NSE = 0.73 | NSE = 0.75 |

Ranges of the accuracy measures of the three approaches (minimum to maximum over the three models of each approach).

| | imageRF | imageSVM | autoPLSR |
|---|---|---|---|
| MAE | 0.51-0.55 | 0.37-0.53 | 0.66-0.70 |
| RMSE | 1.00-1.31 | 0.83-1.42 | 1.23-1.35 |
| R | 0.89-0.92 | 0.89-0.95 | 0.86-0.87 |
| R² | 0.78-0.84 | 0.79-0.91 | 0.74-0.76 |
| NSE | 0.72-0.79 | - | 0.71-0.75 |
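The ranges in this summary can be derived from the per-model tables by taking the minimum and maximum of each measure. A small sketch for the RMSE row is given below, using the example values reported in the sections above; the helper name `value_range` is hypothetical, and the bounds are rounded to two decimals, so the last digit can differ from a truncated value.

```python
# RMSE of the three models per approach, copied from the example tables above.
rmse = {
    "imageRF": [1.313197, 1.283369, 1.001076],
    "imageSVM": [0.8269, 1.4170, 1.001],
    "autoPLSR": [1.347326, 1.291747, 1.232416],
}

def value_range(values, digits=2):
    """Format the minimum and maximum of a list as 'min-max'."""
    return f"{min(values):.{digits}f}-{max(values):.{digits}f}"

for approach, values in rmse.items():
    print(approach, value_range(values))
```

The same helper can be applied to the other measures to compare the spread of each approach.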

Conclusions:

- Even with the same training data and approach the results differ markedly, which is also true for a different allocation of training and validation pixels.
- Choice of training data with respect to amount and representativeness is important even for more robust regression approaches.
