
EnMAP-Box
Application Tutorial: Regression Techniques
Date: 16.03.2015
Authors: Matthias Held, Sebastian van der Linden, Benjamin Jakimow, Andreas
Rabe and Patrick Hostert
Abstract: Application and performance evaluation of different regression
techniques
Copyright
© Humboldt-Universität zu Berlin, Geomatics Lab, 2015, www.hu-geomatics.de
Citation
Please cite this tutorial as: Held, M., van der Linden, S., Jakimow, B., Rabe, A.,
Hostert, P. (2015). EnMAP-Box Application Tutorial: Regression Techniques,
Humboldt-Universität zu Berlin, Germany.
Disclaimer
The authors of this tutorial accept no responsibility for errors or omissions in this
work and shall not be liable for any damage caused by these errors or omissions.
Contents
1 Introduction
2 Data Preparation
3 ImageRF
4 ImageSVM
5 autoPLSR
1 Introduction
The goal of this tutorial is to familiarize you with some important regression
approaches implemented in the EnMAP-Box: Random Forests (imageRF),
Support Vector Machines (imageSVM) and Partial Least Squares Regression
(autoPLSR).
2 Data Preparation

Select File > Open > EnMAP-Box Test Images.
In order to get an idea of the distribution of land cover types in the test images,
take a look at the Image Statistics.

Select Tools > Image Statistics, choose the classification file ‘AF_LC’
as Input Image and Accept.
The least represented class is named ‘soils & manmade’ (1), the three other
classes are ‘water’ (2), ‘forest & natural vegetation’ (3) and ‘agriculture’ (4).
In the next step you create a stratified random sample from the test image
containing Leaf Area Index (LAI) values from all of these classes.

Select Tools > Random Sampling.

Choose the Input Image named ‘AF_LAI’ and check Stratification. Your
Stratification image has to be ‘AF_LC’, then Accept.

In the new dialog select Equalized Sampling and type in 50, creating a
total sample of 200 pixels.

Define the Output path of your Random Sample, name it ‘sample_LAI’
and Accept.
Your sample will appear in the Image List. These 200 points might represent
Leaf Area Index values measured in the field.
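
The equalized stratified sampling above can be sketched outside the EnMAP-Box as follows. This is only an illustration of the idea (50 random pixels from each of the 4 classes gives 200 pixels), not the EnMAP-Box implementation. The function name `equalized_stratified_sample` and the toy class sizes are assumptions made for the example.

```python
import random
from collections import defaultdict


def equalized_stratified_sample(strata, n_per_class, seed=0):
    """Draw n_per_class random pixel indices from every class in `strata`.

    `strata` is a flat list of class labels, one per pixel. It is a
    hypothetical stand-in for the 'AF_LC' classification image.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(strata):
        by_class[label].append(index)

    sample = {}
    for label, indices in sorted(by_class.items()):
        if len(indices) < n_per_class:
            raise ValueError(f"class {label} has only {len(indices)} pixels")
        # Sampling without replacement: no pixel is drawn twice.
        sample[label] = sorted(rng.sample(indices, n_per_class))
    return sample


# Toy land-cover map with four classes of unequal size (made-up proportions).
land_cover = [1] * 80 + [2] * 300 + [3] * 900 + [4] * 1200
sample = equalized_stratified_sample(land_cover, n_per_class=50)
total = sum(len(indices) for indices in sample.values())
print(total)  # 4 classes x 50 pixels = 200
```

Equalized sampling gives the least represented class the same weight in the sample as the largest one, which a purely random draw would not.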
For a later evaluation of the performance of the models, please divide the sample
into a training (70%) and a validation (30%) data set.

Select Tools > Random Sampling.

Choose the Input Image named ‘sample_LAI’ and Accept.

Now select Relative Sampling and type in ‘70’. Define the output path of
your Random Sample and name the training data set ‘TrainSample1’,
then check Complement, define the output path of your validation data set
and name it ‘ValidSample1’. Two files are now created: the first contains
70% of the pixels of the ‘sample_LAI’ file, chosen at random, and the
second contains the remaining 30%.
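
The relative sampling with complement can be sketched as a random 70/30 split. This is a minimal illustration under assumptions: the function name `relative_split` and the integer pixel IDs are hypothetical, and the EnMAP-Box may round or draw differently.

```python
import random


def relative_split(pixel_ids, percent, seed=0):
    """Split pixel_ids into a random `percent` share and its complement."""
    rng = random.Random(seed)
    n_selected = round(len(pixel_ids) * percent / 100)
    selected = set(rng.sample(pixel_ids, n_selected))
    training = [p for p in pixel_ids if p in selected]
    complement = [p for p in pixel_ids if p not in selected]
    return training, complement


# 200 sampled pixels, identified here by made-up integer IDs.
sample_ids = list(range(200))
train_ids, valid_ids = relative_split(sample_ids, 70)
print(len(train_ids), len(valid_ids))  # 140 60
```

Calling the function with a different `seed` yields a different allocation of training and validation pixels from the same 200-pixel sample.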
3 ImageRF
1) Parameterization

Select Applications > Regression > imageRF Regression >
Parameterize RF Regression (RFR).

The Input Image has to be ‘AF_Image’ and the Reference Area
‘TrainSample1’.

Some Parameters, e.g. Number of Trees, are already pre-defined and do
not have to be changed. Simply define where to save the Output RFR
Model, name it ‘rfrModel1_1’ and Accept.
2) Application

After completion of the Parameterization, you are asked whether you want to
apply the model to an image; answer ‘yes’. In the next dialog the last RFR
Model and the Image are already selected. Now define where the regression
estimation is to be saved and Accept.

After completion, you can visualize the rfrEstimation in an Image View
(drag-and-drop the file onto the view manager). The grey values represent
the estimated LAI values.
3) Accuracy Assessment

Select Applications > Regression > imageRF Regression > Fast
Accuracy Assessment.

The last RFR Model and the Image are again already selected.
For the Reference Areas choose ‘ValidSample1’.
In your HTML browser several accuracy measures will show up. Leave the
browser open for a later comparison of results.
number of samples (n):              60 (masked: 115551 total: 115611)
mean absolute error (MAE):          0.551599
mean squared error (MSE):           1.724486
root mean squared error (RMSE):     1.313197
pearson correlation (r):            0.89
squared pearson correlation (r^2):  0.78
nash-sutcliffe efficiency (NSE):    0.72
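
For readers who want to check such a report by hand, the measures can be computed from paired reference and estimated values as sketched below. The formulas are the standard textbook definitions. It is an assumption that the Fast Accuracy Assessment uses exactly these definitions, and the function name `regression_accuracy` and the four toy value pairs are made up for the example.

```python
import math


def regression_accuracy(reference, estimate):
    """Standard accuracy measures for paired reference/estimated values."""
    n = len(reference)
    errors = [e - r for r, e in zip(reference, estimate)]
    mae = sum(abs(d) for d in errors) / n
    mse = sum(d * d for d in errors) / n
    rmse = math.sqrt(mse)

    mean_ref = sum(reference) / n
    mean_est = sum(estimate) / n
    cov = sum((r - mean_ref) * (e - mean_est)
              for r, e in zip(reference, estimate))
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_est = sum((e - mean_est) ** 2 for e in estimate)
    pearson_r = cov / math.sqrt(ss_ref * ss_est)

    # Nash-Sutcliffe efficiency: 1 minus the squared error relative to the
    # variability of the reference values around their mean.
    nse = 1.0 - sum(d * d for d in errors) / ss_ref
    return {"n": n, "MAE": mae, "MSE": mse, "RMSE": rmse,
            "r": pearson_r, "r2": pearson_r ** 2, "NSE": nse}


# Made-up LAI values, purely for illustration.
reference = [1.0, 2.0, 3.0, 4.0]
estimate = [1.5, 2.0, 2.5, 4.0]
metrics = regression_accuracy(reference, estimate)
for name, value in metrics.items():
    print(name, value)
```

RMSE is the square root of MSE, which the report above reflects (the square root of 1.724486 is about 1.3132). NSE penalizes bias and scaling errors that the correlation ignores, so it can never exceed r²; this matches the report (0.72 versus 0.78).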
In the next step you follow the same procedure again, in order to check for
possible deviations in the parameterization of the model when the same
training data are used. Start again with step 1 (Parameterization) and name
the new model ‘rfrModel1_2’, then apply it to the image and repeat the
accuracy assessment. A second tab with the new result report should open in
your HTML browser.
In the last step you run the model once more, this time with a different
allocation of training and validation pixels.

Select Tools > Random Sampling.

Again choose the Input Image named ‘sample_LAI’, then Accept.

Select Relative Sampling and type in 70 (%).

Define the Output path of your random sample and, this time, name it
‘TrainSample2’.

Check Complement again, define the path and name it ‘ValidSample2’,
then Accept. Two new files should appear in the File List.

Finally do the three steps again, namely
1. Parameterization (using TrainSample2, naming the model ‘rfrModel1_3’)
2. Application
3. Accuracy Assessment (with ValidSample2).
By comparing the three accuracy reports in your HTML browser, you will notice
slightly different results between the three models. Your results may look
similar to those in the following example.
        Model 1.1   Model 1.2   Model 1.3
MAE     0.551599    0.550949    0.510775
MSE     1.724486    1.647037    1.002154
RMSE    1.313197    1.283369    1.001076
r       0.89        0.89        0.92
r²      0.78        0.79        0.84
NSE     0.72        0.73        0.79
4 ImageSVM

Select Applications > Regression > imageSVM Regression >
Parameterize SV Regression (SVR).

Choose for the Training Data the Image ‘AF_Image’ and as Reference
Areas the ‘TrainSample1’ from section 2, then Accept.

Now choose where to save the SVR File and name it
‘AF_Image_scaled1.svr’, then Accept.

When the parameterization is completed, select
Applications > Regression > imageSVM Regression > Apply SVR to
Image.

The previous SVR file is already selected, so choose as Image the
‘AF_Image’ and define a path for the regression result and a name for the
file. Name it ‘AF_Image_SVR_1’, then Accept.

After completion, do an Accuracy Assessment. Select
Applications > Regression > imageSVM Regression > Fast Accuracy
Assessment.

In the first dialog the last SVR file is already selected, simply Accept.

As Validation Data select the Image ‘AF_Image’ and as Reference Areas
the ‘ValidSample1’.
The Accuracy Assessment yields accuracy measures, a scatterplot with
histograms and a residuals plot.
If you repeat the three steps (Parameterization, Application, Accuracy
Assessment) once more, you may again notice slightly different results. Finally,
as in section 3, repeat the three steps using ‘TrainSample2’ for the
Parameterization and ‘ValidSample2’ for the Accuracy Assessment.
        Model 2.1   Model 2.2   Model 2.3
MAE     0.3693      0.5256      0.4391
RMSE    0.8269      1.4170      1.001
R       0.9526      0.8900      0.9098
R²      0.9074      0.7921      0.8277
5 autoPLSR

Select Applications > Regression > autoPLSR > Calibrate Model.

Under the first bullet point, choose as Input Image the ‘AF_Image’ and as
target image ‘TrainSample1’.

For the Output, define where to save the autoPLSR Model and uncheck
Show Report as well as Save Report, then Accept.

After completion, select Applications > Regression > autoPLSR > Apply
Model.

Choose the Input Model ‘modelPLSR.plsr’ and the Input Image ‘AF_Image’,
define the path for the Output image and name it ‘autoPLSR_Estimation’,
then Accept.

The image ‘autoPLSR_Estimation’ has to have the file type EnMAP-Box
Regression and a data ignore value in order to perform an accuracy
assessment, which it currently does not. Therefore, right-click on the file
‘autoPLSR_Estimation’ and choose Edit Header File. Then change the line
‘file type = envi standard’ to ‘file type = EnMAP-Box Regression’
and add the following line
‘data ignore value = -1’

In the same window click File > Save and close the window.

Now select Applications > Accuracy Assessment > Regression.

Choose as Estimation the ‘autoPLSR_Estimation’ and as Reference
‘ValidSample1’, click Accept.
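
The header edit described in the steps above can also be scripted when it has to be repeated for several estimation images. The following is a minimal sketch that works on the header text only. The function name `make_regression_header` and the example header contents are assumptions; only the two lines ‘file type = EnMAP-Box Regression’ and ‘data ignore value = -1’ come from this tutorial.

```python
def make_regression_header(header_text, ignore_value=-1):
    """Return header text with the file type and data ignore value set.

    Assumes one 'key = value' entry per line; values that span several
    lines inside braces are passed through unchanged.
    """
    lines = []
    has_ignore_value = False
    for line in header_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if key == "file type":
            line = "file type = EnMAP-Box Regression"
        elif key == "data ignore value":
            line = f"data ignore value = {ignore_value}"
            has_ignore_value = True
        lines.append(line)
    if not has_ignore_value:
        # Add the line only if the header does not already define it.
        lines.append(f"data ignore value = {ignore_value}")
    return "\n".join(lines) + "\n"


# Made-up header content for illustration; a real header has more entries.
example_header = """ENVI
samples = 10
lines = 10
bands = 1
file type = envi standard
"""
new_header = make_regression_header(example_header)
print(new_header)
```

To apply it to a file, read the header text, pass it through the function and write the result back; keeping a backup copy of the original header is advisable.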
As in the case of imageRF, some accuracy measures will show up in your HTML
browser. Following the same procedure as before, repeat the three steps
(Calibration, Application and Accuracy Assessment); again the results will
differ slightly. Finally, repeat the three steps using ‘TrainSample2’ for the
Calibration and ‘ValidSample2’ for the Accuracy Assessment. You should now
have accuracy measures for three different models.
        Model 3.1   Model 3.2   Model 3.3
MAE     0.687782    0.655503    0.697173
MSE     1.815288    1.668609    1.518849
RMSE    1.347326    1.291747    1.232416
r       0.86        0.87        0.87
r²      0.74        0.76        0.76
NSE     0.71        0.73        0.75
Ranges of the accuracy measures of the three approaches (lowest to highest
value over the three models of each approach).
        imageRF      imageSVM     autoPLSR
MAE     0.51-0.55    0.37-0.53    0.66-0.70
RMSE    1.00-1.31    0.82-1.41    1.23-1.34
R       0.89-0.92    0.89-0.95    0.86-0.87
R²      0.78-0.84    0.79-0.91    0.74-0.76
NSE     0.72-0.79    -            0.71-0.75
Conclusions:
- Even with the same training data and approach the results differ markedly,
  which is also true for a different allocation of training and validation pixels.
- Choice of training data with respect to amount and representativeness is
  important even for more robust approaches.