\label{subsection-prediction}
To estimate the f-score of a binarization algorithm, we automatically build a prediction model based on the most significant features among the $18$. More precisely, the prediction model is computed using multivariate stepwise linear regression \cite{thompson1978selectionp1,thompson1978selectionp2,hocking}, followed by repeated random sub-sampling validation (cross-validation).
-The linear regression models (as an hyperplane) the relationship between the features and the groundtruthed f-scores. This result can then be used to predict a f-score according to the set of computed features.
-The prediction can be improved by using only a pertinent subset of features among the $18$ independent computed features.
-There are 3 main ways to carry out a selection. First, Forward strategy consist in computing a criteria by adding one feature at a time. On the contrary, a second approach (Backward) consist in starting with all the features and deleting them one at a time. After each deleting a criteria is computed. At last, a strategy consist of testing all the possible combination.
-The criteria is linked to the rsquare value (indicates
-as we have only 18 features, we decided to use the exhaustive staregy.
+The linear regression models the relationship between the features and the ground truth f-scores as a hyperplane. The fitted model can then be used to predict an f-score from a set of computed features. The prediction can be improved by using only a relevant subset of the $18$ independent computed features.
+There are three main ways to carry out the selection. First, the forward strategy adds one feature at a time and computes a criterion (linked to the $R^2$ value) after each addition. Conversely, the backward strategy starts with all the features and removes them one at a time, recomputing the criterion after each deletion. The last, exhaustive strategy tests all the possible combinations of features. As we have only $18$ features, we decided to use the exhaustive strategy.
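As a sketch of what the exhaustive strategy involves (the symbols $x_j$, $\beta_j$, $S$ and $\hat{f}$ are introduced here only for illustration), each candidate model restricts the hyperplane to a subset $S \subseteq \{1,\dots,18\}$ of the features:
\[
\hat{f} = \beta_0 + \sum_{j \in S} \beta_j x_j ,
\]
where $x_j$ is the value of the $j$-th feature for an image and $\hat{f}$ is the predicted f-score. Testing all the non-empty combinations amounts to comparing
\[
2^{18} - 1 = 262\,143
\]
candidate subsets, a number small enough that an exhaustive search is presumably affordable in this setting.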
This overall process can be divided into five steps\footnote{The overall R project script and our evaluation data can be downloaded from \texttt{https://bitbucket.org/vrabeux/qualityevaluation}.}:
\item \textbf{Features computation:} The $18$ proposed features are computed for each image.
- \item \textbf{F-scores computation:} We run the binarization algorithm on the overall dataset and measure its accuracy relative to the ground truth. In the following section, these f-scores are called ground truth f-scores.
+ \item \textbf{F-scores computation:} We run the binarization algorithm and compute the f-score for each image by comparing the binarization result and the ground truth. In the following section, these f-scores are called ground truth f-scores.
- \item \textbf{Generation of the predictive model}: This step consists of applying a step wise multivariate linear regression to the overall dataset, allowing us to select the most significant features for predicting the given binarization algorithm. Some features may not be significant for predicting a specific binarization method. Moreover, even if a feature is highly correlated to the accuracy of an algorithm, it may have a weak contribution to the final prediction model. Keeping all features in each prediction model would lead to overset models. The output of this step is a linear function that gives a predicted f-score value for any image, for one binarization algorithm, knowing the selected features.
+ \item \textbf{Generation of the predictive model}: This step consists of applying a stepwise multivariate linear regression to the overall dataset, allowing us to select the most significant features for predicting the accuracy of the given binarization algorithm. Keeping all features in each prediction model would lead to overparameterized models. Indeed, some features may not be significant for predicting a specific binarization method. Moreover, even if a feature is highly correlated with the accuracy of an algorithm, it may contribute only weakly to the final prediction model. The output of this step is a linear function that, for one binarization algorithm, gives a predicted f-score value for any image given the selected features.
- \item \textbf{Evaluation of model accuracy}: The $R^{2}$ value indicates the proportion of variability in a data set that is accounted for by the statistical model and provides a measure of how well the model predicts future outcomes. The best theoretical value for $ R^{2}$ is 1. Moreover, a p-value is computed for each selected feature indicating its significance. There is no automatic rule to decide whether a model is valid. In our tests, we choose to keep the model only if $R^2 > 0.7$ and if a majority of p-values are lower than $0.1$.
+ \item \textbf{Evaluation of model accuracy}:
+The $R^{2}$ value measures the quality of the prediction model. It can be interpreted as the squared correlation between the ground truth and the predicted f-scores. The best theoretical value for $R^{2}$ is 1. Moreover, a p-value is computed for each selected feature to indicate its significance: a low p-value leads us to reject the null hypothesis that the selected feature is not significant.
+There is no automatic rule to decide whether a model is valid. In our tests, we choose to keep the model only if $R^2 > 0.7$ and if the majority of the p-values are lower than $0.1$.
%??? We also look at the slope coefficient of the validation regression, which also needs to be the closest to 1.
%This $6$-steps process permits to have a prediction model validated for an algorithm (function).
All $11$ binarization methods described in this article require parameter settings. Note that our methodology involves the creation of different predictive models, one for each parameter set. For example, Sauvola's method with a $5 \times 5$ window size is different from Sauvola's method with an $8 \times 8$ window size and will require the creation of a different prediction model.
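For reference, the acceptance rule of the evaluation step, which is checked separately for the model of each parameter set, can be written out as follows. The symbols $f_i$, $\hat{f}_i$, $\bar{f}$, $n$, $p_j$ and $S$ are introduced here only for illustration, and the definition of $R^2$ is the usual one for least-squares regression, which is assumed to be the one computed in practice:
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (f_i - \hat{f}_i)^2}{\sum_{i=1}^{n} (f_i - \bar{f})^2} ,
\]
where $f_i$ is the ground truth f-score of image $i$, $\hat{f}_i$ its predicted f-score, and $\bar{f}$ the mean ground truth f-score over the $n$ images. With $p_j$ the p-value of the selected feature $j \in S$, a model is kept when
\[
R^2 > 0.7 \quad \mbox{and} \quad \#\{\, j \in S : p_j < 0.1 \,\} > \frac{|S|}{2} .
\]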
+%Linear regression makes sense here because we observed that the computed features are linearly correlated to ?
%This methodology can be applied to different types of algorithms. In \cite{rabeux2011ancient} it is applied, with similar measures, to predict two OCRs accuracy : Abbyy Fine Reader and OCROpus.
%%In the latter article, the two models are accurate (R-square close to 0.9),