To estimate the fscore of a binarization algorithm, we automatically build a prediction model based on the most significant features among the $18$. More precisely, the prediction model is computed using multivariate stepwise linear regression \cite{thompson1978selectionp1,thompson1978selectionp2,hocking,cohen}, followed by repeated random subsampling validation (cross validation).
The linear regression models, as a hyperplane, the relationship between the features and the ground truth fscores. This result can then be used to predict a fscore from a set of computed features. The prediction can be improved by using only a pertinent subset of the $18$ independent computed features.
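Written out, the fitted hyperplane has the usual multivariate linear form (notation ours, for illustration):
\[
\hat{f} \;=\; \beta_0 + \sum_{k \in S} \beta_k \, x_k ,
\]
where the $x_k$ are the features in the selected subset $S$ of the $18$ candidates and the coefficients $\beta_k$ are estimated by least squares on the ground truth fscores.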
There are three main ways to carry out such a selection. First, the forward strategy consists of adding one feature at a time and computing a criterion (linked to the $R^2$ value) after each addition. Conversely, the backward strategy starts with all the features and deletes them one at a time, computing the criterion after each deletion. The last strategy consists of testing all the possible feature combinations. As we have only $18$ features, we decided to use this exhaustive strategy.
 \item \textbf{Fscore computation:} We run the binarization algorithm and compute the fscore for each image by comparing the binarization result and the ground truth. In the following, these fscores are called ground truth fscores and denoted by $f_i$, $i\in[1,N]$ (with $N$ the number of images).
 \item \textbf{Generation of the predictive model:} This step consists of applying a stepwise multivariate linear regression to the overall dataset, allowing us to select the most significant features for predicting the given binarization algorithm. Keeping all features in each prediction model would lead to overfitted models. Indeed, some features may not be significant for predicting a specific binarization method. Moreover, even if a feature is highly correlated to the accuracy of an algorithm, it may have a weak contribution to the final prediction model. The output of this step is a linear function that gives a predicted fscore value for any image, for one binarization algorithm, knowing the selected features. We denote by $\hat{f_i}$, $i\in[1,N]$, the predicted fscores.
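The selection step above can be sketched as follows. This is a minimal illustration under our own assumptions (synthetic data, plain NumPy least squares rather than the authors' statistics software): every non-empty feature subset is fitted and scored by adjusted $R^2$, which penalizes overfitted models that merely add parameters.

```python
# Exhaustive feature-subset selection for a linear fscore predictor
# (illustrative sketch; the synthetic data below is an assumption).
from itertools import combinations

import numpy as np


def fit_r2(X, y):
    """Fit y ~ [1, X] by ordinary least squares; return coefficients and R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


def exhaustive_select(X, y):
    """Try every non-empty feature subset; keep the best adjusted R^2."""
    n, p = X.shape
    best = None
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            _, r2 = fit_r2(X[:, subset], y)
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            if best is None or adj > best[1]:
                best = (subset, adj)
    return best


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))          # 4 candidate features, 60 "images"
# Synthetic ground truth fscores depending on features 0 and 2 only.
y = 0.8 * X[:, 0] - 0.5 * X[:, 2] + 0.05 * rng.normal(size=60)

subset, adj_r2 = exhaustive_select(X, y)
```

With $18$ features the same loop runs over $2^{18}-1$ subsets, which remains tractable; for larger feature sets the forward or backward strategies described above would be preferred.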
The $R^{2}$ value measures the quality of the prediction model. It can be interpreted as a correlation between the ground truth and the predicted fscores.
The best theoretical value for $R^{2}$ is $1$. Moreover, a p-value is computed for each selected feature, indicating its significance: a low p-value leads to rejecting the hypothesis that the selected feature is not significant (null hypothesis).
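For completeness, $R^{2}$ is computed with its usual definition, written here in the notation above:
\[
R^{2} \;=\; 1 - \frac{\sum_{i} \bigl( f_i - \hat{f_i} \bigr)^2}{\sum_{i} \bigl( f_i - \bar{f} \bigr)^2},
\]
where $\bar{f}$ denotes the mean of the ground truth fscores.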
There is no automatic rule to decide whether a model is valid. In our tests, we keep a model only if $R^2 > 0.7$ and if a majority of the p-values are lower than $0.1$.
Because the dataset contains relatively few images, we use cross validation to estimate the performance of the predictive function generated in step 2. We randomly split the overall set of images into two subsets: the training set and the validation set. In our experiments, the training set is composed of 90\% of the dataset images and the validation set of the remaining 10\%.
By applying linear regression to the training set, we compute a new prediction function with its associated $R^2$. The features used here are those selected at step 2.
We apply this new predictive function to the validation set. The obtained predicted fscores are compared with the ground truth fscores using a linear regression that provides the slope coefficient $\alpha$.
If $\alpha$ is close to $1$, then the fscores of the binarization are well predicted on the validation set.
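The repeated random subsampling validation can be sketched as follows, again under our own simplifying assumptions (synthetic data, plain NumPy least squares). Each round refits the model on 90\% of the images, predicts the remaining 10\%, and regresses the ground truth fscores against the predicted ones to obtain the slope $\alpha$; the slopes are then averaged over the rounds to give $\bar{\alpha}$.

```python
# Repeated random subsampling validation (illustrative sketch):
# average the slope alpha of truth ~ predicted over many 90/10 splits.
import numpy as np


def regression_slope(pred, truth):
    """Slope of the simple linear regression truth ~ pred."""
    centered = pred - pred.mean()
    return float((centered @ (truth - truth.mean())) / (centered @ centered))


rng = np.random.default_rng(1)
n = 200                                # number of "images" (synthetic)
X = rng.normal(size=(n, 3))            # selected features from step 2
f = X @ np.array([0.6, -0.3, 0.2]) + 0.05 * rng.normal(size=n)

alphas = []
for _ in range(50):                    # number of random 90/10 splits
    idx = rng.permutation(n)
    train, valid = idx[: int(0.9 * n)], idx[int(0.9 * n):]
    A = np.column_stack([np.ones(len(train)), X[train]])
    beta, *_ = np.linalg.lstsq(A, f[train], rcond=None)
    pred = np.column_stack([np.ones(len(valid)), X[valid]]) @ beta
    alphas.append(regression_slope(pred, f[valid]))

alpha_bar = float(np.mean(alphas))     # close to 1 when predictions are good
```

A value of `alpha_bar` close to $1$ over many splits indicates that the model does not depend on the particular choice of training data.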
Some binarization methods described in this article require parameter settings. Note that our methodology involves the creation of different predictive models, one for each parameter set. %For example, Sauvola's method with a $5 \times 5$ window size is different from Sauvola's method with an $8 \times 8$ window size and will require the creation of a different prediction model.
%Linear regression makes sense here because we observed that the computed features are linearly correlated to ?
The prediction models for Otsu's, Sauvola's and Shijian's binarization algorithms were generated with the methodology described in section~\ref{subsectionprediction}. The coefficients associated with the most significant selected features, their p-values and the intercept of the linear predictive function are detailed in Tables~\ref{otsuPredictionModel}, \ref{sauvolaPredictionModel} and \ref{shijianPredictionModel}. If a feature is not present in a table, then it was not selected by the stepwise algorithm. As mentioned in the previous section, the cross validation for each model gives the pair $(\bar{\alpha}, \bar{R^{2}})$.
The most significant selected features for Otsu's prediction model are $\mIInk$, $v_{I}$, $v_{B}$, $\mu_{B}$, $\mu$ and $v$ (see Table~\ref{otsuPredictionModel} for the coefficients of the predictive function). For Otsu's prediction model, we can explain the feature selection by the fact that Otsu's binarization method is based on global thresholding. That is why features such as $\mIInk$, $\mu$ and $v$ are significant and have such low p-values. The model's $R^{2}$ equals $0.93$, which is considered very good \cite{cohen}.
The cross validation gives an $\bar{\alpha}$ coefficient of $0.989$ and a $\bar{R^{2}}$ of $0.987$. These results indicate that our model does not depend on the chosen training data.