%% In this paper, we focus on the quality evaluation of historical documents. As mentioned before, these documents suffer from several types of degradation, such as spots, speckles, ink loss, non-uniform illumination, and bleed-through. We assume that it is possible to evaluate the global quality of a document image, whereas most other methods try to localize and evaluate each defect separately.
%% % and also complex document content such as figures, drawings, and handwritten annotations
%% These degradations are visible when a document is in grayscale, and should be measured and characterized before binarization rather than afterwards. Therefore the previous techniques and measures cannot be applied to historical documents. Moreover, we believe that the binarization step is the key first stage of a successful document analysis workflow.
-This section presents a unified methodology that is able to predict most types of binarization methods (for example, adaptive thresholding, clustering, entropic, document dedicated). Our methodology is evaluated on $11$ binarization methods used in document analysis. The methods are referenced in the text by their author's names.
+This section presents a unified methodology that is able to predict the results of most types of binarization methods (e.g., adaptive thresholding, clustering-based, entropy-based, document-dedicated). Our methodology is evaluated on $12$ binarization methods used in document analysis. The methods are referenced in the text by their authors' names.
\item Bernsen \cite{bernsen} is a local adaptive thresholding technique.
\item Kapur \cite{kapur1985new} is an entropy-based thresholding method.
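For concreteness, the local rule behind Bernsen's method can be sketched as follows. This is an illustrative sketch, not the implementation evaluated here; the window size `w`, the contrast limit, and the low-contrast fallback to background are assumed values.

```python
import numpy as np

# Sketch of Bernsen local adaptive thresholding (illustrative, not the
# evaluated implementation): in each w x w window, the threshold is the
# mid-range (local_min + local_max) / 2; windows whose contrast is below
# contrast_min are treated as uniform background.
def bernsen_threshold(img, w=3, contrast_min=15):
    h, wd = img.shape
    pad = w // 2
    padded = np.pad(img, pad, mode="edge").astype(float)
    out = np.zeros((h, wd), dtype=bool)
    for i in range(h):
        for j in range(wd):
            win = padded[i:i + w, j:j + w]
            lo, hi = win.min(), win.max()
            if hi - lo < contrast_min:
                out[i, j] = False  # low-contrast window: assume background
            else:
                out[i, j] = img[i, j] < (lo + hi) / 2.0  # True = dark ink
    return out

# Toy image: a dark vertical stroke (value 10) on a bright background (200).
img = np.array([[200, 10, 200],
                [200, 10, 200],
                [200, 10, 200]], dtype=np.uint8)
print(bernsen_threshold(img).astype(int))  # middle column detected as ink
```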
-The best theoretical value for $ R^{2}$ is 1. Moreover, a p-value is computed for each selected feature indicating its significance : a low p-value leads to reject the hypothesis that the selected feature is not significant (null hypothesis).
-At this step, there is no automatic rule to decide whether a model is valid or not. The $R^{2}$ value computed at this step gives an indication of how well the model can be used in practice. The model still needs to be statically validated. This statistical validation is done at the next step.
-%However, in our tests, we choose to keep the model only if a majority of p-values are lower than $0.1$.
+The best theoretical value for $R^{2}$ is 1. Moreover, a p-value is computed for each selected feature, indicating its significance: a low p-value leads to rejecting the null hypothesis that the selected feature is not significant. At this step, there is no automatic rule to decide whether a model is valid or not. However, in our experiments, we choose to keep a model only if its $R^{2}$ value is higher than $0.7$ and if a majority of its p-values are lower than $0.1$.
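The keep/reject rule above can be sketched as follows. This is a minimal illustration, not the paper's code: the toy data, the least-squares fit, and the p-values are assumed for the example (in practice the p-values come from the regression software).

```python
import numpy as np

# R^2 of a fitted model: 1 - SS_residual / SS_total.
def r_squared(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# The rule described above: keep the model only if R^2 > 0.7 and a
# majority of the per-feature p-values are below 0.1.
def keep_model(r2, p_values, r2_min=0.7, p_max=0.1):
    majority_significant = np.mean(np.asarray(p_values) < p_max) > 0.5
    return bool(r2 > r2_min and majority_significant)

# Toy data: two features that almost fully explain the f-score.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.02, size=50)

# Ordinary least squares fit with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = r_squared(y, A @ coef)

# Illustrative p-values: two significant features out of three.
print(keep_model(r2, p_values=[0.01, 0.03, 0.2]))
```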
%??? We also look at the slope coefficient of the validation regression, which also needs to be the closest to 1.
Among the 18 features, most models retain about 7 features. Globally, the selected features are consistent with the binarization algorithm: the stepwise selection process tends to keep global (resp. local) features for global (resp. local) binarization algorithms. We also note that $\mS$ is never selected by any prediction model. Indeed, the binarization accuracy is measured at the pixel level (f-score). With this accuracy measure, the feature $\mSG$ becomes more significant than $\mS$, which may not have been the case with another evaluation measure.
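The stepwise selection process can be sketched as a greedy forward search. This is an assumed procedure for illustration, not the authors' exact code; the tolerance `tol` and the toy data are invented for the example.

```python
import numpy as np

# R^2 of an ordinary least squares fit on the given feature columns.
def r2_of(X_cols, y):
    A = np.column_stack([np.ones(len(y))] + X_cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Forward stepwise selection: greedily add the feature that most improves
# R^2, stopping when no candidate improves it by more than tol.
def forward_stepwise(X, y, tol=1e-3):
    remaining = list(range(X.shape[1]))
    chosen, best_r2 = [], 0.0
    while remaining:
        r2, best_j = max(
            (r2_of([X[:, k] for k in chosen + [cand]], y), cand)
            for cand in remaining
        )
        if r2 - best_r2 < tol:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        best_r2 = r2
    return chosen, best_r2

# Toy data: only features 0 and 3 actually drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.1, size=80)
sel, r2 = forward_stepwise(X, y)
print(sorted(sel), round(r2, 2))  # expected to select features 0 and 3
```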
-The two values $\bar{R^2}$ and $mpe$ show the quality of each prediction model.
-A $\bar{R^{2}}$ value higher than $0.7$ indicates that it is possible to predict the results of a binarization method~\cite{cohen}. As a result, $12$ binarization methods can be well predicted. The mean percentage error ($mpe$) is the average difference between predicted f-scores and real f-scores. This value is around $5\%$.
+The $R^{2}$ values show the quality of each prediction model. The prediction models of the Sahoo and Niblack binarization methods were not kept for the statistical validation step, since their $R^{2}$ values were below $0.7$. For these two binarization methods, new features would have to be designed in order to obtain more accurate prediction models.
+The two values $\bar{R^2}$ and $mpe$ show the accuracy of each prediction model at the validation step. A $\bar{R^{2}}$ value higher than $0.7$ indicates that it is possible to predict the results of a binarization method~\cite{cohen}. As a result, the remaining $10$ binarization methods can be well predicted. The mean percentage error ($mpe$) is the average difference between predicted and real f-scores; this value is around $5\%$.
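The $mpe$ measure can be computed as follows. This is one common reading of "average difference between predicted and real f-scores, as a percentage"; the exact formula used by the authors may differ, and the f-score values below are invented for the example.

```python
import numpy as np

# Mean percentage error: average absolute relative difference between
# predicted and real f-scores, expressed as a percentage.
def mean_percentage_error(real, predicted):
    real = np.asarray(real, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(predicted - real) / real)

# Illustrative f-scores (not the paper's data).
real_f = [0.80, 0.90, 0.70]
pred_f = [0.84, 0.87, 0.73]
print(round(mean_percentage_error(real_f, pred_f), 1))  # -> 4.2
```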
\begin{tabular}{|c|p{3cm}|c|c|c|}
-Method & Selected Features & $R^{2}$ & $mpe$ \\
+Method & Selected Features & $R^{2}$ & $\bar{R^{2}}$ & $mpe$ \\
Bernsen & $\mIInk$; $\mA$; $\mSG$; $v$; $v_{D}$; $v_{I}$ & 0.83 & 0.96 & 6\% \\