Commits

Vincent Rabeux committed c33857a

Fixing the explanations of the rejected models.


Files changed (3)

 
 \section{Conclusion and research perspectives}
 
-This paper presented $18$ features that characterize the quality of a document image. These features are used in step-wise multivariate linear regression to create prediction models for  $12$ binarization methods. Repeated random sub-sampling cross-validation shows that these $12$ models are accurate (max percentage error equals 11\%) and can be used to automatically choose the best binarization method. Moreover, given the step-wise approach of the linear regression, these models are not over parameterized.  
+This paper presented $18$ features that characterize the quality of a document image. These features are used in step-wise multivariate linear regression to create prediction models for $12$ binarization methods. However, two of these prediction models are rejected because of their lack of accuracy. New dedicated features have to be designed and used within the presented methodology to circumvent this issue. 
+Repeated random sub-sampling cross-validation shows that the remaining models are accurate (maximum percentage error of 11\%) and can be used to automatically choose the best binarization method. Moreover, given the step-wise approach of the linear regression, these models are not over-parameterized.  
 
 One of our future research goals is to apply the same methodology to predict OCR error rates. 
 %In \cite{rabeux2011ancient}, similar features are used with a multivariate linear regression to predict the OCR error rate.

IJDAR/measures.tex

 	 
 According to the computed features, it is preferable to use Sauvola's method for the first image and Otsu's for the second. Doing so is consistent with the f-scores of the two binarization methods. 
 	
-The proposed features characterize three different aspects of degradation: intensity, quantity and location. The next section details a methodology that uses these features to predict the result of a binarization algorithm, which is applied to the  prediction of $11$ binarization algorithms on the DIBCO dataset.
+The proposed features characterize three different aspects of degradation: intensity, quantity and location. The next section details a methodology that uses these features to predict the result of a binarization algorithm; it is applied to the prediction of $12$ binarization algorithms on the DIBCO dataset.
 
 % Depending of the algorithm that we aim to predict, all these measures may not be use on the same prediction model. A sub-selection of measures is necessary. This process is done in an automated way witch is presented in following section.
 

IJDAR/prediction.tex

 %% In this paper, we focus on quality evaluation of historical documents. As mentioned before, these documents suffer from different types of degradations such as spots, speckles, ink loss, non-uniform illumination, bleed-through. We assume that it's possible to evaluate the global quality of a document image where most of others methods try to localize and evaluate separately each defect.
 %% % and also complex document informations such as figures, drawings, hand writing annotations
 %% These degradations are visible when a document is in a grayscale, and should not be measured and characterized at a post binarization level. Therefore the previous techniques and measures cannot be applied to historical documents. Moreover we believe that the binarization step is the first key to have a successful document analysis workflow. 
-This section presents a unified methodology that is able to predict most types of binarization methods (for example, adaptive thresholding, clustering, entropic, document dedicated). Our methodology is evaluated on $11$ binarization methods used in document analysis. The methods are referenced in the text by their author's names.
+This section presents a unified methodology that is able to predict the result of most types of binarization methods (for example, adaptive thresholding, clustering, entropy-based, document-dedicated). Our methodology is evaluated on $12$ binarization methods used in document analysis. The methods are referenced in the text by their authors' names.
  \begin{enumerate}
  	\item Bernsen \cite{bernsen} is a local adaptive thresholding technique.
 	\item Kapur \cite{kapur1985new} is an entropy-based thresholding method.
  
 
 
-The best theoretical value for $ R^{2}$ is 1. Moreover, a p-value is computed for each selected feature indicating its significance :  a low p-value leads to reject the hypothesis that the selected feature is not significant (null hypothesis).
-
-At this step, there is no automatic rule to decide whether a model is valid or not. The $R^{2}$ value computed at this step gives an indication of how well the model can be used in practice. The model still needs to be statically validated. This statistical validation is done at the next step. 
-
-%However, in our tests, we choose to keep the model only if a majority of p-values are lower than $0.1$. 
+The best theoretical value for $R^{2}$ is 1. Moreover, a p-value is computed for each selected feature to indicate its significance: a low p-value leads to rejecting the null hypothesis that the selected feature is not significant. At this step, there is no automatic rule to decide whether a model is valid or not. However, in our experiments, we choose to keep a model only if its $R^{2}$ value is higher than $0.7$ and if a majority of its p-values are lower than $0.1$. 
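
The acceptance rule above can be made concrete with a short sketch. The following Python fragment is only an illustration under assumed tooling (pandas and statsmodels; the paper does not state how the regression is implemented), and the helper names stepwise_model and accept are hypothetical: forward selection driven by the AIC is one common way to realize the step-wise regression described here.

# Hypothetical sketch: step-wise (forward, AIC-driven) multivariate linear
# regression over the candidate features, plus the acceptance rule quoted
# above (R^2 > 0.7 and a majority of per-feature p-values below 0.1).
import pandas as pd
import statsmodels.api as sm

def stepwise_model(features: pd.DataFrame, fscore: pd.Series):
    """features: one column per candidate degradation feature, one row per image;
    fscore: observed f-score of the binarization method on each image."""
    selected, remaining = [], list(features.columns)
    best_aic = float("inf")
    while remaining:
        # Try adding each remaining feature; keep the one that lowers the AIC most.
        aic, feat = min((sm.OLS(fscore, sm.add_constant(features[selected + [f]])).fit().aic, f)
                        for f in remaining)
        if aic >= best_aic:
            break  # no remaining feature improves the model
        best_aic = aic
        remaining.remove(feat)
        selected.append(feat)
    model = sm.OLS(fscore, sm.add_constant(features[selected])).fit()
    return model, selected

def accept(model) -> bool:
    """Keep the model only if R^2 > 0.7 and most feature p-values are below 0.1."""
    pvals = model.pvalues.drop("const")
    return model.rsquared > 0.7 and (pvals < 0.1).sum() > len(pvals) / 2

A model rejected by accept would correspond to the cases, discussed below, for which new dedicated features are needed.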
 
 %???  We also look at the slope coefficient of the validation regression, which also needs to be the closest to 1.
 
 
 Among the $18$ features, most models embed about $7$ features. Globally, the selected features are consistent with the binarization algorithm: the step-wise selection process tends to keep global (resp. local) features for global (resp. local) binarization algorithms. We also note that $\mS$ is never selected by any prediction model. Indeed, the binarization accuracy is measured at the pixel level (f-score). With this accuracy measure, the feature $\mSG$ becomes more significant than $\mS$, which may not have been the case with another evaluation measure.
 
-The two values $\bar{R^2}$ and $mpe$ show the quality of each prediction model.
-A $\bar{R^{2}}$ value higher than $0.7$ indicates that it is possible to predict the results of a binarization method~\cite{cohen}. As a result, $12$ binarization methods can be well predicted. The mean percentage error ($mpe$) is the average difference between predicted f-scores and real f-scores. This value is around $5\%$.
+
+The $R^{2}$ values show the quality of each prediction model. The prediction models of the Sahoo and Niblack binarization methods were not kept for the statistical validation step since their $R^{2}$ values were below $0.7$. For these two binarization methods, new features have to be created in order to obtain more accurate prediction models.
+
+
+The two values $\bar{R^{2}}$ and $mpe$ show the accuracy of each prediction model at the validation step. An $\bar{R^{2}}$ value higher than $0.7$ indicates that it is possible to predict the results of a binarization method~\cite{cohen}. As a result, $10$ binarization methods can be well predicted. The mean percentage error ($mpe$) is the average difference between the predicted and real f-scores. This value is around $5\%$.
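
To make the validation step concrete, here is a similarly hypothetical sketch of repeated random sub-sampling cross-validation (numpy and statsmodels assumed; the split ratio, number of repetitions and exact mpe formula are assumptions, not taken from the paper): the model is re-fitted on a random training split, $\bar{R^{2}}$ is averaged over the held-out splits, and mpe is the mean absolute gap between predicted and real f-scores.

# Hypothetical sketch of repeated random sub-sampling cross-validation for one
# prediction model: average held-out R^2 (R^2-bar) and mean percentage error (mpe).
import numpy as np
import pandas as pd
import statsmodels.api as sm

def cross_validate(features: pd.DataFrame, fscore: pd.Series, selected,
                   repeats=100, train_ratio=0.7, seed=0):
    rng = np.random.default_rng(seed)
    r2_values, errors = [], []
    n = len(fscore)
    for _ in range(repeats):
        idx = rng.permutation(n)
        train, test = idx[: int(train_ratio * n)], idx[int(train_ratio * n):]
        model = sm.OLS(fscore.iloc[train],
                       sm.add_constant(features.iloc[train][selected])).fit()
        pred = model.predict(sm.add_constant(features.iloc[test][selected]))
        real = fscore.iloc[test]
        # Held-out coefficient of determination for this split.
        r2_values.append(1 - ((real - pred) ** 2).sum() / ((real - real.mean()) ** 2).sum())
        # Average gap between predicted and real f-scores (f-scores assumed in %).
        errors.append((pred - real).abs().mean())
    return np.mean(r2_values), np.mean(errors)  # (R^2-bar, mpe)

Whether the f-scores are expressed in $[0,1]$ or as percentages only changes the returned error by a factor of $100$.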
 
 \begin{center}
 \begin{table}[ht]
 \begin{tabular}{|c|p{3cm}|c|c|c|}
 
 \hline
-Method &  Selected Features & $R^{2}$ & $mpe$ \\
+Method &  Selected Features & $R^{2}$ & $\bar{R^{2}}$ & $mpe$ \\
 \hline
 Bernsen & $\mIInk$; $\mA$; $\mSG$; $v$; $v_{D}$; $v_{I}$ & 0.83 & 0.96 & 6\% \\
 \hline