596a671
committed

IJDAR/prediction.tex
The methodology previously explained allows the creation of an accurate prediction model for any binarization method. Given a document image, a binarization method and its prediction model, we can compute all of the features required by the model and use them as inputs. The result is the predicted accuracy of this specific binarization method for this specific image.
+The methodology previously explained allows the creation of an accurate prediction model for any binarization method. Given a document image, a binarization method and its prediction model, we can compute all of the features required by the model and use them as inputs. The result is the predicted accuracy of this specific binarization method for this specific image. Given several binarization prediction models, we can create a binarization process that uses these prediction models to select the optimal binarization method for each image of a dataset.
+For instance, Shijian's method performs well and gives the best results on average. However, in some borderline cases it fails significantly while other methods perform better. This is illustrated in Figure \ref{figshijianfails}, where the bleed-through defect disrupts methods that rely on a local analysis of the image.
\caption{Sophisticated binarization algorithms do not always propose the best output: a. the original image, b. the Shijian binarization output, c. the Sauvola binarization output, d. the Otsu binarization output. Otsu's algorithm gives the best performance on this specific image.}
+\caption{Sophisticated binarization algorithms do not always give the best output: a. original image, b. Shijian's binarization output, c. Sauvola's binarization output, d. Otsu's binarization output. Otsu's algorithm has the best performance on this specific image.}
Given several binarization prediction models, we can create a binarization process that uses these prediction models to select the optimal binarization method for each image of a dataset.
Shijian's method performs well and gives the best results on average. However, in some borderline cases it fails significantly while other methods perform better.
Table \ref{selectionRes} presents fscore statistics obtained by binarizing the DIBCO dataset. The first row gives the best theoretical fscores (with the ground truth available, we know for each image which binarization method provides the best fscore). The second row gives the fscores obtained using only Shijian's method. The last row gives the fscores obtained using our automatic binarization selection.
We analyse the accuracy of our binarization method selection algorithm in several ways. First, its mean accuracy is slightly better (2\%) than that obtained using only Shijian's method. Importantly, our algorithm also has a higher global accuracy (the standard deviation equals $0.04$). Last, the worst binarization result of our method is much higher than Shijian's (56\%).
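The selection process described in this section can be written compactly. The notation below is introduced here for illustration only and is an assumption, not notation taken from the rest of the paper: $\mathcal{M}$ denotes the set of candidate binarization methods, $a_m(I)$ the fscore actually obtained by method $m$ on image $I$, and $\hat{a}_m(I)$ the fscore predicted for $m$ on $I$ by its prediction model. Under these assumptions, the automatic selection keeps, for each image, the method with the highest predicted fscore:
\begin{equation}
m^{\star}(I) = \arg\max_{m \in \mathcal{M}} \hat{a}_m(I).
\end{equation}
With the same notation, the three rows of Table \ref{selectionRes} would correspond to the statistics of $\max_{m \in \mathcal{M}} a_m(I)$ (best theoretical fscore), of $a_{m}(I)$ with $m$ fixed to Shijian's method, and of $a_{m^{\star}(I)}(I)$ (automatic selection), each computed over the images $I$ of the dataset. Since $a_{m^{\star}(I)}(I) \leq \max_{m \in \mathcal{M}} a_m(I)$ for every image, the first row is an upper bound on what any selection strategy can reach.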