481af53

committed

# Files changed (6)

# IJDAR/imgs/diffMethodesBinar/Otsu.png

##### Added

# IJDAR/imgs/diffMethodesBinar/Shijian.png

##### Added

# IJDAR/imgs/diffMethodesBinar/original.png

##### Added

# IJDAR/imgs/diffMethodesBinar/sauvola.png

##### Added

# IJDAR/prediction.tex

The methodology previously explained allows the creation of an accurate prediction model for any binarization method. Given a document image, a binarization method and its prediction model, we can compute all of the features required by the model and use them as inputs. The result is the predicted accuracy of this specific binarization method for this specific image.

+\caption{Sophisticated binarization algorithms do not always produce the best output: a. the original image, b. the Shijian binarization output, c. the Sauvola binarization output, d. the Otsu binarization output. Otsu's algorithm gives the best performance on this specific image.}

Given several binarization prediction models, we can create a binarization process that uses these prediction models to select the optimal binarization method for each image of a dataset.
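
As a sketch of this selection rule, in notation that is ours and not taken from the preceding sections: let $\mathcal{M}$ denote the set of available binarization methods, $x_m(I)$ the feature vector computed on an image $I$ for the model of method $m$, and $\hat{f}_m$ the prediction model of method $m$, which returns a predicted accuracy. The selected method is then the one with the highest predicted accuracy:
\[
  \hat{m}(I) = \arg\max_{m \in \mathcal{M}} \hat{f}_m\bigl(x_m(I)\bigr).
\]
The image $I$ is then binarized with $\hat{m}(I)$ only, so the other methods never need to be run on it.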

Table~\ref{selectionRes} presents f-score statistics obtained from binarizing the DIBCO dataset. The first row corresponds to the best theoretical f-scores (with the ground truth, we know for each image which binarization method provides the best f-score). The second row corresponds to the f-scores obtained using only Shijian's method. The last row corresponds to the f-scores obtained using our automatic binarization selection.

We analyse the accuracy of our binarization method selection algorithm in several ways. First, we compare it with using only Shijian's method. Our method has a slightly better (2\%) mean accuracy. More importantly, our algorithm has a higher global accuracy (the standard deviation equals $0.04$). Finally, the worst binarization result of our method is much higher than Shijian's (56\%).

Second, we compare our method with the optimal selection, which we can compute from the ground truth. The results are very similar, indicating that the prediction models are accurate enough to select a suitable binarization method for each image: the selection matches the optimal method for 70\% of the images. The mean error of our method is $0.009$ (standard deviation $0.02$), and the worst error equals $0.06$.
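
The error reported here can be read as follows, assuming (this is our notation and our reading of the statistic) that it is the per-image gap between the best achievable f-score and the f-score of the selected method. Let $f_m(I)$ be the f-score of method $m$ on image $I$ measured against the ground truth, $m^{\star}(I)$ the method that maximizes it, and $\hat{m}(I)$ the method chosen by the automatic selection. Then
\[
  e(I) = f_{m^{\star}(I)}(I) - f_{\hat{m}(I)}(I) \geq 0,
\]
so $e(I) = 0$ whenever the selected method reaches the best f-score on $I$. Under this reading, the mean and worst errors are the average and the maximum of $e(I)$ over the dataset.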

\caption{Binarization of the DIBCO dataset. Comparison between the best theoretical f-score (computed from the ground truth), f-scores obtained using only Shijian's method and f-scores obtained from our automatic selection.}