Commits

vialard committed 1dfe788

Final version?

Files changed (1)

ICDAR2013/ICDAR2013_paper/icdar.tex

 
 \title{Quality evaluation of ancient digitized documents for binarization prediction}
 
-\author{\IEEEauthorblockN{Rabeux Vincent}
+\author{\IEEEauthorblockN{Vincent Rabeux}
 \IEEEauthorblockA{University of Bordeaux\\
 LaBRi\\
 Bordeaux\\
 rabeux@labri.fr}
 \and
-\IEEEauthorblockN{Journet Nicholas}
+\IEEEauthorblockN{Nicholas Journet}
 \IEEEauthorblockA{University of Bordeaux\\
 LaBRi\\
 Bordeaux\\
 Bordeaux\\
 vialard@labri.fr}
 \and
-\IEEEauthorblockN{Domenger Jean Philippe}
+\IEEEauthorblockN{Jean-Philippe Domenger}
 \IEEEauthorblockA{University of Bordeaux\\
 LaBRi\\
 Bordeaux\\
  
 
 \section{Characterization of the degradation layer }
+This section details new features used to characterize document image degradation. A first set of global features is extracted directly from grayscale histograms without spatial consideration. A second set of features characterizes the localization of the degradation.
+
+
 \subsection{The degradation layer extraction} 
+
 \label{layers}
 
 
-In this article, we do not measure each type of degradation separately. Indeed, we assume that an historical document can be modeled as the diffusion of several information layers. Most degradations altering a binarization result are in a layer composed of gray pixels different from ink and background pixels (bleed-through, spots, speckles, non-uniform illumination, ink loss, ...). That's why we globally measure and characterize document degradations by distinguishing three different layers of pixels : 
+We assume that an ancient document can be modeled as the combination of three different layers: the text pixel layer, the background pixel layer and the degradation pixel layer. Most of the degradation (for example, bleed-through, spots, speckles, non-uniform illumination, ink loss) appears as connected components with grayscale values that differ from background and ink pixels. We distinguish the three different layers of pixels according to the pixels' gray level. Let us denote the gray level of pixel $p$ by $g(p)$. Let $\inkp$ be the set of ink pixels, $\degp$ be the set of degradation pixels and $\backp$ be the set of background pixels defined as follows:
+
+
 \begin{enumerate}
-	\item really dark pixels (most of the ink pixels) below $ s_{0} $.
-	\item gray pixels (degradations) between $ s_{0} $ and $ s_{1} $.
-	\item background pixels higher than $ s_{1} $.
+
+	\item  $\inkp  =   \{  p, g(p) \leq s_{0} \}$  ink layer
+
+	\item  $\degp  =   \{ p,  s_{0} < g(p) < s_{1} \}$  degradation layer
+
+	\item  $\backp   =    \{ p,  g(p) \geq s_{1} \}$  background layer
+
 \end{enumerate}
- 
- 
- Setting the two thresholds $ s_{0} $ and $ s_{1} $ can be determined using any classification algorithm. Our experiments used a 3-means clustering algorithm. We do not aim to enhance the document with a pixel close identification. Therefore, the layers dissociation does not need to be highly accurate. Table \ref{measuresExamplesOnRealImages} shows an example of a 3-means clustering algorithm applied on a image with huge defects.
-Let us denote by $g(p)$ the intensity of the pixel $p$.
-Once the three different layers are dissociated, we extract the three corresponding sets of pixels : $\inkp =  \{  p, g(p) \leq s_{0} \}$; $\degp = \{ p,  s_{0} < g(p) < s_{1} \}$;$ \backp =  \{ p,  g(p) \leq s_{1} \}$.
-From these sets of pixels, we define three sets of values corresponding to the pixels intensities :
-$\inkInt =   \{  g(p), p \in \inkp  \}$; $ \grayInt =   \{ g(p), p \in \degp  \}$; $ \backInt  =    \{ g(p), p \in \backp \}$.
-Some measures are also based on sets of 4-connected components.
-Let $S$ be a set of pixels. We denote the set of the 4-connected components of $S$ by $CC(S)$. In the rest of the section, we use the following notations : $\inkc = CC(\inkp)$, $\dc = CC(\degp)$ and $\backc = CC(\backp)$.
- 
 
 
+The two thresholds $ s_{0} $ and $ s_{1} $ can be determined using any classification algorithm. Our experiments used a 3-means clustering algorithm. Table \ref{measuresExamplesOnRealImages} shows that most of the degradation present in a document image can be extracted using these two thresholds.% Obviously, it is not possible to perfectly classify the image pixels using only the gray-level histogram.
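+
+As a purely illustrative example (the threshold and gray-level values below are hypothetical and are not taken from our experiments), suppose that the clustering yields $s_{0} = 100$ and $s_{1} = 170$. Three pixels $p_{1}$, $p_{2}$ and $p_{3}$ would then be assigned to the layers as follows:
+
+$$
+\begin{array}{l}
+g(p_{1}) = 66 \leq s_{0} \Rightarrow p_{1} \in \inkp \\
+s_{0} < g(p_{2}) = 135 < s_{1} \Rightarrow p_{2} \in \degp \\
+g(p_{3}) = 199 \geq s_{1} \Rightarrow p_{3} \in \backp
+\end{array}
+$$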
 
-%\section{Characterization of the degradation layer }
 
-The following sections detail our proposition for characterizing the degradation layer previously extracted. A first group of features is extracted directly from the grayscale histogram. A second family of features is dedicated to the characterization of the local degradation surrounding ink components.
 
 
 \subsection{Global Features}
 \label{measures}
 
 
-The global grayscale histogram contains information characterizing the document's quality. We aim to compute the following global statistic features of the grayscale histogram: mean, variance and skewness.  We denote the mean of the global histogram by $\mu$, its variance by $v$, and its skewness by $s$. The mean, variance and skewness are also computed on the three \emph{sub-histograms} to characterize each layer distribution (ink, background and degradation).   
-This step provides 12 features :  $\mu$, $v$, $s$, $\mu_{\inkInt}$,  $v_{\inkInt}$, $s_{\inkInt}$, $\mu_{\grayInt}$,  $v_{\grayInt}$, $s_{\grayInt}$, $\mu_{\backInt}$, $v_{\backInt}$, $s_{\backInt}$.
+%The global grayscale histogram contains information characterizing document quality. 
 
-The previous global measures cannot precisely represent the relation between the ink layer, the degradation layer and the background layer. Therefore, we define two feature $ \mIInk $ and $ \mIBack $, where $ \mIInk $ corresponds to the distance between the average intensity of degradation pixels and the average intensity of ink pixels, and, $ \mIBack $ is the distance  between the average intensity of degradation pixels and the average intensity of background pixels.
+We compute the following global statistical features of the grayscale histogram: mean, variance and skewness. We denote the mean of the global histogram by $\mu$, its variance by $v$, and its skewness by $s$. The mean, variance and skewness are also computed on the three \emph{sub-histograms} to characterize the distribution of each layer (ink, background and degradation): 
+
+\begin{itemize}
+
+	\item $\mu$, $v$, $s$ (global histogram)
+
+	\item $\mu_{\inkp}$,  $v_{\inkp}$, $s_{\inkp}$ (ink histogram)
+
+	\item $\mu_{\degp}$,  $v_{\degp}$, $s_{\degp}$ (degradation histogram)
+
+	\item $\mu_{\backp}$, $v_{\backp}$, $s_{\backp} $ (background histogram)
+
+\end{itemize}
+
+
+The previous global features cannot precisely represent the relationship between the ink layer, the degradation layer and the background layer. Therefore, we introduce two final global features, extracted from the grayscale histogram, to characterize the distance between the three layers: $ \mIInk $ and $ \mIBack $. $ \mIInk $ is the distance between the average intensity of degradation pixels and the average intensity of ink pixels, and $ \mIBack $ is the distance between the average intensity of degradation pixels and the average intensity of background pixels. 
+
 $$
 \begin{array}{cc}
+
 			
-		\mIInk     =  \displaystyle\frac{\moy{\grayInt} - \moy{\inkInt}}{255}
+
+		\mIInk     =  \displaystyle\frac{\moy{\degp} - \moy{\inkp}}{255}
+
 		
+
 		&
-		\mIBack = \displaystyle\frac{\moy{\backInt} - \moy{\grayInt}}{255} 
+
+		\mIBack = \displaystyle\frac{\moy{\backp} - \moy{\degp}}{255} 
+
 		\\
+
 \end{array}
 $$
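+
+For instance, consider an idealized image (an assumption made only for illustration) in which every ink pixel has gray level $0$, every background pixel has gray level $255$, and the degradation layer is a single dark spot of gray level $80$. The two features then evaluate to:
+
+$$
+\mIInk = \displaystyle\frac{80 - 0}{255} \approx 0.31
+\qquad
+\mIBack = \displaystyle\frac{255 - 80}{255} \approx 0.69
+$$
+
+In this hypothetical case the spot is closer to the ink than to the background. With a lighter spot of gray level $180$, the values would be swapped ($\mIInk \approx 0.71$ and $\mIBack \approx 0.29$).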
-%Figure \ref{mii} illustrates the value range of  $ \mIInk $ and $ \mIBack $ on a simple example.
-%\begin{figure}[htbp]
-%\begin{center}
-%a. \includegraphics[width=70px]{imgs/mii-mib-80.png}
-%b. \includegraphics[width=70px]{imgs/mii-mib-180.png}
-%\caption{Two different images illustrating $\mIInk$ and $\mIBack$. a. contains one  dark spot, the distance to the ink is low $\mIInk =  (80 - 0) / 255 = 0.3 $ the distance to the background is high $ \mIBack = (255-80)/255 = 0.7$
-%b. contains the same spot but with a lighter gray value, the distance to the ink is high $\mIInk =  (180 - 0) / 255 = 0.7 $ the distance to the background lowers $ \mIBack = (255-180)/255 = 0.3$.
-%}
-%\label{mii}
-%\end{center}
-%\end{figure}
-%
 
-The amount of degradation pixels is also directly correlated with the binarization performance. 
-We aim to measure this performance as the relative quantity of ink and degradation pixels. We define $ \mQ  $ as the following ratio : $	\mQ  =  \frac{\card{\degp}}{\card{ \inkp }}$
 
-%Figure \ref{MQ} illustrates the value range of  $ \mQ $ regarding to the ratio between the quantity of ink pixels and degraded pixels.
-%
-%
-%\begin{figure}[htbp]
-%
-%\begin{center}
-%a. \includegraphics[width=70px]{imgs/mq-low.png}
-%b. \includegraphics[width=70px]{imgs/mq-high.png}
-%\caption{$\mQ$ example on two images :  a. does not contain a lot of noise, $\mQ$ is low : $\mQ =  1 / 9 = 0.1 $, b. has much more degradation pixels $\mQ$ is higher : $\mQ  = (1+3+14)/9=2 $ 
-%}
-%\label{MQ}
-%\end{center}
-%\end{figure}
+The gray values of the three layers are not the only characteristics that can affect a binarization algorithm. The number of degradation pixels is also directly correlated with the binarization performance. 
+
+We measure this quantity relative to the number of ink pixels. We define $ \mQ  $ as the following ratio: $\mQ  =  \frac{\card{\degp}}{\card{ \inkp }}$.
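+
+As a hypothetical illustration (the pixel counts are chosen for the example only), an image with $9$ ink pixels and a single degradation pixel gives a low value, whereas the same image with $18$ degradation pixels gives a much higher one:
+
+$$
+\mQ = \frac{1}{9} \approx 0.11
+\qquad
+\mQ = \frac{18}{9} = 2
+$$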
+
+
 
 \subsection{Spatial deformation features}
 
-The location of the degradation pixels is also a significant characteristic that needs to be considered and measured. Figure \ref{locations} illustrates the three main situations seen in real documents were the degradation pixels spatially interfere with ink pixels. 
+As a good binarization should preserve the shape of the objects and avoid the creation of unwanted black or white components, the location of the degradation pixels is a significant characteristic that can influence the binarization result. Figure \ref{locations} illustrates the main situations observed in real documents in which the degradation pixels spatially interfere with ink pixels.
+
+
+Let $S$ be a set of pixels. We denote the set of the 4-connected components of $S$ by $CC(S)$. In the rest of the section, we use the following notations : $\inkc = CC(\inkp)$, $\dc = CC(\degp)$ and $\backc = CC(\backp)$.
+
 
 \begin{figure}[htbp]
+
 \begin{center}
 a.\includegraphics[width=70px]{imgs/mA.png}
 b.\includegraphics[width=70px]{imgs/mS.png}
 c.\includegraphics[width=70px]{imgs/mSG.png}
 \caption{The  different locations of a degradation component on the page: a. the degradation component is not connected to an ink component, b. a small degradation component is adjacent to an ink component, c. a large degradation component is adjacent to an ink component.}
+
 \label{locations}
+
 \end{center}
+
 \end{figure}
 
-More precisely, from these observations, we propose to compute three different measures.
-Let $ c_{0} $ be an ink component and $ c_{1} $ be a degradation component. 
-Let us also denote by $\n(c)$ the neighboring pixels of the connected component $c$ :
-$
-\n(c) = \{p \notin c \mid \exists q \in c, p \mbox{~and~} q \mbox{~are 4-connected}\}
-$.
-We distinguish three different cases that can produce different type of binarization errors.
 
+Let  $c_{\inkp} \in \inkc$  be an ink component and  $c_{\degp} \in \dc$ be a degradation component. We denote by $SG(c_{\inkp}, c_{\degp})$ the predicate that returns true if $ c_{\inkp}  $ and $ c_{\degp}  $ are connected~:
 
-\textbf{Case 1 :}  If  $c_{\inkp}$ and  $c_{\degp}$  are not connected (figure \ref{locations}.a), the original character will not be altered by the binarization process. If this configuration occurs numerous times, the binarization can lead to a document image highly degraded by many small black spots between characters. Let $\cma$ be  the set of degradation components that are not connected to any ink component~:
- $$
- \cma = \{c_{\degp} \in \dc \mid \forall p_{\degp} \in c_{\degp},  \nexists c_{\inkp} \in \inkc, p_{\degp} \in \n(c_{\inkp}) \} 
+$$ SG (c_{\inkp}, c_{\degp}) =  \exists (p_{\inkp}, p_{\degp}) \in c_{\inkp} \times c_{\degp} \mid p_{\inkp} \mbox{~and~} p_{\degp} \mbox{~are 4-connected}$$ 
+
+We distinguish three different cases that can produce different types of binarization errors~: 
+
+
+\begin{enumerate}
+
+\item  If  $c_{\inkp}$ and  $c_{\degp}$  are not connected (Figure \ref{locations}.a), the original character will not be altered by the binarization process. If this configuration occurs many times, however, the binarization can produce a document image that is highly degraded by many small black spots between characters. Let $\cma$ be  the set of degradation components that are not connected to any ink component~:
+
 $$
-The feature $\mA$ is defined as : $ \mA = \displaystyle\frac{ \card{ \cma } }{ \card{\inkc} } $
-
-\textbf{Case 2 :} %if  $ c_{0} $ and $ c_{1} $ are connected (figure \ref{locations}.b), the original letter will be altered. %The quantity of touching connected components is measured by $ \mS $.
-If  $c_{\inkp}$ and $c_{\degp}$ are connected (Figure \ref{locations}.b), the original character may be altered by the binarization: degraded pixels may be misclassified as ink pixels. Let $\cms$ be the set of all ink components that are connected to at least one degradation component: 
-$$
-\cms = \{ c_{\inkp} \in \inkc \mid \exists p_{\inkp} \in c_{\inkp} \mbox{~and~} c_{\degp} \in \dc, p_{\inkp} \in \n(c_{\degp}) \}
+\cma = \{c_{\degp} \in \dc \mid \forall c_{\inkp} \in \inkc, SG (c_{\inkp}, c_{\degp})=false \}
 $$
 
- The feature $ \mS $ is defined as the ratio between the number of ink components that may be expended by at least one degradation component and the total number of ink components : $ \mS = \displaystyle\frac{ \card{ \cms } }{ \card{  \inkc}}  $
 
+The number of degradation components that are not connected to any ink component, relative to the total number of ink components, is measured by $ \mA $~:
 
+$$ \mA = \displaystyle\frac{ \card{ \cma } }{ \card{\inkc} } $$
 
-\textbf{Case 3 :}  $ \mSG $ measures the possible extent of ink component deformation using the number of known ink components that may be modified by the binarization process. It is defined as the mean area of the pairs of components that satisfy $ SG $ over the mean area of all ink components. Let $ c_{\inkp} $ be an ink connected component and $ c_{\degp} $ a degradation connected component. We denote by $SG$ the predicate returning true if $ c_{\inkp}  $ and $ c_{\degp}  $ are touching :
-$$ SG (c_{\inkp}, c_{\degp}) =  \exists p_{\inkp} \in c_{\inkp}, p_{\inkp} \in \n(c_{\degp})$$
+%The range of this feature depends on the image size, but it can still be used to create a prediction model.% This is discussed in section \ref{prediction}.
 
-$\mSG$ can now be defined as the mean area of all connected components that satisfies $ SG $ over the mean area of all ink components : 
+
+
+\item If  $c_{\inkp}$ and $c_{\degp}$ are connected (Figure \ref{locations}.b), the original character may be altered by the binarization: degraded pixels may be misclassified as ink pixels. Let $\cms$ be the set of all ink components that are connected to at least one degradation component: 
 
 $$
-\mSG  =  \frac{\displaystyle {Average}_{\{ (c_{\inkp}, c_{\degp}), SG (c_{\inkp}, c_{\degp})\}} (\card{c_{\inkp}} + \card{c_{\degp}})}{\displaystyle {Average}_{c_{\inkp} \in \inkc} (\card{c_{\inkp}}) }  $$
- 
-The higher $ \mSG $ is, the more likely it is that the document has large spots around ink components. Combined with other features (for example, $ \mIInk $), $ \mSG $ helps predict whether the spots lead to binarization errors.
+\cms = \{ c_{\inkp} \in \inkc \mid \exists c_{\degp} \in \dc, SG(c_{\inkp}, c_{\degp}) \}
+$$
+
+The feature $ \mS $ is defined as the ratio between the number of ink components that may be expanded by at least one degradation component and the total number of ink components:
+
+$$ \mS = \displaystyle\frac{ \card{ \cms } }{ \card{  \inkc}}  $$
+
+
+\item  $ \mSG $ measures the possible extent of ink component deformation using the number of known ink components that may be modified by the binarization process. It is defined as the mean area of the pairs of components that satisfy $ SG $ over the mean area of all ink components: 
+
+
+$$
+\mSG  =  \frac{\displaystyle {Average}_{\{ (c_{\inkp}, c_{\degp})\mid SG (c_{\inkp}, c_{\degp})\}} (\card{c_{\inkp}} + \card{c_{\degp}})}{\displaystyle {Average}_{c_{\inkp} \in \inkc} (\card{c_{\inkp}}) }
+$$
+
  
 
-%Table \ref{locationExemples} shows the values of the three spatial derfomation features on the examples of figure \ref{locations}. With all the previously defined features, each document image is characterized by vector of dimension $18$.
-%
-%\begin{table}[htdp]
-%\begin{center}
-%\begin{tabular}{|c|c|c|c|}
-%\hline
-% & Figure \ref{locations}.a & Figure \ref{locations}.b & Figure \ref{locations}.c \\
-% \hline
-%  $ \mA $    & $ 1 $ & 1 & 0 \\
-%  $ \mS $    & 0 & 1 & 1 \\
-%  $ \mSG $ & 0  & 1.3 & 2.2 \\
-%\hline
-%
-%\end{tabular}
-%\end{center}
-%\caption{Example of the location measures on previous examples images. $ \mA $ is none 0 in  only the first case. $ \mSG $ is higher the less the two connected components have pixels in common.}
-%\label{locationExemples}
-%\end{table}%
+The higher $ \mSG $ is, the more likely it is that the document has large spots around ink components. Combined with other features (for example, $ \mIInk $), $ \mSG $ helps predict whether the spots lead to binarization errors.
 
 
-%\subsection{Measures example on real life images}
-%
-%
+\end{enumerate}
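+
+To illustrate the three features, consider a hypothetical page (all counts and areas below are assumptions chosen for the example) with four ink components of $10$ pixels each and two degradation components: one of $5$ pixels that is not connected to any ink component, and one of $20$ pixels that is connected to exactly one ink component. Then $\card{\cma} = 1$, $\card{\cms} = 1$, a single pair of components satisfies $SG$, and the mean area of the ink components is $10$:
+
+$$
+\mA = \frac{1}{4} = 0.25
+\qquad
+\mS = \frac{1}{4} = 0.25
+\qquad
+\mSG = \frac{10 + 20}{10} = 3
+$$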
+
+
+Given all of the previously defined features, each document image is characterized by a vector of dimension $18$. An example is given in Table~\ref{measuresExamplesOnRealImages}, which shows the degradation extraction and the values of the proposed features on one document image. The analysis of these values indicates that it may be preferable to use Sauvola's method to binarize this image. Indeed, the values of $\mIInk$ and $\mIBack$ are low, meaning that a global thresholding method like Otsu's is likely to fail to correctly classify the pixels. The value of $\mSG$ is also high: there are large spots around the characters. Window-based methods give, most of the time, better results on this kind of document. This hypothesis is confirmed by the f-scores of Otsu's and Sauvola's methods: on this image, Otsu's method obtains an f-score of 0.4 and Sauvola's method an f-score of 0.7.
+
+
+
+%% The location of the degradation pixels is also a significant characteristic that needs to be considered and measured. Figure \ref{locations} illustrates the three main situations seen in real documents were the degradation pixels spatially interfere with ink pixels. 
+
+
+%% \begin{figure}[htbp]
+
+%% \begin{center}
+
+%% a.\includegraphics[width=70px]{imgs/mA.png}
+
+%% b.\includegraphics[width=70px]{imgs/mS.png}
+
+%% c.\includegraphics[width=70px]{imgs/mSG.png}
+
+%% \caption{The  different locations of a degradation component on the page: a. the degradation component is not connected to an ink component, b. a small degradation component is adjacent to an ink component, c. a large degradation component is adjacent to an ink component.}
+
+%% \label{locations}
+
+%% \end{center}
+
+%% \end{figure}
+
+
+%% More precisely, from these observations, we propose to compute three different measures.
+
+%% Let $ c_{0} $ be an ink component and $ c_{1} $ be a degradation component. 
+
+%% Let us also denote by $\n(c)$ the neighboring pixels of the connected component $c$ :
+
+%% $
+
+%% \n(c) = \{p \notin c \mid \exists q \in c, p \mbox{~and~} q \mbox{~are 4-connected}\}
+
+%% $.
+
+%% We distinguish three different cases that can produce different type of binarization errors.
+
+
+
+%% \textbf{Case 1 :}  If  $c_{\inkp}$ and  $c_{\degp}$  are not connected (figure \ref{locations}.a), the original character will not be altered by the binarization process. If this configuration occurs numerous times, the binarization can lead to a document image highly degraded by many small black spots between characters. Let $\cma$ be  the set of degradation components that are not connected to any ink component~:
+
+%%  $$
+
+%%  \cma = \{c_{\degp} \in \dc \mid \forall p_{\degp} \in c_{\degp},  \nexists c_{\inkp} \in \inkc, p_{\degp} \in \n(c_{\inkp}) \} 
+
+%% $$
+
+%% The feature $\mA$ is defined as : $ \mA = \displaystyle\frac{ \card{ \cma } }{ \card{\inkc} } $
+
+
+%% \textbf{Case 2 :} %if  $ c_{0} $ and $ c_{1} $ are connected (figure \ref{locations}.b), the original letter will be altered. %The quantity of touching connected components is measured by $ \mS $.
+
+%% If  $c_{\inkp}$ and $c_{\degp}$ are connected (Figure \ref{locations}.b), the original character may be altered by the binarization: degraded pixels may be misclassified as ink pixels. Let $\cms$ be the set of all ink components that are connected to at least one degradation component: 
+
+%% $$
+
+%% \cms = \{ c_{\inkp} \in \inkc \mid \exists p_{\inkp} \in c_{\inkp} \mbox{~and~} c_{\degp} \in \dc, p_{\inkp} \in \n(c_{\degp}) \}
+
+%% $$
+
+
+%%  The feature $ \mS $ is defined as the ratio between the number of ink components that may be expended by at least one degradation component and the total number of ink components : $ \mS = \displaystyle\frac{ \card{ \cms } }{ \card{  \inkc}}  $
+
+
+
+
+%% \textbf{Case 3 :}  $ \mSG $ measures the possible extent of ink component deformation using the number of known ink components that may be modified by the binarization process. It is defined as the mean area of the pairs of components that satisfy $ SG $ over the mean area of all ink components. Let $ c_{\inkp} $ be an ink connected component and $ c_{\degp} $ a degradation connected component. We denote by $SG$ the predicate returning true if $ c_{\inkp}  $ and $ c_{\degp}  $ are touching :
+
+%% $$ SG (c_{\inkp}, c_{\degp}) =  \exists p_{\inkp} \in c_{\inkp}, p_{\inkp} \in \n(c_{\degp})$$
+
+
+%% $\mSG$ can now be defined as the mean area of all connected components that satisfies $ SG $ over the mean area of all ink components : 
+
+
+%% $$
+
+%% \mSG  =  \frac{\displaystyle {Average}_{\{ (c_{\inkp}, c_{\degp}), SG (c_{\inkp}, c_{\degp})\}} (\card{c_{\inkp}} + \card{c_{\degp}})}{\displaystyle {Average}_{c_{\inkp} \in \inkc} (\card{c_{\inkp}}) }  $$
+
+ 
+
+%% The higher $ \mSG $ is, the more likely it is that the document has large spots around ink components. Combined with other features (for example, $ \mIInk $), $ \mSG $ helps predict whether the spots lead to binarization errors.
+
+ 
+
+
+
 \begin{center}
+
 \begin{table*}[!htdp]
+
 {\scriptsize
+
 \hfill{}
+
 \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
+
 \hline
+
 	 \multicolumn{5}{|c|}{Image}  & \multicolumn{6}{c|}{GrayScale Histogram}  & \multicolumn{7}{c|}{3-means clusters}   \\
 
+
 \hline
+
 	 \multicolumn{5}{|c|}{}  & \multicolumn{6}{c|}{}  & \multicolumn{7}{c|}{}   \\
+
 	 \multicolumn{5}{|c|}{\includegraphics[width=140px]{imgs/H04-2.png}} &
+
 	 \multicolumn{6}{|c|}{\includegraphics[width=140px]{imgs/H04-2-histo.png}} &
+
 	 \multicolumn{7}{|c|}{\includegraphics[width=140px]{imgs/H04-2-seg.png}} \\
+
 	 
+
 \hline	 
+
 	 $\mIInk $ & $\mIBack$ & $\mQ$ & $ \mA $ & $ \mS $ & $ \mSG $ & $ s_{i} $ & $ s_{g} $ & $ s_{b} $ & $ v_{i} $ & $ v_{g} $ &  $ v_{b}$ & $ \mu_{i} $ & $ \mu_{g} $ & $ \mu_{b} $  & s          & v & $ \mu $ \\
+
 	 0.2  		& 	0.1		& 0.3 	&	0.05	&	0.2	&	3.6	&	-0.4	&  -0.05    & -0.5          &  741 	&  392         &    161       &     66            &        135          &        199       &  -1.25    &  2065  & 171              \\
+
 	
+
 \hline		
-%\hline
-%
-%	 \multicolumn{5}{|c|}{Image}  & \multicolumn{6}{c|}{GrayScale Histogram}  & \multicolumn{7}{c|}{3-mean clusters}   \\
-%
-%\hline
-%	 \multicolumn{5}{|c|}{}  & \multicolumn{6}{c|}{}  & \multicolumn{7}{c|}{}   \\
-%	 \multicolumn{5}{|c|}{\includegraphics[width=140px]{imgs/H03.png}} &
-%	 \multicolumn{6}{|c|}{\includegraphics[width=140px]{imgs/H03-histo.png}} &
-%	 \multicolumn{7}{|c|}{\includegraphics[width=140px]{imgs/H03-seg.png}} \\
-%	 
-%\hline	 
-%	 $\mIInk $ & $\mIBack$ & $\mQ$ & $ \mA $ & $ \mS $ & $ \mSG $ & $ s_{i} $ & $ s_{g} $ & $ s_{b} $ & $ v_{i} $ & $ v_{g} $ &  $ v_{b}$ & $ \mu_{i} $ & $ \mu_{g} $ & $ \mu_{b} $  & s & v & $ \mu $ \\
-%	0.13   	& 	0.2		&  0.03	&	0.3	&	0.2	&	1.4	&  -0.6  &    -0.02      &  -0.5	&     257     &  206          &         30    &       98         &       146       &   189               & -3     & 356  & 185    \\
-%	
-%\hline
-%
+
 \end{tabular}
+
 } \hfill{}
 
-\caption{Example on image from the DIBCO dataset.}
+
+\caption{Example on an image from the DIBCO dataset: extraction of the degradation layer and feature values.}
+
 \label{measuresExamplesOnRealImages}
+
+\end{table*}
+
+\end{center}
+
 %
-\end{table*}
-\end{center}
+
 %
-%
-The table \ref{measuresExamplesOnRealImages} shows the degradation extraction and the values of the presented features on one document image. The manual analysis of these values indicates that it may be preferable to use Sauvola's for this image. Indeed, the values of $\mIInk$ and $\mIBack$ are low meaning that a global thresholding method like Otsu's is likely to fail to correctly classify the pixels.  The value of $\mSG$ is also high : there are large spots around the characters. Window-based method have, most of the time, better results on this kind of documents.  This hypothesis is confirmed with the f-score of Otsu's and Sauvola's methods. On this image, Ostu makes a score of 0.4 and Sauvola of 0.7.
+
   
 
 
+
+
 \section{Predicting binarization methods accuracy}
 \label{prediction}