An Evaluation Survey of Binarization Algorithms on Historical Documents
Pavlos Stathis¹, Ergina Kavallieratou² and Nikos Papamarkos¹
¹ Image Processing and Multimedia Laboratory, Department of Electrical & Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece
² Dept. of Information and Communication Systems Engineering, University of the Aegean, 83200 Karlovassi, Samos, Greece
Abstract

Document binarization has been an active research area for many years. There are many difficulties associated with the satisfactory binarization of document images, especially in the case of degraded historical documents.
In this paper, we try to answer the question "how well can an existing binarization algorithm binarize a degraded document image?" We propose a new technique for the validation of document binarization algorithms. Our method is simple to implement and can be applied to any binarization algorithm, since it requires nothing more than the binarization stage. We then apply the proposed technique to 30 existing binarization algorithms. Experimental results and conclusions are presented.
1. Introduction

Document binarization is a preprocessing task of great use to document processing systems. It automatically converts document images into a bi-level form, in such a way that the foreground information is represented by black pixels and the background by white pixels.
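For reference, the following minimal sketch (Python with NumPy; the fixed threshold of 128 is an arbitrary placeholder, not one of the surveyed methods) shows the basic operation that every binarization algorithm performs; the surveyed algorithms differ precisely in how they choose this threshold, globally or per pixel neighborhood:

    import numpy as np

    def binarize(gray, threshold=128):
        # Map pixels darker than the threshold to black (0, foreground)
        # and everything else to white (255, background).
        # The fixed threshold is a placeholder for illustration only.
        return np.where(gray < threshold, 0, 255).astype(np.uint8)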
This apparently simple procedure has proved to be a very difficult task, especially in the case of historical documents, where very specialized problems arise, such as variation in contrast and illumination, smearing and smudging of text, seeping of ink through to the other side of the page, and general degradation of the paper and ink due to aging.
Several algorithms have been proposed for the
document binarization task [9-38]. However, selecting the most appropriate one is not a simple procedure. The evaluation and comparison of these algorithms has proved to be another difficult task, since there is no objective way to compare the results.
Leedham et al. compared five binarization algorithms using precision and recall analysis of the resulting foreground words. He et al. compared six algorithms by evaluating their effect on end-to-end word recognition performance, utilizing a commercial OCR engine. Sezgin and Sankur described 40 thresholding algorithms, categorized them according to the information content they use, and measured and ranked their comparative performance on two different classes of images.
All of the above works presented some very interesting conclusions. However, in almost every case they estimate the performance of a binarization algorithm from the results of subsequent tasks in the document processing hierarchy. In the case of historical documents, whose quality obstructs recognition, and sometimes word segmentation as well, this method of evaluation can prove problematic. What is needed instead is a different, more direct evaluation technique that deals only with the binarization stage.
The ideal evaluation method should be able to decide, for each pixel, whether it has been assigned the right color (black or white) by the binarization. This task is implemented in this paper in an automatic way. A wide range of binarization algorithms, from the oldest to the newest, are examined, and their performance is measured on a collection of artificial historical documents. More information about the tested algorithms can be found in the respective references [9-38]. The proposed procedure is described in detail in Section 2, while the experimental results and the conclusions are given in Sections 3 and 4, respectively.
2. The proposed technique
Our experiments were performed on artificial
historical documents that imitate the common
problems of historical documents. The artificial
historical documents were constructed by using
techniques of image mosaicing and combining old
blank document pages with noise-free pdf documents.
This way, during the evaluation, it is possible to decide objectively, for every single pixel, whether its value is correct, by comparing its color with that of the corresponding pixel in the original pdf document.
Two sets of images were combined by using image
mosaicing techniques. The first set consists of ten
document images in pdf format, including tables,
graphics, columns, and many of the typical elements
that can be found in a document. The second set
consists of fifteen old blank images, taken from a
digitized document archive of the 18th century. These
documents include most kinds of problems that can be met in old documents: stains, strongly varying backgrounds, uneven illumination, etc.
The two sets were combined by applying image mosaicing superimposing techniques for blending.
Two different sets of 150 document images each were prepared. In more detail, we used the pdf documents as target images and resized all the noise images to A4 size. Then, we used two different techniques for the blending: maximum intensity and image averaging.
In the first case, the maximum-intensity technique (max-int), the new image was constructed by selecting, for each pixel of the new image, the darker of the two corresponding pixels. This means that, for foreground pixels, the pdf document image will have an advantage over the noise image. On the other hand, for background pixels, the noise image will have an advantage, since it is almost always darker than the document background, which is absolutely white. This technique gives a good optical result but is not very natural: the foreground is always the darkest, since it is not affected at all by the noise. This setting permits us to check how much of the background can be removed by a binarization algorithm.
However, in order to obtain a more natural result, we also used the image-averaging technique (ave-int), where each pixel in the new image is the average of the two corresponding pixels in the original images. In this case, the result has a lighter background than that of the maximum-intensity technique, but the foreground is also affected by the image noise. Both blending rules are sketched below.
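A minimal sketch of the two blending rules (Python with NumPy; the function names are ours, and both inputs are assumed to be co-registered uint8 grayscale arrays of the same size):

    import numpy as np

    def blend_max_int(doc, noise):
        # Maximum-intensity blend: keep the darker (lower-valued) of the two
        # corresponding pixels, so the foreground always comes from the pdf
        # document and the background texture from the old blank page.
        return np.minimum(doc, noise)

    def blend_ave_int(doc, noise):
        # Image-averaging blend: each output pixel is the mean of the two
        # corresponding pixels; here the foreground is affected by the noise too.
        return ((doc.astype(np.uint16) + noise.astype(np.uint16)) // 2).astype(np.uint8)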
3. Experimental results
In the evaluation, we used statistical measures of image quality. More specifically, we used the mean square error (MSE), the signal-to-noise ratio (SNR) and the peak signal-to-noise ratio (PSNR). However, as already mentioned, our intention was to be able to check, for every pixel, whether it is right or wrong, and we also used this as a measure in the evaluation. Thus, we call pixel error the total number of pixels of the output image that have been assigned the incorrect color: that is, black where the original document is white, and vice versa. Let x(i,j) represent the value of the pixel at the i-th row and j-th column of the original document, and let y(i,j) represent the value of the corresponding pixel in the M×N output image y. Since we deal with black-and-white images, both values will be either 0 (black) or 255 (white). The local error is e(i,j) = x(i,j) − y(i,j). The pixel error rate (PERR) is then

$$\mathrm{PERR} = \frac{\text{pixel errors}}{M \cdot N} \times 100\%$$

and the total square error rate is

$$\mathrm{MSE} = \frac{1}{M \cdot N} \sum_{i=1}^{M} \sum_{j=1}^{N} e(i,j)^{2}.$$
Notice that if a pixel has been assigned the right color, the value of e(i,j)² will be 0, while if the pixel has been assigned the wrong color it will be 255². Thus, taking the PERR definition into account,

$$\mathrm{MSE} = 255^{2} \cdot \frac{\mathrm{PERR}}{100}.$$
SNR is defined as the ratio of average signal power to average noise power; for an M×N image it is

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} x(i,j)^{2}}{\sum_{i=1}^{M} \sum_{j=1}^{N} e(i,j)^{2}}.$$
The peak measure, PSNR, depends on the word length of the image pixels, and is defined as the ratio of peak signal power to average noise power. For 8-bit images, as in our case, it is

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^{2}}{\mathrm{MSE}}.$$
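Under these definitions, all four measures can be computed per page in a few lines (a sketch in Python with NumPy; the helper name `evaluate` is ours):

    import numpy as np

    def evaluate(original, output):
        # `original`: ground-truth bi-level page rendered from the pdf document.
        # `output`: the binarization result. Both are uint8 arrays in {0, 255}.
        x = original.astype(np.float64)
        y = output.astype(np.float64)
        e = x - y                                    # local error e(i,j)
        mse = np.mean(e ** 2)                        # mean square error
        perr = 100.0 * np.count_nonzero(e) / e.size  # pixel error rate (%)
        # SNR and PSNR are undefined for a perfect result (zero error).
        snr = 10.0 * np.log10((x ** 2).sum() / (e ** 2).sum())
        psnr = 10.0 * np.log10(255.0 ** 2 / mse)     # peak SNR, 8-bit images
        return perr, mse, snr, psnr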
Table 1. The resulting measures for the max-int and ave-int techniques: MSE, SNR, PSNR and PERR for each of the 30 algorithms. [table body not recoverable]
We applied all the methods to both sets described in Section 2. The pixels that changed value (white-to-black or vice versa) were counted by comparing the output image with the original pdf document image. It should be mentioned that the majority of the pixel errors are white-to-black conversions, with at most 0.02‰ black-to-white conversions under both techniques.
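This directional breakdown can be obtained by a comparison of the following kind (again a sketch; the helper name is ours):

    import numpy as np

    def error_directions(original, output):
        # Count background pixels turned black and foreground pixels turned white.
        w2b = int(np.count_nonzero((original == 255) & (output == 0)))
        b2w = int(np.count_nonzero((original == 0) & (output == 255)))
        return w2b, b2w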
Table 1 lists all the above-mentioned measures for the max-int and ave-int techniques, from which we can remark the following:
1) The majority of the algorithms (21 out of 30) perform better on the ave-int test. This allows us to distinguish the methods that perform considerably better when the foreground clearly stands out (e.g., Lloyd) from those that behave the opposite way (e.g., Otsu).
2) Although the local binarization methods perform slightly better than the global ones, there are global methods with very good performance and local methods that rank close to the worst method.
4. Conclusions

A technique was proposed for the evaluation of binarization algorithms. This method is appropriate for document images that are difficult to evaluate using techniques based on segmentation or recognition of the text. In order to survey algorithm performance, we applied it to 30 binarization algorithms. We performed experiments on document archives made with two different image mosaicing techniques, combining old blank document pages with noise-free pdf documents. This way, after the application of the binarization algorithms to the synthetic images, it is easy to evaluate the results by comparing the resulting image with the original document.
References

G. Leedham, S. Varma, A. Patankar, V. Govindaraju, "Separating Text and Background in Degraded Document Images", Proc. 8th IWFHR, pp. 244-249, September 2002.
 J. He, Q.D.M. Do, A.C. Downton, J.H. Kim, “A
Comparison of Binarization Methods for Historical Archive
Documents”, Proc. 8th ICDAR, pp. 538-542, 2005.
 M. Sezgin, B. Sankur, “Survey over image thresholding
techniques and quantitative performance evaluation”, Journal
of Electronic Imaging 13(1), 146–165, 2004.
 S. Abutaleb, “Automatic thresholding of gray-level
pictures using two-dimensional entropy,” Comput. Vis.
Graph. Image Process. 47, 22–32, 1989.
Y. Yang and H. Yan, "An adaptive logical method for binarization of degraded document images", Pattern Recognition (PR), 33, pp. 787-807, 2000.
J. Bernsen, "Dynamic thresholding of grey-level images", Proc. 8th ICPR, pp. 1251-1255, 1986.
W. Doyle, "Operations useful for similarity-invariant pattern recognition", J. Assoc. Comput. Mach., vol. 9, pp. 259-267, 1962.
 V. Vonikakis, I. Andreadis and N. Papamarkos, "Robust
Document Binarization with OFF Center-surround Cells",
Pattern Analysis & Applications, to appear.
 A.D. Brink, N.E.Pendock, “Minimum Cross-Entropy
Threshold Selection”, PR(29), pp. 179-188, 1996.
J.M.S. Prewitt and M.L. Mendelsohn, "The analysis of cell images", Ann. New York Acad. Sci., vol. 128, pp. 1035-1053, 1966.
B. Gatos, I.E. Pratikakis, S.J. Perantonis, "Adaptive degraded document image binarization", PR(39), No. 3, pp. 317-327, 2006.
 O.D. Trier, and T. Taxt, “Improvement of ‘integrated
function algorithm’ for binarisation of document images”,
Pattern Recognition Letters, 16, pp. 277–283, 1995.
 G. Johannsen and J. Bille, "A threshold selection
method using information measures", Proc. ICPR, pp. 140-
143, Munich, Germany, 1982.
 R. Duda and P. Hart, “Pattern Classification and Scene
Analysis”, Wiley, New York 1973.
 A. K. Jain and R. C. Dubes, "Algorithms for Clustering
Data", Prentice Hall, 1988.
J.N. Kapur, P.K. Sahoo and A.K. Wong, "A new method for gray-level picture thresholding using the entropy of the histogram", Computer Vision, Graphics, and Image Processing 29, pp. 273-285, 1985.
J. Kittler, J. Illingworth, "On threshold selection using clustering criteria", IEEE Trans. Systems Man Cybernet. 15, pp. 652-655, 1985.
 N. Papamarkos and A. Atsalakis, "Gray-level reduction
using local spatial features", Computer Vision and Image
Understanding, pp. 336-350, 2000.
 J. Z. Liu and W. Q. Li, “The automatic thresholding of
gray-level pictures via two-dimensional Otsu method”, Acta
Automatica Sin. 19, 101–105, 1993.
 D. E. Lloyd, “Automatic target classification using
moment invariant of image shapes”, Technical Report, RAE
IDN AW126, Farnborough, UK, Dec. 1985.
 K.V. Mardia and T.J. Hainsworth, “A Spatial
Thresholding Method for Image Segmentation”, IEEE
PAMI, 10, pp. 919-927, 1988.
L.K. Huang, M.J. Wang, "Image thresholding by minimizing the measures of fuzziness", PR 28(1), pp. 41-51, 1995.
W. Niblack, "An Introduction to Digital Image Processing", Prentice Hall, pp. 115-116, 1986.
N. Otsu, "A threshold selection method from gray-level histograms", IEEE Trans. Systems Man Cybernet., 9(1), pp. 62-66, 1979.
P. Palumbo, P. Swaminathan and S. Srihari, "Document Image Binarization: Evaluation of Algorithms", SPIE Applications of Digital Image Processing IX, vol. 697, pp. 278-285, 1986.
 J.R. Parker, “Gray level thresholding in badly
illuminated images”, IEEE PAMI. 13 (8), 813–819, 1991.
 T. Pun, “A new method for gray-level picture
thresholding using the entropy of the histogram”, Signal
Processing 2, pp. 223–237, 1980.
N. Ramesh, J.H. Yoo, I.K. Sethi, "Thresholding based on histogram approximation", IEE Proc. Vis. Image Signal Process., Vol. 142, No. 5, pp. 41-47, 1995.
S.S. Reddi, S.F. Rudin and H.R. Keshavan, "An optimal multiple threshold scheme for image segmentation", IEEE Trans. on Systems, Man and Cybernetics 14(4), pp. 661-665, 1984.
T.W. Ridler and S. Calvard, "Picture thresholding using an iterative selection method", IEEE Transactions on Systems, Man, and Cybernetics 8, pp. 630-632, 1978.
 A. Rosenfeld and A. C. Kak, “Digital Picture
Processing”, 2nd ed. New York: Academic, 1982.
J. Sauvola, M. Pietikainen, "Adaptive document image binarization", PR 33, pp. 225-236, 2000.
J.C. Yen, F.J. Chang and S. Chang, "A New Criterion for Automatic Multilevel Thresholding", IP(4), No. 3, pp. 370-378, March 1995.
L. Gottesfeld Brown, "A Survey of Image Registration Techniques", ACM Computing Surveys, Vol. 24, No. 4, pp. 325-376, 1992.
T.D. Kite, B.L. Evans, N. Damera-Venkata and A.C. Bovik, "Image Quality Assessment Based on a Degradation Model", IEEE Trans. Image Processing, vol. 9, pp. 909-922, 2000.