Conference Paper

An evaluation survey of binarization algorithms on historical documents.

Dept. of Electr. & Comput. Eng., Democritus Univ. of Thrace, Xanthi
DOI: 10.1109/ICPR.2008.4761546 Conference: 19th International Conference on Pattern Recognition (ICPR 2008), December 8-11, 2008, Tampa, Florida, USA
Source: DBLP

ABSTRACT Document binarization has been an active research area for many years. There are many difficulties associated with satisfactory binarization of document images, especially in cases of degraded historical documents. In this paper, we try to answer the question "how well can an existing binarization algorithm binarize a degraded document image?" We propose a new technique for the validation of document binarization algorithms. Our method is simple to implement and can be applied to any binarization algorithm, since it doesn't require anything more than the binarization stage. We then apply the proposed technique to 30 existing binarization algorithms. Experimental results and conclusions are presented.

  • ABSTRACT: The processing and analysis of ancient Arabic manuscripts are very difficult tasks and are likely to remain open problems for many years to come. In this paper we tackle the problem of foreground/background separation in old documents. Our approach uses a back-propagation neural network to directly classify image pixels according to their neighborhood. We tried several multilayer perceptron topologies and experimentally identified the best-performing one. Experiments were run on synthetic data obtained by image fusion techniques. The results are very promising compared to state-of-the-art techniques.
    International Conference on Frontiers in Handwriting Recognition 2012, Bari - Italy; 09/2012
  • ABSTRACT: A necessary step for the recognition of scanned documents is binarization, which is essentially the segmentation of the document. Several algorithms for binarizing a scanned document can be found in the literature. What is the best binarization result for a given document image? To answer this question, a user needs to check different binarization algorithms for suitability, since different algorithms may work better for different types of documents. Manually choosing the best from a set of binarized documents is time-consuming. To automate the selection of the best segmented document, we need either the ground truth of the document or an evaluation metric. If ground truth is available, then precision and recall can be used to choose the best binarized document. But what if ground truth is not available? Can we come up with a metric that evaluates these binarized documents? We propose a metric to evaluate binarized document images using eigenvalue decomposition. We have evaluated this measure on the DIBCO and H-DIBCO datasets. The proposed method chooses the binarized document that is closest to the ground truth of the document.
    Proc. SPIE, 01/2013
  • ABSTRACT: Document image binarization is of great importance in the document image analysis and recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behaviour and verifying its effectiveness by providing qualitative and quantitative indication of its performance. This work concerns a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images. In the proposed evaluation scheme, the Recall and Precision evaluation measures are modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the proposed evaluation scheme consist of the percentage rates of broken and missed text, false alarms, background noise, character enlargement and merging. Several experiments conducted in comparison with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.
    IEEE Transactions on Image Processing 09/2012
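
Two of the abstracts above rely on pixel-based precision and recall computed against a ground-truth image. As a rough illustration only, the sketch below computes the plain, unweighted versions of these measures and uses the F-measure to pick the better of two candidate binarizations. It is not the weighted scheme or the eigenvalue-based metric described in those papers. The function name `pixel_precision_recall`, the 1 = text / 0 = background convention, and the toy images are all assumptions made for this example.

```python
def pixel_precision_recall(binarized, ground_truth):
    """Return (precision, recall, f_measure) at pixel level.

    Assumed convention for this sketch: both images are equally sized
    lists of rows, where 1 marks a text (foreground) pixel and 0 marks
    a background pixel.
    """
    tp = fp = fn = 0
    for row_b, row_g in zip(binarized, ground_truth):
        for b, g in zip(row_b, row_g):
            if b == 1 and g == 1:
                tp += 1  # text pixel correctly detected
            elif b == 1 and g == 0:
                fp += 1  # background wrongly marked as text
            elif b == 0 and g == 1:
                fn += 1  # text pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (hypothetical): a 2x4 ground truth and two candidate outputs.
ground_truth = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
candidates = {
    "A": [[0, 1, 1, 0],
          [0, 1, 0, 0]],  # misses one text pixel
    "B": [[1, 1, 1, 1],
          [0, 1, 1, 0]],  # adds two noise pixels
}

scores = {name: pixel_precision_recall(img, ground_truth)
          for name, img in candidates.items()}
# Select the candidate with the highest F-measure.
best = max(scores, key=lambda name: scores[name][2])
print(best, scores[best])
```

Ranking by F-measure is one common choice when precision and recall disagree. Here candidate A has perfect precision but lower recall, while B has perfect recall but lower precision.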
