Conference Paper

An evaluation survey of binarization algorithms on historical documents

Dept. of Electr. & Comput. Eng., Democritus Univ. of Thrace, Xanthi
DOI: 10.1109/ICPR.2008.4761546 Conference: 19th International Conference on Pattern Recognition (ICPR 2008), December 8-11, 2008, Tampa, Florida, USA
Source: DBLP

ABSTRACT

Document binarization has been an active research area for many years. There are many difficulties associated with satisfactory binarization of document images, especially in cases of degraded historical documents. In this paper, we try to answer the question "how well can an existing binarization algorithm binarize a degraded document image?" We propose a new technique for the validation of document binarization algorithms. Our method is simple to implement and can be applied to any binarization algorithm, since it doesn't require anything more than the binarization stage. We then apply the proposed technique to 30 existing binarization algorithms. Experimental results and conclusions are presented.

Available from: Ergina Kavallieratou
  • "Ideally, the segmentation process produces a binary image where text pixels have a value of 0 and background pixels 1. A large number of text segmentation techniques have been reported in the literature [11]."
    ABSTRACT: Automatic machine reading of texts in scenes is largely restricted by poor character recognition accuracy. In this paper, we extend the Histogram of Oriented Gradient (HOG) and propose two new feature descriptors, Co-occurrence HOG (Co-HOG) and Convolutional Co-HOG (ConvCo-HOG), for accurate recognition of scene texts of different languages. Compared with HOG, which counts the orientation frequency of each single pixel, Co-HOG encodes more spatial contextual information by capturing the co-occurrence of orientation pairs of neighboring pixels. Additionally, ConvCo-HOG exhaustively extracts Co-HOG features from every possible image patch within a character image for more spatial information. The two features have been evaluated extensively on five scene character datasets in three different languages: three sets in English, one set in Chinese and one set in Bengali. Experiments show that the proposed techniques provide superior scene character recognition accuracy and are capable of recognizing scene texts of different scripts and languages.
    Full-text · Article · Jul 2015
  • "We use a threshold applied on the lightness of the image. A good survey of various binarization methods is done by Pamarkos et al. in [6]."
    ABSTRACT: This paper presents an Italic/Roman word type recognition system that requires no a priori knowledge of the characters' font. The method is aimed at analyzing old documents in which character segmentation is not trivial. Our approach therefore segments the document into words and analyzes the text word by word. To determine the word style, we combine three criteria based on the visual differences between a word and a slanted version of the same word. These criteria are defined using features computed from the vertical projection profile of the word. Because we do not assume a specific slant angle, we compute these measures over a whole range of possible slant angles and then sum the obtained scores. Our results show a recognition rate of 100% for Italic words and 97.2% for Roman words.
    Preview · Conference Paper · Jan 2009