Conference Paper

YinYang, a Fast and Robust Adaptive Document Image Binarization for Optical Character Recognition

... Building upon our prior research with the YinYang algorithm [6], which was recognized for its effectiveness in OCR preprocessing during the DocEng'22 [20] and DocEng'23 binarization competitions, the ZigZag algorithm simplifies the design of its predecessor while retaining its successful strategies. ZigZag begins by accurately estimating the background, then isolates the foreground through background subtraction, normalizes the foreground, and applies a threshold to generate a clear binary image. ...
... Our evaluation includes multiple datasets and quality metrics. The algorithms assessed alongside ZigZag include Bernsen [5], Bradley [7], Michalak [21], Nick [16], Niblack [16], Sauvola [29], and YinYang [6], with Otsu's method [26] serving as our baseline for comparison. Each algorithm was executed with its default settings, as specified in the original research papers. ...
... There are many extensions of this classic method [6]-[8] that cope with noisy images and uneven illumination conditions. One of the newest and most efficient document binarization techniques, known as YinYang [9], uses local Otsu thresholding for its final binarization step. ...
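As a rough illustration of the pipeline sketched in the excerpts above (estimate the background, subtract it, normalize the foreground, then apply a global threshold), the following Python snippet implements a generic background-subtraction binarizer. It is not the published YinYang or ZigZag code; the median-filter background estimate, the window size, and the final Otsu threshold are assumptions made for the example.

```python
# Hedged sketch of a background-subtraction binarization pipeline.
# NOT the published YinYang/ZigZag code; window size and background
# estimator are assumptions.
import numpy as np
from scipy.ndimage import median_filter
from skimage.filters import threshold_otsu

def background_subtraction_binarize(gray, bg_window=51):
    """gray: 2-D uint8 array, dark text on a light background."""
    # 1. Estimate the background with a large median filter that erases strokes.
    background = median_filter(gray.astype(np.float32), size=bg_window)
    # 2. Isolate the foreground: text becomes bright, background stays near zero.
    foreground = np.clip(background - gray.astype(np.float32), 0, None)
    # 3. Normalize the foreground to the full 0..255 range.
    rng = foreground.max() - foreground.min()
    norm = (foreground - foreground.min()) / (rng if rng > 0 else 1.0) * 255.0
    # 4. Apply a global (Otsu) threshold to obtain the binary image.
    t = threshold_otsu(norm.astype(np.uint8))
    return (norm > t).astype(np.uint8) * 255  # 255 = ink, 0 = paper
```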
Conference Paper
A new method for selecting the binarization threshold is suggested. It is based on the concept of the inter-class border length of a binarized image. Border length is a function of the binarization threshold. If the image contains solid objects that are binarized without damage, and background elements are not revealed, then the border length is almost constant with respect to threshold variation. To the contrary, when binarization leads to object destruction or background penetration, the border-length function changes rapidly due to the formation of many small clusters of pixels. Based on this notion, a correct binarization threshold can be chosen. A fast algorithm to compute the border-length function is described, with calculation time proportional to the pixel count of the grayscale image. Two discrimination parameters derived from this function are suggested. One of them depends on the object line width, while the other corresponds to the average curvature of the object contour. Based on a combination of these parameters, one can find the optimal binarization threshold, provided the object line width and/or average curvature is known in advance. The border-length method was successfully used for recognition of serial numbers of banknotes in mass-produced equipment. The method recovers object shape and dimensions close to the original and is almost independent of the background pattern.
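A minimal sketch of the border-length idea described in this abstract is given below: the border length of the binarized image is computed for every candidate threshold, and a threshold is chosen where that curve is most stable. The smoothing window and the simple flatness criterion are assumptions; the paper's two discrimination parameters (line width and contour curvature) are not reproduced.

```python
# Sketch of border-length-based threshold selection; the flatness
# criterion below is an assumption, not the paper's discrimination parameters.
import numpy as np

def border_length(binary):
    # Count pairs of 4-connected neighbours with different labels.
    horiz = np.count_nonzero(binary[:, 1:] != binary[:, :-1])
    vert = np.count_nonzero(binary[1:, :] != binary[:-1, :])
    return horiz + vert

def border_length_threshold(gray):
    """gray: 2-D uint8 image; returns a threshold in 1..254."""
    lengths = np.array([border_length(gray < t) for t in range(1, 255)],
                       dtype=np.float64)
    # Rate of change of the border length with respect to the threshold.
    change = np.abs(np.diff(lengths))
    # Smooth the curve a little and pick the threshold where it is most stable.
    change = np.convolve(change, np.ones(5) / 5.0, mode="same")
    return int(np.argmin(change)) + 1
```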
Article
Full-text available
Smartphones with a built-in camera are omnipresent today in the lives of over eighty percent of the world’s population. They are very often used to photograph documents. Document binarization is a key process in many document processing platforms. This paper assesses the quality, file size, and time performance of sixty-eight binarization algorithms using five different versions of the input images. The evaluation dataset is composed of deskjet, laser, and offset printed documents, photographed using six widely used mobile devices with the strobe flash off and on, under two different angles and four shots with small variations in position. Besides that, this paper also pinpoints the algorithms per device that may provide the best visual quality-time, document transcription accuracy-time, and size-time trade-offs. Furthermore, an indication is also given of the “overall winner”, the algorithm of choice if one has to use a single algorithm for a smartphone-embedded application.
Conference Paper
Full-text available
Document image binarization is still an active research area, as the number of binarization techniques proposed over many decades shows. The binarization of degraded document images remains difficult and encourages the development of new algorithms. Over the last decade, discrete conditional random fields have been successfully used in many domains, such as automatic language analysis. In this paper, we propose a CRF-based framework to explore the combination capabilities of this model by combining discrete outputs from several well-known binarization algorithms. The framework uses two 1D CRF models, on the horizontal and vertical directions, that are coupled for each pixel by the product of the marginal probabilities computed from both models. Experiments are conducted on two datasets from the Document Image Binarization Contest (DIBCO) 2009 and 2011 and show better performance than most of the methods presented at DIBCO 2011.
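The per-pixel coupling step described above can be sketched as follows, assuming the foreground marginals p_h and p_v from the horizontal and vertical 1-D CRF models have already been computed by some inference routine (not shown); only the product-of-marginals fusion is illustrated.

```python
# Sketch of the per-pixel fusion of two 1-D CRF marginal maps.
# p_h and p_v are assumed to be precomputed elsewhere.
import numpy as np

def combine_crf_marginals(p_h, p_v):
    """p_h, p_v: P(foreground) per pixel from the two 1-D models, values in [0, 1]."""
    fg = p_h * p_v                      # joint evidence for "foreground"
    bg = (1.0 - p_h) * (1.0 - p_v)      # joint evidence for "background"
    posterior = fg / np.maximum(fg + bg, 1e-12)
    return (posterior > 0.5).astype(np.uint8)   # 1 = foreground
```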
Conference Paper
Full-text available
Document image binarization has been studied for decades, and many practical binarization techniques have been proposed for different kinds of document images. However, many state-of-the-art methods are particularly suited to document images that suffer from a specific type of image degradation or have specific image characteristics. In this paper, we propose a classification framework to combine different thresholding methods and produce better performance for document image binarization. Given the binarization results of some reported methods, the proposed framework divides the document image pixels into three sets, namely foreground pixels, background pixels, and uncertain pixels. A classifier is then applied to iteratively classify those uncertain pixels into foreground and background, based on the pre-selected foreground and background sets. Extensive experiments over different datasets, including the Document Image Binarization Contest (DIBCO) 2009 and the Handwritten Document Image Binarization Competition (H-DIBCO) 2010, show that our proposed framework outperforms most state-of-the-art methods significantly.
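The three-set idea can be sketched as below: pixels on which all candidate binarizations agree form the certain foreground and background sets, and a simple classifier trained on those sets labels the remaining uncertain pixels. The features (intensity plus local mean) and the logistic-regression classifier are placeholders, not the classifier used in the paper, and the iterative refinement is reduced to a single pass.

```python
# Hedged sketch of the agreement/uncertainty framework; feature choice and
# classifier are assumptions, and only one classification pass is shown.
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.linear_model import LogisticRegression

def combine_binarizations(gray, candidates):
    """gray: 2-D grayscale image; candidates: list of binary maps (1 = foreground)."""
    votes = np.stack([c.astype(np.uint8) for c in candidates])
    sure_fg = votes.min(axis=0) == 1           # every method says foreground
    sure_bg = votes.max(axis=0) == 0           # every method says background
    uncertain = ~(sure_fg | sure_bg)

    # Simple per-pixel features: intensity and a local mean (window size assumed).
    feats = np.stack([gray.astype(np.float32),
                      uniform_filter(gray.astype(np.float32), size=15)],
                     axis=-1).reshape(-1, 2)
    certain = (sure_fg | sure_bg).reshape(-1)
    labels = sure_fg.reshape(-1).astype(np.uint8)

    # Train on the certain pixels, then label the uncertain ones.
    clf = LogisticRegression(max_iter=200).fit(feats[certain], labels[certain])
    result = sure_fg.astype(np.uint8).reshape(-1)
    result[uncertain.reshape(-1)] = clf.predict(feats[uncertain.reshape(-1)])
    return result.reshape(gray.shape)
```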
Article
Full-text available
Image thresholding is a common task in many computer vision and graphics applications. The goal of thresholding an image is to classify pixels as either "dark" or "light". Adaptive thresholding is a form of thresholding that takes into account spatial variations in illumination. We present a technique for real-time adaptive thresholding using the integral image of the input. Our technique is an extension of a previous method. However, our solution is more robust to illumination changes in the image. Additionally, our method is simple and easy to implement. Our technique is suitable for processing live video streams at a real-time frame-rate, making it a valuable tool for interactive applications such as augmented reality. Source code is available online.
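A compact NumPy rendering of this integral-image adaptive thresholding scheme is shown below: a pixel is set to black when it is t percent darker than the mean of the s-by-s window around it. The default window size (about one eighth of the image width) and t = 15 follow values commonly quoted for this technique; they are assumptions here, not taken from the paper's released source code.

```python
# Sketch of integral-image ("Bradley"-style) adaptive thresholding.
# Defaults (s = width/8, t = 15) are common choices, assumed here.
import numpy as np

def bradley_threshold(gray, t=15.0, s=None):
    """gray: 2-D uint8 image; returns a binary map with 1 = black (ink)."""
    h, w = gray.shape
    s = s if s is not None else max(w // 8, 3)
    half = s // 2
    # Integral image with a zero border so window sums need no special cases.
    integral = np.pad(gray, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)

    ys, xs = np.mgrid[0:h, 0:w]
    y0, y1 = np.clip(ys - half, 0, h), np.clip(ys + half + 1, 0, h)
    x0, x1 = np.clip(xs - half, 0, w), np.clip(xs + half + 1, 0, w)
    counts = (y1 - y0) * (x1 - x0)
    sums = (integral[y1, x1] - integral[y0, x1]
            - integral[y1, x0] + integral[y0, x0])
    means = sums / counts
    # Black where the pixel is t percent darker than its local mean.
    return (gray < means * (1.0 - t / 100.0)).astype(np.uint8)
```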
Article
Full-text available
Cubic convolution interpolation is a new technique for resampling discrete data. It has a number of desirable features which make it useful for image processing. The technique can be performed efficiently on a digital computer. The cubic convolution interpolation function converges uniformly to the function being interpolated as the sampling increment approaches zero. With the appropriate boundary conditions and constraints on the interpolation kernel, it can be shown that the order of accuracy of the cubic convolution method is between that of linear interpolation and that of cubic splines. A one-dimensional interpolation function is derived in this paper. A separable extension of this algorithm to two dimensions is applied to image data.
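The one-dimensional kernel derived in this paper is commonly written with the parameter a = -0.5; a small NumPy version is sketched below, together with a helper that interpolates a value at a fractional position from its four surrounding samples. A separable 2-D resampler would apply the same kernel along rows and then columns. This is a textbook formulation, not code from the paper itself.

```python
# The 1-D cubic convolution (Keys) kernel, textbook form with a = -0.5.
import numpy as np

def cubic_kernel(x, a=-0.5):
    x = np.abs(x)
    near = (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0            # |x| <= 1
    far = a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a     # 1 < |x| < 2
    return np.where(x <= 1.0, near, np.where(x < 2.0, far, 0.0))

def interpolate_1d(samples, t):
    """Value at fractional position t, using the four surrounding samples."""
    i = int(np.floor(t))
    positions = np.arange(i - 1, i + 3)
    idx = np.clip(positions, 0, len(samples) - 1)   # clamp at the boundaries
    weights = cubic_kernel(t - positions)
    return float(np.dot(weights, samples[idx]))
```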
Chapter
The recent Time-Quality Binarization Competitions have shown that no single binarization algorithm is good for all kinds of document images and that the time elapsed in binarization varies widely between algorithms and also depends on the document features. On the other hand, document applications for portable devices have space and processing limitations that allow implementing only the “best” algorithm. This paper presents the methodology and assesses the time-quality performance of 61 binarization algorithms to choose the most time-quality efficient one, under two criteria. Keywords: Smartphone applets; Document binarization; DIB-dataset; Photographed documents; Binarization competitions.
Conference Paper
The ICDAR 2019 Time-Quality Binarization Competition assessed the performance of seventeen new binarization algorithms together with thirty previously published ones. The quality of the resulting two-tone image and the execution time were assessed. Comparisons were made both on “real-world” and synthetic scanned images and on documents photographed with four models of widely used portable phones. Most of the submitted algorithms employed machine learning techniques and performed best on the most complex images. Traditional algorithms provided very good results at a fraction of the time. Keywords: Binarization; documents; algorithms; quality evaluation; performance evaluation; historical documents.
Article
This paper proposes an integrated system for the binarization of normal and degraded printed documents for the purpose of visualization and recognition of text characters. In degraded documents, where considerable background noise or variation in contrast and illumination exists, there are many pixels that cannot be easily classified as foreground or background. For this reason, it is necessary to perform document binarization by combining and taking into account the results of a set of binarization techniques, especially for document pixels that have high vagueness. The proposed binarization technique takes advantage of the benefits of a set of selected binarization algorithms by combining their results using a Kohonen self-organizing map neural network. Specifically, in the first stage the best parameter values for each independent binarization technique are estimated. In the second stage, in order to take advantage of the binarization information given by the independent techniques, the neural network is fed with the binarization results obtained by those techniques using their estimated best parameter values. This procedure is adaptive because the estimation of the best parameter values depends on the content of the images. The proposed binarization technique is extensively tested with a variety of degraded document images. Several experimental and comparative results, exhibiting the performance of the proposed technique, are presented.
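A rough sketch of the combination stage is given below: the binary outputs of several techniques form a per-pixel vote vector that is fed to a tiny one-dimensional self-organizing map, and each trained node is then labelled foreground or background from its weight vector. The map size, learning schedule, and labelling rule are assumptions, not the published configuration, and the parameter-estimation stage is omitted.

```python
# Hedged sketch of combining binarization outputs with a tiny 1-D SOM.
# Map size, learning rate and labelling rule are assumptions.
import numpy as np

def som_combine(candidates, nodes=4, iters=20000, lr=0.3, seed=0):
    """candidates: list of binary maps (1 = foreground) from different techniques."""
    rng = np.random.default_rng(seed)
    votes = np.stack([c.astype(np.float32).reshape(-1) for c in candidates], axis=1)
    weights = rng.random((nodes, votes.shape[1]))       # 1-D map of `nodes` units

    for step in range(iters):                           # online training
        x = votes[rng.integers(len(votes))]
        bmu = int(np.argmin(((weights - x) ** 2).sum(axis=1)))   # best matching unit
        radius = max(1, int(round(nodes / 2 * (1 - step / iters))))
        for j in range(max(0, bmu - radius), min(nodes, bmu + radius + 1)):
            influence = np.exp(-abs(j - bmu) / radius)
            weights[j] += lr * (1 - step / iters) * influence * (x - weights[j])

    node_is_fg = weights.mean(axis=1) > 0.5             # label nodes by their votes
    dists = np.stack([((votes - w) ** 2).sum(axis=1) for w in weights], axis=0)
    labels = node_is_fg[np.argmin(dists, axis=0)]
    return labels.reshape(candidates[0].shape).astype(np.uint8)
```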
Article
Adaptive binarization methods play a central role in document image processing. In this work, an adaptive and parameterless generalization of Otsu's method is presented. The adaptiveness is obtained by combining grid-based modeling and the estimated background map. The parameterless behavior is achieved by automatically estimating the document parameters, such as the average stroke width and the average line height. The proposed method is extended using a multiscale framework, and has been applied on various datasets, including the DIBCO'09 dataset, with promising results.
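The grid-based adaptiveness can be illustrated with the following sketch: an Otsu threshold is estimated in each cell of a coarse grid and the resulting threshold map is interpolated back to full resolution before being applied per pixel. The grid size and bilinear upscaling are assumptions; the background map, the stroke-width and line-height estimation, and the multiscale framework of the paper are omitted.

```python
# Sketch of grid-based adaptive Otsu thresholding; grid size and bilinear
# upscaling are assumptions, not the paper's full method.
import numpy as np
from scipy.ndimage import zoom
from skimage.filters import threshold_otsu

def grid_otsu(gray, grid=8):
    """gray: 2-D grayscale image; returns a binary map with 1 = ink."""
    h, w = gray.shape
    ch, cw = max(h // grid, 1), max(w // grid, 1)
    tmap = np.empty((grid, grid), dtype=np.float64)
    for i in range(grid):
        for j in range(grid):
            cell = gray[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            # Fall back to the cell mean when a cell is flat (Otsu undefined).
            tmap[i, j] = threshold_otsu(cell) if cell.min() < cell.max() else cell.mean()
    # Upscale the coarse threshold map to the image size (bilinear).
    full = zoom(tmap, (h / grid, w / grid), order=1)
    return (gray < full).astype(np.uint8)   # darker than the local threshold = ink
```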
Rafael Dueire Lins, Ergina Kavallieratou, Elisa Barney Smith, Rodrigo Barros Bernardino, and Darlisson Marinho de Jesus. 2019. ICDAR 2019 time-quality binarization competition. In International Conference on Document Analysis and Recognition (ICDAR).
Rafael Dueire Lins, Rodrigo Barros Bernardino, Elisa Barney Smith, and Ergina Kavallieratou. 2021. ICDAR 2021 competition on time-quality document image binarization. In International Conference on Document Analysis and Recognition. Springer, 708-722.