Conference Paper

Scene Text Extraction Using Image Intensity and Color Information

Comput. Sci. Dept., KAIST, Daejeon, South Korea
DOI: 10.1109/CCPR.2009.5343971 Conference: Pattern Recognition, 2009. CCPR 2009. Chinese Conference on
Source: IEEE Xplore

ABSTRACT Robust extraction of text from scene images is essential for successful scene text recognition. Scene images usually have nonuniform illumination, complex backgrounds, and text-like objects. In this paper, we propose a text extraction algorithm that combines adaptive binarization with perceptual color clustering. Adaptive binarization can handle gradual illumination changes on character regions, so it can extract whole character regions even when shadows and/or light variations degrade the image quality. However, binarization of gray-scale images cannot distinguish different color components that have the same luminance. Perceptual color clustering is complementary: it extracts text regions whose colors are close in perceptual color distance, which avoids this limitation of binarization. Text verification based on local information of a single component and the global relationships between multiple components is used to determine the true text components. The proposed method is shown to achieve reasonable accuracy in text extraction on moderately difficult examples from the ICDAR 2003 database.
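
The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the local-mean threshold, the BT.601 luminance weights, the plain Euclidean RGB distance (a stand-in for a perceptual color distance), and the names `luminance`, `adaptive_binarize`, and `color_mask` are all assumptions chosen for illustration.

```python
def luminance(rgb):
    """Approximate luma of an (R, G, B) pixel in 0..255 (BT.601 weights, assumed)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def adaptive_binarize(gray, radius=2, offset=10.0):
    """Local-mean thresholding on a 2D list of gray values.

    A pixel is marked 1 (dark foreground) when it is darker than the mean of
    its (2*radius+1) x (2*radius+1) neighbourhood by more than `offset`.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return out


def color_mask(image, ref_color, max_dist):
    """Mark pixels whose RGB distance to `ref_color` is at most `max_dist`.

    Euclidean RGB distance is used here only as a simple stand-in for a
    perceptual color distance.
    """
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    return [[1 if dist(px, ref_color) <= max_dist else 0 for px in row]
            for row in image]


if __name__ == "__main__":
    # Dark strokes (columns 2 and 6) on a background that brightens left to right.
    # The stroke at column 6 (130) is brighter than the background at column 0 (100),
    # so no single global threshold separates strokes from background.
    gradient_row = [[100, 115, 70, 145, 160, 175, 130, 205, 220]]
    print(adaptive_binarize(gradient_row))

    # Red and green pixels with (nearly) equal luminance: gray-scale binarization
    # sees a flat image, while the color mask still separates them.
    color_row = [[(200, 0, 0), (0, 102, 0), (200, 0, 0)]]
    gray_row = [[round(luminance(px)) for px in row] for row in color_row]
    print(adaptive_binarize(gray_row, radius=1))
    print(color_mask(color_row, (200, 0, 0), max_dist=50))
```

In the first example the local threshold follows the illumination gradient, so both strokes are recovered. In the second, the two colors collapse to the same gray value, which is the case where a color-based grouping step is needed.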

  • ABSTRACT: Over the past few years, research on scene text extraction has developed rapidly. Recently, conditional random fields (CRFs) have been used to label connected components (CCs) as 'text' or 'non-text'. However, a pressing issue with the CRF model is the extraction of multiple text lines. In this paper, we propose a two-step iterative CRF algorithm with belief propagation inference and an OCR filtering stage. Two kinds of neighborhood relationship graphs are used in the respective iterations to extract multiple text lines. Furthermore, OCR confidence is used as an indicator for identifying text regions, whereas a traditional OCR filter module considers only the recognition results. The first CRF iteration aims to find certain text CCs, especially in multiple text lines, and sends uncertain CCs to the second iteration. The second iteration gives the uncertain CCs a second chance and filters out false-alarm CCs with the help of OCR. Experiments on the public ICDAR 2005 dataset show that the proposed method is comparable to existing algorithms.
    Document Analysis and Recognition (ICDAR), 2011 International Conference on; 10/2011
  • ABSTRACT: In this paper, we propose a framework for isolating text regions from natural scene images. The main algorithm has two functions: it generates text region candidates, and it verifies the label of each candidate (text or non-text). The text region candidates are generated by a modified K-means clustering algorithm that uses texture features, edge information, and color information. The candidate labels are then verified globally by a Markov random field model, to which a collinearity weight is added because most text is aligned. The proposed method achieves reasonable accuracy for text extraction on moderately difficult examples from the ICDAR 2003 database.
    20th International Conference on Pattern Recognition, ICPR 2010, Istanbul, Turkey, 23-26 August 2010; 01/2010
  • ABSTRACT: Understanding text captured in real-world scenes is a challenging problem in the field of visual pattern recognition and continues to generate significant interest in the OCR (Optical Character Recognition) community. This paper proposes a novel method for recognizing scene text that avoids the conventional character segmentation step. The idea is to scan the text image with multi-scale windows and apply a robust recognition model, based on a neural classification approach, to every window in order to recognize valid characters and reject invalid ones. Recognition results are represented as a graph model in order to determine the best sequence of characters. Some linguistic knowledge is also incorporated to remove errors caused by recognition confusions. The method is evaluated on the ICDAR 2003 database of scene text images and outperforms state-of-the-art approaches.
    Proc. of Int. Workshop on Document Analysis Systems (DAS'12); 01/2012