Conference Paper

Scene Text Extraction Using Image Intensity and Color Information

Comput. Sci. Dept., KAIST, Daejeon, South Korea
DOI: 10.1109/CCPR.2009.5343971 Conference: Pattern Recognition, 2009. CCPR 2009. Chinese Conference on
Source: IEEE Xplore

ABSTRACT Robust extraction of text from scene images is essential for successful scene text recognition. Scene images usually have nonuniform illumination, complex background, and text-like objects. In this paper, we propose a text extraction algorithm by combining the adaptive binarization and perceptual color clustering method. Adaptive binarization method can handle gradual illumination changes on character regions, so it can extract whole character regions even though shadows and/or light variations affect the image quality. However, image binarization on gray-scale images cannot distinguish different color components having the same luminance. Perceptual color clustering method complementary can extract text regions which have similar color distances, so that it can prevent the problem of the binarization method. Text verification based on local information of a single component and global relationship between multiple components is used to determine the true text components. It is demonstrated that the proposed method achieved reasonabe accuracy of the text extraction for the moderately difficult examples from the ICDAR 2003 database.

  • [Show abstract] [Hide abstract]
    ABSTRACT: Text contained in scene images provides the semantic context of the images. For that reason, robust extraction of text regions is essential for successful scene text understanding. However, separating text pixels from scene images still remains as a challenging issue because of uncontrolled lighting conditions and complex backgrounds. In this paper, we propose a two-stage conditional random field (TCRF) approach to robustly extract text regions from the scene images. The proposed approach models the spatial and hierarchical structures of the scene text, and it finds text regions based on the scene text model. In the first stage, the system generates multiple character proposals for the given image by using multiple image segmentations and a local CRF model. In the second stage, the system selectively integrates the generated character proposals to determine proper character regions by using a holistic CRF model. Through the TCRF approach, we cast the scene text separation problem as a probabilistic labeling problem, which yields the optimal label configuration of pixels that maximizes the conditional probability of the given image. Experimental results indicate that our framework exhibits good performance in the case of the public databases.
    Image and Vision Computing 11/2013; 31(11):823–840. · 1.58 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Over the past few years, research on scene text extraction has developed rapidly. Recently, condition random field (CRF) has been used to give connected components (CCs) 'text' or 'non-text' labels. However, a burning issue in CRF model comes from multiple text lines extraction. In this paper, we propose a two-step iterative CRF algorithm with a Belief Propagation inference and an OCR filtering stage. Two kinds of neighborhood relationship graph are used in the respective iterations for extracting multiple text lines. Furthermore, OCR confidence is used as an indicator for identifying the text regions, while a traditional OCR filter module only considered the recognition results. The first CRF iteration aims at finding certain text CCs, especially in multiple text lines, and sending uncertain CCs to the second iteration. The second iteration gives second chance for the uncertain CCs and filter false alarm CCs with the help of OCR. Experiments based on the public dataset of ICDAR 2005 prove that the proposed method is comparative with the existing algorithms.
    Document Analysis and Recognition (ICDAR), 2011 International Conference on; 10/2011
  • [Show abstract] [Hide abstract]
    ABSTRACT: Understanding text captured in real-world scenes is a challenging problem in the field of visual pattern recognition and continues to generate a significant interest in the OCR (Optical Character Recognition) community. This paper proposes a novel method to recognize scene texts avoiding the conventional character segmentation step. The idea is to scan the text image with multi-scale windows and apply a robust recognition model, relying on a neural classification approach, to every window in order to recognize valid characters and identify non valid ones. Recognition results are represented as a graph model in order to determine the best sequence of characters. Some linguistic knowledge is also incorporated to remove errors due to recognition confusions. The designed method is evaluated on the ICDAR 2003 database of scene text images and outperforms state-of-the-art approaches.
    Proc. of Int. Workshop on Document Analysis Systems (DAS'12); 01/2012