Conference Proceeding

Extraction of Text under Complex Background Using Wavelet Transform and Support Vector Machine

Inst. of Artificial Intelligence & Robotics, Northeastern Univ., Shenyang
07/2006; DOI:10.1109/ICMA.2006.257850 In proceeding of: Mechatronics and Automation, Proceedings of the 2006 IEEE International Conference on
Source: IEEE Xplore

ABSTRACT A method based on wavelet transform and support vector machine (SVM) for detecting text under complex background is proposed. First, the image is decomposed by wavelet, and then the texture characteristic of text is extracted by using SVM on low-frequency approximate sub-space and high-frequency energy sub-space. Combining wavelet transform and SVM not only reduces the number of input training samples but also accelerates the speed of SVM for learning and classification. This method utilizes the characteristic that SVM is suited to high-dimension space work and improves the efficiency of extracting text. Experimental results show that the current proposed method can correctly and effectively locate text region in the digital image

0 0
  • [show abstract] [hide abstract]
    ABSTRACT: In this paper, we propose a simple yet powerful video text location scheme. Firstly, an edge-based background classification is applied to the input video frames, which are subsequently classified into three categories: simple, normal and complex. Then, for the three different types of video frames, different text location methods are adopted, respectively: for the simple background class, a stroke-based text location scheme is used; for the normal background class, a variant of morphology called conditional morphology is incorporated to remove the non-text noises; for the complex background situation, after location routine based on stroke analysis and conditional morphology, an SVM text detector is trained to reduce the false alarms. Experimental results show that our approach performs well in various videos with high speed and precision.
    IJDAR. 01/2010; 13:173-186.
  • Source
    [show abstract] [hide abstract]
    ABSTRACT: In recent years, the amount of streaming video has grown rapidly on the Web. Often, retrieving these streaming videos offers the challenge of indexing and analyzing the media in real time because the streams must be treated as effectively infinite in length, thus precluding offline processing. Generally speaking, captions are important semantic clues for video indexing and retrieval. However, existing caption detection methods often have difficulties to make real-time detection for streaming video, and few of them concern on the differentiation of captions from scene texts and scrolling texts. In general, these texts have different roles in streaming video retrieval. To overcome these difficulties, this paper proposes a novel approach which explores the inter-frame correlation analysis and wavelet-domain modeling for real-time caption detection in streaming video. In our approach, the inter-frame correlation information is used to distinguish caption texts from scene texts and scrolling texts. Moreover, wavelet-domain Generalized Gaussian Models (GGMs) are utilized to automatically remove non-text regions from each frame and only keep caption regions for further processing. Experiment results show that our approach is able to offer real-time caption detection with high recall and low false alarm rate, and also can effectively discern caption texts from the other texts even in low resolutions.
    Proc SPIE 01/2008;
  • Source
    [show abstract] [hide abstract]
    ABSTRACT: This paper presents a robust and efficient text detection algorithm for news video. The proposed algorithm uses the temporal information of video and logical AND operation to remove most of irrelevant background. Then a window-based method by counting the black-and-white transitions is applied on the resulted edge map to obtain rough text blobs. Line deletion technique is used twice to refine the text blocks. The proposed algorithm is applicable to multiple languages (English, Japanese and Chinese), robust to text polarities (positive or negative), various character sizes (from 4×7 to 30×30), and text alignments (horizontal or vertical).Three metrics, recall (R), precision (P), and quality of bounding preciseness (Q), are adopted to measure the efficacy of text detection algorithms. According to the experimental results on various multilingual video sequences, the proposed algorithm has a 96% and above performance in all three metrics. Comparing to existing methods, our method has better performance especially in the quality of bounding preciseness that is crucial to later binarization process.