Extraction of Text under Complex Background Using Wavelet Transform and Support Vector Machine
ABSTRACT: A method based on the wavelet transform and a support vector machine (SVM) for detecting text under complex backgrounds is proposed. First, the image is decomposed by the wavelet transform; then the texture characteristics of text are extracted by applying an SVM to the low-frequency approximation sub-band and the high-frequency energy sub-bands. Combining the wavelet transform with an SVM not only reduces the number of input training samples but also accelerates SVM learning and classification. The method exploits the fact that SVMs are well suited to high-dimensional feature spaces, improving the efficiency of text extraction. Experimental results show that the proposed method can correctly and effectively locate text regions in digital images.
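The decomposition step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, block size, and feature layout are assumptions. A one-level 2D Haar transform yields a low-frequency approximation sub-band (LL) and three high-frequency detail sub-bands (LH, HL, HH) whose per-block energies could serve as texture features for an SVM classifier:

```python
import numpy as np

def haar_2d(img):
    """One-level 2D Haar decomposition: returns (LL, LH, HL, HH) sub-bands."""
    a = img[0::2, :] + img[1::2, :]   # vertical sums
    d = img[0::2, :] - img[1::2, :]   # vertical differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 4.0   # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 4.0   # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 4.0   # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 4.0   # diagonal detail
    return ll, lh, hl, hh

def block_features(img, block=8):
    """Per-block feature vector: LL mean plus the three detail energies."""
    ll, lh, hl, hh = haar_2d(img.astype(np.float64))
    h, w = ll.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            sl = np.s_[y:y + block, x:x + block]
            feats.append([ll[sl].mean(),
                          (lh[sl] ** 2).mean(),
                          (hl[sl] ** 2).mean(),
                          (hh[sl] ** 2).mean()])
    return np.array(feats)
```

Text blocks tend to show markedly higher high-frequency energy than smooth background, which is what makes features of this kind separable by an SVM.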
ABSTRACT: In recent years, the amount of streaming video has grown rapidly on the Web. Retrieving these streaming videos poses the challenge of indexing and analyzing the media in real time, because the streams must be treated as effectively infinite in length, precluding offline processing. Generally speaking, captions are important semantic clues for video indexing and retrieval. However, existing caption detection methods often have difficulty performing detection in real time on streaming video, and few of them address differentiating captions from scene texts and scrolling texts, which play different roles in streaming video retrieval. To overcome these difficulties, this paper proposes a novel approach that exploits inter-frame correlation analysis and wavelet-domain modeling for real-time caption detection in streaming video. In our approach, inter-frame correlation information is used to distinguish caption texts from scene texts and scrolling texts. Moreover, wavelet-domain Generalized Gaussian Models (GGMs) are used to automatically remove non-text regions from each frame, keeping only caption regions for further processing. Experimental results show that our approach offers real-time caption detection with high recall and a low false-alarm rate, and can effectively discern caption texts from other texts even at low resolutions.
Proceedings of SPIE - The International Society for Optical Engineering, 01/2008; DOI:10.1117/12.759571
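The inter-frame correlation cue can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's GGM-based pipeline: a caption region stays nearly identical across consecutive frames, so its frame-to-frame normalized correlation stays high, while scene text and scrolling text drift and decorrelate. The function names and the 0.9 threshold are illustrative choices:

```python
import numpy as np

def region_correlation(patch_a, patch_b):
    """Normalized correlation between the same region in two frames."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def classify_region(frames, y, x, h, w, thresh=0.9):
    """Label a region 'caption' if it stays stable across all frames."""
    patches = [f[y:y + h, x:x + w].astype(np.float64) for f in frames]
    corrs = [region_correlation(patches[i], patches[i + 1])
             for i in range(len(patches) - 1)]
    return 'caption' if min(corrs) >= thresh else 'scene/scrolling'
```

A static caption yields correlations near 1.0 between consecutive frames, whereas a region containing scrolling text correlates poorly with itself one frame later.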
ABSTRACT: This paper presents a multiple-frame-integration approach to detect and localize static caption text in news videos. Utilizing the temporal information of videos, the algorithm includes robust text features and a non-text-line deletion technique, and yields precise, tight localization of detected text regions. The Canny edge detector is first applied to reference frames, followed by a logical AND to suppress edges arising from background variation, including scrolling texts. Next, rough text-candidate regions are determined by counting the number of black-white transitions (BWT). Finally, the text regions are refined by the non-text-line deletion technique. The proposed algorithm is applicable to multiple languages and is robust to text polarities, alignments, and character sizes (from 10×10 to 30×30). According to experimental results on various multilingual video sequences, the proposed algorithm achieves 96% or above in recall, precision, and bounding-box preciseness.
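Two of the steps in this abstract, the multi-frame edge intersection and the BWT count, can be sketched roughly as below. The helper names and the `min_bwt` threshold are illustrative assumptions, and the Canny step is replaced by precomputed binary edge maps:

```python
import numpy as np

def stable_edges(edge_maps):
    """Logical AND across reference frames: keeps only edges present in
    every frame, suppressing background variation and scrolling text."""
    out = edge_maps[0].copy()
    for e in edge_maps[1:]:
        out &= e
    return out

def bwt_count(row):
    """Number of black-white transitions along one row of a binary map."""
    return int(np.count_nonzero(row[1:] != row[:-1]))

def text_candidate_rows(edge_map, min_bwt=6):
    """Rows whose transition count suggests text-like stroke density."""
    return [i for i, row in enumerate(edge_map) if bwt_count(row) >= min_bwt]
```

Character strokes produce many edge transitions per row, so rows crossing a caption accumulate a high BWT while smooth background rows stay near zero.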
ABSTRACT: This paper proposes a method for localizing caption text, particularly Kannada caption text, in video sequences. The proposed method integrates multiple frames based on three characteristics: character location, edge distribution, and pixel contrast. The novelty of the approach lies in pairing pixels by computing double edge maps of the original image and its rotated counterpart. To highlight the text area using the video's temporal information, a Roberts edge detector is applied to reference gray-scale frames of both the original and rotated images. The two edge maps are then combined, and multiple frames are integrated on the basis of the three characteristics above using a logical AND operator, which keeps only pixels that are invariant across frames. Morphological operations are applied to the edge map to connect text characters and discard non-text components. The result is then smoothed and overlaid on one of the original reference images to extract text-candidate blocks. Experimental results on sample video data from a commercial Kannada TV channel show that the proposed approach achieves high precision and recall.
National Conference on Challenges in Research & Technology in the Coming Decades (CRT 2013); 01/2013
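Two of the building blocks named in this abstract, the Roberts edge detector and the frame-invariant AND step, can be sketched as below. This is a simplified illustration: the double-edge pairing, smoothing, and overlay steps are omitted, and the threshold and the plus-shaped dilation are assumptions, not the paper's parameters:

```python
import numpy as np

def roberts(img):
    """Roberts cross gradient magnitude (two 2x2 diagonal differences)."""
    img = img.astype(np.float64)
    gx = img[:-1, :-1] - img[1:, 1:]
    gy = img[:-1, 1:] - img[1:, :-1]
    return np.hypot(gx, gy)

def stable_edge_map(frames, thresh=40.0):
    """AND thresholded edge maps across reference frames: keeps only
    edge pixels that are invariant among the frames."""
    maps = [roberts(f) > thresh for f in frames]
    out = maps[0]
    for m in maps[1:]:
        out = out & m
    return out

def dilate(mask, it=1):
    """Naive plus-shaped (4-neighbour) binary dilation to connect
    nearby character edges into candidate blocks."""
    for _ in range(it):
        p = np.pad(mask, 1)
        mask = (p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2]
                | p[1:-1, 2:] | p[1:-1, 1:-1])
    return mask
```

A static caption leaves the same edge pixels in every reference frame and survives the AND, while a moving distractor leaves edges in different columns each frame and is suppressed.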