Figure - available from: Multimedia Tools and Applications
Flowchart of the proposed method for script identification (with and without noise)
Source publication
Videos, which carry a high volume of text, are broadcast via different media, such as television and the internet. Since Optical Character Recognition (OCR) engines are script-dependent, script identification is a necessary precursor to recognition. Moreover, video script identification is non-trivial because of difficult issues, such as low resolution, complex backgrounds, noise...
Similar publications
We aim to investigate the performance of current OCR systems on low-resource languages and low-resource scripts. We introduce and make publicly available a novel benchmark, OCR4MT, consisting of real and synthetic data, enriched with noise, for 60 low-resource languages in low-resource scripts. We evaluate state-of-the-art OCR systems on o...
Citations
... In the context of Bengali natural language processing (NLP), several approaches, including deep learning (DL) techniques, have proved their capabilities in fields such as handwritten Bengali text detection [26][27][28][29][30], deep learning-based identification of Bangla overlaid text in videos [31][32][33], hand sign language to text conversion using neural networks [34], automatic speech recognition [35][36][37][38], machine translation [39][40][41], text categorization [42][43][44], and Bangla emotion detection [44][45][46][47]. These works are relevant to our methodology, which proposes a novel study on Bangladesh Bkash financial transaction awareness by detecting fraudulent texts using ML and DL. ...
The growth of mobile financial services over the last decade is well accepted by the people of Bangladesh. However, although reported here and there, a complete understanding of illegal payments and fraudulent transactions is still missing. This paper studies one of the most popular financial transaction services of Bangladesh, Bkash, to identify fraudulent text messages sent through the service. We propose an improved classification method for fraudulent messages through feature extraction and a hybrid deep learning (DL) approach with ML, which combines a convolutional neural network with sequential learning, referred to as the customized Bkash fraud semantic sequential learning (CBF-SSL) model. Seven machine-learning classifiers and five deep-learning sequential models are experimentally evaluated within the hybrid classifier. The performance of the fraud detection system is assessed using the loss function and confusion matrix on a dataset of 500 Bengali text messages. The evaluation demonstrates that the hybrid model combining a convolutional neural network (CNN) with long short-term memory (LSTM) outperforms all other classifiers, with the lowest error rate of 0.2532%, the highest F1-measure of 75.00%, and the highest test accuracy. The proposed text features in combination with CNN and LSTM prove to be highly effective in detecting fraudulent messages.
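For reference, the confusion-matrix-derived measures cited above (precision, recall, F1) can be computed as follows; this is an illustrative sketch, and the counts in the usage example are hypothetical, not the paper's results.

```python
# Illustrative sketch: precision, recall, and F1-measure for the positive
# (fraud) class from binary confusion-matrix counts.

def f1_from_confusion(tp, fp, fn, tn):
    """Return (precision, recall, f1) for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical counts on a small held-out set:
p, r, f1 = f1_from_confusion(tp=30, fp=10, fn=10, tn=50)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.75 0.75 0.75
```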
... We adapted our network utilizing four publicly accessible datasets that comprise varying quantities of images, from relatively few to a large number. Leveraging the capabilities of CNNs [15,16,17], our intent was to amplify the efficacy and precision of DR diagnosis via automated retinal image analysis. ...
... Script identification in document images is a well-established problem [7]. Considerable efforts have been devoted to this issue. ...
Script identification is an essential preliminary step in multilingual OCR systems. This paper focuses primarily on tackling the challenging problem of script identification in scene text images, which are usually characterized by low image quality, diverse text styles, and complex backgrounds. Furthermore, script identification becomes a fine-grained classification problem when some scripts share common characters. To address this issue, we propose a novel end-to-end CNN comprising two streams for extracting distinct types of features, namely, visual features and spatial features. In the visual stream, we introduce an enhanced Squeeze-and-Excitation (SE) channel attention mechanism to emphasize valuable features and suppress irrelevant ones. The enhanced SE is composed of squeeze and excitation steps. The squeeze step employs adaptive average pooling for information aggregation. Two 1x1 convolutional layers are used to derive channel weights in the excitation step. In the spatial stream, we perform efficient analysis of the spatial dependencies within the text lines based on LSTM. Finally, we propose an adaptive fusion approach that combines probability vectors from the two streams. Instead of being fixed, the weight assigned to each probability vector is learned during network training. To validate our proposed method, we conduct extensive tests on four publicly available datasets, viz. MLe2e, RRC-MLT2017, SIW-13, and CVSI-2015. Our proposed method achieves accuracies of 97.66%, 90.24%, 96.66%, and 98.44% on these four datasets, respectively, which compare favorably with state-of-the-art methods. The two streams have demonstrated complementarity. Moreover, ablation experiments have been conducted to verify the effectiveness of each component in the proposed method.
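As a rough illustration of the mechanism described above, here is a minimal NumPy sketch of the squeeze and excitation steps and the fusion of the two streams' probability vectors. The weight shapes, the reduction ratio, and the scalar fusion weight are assumptions for illustration, not the authors' implementation; in particular, the paper learns the fusion weights during training, whereas this sketch uses a fixed value.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """SE channel attention sketch.
    x: feature map (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    z = x.mean(axis=(1, 2))                     # squeeze: global average pool -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # excitation: FC-ReLU-FC-sigmoid
    return x * s[:, None, None]                 # re-scale each channel by its weight

def fuse(p_visual, p_spatial, alpha=0.6):
    """Weighted combination of the two streams' probability vectors
    (alpha is a hypothetical fixed weight; the paper learns it)."""
    return alpha * p_visual + (1 - alpha) * p_spatial
```

A 1×1 convolution applied to a 1×1 pooled map is equivalent to the per-channel dense layers used here, which is why the excitation step reduces to two matrix products.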
... Hence, it is essential to localize texts prior to recognition. Several handcrafted approaches were proposed, which have since been surpassed by deep learning-based approaches [43,44]. Accordingly, we adhered to a deep learning-based technique for text localization using M-EAST [45]. ...
Real-world images often encompass embedded texts that adhere to disparate disciplines like business, education, and amusement, to name a few. Such images are graphically rich in terms of font attributes, color distribution, foreground-background similarity, and component organization. This aggravates the difficulty of recognizing texts from these images. Such characteristics are very prominent in the case of movie posters. One of the first pieces of information on movie posters is the title. Automatic recognition of movie titles from images can aid in efficient indexing as well as information conveyance. However, it is accompanied by other texts like the names of actors, producers, taglines, dates, etc. Though the organization of components is somewhat similar across different film industries like Tollywood (West Bengal), Bollywood (Mumbai), and Hollywood (Los Angeles), the graffiti patterns differ in multifarious instances. To address the problem of movie title understanding, we propose a dataset named MOvie POsters-Hollywood Bollywood Tollywood (MOPO-HBT) that encompasses movie posters from the aforementioned industries. The entire dataset is publicly available (http://ieee-dataport.org/11564) for research purposes. The baseline results of title identification and recognition were obtained with a CNN-based (Convolutional Neural Network) approach, wherein the titles were extracted using the M-EAST (Modified-Efficient and Accurate Scene Text) detector model.
... Many diagnostic technologies are available, but X-ray is widely used, cheap, non-invasive, and easy to acquire. Several fields of research have been transformed by deep learning techniques over the last few years [1,2,3,4]. Deep learning techniques are particularly beneficial in the medical field because imaging datasets, such as retinal images, chest X-rays, and brain MRIs, show promising results with improved accuracy. X-ray machines provide inexpensive and quick results for scanning various human organs in hospitals. ...
According to the WHO, lung infection is one of the most serious health problems across the world, especially for children under five years old and older people. In this study, we designed a deep learning-based model to aid medical practitioners in their diagnostic process. A U-Net-based segmentation framework is used to obtain the region of interest (ROI) of the lung area from chest X-ray images. Two standard deep learning models and a custom-developed CNN model comprise this framework. A deep ensemble framework is presented to detect COVID-19 disease from a collection of chest X-ray images of disparate cases, using both segmentation-free and segmentation-based lung images. Different public datasets were used for segmentation and classification to test the system's robustness. The segmentation and classification approaches return promising outcomes compared to the state of the art.
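The ensemble decision described above can be sketched as simple soft voting over the member models' class probabilities; the class labels and probability values below are hypothetical, and the paper's actual combination rule may differ.

```python
import numpy as np

# Soft-voting sketch: average the per-model class-probability vectors and
# take the argmax as the ensemble prediction.

def ensemble_predict(prob_vectors):
    """prob_vectors: list of per-model class-probability arrays."""
    avg = np.mean(prob_vectors, axis=0)   # mean probability across models
    return int(np.argmax(avg)), avg

# Three hypothetical models scoring classes [normal, covid]:
label, avg = ensemble_predict([
    np.array([0.6, 0.4]),
    np.array([0.3, 0.7]),
    np.array([0.2, 0.8]),
])
print(label)  # 1 (the mean probabilities are about [0.367, 0.633])
```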
... These tests rely on the recognition and binding of pathogen-specific antigens by antibodies, providing a sensitive and specific detection method. 3. Machine learning (ML) and artificial intelligence (AI): By leveraging ML- and AI-based models [5][6][7][8], large datasets of plant images, genomic sequences, and environmental parameters can be analysed to develop predictive models for disease detection. These models can identify patterns, correlations, and anomalies that may not be apparent to human observers, improving the accuracy and efficiency of detection. ...
Plant diseases pose a significant threat to agriculture, causing substantial yield losses and economic damages worldwide. Traditional methods for detecting plant diseases are often time-consuming and require expert knowledge. In recent years, deep learning-based approaches have demonstrated great potential in the detection and classification of plant diseases. In this paper, we propose a Convolutional Neural Network (CNN) based framework for identifying 15 categories of plant leaf diseases, focusing on Tomato, Potato, and Bell pepper as the target plants. For our experiments, we utilized the publicly available PlantVillage dataset. The choice of a CNN for this task is justified by its recognition as one of the most popular and effective deep learning methods, especially for processing spatial data like images of plant leaves. We evaluated the performance of our model using various performance metrics, including accuracy, precision, recall, and F1-score. Our findings indicate that our approach outperforms state-of-the-art techniques, yielding encouraging results in terms of disease identification accuracy and classification precision.
... They experimented with the SIW-13, ICDAR-17, and MLe2e image datasets, and the CVSI-15 video script dataset. Ghosh et al. (2021a) developed a lightweight CNN framework for video script identification. The performance of the system was tested on the CVSI-15 dataset in noiseless and disparate noisy scenarios. ...
... The process of text detection was assessed by employing different protocols, such as ICDAR 03 (considers the best match among text rectangles), DetEval (Wolf and Jolion 2006) (handles multiple matches), IoU (Karatzas et al. 2015), Yao (Yao et al. 2012) (concentrates on arbitrary orientations), TedEval (character-level detection), etc. For script identification, protocols such as accuracy, precision, sensitivity, and specificity (Ghosh et al. 2021a) were used. Text recognition was assessed using word recognition accuracy or end-to-end recognition. ...
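As a minimal sketch of the IoU protocol mentioned above (an illustration, not any benchmark's reference implementation), intersection-over-union between a detected box and a ground-truth box, each axis-aligned and given as (x1, y1, x2, y2), can be computed as:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(round(iou((0, 0, 4, 4), (2, 0, 6, 4)), 3))  # 0.333: overlap 8 over union 24
```

Detection benchmarks typically count a detection as correct when IoU with a ground-truth box exceeds a threshold such as 0.5.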
Computational perception has been dramatically reshaped by the shift from handcrafted feature-based techniques to deep learning. Scene text identification and recognition, an important aspect of machine vision, have inexorably been touched by this wave of upheaval, ushering in the era of deep learning. The field has seen significant improvements in thinking, approach, and effectiveness over time. The goal of this study is to summarize and analyze the important developments and notable advancements in scene text identification and recognition over the past decade. We discuss the significant handcrafted feature-based techniques that were once regarded as flagship systems, and the deep learning-based techniques that succeeded them, tracing such approaches from their inception to the development of complex models that have taken scene text identification to the next stage.
... Mamoun et al. [22] presented an approach to solve the handwritten Arabic language recognition problem. Mridul et al. [17] focused on the problem of text with noise and a complex background. ...
... We begin with digital text and progress to works dealing with different languages. Mridul et al. [17] offered a DL-based system called LWSINet, a lightweight script identification network. ...
Text, as one of humanity’s most influential innovations, has played an important role in shaping our lives. Reading text can be difficult for several reasons, such as luminosity, text orientation, writing style, and very light colors. The visually impaired have difficulty reading text in all of these situations. In particular, handwritten text is more difficult to read than digital text because of the differing forms and styles of handwriting across writers or, sometimes, within the work of the same writer. Therefore, the visually impaired would benefit from a device or system that helps them overcome this problem and improves their quality of life. Arabic language recognition and identification is a very difficult task because of diacritics such as consonant score, tashkil, and others. In this context, we propose a recognition and identification system for Arabic Handwritten Texts with Diacritics (AHTD) based on deep learning using a convolutional neural network. Text images are trained, tested, and validated with our Arabic Handwritten Texts with Diacritics Dataset (AHT2D). The recognized text is then enhanced with augmented reality (AR) technology and produced as a 2D image. Finally, the recognized text is converted into audio output using AR technology. Both voice output and visual output are given to the visually impaired user. The experimental results show that the proposed system is robust, with an accuracy rate of 95%.
... With the support of deep neural models, researchers have been able to achieve excellent outcomes in object identification and tracking using computer vision techniques [4]. Since their development, deep learning algorithms have achieved significant results in image and video script recognition [5]. Rather than relying on typical image processing methods, the deep neural model processes the image or video data directly. ...