Fig 2 - uploaded by Muhammad Abul Hasan
Proposed modified ResNet-18 architecture for Bangla HCR. In the diagram, conv stands for convolutional layer, Pool for MaxPool layer, batch norm for batch normalization, Relu for rectified linear unit activation layer, Sum for the addition in ResNet, and FC for fully connected hidden layers. The architecture has eight ResNet modules, each modified by adding a dropout layer after the second convolutional layer.
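As a rough illustration of the caption, the layer layout can be enumerated in plain Python. This is only a sketch: it assumes the standard ResNet-18 layout (one stem convolution, eight two-convolution modules, and a single fully connected layer for counting purposes), and the position of dropout relative to batch normalization, like the function names, is an assumption rather than something read off the figure.

```python
# Hypothetical layer-layout sketch; names and ordering details are assumptions.

def residual_module(with_dropout=True):
    """Return the ordered layer names of one residual module."""
    layers = ["conv", "batch_norm", "relu", "conv"]
    if with_dropout:
        # The caption places dropout after the second convolutional layer.
        layers.append("dropout")
    # Assumed: batch norm, then the skip-connection sum, then the final ReLU.
    layers += ["batch_norm", "sum", "relu"]
    return layers


def modified_resnet18(num_modules=8):
    """Return the full layer list: stem, residual modules, pooling, classifier."""
    layers = ["conv", "batch_norm", "relu", "pool"]  # stem
    for _ in range(num_modules):
        layers += residual_module(with_dropout=True)
    layers += ["pool", "fc", "softmax"]
    return layers


network = modified_resnet18()
# Count layers that carry weights: the stem conv, the module convs, and the FC layer.
weighted = sum(1 for name in network if name in ("conv", "fc"))
print("weighted layers:", weighted)
print("dropout layers:", network.count("dropout"))
```

Under these assumptions the count of weighted layers matches the "18" in the model name, and each of the eight modules contributes exactly one dropout layer.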


Context in source publication

Context 1
... use Softmax after fully connected layers as default. Figure 2 shows the proposed modified ResNet-18 architecture and Table I shows the configuration detail of ResNet-18 architecture. We experimented with the Root Mean Square Propagation (RMSProp), Adam [31], and Stochastic Gradient Descent (SGD) optimizers to find the global minima of the categorical cross entropy loss function. ...
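To make the Softmax and categorical cross-entropy terms concrete, here is a minimal sketch in plain Python. The scores, the class index, and the learning rate are hypothetical, and the update is applied directly to the logits for brevity; in a real network the optimizer would update the weights that produce them, and RMSProp or Adam would rescale the step.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(probs[true_index])


def sgd_step(logits, true_index, learning_rate):
    """One plain gradient-descent step taken directly on the logits.

    For softmax followed by cross-entropy, d(loss)/d(logit_i) = p_i - y_i.
    """
    probs = softmax(logits)
    grads = [p - (1.0 if i == true_index else 0.0) for i, p in enumerate(probs)]
    return [z - learning_rate * g for z, g in zip(logits, grads)]


logits = [2.0, 1.0, 0.1]   # hypothetical FC outputs for three classes
true_index = 1             # hypothetical ground-truth class
loss_before = categorical_cross_entropy(softmax(logits), true_index)
updated = sgd_step(logits, true_index, learning_rate=0.5)
loss_after = categorical_cross_entropy(softmax(updated), true_index)
print(f"loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```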

Similar publications

Chapter
Many image classification models have been presented and studied for handwritten character recognition (HCR), the most recent and successful one being the convolutional neural network (CNN). The success of CNNs has been proven in the famous ImageNet Large Scale Visual Recognition Challenge (ILSVRC) where different CNN models have been proposed and...
Preprint
This paper investigates the problem of aerial vehicle recognition using a text-guided deep convolutional neural network classifier. The network receives an aerial image and a desired class, and makes a yes or no output by matching the image and the textual description of the desired class. We train and test our model on a synthetic aerial dataset a...
Article
The article is devoted to the use of neural network methods to solve the problem of detection and classification of small objects on aerial photographs. The architectures of YOLO 2 and YOLO 3 are discussed. The training procedure is described, the obtained results are analyzed. During the work it was shown that the considered architectures can be u...

Citations

... On the training and validation set, it obtained accuracy values of 96.90% and 97.73%, respectively. In a study by Alif et al. [1], they proposed a modified version of the ResNet18 model to identify Bengali characters belonging to 84 different classes using the Banglalekha isolated dataset [4]. To improve the model's efficiency in Bangla handwritten character recognition, a dropout layer was added to each ResNet module. ...
... As shown in Table 4, the combination of preprocessing techniques and the proposed model surpasses the accuracy of all previous methods. Table 4 rows (method, classes, accuracy): [7], 60, 92.25%; Huda et al. [11], 60, 96.42%; Alif et al. [1], 84, 95.10%; Azad et al. [2], 84, 95.21%; the proposed method, 84, 96.29%. ...
Conference Paper
Recognizing handwritten characters presents significant challenges, particularly in languages like Bengali, which have numerous characters with their own styles and shapes. To address this challenge, a comprehensive approach to handwriting character recognition (HCR) in Bengali is proposed that utilizes convolutional neural networks (CNNs) enhanced with upsampling layers, residual blocks, and efficient channel attention. The technique leverages morphological preprocessing, like opening and closing in the preprocessing stage, to improve the model’s ability to distinguish between visually identical characters. The opening and closing techniques are applied to one character and another between two characters with almost identical shapes, thereby enabling the model to discriminate between characters with similar shapes. The suggested methodology is effective, achieving a noteworthy accuracy of 96.29% on the Banglalekha isolated dataset of 84 classes. Additionally, the Efficient Channel Attention (ECA) mechanism greatly improves accuracy from 96.09% to 96.29% on the recognition challenge by allowing the model to concentrate on discriminative features. Overall, the incorporation of Efficient Channel Attention mechanisms and morphological preprocessing significantly improves the model’s capacity to recognize Bengali characters. The suggested method can address various practical challenges in Bengali handwritten character recognition, potentially boosting the accuracy of such systems substantially.
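The opening and closing operations named in this abstract can be sketched on a binary image in plain Python. The 3x3 square structuring element, the zero padding at the borders, and the toy images are assumptions made for illustration; the abstract does not state which settings the authors used.

```python
def neighbours(img, y, x):
    """Values in the 3x3 window around (y, x); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    return [
        img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    ]


def erode(img):
    """A pixel stays 1 only if its whole 3x3 window is 1."""
    return [[int(all(neighbours(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[int(any(neighbours(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def opening(img):
    """Erosion followed by dilation: removes small isolated specks."""
    return dilate(erode(img))


def closing(img):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(img))


# Toy image 1: a 3x3 stroke plus one isolated noise pixel at (5, 5).
noisy = [[0] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        noisy[y][x] = 1
noisy[5][5] = 1

# Toy image 2: a 5x5 stroke with a one-pixel hole at (3, 3).
holed = [[int(1 <= y <= 5 and 1 <= x <= 5) for x in range(7)] for y in range(7)]
holed[3][3] = 0

opened = opening(noisy)
closed = closing(holed)
print("noise pixel after opening:", opened[5][5])
print("hole pixel after closing:", closed[3][3])
```

Opening removes the isolated pixel while keeping the stroke, and closing fills the hole inside the stroke.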
... Further advancements in this field have led to the development of region-based convolutional networks (R-CNN), Fast R-CNN, Faster R-CNN, and region-based fully convolutional networks (R-FCN), which employ a two-stage detection process. This process begins with the generation of region proposals and is followed by their refinement through localization and classification [6][7][8][9][10]. ...
Article
Accurate vehicle detection is crucial for the advancement of intelligent transportation systems, including autonomous driving and traffic monitoring. This paper presents a comparative analysis of two advanced deep learning models, YOLOv8 and YOLOv10, focusing on their efficacy in vehicle detection across multiple classes such as bicycles, buses, cars, motorcycles, and trucks. Using a range of performance metrics, including precision, recall, F1 score, and detailed confusion matrices, we evaluate the performance characteristics of each model. The findings reveal that YOLOv10 generally outperformed YOLOv8, particularly in detecting smaller and more complex vehicles like bicycles and trucks, which can be attributed to its architectural enhancements. Conversely, YOLOv8 showed a slight advantage in car detection, underscoring subtle differences in feature processing between the models. The performance for detecting buses and motorcycles was comparable, indicating robust features in both YOLO versions. This research contributes to the field by delineating the strengths and limitations of these models and providing insights into their practical applications in real-world scenarios. It enhances understanding of how different YOLO architectures can be optimized for specific vehicle detection tasks, thus supporting the development of more efficient and precise detection systems.
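The metrics listed in this abstract can all be derived from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the two-class matrix and its counts are hypothetical and are not taken from the paper.

```python
def per_class_metrics(confusion, labels):
    """Return {label: (precision, recall, f1)} from a square confusion matrix.

    Assumed convention: confusion[i][j] counts samples of true class i
    that were predicted as class j.
    """
    metrics = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                  # missed samples of class i
        fp = sum(row[i] for row in confusion) - tp   # other classes predicted as i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


labels = ["car", "bus"]          # hypothetical classes
confusion = [[8, 2],             # 8 cars found, 2 cars predicted as bus
             [1, 9]]             # 1 bus predicted as car, 9 buses found
for label, (p, r, f1) in per_class_metrics(confusion, labels).items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```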
... Besides English handwritten text, studies have showcased the efficacy of deep CNNs for Devanagari [33], Bangla [34], Shui [35] and Gujarati [36] character recognition. More promising CNN-based classification results on highly cursive Urdu ligatures and Arabic characters have been reported by Javed et al. [37] and Elleuch et al. [38], respectively. ...
Preprint
The use of convolutional neural networks (CNNs) has accelerated the progress of handwritten character classification/recognition. Handwritten character recognition (HCR) has found applications in various domains, such as traffic signal detection, language translation, and document information extraction. However, the widespread use of existing HCR technology is yet to be seen as it does not provide reliable character recognition with outstanding accuracy. One of the reasons for unreliable HCR is that existing HCR methods do not take the handwriting styles of non-native writers into account. Hence, further improvement is needed to ensure the reliability and extensive deployment of character recognition technologies for critical tasks. In this work, the classification of English characters written by non-native users is performed by proposing a custom-tailored CNN model. We train this CNN with a new dataset called the handwritten isolated English character (HIEC) dataset. This dataset consists of 16,496 images collected from 260 persons. This paper also includes an ablation study of our CNN by adjusting hyperparameters to identify the best model for the HIEC dataset. The proposed model with five convolutional layers and one hidden layer outperforms state-of-the-art models in terms of character recognition accuracy and achieves an accuracy of 97.04%. Compared with the second-best model, the relative improvement of our model in terms of classification accuracy is 4.38%.
... CNNs have demonstrated remarkable success in various computer vision tasks, including object detection and classification [9]. For instance, CNNs have been widely adopted for pedestrian detection [10], face recognition [11], handwriting detection [12,13], and defect detection in manufacturing [14]. In the domain of transportation, CNNs have been employed for vehicle detection and classification [15], road sign recognition [16], and crack detection in roads and bridges [17]. ...
Article
Railway infrastructure safety is a paramount concern, with bolt integrity being a critical component. In the realm of railway maintenance, the detection of missing bolts is a vital task that ensures the stability and safety of tracks. Traditionally, this task has been approached through manual inspections or conventional automated methods, which are often time-consuming, costly, and prone to human error. Addressing these challenges, this paper presents a state-of-the-art solution with the development of a lightweight convolutional neural network (CNN) featuring an integrated attention mechanism. This novel model is engineered to be computationally efficient while maintaining high accuracy, making it particularly suitable for real-time analysis in resource-constrained environments commonly found in railway inspections. The proposed CNN utilises a distinctive architecture that synergises the speed of lightweight networks with the precision of attention-based mechanisms. By integrating an attention mechanism, the network selectively concentrates on regions of interest within the image, effectively enhancing the model's capability to identify missing bolts with remarkable accuracy. Comprehensive testing showcases a remarkable 96.43% accuracy and an impressive 96 F1-score, substantially outperforming existing deep learning frameworks in the context of missing bolt detection. Key contributions of this research include the model's innovative attention-integrated approach, which significantly reduces the model complexity without compromising detection performance. Additionally, the model offers scalability and adaptability to various railway settings, proving its efficacy not just in controlled environments but also in diverse real-world scenarios. Extensive experiments, rigorous evaluations, and real-time deployment results collectively underscore the transformative potential of the presented CNN model in advancing the domain of railway safety maintenance.
... Additionally, the 112x112 pixel standard picture size contributed to a decrease in memory use and computational complexity [36]. Furthermore, the key characteristics and details of the damaged pallet racking were preserved because of the 112x112 picture size. ...
Article
Pallet racking systems are shelves that are specifically intended to hold palletised items, and they are essential for the safe and effective handling of products in warehouses. These shelves are susceptible to damage from a variety of sources, including wear and tear and collisions, which might jeopardise their structural integrity and put workers and stored items at risk. It's critical to identify faulty pallet racking quickly to avoid mishaps, product loss, and interruptions to business operations. Pallet racking system upkeep and routine inspections, however, can be expensive and prone to human mistakes. This research study suggests Pallet-Net, a unique deep learning technique that employs an attention-based convolutional neural network (CNN) to automatically detect faulty pallet racking, as a solution to this problem. The suggested technique uses attention processes to concentrate on the pallet racking image's damaged areas, making it easier to locate and identify damage. Pallet-Net precisely categorises the racking as either damaged or undamaged by learning the discriminative properties of these zones. The suggested approach, when compared to previous studies, provides great robustness and accuracy in locating and recognising damaged areas in pallet racking photos. Moreover, the proposed method obtains a 97.64% total accuracy rate, with 98% precision, 98% recall, and 98% F1 score. Recent deep learning models like Vision Transformer (ViT) and Compact Convolutional Transformer (CCT) are also analysed and compared to the suggested architecture.
... These qualities make it difficult to detect even with the human eye. Previously, we presented a slightly modified ResNet architecture that demonstrated state-of-the-art accuracy in identifying Bangla isolated handwritten characters to address these issues and detect these minor differences in the Bangla handwriting [31]. After more experimentation and research on the previously proposed architecture, we are now presenting a version of modified ResNet-34 that is remarkably resilient in identifying Bangla isolated handwritten letters and has higher accuracy than its predecessor. ...
... In the previous paper [31], we explored the ResNet-18 capabilities by adding additional dropout layers, which provide generalised output with a higher degree of regularisation inspired by [32]. In this paper, we propose an improvement to the previously modified ResNet resulting in a more efficient gaining of the representation of structural characteristics of the Handwritten character statistical information and thus improving the overall classification accuracy. ...
... We have also experimented with various state-of-the-art CNN models, and our previously introduced modified ResNet-18 [31] with the same hybrid dataset and measured their performance against our proposed model. This experiment included unmodified ResNet-18, unmodified ResNet-34, previously modified ResNet-18, and the proposed modified ResNet-34 architecture on the mixture of the two BanglaLekha-Isolated and Ekush dataset's 11 vowel classes. ...
Article
Bangla Handwritten Character Recognition (HCR) remains a persistent challenge within the domain of Optical Character Recognition (OCR) systems. Despite extensive research efforts spanning several decades, achieving satisfactory success in this field has proven to be complicated. Bangla, being one of the most widely spoken languages worldwide, consists of 50 primary characters, including 11 vowels and 39 consonants. Unlike Latin languages, Bangla characters exhibit complex patterns, diverse sizes, significant variations, intricate letter shapes, and intricate edges. These characteristics further differ based on factors such as the writer's age and birthplace. In this paper, we propose a modified ResNet-34 architecture, a convolutional neural network (CNN) model, to identify Bangla handwritten characters accurately. The proposed approach is validated using a merged subset of two popular Bangla handwritten datasets. Through our technique, we achieve state-of-the-art recognition performance. Experimental results demonstrate that the suggested model attains an average accuracy of 98.70% for Bangla handwritten vowels, 97.34% for consonants, and 99.02% for numeric characters. Additionally, when applied to a mixed dataset comprising vowels, consonants, and numeric characters, the proposed model achieves an overall accuracy of 97%. This research contributes to advancing digital manufacturing systems by addressing the challenge of Bangla Handwritten Character Recognition, offering a high-performing solution based on a modified ResNet-34 architecture. The achieved recognition accuracy signifies significant progress in this field, potentially paving the way for enhanced automation and efficiency in various applications that involve processing Bangla handwritten text.
... They were able to identify 95.13% of Handwritten Bangla digits. Alif et al. [7] ...
Article
Handwritten digit recognition is a fundamental problem in the field of computer vision and pattern recognition. This paper presents a Convolutional Neural Network (CNN) approach for recognizing handwritten Bangla digits. The proposed method utilizes a dataset of handwritten Bangla digit images and trains a CNN model to classify these digits accurately. The dataset is preprocessed to enhance the quality of the images and make them suitable for training the CNN model. The trained model is then tested on a separate test dataset to evaluate its performance in terms of accuracy. With the Ekush: Bangla Handwritten Data - Numerals dataset, we tested our CNN implementation to determine the precision of handwritten characters. According to the test results, 25% of the images using a training set of more than 150,000 images from Ekush dataset had an accuracy of 98.3%.
... Rabbani Alif et al. introduced a modified ResNet-18 architecture by adding a dropout layer to each module, which in turn improved the generalization and regularization of the input data [8]. They applied their model to two isolated Bangla handwritten datasets, namely BanglaLekha-isolated [9] and CMATERdb [10], achieving 95.10% and 95.99% accuracy respectively. ...
Conference Paper
Optical Character Recognition (OCR) systems are very powerful tools that are used to convert handwritten texts or digital data on an image to machine readable texts. The importance of Optical Character Recognition for handwritten documents cannot be overstated due to its widespread use in human transactions. OCR technology allows for the conversion of various types of documents or images into machine understandable data that can be analyzed, edited, and searched. In earlier years, manually crafted feature extraction techniques were used on comparatively small datasets which were not good enough for practical use. With the advent of deep learning, it was possible to perform OCR tasks more efficiently and accurately than ever before. In this paper, several OCR techniques have been reviewed. We mostly reviewed works on Bangla scripts and also gave an overview of the contemporary works and recent progress in OCR technology (e.g. TrOCR, transformer w/ CNN). It was found that for Bangla handwritten texts, CNN models like DenseNet121, ResNet50, MobileNet etc. are the commonly adopted techniques because of their state-of-the-art performance in object recognition tasks. Using an RNN layer like LSTM or GRU alongside the base CNN-based architecture, the accuracy can be further improved. TrOCR is a fairly new technique in this field that shows promise. Experimental results show that on the synthetic IAM handwriting dataset it achieved a Character Error Rate (CER) of 2.89. The goal of this paper is to provide a summary of the research conducted on character recognition of handwritten documents in Bangla scripts and suggest future research directions.
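The Character Error Rate mentioned in this abstract is commonly computed as the Levenshtein edit distance between the recognized text and the reference, divided by the reference length. The sketch below follows that common definition, which is assumed here rather than taken from the paper; the function names and sample strings are hypothetical. It returns a fraction, so multiply by 100 for a percentage.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "handwriting"     # hypothetical ground-truth text
hypothesis = "handwritting"   # hypothetical OCR output with one extra character
print(f"CER = {character_error_rate(reference, hypothesis):.4f}")
```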
... Alif et al. [9] suggested a modified ResNet-18 architecture for recognizing isolated handwritten Bangla characters belonging to 84 different classes. The model achieves an accuracy of 95.99 percent. ...
Article
Accurately classifying user-independent handwritten Bengali characters and numerals presents a formidable challenge in their recognition. This task becomes more complicated due to the inclusion of numerous complex-shaped compound characters and the fact that different authors employ diverse writing styles. Researchers have recently conducted significant research using individual approaches to recognize handwritten Bangla digits, alphabets, and slightly compound characters. To address this, we propose a straightforward and lightweight convolutional neural network (CNN) framework to accurately categorize handwritten Bangla simple characters, compound characters, and numerals. The suggested approach outperforms many previously developed procedures, with faster execution times and fewer epochs. Furthermore, this model applies to more than three datasets. Our proposed CNN-based model has achieved impressive validation accuracies on three datasets. Specifically, for the BanglaLekha isolated dataset, which includes 84-character classes, the validation accuracy was 92.48%. On the Ekush dataset, which includes 60-character classes, the model achieved a validation accuracy of 97.24%, while on the customized dataset, which includes 50-character classes, the validation accuracy was 97.03%. Our model has demonstrated high accuracy and outperformed several prominent existing frameworks.
... Moreover, the stated technique has a recorded recognition rate for 10 numerals class of 98.66%, 11 classes on vowels of 94.99%, and 91.23% (having 50 categories) on alphabets. In [4], a reformed structure of ResNet-18 has been suggested for the classification of Bangla handwritten character samples. ...
... Paper ID by month: Jan [38,42,47,57,83,96,110]; Feb [4,19,22,27,29,37,43,54,56,91,98,122]; March [9,18,20,44,49,80,88]; April [3,5,45,51,73,114]; May [23,25,36,120]; June [8,24,35,40,41,55,72,100,121]; July [11,15,53 ... first step, preprocessing, the transformation of data is performed to bring out the appropriate format for further processing, which is crucial for the accuracy of any recognition system. In this step, different noise reduction methods based on filtering and morphological operations are performed. ...
Article
Document Analysis and Recognition (DAR) is ongoing research that has been studied extensively for many decades and it has achieved a substantial level that generates several technology-driven applications. Further, with the advent of high-end computing power, the implementation of advanced character recognition techniques is enabled and creates growing demand on different emerging application domains. The development of such application domains requires more advanced techniques and methodologies. The main aim of this paper is to present a manual update for researchers working in the field of character recognition. A brief history of how character recognition has evolved is depicted. Then, general steps involved in character recognition are discussed, and each step is reviewed with the available techniques. The current status and future direction for the character recognition system are discussed. Lastly, the focus is given to the off-line handwritten recognition systems as this research area needs more experiments to attain the machine simulation of human reading capability.