A review of neural networks in handwritten character
recognition
Ruoxin Li
Electronic and Electrical Engineering, University of Sheffield, Sheffield, S10 2TN,
United Kingdom
liruoxin1209@gmail.com
Abstract. Handwritten character recognition has been a significant research focus in the fields
of pattern recognition and artificial intelligence over the past decade. With the advent of neural
networks, particularly deep learning models, the accuracy and efficiency of offline handwritten
character recognition have dramatically improved. This paper presents a comprehensive review
of recent developments in applying neural networks to handwritten character recognition. The
literature review covers studies conducted between 2016 and 2023, providing insights into the
methodologies, data processing techniques, and evaluation metrics used. The review spans
various neural network architectures, including Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), and hybrid models. It categorizes and compares their
performance across multiple benchmark datasets, highlighting specific improvements in
recognition accuracy and efficiency. Furthermore, the review discusses challenges that persist on
large-scale datasets, such as the diversity of handwriting styles and computational cost
constraints. Notably, CNNs have shown outstanding performance, but the integration of
advanced techniques like transfer learning and Generative Adversarial Networks (GANs) is
explored as a potential avenue to enhance future recognition systems.
Keywords: Handwritten character recognition, convolutional neural network, recurrent neural
network, hybrid model, deep learning.
1. Introduction
As a research focus in the field of pattern recognition for several decades, handwritten character
recognition (HCR) has been applied to several areas including postal address reading, bank check
processing, and historical document digitization [1]. It can be categorized as online HCR and offline
HCR. The complexity of this problem arises from the diversity of individual writing styles, distortions,
and noise in the data. Early studies indicate that traditional methods primarily involved stages of data
preprocessing, feature extraction, and classification. Despite these efforts, the recognition rates
achieved by most well-known commercial systems were below 90% [1]. The advent of
neural networks has brought groundbreaking progress and vitality to this field. In particular, the
convolutional neural network (CNN) can automatically extract rich, interconnected features from
images and achieve high recognition accuracy; one reported CNN reaches an impressive 99.87%
recognition accuracy on the MNIST dataset [2]. This paper reviews the development and application of neural network models
in handwritten character recognition, and explores various neural network architectures including CNNs,
RNNs and hybrid models, discussing their methodologies and performance metrics. Suggested
directions for future research are also considered. Progress in this area improves the efficiency and
accuracy of operations in postal services, banking, and document preservation, reducing errors and
saving resources, and it advances the development of more robust recognition systems in the broader field of artificial intelligence.
2. Early handwritten character recognition development
Handwriting recognition systems traditionally comprise three main components: data preprocessing,
feature extraction, and classification.
Data preprocessing involves several steps to optimize input data, including sample normalization,
noise removal, and geometric transformations such as rotation and scaling to correct distortions.
Additionally, techniques like generating pseudo-samples and adding virtual strokes are employed.
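To make these steps concrete, the following sketch shows one possible preprocessing pipeline in Python using OpenCV and NumPy; the median filter, Otsu binarization, moment-based deskew, and 32x32 target size are illustrative choices rather than settings taken from the reviewed studies.

```python
# A minimal preprocessing sketch (illustrative only), assuming 8-bit grayscale
# character images loaded as NumPy arrays via OpenCV.
import cv2
import numpy as np

def preprocess(img, size=32):
    # Noise removal: a small median filter suppresses salt-and-pepper noise.
    img = cv2.medianBlur(img, 3)
    # Binarization with Otsu's threshold separates ink from background.
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Deskew: estimate slant from second-order image moments and shear-correct it.
    m = cv2.moments(img)
    if abs(m["mu02"]) > 1e-2:
        skew = m["mu11"] / m["mu02"]
        M = np.float32([[1, skew, -0.5 * img.shape[0] * skew], [0, 1, 0]])
        img = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]),
                             flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
    # Size normalization to a fixed input resolution.
    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)
    return img.astype(np.float32) / 255.0
```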
Feature extraction is crucial in handwriting recognition, and it can be divided into structural and
statistical features. Structural features analyze the character's structure, strokes, or components to extract
shape and layout information. However, for handwritten characters, statistical features have proven to
be more effective because handwriting is highly variable and inconsistent, with differences in stroke
thickness, slant, and style across different writers. Statistical features, such as directional features like
Gabor and gradient features, capture the distribution and directionality of pixel intensities, which makes
them more robust in handling the variations inherent in handwritten text. These features are widely used
in offline handwritten character recognition because they effectively capture the essential
characteristics of handwriting despite its variability.
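As a hedged illustration of such statistical features, the sketch below computes zoned gradient-direction histograms with Sobel operators; the 4x4 zoning and 8 direction bins are assumptions chosen for clarity, not parameters reported in the literature.

```python
# A simple gradient-direction (statistical) feature extractor, in the spirit of
# the directional features described above; zone and bin counts are illustrative.
import cv2
import numpy as np

def gradient_direction_features(img, zones=4, bins=8):
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = np.hypot(gx, gy)
    direction = (np.arctan2(gy, gx) + np.pi) / (2 * np.pi)  # normalized to [0, 1]
    h, w = img.shape
    features = []
    # Zone the image and accumulate a direction histogram weighted by gradient magnitude.
    for i in range(zones):
        for j in range(zones):
            ys = slice(i * h // zones, (i + 1) * h // zones)
            xs = slice(j * w // zones, (j + 1) * w // zones)
            hist, _ = np.histogram(direction[ys, xs], bins=bins, range=(0, 1),
                                   weights=magnitude[ys, xs])
            features.append(hist)
    f = np.concatenate(features)                 # length: zones * zones * bins
    return f / (np.linalg.norm(f) + 1e-8)
```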
Classification involves using models like the Modified Quadratic Discriminant Function (MQDF),
Support Vector Machines (SVM), Hidden Markov Models (HMM), Discriminative Learning Quadratic
Discriminant Function (DLQDF), and Learning Vector Quantization (LVQ). These models classify
characters based on the extracted features. Text line recognition is another critical aspect, which can be
approached using segmentation-based or segmentation-free strategies. Segmentation-based methods use
projection and connected component analysis to segment text lines into individual characters, which are
then recognized using character classifiers.
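A minimal example of the classification stage, pairing precomputed feature vectors with a support vector machine in scikit-learn, is sketched below; the RBF kernel, the C value, and the feature files are placeholder assumptions.

```python
# Classification sketch: an SVM over statistical feature vectors (hypothetical files).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = np.load("features.npy"), np.load("labels.npy")  # hypothetical precomputed data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", C=10.0, gamma="scale")          # illustrative hyperparameters
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```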
Segmentation-free methods employ sliding window techniques, where a window moves across the
text line, and character recognition is performed within the window, often combined with statistical
language models in a Bayesian framework to model the context and generate the final recognition result.
Despite significant advancements, early offline handwriting recognition systems faced challenges in
handling diverse handwriting styles and large-scale datasets. However, these traditional methods laid
the foundation for subsequent research and inspired modern neural network-based approaches that have
further improved recognition accuracy and robustness.
3. Convolutional Neural Networks (CNNs)
3.1. Convolutional Neural Networks (CNNs) model architecture and updates
CNNs have become the standard for image recognition tasks, including Handwritten Character
Recognition (HCR).
LeNet-5, proposed by Yann LeCun et al. in 1998, is a classic CNN architecture primarily used for
handwritten digit recognition, such as the MNIST dataset. It consists of two convolutional layers
followed by subsampling layers, a fully connected layer, and an output layer. This architecture was
revolutionary at its time and laid the groundwork for future CNN development.
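A compact PyTorch rendering of this LeNet-5-style layout is sketched below for 28x28 MNIST-like inputs; the activation choices and exact fully connected sizes are simplifications of the original design.

```python
# A LeNet-5-style CNN: two convolution/subsampling stages followed by fully
# connected layers and an output layer, as described above (simplified sketch).
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Tanh(),  # C1
            nn.AvgPool2d(2),                                        # S2 (subsampling)
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),             # C3
            nn.AvgPool2d(2),                                        # S4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),                             # output layer
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = LeNet5()(torch.randn(1, 1, 28, 28))  # -> shape (1, 10)
```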
AlexNet, introduced by Alex Krizhevsky et al. in 2012, is a deep CNN architecture that gained fame
by winning the ImageNet Large Scale Visual Recognition Challenge. Although it was designed for
complex image classification tasks, AlexNet can also be applied to handwritten character recognition.
Its deeper structure, consisting of five convolutional layers and three fully connected layers, allows it to
capture more complex features of the input data.
ResNet, developed by Kaiming He et al. in 2015, addresses the vanishing gradient problem in deep
networks by introducing residual blocks. These blocks enable the training of much deeper networks by
allowing the smooth flow of gradients through skip connections. ResNet's ability to train extremely deep
models has proven to be advantageous in achieving high accuracy for various recognition tasks,
including HCR. The field has been further advanced by improved models such as Relaxation CNN and
ART CNN.
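The residual idea can be illustrated with the following PyTorch sketch of a basic two-convolution block with an identity skip connection; it mirrors the standard design rather than any specific configuration evaluated in the cited studies.

```python
# A basic residual block: two convolutions plus an identity skip connection
# that lets gradients flow directly through the addition.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # skip connection carries the input forward
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # gradients flow through the addition

y = ResidualBlock(32)(torch.randn(1, 32, 16, 16))  # same shape in and out
```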
3.2. Benchmark performance evaluation
The performance of CNN architectures can be evaluated on benchmark datasets. For example, on an
Arabic handwritten character dataset, the ResNet architecture achieved an accuracy of 0.72, a precision
of 0.74, and a recall of 0.70. AlexNet, in contrast, achieved higher scores with an accuracy of 0.8107, a
precision of 0.8270, and a recall of 0.8024 [3]. LeNet, while foundational, showed lower performance
metrics with an accuracy of 0.6435, a precision of 0.8489, and a recall of 0.6381. These results highlight
the advancements in CNN architectures over time. While LeNet provided a solid foundation for CNN
applications in HCR, more recent architectures such as AlexNet and ResNet have significantly improved
performance by capturing more complex features and enabling the use of deeper networks. Future
research and model improvements, such as Relaxation CNN and ART CNN, continue to push the
boundaries of what is achievable in handwritten character recognition, holding the promise of even
greater accuracy and robustness in diverse and complex datasets.
4. Recurrent Neural Network (RNN)
4.1. General model
Recurrent Neural Networks (RNNs) have been widely used for sequence modeling in handwriting
recognition. RNNs are particularly useful in recognizing cursive writing, where context understanding
is crucial. In the standard RNN architecture, modifications are often made to the basic framework to
improve its ability to capture dependencies across sequences. These modifications include altering the
depth of the network, adjusting the number of hidden units, and fine-tuning the activation functions to
better handle the nuances of cursive script. However, despite these improvements, basic RNNs still have
several limitations. One of the most significant issues is the vanishing gradient problem, which hampers
the network's ability to effectively learn long-range dependencies. Additionally, RNNs are prone to
overfitting, especially when the training data is limited, and they struggle with high computational
demands, which reduces their efficiency in processing very long sequences.
4.2. Improved model
Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) represent significant
advancements over traditional Recurrent Neural Networks (RNNs), particularly in tasks such as
handwriting recognition where context and long-term dependencies play a crucial role. LSTMs
introduce memory cells equipped with three gating mechanisms (input, forget, and output gates) that
regulate the flow of information. These gates ensure that relevant data is retained over long sequences,
while irrelevant information is discarded, effectively solving the vanishing gradient problem that
hampers traditional RNNs.
Conversely, GRUs simplify the LSTM architecture by combining the functions of the input and
forget gates into a single update gate, thus eliminating the necessity for separate memory cells. GRUs
also include a reset gate, which controls the influence of previous states on the current output. This
simpler structure allows GRUs to achieve similar performance to LSTMs but with fewer parameters,
resulting in lower computational costs.
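The parameter saving can be verified directly in PyTorch, as in the short comparison below; the input and hidden sizes are arbitrary example values.

```python
# Comparing LSTM and GRU layer sizes in PyTorch at the same hidden width.
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
gru = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

# The GRU uses three gate-sized weight groups versus four for the LSTM,
# so it carries roughly 25% fewer parameters at the same hidden size.
print("LSTM parameters:", count_params(lstm))  # ~99k
print("GRU parameters: ", count_params(gru))   # ~74k
```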
The vanishing gradient problem in RNNs occurs when gradients used to update the network's weights
diminish as they are propagated backward through time, leading to extremely slow or stalled learning in
the earlier layers of the network. This problem is particularly pronounced in tasks requiring the learning
of long-term dependencies, such as handwriting recognition. LSTM networks address this issue by
introducing memory cells that maintain a constant error, allowing gradients to remain effective across
numerous time steps. Similarly, GRUs simplify this architecture while maintaining effectiveness,
making them computationally more efficient [4]. These improved models have become essential tools
in sequence modeling, particularly in applications like handwriting recognition, where understanding
the sequential nature of input data is critical.
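As a generic illustration (not a derivation taken from the reviewed papers), consider a plain RNN with update h_k = sigma(W_h h_{k-1} + W_x x_k + b). The gradient of a loss at time step T with respect to an earlier hidden state h_t factorizes as

```latex
\frac{\partial \mathcal{L}_T}{\partial h_t}
  = \frac{\partial \mathcal{L}_T}{\partial h_T}
    \prod_{k=t+1}^{T} \frac{\partial h_k}{\partial h_{k-1}}
  = \frac{\partial \mathcal{L}_T}{\partial h_T}
    \prod_{k=t+1}^{T} \operatorname{diag}\big(\sigma'(z_k)\big)\, W_h
```

so when the norm of each factor is below one, the product shrinks exponentially as the gap between t and T grows. The additive, gated cell-state updates of LSTMs and GRUs keep this product much closer to the identity, which is what allows gradients to remain effective across many time steps.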
4.3. Benchmark performance evaluation
In comparing the performance of GRU and LSTM models across multiple handwriting recognition
datasets, the GRU-based model (Att-BGRU-V) consistently outperforms LSTM-based models. The
GRU model achieves lower RMSE and ED values while maintaining higher or comparable recognition
rates. For instance, on the IRONOFF lower-case dataset, the GRU model records an RMSE of 2.9 and
a recognition rate of 93.2%, outperforming LSTM models which have higher RMSEs and lower
accuracy. Similarly, on the LMCA dataset, the GRU model not only matches the LSTM’s recognition
accuracy of 98.9% but also demonstrates lower RMSE. This indicates that GRU models offer superior
performance in handwriting recognition tasks, likely due to their simplified architecture and lower
computational requirements [5].
5. Hybrid models
Recent studies have investigated hybrid models that combine CNN and RNN architectures in order to
capitalize on the strengths of both. These models are designed to enhance accuracy by capturing both
the spatial and sequential dependencies of handwritten text.
5.1. Typical architecture
The specific architecture is based on a sequence-to-sequence (Seq2Seq) model with an attention
mechanism. This hybrid model utilizes a CNN for feature extraction, which efficiently captures spatial
hierarchies in the handwriting. Convolutional layers process the input images to extract detailed features
such as edges, curves, and textures. These features are crucial for understanding the spatial aspects of
the handwriting.
Following feature extraction, the model employs a bidirectional gated recurrent unit (BGRU) for
encoding and decoding the extracted features. The BGRU processes the sequential data in both forward
and backward directions, which allows it to capture the context from both past and future sequences.
This bidirectional approach is particularly beneficial for recognizing and interpreting complex
handwriting patterns where context is essential.
The attention mechanism within the Seq2Seq model further enhances the hybrid architecture by
focusing on relevant parts of the input sequence during the decoding process. This mechanism
dynamically weights the importance of different features, allowing the model to concentrate on critical
aspects of the handwriting while generating the output sequence.
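A simplified PyTorch sketch of this kind of hybrid pipeline is given below, assuming 32-pixel-high input images and teacher forcing during decoding; the layer sizes and the additive attention form are illustrative and do not reproduce the exact H2TR configuration.

```python
# Hybrid sketch: CNN feature extractor -> bidirectional GRU encoder -> attention decoder.
import torch
import torch.nn as nn

class CNNBGRUAttention(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                       # spatial feature extractor
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.encoder = nn.GRU(64 * 8, hidden, bidirectional=True, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.attn = nn.Linear(2 * hidden + hidden, 1)   # additive attention score
        self.decoder = nn.GRUCell(2 * hidden + hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, targets):
        # images: (B, 1, 32, W); targets: (B, T) token ids used for teacher forcing
        f = self.cnn(images)                            # (B, 64, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)            # column-wise feature sequence
        enc, _ = self.encoder(f)                        # (B, W/4, 2*hidden)
        B, T = targets.shape
        h = enc.new_zeros(B, self.decoder.hidden_size)
        logits = []
        for t in range(T):
            # Attention: score each encoder column against the current decoder state.
            scores = self.attn(torch.cat(
                [enc, h.unsqueeze(1).expand(-1, enc.size(1), -1)], dim=-1))
            context = (torch.softmax(scores, dim=1) * enc).sum(1)
            h = self.decoder(torch.cat([self.embed(targets[:, t]), context], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)               # (B, T, vocab_size)

model = CNNBGRUAttention(vocab_size=80)
y = model(torch.randn(2, 1, 32, 128), torch.zeros(2, 10, dtype=torch.long))
```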
5.2. Performance evaluation and benchmarking
On the RIMES dataset, the H2TR (CNN-RNN) model achieved a character accuracy of 98.14%,
demonstrating superior performance compared to standalone CNN and RNN models [6]. More
specifically, the H2TR model demonstrated a 2% improvement over the CNN model and an 11%
improvement over the RNN model.
The performance of the H2TR model can be attributed to its ability to effectively capture both spatial
and sequential dependencies. By combining the strengths of CNNs and RNNs, the hybrid model can
better understand and interpret the nuances of handwritten text. This leads to more accurate recognition
and classification of characters, even in complex and varied handwriting styles.
In addition to character accuracy, the hybrid model also shows improvements in other metrics such
as word error rate (WER) and sequence error rate (SER). These improvements demonstrate the
robustness and versatility of the hybrid CNN-RNN architecture in effectively handling a wide range of
handwriting recognition tasks.
6. Discussion
Despite significant strides in handwritten character recognition, several challenges persist across
different neural network architectures. Convolutional Neural Networks (CNNs), while effective in
image recognition tasks, are limited by their fixed receptive fields and reliance on large datasets for
optimal performance. Recurrent Neural Networks (RNNs), by contrast, face issues such as vanishing
gradients and high computational complexity, particularly over long sequences.
Hybrid models that combine CNNs and RNNs have been proposed to leverage their respective
strengths, but these models often suffer from complexity and tuning difficulties. An emerging approach
that shows promise in overcoming these challenges involves the use of Generative Adversarial Networks
(GANs) for data augmentation.
GANs consist of two neural networks: a generator and a discriminator. The generator synthesizes
artificial data samples, while the discriminator learns to distinguish between real and fake data. This
adversarial process leads to the generation of highly realistic synthetic data that can effectively expand
the training dataset for recognition tasks.
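The adversarial interplay can be summarized in a bare-bones PyTorch training step such as the one below; the tiny fully connected networks and hyperparameters are placeholders, and the sketch shows a plain (unconditional) GAN rather than the conditional variant used in [7].

```python
# Minimal GAN training step for synthesizing flattened 28x28 character images.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 28 * 28), nn.Tanh())
D = nn.Sequential(nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                       # real: (B, 28*28) images scaled to [-1, 1]
    B = real.size(0)
    fake = G(torch.randn(B, 100))
    # Discriminator: push real samples toward 1, generated samples toward 0.
    d_loss = bce(D(real), torch.ones(B, 1)) + bce(D(fake.detach()), torch.zeros(B, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: fool the discriminator into labeling fakes as real.
    g_loss = bce(D(fake), torch.ones(B, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Example call with random stand-in data:
d_l, g_l = train_step(torch.rand(64, 28 * 28) * 2 - 1)
```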
Leveraging GANs for data augmentation thus offers a robust way to improve recognition
accuracy and efficiency. The synthetic data generated by GANs enriches the training dataset and
enhances the overall performance of recognition systems by reducing overfitting and improving
generalization [7]. This approach addresses the inherent limitations of traditional CNN and RNN models.
In parallel, federated learning emerges as a crucial technique for privacy-preserving handwriting
recognition. In scenarios where sensitive handwriting data, such as personal signatures or medical notes,
are involved, federated learning allows models to be trained across multiple decentralized devices
without the need to share raw data. This ensures the maintenance of privacy while still benefiting from
the diverse datasets across different sources, thereby improving the robustness and generalization of the
models. The use of federated learning could have a significant impact in industries like banking or
healthcare, where data privacy is paramount [8].
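A schematic federated-averaging round, in which clients train locally and only model weights reach the server, might look like the following; the optimizer, loss, and unweighted averaging are generic assumptions rather than the protocol used in [8].

```python
# One FedAvg-style round: local training on each client, then server-side averaging.
import copy
import torch

def federated_round(global_model, client_loaders, lr=0.01, local_epochs=1):
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)      # raw data never leaves the client
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(local(x), y).backward()
                opt.step()
        client_states.append(local.state_dict())
    # Server aggregation: average each parameter tensor across clients (unweighted).
    avg = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```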
Meanwhile, meta-learning, with its ability to improve learning algorithms across multiple tasks,
offers several advantages that can address some of the challenges in HCR. One of the significant benefits of
meta-learning is its ability to perform well with limited data, which is particularly useful in HCR where
annotated data can be scarce [9]. Few-shot learning techniques within meta-learning allow models to
learn new characters or styles with very few examples. This capability can be crucial for expanding the
recognition system to new alphabets or styles without the need for extensive retraining. Meta-learning
can also optimize the learning algorithms themselves, enhancing their efficiency and efficacy in
handling HCR tasks. This includes learning the best initialization parameters, optimization strategies,
and even the most suitable hyperparameters for the HCR models.
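As one hedged illustration of few-shot recognition, the sketch below classifies query characters by their distance to class prototypes computed from a handful of labelled support examples, in the spirit of prototypical networks; the embedding network and episode sizes are placeholders.

```python
# Few-shot (N-way, K-shot) classification via class prototypes in embedding space.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))   # stand-in embedding network

def few_shot_predict(support_x, support_y, query_x, n_way):
    # support_x: (N*K, 1, 28, 28) labelled examples of the new characters
    # query_x:   (Q, 1, 28, 28) unlabelled examples to classify
    z_support = embed(support_x)
    z_query = embed(query_x)
    # One prototype per class: the mean embedding of its K support examples.
    prototypes = torch.stack([z_support[support_y == c].mean(0) for c in range(n_way)])
    # Classify each query by proximity to the prototypes (negative distance as logit).
    dists = torch.cdist(z_query, prototypes)
    return (-dists).softmax(dim=1)      # class probabilities, shape (Q, n_way)

probs = few_shot_predict(torch.randn(10, 1, 28, 28),           # 5-way, 2-shot support set
                         torch.arange(5).repeat_interleave(2),
                         torch.randn(3, 1, 28, 28), n_way=5)
```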
Future research should focus on data augmentation and standardization, model optimization and
architecture improvement, and the integration of advanced techniques like meta-learning and
reinforcement learning. It is crucial to enhance model robustness through multimodal fusion and testing
on noisy data. Additionally, fostering cross-disciplinary collaboration and sharing models, data,
and research outcomes will drive progress. Key future directions include utilizing GANs for data
augmentation, exploring Transformers and hybrid models, advancing few-shot and zero-shot learning,
applying self-supervised and unsupervised learning, enhancing model interpretability and real-time
processing capabilities, integrating multimodal information, and adopting federated learning for privacy
protection. These efforts will significantly improve the accuracy, efficiency, and adaptability of HCR
systems.
7. Conclusion
This comprehensive review explored various neural network architectures and their applications in
offline HCR. CNNs have emerged as the mainstream approach due to their robust feature extraction
capabilities. However, hybrid models and advanced techniques such as Meta-learning and GANs are
gaining attention for their potential to address existing challenges in HCR. The review concentrated
heavily on well-established models such as CNNs, RNNs, LSTMs, and GRUs, potentially neglecting
emerging techniques and less explored hybrid models that could offer alternative perspectives or
innovative solutions. Looking forward, future research could address these limitations by expanding the
scope of the literature review to include a broader range of studies, especially those that explore novel
or underrepresented techniques. Additionally, incorporating a wider variety of datasets and evaluation
metrics would provide a more holistic understanding of model performance. Furthermore, exploring the
integration of advanced techniques such as meta-learning, transfer learning, and generative models could
offer new avenues for enhancing HCR systems. Finally, fostering cross-disciplinary collaboration and
focusing on the practical deployment of HCR technologies in real-world settings will be essential for
translating research advancements into tangible societal benefits.
Acknowledgement
I would like to express my sincere gratitude to everyone who contributed to the completion of this paper.
My deepest thanks go to my academic supervisor for their expert guidance and invaluable feedback. I
am also grateful to my colleagues and peers for their insightful discussions and support, and to my family
and friends for their unwavering encouragement. Lastly, I extend my appreciation to the university staff
for their assistance and resources, which greatly facilitated my research process. Thank you all for your
contributions.
References
[1] Jin, L., Zhong, Z., & Yang, Z. (2016). Applications of Deep Learning for Handwritten Chinese
Character Recognition: A Review. ACTA AUTOMATICA SINICA, 42(8), 1125-1141.
[2] Ahlawat, S., Choudhary, A., Nayyar, A., Singh, S., & Yoon, B. (2020). Improved handwritten
digit recognition using convolutional neural networks (CNN). Sensors, 20(12), 3344.
[3] Nugraha, G. S., Darmawan, M. I., & Dwiyansaputra, R. (2023). Comparison of CNN's
architecture GoogleNet, AlexNet, VGG-16, LeNet-5, ResNet-50 in Arabic handwriting pattern
recognition. Kinetik: Game Technology, Information System, Computer Network, Computing,
Electronics, and Control.
[4] Graves, A., Liwicki, M., Fernandez, S., Bertolami, R., Bunke, H., & Schmidhuber, J. (2009). A
novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 31(5), 855-868.
[5] Rabhi, B., Elbaati, A., Boubaker, H., Hamdi, Y., Hussain, A., & Alimi, A. M. (2021). Multi-
lingual character handwriting framework based on an integrated deep learning based
sequence-to-sequence attention model. Memetic Computing, 13(4), 459-475.
[6] Geetha, R., Thilagam, T., & Padmavathy, T. (2021). Effective offline handwritten text recognition
model based on a sequence-to-sequence approach with CNN-RNN networks. Neural
Computing and Applications, 33(17), 10923-10934.
[7] Elaraby, N., Barakat, S., & Rezk, A. (2022). A conditional GAN-based approach for enhancing
transfer learning performance in few-shot HCR tasks. Scientific Reports, 12(1).
[8] Mei, Z. (2021). The Recognition of Tibetan Handwritten Numbers Based on Federated
Learning. Journal of Artificial Intelligence Practice, 4, 1-12.
[9] Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2021). Meta-learning in Neural
Networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-1.