International Journal on Document Analysis and Recognition (IJDAR) (2023) 26:131–147
https://doi.org/10.1007/s10032-022-00422-7
ORIGINAL PAPER
Refocus attention span networks for handwriting line recognition
Mohammed Hamdan1 · Himanshu Chaudhary2 · Ahmed Bali3 · Mohamed Cheriet1
Received: 1 June 2022 / Revised: 24 July 2022 / Accepted: 9 December 2022 / Published online: 25 December 2022
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022
Abstract
Recurrent neural networks have achieved outstanding recognition performance for handwriting identification despite the enormous variety observed across diverse handwriting structures and poor-quality scanned documents. We initially propose a BiLSTM baseline model, a sequential architecture well suited for modeling text lines because it learns probability distributions over character or word sequences. However, such recurrent paradigms prevent parallelization and suffer from vanishing gradients on long sequences during training. To alleviate these limitations, we make four significant contributions in this work. First, we devise an end-to-end model composed of a split-attention CNN backbone that serves as the feature extractor and a self-attention Transformer encoder–decoder that serves as the transcriber for recognizing handwritten manuscripts. The multi-head self-attention layers in the Transformer-based encoder–decoder strengthen the model's ability to tackle handwriting recognition and to learn the linguistic dependencies of character sequences. Second, we conduct various studies on transfer learning (TL) from large datasets to a small database, determining which model layers require fine-tuning. Third, we attain an efficient paradigm by combining different TL strategies with data augmentation (DA). Finally, because the proposed model is lexicon-free and robust enough to recognize sentences not seen during training, it is trained on only a few labeled examples, with no extra cost of generating and training on synthetic datasets. We report Character and Word Error Rates (CER/WER) on four benchmark datasets that are comparable to, and in some cases better than, the most recent state-of-the-art (SOTA) models.
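A minimal sketch of an architecture of this kind is given below, assuming a ResNeSt-style split-attention backbone from the timm library and PyTorch's built-in Transformer; the module names, hyperparameters, and vocabulary size are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only: a split-attention CNN backbone feeding a
# Transformer encoder-decoder transcriber, in the spirit of the model
# described above. Hyperparameters and module choices are assumptions;
# positional encodings are omitted for brevity.
import torch
import torch.nn as nn
import timm  # provides ResNeSt (split-attention) backbones


class HTRLineRecognizer(nn.Module):
    def __init__(self, vocab_size=100, d_model=256, nhead=8,
                 num_encoder_layers=4, num_decoder_layers=4):
        super().__init__()
        # Split-attention CNN backbone (ResNeSt) used as the feature extractor.
        self.backbone = timm.create_model(
            "resnest50d", pretrained=True, features_only=True, out_indices=(3,))
        feat_dim = self.backbone.feature_info.channels()[-1]
        self.proj = nn.Linear(feat_dim, d_model)
        # Transformer encoder-decoder acting as the transcriber.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
            batch_first=True)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):
        # images: (B, 3, H, W) line images; tgt_tokens: (B, T) character ids.
        feats = self.backbone(images)[-1]                      # (B, C, h, w)
        memory = self.proj(feats.flatten(2).transpose(1, 2))   # (B, h*w, d_model)
        tgt = self.tgt_embed(tgt_tokens)                       # (B, T, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(images.device)
        dec = self.transformer(memory, tgt, tgt_mask=mask)
        return self.out(dec)                                   # (B, T, vocab_size)
```

Under this sketch, the transfer-learning experiments would correspond roughly to loading pretrained backbone weights and choosing which parameter groups (e.g., the backbone stages versus the transformer layers) to freeze or fine-tune on the small target dataset.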
Keywords Split attention convolutional network · Multi-head attention transformer · Seq2Seq model · BiLSTM · Line handwriting recognition
Mohammed Hamdan
mohammed.hamdan.1@ens.etsmtl.ca
Himanshu Chaudhary
him4318@gmail.com
Ahmed Bali
ahmed.bali.1@ens.etsmtl.ca
Mohamed Cheriet
mohamed.cheriet@etsmtl.ca
1 Synchromedia Lab, System Engineering, University of Quebec (ETS), 1100 Notre-Dame St W, Montreal, Quebec H3C 1K3, Canada
2 Data Science, Dr. A.P.J. Abdul Kalam Technical University, CDRI Rd, Naya Khera, Jankipuram, Lucknow, Uttar Pradesh 226031, India
3 Department of Software and IT Engineering, University of Quebec (ETS), 1100 Notre-Dame St W, Montreal, Quebec H3C 1K3, Canada
1 Introduction
Handwriting Text Recognition (HTR) systems allow computers to read and understand human handwriting. HTR is useful for digitizing the textual contents of old document images in historical records and of contemporary administrative material such as cheques, legal letters, forms, and other documents. While HTR research has been ongoing since the early 1960s [34], it remains a challenging and unsolved research problem. The fundamental difficulty is the wide range of variation and ambiguity introduced by different writers when forming words. Because the words to be deciphered usually adhere to well-defined grammar rules, it is possible to eliminate gibberish hypotheses and enhance recognition accuracy by modeling these linguistic regularities. HTR is therefore usually approached with a blend of computer vision and natural language processing (NLP).
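As a generic illustration of this idea (not the approach taken in this paper, which is lexicon-free), candidate transcriptions from a recognizer can be re-ranked by combining each hypothesis's visual score with a language-model score, so that gibberish readings are discarded; the bigram table and scores below are placeholder assumptions.

```python
# Generic illustration: re-ranking recognition hypotheses with a simple
# character bigram language model to discard gibberish candidates.
# The probability table and visual scores are placeholder assumptions.
import math


def lm_log_prob(text, bigram_probs, floor=1e-6):
    """Sum of log bigram probabilities; unseen pairs get a small floor."""
    return sum(math.log(bigram_probs.get(pair, floor))
               for pair in zip(text, text[1:]))


def rerank(hypotheses, bigram_probs, lm_weight=0.5):
    """hypotheses: list of (text, visual_log_score) from the recognizer."""
    return sorted(
        hypotheses,
        key=lambda h: h[1] + lm_weight * lm_log_prob(h[0], bigram_probs),
        reverse=True)


# Toy example: "the" is favoured over the visually similar gibberish "tbe".
bigrams = {("t", "h"): 0.3, ("h", "e"): 0.4, ("t", "b"): 1e-4, ("b", "e"): 0.05}
print(rerank([("tbe", -1.0), ("the", -1.2)], bigrams))
```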
By nature, handwritten text is a signal that follows a particular sequence. Texts in Latin languages are written in