International Journal of Speech Technology (2025) 28:53–65
https://doi.org/10.1007/s10772-024-10164-y
Exploring data augmentation for Amazigh speech recognition
with convolutional neural networks
Hossam Boulal1 · Farida Bouroumane1 · Mohamed Hamidi2 · Jamal Barkani1 · Mustapha Abarkan1
Received: 10 September 2024 / Accepted: 26 October 2024 / Published online: 14 November 2024
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024
Abstract
In the field of speech recognition, enhancing accuracy is paramount for diverse linguistic communities. Our study addresses
this necessity, focusing on improving Amazigh speech recognition through the implementation of three distinct data aug-
mentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. Leveraging Convolutional Neural
Networks (CNNs) for speech recognition, we utilize Mel Spectrograms extracted from audio files. The study specifically
targets the recognition of the first ten Amazigh digits. We conducted experiments with a speaker-independent approach
involving 42 participants. A total of 27 experiments were conducted, utilizing both original and augmented data. Among the
different CNN models employed, the VGG19 model showcased significant promise. Our results demonstrate a maximum
accuracy of 95.66%. Furthermore, the most notable improvement achieved through data augmentation was 4.67%. These
findings signify a substantial enhancement in speech recognition accuracy, indicating the efficacy of the proposed methods.
Keywords Speech recognition · Data augmentation · Deep learning · Feature extraction · Amazigh digits
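As an illustration of the kind of masking that SpecAugment applies to a Mel spectrogram, the following minimal sketch zeroes one band of mel bins and one band of time frames. It is not the authors' implementation: the function name `spec_augment`, the mask widths, the zero fill value, and the toy spectrogram are all assumptions made for this example, and the time-warping step of SpecAugment is omitted.

```python
import random


def spec_augment(spec, max_freq_width=2, max_time_width=3, seed=None):
    """Return a masked copy of `spec`, a list of rows (mel bins x time frames).

    One band of consecutive mel bins and one band of consecutive time frames
    are set to 0.0. Widths and fill value are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_mels = len(spec)
    n_frames = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: zero f consecutive mel bins starting at f0.
    f = rng.randint(0, min(max_freq_width, n_mels))
    f0 = rng.randint(0, n_mels - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_frames

    # Time mask: zero t consecutive frames starting at t0 in every mel bin.
    t = rng.randint(0, min(max_time_width, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0

    return out


# Toy "spectrogram": 8 mel bins x 10 frames, all ones (hypothetical data).
spec = [[1.0] * 10 for _ in range(8)]
augmented = spec_augment(spec, seed=0)
```

Each call with a different seed yields a differently masked copy, so a single recording can contribute several training examples while the label stays the same.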
1 Introduction
The domain of speech recognition (Jean Louis et al., 2022;
Huang & Deng, 2010) wields significant influence in reshap-
ing our interactions with computers (Yadava & Jayanna,
2017), not only catering to the needs of individuals with
specific requirements (Mayer, 2018) but also extending its
reach to a broader audience (Besacier et al., 2014). Advancing
the computer’s comprehension of speech not only serves
users but also facilitates technology in adapting to the inher-
ently speech-centric nature of human communication. While
numerous researchers primarily concentrate on enhancing
the computer’s ability to understand widely spoken lan-
guages such as English, our aspiration is to address the gap
in the identification of lesser-known languages, particu-
larly those with limited resources. An exemplary instance
of such a language is Amazigh. Despite recent attention
drawn towards these languages, there remains a scarcity
of resources dedicated to their recognition. Our aim is to
contribute towards filling this void, thereby enriching the
broader landscape of speech recognition research and fos-
tering inclusivity and diversity in technology. In our current
framework, our focus lies on the Amazigh language, spoken
by millions across North Africa, predominantly in Morocco,
Algeria, Tunisia, and Libya. This language, characterized by
its diverse dialects and rich oral heritage, remains largely
underserved by speech recognition technology (Table 1).
The Berber language, also referred to as Amazigh
(Chaker, 1984; Ouakrim, 1995; Ridouane, 2003; Boukous,
2014), holds the distinction of being considered the indige-
nous language of the Maghreb region by many. According to
numerous linguistic studies, it is believed to have been in use
for approximately 5000 years (Boukous, 1995). Spanning
across a vast expanse of North Africa, including countries
such as Algeria, Morocco, and Tunisia, as well as certain
areas in neighboring Sub-Saharan nations, Amazigh serves
as a means of communication (Idhssaine & El Kirat, 2021).
Notably, Morocco has granted Amazigh official status, as a
significant proportion of its population speaks the
language. The standardization process for the
language commenced in 2001 with the establishment of the
Royal Institute of Amazigh Culture (IRCAM). This insti-
tute has played a pivotal role in various endeavors aimed at
* Mohamed Hamidi
m.hamidi@ump.ac.ma
1 LSI Laboratory, FP Taza, USMBA University, Taza,
Morocco
2 Team of Modeling and Scientific Computing, FPN, UMP,
Nador, Morocco