Figure 1 - uploaded by Lyes Demri
Recording conditions for the first 4 speakers.

Source publication
Article
Full-text available
In this article we present the methodology employed for the design and evaluation of a Basic Arabic Expressive Speech corpus (BAES-DB). The corpus, which has a total length of approximately 150 minutes, consists of 13 speakers uttering a set of 10 sentences while simulating 3 emotions (joy, anger and sadness) in addition to a neutral utterance...

Context in source publication

Context 1
... ceiling is painted concrete, and the floor is made of ceramic tiles. Figure 1 shows the equipment used and the recording configuration in Lab 66. The speakers were recorded late in the afternoon, when the laboratory was empty and fewer people were crossing the faculty hall. ...

Similar publications

Article
Full-text available
With the increase of Arabic textual information available via internet websites and services, tools for processing Arabic text are needed to extract knowledge from it. Named entity recognition aims to extract named entities, such as person names, locations and organizations, from a given text. Named entity recognition approaches were classified i...

Citations

... The corpus consists of 100 phonetically balanced sentences with emotionally neutral content. In 2015, L. Demri [40] selected a subset of 10 sentences from this database to create an expressive speech database. These sentences were recorded by 13 naive speakers. ...
Article
Full-text available
The present study focuses on the evaluation of the degradation of emotional expression in speech transmitted over a wireless telephone network. Two assessment approaches were used: an objective one, deploying convolutional neural networks (CNNs) fed with spectrograms at three scales (linear, logarithmic, Mel), and a subjective method grounded in human perception. The study gathered expressive phrases in two different languages, from novice Arabic and proficient German speakers. These utterances were transmitted over a real 4G network, a rarity, as the usual focus lies on bandwidth (BW) reduction or compression; our innovation lies in using the complete 4G infrastructure, accounting for all possible impairments. The results indeed reveal a significant impact of transmission via the real 4G network on emotion recognition. Prior to transmission, the highest recognition rates, measured by the objective method using the Mel frequency scale, were 76% for Arabic and 91% for German. After transmission, these rates decreased significantly, reaching 70% for Arabic and 82% for German (degradations of 6% and 9%, respectively). As for the subjective method, the recognition rates were 75% for Arabic and 70% for German before transmission and dropped to 67% for Arabic and 68% for German after transmission (degradations of 8% and 2%). Our results were also compared to those found in the literature using the same database.
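As a rough illustration of the objective pipeline described above, the sketch below computes the three spectrogram scales (linear, logarithmic, Mel) that could feed a CNN. It is a minimal sketch using librosa; the file name, STFT parameters, and Mel-band count are assumptions for illustration, not values taken from the paper.

```python
# Minimal sketch: three spectrogram scales for CNN input.
# "utterance.wav" and all frame parameters are assumptions.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical file

stft = np.abs(librosa.stft(y, n_fft=512, hop_length=160))

# Linear scale: magnitude spectrogram as-is.
spec_linear = stft

# Logarithmic scale: dB-compressed magnitudes.
spec_log = librosa.amplitude_to_db(stft, ref=np.max)

# Mel scale: mel filter bank, then dB compression.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                     hop_length=160, n_mels=64)
spec_mel = librosa.power_to_db(mel, ref=np.max)

# Each array is (frequency_bins, frames); a CNN would consume it
# as a single-channel image, e.g. after resizing and normalising.
print(spec_linear.shape, spec_log.shape, spec_mel.shape)
```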
... [Citation context: a flattened table pairing databases DB 19 through DB 80 with their references [183]-[253].] ...

... Figure 10 shows the frequency distribution of autonomous databases based on their applications; the details are in Table 6. The application categories are speech recognition, corpus description/evaluation, speech synthesis/TTS, speech classification, speech segmentation, language/dialect identification, speaker recognition, data augmentation, emotional speech recognition, general purpose, speaker sex/age/native identification, and other applications, each listed with its references. ...

... Note that the largest self-built database did not exceed 400 speakers: most databases have at most 100 speakers, six (DB 55-DB 59, DB 68) fall in the 101-200 range, three (DB 60-DB 62) in the 201-300 range, one (DB 63) in the 301-400 range, and several report no speaker-count information. ...
Article
Full-text available
Speech processing applications have become integral components across various domains of modern life. The design and preparation of a reliable recognition system rely heavily on the availability of suitable speech databases. While numerous speech databases exist for English and other languages, the availability of comprehensive resources for the Arabic language remains limited. In light of this, we conducted a systematic review aiming to identify, analyse, and classify existing Modern Standard Arabic speech databases. Through our review, we identified 27 publicly available databases and analysed an additional 80 subjective databases. These databases were thoroughly studied, classified based on their characteristics, and subjected to a detailed analysis of research trends in the field. This paper provides a comprehensive discussion of the diverse speech databases developed for various speech processing applications. It sheds light on the purposes and unique characteristics of Arabic speech databases, enabling researchers to easily access suitable resources for their specific applications. The findings of this review contribute to bridging the gap in available Arabic speech databases and serve as a valuable resource for researchers in the field.
... It is a phonetically balanced corpus of Arabic sentences developed by Boudraa [13]. A set of 10 sentences was recorded by 13 students of our institution in 4 different emotions (neutral, joy, anger, sadness) to build a speech emotion database, by Demri [14]. The sentences were recorded in an anechoic chamber at a 16 kHz sampling frequency. ...
Conference Paper
Abstract—This work is a contribution to improving speaker-independent emotion recognition rates for the Arabic language, which is under-resourced in this field. The problem remains relevant, especially for databases created by non-professional speakers (emotion produced by professionals is standardised, whereas emotion produced by naive people is closer to spontaneous emotion, because emotion depends very much on the speaker). This is why a comparative study was carried out between two databases: one in Modern Standard Arabic (MSA) recorded by non-professionals (students), and the other in German simulated by actors. The study used statistical methods (k-nearest neighbors, support vector machine, ExtraTrees, random forest, gradient boosting) as well as neural-network approaches, namely the convolutional neural network and the multi-layer perceptron, to identify four emotions (anger, happiness, neutral and sadness). The latter showed a clear improvement in accuracy, from 47.4% to 64%, for MSA. Index Terms—Emotion recognition, Spectral features, Acoustic features, CNN, KNN, SVM, ExtraTrees, Random Forest, Gradient Boosting.
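A minimal sketch of such a classifier comparison with scikit-learn is shown below. The feature matrix X (one acoustic feature vector per utterance) and labels y are placeholders, and all hyperparameters are assumptions for illustration, not the paper's settings.

```python
# Minimal sketch: comparing the classifier families named above on
# pre-extracted acoustic feature vectors. X, y and all hyperparameters
# are assumptions, not values from the paper.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(520, 40))    # placeholder utterance features
y = rng.integers(0, 4, size=520)  # 4 emotions: anger/happiness/neutral/sadness

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=1.0),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=200, random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                         random_state=0),
}

for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)  # scale, then classify
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```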
... The expressive speech corpus consists of a phonetically balanced set of 10 sentences in the Arabic language [9]. These sentences were then recorded in 4 different emotions (neutral, joy, anger, sadness) by 13 speakers to constitute an expressive speech database [10]. The sentences were recorded at a sampling frequency of 44 kHz in an anechoic chamber. ...
Conference Paper
The general objective of this paper is to build a system that automatically recognizes emotion in speech. The linguistic material used is a phonetically balanced corpus of expressive Arabic sentences. Speaker dependence is a well-known problem in this field; in this work we study the influence of this phenomenon on our results. The targeted emotions are joy, sadness, anger and neutral. After an analytical study of a large number of acoustic speech parameters, we chose the cepstral parameters, their first and second derivatives, the shimmer, the jitter and the duration of the sentence. A classifier based on a multilayer perceptron neural network was developed to recognize emotion from the chosen feature vector. The recognition rate reaches more than 98% for intra-speaker classification but only 54.75% for inter-speaker classification, which clearly shows the system's dependence on the speaker.
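A minimal sketch of such a feature vector is given below. MFCCs and their derivatives come from librosa; the jitter and shimmer values are crude approximations computed from the pyin f0 track and frame RMS amplitudes rather than Praat-grade measurements, and the file name and all parameters are assumptions.

```python
# Minimal sketch: one utterance-level feature vector of the kind
# described above (MFCCs + deltas + jitter + shimmer + duration).
# Jitter/shimmer are rough approximations; parameters are assumptions.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical file

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
d1 = librosa.feature.delta(mfcc)             # first derivatives
d2 = librosa.feature.delta(mfcc, order=2)    # second derivatives

# Approximate jitter: relative variation of consecutive pitch periods.
f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
periods = 1.0 / f0[voiced]
jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

# Approximate shimmer: relative variation of frame amplitudes.
rms = librosa.feature.rms(y=y)[0]
amp = rms[rms > 1e-4]
shimmer = np.mean(np.abs(np.diff(amp))) / np.mean(amp)

duration = len(y) / sr  # sentence duration in seconds

# Utterance-level vector: mean of each cepstral trajectory plus
# the three prosodic scalars (13*3 + 3 = 42 values).
features = np.concatenate([mfcc.mean(axis=1), d1.mean(axis=1),
                           d2.mean(axis=1), [jitter, shimmer, duration]])
print(features.shape)
```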
... Despite the geographical differences among the speakers, the pronunciations do not vary much, because Standard Arabic tends to standardize the pronunciation of speakers from different regions [9]. For Kabyle, all sentences were integrated into meaningful dialogues so that their pronunciation would be as natural as possible. ...
Conference Paper
This paper describes the design of an automatic identification system for distinguishing between two languages common in Algeria: MSA (Modern Standard Arabic) and Kabyle, an Algerian Berber dialect. The characteristics used are prosodic features (melody and stress) and cepstral features (Mel-frequency cepstral coefficients) extracted from a bilingual database formed by combining two databases. For the classification step, support vector machines are trained as language models; since this is a two-language recognition task, SVMs are well suited to it. Experiments have shown the superior reliability of our system when using both types of characteristics compared with using each type separately. It was found that prosodic characteristics give an encouraging rate of 95.42%, improved to 95.83% with acoustic features, and to 97.5% (an improvement of 1.67%) when prosodic and acoustic features are combined, which shows the superiority of the combined vector over those previously cited.
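The sketch below illustrates one way to build such a combined prosodic + cepstral feature vector and train an SVM. The feature choices, file names, and parameters are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch: combined prosodic + cepstral features for a
# two-language (MSA vs. Kabyle) SVM, in the spirit of the paper.
# Feature choices, file names and parameters are assumptions.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def prosodic_cepstral_features(path):
    y, sr = librosa.load(path, sr=16000)
    # Prosodic part: f0 (melody) and energy (stress correlate) statistics.
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0v = f0[voiced]
    rms = librosa.feature.rms(y=y)[0]
    prosodic = np.array([f0v.mean(), f0v.std(), rms.mean(), rms.std()])
    # Cepstral part: utterance-level MFCC statistics.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    cepstral = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    return np.concatenate([prosodic, cepstral])

# Hypothetical file lists; labels 0 = MSA, 1 = Kabyle.
msa_files, kabyle_files = ["msa_01.wav"], ["kab_01.wav"]
X = np.array([prosodic_cepstral_features(p) for p in msa_files + kabyle_files])
y = np.array([0] * len(msa_files) + [1] * len(kabyle_files))

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
```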
... The signals used for these experiments are extracted from an Arabic database [41]. They are sampled at 44.1 kHz; we down-sampled them to 8 kHz (to reduce the data size for processing and for memory constraints) and added white Gaussian noise (AWGN) at different SNRs. We thus obtained noisy signals, which are denoised using SS, the Wiener filter, wavelet denoising, and CS-based denoising methods. Signals are processed segment by segment, using a Hamming window of length 25 ms. ...
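The sketch below shows the preparation step described in this excerpt: scaling white Gaussian noise to a target SNR and cutting the noisy signal into 25 ms Hamming-windowed segments. The file name, target SNR, and hop size are assumptions.

```python
# Minimal sketch: add AWGN at a target SNR and split the noisy
# signal into 25 ms Hamming-windowed segments, as in the excerpt.
# File name, SNR value and hop size are assumptions.
import numpy as np
import librosa

rng = np.random.default_rng(0)
y, sr = librosa.load("utterance.wav", sr=8000)  # down-sampled to 8 kHz

def add_awgn(x, snr_db):
    """Add white Gaussian noise scaled to the target SNR (dB)."""
    noise = rng.normal(size=x.shape)
    # Scale so that 10*log10(P_signal / P_noise) == snr_db.
    p_signal = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2)
    noise *= np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return x + noise

noisy = add_awgn(y, snr_db=5)

frame_len = int(0.025 * sr)   # 25 ms -> 200 samples at 8 kHz
hop = frame_len // 2          # assumed 50% overlap
win = np.hamming(frame_len)
frames = [noisy[i:i + frame_len] * win
          for i in range(0, len(noisy) - frame_len + 1, hop)]
# Each frame would then be denoised (SS, Wiener, wavelet, CS) and the
# processed frames overlap-added back into a full signal.
```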
Thesis
Full-text available
This thesis describes our contributions to the realization of an expressive text-to-speech synthesis system for the Arabic language. The synthesis approach adopted is the concatenation of diphones of different expressive styles to generate the various expressivities. The TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add) algorithm applies the prosody through an analysis-synthesis of the speech signal, which is first synthesized without prosody. Our main contribution is the construction of an expressive speech corpus for the Arabic language, which until 2014 was nearly nonexistent in the literature. This corpus was built with a view to modeling the various recorded expressive styles. We describe the methodology adopted for recording the corpus of selected sentences. A website set up specifically for this purpose allowed us to evaluate the corpus through 3 tests involving non-expert evaluators. The website also allowed the evaluation of the speech signals produced by the synthesis system, and offers a free download of the corpus. The corpus evaluation results show that the recorded expressive styles are well recognized by the evaluators and that the acoustic correlates are indeed present in the recorded signals. The evaluation of the synthesis system shows that listeners satisfactorily recognize the expressive styles it synthesizes. In addition, a graphical interface in the MATLAB environment was created to make the synthesis system accessible to non-experts.
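To make the TD-PSOLA step concrete, here is a toy sketch of pitch-scale modification by pitch-synchronous overlap-add. It assumes the pitch marks are already known (here taken from a synthetic constant-pitch tone); a real system would obtain them from pitch-mark analysis of the recorded diphones, and no amplitude normalisation is applied for simplicity.

```python
# Toy sketch of TD-PSOLA pitch modification: two-period Hann grains are
# extracted at analysis pitch marks and overlap-added at re-spaced
# synthesis marks. Pitch marks are assumed to be known in advance.
import numpy as np

def td_psola(x, marks, beta):
    """Scale the pitch of x by beta, given analysis pitch marks (samples)."""
    out = np.zeros(len(x))
    t_syn = float(marks[0])
    while t_syn < marks[-1]:
        i = int(np.argmin(np.abs(marks - t_syn)))  # nearest analysis mark
        # Local pitch period from the neighbouring mark.
        if i + 1 < len(marks):
            period = int(marks[i + 1] - marks[i])
        else:
            period = int(marks[i] - marks[i - 1])
        lo, hi = marks[i] - period, marks[i] + period
        if lo >= 0 and hi <= len(x):
            grain = x[lo:hi] * np.hanning(hi - lo)  # two-period grain
            o = int(t_syn) - period                 # grain start in output
            s, e = max(o, 0), min(o + len(grain), len(out))
            out[s:e] += grain[s - o:e - o]          # overlap-add
        t_syn += period / beta                      # re-spaced synthesis mark
    return out

# Demo: a 200 Hz tone at 16 kHz raised by a factor of 1.2 (~240 Hz).
sr, f0 = 16000, 200.0
x = np.sin(2 * np.pi * f0 * np.arange(sr) / sr)
marks = np.arange(0, len(x), int(sr / f0))  # one mark per pitch period
y = td_psola(x, marks, beta=1.2)
```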