Conference Paper

Detection of questions in Berber language using prosodic features


Abstract

This paper focuses on the acquisition of the tonal and prosodic structure of affirmative and interrogative sentences in Berber. Its main subject is the study of the prosodic differences between these two sentence types and the detection and classification of sentence type. We built a system for segmentation and automatic detection of sentence type based on prosodic features for Berber, a language for which all studies to date remain preliminary. To this end, we developed a corpus of 720 utterances extracted from 6 Berber spoken lectures. Prosodic features are then extracted from each sentence and used as input to two different classifiers, which label each sentence as either a question or an affirmative sentence. Questions were classified with an accuracy of 93%. A feature-specific analysis further reveals that energy and fundamental frequency (F0) features are mainly responsible for discriminating between question and affirmative sentences.
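
As a rough illustration of the kind of pipeline the abstract describes, the sketch below extracts utterance-level F0 and energy statistics and feeds them to two off-the-shelf classifiers. It is a minimal sketch, not the authors' implementation: the librosa/scikit-learn calls, the exact feature set, and the file paths and labels are assumptions made for the example.

```python
# Hypothetical sketch of the described pipeline: per-utterance prosodic features
# (F0 and energy statistics), then two classifiers. Paths/labels are placeholders.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def prosodic_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    # Fundamental frequency (F0) track via pYIN; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]
    # Short-time energy (RMS) per frame.
    rms = librosa.feature.rms(y=y)[0]
    return np.array([
        f0.mean() if f0.size else 0.0,
        f0.std() if f0.size else 0.0,
        # Crude final F0 rise: difference between last and first voiced frames.
        (f0[-5:].mean() - f0[:5].mean()) if f0.size >= 10 else 0.0,
        rms.mean(),
        rms.std(),
    ])

# X: one feature vector per utterance; y: 1 = question, 0 = affirmative (placeholders).
# X = np.vstack([prosodic_features(p) for p in wav_paths])
# for clf in (SVC(kernel="rbf"), DecisionTreeClassifier(max_depth=5)):
#     print(clf, cross_val_score(clf, X, y, cv=5).mean())
```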


... For instance [8] created a Berber speaker identification system using some speech signal information as features. Also [9] have used prosodic information to discriminate between affirmative and interrogative sentences in Berber. Both works were done at the speaker level. ...
... In terms of data source distribution, for RA, the majority of the content consists of comments collected from popular TV-show YouTube channels (9800 documents, 49% of the data), followed by content of blogs and forums (3600 documents, 18% of the data) and news websites (2800 documents, 14% of the data); the rest comes from Twitter (2400 documents, 12% of the data) and Facebook (1000 documents, 5% of the data). For RB, most content comes from Berber websites promoting Berber culture and language (4900 documents, 70%), YouTube (910 documents, 13%), and news websites (700 documents, ...). [Footnote 9: Habash [11] suggested breaking down Arabic dialects into five groups: Egyptian, Levantine, Gulf, Iraqi and Maghrebi. Footnote 10: Berber has 13 distinguished varieties.] ...
... There is some work done to identify spoken Berber. For instance Halimouche et al. (2014) discriminated between affirmative and interrogative Berber sentences using prosodic information, and Chelali et al. (2015) used speech signal information to automatically identify Berber speaker. We are not aware of any work which deals with automatic identification of written Arabicized Berber. ...
Conference Paper
Automatic Language Identification (ALI) is the detection of the natural language of an input text by a machine. It is the first necessary step to do any language-dependent natural language processing task. Various methods have been successfully applied to a wide range of languages, and the state-of-the-art automatic language identifiers are mainly based on character n-gram models trained on huge corpora. However, there are many languages which are not yet automatically processed, for instance minority and informal languages. Many of these languages are only spoken and do not exist in a written format. Social media platforms and new technologies have facilitated the emergence of written format for these spoken languages based on pronunciation. The latter are not well represented on the Web, commonly referred to as under-resourced languages, and the current available ALI tools fail to properly recognize them. In this paper, we revisit the problem of ALI with the focus on Arabicized Berber and dialectal Arabic short texts. We introduce new resources and evaluate the existing methods. The results show that machine learning models combined with lexicons are well suited for detecting Arabicized Berber and different Arabic varieties and distinguishing between them, giving a macro-average F-score of 92.94%.
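
The character n-gram approach mentioned above can be illustrated with a few lines of scikit-learn. This is a generic sketch, not the paper's system: the training texts, labels, and classifier choice are placeholders, and the lexicon component the authors combine with their models is omitted here.

```python
# Illustrative character n-gram language identifier (not the paper's exact system).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["...Arabicized Berber sample...", "...dialectal Arabic sample..."]  # placeholders
train_labels = ["BER", "ARA"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 5)),  # character n-grams up to length 5
    LogisticRegression(max_iter=1000),
)
clf.fit(train_texts, train_labels)
print(clf.predict(["...new short text..."]))
```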
... There has been some work done for Berber automatic language identification, for instance Chelali et al. (2015) created a Berber speaker identification system using some speech signal information as features. Also Halimouche et al. (2014) have used prosodic information to discriminate between affirmative and interrogative sentences in Berber. Both sets of work were done at the speaker level. ...
Conference Paper
The identification of the language of text/speech input is the first step to be able to properly do any language-dependent natural language processing. The task is called Automatic Language Identification (ALI). Being a well-studied field since early 1960’s, various methods have been applied to many standard languages. The ALI standard methods require datasets for training and use character/word-based n-gram models. However, social media and new technologies have contributed to the rise of informal and minority languages on the Web. The state-of-the-art automatic language identifiers fail to properly identify many of them. Romanized Arabic (RA) and Romanized Berber (RB) are cases of these informal languages which are under-resourced. The goal of this paper is twofold: detect RA and RB, at a document level, as separate languages and distinguish between them as they coexist in North Africa. We consider the task as a classification problem and use supervised machine learning to solve it. For both languages, character-based 5-grams combined with additional lexicons score the best, F-score of 99.75% and 97.77% for RB and RA respectively.
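
A hedged sketch of the kind of combination the abstract describes: character 5-gram counts joined with a simple lexicon-coverage feature. The toy word lists, documents, and the LinearSVC choice are illustrative assumptions, not the paper's actual resources.

```python
# Sketch: character n-grams (up to 5) combined with lexicon-coverage features.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

class LexiconCoverage(BaseEstimator, TransformerMixin):
    """Fraction of tokens found in each language-specific word list."""
    def __init__(self, lexicons):
        self.lexicons = lexicons  # dict: language -> set of words (toy lists below)
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        rows = []
        for doc in X:
            toks = doc.lower().split()
            rows.append([sum(t in lex for t in toks) / max(len(toks), 1)
                         for lex in self.lexicons.values()])
        return np.array(rows)

lexicons = {"RB": {"azul", "tamazight"}, "RA": {"wesh", "bezzaf"}}  # toy word lists
model = make_pipeline(
    FeatureUnion([
        ("char5", CountVectorizer(analyzer="char", ngram_range=(1, 5))),
        ("lex", LexiconCoverage(lexicons)),
    ]),
    LinearSVC(),
)
# model.fit(train_docs, train_labels); model.predict(test_docs)
```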
Article
Although intensity has been reported as a reliable acoustical correlate of stress, it is generally considered a weak cue in the perception of linguistic stress. In natural speech stressed syllables are produced with more vocal effort. It is known that, if a speaker produces more vocal effort, higher frequencies increase more than lower frequencies. In this study, the effects of lexical stress on intensity are examined in the abstraction from the confounding accent variation. A production study was carried out in which ten speakers produced Dutch lexical and reiterant disyllabic minimal stress pairs spoken with and without an accent in a fixed carrier sentence. Duration, overall intensity, formant frequencies, and spectral levels in four contiguous frequency bands were measured. Results revealed that intensity differences as a function of stress are mainly located above 0.5 kHz, i.e., a change in spectral balance emphasizing higher frequencies for stressed vowels. Furthermore, we showed that the intensity differences in the higher regions are caused by an increase in physiological effort rather than by shifting formant frequencies due to stress. The potential of each acoustic correlate of stress to differentiate between initial- and final-stressed words was examined by linear discriminant analysis. Duration proved the most reliable correlate of stress. Overall intensity and vowel quality are the poorest cues. Spectral balance, however, turned out to be a reliable cue, close in strength to duration.
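
To make the spectral-balance measurement concrete, the sketch below estimates intensity in a few contiguous frequency bands of a vowel segment. The band edges, analysis settings, and file name are assumptions for illustration, not the values used in the study.

```python
# Rough illustration of spectral balance: band levels (dB) of a vowel segment.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def band_levels_db(wav_path, bands=((0, 500), (500, 1000), (1000, 2000), (2000, 4000))):
    sr, x = wavfile.read(wav_path)
    x = x.astype(np.float64)
    f, psd = welch(x, fs=sr, nperseg=1024)
    levels = []
    for lo, hi in bands:
        mask = (f >= lo) & (f < hi)
        power = np.sum(psd[mask]) * (f[1] - f[0])   # integrate PSD over the band
        levels.append(10.0 * np.log10(power + 1e-12))  # dB, arbitrary reference
    return levels

# A stressed vowel is expected to show relatively more energy above ~0.5 kHz
# than its unstressed counterpart.
# print(band_levels_db("vowel_stressed.wav"))
```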
Article
This paper presents an algorithm for the automatic modelling of fundamental frequency (F0) curves. The raw curves are factored into two components: a macroprosodic component (modelled using a quadratic spline function) and a residual microprosodic component. The macroprosodic component, which is assumed to reflect the linguistic contribution of the intonation pattern, can be represented as a sequence of target points, each defined by its time and F0 value. The algorithm uses a technique called asymmetric modal quadratic regression.
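
The sketch below illustrates only the general idea (a smooth macroprosodic curve plus a microprosodic residual) using an ordinary smoothing spline; it is not the asymmetric modal quadratic regression of the actual algorithm, and the synthetic F0 contour and smoothing factor are made up for the example.

```python
# Much-simplified illustration: smooth quadratic spline as the "macroprosodic"
# component of a raw F0 curve, residual as the "microprosodic" component.
import numpy as np
from scipy.interpolate import UnivariateSpline

t = np.linspace(0.0, 1.0, 100)                                               # time (s)
raw_f0 = 120 + 30 * np.sin(2 * np.pi * t) + np.random.normal(0, 5, t.size)  # Hz, synthetic

macro = UnivariateSpline(t, raw_f0, k=2, s=len(t) * 25.0)   # macroprosodic component
residual = raw_f0 - macro(t)                                # microprosodic component

# Target points: (time, F0) pairs read off at the spline's knots.
targets = [(float(tk), float(macro(tk))) for tk in macro.get_knots()]
print(targets[:5])
```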
Article
We present a straightforward and robust algorithm for periodicity detection, working in the lag (autocorrelation) domain. When it is tested for periodic signals and for signals with additive noise or jitter, it proves to be several orders of magnitude more accurate than the methods commonly used for speech analysis. This makes our method capable of measuring harmonics-to-noise ratios in the lag domain with an accuracy and reliability much greater than that of any of the usual frequency-domain methods. By definition, the best candidate for the acoustic pitch period of a sound can be found from the position of the maximum of the autocorrelation function of the sound, while the degree of periodicity (the harmonics-to-noise ratio) of the sound can be found from the relative height of this maximum. However, sampling and windowing cause problems in accurately determining the position and height of the maximum. These problems have led to inaccurate time-domain and cepstral methods for p...
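
The core lag-domain idea can be sketched as follows: take the maximum of the normalized autocorrelation over a plausible lag range, read the pitch period from its position and a harmonics-to-noise estimate from its height. This toy version omits the window corrections that give the published method its accuracy; the frame length and pitch range are assumptions.

```python
# Toy autocorrelation pitch/periodicity estimate for one voiced frame
# (assumes a non-silent frame at least two pitch periods long).
import numpy as np

def acf_pitch(frame, sr, fmin=75.0, fmax=500.0):
    frame = frame - frame.mean()
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    acf = acf / acf[0]                      # normalize so lag 0 == 1
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(acf[lo:hi]))
    r_max = acf[lag]                        # degree of periodicity in [0, 1]
    f0 = sr / lag
    hnr_db = 10.0 * np.log10(r_max / (1.0 - r_max)) if r_max < 1.0 else np.inf
    return f0, hnr_db

# sr = 16000; frame = one ~40 ms voiced frame as a float array
# f0, hnr = acf_pitch(frame, sr)
```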
Book
Preface Alan Cruttenden 1. A survey of intonation systems Daniel Hirst and Albert Di Cristo 2. Intonation in American English Dwight Bolinger 3. Intonation in British English Daniel Hirst 4. Intonation in German Dafydd Gibbon 5. Intonation in Dutch Johan 't Hart 6. Intonation in Swedish Eva Gårding 7. Intonation in Danish Nina Grønnum 8. Intonation in Spanish Santiago Alcoba and Julio Murillo 9. Intonation in European Portuguese Madalena Cruz-Ferreira 10. Intonation in Brazilian Portuguese Joao Antonio de Moraes 11. Intonation in French Albert Di Cristo 12. Intonation in Italian Mario Rossi 13. Intonation in Romanian Laurentia Dascalu-Jinga 14. Intonation in Russian Natalia Svetozarova 15. Intonation in Bulgarian Anastasia Misheva and Michel Nikov 16. Intonation in Greek Antonis Botinis 17. Intonation in Finnish Antti Iivonen 18. Intonation in Hungarian Ivan Fonagy 19. Intonation in Moroccan Arabic Thami Benkirane 20. Intonation in Japanese Isamu Abe 21. Intonation in Thai Sudaporn Luksaneeyanawin 22. Intonation in Vietnamese Do The Dung, Tran Thien Huong and Georges Boulakia 23. Intonation in Beijing Chinese Paul Kratochvil References Indexes.
Article
Building multiple automatic speech recognition (ASR) systems and combining their outputs using voting techniques such as ROVER is an effective technique for lowering the overall word error rate. A successful system combination approach requires the construction of multiple systems with complementary errors, or the combination will not outperform any of the individual systems. In general, this is achieved empirically, for example by building systems on different input features. In this paper, we present a systematic approach for building multiple ASR systems in which the decision tree state-tying procedure that is used to specify context-dependent acoustic models is randomized. Experiments carried out on two large vocabulary recognition tasks, MALACH and DARPA EARS, illustrate the effectiveness of the approach.
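
Setting aside the randomized state-tying, the voting step can be illustrated naively: given hypotheses that have already been word-aligned (the genuinely hard part, omitted here), pick the majority word in each slot. This is only a toy stand-in for ROVER, not the algorithm itself.

```python
# Naive majority-vote combination over pre-aligned ASR hypotheses ('' = empty slot).
from collections import Counter

def rover_vote(aligned_hyps):
    """aligned_hyps: list of equal-length word lists from different systems."""
    combined = []
    for slot in zip(*aligned_hyps):
        word, _ = Counter(slot).most_common(1)[0]
        if word:
            combined.append(word)
    return combined

print(rover_vote([["the", "cat", "sat"],
                  ["the", "cap", "sat"],
                  ["the", "cat", "sat"]]))   # -> ['the', 'cat', 'sat']
```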
Article
Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study examines over 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy and speaking-rate features) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. For an all-way classification as well as three subtasks, prosody allowed highly significant classification over chance. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.
Article
Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study is based on more than 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy, and speaking rate) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. Performance was evaluated for prosody models alone, and after combining the prosody models with word information--either from true words or from the output of an automatic speech recognizer. For an overall classification task, as well as three subtasks, prosody made significant contributions to classification. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone, especially for the case of recognized words. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.
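
A toy version of the prosody-only setup: represent each utterance by a handful of duration, pause, F0, energy and speaking-rate statistics, infer a decision tree, and inspect which cues it splits on. The feature names and the synthetic data below are placeholders, not the Switchboard features used in the study.

```python
# Prosody-only decision tree for dialog-act classification (toy data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["duration_s", "pause_ratio", "f0_mean", "f0_final_slope",
                 "energy_mean", "speaking_rate"]

# X: (n_utterances, 6) prosodic vectors; y: DA labels such as "statement"/"question".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(feature_names)))
y = np.where(X[:, 3] + 0.3 * rng.normal(size=200) > 0, "question", "statement")

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=feature_names))   # which prosodic cues split first?
```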
Article
An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
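
In the spirit of the modified-autocorrelation method described above, the sketch below computes a cumulative-mean-normalized difference function and takes the first lag that dips below a threshold as the pitch period. It is a simplified illustration (no parabolic interpolation or best-local-minimum refinement), and the parameter values are assumptions.

```python
# Simplified difference-function F0 estimator for one analysis frame
# (frame should be at least two pitch periods long).
import numpy as np

def diff_function_f0(frame, sr, fmin=60.0, fmax=600.0, threshold=0.1):
    n = len(frame)
    max_lag = int(sr / fmin)
    d = np.array([np.sum((frame[:n - lag] - frame[lag:n]) ** 2)
                  for lag in range(max_lag + 1)])
    # Cumulative-mean normalization removes the bias toward lag 0.
    cmnd = np.ones_like(d)
    running = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, max_lag + 1) / np.maximum(running, 1e-12)
    lo = int(sr / fmax)
    for lag in range(lo, max_lag + 1):
        if cmnd[lag] < threshold:
            return sr / lag
    return sr / (lo + int(np.argmin(cmnd[lo:])))   # fall back to the global minimum

# sr = 16000; frame = one voiced analysis frame as a float array
# print(diff_function_f0(frame, sr))
```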
Conference Paper
What features are helpful for Chinese question detection? Which of them are more important? What are the differences between Chinese and English regarding feature importance? We study these questions by building question detectors for Chinese and English conversational speech, and performing analytic studies and feature selection experiments. As in English, we find that both textual and prosodic features are helpful for Chinese question detection. Among textual features, word identities, especially the utterance-final word, are more useful than the global (N-gram) sentence likelihood. Unlike in English, where final pitch rise is a good cue for questions, we find in Chinese that utterance-final pitch behavior is not a good feature. Instead, the most useful prosodic feature is the spectral balance, i.e., the distribution of energy over the frequency spectrum, of the final syllable. We also find effects of tone, e.g., that treating interjection words as having a special tone is helpful. Our final classifier achieves an error rate of 14.9% with respect to a 50% chance-level rate.
Article
In this paper, we propose a formalism, called vector filtering of spectral trajectories, which allows many speech parameterization approaches to be integrated under a common formalism. We then propose a new filtering in this framework, called time-frequency principal components (TFPC) of speech. We apply this new filtering to speaker identification, using a subset of the POLYCOST database. The results show an improvement of roughly 20% compared to the use of the classical cepstral coefficients augmented by their [delta] coefficients. 1. INTRODUCTION Cepstral coefficients [8] have been widely used for decades in speech processing. Although they provide a good set of feature vectors with nice properties, like a good decorrelation of the coefficients, or their ability to decorrelate in theory the vocal source and the vocal tract filtering [8], we are convinced that they are not the ultimate solution to represent speech signals in most situations. To find a good alternative t...
The intonation, the French system: description and modeling
  • Rossi
Classification of speech into Question and Non-Question by decision tree
  • E Mq Given
  • A Castelli
  • Boucher
  • L Besacier
Learning With Kernels: Support Vector Machines, Regularization, Optimization, and Beyond.
  • Schölkopf