Conference Paper

Infant cry analysis and detection


Abstract

In this paper we propose an algorithm for automatic detection of an infant cry. A particular application of this algorithm is the identification of a physical danger to babies, such as situations in which parents leave their children in vehicles. The proposed algorithm is based on two main stages. The first stage involves feature extraction, in which pitch related parameters, MFC (mel-frequency cepstrum) coefficients and short-time energy parameters are extracted from the signal. In the second stage, the signal is classified using the k-NN algorithm and is later verified as a cry signal, based on the pitch and harmonics information. In order to evaluate the performance of the algorithm in real world scenarios, we checked the robustness of the algorithm in the presence of several types of noise, and especially noises such as car horns and car engines that are likely to be present in vehicles. In addition, we addressed real time and low complexity demands during the development of the algorithm. In particular, we used a voice activity detector, which disabled the operation of the algorithm when voice activity was not present. A database of baby cry signals was used for performance evaluation. The results showed good performance of the proposed algorithm, even at low SNR.
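The two-stage idea above (energy-based voice activity gating, then k-NN classification of per-segment features) can be sketched minimally as follows. The frame sizes, threshold, and synthetic feature vectors are illustrative stand-ins, not the authors' implementation:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def short_time_energy(signal, frame_len=512, hop=256):
    """Short-time energy per frame (one of the features named in the abstract)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([np.sum(signal[i * hop : i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def simple_vad(signal, frame_len=512, hop=256, rel_threshold=0.1):
    """Toy energy-based voice activity detector: a frame is 'active' if its
    energy exceeds a fraction of the maximum frame energy."""
    ste = short_time_energy(signal, frame_len, hop)
    return ste > rel_threshold * ste.max()

# Hypothetical per-segment feature vectors (in the paper: pitch parameters,
# MFC coefficients, and short-time energy); two well-separated toy clusters.
rng = np.random.default_rng(0)
cry_feats = rng.normal(loc=2.0, size=(50, 13))
noise_feats = rng.normal(loc=-2.0, size=(50, 13))
X = np.vstack([cry_feats, noise_feats])
y = np.array([1] * 50 + [0] * 50)          # 1 = cry, 0 = non-cry

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
probe = rng.normal(loc=2.0, size=(1, 13))
print(knn.predict(probe))                  # [1]: clusters are far apart in this toy data
```

In the full algorithm, the k-NN decision would additionally be verified against pitch and harmonics information, and the VAD gate would disable the classifier entirely when no voice activity is present.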




... An effective cry detection algorithm was presented in the study reported in [14]. Recent infant cry classification tasks employing Gaussian Mixture Models (GMMs) as classifiers have also incorporated state-of-the-art cepstral features, such as Mel Frequency Cepstral Coefficients (MFCC) [15], [18]. ...
... Pitch and MFCC Classification [14] 2012 GMM-Based Infant Cry Classification [15] 2016 ...
... The likelihood of a feature vector given the GMM can be evaluated using eq. (14). Acoustic feature vectors in the speech literature are generally assumed to be statistically-independent [30]. ...
Article
Infant cry classification is an important area of research that involves analyzing cries to distinguish normal from pathological crying. Signal-processing-based state-of-the-art feature sets, such as Short-Time Fourier Transform (STFT) representations and Mel Frequency Cepstral Coefficients (MFCC), have previously been reported for this task. However, quasi-periodic sampling of the vocal tract spectrum by high-pitched source harmonics results in poor spectral resolution in the STFT, so these feature sets fail to produce satisfactory classification performance. In contrast to linearly spaced frequency bins, this study proposes to use the geometrically spaced frequency bins employed in CQT-based features, namely Constant Q Cepstral Coefficients (CQCC), to systematically emphasize the fundamental frequency (F0) and its harmonics (kF0, k ∊ Z) for infant cry classification. For a comprehensive evaluation of the proposed feature set, two datasets are considered in this work, namely the Baby Chilanto and In-House DA-IICT datasets. The performance of the proposed CQCC feature set is compared against state-of-the-art MFCC, Linear Frequency Cepstral Coefficients (LFCC), and cepstral feature sets. Experiments were performed using 10-fold cross-validation on two traditional classifiers, the Gaussian Mixture Model (GMM) and the Support Vector Machine (SVM). The best results were obtained with the CQCC-GMM architecture, with classification accuracies of 99.8% and 98.24% on the Baby Chilanto and In-House DA-IICT datasets, respectively. This work also illustrates the effectiveness of the form-invariance property of the CQT over the traditional narrowband STFT-based spectrogram, and presents the effect of parameter tuning and of the dimension of the feature vector.
Furthermore, this study presents the first cross-database and combined-dataset scenarios, with an overall improvement of 1.59% for the proposed CQCC feature set. Additionally, the robustness of CQCC is evaluated under signal degradation with additive babble noise at various Signal-to-Noise Ratio (SNR) levels on both datasets. Next, the performance of the proposed CQCC is compared with the other feature sets using statistical measures, such as the F1-score, J-statistics, and violin plots, together with an analysis of the latency period for the deployment of a practical system. Finally, this study compares the best obtained CQCC results with existing studies on the Baby Chilanto dataset.
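The contrast between the geometrically spaced bins of the CQT and the linearly spaced bins of the STFT can be illustrated numerically; the bin counts, minimum frequency, and sample rate below are illustrative choices, not the paper's parameters:

```python
import numpy as np

# Constant-Q-style geometrically spaced centre frequencies: f_k = f_min * 2^(k/B).
f_min, bins_per_octave, n_bins = 32.7, 12, 84
cq_freqs = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)

# Linearly spaced STFT bin frequencies for comparison.
sr, n_fft = 16000, 512
stft_freqs = np.arange(n_fft // 2 + 1) * sr / n_fft

# Geometric spacing gives a constant ratio between adjacent bins (constant Q),
# i.e. finer resolution at low frequencies, where the F0 and first harmonics
# of a high-pitched infant cry lie; the STFT step is constant in Hz instead.
ratios = cq_freqs[1:] / cq_freqs[:-1]
print(np.allclose(ratios, 2 ** (1 / 12)))   # True: constant ratio between bins
print(np.diff(stft_freqs)[:2])              # constant absolute step (sr / n_fft Hz)
```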
... Our workflow is an easy and cost-effective solution for building an infant-cry database for early diagnosis by medics and automatic systems. Unlike other systems [52][53][54], it adapts to the particular noise, infants, medical operators' speech, and other sound sources present in the recording environment. The software is entirely open source. ...
... The workflow is a machine-learning-based system, mostly unsupervised, which requires a minimal annotated training set (21 s) and self-trains on the analysed audio recording. It uses acoustic features extracted from ~100-ms segments, a much larger window than the phonetic-scale window used by other approaches [22,24,[52][53][54][55]], whose spectral structure is more robust to noise. The workflow embeds three machine learning models: an unsupervised model (cluster analysis), a minimally supervised model (a Hidden Markov Model), and a self-training model (a long short-term memory model with a self-attention layer). ...
... Our approach is innovative and cost-effective, considering the difficulty of accessing infant-cry databases and representing the large variability of infant speech through huge and expensive annotated databases. We demonstrate that our workflow performance is comparable with that of a reference supervised infant-cry detection system [52], which would require re-calibration in new operational conditions. Moreover, we use syllabic-scale acoustic features, which are more robust to noise [56][57][58] than the phonetic-scale features used by other approaches [52][53][54]. ...
Article
Full-text available
Infant cry is one of the first distinctive and informative life signals observed after birth. Neonatologists and automatic assistive systems can analyse infant cries to detect pathologies early. These analyses extensively use reference expert-curated databases containing annotated infant-cry audio samples. However, these databases are not publicly accessible because of their sensitive data. Moreover, the recorded data can under-represent specific phenomena or the operational conditions required by other medical teams. Additionally, building these databases requires significant investments that few hospitals can afford. This paper describes an open-source workflow for infant-cry detection, which identifies audio segments containing high-quality infant-cry samples with no other overlapping audio events (e.g. machine noise or adult speech). It requires minimal training because it trains an LSTM-with-self-attention model on infant-cry samples automatically detected from the recorded audio through cluster analysis and HMM classification. The audio signal processing uses energy and intonation acoustic features from 100-ms segments to improve spectral robustness to noise. The workflow annotates the input audio with intervals containing infant-cry samples suited for populating a database for neonatological and early-diagnosis studies. On 16 min of hospital phone-audio recordings, it reached sufficient infant-cry detection accuracy in 3 neonatal care environments (nursery: 69%, sub-intensive: 82%, intensive: 77%) involving 20 infants subject to heterogeneous cry stimuli, and had substantial agreement with an expert's annotation. Our workflow is a cost-effective solution, particularly suited for a sub-intensive care environment, scalable to monitor from one to many infants. It allows a hospital to build and populate an extensive high-quality infant-cry database with a minimal investment.
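The 100-ms segmentation with an energy feature can be sketched as follows; the sampling rate and the energy-only feature set are simplifying assumptions (the workflow also uses intonation features):

```python
import numpy as np

def segment_features(audio, sr, seg_ms=100):
    """Split audio into non-overlapping ~100-ms segments and compute a simple
    log-energy feature per segment (intonation features omitted for brevity)."""
    seg_len = int(sr * seg_ms / 1000)
    n_segs = len(audio) // seg_len
    segs = audio[:n_segs * seg_len].reshape(n_segs, seg_len)
    return np.log10(np.sum(segs ** 2, axis=1) + 1e-12)

sr = 16000
t = np.arange(sr) / sr                       # 1 s of audio -> ten 100-ms segments
audio = np.sin(2 * np.pi * 440 * t)          # stand-in for a cry recording
audio[: sr // 2] *= 0.01                     # quiet first half
feats = segment_features(audio, sr)
print(feats.shape)                           # (10,)
print(feats[0] < feats[-1])                  # True: louder second half, higher log-energy
```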
... considering the similarities and differences in classification accuracy, since the NICU requires detection of the accurate reason for an infant's cry for immediate medical follow-up. Furthermore, the study of different approaches shows that the features most commonly used by researchers were MFCCs, pitch, LPCCs, and formants (R. P. Balandong, 2013; R. Cohen, 2012), and a few researchers used the zero-crossing rate and short-term energy as features. Deep neural network models were used by researchers in many classification applications (R. P. Balandong, 2013; J. Orozco, 2003; N. Wahid, 2016; Y. Lavner, 2016). Generally, multilayer perceptrons are popular when deep neural networ ...
... , the three-layered feed-forward neural network known as the Radial Basis Function Network (RBFN) was used to increase classifier accuracy (N. Wahid, 2016). As shown in Table 1, high classification accuracy was achieved by many researchers. However, most researchers have used binary classification of infant cries (R. P. Balandong, 2013; R. Cohen & Y. Lavner, 2012; S. E. Barajas-Montiel, 2005); very few have used a multiclass classification approach (J. Saraswathy, 2013; N. Wahid, 2016). Hence, there is much scope in the multiclass classification of the reasons behind infant cries, and accuracy can also be improved relative to previously reported results. The Baby Chilla ...
... MFCCs are extracted to represent the acoustic characteristics of the signal. Short term power spectrum of a signal is represented by MFCCs (R. Cohen, 2012). MFCCs are mostly used for the speaker recognition task. ...
Article
Full-text available
The Nurse Rostering Problem is a highly constrained combinatorial optimization problem in which several nurses are assigned to shifts with minimum constraint violations. Due to the massive number of constraints, these problems are difficult to handle manually. The advantage of automating the task is a roster of high quality and greater flexibility, reducing the workload, time, and effort of head nurses. The PSO algorithm is extremely dependent on the settings of its control parameters and on balancing exploration and exploitation of the search space; these problems are avoided by the proposed ES-PSO algorithm. Highly constrained problems have a huge search space, so to cover that space and find an optimal solution in the stipulated time it is necessary to increase the population; however, this may take more time for particle updates and fitness evaluations. To improve the execution time of the compute-intensive tasks, we used the OpenMP and CUDA frameworks. The adapted algorithm improves the outcome by minimizing the penalty and reduces the cost of the compute-intensive tasks.
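A generic particle swarm optimisation loop (not the ES-PSO variant or the nurse-rostering encoding, which the abstract leaves unspecified) might look like the following, minimising a simple quadratic "penalty" over a continuous search space:

```python
import numpy as np

def pso(fitness, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain PSO: velocities blend inertia, personal-best pull, global-best pull."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, penalty = pso(lambda x: np.sum(x ** 2))
print(penalty < 1e-3)   # True: the swarm reaches near-zero penalty on this convex toy problem
```

The parallelisation the abstract describes would distribute the per-particle position updates and fitness evaluations (the two inner loops over particles) across OpenMP threads or CUDA blocks.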
Article
Full-text available
Infants admitted to the Neonatal Intensive Care Unit (NICU) always need a hygienic environment and round-the-clock observation. Infants and newborn babies express their physical and emotional needs through crying. Thus, detecting the reasons behind an infant's cry plays a vital role in monitoring the health of babies in the NICU. In this paper, we propose a novel approach for detecting the reasons for an infant's cry. In the proposed approach, the infant's cry signal is captured and a unique set of features is extracted from it using MFCCs, LPCCs, and pitch. This set of features is used to differentiate the signal patterns and recognize the reasons for the cry. Reasons such as hunger, pain, sleep, and discomfort are used to represent the different classes. A multilayer neural network classifier is designed to recognize the reasons for the cry using a standard infant-cry dataset. The proposed classifier achieves an accuracy of 93.24% with the combined features of MFCCs, LPCCs, and pitch.
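The multilayer-network classification step might be sketched as below; the synthetic 20-dimensional vectors stand in for the MFCC + LPCC + pitch features, and the hidden-layer size is an assumption, not the paper's architecture:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Four cry-reason classes as in the abstract; features are synthetic,
# well-separated stand-ins for real MFCC + LPCC + pitch vectors.
rng = np.random.default_rng(1)
classes = ["hunger", "pain", "sleep", "discomfort"]
X = np.vstack([rng.normal(loc=4 * i, size=(40, 20)) for i in range(4)])
y = np.repeat(classes, 40)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(X, y)
probe = rng.normal(loc=0.0, size=(1, 20))    # near the "hunger" cluster mean
print(clf.predict(probe))                    # ['hunger'] on this separable toy data
```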
... VAD also faces the challenge of separating cry from noise. Pan et al. use it to detect the presence or absence of a baby cry in a noisy environment to improve the overall baby cry recognition rate [56], and it is used to detect the sections of the audio with sufficient audio activity [57]. In [41], the authors implemented a basic VAD algorithm, which uses short-time features of audio frames and a decision strategy for determining sound and silence frames. ...
... It is a cepstral representation of the audio signals. Researchers use it to test proposed approaches [17,29,49,52,57,[60][61][62] and often use it for baseline experiments [13,15,22,31,37,63]. Liu et al. used MFCC along with two other cepstral features Linear Prediction Cepstral Coefficients (LPCC) and Bark Frequency Cepstral Coefficients (BFCC) for infant cry reason classification. ...
... Since the amplitude of an audio signal varies with time, the short-time energy can serve to differentiate voiced and unvoiced segments. It is used in [20,57,70] for infant cry detection and classification. Torres et al. used voiced-unvoiced counter, which counts all frames having a significant periodic content, as one of the features for cry detection [27]. ...
Article
Full-text available
This paper reviews recent research in infant cry signal analysis and classification. A broad range of literature is reviewed, mainly from the aspects of data acquisition, cross-domain signal processing techniques, and machine learning classification methods. We introduce pre-processing approaches and describe a diversity of features, such as MFCC, spectrogram, and fundamental frequency. Both acoustic features and prosodic features extracted from different domains can discriminate frame-based signals from one another and can be used to train machine learning classifiers. Together with traditional machine learning classifiers such as KNN, SVM, and GMM, newly developed neural network architectures such as CNN and RNN are applied in infant cry research. We present significant experimental results on pathological cry identification, cry reason classification, and cry sound detection with some typical databases. This survey systematically studies previous research in all relevant areas of infant cry and provides insight into the current cutting-edge work in infant cry signal analysis and classification. We also propose future research directions in data processing, feature extraction, and neural network classification to better understand, interpret, and process infant cry signals.
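Two of the traditional classifiers the survey names (KNN and SVM) can be compared with k-fold cross-validation; the synthetic features below are stand-ins for real frame-level cry features such as MFCCs, and the accuracies are only meaningful for this toy data:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Two well-separated synthetic classes of 13-dimensional "feature" vectors.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.5, 1.0, (100, 13)),
               rng.normal(1.5, 1.0, (100, 13))])
y = np.array([0] * 100 + [1] * 100)

for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC(kernel="rbf"))]:
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(name, round(scores.mean(), 3))        # both score near 1.0 here
```

A GMM-based classifier, also common in this literature, would instead fit one mixture model per class and pick the class with the highest likelihood.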
... Past work [3], [4] has analyzed infant cries to evaluate affect from segments of audio data. Additionally, video-based methods [5], [6] have extracted facial landmarks to classify infant affect from individual frames. ...
... Manual coding is labor-intensive, posing a scalability challenge; therefore, researchers have begun exploring automated approaches for recognizing infant affect. Cohen and Lavner [4] used voice activity detection and k-nearest neighbors to detect infant cries. Model performance was evaluated based on the fraction of 1-second and 10-second segments that were accurately classified. ...
Preprint
Agents must monitor their partners' affective states continuously in order to understand and engage in social interactions. However, methods for evaluating affect recognition do not account for changes in classification performance that may occur during occlusions or transitions between affective states. This paper addresses temporal patterns in affect classification performance in the context of an infant-robot interaction, where infants' affective states contribute to their ability to participate in a therapeutic leg movement activity. To support robustness to facial occlusions in video recordings, we trained infant affect recognition classifiers using both facial and body features. Next, we conducted an in-depth analysis of our best-performing models to evaluate how performance changed over time as the models encountered missing data and changing infant affect. During time windows when features were extracted with high confidence, a unimodal model trained on facial features achieved the same optimal performance as multimodal models trained on both facial and body features. However, multimodal models outperformed unimodal models when evaluated on the entire dataset. Additionally, model performance was weakest when predicting an affective state transition and improved after multiple predictions of the same affective state. These findings emphasize the benefits of incorporating body features in continuous affect recognition for infants. Our work highlights the importance of evaluating variability in model performance both over time and in the presence of missing data when applying affect recognition to social interactions.
... With the development of machine learning algorithms, k-nearest neighbor (kNN), support vector machine (SVM), random forest (RF), and neural network (NN) methods [7][8][9][10] have been applied to acoustic event recognition. An automatic system was developed to detect infant cries in cars [11], classified by kNN with real-time operation and low complexity. Problematically, k-NN has high data requirements and performs well only if the data samples are similar to each other. ...
... There has been a great volume of work on AED for many years, using a myriad of techniques and features. The work in [11] used pitch parameters, STE, and MFCCs as features to automatically identify and detect baby cries, which proved the effectiveness of the method. However, STE can only represent the change in amplitude and cannot characterize the non-stationary nature of the acoustic signal. ...
Article
Full-text available
This paper presents a feature extraction approach for surveillance systems aimed at the automatic detection and recognition of public security events. The proposed approach first generates a Gabor dictionary based on the human auditory critical frequency bands, and then uses the orthogonal matching pursuit (OMP) algorithm to sparsely represent the abnormal audio signal. We select the most important atoms from the Gabor dictionary and extract the scale, frequency, and translation parameters of these atoms to form the OMP feature. The performance of the OMP feature is compared with traditional acoustic features and their joint features, using support vector machine (SVM) and random forest (RF) classifiers. Experiments have been performed to evaluate the effectiveness of the OMP feature in supplementing traditional acoustic features. The results show that the superior classifier for abnormal acoustic event detection (AAED) is RF. Furthermore, the introduction of the combined features addresses the problems of low recognition accuracy and poor robustness of surveillance systems in practical applications.
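The OMP decomposition step can be sketched with a random unit-norm dictionary standing in for the Gabor dictionary, and scikit-learn's `OrthogonalMatchingPursuit` standing in for the paper's own implementation:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

# A signal built from a few dictionary atoms is decomposed by OMP; in the
# paper, the selected atoms' scale/frequency/translation parameters would
# then form the feature vector.
rng = np.random.default_rng(3)
n_samples, n_atoms = 128, 256
D = rng.normal(size=(n_samples, n_atoms))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms

true_atoms = [10, 80, 200]                 # atoms actually present in the signal
signal = D[:, true_atoms] @ np.array([1.0, -0.8, 0.5])

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3,
                                fit_intercept=False).fit(D, signal)
selected = np.flatnonzero(omp.coef_)
print(sorted(int(i) for i in selected))    # [10, 80, 200]: exact recovery (noiseless)
```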
... The audio signal was divided into sections of 100 milliseconds and a set of audio features, known to distinguish between different types of audio signals, was computed from each segment. The features include the Mel-Frequency Cepstrum coefficients (Quatieri, 2002), the fundamental frequency of the signal (pitch), harmonicity factor (Cohen & Lavner, 2012), harmonic-to-average power ratio (Cohen & Lavner, 2012), short-time energy, and zero-crossing rate. These features provide temporal and frequency measures that are useful for the detection of cry signals. ...
Preprint
Full-text available
Psychological science is in a transitional period: Many findings do not replicate and theories appear not as robust as previously presumed. We suspect that a main reason for theories not appearing as robust is because they are too simple. In this paper, we provide an important step towards this transition in the field of interpersonal relationship research by 1) providing an overarching theoretical framework grounded in existing relationship science, and 2) outlining a novel approach - mobile social physiology – that relies on intelligent technologies like wearable sensors, actuators, and modern analytical methods. At the core of our theoretical principles is co-regulation (one partner’s [statistical] co-dependency on the other partner). Co-regulation has long existed in the literature, but has to date been largely untested. To test the outlined principles, we 3) present a newly programmed app – the Bio-App for Bonding (available on GitHub: https://github.com/co-relab/bioapp). By providing a paradigm shift for relationship research, the field can not only increase the accuracy of measurement and the generalizability of findings, it also allows for moving from the lab to real life situations. We discuss how the mobile social physiology approach is rooted in existing theoretical principles (e.g., Social Baseline and Attachment Theory), extends the concept of co-regulation to allow for specific measurements, and provides a research agenda to develop a model of interpersonal relationships that we hope will stand the test of time.
... The crying of an infant is a common phenomenon and is probably one of the most difficult problems a babysitter has to face when taking care of a baby. Currently, there are many monitoring solutions to detect the crying of infants, such as wireless video camera systems and wireless audio microphone systems [1], [2], [3]. Among them, the most preferred solution is the wireless audio microphone system designed by Lavner [3]. ...
... To solve this problem, Rami [2] proposed a sound-based infant crying detection system, which first introduced the method of sound event detection (SED) for detecting infant crying. In this design, microphones are used to collect infant sounds, and a KNN classifier on the server is used to identify infant crying. ...
Chapter
Full-text available
Infant crying is a major challenge in home baby care. Without an effective monitoring technology, a babysitter may need to stay with the baby all day long. One solution is to design an intelligent system that detects the sound of infant crying automatically. For this purpose, we present a novel infant crying detection system (AICDS in short), designed in a client-server framework. On the client side, a commercial robot prototype is installed beside the baby carriage, equipped with a small microphone array to capture sound signals and transmit them to the cloud server through a Wi-Fi module. On the cloud server side, a lightweight convolutional neural network model is proposed to identify infant-crying versus non-infant-crying events. Experiments show that our AICDS achieves 86% infant crying detection accuracy, which is valuable for reducing the workload of babysitters.
... Typically, cry signal detection is done by taking distinct features from different audio signal segments. These include spectral and temporal characteristics in addition to pitch and formants, such as short-time energy, Mel-frequency cepstrum (MFC) coefficients, and others [4]- [6]. An example of an infant cry signal waveform is shown in figure 1. ...
... Recent studies have also highlighted the centrality of these features in contexts where prosody is the primary information source, such as infant cry detection and classification [137,138]. The highly prosodic nature of infant cry indeed makes syllable-scale acoustic features central for these tasks, especially when extracted through deep learning models [99], and allows interpreting a newborn's psychological and clinical status [139][140][141]. Finally, another field of application of syllabic-scale features is the improvement of ASR robustness to adversarial attacks (e.g. ...
Article
Full-text available
Automatic speech recognition systems based on end-to-end models (E2E-ASRs) can achieve performance comparable to conventional ASR systems while reproducing all their essential parts automatically, from speech units to the language model. However, they hide the underlying perceptual processes modelled, if any; they have lower adaptability to multiple application contexts; and they require powerful hardware and an extensive amount of training data. Model-explainability techniques can explore the internal dynamics of these ASR systems and possibly understand and explain the processes leading to their decisions and outputs. Understanding these processes can help enhance ASR performance and significantly reduce the required training data and hardware. In this paper, we probe the internal dynamics of three E2E-ASRs pre-trained for English by building an acoustic-syllable boundary detector for Italian and Spanish based on the E2E-ASRs' internal encoding-layer outputs. We demonstrate that the shallower E2E-ASR layers spontaneously form a rhythmic component correlated with prominent syllables, central in human speech processing. This finding highlights a parallel between the analysed E2E-ASRs and human speech recognition. Our results contribute to the body of knowledge by providing a human-explainable insight into behaviours encoded in popular E2E-ASR systems.
... Besides pitch and formants, cry signals exhibit spectral and temporal properties such as short-time energy, Mel-frequency cepstrum (MFC) coefficients, and others. [24][25][26] An example of an infant cry signal waveform is shown in Figure 1. ...
Article
Full-text available
Infants are vulnerable to several health problems and cannot express their needs clearly. Whenever they are in a state of urgency and require immediate attention, they cry, which is a form of communication for them. Therefore, the parents of the infants always need to be alert and keep continuous supervision of their infants. However, parents cannot monitor their infants all the time. An infant monitoring system could be a possible solution to monitor the infants, determine when the infants are crying, and notify the parents immediately. Although many such systems are available, most cannot detect infant cries. Some systems have infant cry detection mechanisms, but those mechanisms are not very accurate in detecting infant cries because the mechanisms either include obsolete approaches or machine learning (ML) models that cannot identify infant cries from noisy household settings. To address this limitation, in this research, different conventional and hybrid ML models were developed and analyzed in detail to find out the best model for detecting infant cries in a household setting. A stacked classifier is proposed using different state‐of‐the‐art technologies, outperforming all other developed models. The proposed CNN‐SCNet's (CNN‐Stacked Classifier Network) precision, recall, and f1‐score were found to be 98.72%, 98.05%, and 98.39%, respectively. Infant monitoring systems can use this classifier to detect infant cries in noisy household settings.
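The stacking idea behind a classifier like the one described above can be sketched with off-the-shelf estimators; the base learners and synthetic features below are illustrative substitutes for the paper's CNN-based pipeline, not CNN-SCNet itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-ins for audio features from "cry" vs "no cry" segments.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1, (80, 10)), rng.normal(1, 1, (80, 10))])
y = np.array([0] * 80 + [1] * 80)        # 0 = no cry, 1 = cry

# A stacked classifier: base estimators' predictions feed a meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),
)
stack.fit(X, y)
print(stack.score(X, y))                 # near-perfect on this separable toy data
```

The meta-learner sees cross-validated base predictions during `fit`, which is what lets stacking outperform any single base model when their errors are complementary.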
... Rami Cohen et al. [8] proposed an algorithm that can automatically detect infant's cry during physical dangers like in a situation where parents leave their children in vehicles. The model is divided into 2 stages where primarily the feature extraction is done using MFCC and short-time energy parameters. ...
Conference Paper
Full-text available
Crying is the only medium through which infants can communicate their pain to the world. They try to express various emotions and feelings through crying; sadness, grief, pain, and loneliness are a few of them. Parents often make an effort to understand their baby's pain and attempt to make them cheerful again. Young parents who are new to parenting have less experience in understanding their infant's cry. Due to their hectic schedules, they are often busy, which leaves them less time to spend with their kids. In this situation, these parents often feel irritated or frustrated by long periods of crying, and a lack of understanding of the reason makes them feel confused and helpless to stop the crying and return the baby to a normal condition. Therefore, the main aim of this paper is a solution that can predict why a baby is crying using modern machine learning techniques. We intend to build a system that can capture the baby's cry in the form of audio. By performing audio pre-processing and feature extraction using the Fast Fourier Transform (FFT), a CNN model is trained, obtaining an accuracy of 86.4%. A Random Forest classifier is then used to classify the baby's cry and predict its reason. The model generates 7 types of outputs, namely: hunger, sleep, scared, temperature, burp, lonely, and discomfort. Finally, to make the system interactive, we built a visual interface that helps parents record their baby's cry in real time and get the reason through the intended mobile application.
... Besides pain classification, MFCC has also been used to detect cries using audio features. 9 This research divided an audio sequence into segments of 10-second duration. Instead of using infant voices, some studies also used facial expression features. ...
Article
Background: Babies cannot communicate their pain properly. Several pain scores have been developed, but they are subjective and show high inter-observer variability. The aim of this study was to construct models that use both facial expression and infant voice in classifying pain levels and detecting cries. Methods: The study included a total of 23 infants below 12 months of age who were treated at Dr Soetomo General Hospital. The Face, Legs, Activity, Cry and Consolability (FLACC) pain scale and recordings of the babies' cries were captured in video format. A machine-learning-based system was created to detect infant cries and pain levels. Spectrograms computed with the Short-Time Fourier Transform were used to convert the audio data into a time-frequency representation. Facial features combined with voice features extracted using deep learning autoencoders were used for the classification of infant pain levels. Two types of autoencoders, a Convolutional Autoencoder and a Variational Autoencoder, were used for both faces and voices. Result: The goal of the autoencoder was to produce a latent vector with much smaller dimensions that was still able to recreate the data with minor losses. From the latent vectors, a multimodal data representation for a Convolutional Neural Network (CNN) produced a relatively high F1 score, higher than a single data modality such as voice or facial expressions alone. The two major parts of the experiment were: 1. building the three autoencoder models, namely autoencoders for the infant's face, the amplitude spectrogram, and the dB-scaled spectrogram of the infant's voice; and 2. utilising the latent-vector results from the autoencoders to build the cry detection and pain classification models. Conclusion: In this paper, four pain classifier models with relatively good F1 scores were developed. These models were combined using ensemble methods to improve performance, which resulted in a better F1 score.
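The amplitude and dB-scaled spectrograms mentioned in the Methods can be computed via the Short-Time Fourier Transform as follows; the sample rate, window length, and pure test tone are illustrative choices, not the study's parameters:

```python
import numpy as np
from scipy.signal import stft

sr = 8000
t = np.arange(2 * sr) / sr
audio = np.sin(2 * np.pi * 500 * t)            # stand-in for a cry recording

f, times, Z = stft(audio, fs=sr, nperseg=256)  # complex time-frequency representation
amp_spec = np.abs(Z)                           # amplitude spectrogram
db_spec = 20 * np.log10(amp_spec + 1e-10)      # dB-scaled spectrogram

# The energy concentrates in the frequency bin of the test tone.
peak_bin = amp_spec.mean(axis=1).argmax()
print(round(f[peak_bin]))                      # 500
```

Either spectrogram (as a 2-D image) can then be fed to a convolutional or variational autoencoder to obtain the compact latent vectors the study describes.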
... The baby communicates its needs by producing different types of cries (sounds), which must be accurately decoded to understand the underlying requirement [1]. This poses a big challenge to young parents, and the problem is magnified manyfold if, in addition, the young parent or caregiver has a hearing impairment. ...
... Notwithstanding these concerns, prior work provides some insight into effective approaches for cry detection [4]. Most approaches use machine learning models with acoustic input features, including the fundamental frequency (F0) and mel-frequency cepstral coefficients (MFCCs) [15], or convolutional neural network models (CNNs) [7]. In direct comparisons, CNNs yield better results than classic ML approaches [16]. ...
Conference Paper
Full-text available
Most existing cry detection models have been tested with data collected in controlled settings. Thus, the extent to which they generalize to noisy and lived environments is unclear. In this paper, we evaluate several established machine learning approaches including a model leveraging both deep spectrum and acoustic features. This model was able to recognize crying events with F1 score 0.613 (Precision: 0.672, Recall: 0.552), showing improved external validity over existing methods at cry detection in everyday real-world settings. As part of our evaluation, we collect and annotate a novel dataset of infant crying compiled from over 780 hours of labeled real-world audio data, captured via recorders worn by infants in their homes, which we make publicly available. Our findings confirm that a cry detection model trained on in-lab data underperforms when presented with real-world data (in-lab test F1: 0.656, real-world test F1: 0.236), highlighting the value of our new dataset and model.
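The precision, recall, and F1 figures quoted above are tied together by the harmonic mean; a one-line sketch (the small gap to the reported 0.613 comes from the rounding of the published precision and recall):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The real-world model above reports precision 0.672 and recall 0.552:
print(round(f1_score(0.672, 0.552), 3))  # 0.606 (the paper reports 0.613 from exact counts)
```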
... Although the feasibility of audio-based infant cry detection has been investigated, there are still limitations. For instance, the data used in those previous studies were either recorded in a controlled environment with a fixed microphone placement [6], manually selected to have a good balance between cry and non-cry sounds [8], [10], or consisted of relatively short recordings [9], [11]. Algorithms developed on such data are likely impractical for long-term home monitoring in the presence of other baby vocalisations (such as moaning or whining, coughing, and laughing) and various sounds (caused by, e.g., human talking, music, and car engines), as well as with different microphone placements that depend on the layout of the baby room and personal preference. ...
Article
Cry is an important signal in early infancy that lets parents understand the needs of their baby, and thereby provide timely soothing or be reassured. Thanks to recent advances in signal processing, deep learning, and internet-of-things technologies, smart baby monitors with a microphone and/or a video camera have attracted a lot of attention for use in a baby room to assist parental activities. In this paper, we propose a two-step approach to detect infant cries automatically from continuous audio signals. We first identify and remove segments without clear sounds (background noise) using a volume-based thresholding algorithm, followed by convolutional neural network (CNN) models to further detect infant cries. The CNN operates on the log linear-scale filterbank energies of the audio signals to extract features for cry detection. In this study, a large set of audio data (151.8 hours) collected from five infants in home settings was included. Our proposed approach achieved a mean accuracy of 98.6% in identifying background noise (with only 2 out of 3209 cry segments missed) and a mean accuracy of 92.2% in detecting cries among other non-background sounds.
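The first step of the two-step approach, removing low-volume background segments, might look like the following rough sketch; the segment length and relative threshold are illustrative assumptions, not the paper's values:

```python
import numpy as np

def keep_loud_segments(x, fs, seg_sec=1.0, rel_threshold=0.1):
    """Split x into fixed-length segments and keep those whose RMS volume
    exceeds a fraction of the loudest segment's RMS (background removal)."""
    seg = int(seg_sec * fs)
    n = len(x) // seg
    segments = x[:n * seg].reshape(n, seg)
    rms = np.sqrt((segments ** 2).mean(axis=1))
    mask = rms >= rel_threshold * rms.max()
    return segments[mask], mask

# One second of silence followed by one second of tone
fs = 8000
x = np.concatenate([np.zeros(fs), np.sin(2 * np.pi * 440 * np.arange(fs) / fs)])
loud, mask = keep_loud_segments(x, fs)
print(mask)  # [False  True]
```

The surviving segments would then be passed to the CNN stage described in the abstract.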
... More recent works are based on machine-learning methods that learn to identify cry signals directly from data. Mel-frequency cepstral coefficients (MFCCs) and k-nearest neighbors have been used in [16] to classify cry and non-cry units and to alert parents when infants are being left alone (either in apartments or vehicles). Abou-Abbas et al. [17] proposed Hidden Markov Models (HMMs) to detect and classify the inspiratory and expiratory phases of the cry. ...
... An infant's cry is a reflection of the child's physiological state and emotional well-being. It differs across situations, and an experienced person can draw useful inferences from it [33]. Naturally, speech-related signals have attracted the largest share of the scientific community's attention, as they are our most common tool for communication. ...
... A baby's cry can be characterized by its natural periodic tone and changes of voice. It has a fundamental frequency (pitch) in the range of 250 Hz to 600 Hz [3]. This line of sound-recognition research has two main processes: the first is feature extraction and the second is classification, i.e., determining the sound pattern [4][5][6][7]. ...
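A pitch in the 250-600 Hz band can be estimated, for illustration, by restricting an autocorrelation peak search to lags corresponding to that range; `pitch_autocorr` is a hypothetical helper, not taken from the cited work:

```python
import numpy as np

def pitch_autocorr(x, fs, fmin=250.0, fmax=600.0):
    """Estimate pitch by picking the autocorrelation peak whose lag
    corresponds to a fundamental inside [fmin, fmax]."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag bounds for the pitch band
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag

# A synthetic 400 Hz tone, inside the infant pitch band
fs = 8000
t = np.arange(2048) / fs
print(round(pitch_autocorr(np.sin(2 * np.pi * 400 * t), fs)))  # 400
```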
Conference Paper
Full-text available
Crying is a form of communication through which children express their feelings. A baby's cry can be characterized by its natural periodic tone and changes of voice, with a fundamental frequency (pitch) in the range of 250 Hz to 600 Hz. By detecting their baby's cries, parents can monitor their baby remotely and be alerted only when it matters. This study of sound recognition has two main processes: the first is feature extraction and the second is classification, i.e., determining the sound pattern. For the Linear Frequency Cepstral Coefficient (LFCC) method, the effects of changing the pre-emphasis, the number of filter banks, and the number of cepstral coefficients are analysed; the number of filter banks applied must be greater than the number of cepstral coefficients, and the cepstral value is adjusted to obtain better accuracy. The highest accuracy is 90%, achieved when the system uses 8 as the cepstral value and 3 as the nearest-neighbour value, with all rules considered the best values based on the test results. LFCC feature extraction with K-Nearest Neighbor (K-NN) classification can thus be implemented to detect whether a baby is crying, providing a solution for parents to monitor their children remotely only in certain conditions.
... The acoustic features of an infant's cry are of great importance in this field of signal processing. The acoustic signal contains valuable information about the infant's physical, emotional and psychological condition, such as health, identity, gender and emotions, according to Cohen and Lavner [6]. This information has been used in developing models of infant cries with supervised learning algorithms. ...
Conference Paper
Full-text available
Infants cry to express their emotional, psychological and physiological states. This research paper investigates whether cepstral and prosodic audio features are sufficient to classify infants' physiological states such as hunger, pain and discomfort. The dataset from our previous paper was used to train the classification algorithms. The results showed that the audio features could classify an infant's physiological state. We used three classification algorithms, Decision Tree (J48), Neural Network and Support Vector Machine, in developing the infant physiological model. To evaluate the performance of the model, Precision, Recall and F-measure were used as performance metrics. A comparison of the cepstral and prosodic audio features is presented in the paper. Our findings revealed that the Decision Tree and Multilayer Perceptron performed better for both cepstral and prosodic features. Notably, the cepstral features yielded better results than the prosodic features for the given dataset, with correctly classified instances ranging from 87.64% to 90.80% and an overall kappa statistic ranging from 0.47 to 0.64.
... Several works in the literature addressed this task [10], [11], and more recently machine learning methods have been proposed [2]- [4]. Among them, Cohen and Lavner [12] proposed an algorithm based on knearest neighbors to classify each frame as cry or non-cry for alerting parents when infants are being left alone in closed apartments or vehicles. Several acoustic features have been used, such as the fundamental frequency, mel-frequency cepstral coefficients (MFCCs) [13], among others. ...
Conference Paper
The amount of time an infant cries in a day helps the medical staff evaluate his/her health condition. Extracting this information requires a cry detection algorithm able to operate in environments with challenging acoustic conditions, since multiple noise sources, such as interfering cries, medical equipment, and persons, may be present. This paper proposes an algorithm for detecting infant cries in such environments. The proposed solution is a multiple-stage detection algorithm: the first stage is composed of an eight-channel filter-and-sum beamformer, followed by an Optimally Modified Log-Spectral Amplitude estimator (OMLSA) post-filter for reducing the effect of interferences. The second stage is the Deep Neural Network (DNN) based cry detector, which takes audio Log-Mel features as inputs. A synthetic dataset mimicking a real neonatal hospital scenario was created for training the network and evaluating the performance. Additionally, a dataset containing cries acquired in a real neonatology department was used for assessing the performance in a real scenario. The algorithm has been compared to a popular approach for voice activity detection based on Long-Term Spectral Divergence, and the results show that the proposed solution achieves superior detection performance on both synthetic and real data.
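In its simplest delay-and-sum form, the filter-and-sum beamformer of the first stage reduces to shifting each channel by a steering delay and averaging; below is a toy sketch with integer sample delays (the real system uses eight channels and per-channel filters):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer: shift each channel by its integer steering
    delay so the target source aligns across microphones, then average."""
    out = np.zeros(max(len(c) + d for c, d in zip(channels, delays)))
    for c, d in zip(channels, delays):
        out[d:d + len(c)] += c
    return out / len(channels)

# A source reaching mic 0 one sample earlier than mic 1
s = np.array([1.0, 2.0, 3.0])
ch0 = np.concatenate([s, [0.0]])
ch1 = np.concatenate([[0.0], s])
aligned = delay_and_sum([ch0, ch1], delays=[1, 0])
```

After alignment the target signal adds coherently while uncorrelated noise partially cancels, which is exactly the benefit the multi-channel front-end exploits.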
... In the literature, detection of cry signals is commonly performed by extracting features from recorded audio segments. These include pitch and formants, or other spectral features such as short-time energy, MFCCs and others [15]. In the second stage, the signal is typically classified using traditional algorithms such as nearest neighbors or support vector machines (SVMs) [16]. ...
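The second-stage nearest-neighbor classification can be illustrated with a tiny majority-vote k-NN over feature vectors; the 2-D features and labels below are toy data, not real cry features:

```python
import numpy as np

def knn_classify(features, labels, query, k=3):
    """Classify a feature vector by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    dist = np.linalg.norm(features - query, axis=1)
    nearest = labels[np.argsort(dist)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Toy 2-D features: one cluster labeled "non-cry", one labeled "cry"
feats = np.array([[0, 0], [0.5, 0.2], [0.1, 0.4], [5, 5], [5.2, 4.8], [4.9, 5.1]])
labels = np.array(["non-cry"] * 3 + ["cry"] * 3)
print(knn_classify(feats, labels, np.array([4.5, 5.0])))  # cry
```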
... Children with severe hearing or voice articulatory impairments show an obvious delay in language acquisition [9]. Apart from the groups at risk, analysing mood-related infant vocalisations also helps track variations in a child's daily state (e.g., comfort, pain level, environmental sensitivity) and assists caregivers in their judgement [10,11,12]. Despite the necessity of infant vocalisation analysis, the conventional analysis process is quite laborious and time-consuming, since human annotators or even linguistic experts are required to manually track the information of interest in day-long recordings [7]. ...
Conference Paper
Full-text available
Infant vocalisation analysis plays an important role in the study of the development of pre-speech capability in infants, and machine-based approaches are now emerging to advance such analysis. However, conventional machine learning techniques require heavy feature engineering and careful architecture design. In this paper, we present an evolving learning framework to automate the design of neural network structures for infant vocalisation analysis. In contrast to manual search by trial and error, we aim to automate the search process in a given space with less interference. This framework consists of a controller and its child networks, where the child networks are built according to the controller's estimates. Applying the framework to the Interspeech 2018 Computational Paralinguistics (ComParE) Crying Sub-challenge, we discover several deep recurrent neural network structures that deliver results competitive with the best ComParE baseline method.
... On the classifier side, baby cry studies (including our own previous research [17,18]) use only a handful of classifiers at a time [14,19]. ...
... Artificial Neural Networks (ANN), hybrid networks, statistical classifiers and others with respective predictive results [87][88][89][90][91][92][93][94][95][96][97][98][99][100]. In fact, there has been no agreement on which classifier is the most suitable for infant cry classification. ...
Article
Automatic infant cry classification is one of the crucial studies within the scope of biomedical engineering, adopting medical and engineering techniques to classify diverse physical and physiological conditions of infants from their cry signals. Numerous studies have been carried out and published, broadening the potential applications of cry analysis. As yet, there is no comprehensive literature survey conducted as a longitudinal study that captures the broad trend of automatic infant cry classification. A literature review was performed using the key words "infant cry" AND "automatic classification" across different online resources, regardless of the year of publication, in order to produce a comprehensive review; review papers were excluded. The search returned more than 300 papers, and after exclusions 101 papers were selected. This review reports an overview of recent advances and developments in the field of automated infant cry classification, focusing specifically on the infant cry databases developed and the approaches involved in the signal processing and recognition phases. The article concludes with possible implications that may guide the development of advanced automated cry-based classification systems for real-time applications.
Article
The ever-increasing global birth rate, the growing proportion of working women, and innovative technologies motivate the engineering community to introduce novel ideas for baby care. Considering this, we present a hardware implementation of an automatic cradle for smart and remote monitoring. In contrast to previous implementations, we present design insights such as swing angle, required force and torque, arc length and speed of the cradle for the first time in the literature. Furthermore, to make the cradle swing more comfortable and smooth, we introduce a novel slider-crank mechanism to control the cradle swing, also for the first time in the literature. This provides flexibility and improved energy efficiency while maintaining controlled and predictable reciprocating motion. The proposed solution is equipped with a number of technological features, namely cry-detection-based swinging, collision avoidance, WiFi-based remote video monitoring, wetness detection and a baby health monitoring system.
Chapter
Rehabilitation devices like exoskeletons may be useful for promoting walking in children with cerebral palsy (CP). However, exoskeletons can cause discomfort to the children. The contribution of this paper is to propose a protocol adapted to children that simulates the wearing of an exoskeleton, and to develop a stress detection model for healthy children. This is a first step towards discomfort detection in CP children. Fifteen healthy children participated in three test conditions (fear of falling, fear of novelty, physical discomfort) across two independent measurement campaigns (2021 and 2022). Six children from 2021 served as the training set for a machine learning model, and nine from 2022 constituted the test set. Only the fear-of-falling condition effectively induced stress in the children. Linear regression with two heart rate features was used to detect stress. The model achieved a balanced accuracy of 84.44% when applied to the test set.
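Balanced accuracy, the metric reported above, is simply the mean of sensitivity and specificity, which makes it robust to class imbalance; a minimal sketch with made-up counts:

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (TP rate) and specificity (TN rate)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Hypothetical confusion-matrix counts, purely for illustration:
print(balanced_accuracy(tp=8, fn=2, tn=9, fp=1))  # mean of 0.8 and 0.9
```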
Article
Audio signals are temporally structured data, and learning discriminative representations that contain temporal information is crucial for audio classification. In this work, we propose an audio representation learning method with a hierarchical pyramid structure called pyramidal temporal pooling (PTP), which aims to capture the temporal information of an entire audio sample. By stacking a global temporal pooling layer on multiple local temporal pooling layers, PTP can capture the high-level temporal dynamics of the input feature sequence in an unsupervised way. Furthermore, in the top global temporal pooling layer, we jointly optimize a learnable discriminative mapping (DM) and a softmax classifier, yielding a joint learning method for discriminative audio representations and the classifier, called DM-PTP. By treating the temporal encoding as a low-level constraint of a bi-level optimization problem, DM-PTP can produce discriminative representations while maintaining the temporal information of the whole sequence. For an audio sample of arbitrary duration, both PTP and DM-PTP can encode the input feature sequence of arbitrary length into a fixed-length representation. Without using any data augmentation or ensemble learning methods, both PTP and DM-PTP outperform the state-of-the-art CNNs on the audio event recognition (AER) dataset, and achieve comparable performance on the DCASE 2018 acoustic scene classification (ASC) dataset compared with the other best models in the challenge.
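The core idea of pooling a variable-length sequence at several temporal resolutions into one fixed-length vector can be sketched as follows; this uses mean pooling only and omits the paper's learnable mapping and classifier:

```python
import numpy as np

def pyramidal_temporal_pooling(seq, levels=(1, 2, 4)):
    """Pool a (T, d) feature sequence at several temporal resolutions
    (whole sequence, halves, quarters) and concatenate the segment means
    into one fixed-length vector of size d * sum(levels)."""
    pooled = []
    for n_seg in levels:
        for chunk in np.array_split(seq, n_seg, axis=0):
            pooled.append(chunk.mean(axis=0))
    return np.concatenate(pooled)

seq = np.arange(12, dtype=float).reshape(6, 2)   # T=6 frames, d=2 features
vec = pyramidal_temporal_pooling(seq)
```

Whatever the input length T, the output length is fixed, which is what makes the representation usable by a downstream classifier.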
Preprint
In this chapter, we compare deep learning and classical approaches for detection of baby cry sounds in various domestic environments under challenging signal-to-noise ratio conditions. Automatic cry detection has applications in commercial products (such as baby remote monitors) as well as in medical and psycho-social research. We design and evaluate several convolutional neural network (CNN) architectures for baby cry detection, and compare their performance to that of classical machine-learning approaches, such as logistic regression and support vector machines. In addition to feed-forward CNNs, we analyze the performance of recurrent neural network (RNN) architectures, which are able to capture temporal behavior of acoustic events. We show that by carefully designing CNN architectures with specialized non-symmetric kernels, better results are obtained compared to common CNN architectures.
Chapter
Crying is an infant behavior, part of the human behavioral system that protects the helpless neonate by eliciting others to meet its basic needs. It is one of the infant's means of communication and a positive sign of a healthy life. The reasons for an infant's cry include hunger, unhappiness, discomfort, sadness, stomach pain, colic, or other diseased conditions. The health of newborn babies can be effectively assessed by analysing the infant cry. Researchers have analysed infant cries using methods such as spectrography, the melody-shape method, and inverse filtering. This paper proposes a procedure to detect the cause of an infant's cry using feature extraction techniques, namely mel-frequency and linear predictive coding methods. A statistical tool is used to compare the efficiency of the two techniques (mel-frequency and linear predictive coding). The present work covers five causes of crying: colic, hunger, sadness, stomach pain, and unhappiness.
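Linear predictive coding, one of the two feature extraction techniques compared above, fits an all-pole model to the signal; below is a compact autocorrelation-method sketch using the Levinson-Durbin recursion (`lpc` is an illustrative helper, not the paper's implementation):

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method and Levinson-Durbin,
    returning a1..ap such that A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
        err *= 1.0 - k * k
    return a[1:]

# Sanity check on a synthetic AR(1) process x[t] = 0.9 x[t-1] + e[t]
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):
    x[t] = 0.9 * x[t - 1] + e[t]
a = lpc(x, order=2)
print(np.round(a, 2))  # first coefficient near -0.9, second near 0
```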
Article
Full-text available
Cry detection is an important facility in both residential and public environments and can answer different needs of both private and professional users. In this paper, we investigate the problem of cry detection in professional environments, such as Neonatal Intensive Care Units (NICUs). The aim of our work is to propose a cry detection method based on deep neural networks (DNNs) and to evaluate whether a properly designed synthetic dataset can replace data acquired in the field for training the DNN-based cry detector. In this way, a massive data collection campaign in NICUs can be avoided, and the cry detector can be easily retargeted to different NICUs. The paper presents different solutions based on single-channel and multi-channel DNNs. The experimental evaluation is conducted on a synthetic dataset created by simulating the acoustic scene of a real NICU, and on a real dataset containing audio acquired in the same NICU. The evaluation revealed that using real data in the training phase achieves the overall highest performance, with an Area Under the Precision-Recall Curve (PRC-AUC) of 87.28%, when signals are processed with a beamformer and a post-filter and a single-channel DNN is used. The same method, however, drops to 70.61% when training is performed on the synthetic dataset. By contrast, under the same conditions, the new single-channel architecture introduced in this paper achieves the highest performance with a PRC-AUC of 80.48%, proving that the acoustic-scene simulation strategy can be used to train a cry detection method with positive results.
Chapter
The infant cry is the only means of communication of a baby and carries information about its physical and mental state. Analysis of the acoustic infant cry waveform opens the possibility of extracting this information, which is useful in supporting the diagnosis of pathologies from the first days after birth.
Article
Our voices are distinct from one another's and they can define who we are. Here, we look at the development of children's voices during the pre-school years and how we can keep them limber.
Book
Full-text available
Pitch extraction (also called fundamental frequency estimation) has been a popular topic in many fields of research since the age of computers. Yet over some 50 years of study, current techniques are still not at the desired level of accuracy and robustness. When presented with a single clean pitched signal, most techniques do well, but when the signal is noisy, or when there are multiple pitch streams, many current pitch algorithms still fail to perform well. This report presents a discussion of the history of pitch detection techniques, as well as a survey of the current state of the art in pitch detection technology.
Conference Paper
Full-text available
We present results on applying a novel machine learning approach for learning auditory moods in natural environments [1] to the problem of detecting crying episodes in preschool classrooms. The resulting system achieved levels of performance approaching that of human coders and also significantly outperformed previous approaches to this problem [2].
Article
Full-text available
In this letter, we develop a robust voice activity detector (VAD) for application to variable-rate speech coding. The developed VAD employs the decision-directed parameter estimation method for the likelihood ratio test. In addition, we propose an effective hang-over scheme that considers previous observations through a first-order Markov process model of speech occurrences. According to our simulation results, the proposed VAD performs significantly better than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
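The hang-over idea, holding the "speech active" decision for a few frames after the last detection so that trailing low-energy speech is not clipped, can be illustrated with a much simpler energy-based VAD; the statistical likelihood-ratio test of the letter is replaced here by a crude relative energy threshold:

```python
import numpy as np

def energy_vad(x, frame_len=256, threshold_db=-30.0, hangover=3):
    """Frame energies relative to the loudest frame, thresholded, with a
    hang-over that keeps the decision active for a few trailing frames."""
    n = len(x) // frame_len
    frames = x[:n * frame_len].reshape(n, frame_len)
    e_db = 10 * np.log10((frames ** 2).mean(axis=1) + 1e-12)
    raw = e_db > e_db.max() + threshold_db
    decisions, hold = [], 0
    for active in raw:
        if active:
            hold = hangover
            decisions.append(True)
        elif hold > 0:
            hold -= 1
            decisions.append(True)   # held active by the hang-over
        else:
            decisions.append(False)
    return np.array(decisions)

# Silence, a short burst, then silence again (frames of 4 samples)
x = np.concatenate([np.zeros(8), np.tile([1.0, -1.0], 4), np.zeros(16)])
print(energy_vad(x, frame_len=4, hangover=2))  # two trailing frames stay active
```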
Conference Paper
This paper reviews some of the significant works on infant cry signal analysis proposed in the past two decades and the recent progress in this field. A baby's cry cannot be interpreted accurately: it is very hard to identify what the baby is crying for. Experienced parents and specialists in child care, such as pediatricians and pediatric nurses, can distinguish different sorts of cries using only their auditory perception. This is a totally subjective evaluation and is not suitable for clinical use. Non-invasive methods have been widely used in infant cry signal analysis and have shown very promising results. Various feature extraction and classification algorithms used in infant cry analysis are briefly described. This review gives an insight into the current state-of-the-art work in infant cry signal analysis and concludes with thoughts about future directions for better representation and interpretation of infant cry signals.
Article
From an investigation of statistical model-based voice activity detection (VAD), it is found that a simple heuristic, such as a geometric mean, has typically been adopted as the decision rule for the likelihood ratio (LR) test. For successful VAD operation, the authors first review the behaviour of the support vector machine (SVM) and then propose a novel technique that employs the decision function of an SVM over the LRs, whereas conventional techniques compare the geometric mean of the LRs against a given threshold. The proposed SVM-based VAD is compared with the conventional statistical model-based scheme and shows better performance in various noise environments.
Article
The acoustic feedback problem has intrigued researchers over the past five decades, and a multitude of solutions has been proposed. In this survey paper, we aim to provide an overview of the state of the art in acoustic feedback control, to report results of a comparative evaluation with a selection of existing methods, and to cast a glance at the challenges for future research.
Conference Paper
Human biological signals convey precious information about the physiological and neurological state of the body. Crying is a vocal signal through which babies communicate their needs to their parents, who should then satisfy those needs properly. Most research dealing with infant cries aims mainly to establish a relationship between the acoustic properties of a cry and the state of the baby, such as hunger, pain, illness and discomfort. In this work, we are interested in recognizing individual babies solely by analyzing their cries, through an automatic analysis and recognition system using a real cry database.
Article
A new method of pitch determination, similar to the cepstrum except that both the time signal and the log power spectrum are infinitely peak-clipped before spectrum analysis, has been simulated on a digital computer. Although this new method itself, called the "clipstrum," is inferior to cepstrum pitch determination, the clipstrum of center-clipped speech performs surprisingly well and is sometimes superior to the cepstrum. The clipping in the clipstrum might offer some advantages over cepstrum analysis in certain digital hardware implementations, since multiplications could be replaced with additions or subtractions.
Article
The cepstrum, defined as the power spectrum of the logarithm of the power spectrum, has a strong peak corresponding to the pitch period of the voiced-speech segment being analyzed. Cepstra were calculated on a digital computer and were automatically plotted on microfilm. Algorithms were developed heuristically for picking those peaks corresponding to voiced-speech segments and the vocal pitch periods. This information was then used to derive the excitation for a computer-simulated channel vocoder. The pitch quality of the vocoded speech was judged by experienced listeners in informal comparison tests to be indistinguishable from that of the original speech.
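Cepstrum pitch determination as described can be sketched in a few lines: take the inverse FFT of the log magnitude spectrum and pick the peak quefrency, whose lag equals the pitch period. The search band (here the 250-600 Hz infant pitch range) and the spectral floor regularization are illustrative choices:

```python
import numpy as np

def cepstral_pitch(x, fs, fmin=250.0, fmax=600.0):
    """Pitch via the real cepstrum (IFFT of the log magnitude spectrum):
    the peak quefrency in the search band gives the pitch period."""
    spectrum = np.abs(np.fft.rfft(x))
    floor = 1e-3 * spectrum.max()            # regularize near-zero bins
    cepstrum = np.fft.irfft(np.log(spectrum + floor))
    lo, hi = int(fs / fmax), int(fs / fmin)  # quefrency (lag) search band
    q = lo + int(np.argmax(cepstrum[lo:hi + 1]))
    return fs / q

# A harmonic-rich 400 Hz tone: the log spectrum ripples with period 400 Hz,
# so the cepstrum peaks at a quefrency of 1/400 s (20 samples at 8 kHz).
fs = 8000
t = np.arange(4000) / fs
x = sum(np.sin(2 * np.pi * 400 * k * t) / k for k in range(1, 10))
print(round(cepstral_pitch(x, fs)))  # 400
```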
Article
Infant crying signals distress to potential caretakers who can alleviate the aversive conditions that gave rise to the cry. The cry signal results from coordination among several brain regions that control respiration and vocal cord vibration from which the cry sounds are produced. Previous work has shown a relationship between acoustic characteristics of the cry and diagnoses related to neurological damage, SIDS, prematurity, medical conditions, and substance exposure during pregnancy. Thus, assessment of infant cry provides a window into the neurological and medical status of the infant. Assessment of infant cry is brief and noninvasive and requires recording equipment and a standardized stimulus to elicit a pain cry. The typical protocol involves 30 seconds of crying from a single application of the stimulus. The recorded cry is submitted to an automated computer analysis system that digitizes the cry and either presents a digital spectrogram of the cry or calculates measures of cry characteristics. The most common interpretation of cry measures is based on deviations from typical cry characteristics. Another approach evaluates the pattern across cry characteristics suggesting arousal or under-arousal or difficult temperament. Infants with abnormal cries should be referred for a full neurological evaluation. The second function of crying--to elicit caretaking--involves parent perception of the infant's needs. Typically, parents are sensitive to deviations in cry characteristics, but their perception can be altered by factors in themselves (e.g., depression) or in the context (e.g., culture). The potential for cry assessment is largely untapped. Infant crying and parental response is the first language of the new dyadic relationship. Deviations in the signal and/or misunderstanding the message can compromise infant care, parental effectiveness, and undermine the budding relationship. (c) 2005 Wiley-Liss, Inc. MRDD Research Reviews 2005;11:83-93.
Article
As the speech of a normal-hearing person and a deaf person differ, the author expects differences between the crying sounds of normal-hearing and hard-of-hearing infants as well. In this study the author determined, by computerized algorithms, the melody of 2762 crying sounds from 316 infants, and compared the results between infants with hearing disorders and infants with normal hearing. The analysis of the crying sounds aims at a new, cheaper hearing screening method, which would offer new potential for the early detection of hearing disorders. All the applied steps were developed as automatic, computer-executed methods providing reproducible, objective results, in contrast to some previous studies, which applied manual methods and reached subjective results. Several possible ways of digitally processing the infant cry are discussed. A novel melody-shape classification system was created to obtain a more precise distribution of the melodies by their shapes. The system determined 77 different categories, of which the first 20 covered 95% of the melodies. The applied methods were created and tested on a huge number of melodies.
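A melody-shape classifier far cruder than the paper's 77-category system, but illustrating the idea of labeling a pitch contour by its shape, could look like this (the tolerance and the three labels are arbitrary assumptions):

```python
import numpy as np

def melody_shape(f0_contour, flat_tol=5.0):
    """Crude melody-shape label for a pitch contour (Hz per frame):
    the sign of the fitted line's total drift gives rising/falling/flat."""
    t = np.arange(len(f0_contour))
    slope = np.polyfit(t, f0_contour, 1)[0] * len(f0_contour)  # drift in Hz
    if slope > flat_tol:
        return "rising"
    if slope < -flat_tol:
        return "falling"
    return "flat"

# A contour gliding from 450 Hz down to 380 Hz
print(melody_shape(np.linspace(450, 380, 20)))  # falling
```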
Conference Paper
This work presents the development of an automatic infant cry recognition system, with the objective of classifying two types of cry: normal cries and pathological cries from deaf babies. In this study, we used acoustic features obtained by the mel-frequency cepstrum technique and, as a classifier, a feedforward neural network trained with several learning methods, among which the scaled conjugate gradient algorithm performed best. Current results are shown, which, at the moment, are very encouraging, with an accuracy of up to 97.43%.