Conference Paper

Infant cry analysis and detection


Abstract

In this paper, we propose an algorithm for the automatic detection of an infant cry. A particular application of this algorithm is the identification of physical danger to babies, such as situations in which parents leave their children in vehicles. The proposed algorithm is based on two main stages. The first stage involves feature extraction, in which pitch-related parameters, MFC (mel-frequency cepstrum) coefficients, and short-time energy parameters are extracted from the signal. In the second stage, the signal is classified using the k-NN algorithm and is later verified as a cry signal based on the pitch and harmonics information. In order to evaluate the performance of the algorithm in real-world scenarios, we checked its robustness in the presence of several types of noise, especially noises such as car horns and car engines that are likely to be present in vehicles. In addition, we addressed real-time and low-complexity demands during the development of the algorithm. In particular, we used a voice activity detector, which disabled the operation of the algorithm when voice activity was not present. A database of baby cry signals was used for performance evaluation. The results showed good performance of the proposed algorithm, even at low SNR.
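The two-stage structure described in the abstract, frame-level feature extraction followed by k-NN classification, can be illustrated with a minimal Python sketch. This is a toy under assumed parameters: the frame length, hop size, k, and the single energy feature are illustrative choices, not the authors' configuration.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a sample list into (possibly overlapping) frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def knn_classify(feature_vec, train_set, k=3):
    """Majority vote among the k nearest training examples.

    train_set is a list of (feature_vector, label) pairs;
    distance is plain Euclidean."""
    neighbors = sorted((math.dist(feature_vec, f), lbl) for f, lbl in train_set)
    votes = [lbl for _, lbl in neighbors[:k]]
    return max(set(votes), key=votes.count)
```

In the paper's setting, each frame would yield a richer feature vector (pitch parameters, MFCCs, short-time energy) before the k-NN vote, and a verification stage based on pitch and harmonics would follow.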


... Few studies have been conducted specifically on the automatic segmentation of cry signals [14][15][16][17]. Two novel algorithms were introduced by modifying the Harmonic Product Spectrum (HPS) method [14]. ...
... The authors showed that it is possible to check the regularity structure of the spectrum using the HPS method and to classify its content by detecting the meaningful parts of the cry sounds. Another study on the segmentation of cry signals was conducted by Cohen [16] with the purpose of labeling each successive segment as cry/non-cry/non-activity. However, with the methods presented in [16], the inspiration parts as well as the dysphonic vocalizations of the cry spectrum, which may exhibit an irregular or non-harmonic structure, were ignored. ...
Article
An analysis of newborn cry signals, either for the early diagnosis of neonatal health problems or to determine the category of a cry (e.g., pain, discomfort, birth cry, and fear), requires a primary and preliminary preprocessing step to quantify the important expiratory and inspiratory parts of the audio recordings of newborn cries. Data typically contain clean cries interspersed with sections of other sounds (generally, the sounds of speech, noise, or medical equipment) or silence. The purpose of signal segmentation is to differentiate the important acoustic parts of the cry recordings from the unimportant acoustic activities that compose the audio signals. This paper reports on our research to establish an automatic segmentation system for newborn cry recordings based on Hidden Markov Models using the HTK (Hidden Markov Model Toolkit). The system presented in this report is able to detect the two basic constituents of a cry, which are the audible expiratory and inspiratory parts, using a two-stage recognition architecture. The system is trained and tested on a real database collected from normal and pathological newborns. The experimental results indicate that the system yields accuracies of up to 83.79%.
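The HMM-based segmentation above assigns a label to each frame by decoding the most likely state sequence, and the core of such decoding is the Viterbi algorithm. Below is a generic log-domain sketch for a small HMM; it is not the HTK configuration used in the paper, and the state meanings and probabilities are placeholders.

```python
import math

def viterbi(obs_loglik, log_trans, log_init):
    """Most likely state path through a small HMM.

    obs_loglik[t][s]: log-likelihood of frame t under state s
    log_trans[p][s]:  log transition probability p -> s
    log_init[s]:      log initial probability of state s"""
    n = len(log_init)
    score = [log_init[s] + obs_loglik[0][s] for s in range(n)]
    backptrs = []
    for t in range(1, len(obs_loglik)):
        new_score, ptr = [], []
        for s in range(n):
            p = max(range(n), key=lambda q: score[q] + log_trans[q][s])
            new_score.append(score[p] + log_trans[p][s] + obs_loglik[t][s])
            ptr.append(p)
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With two states standing in for expiration and inspiration, sticky transitions and clear per-frame likelihoods recover the expected segmentation.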
... VAD also faces the challenge of separating cry from noise. Pan et al. use it to detect the presence or absence of a baby cry in a noisy environment to improve the overall baby cry recognition rate [56], and it is used to detect the sections of the audio with sufficient audio activity [57]. In [41], the authors implemented a basic VAD algorithm, which uses short-time features of audio frames and a decision strategy for determining sound and silence frames. ...
... It is a cepstral representation of the audio signals. Researchers use it to test proposed approaches [17,29,49,52,57,[60][61][62] and often use it for baseline experiments [13,15,22,31,37,63]. Liu et al. used MFCC along with two other cepstral features Linear Prediction Cepstral Coefficients (LPCC) and Bark Frequency Cepstral Coefficients (BFCC) for infant cry reason classification. ...
... Since the amplitude of an audio signal varies with time, the short-time energy can serve to differentiate voiced and unvoiced segments. It is used in [20,57,70] for infant cry detection and classification. Torres et al. used a voiced-unvoiced counter, which counts all frames having significant periodic content, as one of the features for cry detection [27]. ...
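A minimal energy-threshold VAD of the kind referenced in these excerpts can be sketched as follows; the threshold value is a placeholder that would need tuning to the recording conditions.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def simple_vad(frames, threshold=0.01):
    """Label each frame active (True) or silent (False) by its energy."""
    return [short_time_energy(f) > threshold for f in frames]
```

Real systems typically add hangover smoothing and an adaptive noise-floor estimate on top of this bare decision rule.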
Article
Full-text available
This paper reviews recent research in infant cry signal analysis and classification tasks. A broad range of literature is reviewed, mainly from the aspects of data acquisition, cross-domain signal processing techniques, and machine learning classification methods. We introduce pre-processing approaches and describe a diversity of features, such as MFCC, spectrogram, and fundamental frequency. Both acoustic features and prosodic features extracted from different domains can discriminate frame-based signals from one another and can be used to train machine learning classifiers. Together with traditional machine learning classifiers such as KNN, SVM, and GMM, newly developed neural network architectures such as CNN and RNN are applied in infant cry research. We present some significant experimental results on pathological cry identification, cry reason classification, and cry sound detection with some typical databases. This survey systematically studies the previous research in all relevant areas of infant cry and provides insight into the current cutting-edge works in infant cry signal analysis and classification. We also propose future research directions in data processing, feature extraction, and neural network classification to better understand, interpret, and process infant cry signals.
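Several of the surveyed features (MFCC, mel spectrogram) are built on the mel frequency scale; the widely used Hz-to-mel conversion and its inverse are short enough to state directly.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (the common 2595*log10(1 + f/700) formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

The scale is roughly linear below 1 kHz and logarithmic above, mirroring perceived pitch distance.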
... There is a limited number of stress detection studies directly conducted for infants or children. Most of them utilized audio signals collected from children for crying detection [2][3][4][5], which are relatively easy to obtain. Using only audio signals for monitoring a child is not practical, as distinguishing the voice or sound of a child in real circumstances, where many children are gathered in one place, is almost impossible. ...
... Only a few studies have been conducted for infants or children. Audio signal collected from children has mainly been used in previous studies [2][3][4][5], attributed to the ease of data collection. Most studies adopted machine learning methods such as k-nearest neighbor [4] and hidden Markov model [3], since they can learn stress detection classifiers from data composed of multiple features without using explicitly defined rules or indices. ...
Article
Full-text available
The safety of children has always been an important issue, and several studies have been conducted to determine the stress state of a child to ensure their safety. Audio signals and biological signals, including heart rate, are known to be effective for stress state detection. However, collecting those data requires specialized equipment, which is not appropriate for the constant monitoring of children, and advanced data analysis is required for accurate detection. In this regard, we propose a stress state detection framework which utilizes both audio signals and heart rate collected from wearable devices, and we adopted machine learning methods for the detection. Experiments using real-world data were conducted to compare detection performance across various machine learning methods and noise levels of the audio signal. Adopting the proposed framework in the real world will contribute to the enhancement of child safety.
... The crying of an infant is a common phenomenon and is probably one of the most difficult problems that babysitters have to face when taking care of a baby. Currently, there are many monitoring solutions to detect the crying of infants, such as wireless video camera systems and wireless audio microphone systems [1], [2], [3]. Among them, the most preferred solution is the wireless audio microphone system designed by Lavner [3]. ...
... To solve this problem, Rami [2] proposed a sound-based infant crying detection system which first introduced the method of sound event detection (SED) for detecting infant crying. In this design, microphones are used to collect infants' sounds, and a KNN classifier on the server is used to identify infant crying. ...
Chapter
Full-text available
Infant crying is a major challenge in baby care at home. Without an effective monitoring technology, a babysitter may need to stay with the baby all day long. One solution is to design an intelligent system which is able to detect the sound of infant crying automatically. For this purpose, we present a novel infant crying detection system (AICDS in short), designed in a client-server framework. On the client side, a commercially available robot prototype is installed beside the baby carriage; it is equipped with a small microphone array to capture sound signals and transmit them to the cloud server with a Wi-Fi module. On the cloud server side, a lightweight convolutional neural network model is proposed to identify infant crying or non-infant-crying events. Experiments show that our AICDS achieves 86% infant crying detection accuracy, which is valuable for reducing the workload of babysitters.
... The audio signal was divided into sections of 100 milliseconds and a set of audio features, known to distinguish between different types of audio signals, was computed from each segment. The features include the Mel-Frequency Cepstrum coefficients (Quatieri, 2002), the fundamental frequency of the signal (pitch), harmonicity factor (Cohen & Lavner, 2012), harmonic-to-average power ratio (Cohen & Lavner, 2012), short-time energy, and zero-crossing rate. These features provide temporal and frequency measures that are useful for the detection of cry signals. ...
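Of the features listed in this excerpt, the zero-crossing rate has the simplest definition; a sketch over a frame given as a list of samples:

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    High values are typical of noise-like (unvoiced) content,
    low values of harmonic (voiced) content such as cry vowels."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```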
Preprint
Full-text available
Psychological science is in a transitional period: Many findings do not replicate and theories appear not as robust as previously presumed. We suspect that a main reason for theories not appearing as robust is because they are too simple. In this paper, we provide an important step towards this transition in the field of interpersonal relationship research by 1) providing an overarching theoretical framework grounded in existing relationship science, and 2) outlining a novel approach - mobile social physiology – that relies on intelligent technologies like wearable sensors, actuators, and modern analytical methods. At the core of our theoretical principles is co-regulation (one partner’s [statistical] co-dependency on the other partner). Co-regulation has long existed in the literature, but has to date been largely untested. To test the outlined principles, we 3) present a newly programmed app – the Bio-App for Bonding (available on GitHub: https://github.com/co-relab/bioapp). By providing a paradigm shift for relationship research, the field can not only increase the accuracy of measurement and the generalizability of findings, it also allows for moving from the lab to real life situations. We discuss how the mobile social physiology approach is rooted in existing theoretical principles (e.g., Social Baseline and Attachment Theory), extends the concept of co-regulation to allow for specific measurements, and provides a research agenda to develop a model of interpersonal relationships that we hope will stand the test of time.
... With the development of machine learning algorithms, k-nearest neighbor (kNN), support vector machine (SVM), random forest (RF), and neural network (NN) [7][8][9][10] have been applied to acoustic event recognition. An automatic system was developed to detect infant cries in cars [11], classified by kNN with high real-time performance and low complexity. Problematically, k-NN has high requirements on data and performs well only if the data samples are similar to each other. ...
... There has been a great volume of work on AED for many years, using a myriad of techniques and features. The work in [11] used pitch parameters, STE, and MFCCs as features to automatically identify and detect baby cries, which proved the effectiveness of the method. However, STE can only provide a representation of the change in amplitude, and cannot characterize the non-stationary property of the acoustic signal. ...
Article
Full-text available
This paper presents a feature extraction approach for surveillance system aimed at achieving the automatic detection and recognition of public security events. The proposed approach first generates a Gabor dictionary based on the human auditory critical frequency bands, and then uses the orthogonal matching pursuit (OMP) algorithm to sparse abnormal audio signal. We select the optimal several important atoms from the Gabor dictionary and extract the scale, frequency, and translation parameters of the atoms to form the OMP feature. The performance of OMP feature is compared with traditional acoustic features and their joint features, using support vector machine (SVM) and random forest (RF) classifiers. Experiments have been performed to evaluate the effectiveness of the OMP feature for supplementing traditional acoustic features. The results show the superior performance classifier for abnormal acoustic event detection (AAED) is RF. Furthermore, the introduction of the combined features addresses the problems of low recognition accuracy and poor robustness for the surveillance system in practical applications.
... LFCC uses linear-frequency cepstral coefficients instead of MFCC as a short-time feature. LFCC captures the lower as well as the higher frequency characteristics more effectively than MFCC [2]. Also, mel-frequency cepstral coefficients (MFCCs) and short-time energy have been used to build a noise-robust crying detection system [3]. Motivated by this, we use LFCC for stronger performance than MFCC. We expect that, by capturing more spectral detail in the high-frequency region, the linear frequency scale provides some advantages over the mel scale in speaker recognition. ...
Conference Paper
Full-text available
In this paper, we mainly focus on the automatic classification of infant cries. For this implementation we use LFCC for feature extraction and a VQ codebook, trained with the LBG algorithm, for matching samples. The newborn crying samples were collected from various crying babies aged 0-6 months. There are 27 babies' sounds as training data: 7 hungry infant cries, 4 sleepy infant cries, 10 in-pain infant cries, and 6 uncomfortable infant cries. The testing data is one of the training newborn crying samples. The detection of infant cries is based on the minimum Euclidean distance. The cries are classified into four classes: neh for hunger, owh for sleepiness, heh for discomfort, and eair for lower gas. For classification of the cry, our system is divided into two phases. First is the training phase, in which LFCC is used for feature extraction, and then VQ codebooks are generated to compress the feature vectors. Second is the testing phase, in which feature extraction and codebook generation of samples are repeated. Here, the codebook pattern of each sample is compared with all the existing patterns in the database based on the Euclidean distance between them. LFCC captures the lower as well as the higher frequency characteristics more effectively than MFCC; hence we obtain better results than with MFCC.
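The matching step described above, minimum Euclidean distance between a sample's features and the stored codebook patterns, reduces to a nearest-codeword search. A sketch, where the codebook contents are placeholders rather than trained LBG output:

```python
import math

def nearest_codeword(feature_vec, codebook):
    """Index of the codebook entry closest to feature_vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(feature_vec, codebook[i]))
```

Classification then amounts to mapping the winning index back to its class label (neh, owh, heh, eair in the paper's scheme).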
... Children with severe hearing or voice articulatory impairment show an obvious delay in language acquisition capability [9]. Apart from the groups at risk, analysing mood-related infant vocalisation also helps track the variation in children's daily state (e.g., comfort, pain degree, environment sensitivity) and assists caregivers in their judgement [10,11,12]. Despite the necessity of infant vocalisation analysis, the conventional analysis process is quite laborious and time-consuming, since human annotators or even linguistic experts are required to manually track the information of interest from day-long recordings [7]. ...
Conference Paper
Full-text available
Infant vocalisation analysis plays an important role in the study of the development of pre-speech capability of infants, while machine-based approaches nowadays emerge with an aim to advance such an analysis. However, conventional machine learning techniques require heavy feature-engineering and refined architecture designing. In this paper, we present an evolving learning framework to automate the design of neural network structures for infant vocalisation analysis. In contrast to manually searching by trial and error, we aim to automate the search process in a given space with less interference. This framework consists of a controller and its child networks, where the child networks are built according to the controller’s estimation. When applying the framework to the Interspeech 2018 Computational Paralinguistics (ComParE) Crying Subchallenge, we discover several deep recurrent neural network structures, which are able to deliver competitive results to the best ComParE baseline method.
... The MFCC and LPCC, which are the spectral features have been widely applied in the field of automatic speech recognition (ASR) since the mid-eighties. In addition, MFCC and LPCC have been proven to be the appropriate representations of infant cry signals [5], [26]. Figure 2 illustrates the extraction process of MFCC and LPCC features. ...
Article
Crying is the only way of communication for infants to express their physical and emotional needs. Automatic infant cry analysis that provides fast and non-invasive process is suitable to assess the physical and emotional states of infants. The cry analysis provides an opportunity to understand infants' needs. It is also beneficial in clinical environment for identifying specific pathologies through infant cry. This paper presents an automatic infant cry classification system for a multiclass problem. The cry classification system consists of three stages: (1) feature extraction, (2) feature selection, and (3) pattern classification. We extracted spectral features, such as Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC) to represent the acoustic characteristics of the cry signals. In addition, the combination of spectral and dynamic features was also investigated. Due to the high dimensionality of data resulting from the feature extraction stage, we selected relevant features to perform feature selection to reduce the data dimensionality. In this stage, five different feature selection techniques were experimented. In the pattern classification stage, two Artificial Neural Network (ANN) architectures: Multilayer Perceptron (MLP) and Radial Basis Function Network (RBFN) were used for classifying the cry signals into five categories: asphyxia, pain, hunger, deaf, and normal. Experimental results show that the best classification accuracy of 93.43% (Kappa value of 0.91) was obtained from MFCC + ΔMFCC + ΔΔMFCC feature set, when using CFS selection technique and RBFN.
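The dynamic features in the best-performing set (ΔMFCC, ΔΔMFCC) are conventionally computed with a regression over neighboring frames. A sketch of the common HTK-style formula; the window size N and edge padding by repetition are typical conventions, not details taken from this paper:

```python
def delta_features(frames, N=2):
    """Regression deltas: d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2*sum_n n^2).

    frames is a list of per-frame coefficient lists; applying the function
    twice yields delta-delta (acceleration) coefficients."""
    T, dim = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = [0.0] * dim
        for n in range(1, N + 1):
            ahead = frames[min(t + n, T - 1)]   # repeat last frame at the end
            behind = frames[max(t - n, 0)]      # repeat first frame at the start
            for i in range(dim):
                d[i] += n * (ahead[i] - behind[i]) / denom
        out.append(d)
    return out
```

A constant coefficient track yields all-zero deltas, which is a quick sanity check on the implementation.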
... The acoustic features of an infant's cry are of great importance in this field of signal processing. This acoustic signal contains valuable information about the infant's physical, emotional and psychological condition, such as health, identity, gender and emotions, according to Cohen and Lavner [6]. This information was used in developing models for infant cries using supervised learning algorithms. ...
Conference Paper
Full-text available
Infants cry to express their emotional, psychological and physiological states. This research paper investigates whether cepstral and prosodic audio features are enough to classify infants' physiological states such as hunger, pain and discomfort. The dataset from our previous paper was used to train the classification algorithms. The results showed that the audio features could classify an infant's physiological state. We used three classification algorithms, Decision Tree (J48), Neural Network and Support Vector Machine, in developing the infant physiological model. To evaluate the performance of the infant physiological state model, Precision, Recall and F-measure were used as performance metrics. A comparison of the cepstral and prosodic audio features is presented in the paper. Our findings revealed that Decision Tree and Multilayer Perceptron performed better for both cepstral and prosodic features. It is noted that the cepstral features yielded better results than the prosodic features for the given dataset, with correctly classified instances ranging from 87.64% to 90.80% and an overall kappa statistic ranging from 0.47 to 0.64.
... On the classifier side, baby cry studies (including our own previous research [17,18]) use only a handful of classifiers at a time [14,19]. ...
... A baby's cry can be characterized according to its natural periodic tone and the change of voice. It has a base frequency (pitch) in the range of 250 Hz to 600 Hz [3]. This study of sound recognition has two main processes: the first is feature extraction and the second is classification, or determining the sound pattern [4][5][6][7]. ...
Conference Paper
Full-text available
Cry is a form of communication for children to express their feelings. A baby's cry can be characterized according to its natural periodic tone and the change of voice. It has a base frequency (pitch) in the range of 250 Hz to 600 Hz. Through cry detection, parents can monitor their baby remotely and be alerted only in important conditions. This study of sound recognition has two main processes: the first is feature extraction and the second is classification, or determining the sound pattern. In the Linear Frequency Cepstral Coefficient (LFCC) method, the effects of changes in pre-emphasis, the number of filter banks, and the number of cepstral coefficients are analyzed. The number of filter banks applied must be greater than the number of cepstral coefficients applied. The cepstral value is adjusted to obtain better accuracy. The highest accuracy is 90%, achieved when the system uses 8 as the cepstral value and 3 as the nearest-neighbor value, with all rules considered the best values based on the test results. The use of LFCC as the feature extraction method and K-Nearest Neighbor (K-NN) classification can be implemented to detect whether the baby is crying or not, so that it can be applied as a solution for parents to monitor their children remotely only in certain conditions.
... Artificial Neural Networks (ANN), hybrid networks, statistical classifiers and others with respective predictive results [87][88][89][90][91][92][93][94][95][96][97][98][99][100]. In fact, there has been no agreement on which classifier is the most suitable for infant cry classification. ...
Article
Automatic infant cry classification is one of the crucial studies within the scope of biomedical engineering, adopting medical and engineering techniques for the classification of diverse physical and physiological conditions of infants by their cry signals. Subsequently, plentiful studies have been executed and published, broadening the potential applications of cry analysis. As yet, there is no definitive literature review composed by performing a longitudinal study emphasizing the broad trend of automatic classification of infant cries. A review of the literature was performed using the key words "infant cry" AND "automatic classification" in different online resources, regardless of the year of publication, in order to produce a comprehensive review. Review papers were excluded. The search reported more than 300 papers, and after some exclusions, 101 papers were selected. This review endeavors to report an overview of recent advances and developments in the field of automated infant cry classification, specifically focusing on the developed infant cry databases and the approaches involved in the signal processing and recognition phases. Eventually, this article concludes with some possible implications which may lead to the development of advanced automated cry-based classification systems for real-time applications.
... In the literature, detection of cry signals is commonly performed through the extraction of features from recorded audio segments. These include pitch and formants or other spectral features such as short-time energy, MFCCs and others [15]. In the second stage, the signal is mainly classified using traditional algorithms such as nearest neighbor or support vector machines (SVM) [16]. ...
... More recent works are based on machine-learning methods that learn to identify cry signals directly from data. Mel-frequency cepstral coefficients (MFCCs) and k-nearest neighbors have been used in [16] to classify cry and non-cry units and to alert parents when infants are being left alone (either in apartments or vehicles). Abou-Abbas et al. [17] proposed Hidden Markov Models (HMMs) to detect and classify the inspiratory and expiratory phases of the cry. ...
... The authors showed that, using these methods, it is in fact possible to classify the spectral structure of a given signal, thereby detecting the voiced cry parts among other acoustic activities. Another study [13], focusing specifically on the segmentation of cry signals, was conducted in 2012. The goal was to label each segment as cry/non-cry/non-activity. ...
Conference Paper
Full-text available
This paper proposes a method for the segmentation of newborn's cry signals recorded in real conditions using the Teager-Kaiser energy operator (TKEO). Based on the wavelet packet analysis, the audio signals are divided into different frequency channels, and then the TKEO and the energy are estimated within each band. The Hidden Markov Models have been used as a classification tool to distinguish the voiced cry parts from the irrelevant acoustic activities that compose the audio signals. The proposed method divided the audio signal containing newborns' cry sounds into different periods showing the audible Expiration and Inspiration of the cry. Different levels of wavelet packet transform have been used to verify the performance of the proposed method on crying signals segmentation and have shown that based on wavelet packet decomposition, the TKEO measure is more effective than the traditional energy measure in detecting important parts of cry signal in a very noisy environment. The proposed features have shown to achieve an accuracy rate of 84.08 %.
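The discrete Teager-Kaiser energy operator used above has a compact form, ψ[n] = x[n]² − x[n−1]·x[n+1]. A direct sketch; the full method additionally applies it per wavelet-packet band, which is omitted here:

```python
import math

def teager_kaiser_energy(x):
    """Discrete TKEO: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    Defined for interior samples only, so the output is two samples
    shorter than the input. Unlike squared amplitude, it reflects
    both the amplitude and the instantaneous frequency of the signal."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For a pure sinusoid A·sin(ωn), the operator returns the constant A²·sin²(ω) exactly, which makes it easy to verify.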
... Several works in the literature addressed this task [10], [11], and more recently machine learning methods have been proposed [2]- [4]. Among them, Cohen and Lavner [12] proposed an algorithm based on knearest neighbors to classify each frame as cry or non-cry for alerting parents when infants are being left alone in closed apartments or vehicles. Several acoustic features have been used, such as the fundamental frequency, mel-frequency cepstral coefficients (MFCCs) [13], among others. ...
Conference Paper
The amount of time an infant cries in a day helps the medical staff in the evaluation of his/her health conditions. Extracting this information requires a cry detection algorithm able to operate in environments with challenging acoustic conditions, since multiple noise sources, such as interfering cries, medical equipment, and persons, may be present. This paper proposes an algorithm for detecting infant cries in such environments. The proposed solution is a multiple-stage detection algorithm: the first stage is composed of an eight-channel filter-and-sum beamformer, followed by an Optimally Modified Log-Spectral Amplitude estimator (OMLSA) post-filter for reducing the effect of interferences. The second stage is the Deep Neural Network (DNN) based cry detector, having audio Log-Mel features as inputs. A synthetic dataset mimicking a real neonatal hospital scenario has been created for training the network and evaluating the performance. Additionally, a dataset containing cries acquired in a real neonatology department has been used for assessing the performance in a real scenario. The algorithm has been compared to a popular approach for voice activity detection based on Long-Term Spectral Divergence, and the results show that the proposed solution achieves superior detection performance both on synthetic data and on real data.
Conference Paper
Baby cry sound detection allows parents to be automatically alerted when their baby is crying. Current solutions in the home environment call for a client-server architecture where an end-node device streams the audio to a centralized server in charge of the detection. Even though they provide the best performance, these solutions raise power consumption and privacy issues. For these reasons, interest has recently grown in the community in methods which can run locally on battery-powered devices. This work presents a new set of features tailored to baby cry sound recognition, called hand-crafted baby cry (HCBC) features. The proposed method is compared with a baseline using mel-frequency cepstrum coefficients (MFCCs) and a state-of-the-art convolutional neural network (CNN) system. HCBC features prove to be on par with the CNN, while requiring less computation effort and memory space at the cost of being application specific.
Conference Paper
Speech processing techniques help improve real-time communication between human and human, human and machine, and machine and machine. If these techniques are integrated with robots, their physical flexibility and wider reach can enable a wide range of real-time applications. In this paper, we propose an 'Intelligent Cry Detection Robotic System' (ICDRS) for real-time monitoring of child-beating in classrooms, in order to facilitate the prevention of child abuse prevalent in this form. The proposed system has two major modules: the 'Cry-Detection System' (CDS) and a 'Smart Robotic System' (SRS) equipped with audio-visual sensors. The CDS unit present in the classroom consists of three parts. First, the cry-recording unit (CRU) records the audio signals and sends them to the 'Signal Processing Unit' (SPU). The SPU then applies signal processing techniques and intelligently detects cry events using the features extracted from the acoustic signal. If the system detects a cry, it further sends control commands to the 'Signal Transmission Unit' (STU), which sends an automatic SMS to the Vice-Principal or Supervisor-Teacher, i.e., the person in charge, thereby alerting him/her about the child cry in a particular classroom. The controlling person can give control commands to the SRS from a web application and can get a live stream of the video from the classroom. A Wi-Fi module facilitates the communication between this controller and the robot (SRS). The initial performance evaluation results are very encouraging. The proposed system can have potential applications in schools, hospitals, child care centers, etc. Hopefully, this prototype can be a useful step towards preventing child abuse, prevalent in different forms in our society.
Conference Paper
Infant crying provides important information about the baby's physical and physiological condition, such as health, gender and emotions. The anatomy of the infant glottis produces a different fundamental frequency (F0) in its cry sound. It is generally accepted that an adult voice can be assumed quasi-stationary over short periods (about 100 ms). However, an infant cry has a much shorter stationary period (about 5 ms), so it is necessary to evaluate several F0 extraction techniques, namely MB formant tracking, STRAIGHT, YAAPT, and YIN. The results showed that STRAIGHT and YAAPT successfully extracted the F0 of infant crying with accurate voiced and unvoiced regions. The results also showed that the F0 spanned the range from 190 Hz to 600 Hz, suggesting that infant F0 is higher than that of adults. We also evaluated the formants of infant cries, and WaveSurfer was more accurate than the Mustafa-Bruce formant tracker.
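Of the F0 extractors compared above, YIN is the simplest to reproduce. The following sketch is an illustrative, hedged approximation (not the evaluated implementation): it estimates F0 for one frame from YIN's cumulative mean normalized difference function, with arbitrary choices of frame length, search band and dip threshold, and no parabolic interpolation.

```python
import numpy as np

def yin_f0(x, fs, fmin=100.0, fmax=800.0, threshold=0.1):
    """Estimate F0 of one frame via YIN's cumulative mean normalized
    difference (requires the frame to span at least len(x)//2 + fs/fmin samples)."""
    tau_min, tau_max = int(fs / fmax), int(fs / fmin)
    n = len(x) // 2
    # Difference function d(tau), computed over the first half of the frame.
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = x[:n] - x[tau:tau + n]
        d[tau] = np.dot(diff, diff)
    # Cumulative mean normalized difference, with d'(0) = 1 by definition.
    cmnd = np.ones(tau_max + 1)
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(np.cumsum(d[1:]), 1e-12)
    # Take the first dip below the threshold, then descend to its local minimum.
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / tau
    # Fallback: global minimum of the normalized difference in the search band.
    return fs / (tau_min + int(np.argmin(cmnd[tau_min:tau_max + 1])))
```

On a clean 250 Hz sine at 8 kHz the dip at the true pitch period is unambiguous; real cry signals would additionally need voicing decisions and interpolation.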
Conference Paper
Full-text available
Crying is a communication method used by infants, given their lack of language. Parents or nannies who have never cared for a baby may experience anxiety when the infant cries, so a way to understand an infant's cry is needed. This research develops a system to classify infant cry sounds by voice type, using MFCC (mel-frequency cepstrum coefficients) feature extraction and a BNN (backpropagation neural network). Cries are classified into 3 classes: hungry, discomfort, and tired. A voice input must first be confirmed as an infant cry using three features: pitch (with two approaches, the modified autocorrelation function and cepstrum pitch determination), energy, and harmonic ratio. The MFCC coefficients are then classified by the backpropagation neural network. The experiments show that the system classifies infant cry sounds quite well, with 30 coefficients and 10 neurons in the hidden layer.
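The MFCC front end that this and several other abstracts rely on reduces to a fixed pipeline: framing, windowing, power spectrum, triangular mel filterbank, log, and a DCT-II. The following minimal NumPy version is an illustrative approximation; the FFT length, hop, filterbank size and coefficient count are assumed values, not those used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs, n_fft=512, hop=256, n_mels=26, n_coeffs=13):
    """Frame -> Hamming window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hamming(n_fft))
    power = np.abs(np.fft.rfft(np.array(frames), n_fft)) ** 2

    # Triangular filters with centers equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope

    log_energy = np.log(power @ fbank.T + 1e-10)

    # DCT-II decorrelates the log filterbank energies; keep the first n_coeffs.
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * k + 1) / (2 * n_mels))
    return log_energy @ dct.T
```

The output is one 13-dimensional MFCC vector per frame, which could then feed a classifier such as the backpropagation network described above.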
Conference Paper
Full-text available
Automatic detection of a baby cry in audio signals is an essential step in applications such as remote baby monitoring. It is also important for researchers, who study the relation between baby cry patterns and various health or developmental parameters. In this paper, we propose two machine-learning algorithms for automatic detection of baby cry in audio recordings. The first algorithm is a low-complexity logistic regression classifier, used as a reference. To train this classifier, we extract features such as Mel-frequency cepstrum coefficients, pitch and formants from the recordings. The second algorithm uses a dedicated convolutional neural network (CNN), operating on log Mel-filter bank representation of the recordings. Performance evaluation of the algorithms is carried out using an annotated database containing recordings of babies (0-6 months old) in domestic environments. In addition to baby cry, these recordings contain various types of domestic sounds, such as parents talking and door opening. The CNN classifier is shown to yield considerably better results compared to the logistic regression classifier, demonstrating the power of deep learning when applied to audio processing.
Conference Paper
Developmental disorders are a group of neurological conditions originating in early neural development. These disorders involve serious impairments in language, learning, and motor skills. Early detection of developmental disorders is crucial, as it enables early intervention (e.g., speech-language and occupational therapy) that may reduce neurological and functional deficits. In this work, we propose a k-nearest neighbours (k-NN) classifier for early identification of developmental disorders in infants based on their cry. The classifier is based on temporal and spectral features extracted from the cry signal, where the contribution of each feature is estimated in an optimization process. Performance is evaluated against a database of diagnosed infants, with 89% accuracy in cross-validation testing.
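The weighted k-NN decision described above can be sketched in a few lines. The Euclidean metric, per-feature weights and simple majority vote below are generic illustrative choices, not the optimized weighting scheme of the paper.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=3, weights=None):
    """Classify x by majority vote among its k nearest neighbours,
    using a per-feature weighted Euclidean distance."""
    w = np.ones(train_X.shape[1]) if weights is None else weights
    dist = np.sqrt((((train_X - x) ** 2) * w).sum(axis=1))
    nearest = np.argsort(dist)[:k]          # indices of the k closest samples
    return Counter(train_y[nearest]).most_common(1)[0][0]
```

In the paper's setting, `train_X` would hold temporal and spectral cry features and `weights` would come from the feature-contribution optimization.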
Article
We present a novel approach to detect infant cry in actual outdoor and indoor settings. Using computationally inexpensive features such as Mel-frequency cepstral coefficients (MFCCs) and timbre-related features, the proposed algorithm yields very high recall rates for detecting infant cry in challenging settings such as café, street, playground, office and home environments, even when the signal-to-noise ratio (SNR) is as low as 6 dB, while maintaining high precision. The results indicate that our approach is highly accurate, robust and works in real time.
Article
Language is a tool humans use to communicate their needs to one another, and humans need time to learn a language well enough to achieve understanding. Newborn babies instead use cries, by instinct, to communicate their needs, and different cries can indicate different requirements. This work proposes a method to determine the meanings of infant cries according to baby experts. It applies neuro-fuzzy techniques for classification and perceptual linear prediction for recognition of the infant cries. The results showed that the neuro-fuzzy classifier yielded better accuracy than other popular methods. In addition, the neuro-fuzzy structure designed in this paper can be applied to speech recognition in further research.
Article
Full-text available
Cries of infants can be seen as an indicator for several developmental diseases. Different types of classification algorithms have been used in the past to classify infant cries of healthy infants and those with developmental diseases. To determine the ability of classification models to discriminate between healthy infant cries and various cries of infants suffering from several diseases, a literature search for infant cry classification models was performed; 9 classification models were identified that have been used for infant cry classification in the past. These classification models, as well as 3 new approaches were applied to a reference dataset containing cries of healthy infants and cries of infants suffering from laryngomalacia, cleft lip and palate, hearing impairment, asphyxia and brain damage. Classification models were evaluated according to a rating schema, considering the aspects accuracy, degree of overfitting and conformability. Results indicate that many models have issues with accuracy and conformability. However, some of the models, like C5.0 decision trees and J48 classification trees provide promising results in infant cry classification for diagnostic purpose.
Article
Background: Babies cannot communicate their pain properly. Several pain scores have been developed, but they are subjective and have high inter-observer variability. The aim of this study was to construct models that use both facial expression and infant voice for classifying pain levels and detecting cries. Methods: The study included a total of 23 infants below 12 months of age who were treated at Dr. Soetomo General Hospital. The Face, Legs, Activity, Cry, Consolability (FLACC) pain scale and recordings of the babies' cries were captured in video format. A machine-learning-based system was created to detect infant cries and pain levels. Spectrograms computed with the short-time Fourier transform were used to convert the audio data into a time-frequency representation. Facial features combined with voice features extracted using deep-learning autoencoders were used for the classification of infant pain levels. Two types of autoencoders, a convolutional autoencoder and a variational autoencoder, were used for both faces and voices. Results: The goal of the autoencoder was to produce a latent vector with much smaller dimensions that could still recreate the data with minor losses. From the latent vectors, a multimodal data representation for a convolutional neural network (CNN) produced a relatively high F1 score, higher than a single modality such as voice or facial expression alone. The experiment had two major parts: 1. building the three autoencoder models (for the infant's face, and for the amplitude and dB-scaled spectrograms of the infant's voice); and 2. using the latent vectors from the autoencoders to build the cry detection and pain classification models. Conclusion: In this paper, four pain classifier models with relatively good F1 scores were developed. These models were combined using ensemble methods to improve performance, which resulted in a better F1 score.
Chapter
Crying is an infant behavior, part of the human behavioral system that protects the helpless neonate by eliciting others to meet its basic needs. It is a means of communication and a positive sign of a healthy infant. The reasons for an infant's cry include hunger, unhappiness, discomfort, sadness, stomach pain, colic, or other disease conditions. The health of newborn babies can be effectively assessed through analysis of the infant cry. Researchers have analyzed infant cries extensively using methods such as spectrography, the melody shape method, and inverse filtering. This paper proposes a procedure to detect the emotion in an infant cry using feature extraction techniques, including mel-frequency and linear predictive coding methods. A statistical tool is used to compare the efficiency of the two techniques (mel-frequency and linear predictive coding). The present work mainly covers five cry causes: colic, hunger, sadness, stomach pain, and unhappiness.
Article
Audio signals are temporally structured data, and learning discriminative representations that contain temporal information is crucial for audio classification. In this work, we propose an audio representation learning method with a hierarchical pyramid structure called pyramidal temporal pooling (PTP), which aims to capture the temporal information of an entire audio sample. By stacking a global temporal pooling layer on multiple local temporal pooling layers, PTP can capture the high-level temporal dynamics of the input feature sequence in an unsupervised way. Furthermore, in the top global temporal pooling layer, we jointly optimize a learnable discriminative mapping (DM) and a softmax classifier; in this way, a joint learning method for discriminative audio representations and the classifier, called DM-PTP, is also presented. By treating the temporal encoding as a low-level constraint of a bi-level optimization problem, DM-PTP can produce a discriminative representation while maintaining the temporal information of the whole sequence. For an audio sample of arbitrary duration, both PTP and DM-PTP can encode the input feature sequence of arbitrary length into a fixed-length representation. Without using any data augmentation or ensemble learning methods, both PTP and DM-PTP outperform the state-of-the-art CNNs on the audio event recognition (AER) dataset, and achieve comparable performance on the DCASE 2018 acoustic scene classification (ASC) dataset compared with the best models in the challenge.
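The local-then-global pooling idea can be sketched with plain average pooling: split the feature sequence into local windows, pool each, stack more levels, then pool globally to a fixed-length vector. This is a loose illustration only; the segment counts and the use of mean pooling are assumptions, and the learnable DM layer of the paper is omitted.

```python
import numpy as np

def local_pool(seq, n_segments):
    """Average-pool a (T, d) feature sequence within n_segments time windows."""
    chunks = np.array_split(seq, n_segments)            # split along time
    return np.stack([c.mean(axis=0) for c in chunks])   # -> (n_segments, d)

def pyramidal_temporal_pooling(seq, levels=(8, 4)):
    """Stack local pooling layers, then one global pooling layer, mapping a
    variable-length (T, d) sequence to a fixed-length (d,) vector."""
    out = seq
    for n in levels:            # local temporal pooling layers
        out = local_pool(out, n)
    return out.mean(axis=0)     # global temporal pooling
```

The key property illustrated is that sequences of different lengths map to vectors of the same dimension, so a fixed-size classifier can follow.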
Chapter
Full-text available
In this chapter, we compare deep learning and classical approaches for detection of baby cry sounds in various domestic environments under challenging signal-to-noise ratio conditions. Automatic cry detection has applications in commercial products (such as baby remote monitors) as well as in medical and psycho-social research. We design and evaluate several convolutional neural network (CNN) architectures for baby cry detection, and compare their performance to that of classical machine-learning approaches, such as logistic regression and support vector machines. In addition to feed-forward CNNs, we analyze the performance of recurrent neural network (RNN) architectures, which are able to capture temporal behavior of acoustic events. We show that by carefully designing CNN architectures with specialized non-symmetric kernels, better results are obtained compared to common CNN architectures.
Chapter
The infant cry is the only means of communication of a baby and carries information about its physical and mental state. The analysis of the acoustic infant cry waveform opens the possibility of extracting this information, useful in supporting the diagnosis of pathologies from the first days of life.
Article
Infants admitted to the Neonatal Intensive Care Unit (NICU) need a hygienic environment and round-the-clock observation. Infants express their physical and emotional needs through crying, so detecting the reasons behind an infant's cry plays a vital role in monitoring the health of babies in the NICU. In this paper, we propose a novel approach for detecting the reasons for an infant's cry. The cry signal is captured and a set of features is extracted from it using MFCCs, LPCCs, and pitch. This feature set is used to differentiate the signal patterns and recognize the reasons for the cry; reasons such as hunger, pain, sleep, and discomfort represent the different classes. A multilayer neural network classifier is designed to recognize the reasons for the cry using a standard infant cry dataset. The proposed classifier achieves an accuracy of 93.24% with the combined MFCC, LPCC, and pitch features.
Article
Full-text available
Automatic extraction of acoustic regions of interest from recordings captured in realistic clinical environments is a necessary preprocessing step in any cry analysis system. In this study, we propose a hidden Markov model (HMM) based audio segmentation method to identify the relevant acoustic parts of the cry signal (i.e., expiratory and inspiratory phases) from recordings made in natural environments with various interfering acoustic sources. We examine and optimize the performance of the system by using different audio features and HMM topologies. In particular, we propose using fundamental frequency and aperiodicity features. We also propose a method for adapting the segmentation system trained on acoustic material captured in a particular acoustic environment to a different acoustic environment by using feature normalization and semi-supervised learning (SSL). The performance of the system was evaluated by analyzing a total of 3 h and 10 min of audio material from 109 infants, captured in a variety of recording conditions in hospital wards and clinics. The proposed system yields frame-based accuracy up to 89.2%. We conclude that the proposed system offers a solution for automated segmentation of cry signals in cry analysis applications.
Conference Paper
Full-text available
This paper addresses the problem of automatically detecting infant crying sounds. Infant crying sounds show distinct and regular time-frequency patterns that include a clear harmonic structure and a unique melody. Therefore, extracting appropriate features to properly represent these characteristics is important for achieving good performance. In this paper, we propose weighted segment-based two-dimensional linear-frequency cepstral coefficients to characterize the time-frequency patterns within a long-range segment of the target signal. A Gaussian mixture model is adopted to statistically represent the crying and non-crying sounds, and test sounds are classified using a likelihood ratio test. Evaluation of the proposed feature extraction method on a database of several hundred crying and non-crying sound clips yields an average equal error rate of 4.42% in various noisy environments, showing over 20% relative improvement compared to conventional feature extraction methods.
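The likelihood-ratio decision used above can be illustrated with single-component diagonal Gaussians standing in for the paper's Gaussian mixture models; this is a deliberate simplification to show the test itself, not the paper's feature extraction or modeling.

```python
import numpy as np

def fit_gaussian(X):
    """Fit a diagonal-covariance Gaussian (a 1-component stand-in for a GMM)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian, up to no missing terms."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def is_cry(x, cry_model, noncry_model, threshold=0.0):
    """Likelihood ratio test: accept 'cry' when the log LR exceeds the threshold."""
    llr = log_likelihood(x, *cry_model) - log_likelihood(x, *noncry_model)
    return llr > threshold
```

Sweeping `threshold` trades false alarms against misses, which is how an equal error rate such as the 4.42% above would be measured.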
Book
Full-text available
Pitch extraction (also called fundamental frequency estimation) has been a popular research topic in many fields since the age of computers. Yet over some 50 years of study, current techniques have still not reached the desired level of accuracy and robustness. When presented with a single clean pitched signal, most techniques do well, but when the signal is noisy, or when there are multiple pitch streams, many current pitch algorithms still fail to perform well. This report presents a discussion of the history of pitch detection techniques, as well as a survey of the current state of the art in pitch detection technology.
Conference Paper
Full-text available
We present results on applying a novel machine learning approach for learning auditory moods in natural environments [1] to the problem of detecting crying episodes in preschool classrooms. The resulting system achieved levels of performance approaching that of human coders and also significantly outperformed previous approaches to this problem [2].
Article
Full-text available
In this letter, we develop a robust voice activity detector (VAD) for application to variable-rate speech coding. The developed VAD employs the decision-directed parameter estimation method for the likelihood ratio test. In addition, we propose an effective hangover scheme which considers previous observations through a first-order Markov process modeling of speech occurrences. According to our simulation results, the proposed VAD performs significantly better than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
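The hangover idea can be shown with a deliberately simplified detector: a frame-energy test stands in for the statistical likelihood ratio test of the letter, and after any active frame the decision is held for a fixed number of frames to bridge weak speech tails. The threshold and hangover length below are arbitrary assumptions.

```python
import numpy as np

def vad_with_hangover(frames, threshold, hangover=5):
    """Frame-wise energy VAD with a hangover: once a frame is active, the
    'speech' decision is held for `hangover` further frames. (An energy test
    replaces the statistical likelihood ratio test here.)"""
    decisions, hold = [], 0
    for frame in frames:
        energy = float(np.mean(frame ** 2))
        if energy > threshold:
            hold = hangover          # re-arm the hangover counter
            decisions.append(1)
        elif hold > 0:
            hold -= 1                # still inside the hangover window
            decisions.append(1)
        else:
            decisions.append(0)
    return decisions
```

In a cry-detection front end such a VAD gates the heavier classifier, which is exactly how the main paper reduces complexity when no voice activity is present.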
Conference Paper
This paper reviews some of the significant work on infant cry signal analysis proposed in the past two decades and the recent progress in this field. A baby's cry cannot be interpreted accurately: it is very hard to identify what the baby is crying for. Experienced parents and child-care specialists such as pediatricians and pediatric nurses can distinguish different sorts of cries using only their individual auditory perception. This is a purely subjective evaluation and not suitable for clinical use. Non-invasive methods have been widely used in infant cry signal analysis and have shown very promising results. Various feature extraction and classification algorithms used in infant cry analysis are briefly described. This review gives insight into the current state-of-the-art work in infant cry signal analysis and concludes with thoughts on future directions for better representation and interpretation of infant cry signals.
Article
From an investigation of statistical model-based voice activity detection (VAD), it is discovered that a simple heuristic, such as a geometric mean, has been adopted as the decision rule based on the likelihood ratio (LR) test. For successful VAD operation, the authors first review the behaviour of the support vector machine (SVM) and then propose a novel technique that employs the SVM decision function on the LRs, whereas conventional techniques perform VAD by comparing the geometric mean of the LRs with a given threshold. The proposed SVM-based VAD is compared to the conventional statistical model-based scheme and shows better performance in various noise environments.
Article
The acoustic feedback problem has intrigued researchers over the past five decades, and a multitude of solutions has been proposed. In this survey paper, we aim to provide an overview of the state of the art in acoustic feedback control, to report results of a comparative evaluation with a selection of existing methods, and to cast a glance at the challenges for future research.
Conference Paper
Human biological signals convey precious information about the physiological and neurological state of the body. Crying is a vocal signal through which babies communicate their needs to their parents, who should then satisfy them properly. Most research dealing with infant cries aims mainly to establish a relationship between the acoustic properties of a cry and the state of the baby, such as hunger, pain, illness and discomfort. In this work, we are interested in recognizing individual babies by analyzing their cries with an automatic analysis and recognition system on a real cry database.
Article
A new method of pitch determination, similar to the cepstrum except that both the time signal and the log power spectrum are infinitely peak-clipped before spectrum analysis, has been simulated on a digital computer. Although this new method, called the "clipstrum," is itself inferior to cepstrum pitch determination, the clipstrum of center-clipped speech performs surprisingly well and is sometimes superior to the cepstrum. The clipping in the clipstrum might offer advantages over cepstrum analysis in certain digital hardware implementations, since multiplications could be replaced with additions or subtractions.
Article
The cepstrum, defined as the power spectrum of the logarithm of the power spectrum, has a strong peak corresponding to the pitch period of the voiced‐speech segment being analyzed. Cepstra were calculated on a digital computer and were automatically plotted on microfilm. Algorithms were developed heuristically for picking those peaks corresponding to voiced‐speech segments and the vocal pitch periods. This information was then used to derive the excitation for a computer‐simulated channel vocoder. The pitch quality of the vocoded speech was judged by experienced listeners in informal comparison tests to be indistinguishable from the original speech.
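Cepstrum pitch determination as described above reduces to a few lines: take the log magnitude spectrum of a windowed frame, transform back, and pick the strongest peak within the plausible pitch-period range. The sketch below is a minimal illustration; the search band defaults and the small floor constant inside the log are assumptions.

```python
import numpy as np

def cepstrum_pitch(x, fs, fmin=60.0, fmax=500.0):
    """Cepstrum pitch determination: the real cepstrum of a voiced frame
    has a peak at the quefrency equal to the pitch period."""
    spectrum = np.abs(np.fft.fft(x * np.hamming(len(x))))
    ceps = np.fft.ifft(np.log(spectrum + 1e-10)).real
    # Restrict the peak search to quefrencies inside the expected pitch range.
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)
    period = q_lo + int(np.argmax(ceps[q_lo:q_hi]))
    return fs / period
```

On a synthetic 200 Hz pulse train at 8 kHz, the cepstral peak lands at the 40-sample pitch period; narrowing `fmin`/`fmax` to the expected voice range helps avoid picking a rahmonic (a multiple of the true period).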
Article
Infant crying signals distress to potential caretakers who can alleviate the aversive conditions that gave rise to the cry. The cry signal results from coordination among several brain regions that control respiration and vocal cord vibration from which the cry sounds are produced. Previous work has shown a relationship between acoustic characteristics of the cry and diagnoses related to neurological damage, SIDS, prematurity, medical conditions, and substance exposure during pregnancy. Thus, assessment of infant cry provides a window into the neurological and medical status of the infant. Assessment of infant cry is brief and noninvasive and requires recording equipment and a standardized stimulus to elicit a pain cry. The typical protocol involves 30 seconds of crying from a single application of the stimulus. The recorded cry is submitted to an automated computer analysis system that digitizes the cry and either presents a digital spectrogram of the cry or calculates measures of cry characteristics. The most common interpretation of cry measures is based on deviations from typical cry characteristics. Another approach evaluates the pattern across cry characteristics suggesting arousal or under-arousal or difficult temperament. Infants with abnormal cries should be referred for a full neurological evaluation. The second function of crying--to elicit caretaking--involves parent perception of the infant's needs. Typically, parents are sensitive to deviations in cry characteristics, but their perception can be altered by factors in themselves (e.g., depression) or in the context (e.g., culture). The potential for cry assessment is largely untapped. Infant crying and parental response is the first language of the new dyadic relationship. Deviations in the signal and/or misunderstanding the message can compromise infant care, parental effectiveness, and undermine the budding relationship. (c) 2005 Wiley-Liss, Inc. MRDD Research Reviews 2005;11:83-93.
As the speech of a normal-hearing and a deaf person differ, the author expects differences between the crying sounds of normal-hearing and hard-of-hearing infants as well. In this study the author determined, by computerized algorithms, the melody of 2762 crying sounds from 316 infants, and compared the results between infants with hearing disorders and those with normal hearing. The analysis of the crying sounds aims at a new, cheaper hearing screening method, which would offer new potential for the early detection of hearing disorders. All the applied steps were developed as automatic, computer-executed methods providing reproducible, objective results, in contrast to some previous studies that applied manual methods and reached subjective results. Several possible approaches to digital signal processing of the infant cry are discussed. A novel melody shape classification system was created to obtain a more precise distribution of the melodies by their shapes; the system determined 77 different categories, of which the first 20 covered 95% of the melodies. The applied methods were created and tested on a large number of melodies.
Conference Paper
This work presents the development of an automatic infant cry recognition system whose objective is to classify two types of cry: normal cries and pathological cries from deaf babies. In this study, we used acoustic features obtained with the mel-frequency cepstrum technique and, as a classifier, a feedforward neural network trained with several learning methods, of which the scaled conjugate gradient algorithm performed best. Current results are very encouraging, with an accuracy of up to 97.43%.