Article

Parkinson disease prediction using intrinsic mode function based features from speech signal


Abstract

Parkinson’s disease (PD) is a progressive neurological disorder prevalent in old age. Previous research has shown that speech can serve as an early marker for identifying PD. The disease affects several components of speech, including phonation, intensity, articulation, and respiration, which in turn reduces speech intelligibility. Speech feature extraction and classification have always been challenging because speech signals are non-stationary and contain discontinuities. In this study, empirical mode decomposition (EMD) based features are shown to capture these characteristics. A new feature, the intrinsic mode function cepstral coefficient (IMFCC), is proposed to represent the characteristics of Parkinsonian speech efficiently. The proposed features are evaluated on two datasets, dataset 1 and dataset 2, each comprising 20 normal and 25 Parkinson-affected subjects. The results show that the proposed IMFCC feature gives the highest classification accuracy on both datasets, a significant increase of 10-20% in accuracy over standard acoustic and Mel-frequency cepstral coefficient (MFCC) features.
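The decomposition step behind EMD-based features can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes piecewise-linear envelopes in place of the cubic splines used in standard EMD, a fixed number of sifting iterations, and hypothetical function names (`local_extrema`, `envelope`, `sift`, `emd`). The abstract does not say how cepstral coefficients are computed from the IMFs, so that step is left out.

```python
import math


def local_extrema(x):
    """Return index lists of strict local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, idx):
    """Piecewise-linear envelope through the points (i, x[i]) for i in idx.

    Simplification (assumption): standard EMD interpolates with cubic
    splines; here the ends are pinned to the first and last samples.
    """
    knots = [0] + idx + [len(x) - 1]
    env = [0.0] * len(x)
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            env[i] = (1 - t) * x[a] + t * x[b]
    return env


def sift(x, n_iter=10):
    """Extract one IMF candidate by repeatedly removing the envelope mean."""
    h = list(x)
    for _ in range(n_iter):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(signal, max_imfs=6):
    """Decompose a signal so that signal = sum(IMFs) + residue."""
    residue = list(signal)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left to build envelopes
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


# Synthetic two-tone test signal (illustrative only, not speech data).
n = 400
signal = [
    math.sin(2 * math.pi * 37 * i / n) + 0.5 * math.sin(2 * math.pi * 5 * i / n)
    for i in range(n)
]
imfs, residue = emd(signal)
recon = [sum(c[i] for c in imfs) + residue[i] for i in range(n)]
```

By construction, adding the extracted IMFs and the residue recovers the input signal up to floating-point error, which makes a convenient sanity check for any EMD variant.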


... In recent years, approaches based on energy direction features from empirical mode decomposition [68], the instantaneous energy deviation cepstral coefficient [26,27] and the intrinsic mode function cepstral coefficient [26,27] have been proposed to detect PD. Since these approaches are traditional pipeline systems, their performance is affected by the architecture used, feature selection methods, and hand-designed speech features. ...
... Individual sound waveforms were subjected to the Q-factor wavelet transform (QWT) in this investigation. Karan et al. [26,27] suggested a new model for the diagnosis of PD using 150 sound measurements taken from 45 individuals and the SVM and RF algorithms. In the study, the intrinsic mode function cepstral coefficient (IMFCC) is introduced as a new feature to capture the characteristics of a voice sample taken from a Parkinson's patient. ...
Article
Parkinson's is one of the most rapidly increasing neurological diseases in the world, caused by the deficiency of dopamine-producing cells in the brain. Voice disorders are a significant finding in the early stage of Parkinson's disease (PD). Detection of this finding at an early stage of the disease allows early treatment of the disease. Therefore, in this study, using sound data, a hybrid model for detecting PD has been designed. In the developed method, first of all, the sound data were converted into spectrograms. Then, the feature maps of the obtained spectrogram images were extracted using 3 different CNN architectures. Feature maps with different features obtained by utilizing the accumulation of different architectures were combined. Then, these features were selected using the arithmetic optimization algorithm (AOA), one of the most recent metaheuristic optimization algorithms, and then classified by support vector machine (SVM) and K-nearest neighbors (KNN). One of the important novelties in the study is the reduction of the size of the acquired feature maps with AOA, a new and high-performance metaheuristic approach. The success of the proposed model in diagnosing Parkinson's disease reached up to 98.19%. In addition, feature maps of the sound data in the dataset were acquired by using the MFCC method to compare the performance of the proposed model. Eight different classifiers were used to categorize the acquired feature maps. The highest accuracy value obtained in this method was obtained in the Random Forest classifier with 93.98%.
... EMD is a signal decomposition method that is not subject to the Heisenberg uncertainty principle and is particularly suitable for processing nonlinear and nonstationary signals [19]. It has been demonstrated that the intrinsic mode functions (IMFs) obtained after voice signal decomposition using EMD carry information about the vocal tract and vocal folds [25]. Karan et al. [25] proposed intrinsic mode function cepstral coefficient (IMFCC) based on EMD from sustained vowels to effectively characterize the PD patients' voice, improving the accuracy by 10% over MFCC-based features. ...
... It has been demonstrated that the intrinsic mode functions (IMFs) obtained after voice signal decomposition using EMD carry information about the vocal tract and vocal folds [25]. Karan et al. [25] proposed intrinsic mode function cepstral coefficient (IMFCC) based on EMD from sustained vowels to effectively characterize the PD patients' voice, improving the accuracy by 10% over MFCC-based features. However, traditional EMD algorithm suffers from mode mixing, end effects, sensitivity to noise, and lack of complete mathematical theory [19,26]. ...
... According to previous studies, we set the number k of IMFs obtained with the traditional EMD and VMD methods to 6 and 4, respectively [22,25,26]. To determine the CEEMDAN-based k value, we divided the 120-sample dataset into training and test sets in a 7:3 ratio, then used the support vector machine (SVM), random forest, and multilayer perceptron with scikit-learn's default parameters to model the training set [31]. Figure 3 shows that both CEEMDAN-based HCCs and DMFCC features have the highest classification accuracy on the test set when k is 8. ...
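The 7:3 split and the choice of k described above can be sketched as follows. This is an illustration under assumptions, not the cited study's code: `split_7_to_3` and `select_k` are hypothetical helpers, and `toy_score` is an artificial stand-in for "test accuracy at a given k", not measured data. The study itself trained scikit-learn classifiers at that step.

```python
import random


def split_7_to_3(samples, seed=0):
    """Shuffle the samples and split them into 70% train / 30% test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic: 120 -> 84
    return shuffled[:cut], shuffled[cut:]


def select_k(candidates, score):
    """Return the candidate k that maximizes score(k)."""
    return max(candidates, key=score)


# 120 sample indices, matching the dataset size mentioned in the text.
train, test = split_7_to_3(range(120))


def toy_score(k):
    """Artificial placeholder for test accuracy; NOT a result from the study."""
    return -abs(k - 8)


best_k = select_k(range(2, 11), toy_score)
```

In practice `score` would train each classifier on `train` with features built from k IMFs and return its accuracy on `test`.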
... classification accuracy. Karan et al. [47] used two different datasets with a total of 90 subjects (40 normal, 50 Parkinson's patients). Significant features were obtained from the sound samples with the proposed intrinsic mode function cepstral coefficient (IMFCC) algorithm and with Mel-frequency cepstral coefficients (MFCC). ...
... Removing features from CNN models with spectrogram images positively affected the model performance. Karan et al. [47] showed that empirical mode decomposition (EMD) based attributes capture speech characteristics. They proposed a new feature, the IMFCC, to efficiently represent the features of Parkinson's speech. ...
Article
Parkinson's disease (PD) is a common neurodegenerative disease of the central nervous system that affects the motor system. It is seen especially often in the elderly and causes problems such as speech disorders. With the rapid development of deep learning and machine learning methods in recent years, speech disorders in PD patients can be distinguished quickly and at a high success rate. In this study, PD diagnosis was performed using datasets containing voice signals of healthy individuals and PD patients (PD_Dataset and PDO_Dataset). Current convolutional neural networks (CNN) and machine learning (ML) algorithms for PD diagnosis were examined, and a comparative performance analysis was carried out. In addition, a different method called SkipConNet + RF, based on CNN and random forest (RF), is proposed for PD diagnosis. With the proposed SkipConNet, important features were obtained from the speech signals; the estimation was then performed using the RF algorithm. The proposed method improved the performance of RF algorithms by between 3% and 17.19%. In addition, the SkipConNet + RF method showed the highest success, with 99.11% accuracy on PD_Dataset and 98.30% on PDO_Dataset.
... Reduced speech intensity, variation in frequency components, hoarseness in voice, and inconsistency in speech articulation (hypokinetic dysarthria) were among the speech impairments. Because speech signals are non-stationary and discontinuous, extracting and classifying speech features has always been a difficult problem [58]. An important classification difficulty is the appropriate interpretation of voice and speech data to identify PD. ...
... Novel PD data with a class-balanced distribution were classified using RF classification together with the SMOTE method, with the data points modeled by multiple decision trees. New predictions were made by combining the outputs of the individual decision trees and assigning each data point the category predicted by the majority of the trees [58]. Medication doses, time variables, and preoperative symptom-specific levodopa response were all shown to be strongly linked with clinical outcomes [90]. ...
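The majority-vote aggregation described above can be sketched in a few lines. This is a minimal illustration, not the cited study's code: `majority_vote` is a hypothetical helper, and the 100 votes and the labels "PD" and "HC" are made up for the example.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the most trees.

    Ties are broken in favor of the label that appears first in the list.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of 100 decision trees for one data point.
votes = ["PD"] * 60 + ["HC"] * 40
label = majority_vote(votes)
```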
Article
Parkinson’s disease (PD) is a devastating neurological disease that cannot be identified with traditional plasma experiments, necessitating the development of a faster, less expensive diagnostic instrument. Because PD has been difficult to quantify, doctors have tended to focus on some signs while ignoring others, relying primarily on intuitive assessment scales. The disease’s characteristics, which include loss of motor control and impaired speech, can be used to detect and diagnose it. PD affects both motor and non-motor functions, takes years to develop, and has a wide range of clinical symptoms and prognoses. Patients commonly display non-motor symptoms such as sleep problems, neurocognitive ailments, and cognitive impairment long before diagnosis. Although scientists have been working on designs for diagnosing and categorizing the disease, this paper considers only noticeable defects such as movement patterns, speech, or writing skills. This article provides a thorough analysis of several AI-based ML and DL techniques used to diagnose PD and their influence on developing additional research directions. It follows the guidelines of the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR). This review also examines the current state of PD diagnosis and the potential applications of data-driven AI technology. It ends with a discussion of future developments, which helps fill critical gaps in current Parkinson’s research.
... They achieved an accuracy of 92.46% using a KNN classifier with vowels. Similarly, to characterize Parkinson's speech, an intrinsic mode function cepstral coefficient feature was used by Karan et al. [11] to classify healthy controls (HC) from PD patients. In addition, Sigcha et al. [12] used a recurrent neural network (RNN) with a single waist-worn triaxial accelerometer approach to detect freezing of gait in PD patients. ...
... The last classifier is random forest, which works on the principle of unpruned decision trees. The aggregation of a multitude of decision trees is considered a forest, with each tree depending on different random variables [11]. A total of 100 trees were used for random forest classifier analysis in this study. ...
Article
Parkinson's disease (PD), a neurodegenerative disorder characterized by rest tremors, muscular rigidity, and bradykinesia, has become a global health concern. Currently, a neurologist determines the diagnosis of Parkinson's disease by taking into account several factors. An automated decision-making system would enhance patient care and improve the outcomes for the patient. Biomarkers, such as electroencephalograms (EEGs), can aid in the diagnosis process as they have proven useful in detecting abnormalities in the brain. This study presents a novel algorithm for the automated diagnosis of Parkinson's disease from EEG signals using a flexible analytic wavelet transform (FAWT). First, these acquired EEG signals are preprocessed before decomposition into five frequency sub-bands based on the FAWT method. Several entropy parameters are computed from the decomposed sub-bands and ranked based on their significance level in detecting PD through analysis of variance (ANOVA). Various classifiers are used to identify appropriate feature sets, including support vector machines (SVM), logistics, random forests (RF), radial basis functions (RBF), and k-nearest neighbors (KNN). The proposed approach is evaluated using data collected from two centers in Malaysia (Dataset-I) and the United States (Dataset-II). In dataset-I, the KNN classifier produces accuracy, specificity, sensitivity, and area under the curve of 99%, 99.45%, 99.12%, and 0.991, respectively, while in dataset-II, these values are 95.85%, 95.88%, 96.14%, and 0.959. The proposed system would be extremely useful for neurologists during their diagnostic process, as well as for current clinical practices.
... Tuncer et al. proposed a minimum-average-maximum tree and singular value decomposition to extract a novel feature signal, subsequently processed by the k-nearest neighbour classifier [20]. Another novel feature introduced in the article by Karan et al. is an intrinsic mode function cepstral coefficient, which should lead to higher classification accuracy compared to standard MFCCs [21]. A non-linear dynamic complexity measure, a discrete wavelet transform, measures of fundamental frequency variation (jitter) and measures of amplitude variation (shimmer) are common baseline features that describe input speech recordings. ...
Article
Speech is one of the most serious manifestations of Parkinson's disease (PD). Sophisticated language/speech models have already demonstrated impressive performance on a variety of tasks, including classification. By analysing large amounts of data from a given setting, these models can identify patterns that would be difficult for clinicians to detect. We focus on evaluating the performance of a large self-supervised speech representation model, wav2vec, for PD classification. Based on the computed wav2vec embedding for each available speech signal, we calculated two sets of 512 derived features, wav2vec-sum and wav2vec-mean. Unlike traditional signal processing methods, this approach can learn a suitable representation of the signal directly from the data without requiring manual or hand-crafted feature extraction. Using an ensemble random forest classifier, we evaluated the embedding-based features on three different healthy vs. PD datasets (participants rhythmically repeat syllables /pa/, Italian dataset and English dataset). The obtained results showed that the wav2vec signal representation was accurate, with a minimum area under the receiver operating characteristic curve (AUROC) of 0.77 for the /pa/ task and the best AUROC of 0.98 for the Italian speech classification. The findings highlight the potential of the generalisability of the wav2vec features and the performance of these features in the cross-database scenarios.
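The derived features named wav2vec-sum and wav2vec-mean can be pictured as pooling a sequence of frame-level embedding vectors into one fixed-length vector per recording. The sketch below assumes that the sum and mean are taken across time frames, which the abstract does not state explicitly; `pool_embeddings` is a hypothetical helper, and the tiny 2-dimensional frames stand in for the 512-dimensional embeddings mentioned in the text.

```python
def pool_embeddings(frames):
    """Pool frame-level vectors into (sum, mean) vectors of the same dimension.

    frames: non-empty list of equal-length lists of floats, one per time frame.
    """
    dim = len(frames[0])
    total = [sum(frame[d] for frame in frames) for d in range(dim)]
    mean = [t / len(frames) for t in total]
    return total, mean


# Three toy frames with 2 dimensions each (real embeddings would have 512).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
emb_sum, emb_mean = pool_embeddings(frames)
```

Either pooled vector has the same length regardless of recording duration, which is what lets a standard classifier such as a random forest consume it.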
... This makes datasets based on different languages even more different, and generalization would be less likely in such situations. Another limitation of this study was that we did not consider some of the features that were proposed in some recent studies [17,41,23,51]. However, based on a very recent review [37], we included all crucial features that were shown to be significant for PD diagnosis and monitoring. ...
... Karan et al. [36] suggested an IMF-based cepstral feature extraction to detect the neurological disorder and developed the prediction model using SVM. This method was evaluated on two datasets and resulted in a significant improvement in accuracy of 10-20% compared with benchmark MFCC features. ...
Article
Discrimination of normal and adventitious respiratory sounds from stethoscope auscultation by human ears is challenging owing to their low-frequency characteristics and the varying frequency range for inspiration and expiration. This makes the diagnosis of pulmonary disorders subjective, relying on the experience and hearing capability of the physician. Most computer-assisted diagnosis systems formulated to address these limitations fail to capture the inherent acoustic properties of respiratory sounds in a way that is close to the human auditory system. To circumvent this problem, this study exploits gammatone filter banks to evaluate the distribution of all frequency components present in the signal. To categorize the respiratory signals, a GoogLeNet-based Convolutional Neural Network (CNN) prediction model is developed through Time-Frequency (TF) visualization of gammatone cepstral coefficients acquired from empirically decomposed Intrinsic Mode Functions (IMFs). As the performance of the CNN model depends greatly on the learning environment, this study also optimizes the values of the hyperparameters to enhance the classification performance of the CNN model. Accordingly, the optimal values of initial learning rate, L2 regularization and momentum are identified for both the Stochastic Gradient Descent with Momentum (SGDM) and Adaptive Moment Estimation (ADAM) optimizers using Bayesian optimization. Experimentation is carried out for three TF visualization methods, viz. spectrograms, scalograms and Constant Q spectrograms. The results show that the scalogram method yields accuracies of 87.37% and 88.32% with default hyperparameters for the SGDM and ADAM optimizers, respectively, and improved accuracies of 93.68% and 95.67% with hyperparameters optimized against the objective function.
... The features of the speech signal were illustrated using empirical mode decomposition. Intrinsic mode function cepstral coefficient was used to represent the speech characteristics (Karan et al. 2020). A study was presented on how environmental factors cause PD. ...
Article
Parkinson disease (PD) is a neurodegenerative disease that occurs due to an insufficient level of dopamine in the human brain. This disease occurs predominantly in elderly people. There is no definitive procedure to diagnose PD; it is diagnosed based on symptoms, clinical trials, and a number of laboratory tests. In this research paper, machine learning techniques are used to predict PD and to help medical practitioners recommend personalized drugs for patients. Appropriate features are selected through rough set theory, and principal component analysis is used for dimensionality reduction. Performance is evaluated using deep neural network, random forest, and SVM classifiers. The efficiency of the proposed approach is measured through the confusion matrix, accuracy, precision, and recall.
... It is worth noting that certain classification models achieved almost 100% accuracy in distinguishing PD patients from healthy subjects on small datasets [23]. This study differs from previous research [24] in that it utilized a larger dataset and employed a deep neural network classifier. The study collected recordings from 252 subjects, consisting of 188 patients with PD and 64 healthy controls, and extracted various feature subsets from the recordings. ...
Article
Parkinson's disease (PD) is a prevalent neurodegenerative disorder that has prompted the development of telediagnosis and remote monitoring systems. Dysphonia, a common symptom in the early stages of PD, affects approximately 90% of patients. Therefore, testing for persistent pronunciation or dysphonia in continuous speech can aid in the diagnosis of PD. Our study utilized speech signals from 252 subjects as the dataset. In this study, language signal features were used as input to machine learning algorithms, and the resulting classifiers were integrated to improve accuracy in the classification of Parkinson's disease (PD). The experimental results demonstrated a diagnostic accuracy of up to 95% using these machine learning algorithms. Additionally, a method of feature extraction based on clinical experience was presented for analyzing subjects' language signals.
... This is due to differences in, for example, the considered speech tasks and experimental setups. For the vowels of PC-GITA, a few recent studies reported accuracies around 90% [60], [61], [62]. However, these studies considered each vowel of PC-GITA separately in training and testing the classification models. ...
Article
Parkinson's disease (PD) is a progressive neurological disorder which affects the motor system. The automatic detection of PD improves the diagnosis of the disease, and it can be done in a non-invasive manner from speech. In this paper, we investigate the use of an exemplar-based sparse representation (SR) classification approach for detecting PD from speech. Exemplars are speech feature vectors extracted from the training data. The idea is to formulate the detection task as a problem of finding sparse representations of test speech feature vectors with respect to training speech exemplars. The main advantage of using the SR approach instead of conventional machine learning (ML)-based approaches is that the training step–which is time-consuming and sometimes requires unorganized hyper-parameter tuning–is not needed. Furthermore, SRs are more robust to redundancy and noise in the data. In this work, we study SR classification approaches based on two sparse coding models, namely, l1-regularized least squares ( $l_{1}$ LS) and non-negative least squares (NNLS). We propose a strategy based on class-specific dictionaries for improving performance of the $l_{1}$ LS- and NNLS-based SR classification. To investigate the detection performance, the $l_{1}$ LS- and NNLS-based approaches are applied and compared with the traditional PD detection approach based on ML classification algorithms using the PC-GITA PD dataset and an openly available dataset consisting of mobile device voice recordings from healthy and PD patients. The results indicate that the proposed NNLS-based SR classification approach performs better than the traditional ML-based methods in discriminating PD patients from healthy subjects.
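The two sparse coding models named in this abstract can be written out as follows. This is a sketch of the standard formulations under assumed notation, not the paper's own equations: $y$ is a test feature vector, $D$ a dictionary whose columns are training exemplars, $D_c$ the sub-dictionary for class $c$, $\hat{x}_c$ the code obtained with $D_c$, and $\lambda$ a regularization weight. The residual-based decision rule on the last line is the common choice in sparse representation classification and is an assumption here.

```latex
% l1-regularized least squares (l1LS)
\hat{x} = \arg\min_{x} \; \tfrac{1}{2}\,\lVert y - Dx \rVert_2^2 + \lambda \lVert x \rVert_1

% non-negative least squares (NNLS)
\hat{x} = \arg\min_{x \ge 0} \; \lVert y - Dx \rVert_2^2

% class decision with class-specific dictionaries (assumed residual rule)
\hat{c} = \arg\min_{c} \; \lVert y - D_c \hat{x}_c \rVert_2
```

Because the dictionary is built directly from training exemplars, no model parameters are fitted beforehand, which is the sense in which the abstract says the training step is not needed.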
... Furthermore, persons with the subject disease in its early stages might experience speech problems [10]. These include dysphonia (weak vocal fluency), repetitious echoes (a tiny assortment of audio variations), and hypophonia (vocal musculature disharmony) [7,11]. Information from human aural emissions might be detected and evaluated using a computing unit [12,13]. ...
Article
Parkinson's disease (PD) is a neurodegenerative disease that impacts the neural, physiological, and behavioral systems of the brain, in which mild variations in the initial phases of the disease make precise diagnosis difficult. A general symptom of this disease is slow movement, known as 'bradykinesia'. The symptoms appear in middle age and their severity increases with age. One of the earliest signs of PD is a speech disorder. This research examines the effectiveness of supervised classification algorithms, such as support vector machine (SVM), naïve Bayes, k-nearest neighbor (K-NN), and artificial neural network (ANN), for diagnosing the disease; the proposed diagnosis method consists of feature selection based on the filter and wrapper methods, followed by classification. Since just a few clinical test features would be required for the diagnosis, a method such as this might reduce the time and expense associated with PD screening. The suggested strategy was compared to previously proposed PD diagnostic techniques and well-known classifiers. The experimental outcomes show that the accuracy of SVM is 87.17%, naïve Bayes is 74.11%, ANN is 96.7%, and KNN is 87.17%, so ANN is the most accurate. The obtained results were compared with those of previous studies, and it has been observed that the proposed work offers comparable and better results.
... The main features for speech sample classification vary across languages (Eyigoz et al., 2020). Different feature extraction methods and different datasets can also obstruct the unification of features (Karan et al., 2020; Zhang et al., 2021). It is one of the main goals for related studies to reduce the number of features by choosing the most relevant for PWP detection. ...
Article
Parkinson’s disease (PD) is a neurodegenerative disorder that negatively affects millions of people. Early detection is of vital importance. Recent research has shown that the level of dysarthria is a good indicator for computer-assisted diagnosis and remote monitoring of patients at the early stages. The goal of this study is to develop an automatic detection method based on a newly collected Chinese dataset. Unlike for English, no agreement has been reached on the main features indicating language disorders due to vocal organ dysfunction. Thus, one of our approaches is to classify speech phonation and articulation with a machine learning-based feature selection model. Based on a relatively large sample, three feature selection algorithms (LASSO, mRMR, Relief-F) were tested to select the vocal features extracted from speech signals collected in a controlled setting, followed by four classifiers (Naïve Bayes, K-Nearest Neighbor, Logistic Regression and Stochastic Gradient Descent) to detect the disorder. The proposed approach achieves an accuracy of 75.76%, sensitivity of 82.44%, specificity of 73.15% and precision of 76.57%, indicating the feasibility and promise of automatic and unobtrusive detection of PD in Chinese speakers. The comparison among the three selection algorithms reveals that the LASSO selector performs best regardless of the type of vocal features. The best detection accuracy is obtained by the SGD classifier, while the best sensitivity is obtained by the LR classifier. More interestingly, articulation features are more representative and indicative than phonation features across all the selection and classification algorithms. The most prominent articulation features are F1, F2, DDF1, DDF2, BBE and MFCC.
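The four reported metrics all derive from the counts of a binary confusion matrix. The sketch below shows the standard definitions, treating PD as the positive class; `binary_metrics` is a hypothetical helper and the counts are invented for illustration, not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fn: PD recordings classified as PD / as healthy.
    tn/fp: healthy recordings classified as healthy / as PD.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on the PD class
        "specificity": tn / (tn + fp),  # recall on the healthy class
        "precision": tp / (tp + fp),
    }


# Made-up counts for illustration only.
m = binary_metrics(tp=45, fp=5, tn=40, fn=10)
```

Sensitivity and specificity can diverge noticeably from accuracy when the classes are imbalanced, which is why studies like this one report them alongside it.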
... (GNE) and formant frequency or use spectrum and cepstrum for feature extraction. Other examples are mel-frequency cepstral coefficients (MFCC) [7], perception linear predictive coefficients (PLP), etc. [8]. After that, deep learning methods are used to detect dysarthria, such as convolutional neural network (CNN), CNN-LSTM (long short-term memory), and other models [9,10]. ...
Article
In recent years, due to the rise in the population and aging, the prevalence of neurological diseases is also increasing year by year. Among these patients with Parkinson’s disease, stroke, cerebral palsy, and other neurological symptoms, dysarthria often appears. If these dysarthria patients are not quickly detected and treated, it is easy to cause difficulties in disease course management. When the symptoms worsen, they can also affect the patient’s psychology and physiology. Most of the past studies on dysarthria detection used machine learning or deep learning models as classification models. This study proposes an integrated CNN-GRU model with convolutional neural networks and gated recurrent units to detect dysarthria. The experimental results show that the CNN-GRU model proposed in this study has the highest accuracy of 98.38%, which is superior to other research models.
... It can be seen as a signal-to-aspiration noise ratio when other aperiodicities in the signal are comparatively low [55]. Vocal fold excitation ratio (VFER) gives the amount of noise in terms of nonlinear energy and entropy value produced due to pathological vocal fold oscillation [56]. Articulation: Articulation deficits are mainly related to changes in position of tongue, lips, velum, and other articulators involved in speech production [17]. ...
Article
Early, objective, and accurate assessment and identification of dysarthria caused by neurological diseases are essential in neurorehabilitation. This could be achieved by a robust smart system. However, developing such a system requires a standard training database that is properly labelled, which unfortunately is currently lacking. The present study aimed to establish a standardized, audio-visual integrated speech database of subacute stroke patients with dysarthria, named “The Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) Database”, which included audio-visual data from 25 subacute stroke patients and 25 healthy participants. In addition, comprehensive subjective clinical assessment information of speech-motor function and ecological psychology of each patient was also provided. Based on this database, a pilot study was conducted to detect the significant acoustic and visual characteristics that revealed the severity of dysarthria related to subacute stroke. The present study offered a novel perspective to objectively quantify and identify the pathological differences in speech production. It can serve as a baseline for the development of an automatic intelligent system for assessing severity of dysarthria. In conclusion, the establishment and analysis of high-quality database on articulation errors associated with dysarthria will benefit clinical treatments and contribute to the realization of automatic diagnostic tools that can be implemented for clinical telehealth services.
... Disease diagnosis and monitoring using voice biomarkers has precedent in some therapeutic areas, like Parkinson's disease ([13,14]). Harimoorthy et al. proposed a cloud-based system to identify Parkinson's disease by applying machine learning to voice data [15]. Similarly, Asmea et al. proposed a neural network to identify Parkinson's disease that sought to identify traits of the voice disorder, dysphonia [16]. ...
... The majority of state-of-the-art automatic dysarthric speech classification techniques are based on training classical classifiers on handcrafted acoustic features characterizing different impaired speech dimensions [3][4][5][6][7][8][9]. Recently, deep learning approaches aiming to learn high-level speech representations relevant for such a task have gained attention in the research community [10][11][12][13][14][15][16][17][18]. ...
... In Equation 1, M represents the desired number of filters and f shows the list of frequencies. The MFCC method is a method that is frequently used especially in biomedical studies [28,29]. The block diagram of the MFCC method is given in Figure 4. ...
Article
Sleep patterns and sleep continuity have a great impact on people's quality of life. The sound of snoring both reduces the sleep quality of the snorer and disturbs other people in the environment. Interpretation of sleep signals by experts and diagnosis of the disease is a difficult and costly process. Therefore, in this study, an artificial intelligence-based hybrid model was developed for the classification of snoring sounds. In the proposed method, audio signals were first converted into images using the Mel-spectrogram method. The feature maps of these images were extracted using the AlexNet and ResNet101 architectures. After combining the feature maps, which differ between the architectures, dimension reduction was performed using the NCA method. The feature map optimized with NCA was classified by a Bilayered Neural Network. In addition, spectrogram images were classified with 8 different CNN models to compare the performance of the proposed model. To further test the proposed model, feature maps were also obtained using the MFCC method and classified with different classifiers. The accuracy obtained with the proposed model is 99.5%.
Article
End-to-end deep learning models have shown promising results for the automatic screening of Parkinson’s disease by voice and speech. However, these models often suffer degradation in their performance when applied to scenarios involving multiple corpora. In addition, they also show corpus-dependent clusterings. These facts indicate a lack of generalisation or the presence of certain shortcuts in the decision, and also suggest the need for developing new corpus-independent models. In this respect, this work explores the use of domain adversarial training as a viable strategy to develop models that retain their discriminative capacity to detect Parkinson’s disease across diverse datasets. The paper presents three deep learning architectures and their domain adversarial counterparts. The models were evaluated with sustained vowels and diadochokinetic recordings extracted from four corpora with different demographics, dialects or languages, and recording conditions. The results showed that the space distribution of the embedding features extracted by the domain adversarial networks exhibits a higher intra-class cohesion. This behaviour is supported by a decrease in the variability and inter-domain divergence computed within each class. The findings suggest that domain adversarial networks are able to learn the common characteristics present in Parkinsonian voice and speech, which are supposed to be corpus, and consequently, language independent. Overall, this effort provides evidence that domain adaptation techniques refine the existing end-to-end deep learning approaches for Parkinson’s disease detection from voice and speech, achieving more generalizable models.
Conference Paper
Services based on Artificial Intelligence (AI) are becoming increasingly pervasive in our society. At the same time, however, we are also witnessing a growing awareness towards the ethical aspects and the trustworthiness of AI tools, especially in high stakes domains, such as the healthcare one. In this paper, we propose the adoption of AI techniques for predicting Parkinson’s Disease progression with the overarching aim of accommodating the urgent need for trustworthiness. We address two key requirements towards trustworthy AI, namely privacy preservation in learning AI models and their explainability. As for the former aspect, we consider the (rather common) case of medical data coming from different health institutions, assuming that they cannot be shared due to privacy concerns. To address this shortcoming, we leverage federated learning (FL) as a paradigm for collaborative model training among multiple parties without any disclosure of private raw data. As for the latter aspect, we focus on highly interpretable models, i.e., those for which humans are able to understand how decisions have been taken. An extensive experimental analysis carried out on a well-known Parkinson Telemonitoring dataset highlights how the proposed approach based on FL of fuzzy rule-based systems allows achieving, simultaneously, data privacy and interpretability. Results are reported for different data partitioning scenarios, also comparing the interpretable-by-design model with an opaque neural network model.
Article
This study proposes a novel method for detecting Parkinson's disease (PD) based on a time-frequency representation matrix (TFRM) of the speech signal generated by the wavelet synchrosqueezing transform (WSST). The energy and entropy of each frequency component of the TFRM are calculated and used as features for detecting PD using speech signals. Then, the genetic algorithm along with support vector machine (SVM) and gradient boosting models (GBM) are utilized for classification. The results indicate that the proposed approach effectively detects PD using speech signals. We have obtained the maximum accuracy of 95% using the word /apto/. The proposed work shows better results in comparison to the majority of the existing state-of-the-art techniques.
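The per-frequency energy and entropy features mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the time-frequency matrix is already available as a 2-D array (rows are frequency components, columns are time frames) and uses Shannon entropy of the normalised energy over time; the abstract does not give the exact definitions, so the function name `tf_energy_entropy` and both formulas are assumptions, not the authors' implementation.

```python
import numpy as np

def tf_energy_entropy(tfrm, eps=1e-12):
    """Per-row energy and Shannon entropy (in bits) of a time-frequency matrix.

    tfrm: 2-D array, rows = frequency components, columns = time frames.
    The definitions here are illustrative assumptions, not the paper's.
    """
    power = np.abs(np.asarray(tfrm)) ** 2            # squared magnitude per cell
    energy = power.sum(axis=1)                       # total energy of each row
    # Normalise each row into a probability distribution over time.
    prob = power / np.maximum(energy[:, None], eps)
    # Treat 0 * log(0) as 0 by substituting 1 where prob == 0.
    safe = np.where(prob > 0, prob, 1.0)
    entropy = -(prob * np.log2(safe)).sum(axis=1)
    return energy, entropy

# Toy matrix: one row with energy spread evenly, one row concentrated in a frame.
toy = np.array([[1.0, 1.0, 1.0, 1.0],
                [2.0, 0.0, 0.0, 0.0]])
energy, entropy = tf_energy_entropy(toy)
```

In this toy case both rows carry the same energy, but the evenly spread row has the higher entropy, which is the kind of contrast such features are meant to expose to a classifier.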
Article
Parkinson's disease is a recognizable clinical syndrome with a variety of causes and clinical presentations; it represents a rapidly growing neurodegenerative disorder. Since about 90 percent of Parkinson's disease sufferers have some form of early speech impairment, recent studies on telediagnosis of Parkinson's disease have focused on the recognition of voice impairments from vowel phonations or the subjects' discourse. In this paper, we present a new approach for Parkinson's disease detection from speech sounds that is based on CNN and LSTM and uses two categories of characteristics, Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Cepstral Coefficients (GTCC), obtained from noise-removed speech signals with comparative EMD-DWT and DWT-EMD analysis. The proposed model is divided into three stages. In the first step, noise is removed from the signals using the EMD-DWT and DWT-EMD methods. In the second step, the GTCC and MFCC are extracted from the enhanced audio signals. The classification process is carried out in the third step by feeding these features into the LSTM and CNN models, which are designed to capture sequential information from the extracted features. The experiments are performed using the PC-GITA and Sakar datasets with 10-fold cross-validation. The highest classification accuracy for the Sakar dataset reached 100% for both EMD-DWT-GTCC-CNN and DWT-EMD-GTCC-CNN; for the PC-GITA dataset, the accuracy reached 100% for EMD-DWT-GTCC-CNN and 96.55% for DWT-EMD-GTCC-CNN. The results of this study indicate that GTCC features are more appropriate and accurate for the assessment of PD than MFCC.
Book
In recent years, AI/ML tools have become more prevalent in the fields of medical imaging and imaging informatics, where systems are already outperforming physicians in a range of domains, such as in the classification of retinal fundus images in ophthalmology, chest X-rays in radiology, and skin cancer detection in dermatology, among many others. It has recently emerged as one of the fastest growing research areas given the evolution of techniques in radiology, molecular imaging, anatomical imaging, and functional imaging for detection, segmentation, diagnosis, annotation, summarization, and prediction. The ongoing innovations in this exciting and promising field play a powerful role in influencing the lives of millions through health, safety, education, and other opportunities intended to be shared across all segments of society. To achieve further progress, this Special Issue (SI) invited both research and review-type manuscripts to showcase ongoing research progress and development based on applications of AI/ML (especially DL techniques) in medical imaging to influence human health and healthcare systems in the diagnostic decision-making process. The SI published fourteen articles after a rigorous peer-review process across the spectrum of medical imaging modalities and the diversity of specialties depending on imaging techniques from radiology, dermatology, pathology, colonoscopy, endoscopy, etc.
Article
Different decision-making skills are used differently in the advancement of machine learning (ML) and deep learning (DL) techniques. In particular, it will probably be necessary to use ML and DL approaches for disease detection. The proposed effort uses acoustic-based DL approaches' ability to detect PD symptoms. This investigation also examines a number of motor metrics such as nonmotor PD measurements in this regard. Many DL approaches, including deep knowledge generation networks and recurrent networks, can be employed to identify this disease.
Article
Background: Normal voice production depends on the synchronized cooperation of multiple physiological systems, which makes the voice sensitive to changes. Any systematic, neurological, and aerodigestive distortion is prone to affect voice production through reduced cognitive, pulmonary, and muscular functionality. This sensitivity inspired using voice as a biomarker to examine disorders that affect the voice. Technological improvements and emerging machine learning (ML) technologies have enabled possibilities of extracting digital vocal features from the voice for automated diagnosis and monitoring systems. Objective: This study aims to summarize a comprehensive view of research on voice-affecting disorders that uses ML techniques for diagnosis and monitoring through voice samples where systematic conditions, nonlaryngeal aerodigestive disorders, and neurological disorders are specifically of interest. Methods: This systematic literature review (SLR) investigated the state of the art of voice-based diagnostic and monitoring systems with ML technologies, targeting voice-affecting disorders without direct relation to the voice box from the point of view of applied health technology. Through a comprehensive search string, studies published from 2012 to 2022 from the databases Scopus, PubMed, and Web of Science were scanned and collected for assessment. To minimize bias, retrieval of the relevant references in other studies in the field was ensured, and 2 authors assessed the collected studies. Low-quality studies were removed through a quality assessment and relevant data were extracted through summary tables for analysis. The articles were checked for similarities between author groups to prevent cumulative redundancy bias during the screening process, where only 1 article was included from the same author group. 
Results: In the analysis of the 145 included studies, support vector machines were the most utilized ML technique (51/145, 35.2%), with the most studied disease being Parkinson disease (PD; reported in 87/145, 60%, studies). After 2017, 16 additional voice-affecting disorders were examined, in contrast to the 3 investigated previously. Furthermore, an upsurge in the use of artificial neural network-based architectures was observed after 2017. Almost half of the included studies were published in the last 2 years (2021 and 2022). A broad interest from many countries was observed. Notably, nearly one-half (n=75) of the studies relied on 10 distinct data sets, and 11/145 (7.6%) used demographic data as an input for ML models. Conclusions: This SLR revealed considerable interest across multiple countries in using ML techniques for diagnosing and monitoring voice-affecting disorders, with PD being the most studied disorder. However, the review identified several gaps, including limited and unbalanced data set usage in studies, and a focus on diagnostic testing rather than disorder-specific monitoring. Despite the limitation of being constrained to peer-reviewed publications written in English, the SLR provides valuable insights into the current state of research on ML-based voice-affecting disorder diagnosis and monitoring and highlights areas to address in future research.
Article
Parkinson’s disease (PD) is the second most prevalent neurodegenerative disorder in the world after Alzheimer’s disease. Diagnosing PD early is challenging because it evolves slowly and its symptoms emerge gradually. Recent studies have demonstrated that changes in speech may be utilized as an excellent biomarker for the early diagnosis of PD. In this study, we propose a novel Chirplet transform (CT) based approach for diagnosing PD using speech signals. We employed CT to obtain the time-frequency matrix (TFM) of each speech recording, and we extracted time-frequency based entropy (TFE) features from the TFM. The statistical analysis demonstrates that the TFE features reflect the changes that occur in speech due to PD and hence can be used for classifying PD and healthy control (HC) individuals. The effectiveness of the proposed framework is validated using the vowels and words from the PC-GITA database. A genetic algorithm is utilized to select the optimum feature subset, while support vector machine (SVM), decision tree (DT), K-Nearest Neighbor (KNN), and Naïve Bayes (NB) classifiers are employed for classification. The TFE features outperform the breathiness and Mel frequency cepstral coefficient (MFCC) features. The SVM classifier is the most effective compared to the other machine-learning classifiers. The highest classification accuracy rates of 98% and 99% are achieved using the vowel /a/ and the word /atleta/, respectively. The results reveal that the proposed CT-based entropy features effectively diagnose PD using the speech of a person.
Article
Parkinson’s disease (PD) is a neuron-related disorder caused by the decrease in dopaminergic neurons present in the midbrain. Over the last few decades, speech has attracted growing interest in the analysis and detection of PD. In this study, a predictive machine learning framework based on extreme gradient boosting (XGBoost) feature selection and a stacked ensemble approach is presented to investigate the voice tremor of people suffering from PD. The proposed framework consists of two stages: in the first stage, the optimized features are obtained using XGBoost feature selection, and in the second stage, a PD detection system is developed using stacked ensemble classifiers. Leave one subject out (LOSO) cross-validation shows that the proposed framework gives an average accuracy of up to 95.07%, higher than the results obtained with individual classifiers. Additionally, it was concluded that the reduced feature set gave the highest classification accuracy compared to the raw feature set, which saves training time and enhances the prediction accuracy.
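The leave-one-subject-out (LOSO) protocol named in the abstract above can be sketched as follows. This is a generic illustration, not the authors' code: the helper name `leave_one_subject_out` and the toy subject labels are hypothetical, and the only assumption is that every recording carries a subject identifier.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (subject, train_idx, test_idx), holding out one subject per fold."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        # All recordings of the held-out subject go to the test side only.
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical example: six recordings from three subjects.
ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(ids))
```

Splitting by subject rather than by recording keeps a speaker's recordings out of the training set whenever that speaker is being tested, which avoids the optimistic bias of recording-level splits.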
Chapter
Parkinson’s disease is a nervous system disease that progresses over time and causes the patient’s movement skills to deteriorate. A deficiency of the hormone dopamine in the brain causes abnormal activity, which leads to movement problems and other Parkinson’s disease symptoms such as fuzzy thinking and difficulty recalling things. No specific tests exist to diagnose patients. In many cases, the clinical picture of Parkinson’s disease is typical; nonetheless, symptoms that separate it from other conditions should be meticulously investigated. No two people experience this disease in the same way. Although various technological models help predict this disease, none of them is personalized enough. The main focus is on personalization by building a hybrid model that takes into consideration various important factors of the disease. A way to predict the onset or presence of this disease, which affects more than 1 million people in India annually, based on the few and not easily detectable changes in a person that might be potential symptoms should therefore prove helpful. With the integration of technology, a cost-effective, user-friendly, and personalized system can be developed. Keywords: Parkinson’s disease; Ingenious; Machine learning; Prediction system
Chapter
Several major changes have occurred in the IC design industry in various fields of electronics. The challenge of circuit design is addressed by numerous multifaceted optimization approaches, such as the technology used to implement the design, the topologies used in its realization, and the circuits, architectures and algorithms. Therefore, in product development, a trade-off exists between area, power and speed and the optimal ASIC library. This work presents a paradigm of GDI library creation that supports the design of combinational and sequential logic circuits for low-power and high-speed applications. This work demonstrates four different GDI library pattern creations with and without level restoration circuits. The experimentation was done using the Silterra 130 nm process in Mentor Graphics Pyxis software, and parameters such as rise time, fall time, delay power and dynamic power have been analysed. These four library cells are compared with their existing CMOS counterparts and show significant improvement in terms of transistor count, delay and power. Keywords: MUX based connectivity; GDI library; GDI with buffer; GDI F1 and F2; Level restoration; Power and delay
Article
According to the World Health Organization (WHO), Parkinson’s disease (PD) is a neurodegenerative disease of the brain that causes motor symptoms including slower movement, rigidity, tremor, and imbalance in addition to other problems like Alzheimer’s disease (AD), psychiatric problems, insomnia, anxiety, and sensory abnormalities. Techniques including artificial intelligence (AI), machine learning (ML), and deep learning (DL) have been established for the classification of PD and normal controls (NC) with similar therapeutic appearances in order to address these problems and improve the diagnostic procedure for PD. In this article, we examine a literature survey of research articles published up to September 2022 in order to present an in-depth analysis of the use of datasets, various modalities, experimental setups, and architectures that have been applied in the diagnosis of subjective disease. This analysis includes a total of 217 research publications with a list of the various datasets, methodologies, and features. These findings suggest that ML/DL methods and novel biomarkers hold promising results for application in medical decision-making, leading to a more methodical and thorough detection of PD. Finally, we highlight the challenges and provide appropriate recommendations on selecting approaches that might be used for subgrouping and connection analysis with structural magnetic resonance imaging (sMRI), DaTSCAN, and single-photon emission computerized tomography (SPECT) data for future Parkinson’s research.
Preprint
This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types such as wideband and narrowband spectrograms, and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. Methods for quantification include the ability of the proposed model to classify different pathologies and the associated disease severity. Additionally, this paper proposes a novel fusion strategy called multi-spectral fusion that combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders. The proposed models are able to classify the speech from Parkinson's disease patients with accuracy up to 95%. The proposed models were also able to assess the dysarthria severity of Parkinson's disease patients with a Spearman correlation up to 0.75. These results outperform those observed in literature where the same problem was addressed with the same corpus.
Article
Background and Objective: Speech impairment is an early symptom of Parkinson's disease (PD). This study has summarized the literature related to speech and voice in detecting PD and assessing its severity. Methods: A systematic review of the literature from 2010 to 2021 was conducted to investigate analysis methods and signal features. The keywords “Automatic analysis” in conjunction with “PD speech” or “PD voice” were used, and the PubMed and ScienceDirect databases were searched. A total of 838 papers were found on the first run, of which 189 were selected. One hundred and forty-seven were found to be suitable for the review. The different datasets, recording protocols, signal analysis methods and features that were reported are listed. Values of the features that separate PD patients from healthy controls were tabulated. Finally, the barriers that limit the wide use of computerized speech analysis are discussed. Results: Speech and voice may be valuable markers for PD. However, large differences between the datasets make it difficult to compare different studies. In addition, speech analytic methods that are not informed by physiological understanding may alienate clinicians. Conclusions: The potential usefulness of speech and voice for the detection and assessment of PD is confirmed by evidence from the classification and correlation results.
Article
Parkinson’s disease is a neurological illness that affects individuals at the later stage of life. Most patients complain of voice or speech abnormalities during the nascent stage of this disease, and it is difficult to recognize these abnormalities. This creates a need for a speech signal-based Parkinson's detection system to aid clinicians in the diagnosis process. A hybrid Parkinson's disease detection system is proposed in this research work. Two speech datasets have been used in the design of this system: the first is the Italian Parkinson's Voice & Speech dataset, and the other is the Mobile Device Voice Recordings at King's College London dataset. Seventeen acoustic features have been generated from the voice samples available in the datasets using the Parselmouth library. Based on the significance of the features, the eight most significant ones, selected using a genetic algorithm, have been used in the design of the model. Four classifiers, k-nearest neighbors, XGBoost, random forest, and logistic regression, have been used during the classification stage. Accuracy, sensitivity, f-measure, specificity, and precision have been used to analyze the designed system. The combination of the genetic algorithm-based feature selection approach and the logistic regression classifier gave 100% accuracy on the Italian Parkinson's Voice & Speech dataset. The same feature selection and classifier combination attained an accuracy of 90% on the Mobile Device Voice Recordings at King's College London dataset. The results show that the proposed system outperforms the systems found in the literature.
Article
Parkinson's disease (PD) is a neurodegenerative disorder, and there is a tremendous demand for adapting vocal features to determine PD at an earlier stage. This paper devises a technique to diagnose PD using voice signals. The voice signals are taken as input and fed to a pre-processing step in which filtering is applied to remove noise. Thereafter, feature extraction is performed, covering fluctuation index, spectral flux, spectral centroid, Mel frequency cepstral coefficient (MFCC), spectral spread, tonal power ratio, spectral kurtosis and the proposed exponential delta-amplitude modulation spectrogram (Exponential-delta AMS), which is devised by combining the delta-amplitude modulation spectrogram (delta-AMS) and the exponential weighted moving average (EWMA). Feature selection is applied to the extracted features using the proposed squirrel search water algorithm (SSWA), which is devised by combining the squirrel search algorithm (SSA) and the water cycle algorithm (WCA). The fitness function is newly devised using the Canberra distance. Finally, the selected features are fed to an attention-based long short-term memory (attention-based LSTM) network in order to identify the existence of PD. The attention-based LSTM is trained with the developed SSWA. The proposed SSWA-based attention-based LSTM offered enhanced performance with 92.5% accuracy, 95.4% sensitivity and 91.4% specificity.
Article
Parkinson’s disease (PD) is an age-related neurological disease associated with dopamine deficiency and ranks second among neurological diseases worldwide after Alzheimer’s. The identification of PD in the early stage is technically demanding and expensive. Many researchers have investigated PD in divergent ways, using different approaches to identify it in the early stage at low cost. One effective approach, PD voice analysis, has been an important topic in the current decade. In this paper, a novel probabilistic neural network-based approach is proposed for analyzing PD. The major objective of this paper is to develop a highly accurate probabilistic neural network (PNN) based intelligent approach for the identification and classification of PD. The inputs are 1200 sound recordings of the vowel vocalizations ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’ taken at different times of day (morning, mid-day, and night) from 62 PD and 51 non-PD individuals. The experimental analysis shows that the performance of the PNN on the dataset increases with the number of neurons in the hidden layer up to seven, at which point 100% accuracy is reached with minimum time and gradient values. The proposed PNN model with seven hidden-layer neurons is a very powerful tool for low-cost early detection of PD. Comparative analysis with other standard machine learning approaches shows the superiority of the proposed PNN model for successful identification of PD through voice analysis.
Conference Paper
This paper presents the analysis and classification of Parkinson disease. When people suffer from Parkinson disease, their vocal folds and vocal tract are severely affected, and thus speech characteristics are altered during phonation. In this paper, variational mode decomposition (VMD) is used for extracting relevant information from the speech signal. VMD decomposes the speech signal into modes, or sub-signals. Various statistical features (mean, variance, skewness and kurtosis), energy and energy entropy are used for Parkinson disease detection. In the experiments, the VMD-based features outperform the Mel frequency cepstral coefficients (MFCC). The proposed features achieve a classification accuracy of 96.29%.
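The feature set listed in the abstract above (mean, variance, skewness, kurtosis, energy and energy entropy per mode) can be sketched in a few lines. The decomposition itself is not shown: the sketch assumes the modes have already been computed and stacked as rows of an array. The function name `mode_features`, the use of population moments, non-excess kurtosis and natural-log entropy are assumptions for illustration, since the abstract does not state the exact definitions.

```python
import numpy as np

def mode_features(modes):
    """Statistical and energy features for each mode (one mode per row).

    Assumes every mode has non-zero variance; a constant mode would make
    the skewness and kurtosis undefined (division by zero).
    """
    modes = np.asarray(modes, dtype=float)
    mean = modes.mean(axis=1)
    centred = modes - mean[:, None]
    var = (centred ** 2).mean(axis=1)                    # population variance
    skew = (centred ** 3).mean(axis=1) / var ** 1.5      # third standardised moment
    kurt = (centred ** 4).mean(axis=1) / var ** 2        # fourth moment (non-excess)
    energy = (modes ** 2).sum(axis=1)                    # energy of each mode
    p = energy / energy.sum()                            # energy share of each mode
    nz = p > 0
    energy_entropy = float(-np.sum(p[nz] * np.log(p[nz])))  # one scalar per signal
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "energy": energy,
            "energy_entropy": energy_entropy}

# Two toy "modes" standing in for the output of a decomposition.
toy_modes = np.array([[1.0, -1.0, 1.0, -1.0],
                      [2.0, -2.0, 2.0, -2.0]])
features = mode_features(toy_modes)
```

Per-mode values would typically be concatenated into one feature vector per recording before being passed to a classifier.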
Article
This work explores the effectiveness of the Intrinsic Mode Functions (IMFs) of the speech signal, in estimating its Glottal Closure Instants (GCIs). The IMFs of the speech signal, which are its AM–FM or oscillatory components, are obtained from two similar nonlinear and non-stationary signal analysis techniques—Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN), and Modified Empirical Mode Decomposition (MEMD). Both these techniques are advanced variants of the original technique—Empirical Mode Decomposition (EMD). MEMD is much faster than ICEEMDAN, whereas the latter curtails mode-mixing (a drawback of EMD) more effectively. It is observed that the partial summation of a certain subset of the IMFs results in a signal whose minima are aligned with the GCIs. Based on this observation, two different methods are devised for estimating the GCIs from the IMFs of ICEEMDAN and MEMD. The two methods are captioned ICEEMDAN-based GCIs Estimation (IGE) and MEMD-based GCIs Estimation (MGE). The results reveal that IGE and MGE provide consistent and reliable estimates of the GCIs, compared to the state-of-the-art methods, across different scenarios—clean, noisy, and telephone channel conditions.
Article
This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson’s disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in the Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smartphone (SP) microphones. Additional modalities were obtained by splitting the speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of log-likelihood-ratio. The Essentia audio feature set performed best with the AC speech modality and the YAAFE audio feature set performed best with the SP unvoiced modality, achieving EERs of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in an EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of an RF-based proximity matrix into 2D space enriched medical decision support by visualization.
Article
Aim: The research described is intended to give a description of articulation dynamics as a correlate of the kinematic behavior of the jaw-tongue biomechanical system, encoded as a probability distribution of an absolute joint velocity. This distribution may be used in detecting and grading speech from patients affected by neurodegenerative illnesses, such as Parkinson Disease. Hypothesis: The working hypothesis is that the probability density function of the absolute joint velocity includes information on the stability of phonation when applied to sustained vowels, as well as on fluency if applied to connected speech. Methods: A dataset of sustained vowels recorded from Parkinson Disease patients is contrasted with similar recordings from normative subjects. The probability distribution of the absolute kinematic velocity of the jaw-tongue system is extracted from each utterance. A Random Least Squares Feed-Forward Network (RLSFN) has been used as a binary classifier working on the pathological and normative datasets in a leave-one-out strategy. Monte Carlo simulations have been conducted to estimate the influence of the stochastic nature of the classifier. Two datasets, one for each gender (males and females), were tested, including 26 normative and 53 pathological subjects in the male set, and 25 normative and 38 pathological in the female set. Results: Male and female data subsets were tested in single runs, yielding equal error rates under 0.6% (Accuracy over 99.4%). Due to the stochastic nature of each experiment, Monte Carlo runs were conducted to test the reliability of the methodology. The average detection results after 200 Monte Carlo runs of a 200-hyperplane hidden layer RLSFN are given in terms of Sensitivity (males: 0.9946, females: 0.9942), Specificity (males: 0.9944, females: 0.9941) and Accuracy (males: 0.9945, females: 0.9942). The area under the ROC curve is 0.9947 (males) and 0.9945 (females). The equal error rate is 0.0054 (males) and 0.0057 (females). 
Conclusions: The proposed methodology shows that the use of highly normalized descriptors, such as the probability distribution of kinematic variables of vowel articulation stability, which has some interesting properties in terms of information theory, boosts the potential of simple yet powerful classifiers in producing quite acceptable detection results in Parkinson Disease.
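For readers comparing the detection figures quoted across these abstracts, the standard definitions of sensitivity, specificity and accuracy can be written as a short sketch. The confusion-matrix counts below are hypothetical and chosen only to match the size of the male subset described above (53 pathological, 26 normative); they are not results from the study.

```python
def detection_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of patients detected
    specificity = tn / (tn + fp)                 # share of controls cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all correct decisions
    return sensitivity, specificity, accuracy

# Hypothetical counts: 53 pathological and 26 normative subjects,
# with one error in each class.
sens, spec, acc = detection_metrics(tp=52, fn=1, tn=25, fp=1)
```

With unbalanced classes such as these, accuracy alone can hide a weak specificity, which is why the three figures are usually reported together.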
Conference Paper
Neurodegenerative syndromes such as Parkinson’s disease usually lead to speech impairments. Reduced intelligibility of spoken language is treatable with Speech and Language Therapy. A successful speech therapy implements the principles of frequency, intensity and repetition. Consequently, patients need to be highly motivated to keep up with their training exercises. We argue that game-based technologies are well suited to support patients in taking part in self-sustained, high-frequency training. Furthermore, studies demonstrate that game-based interventions have the potential to enhance motivation for rehabilitative exercising in patients with neurological disorders. Building on these insights, we apply successful principles of gamification to improve impaired speech in patients with neurodegenerative syndromes. With the ISi-Speech project (‘Individualisierte Spracherkennung in der Rehabilitation für Menschen mit Beeinträchtigung in der Sprechverständlichkeit’ (in German) [individual speech recognition in therapy for people with motor speech disorders]), we further integrate psychological motivation theory (self-determination) and user-driven design into the development process of a rehabilitation tool for patients with Parkinson’s disease.
Article
This work explores the utility of the time-domain signal components, or the Intrinsic Mode Functions (IMFs), of speech signals, as generated from the data-adaptive filterbank nature of Empirical Mode Decomposition (EMD), in characterizing speakers for the task of text-independent Speaker Verification (SV). A modified version of EMD, denoted as MEMD, which extracts IMFs with less mode-mixing, and provides a better representation of the higher frequency spectrum of speech, is also utilized for the SV task. Three different features are extracted over 20 ms frames, from the IMFs of EMD and MEMD. They are then tested individually, and in conjunction with the Mel Frequency Cepstral Coefficients (MFCCs), for SV. Two corpora - the NIST SRE 2003 corpus, and the CHAINS corpus - are used for the experiments. The results evaluated on the NIST SRE 2003 database, using the i-vector framework, reveal that the features extracted from the IMFs, in conjunction with the MFCCs, enhance the performance of the SV system. Further, it is observed that only a small set of lower-order IMFs is useful and necessary for characterizing speaker-specific information. The combination of the features with the MFCCs is also found to be useful when short speech utterances of ≤ 10 s are used for testing. Similarly, the results evaluated on the CHAINS corpus, using the conventional Gaussian Mixture Model (GMM) framework, reveal that the features, in combination with the MFCCs, enhance the performance of the SV system, not only for normal speech, but also for fast and whispered speech. Again, it is observed that only the first few IMFs are needed and useful for achieving such enhanced performance.
Article
A system that is capable of automatically discriminating healthy people from people with Parkinson’s Disease (PD) from speech recordings is proposed. It is initially based on 27 features, extracted from recordings of sustained vowels. The number of features has been further reduced by feature selection. The system has been tested by using a heterogeneous database, composed of 40 control subjects and 40 subjects with PD belonging to different severity stages of the disease and under prescribed treatment. Repeated measures per individual were averaged before being assigned to the subject, avoiding the usual practice of considering measurements within the same subject as independent. The best overall accuracy obtained was 85.25%, with a sensitivity of 90.23% and a specificity of 80.28%. Additionally, a pilot experiment to track PD severity stages has been performed on 32 out of the 40 initial subjects with PD. To the authors’ knowledge, this is the first speech-based experiment on automatic PD tracking by using the Hoehn and Yahr scale (a clinical metric mainly focused on postural instability). The results suggest that progression of voice impairment follows different developmental trajectories than postural instability, implying different degenerative mechanisms.
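The per-subject averaging step mentioned in this abstract can be illustrated with a short sketch. It assumes each recording is a `(subject_id, feature_vector)` pair. The function name `average_per_subject` and the toy values are hypothetical, and the study's actual 27 features are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def average_per_subject(recordings):
    """Collapse repeated recordings into one feature vector per subject.

    recordings: iterable of (subject_id, feature_vector) pairs, where all
    feature vectors have the same length.
    """
    grouped = defaultdict(list)
    for subject_id, features in recordings:
        grouped[subject_id].append(features)
    # zip(*vectors) walks the vectors column by column, one feature at a time
    return {
        subject_id: [mean(column) for column in zip(*vectors)]
        for subject_id, vectors in grouped.items()
    }


# Toy data: subject s01 has two recordings, s02 has one
recordings = [
    ("s01", [0.25, 1.0]),
    ("s01", [0.75, 3.0]),
    ("s02", [1.0, 2.0]),
]
per_subject = average_per_subject(recordings)
```

Averaging first means the classifier sees one sample per person, so recordings of the same speaker cannot end up on both sides of a train/test split.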
Conference Paper
Parkinson's disease (PD) is a neurodegenerative disorder that is characterized by the loss of dopaminergic neurons in the midbrain. It is demonstrated that about 90% of the people with PD also develop speech impairments, exhibiting symptoms such as monotonic speech, low pitch intensity, inappropriate pauses, imprecision in consonants and problems in prosody; although these are already identified problems, only 3% to 4% of the patients receive speech therapy. The research community has addressed the problem of the automatic detection of PD by means of noise measures; however, in such works only the phonation of the English vowel /a/ has been considered. In this paper, the five Spanish vowels uttered by 50 people with PD and 50 healthy controls (HC) are evaluated automatically considering a set of four noise measures: Harmonics to Noise Ratio (HNR), Normalized Noise Energy (NNE), Cepstral HNR (CHNR) and Glottal to Noise Excitation Ratio (GNE). The decision on whether a speech recording is from a person with PD or from a HC is taken by a K nearest neighbors (k-NN) classifier, finding an accuracy of 66.57% when only the vowel /i/ is considered.
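A k-NN decision of the kind described in this abstract can be sketched in a few lines. The example assumes four-dimensional feature vectors ordered as (HNR, NNE, CHNR, GNE) and plain Euclidean distance. The function name `knn_predict`, the value of `k` and all feature values are invented for illustration and do not come from the paper.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented (HNR, NNE, CHNR, GNE) vectors; real features would normally be
# normalized first so that no single measure dominates the distance.
train = [
    ([20.0, -12.0, 25.0, 0.90], "HC"),
    ([21.0, -13.0, 26.0, 0.95], "HC"),
    ([19.0, -11.0, 24.0, 0.85], "HC"),
    ([10.0, -5.0, 14.0, 0.50], "PD"),
    ([11.0, -6.0, 15.0, 0.55], "PD"),
    ([9.0, -4.0, 13.0, 0.45], "PD"),
]
label = knn_predict(train, [10.5, -5.5, 14.5, 0.50], k=3)
```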
Article
70% to 90% of patients with Parkinson's disease (PD) show an affected voice. Various studies have revealed that voice and prosody are among the earliest indicators of PD. The aim of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts and monologues. Classification is performed in either case by SVMs. A correlation-based feature selection was performed in order to identify the most important features for each of these systems. We report recognition results of 91% when trying to differentiate between normal speaking persons and speakers with PD in early stages with prosodic modeling. With acoustic modeling we achieved a recognition rate of 88% and with vocal modeling we achieved 79%. After feature selection these results could be greatly improved, but we expect those results to be too optimistic. We show that read texts and monologues are the most meaningful texts when it comes to the automatic detection of PD based on articulation, voice, and prosodic evaluations. The most important prosodic features were based on energy, pauses and F0. The masses and the compliances of spring were found to be the most important parameters of the two-mass vocal fold model.
Chapter
Empirical Mode Decomposition is a data-driven technique proposed by Huang. In this work, we explore spectral properties of the intrinsic mode functions and apply them to speech signals corresponding to real and simulated sustained vowels. For the synthetic sustained vowels we propose a phonation model that includes perturbations implied in common laryngeal pathologies. We extract features from each signal using Burg’s standard spectral analysis of their intrinsic mode functions. Due to its well-known theoretical properties, the classic K-nearest neighbors classification rule is applied to real and synthetic data. We show that even using this basic pattern classification algorithm, the selected spectral features of only three intrinsic mode functions are enough to discriminate between normal and pathological voices. We have obtained 99.00% correct classifications between normal and pathological synthetic voices (K=1, sensitivity=0.990, specificity=0.990), while in the case of real voices the percentage of correct classification was 93.40% (K=3, sensitivity=0.925, specificity=0.926). These results strongly suggest that spectral properties of Empirical Mode Decomposition provide useful discriminative information for this task. Additionally we consider two pathologies of different etiology and treatment, which, given the similarity of their voice characteristics, are frequently misdiagnosed in clinical practice: muscular tension dysphonia and adductor spasmodic dysphonia. Preliminary results with a reduced real database suggest that this approach could provide useful orientation to physicians and voice pathologists.
Article
We present an assessment of the practical value of existing traditional and non-standard measures for discriminating healthy people from people with Parkinson's disease (PD) by detecting dysphonia. We introduce a new measure of dysphonia, Pitch Period Entropy (PPE), which is robust to many uncontrollable confounding effects including noisy acoustic environments and normal, healthy variations in voice frequency. We collected sustained phonations from 31 people, 23 with PD. We then selected 10 highly uncorrelated measures, and an exhaustive search of all possible combinations of these measures finds four that in combination lead to overall correct classification performance of 91.4%, using a kernel support vector machine. In conclusion, we find that non-standard methods in combination with traditional harmonics-to-noise ratios are best able to separate healthy from PD subjects. The selected non-standard methods are robust to many uncontrollable variations in acoustic environment and individual subjects, and are thus well-suited to telemonitoring applications.
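The exhaustive search over combinations of 10 measures mentioned in this abstract is small enough to enumerate directly: 10 measures give 2^10 - 1 = 1023 non-empty subsets. The sketch below shows only the enumeration. The scoring function `toy_score` is a hypothetical stand-in for a cross-validated classifier accuracy, and the measure names `m0` to `m9` are placeholders, not the measures used in the paper.

```python
from itertools import combinations


def best_subset(features, score):
    """Score every non-empty subset of features and return (subset, score)."""
    best, best_score = None, float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            value = score(subset)
            if value > best_score:
                best, best_score = subset, value
    return best, best_score


# Hypothetical scoring rule: rewards four "useful" measures and applies a
# small penalty per selected measure. A real study would plug in the
# validation accuracy of a classifier trained on the subset instead.
USEFUL = {"m2", "m5", "m7", "m9"}


def toy_score(subset):
    return len(USEFUL & set(subset)) - 0.1 * len(subset)


features = [f"m{i}" for i in range(10)]
subset, value = best_subset(features, toy_score)
```

With more than a few dozen candidate measures this enumeration becomes impractical, which is one reason later work in this list turns to heuristic feature-selection algorithms.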
Article
An assessment of vocal impairment is presented for separating healthy people from persons with early untreated Parkinson's disease (PD). This study's main purpose was to (a) determine whether voice and speech disorders are present from early stages of PD before starting dopaminergic pharmacotherapy, (b) ascertain the specific characteristics of the PD-related vocal impairment, (c) identify PD-related acoustic signatures for the major part of traditional clinically used measurement methods with respect to their automatic assessment, and (d) design new automatic measurement methods of articulation. The varied speech data were collected from 46 Czech native speakers, 23 with PD. Subsequently, 19 representative measurements were pre-selected, and Wald sequential analysis was then applied to assess the efficiency of each measure and the extent of vocal impairment of each subject. It was found that measurement of the fundamental frequency variations applied to two selected tasks was the best method for separating healthy from PD subjects. On the basis of objective acoustic measures, statistical decision-making theory, and validation from practicing speech therapists, it has been demonstrated that 78% of early untreated PD subjects exhibit some form of vocal impairment. The speech defects thus uncovered differ individually in various characteristics including phonation, articulation, and prosody.
Article
Parkinson's disease (PD) is a neurodegenerative disease that affects millions of people worldwide, causing mental and mainly motor dysfunctions. The negative impact on the patient's daily routine has driven the search for new techniques that can reduce its negative effects and also identify the disease in individuals. One of the main motor characteristics of PD is the hand tremor faced by patients, which turns out to be crucial information to be used towards a computer-aided diagnosis. In this context, we make use of handwriting dynamics data acquired from individuals when submitted to some tasks that measure abilities related to writing skills. This work proposes the application of recurrence plots to map the signals onto the image domain, which are further used to feed a Convolutional Neural Network for learning proper information that can help the automatic identification of PD. The proposed approach was assessed in a public dataset under several scenarios that comprise different combinations of deep-based architectures, image resolutions, and training set sizes. Experimental results showed significant accuracy improvement compared to our previous work, with an average accuracy of over 87%. Moreover, an improvement in accuracy was observed concerning the classification of patients (i.e., mean recognition rates above 90%). The promising results showed the potential of the proposed approach toward the automatic identification of Parkinson's disease.
Article
The prevalence of speech disorders among individuals with Parkinson's disease (PD) has been reported to be as high as 89%. Speech impairment in PD results from a combination of motor and nonmotor deficits. The production of speech depends upon the coordination of various motor activities: respiration, phonation, articulation, resonance and prosody. A speech disorder is defined as impairment in any of its inter-related components. Despite the high prevalence of speech disorders in PD, only 3-4% receive speech treatment. Treatment modalities include pharmacological intervention, speech therapy, surgery, deep brain stimulation and vocal fold augmentation. Although management of Parkinsonian dysarthria is clinically challenging, speech treatment in PD should be part of a multidisciplinary approach to patient care in this disease.
Article
Purpose: This study compared the information content and information efficiency of spoken language in individuals with Parkinson's disease (PD) to a healthy comparator group. Method: Nineteen participants with PD and 19 healthy older adults completed the prospective, cross-sectional study. In the primary analysis, 2 language samples elicited by standardized protocols were analyzed for group differences using standard discourse informativeness measures including main events (MEs; Wright, Capilouto, Wagovich, Cranfill, & Davis, 2005) analyzed as %MEs and correct information units (CIUs; Nicholas & Brookshire, 1993) analyzed as %CIUs and CIUs/min. In exploratory analyses, the following were examined: (a) associations among conceptual (%MEs) and lexical (%CIUs and CIUs/min) measures and (b) associations among informativeness measures and age, education, disease severity/duration, global cognition, speech intelligibility, and a verb confrontation naming measure. Results: In the primary analysis, the PD group differed significantly from the control group on conceptual (%MEs) and lexical measures of content (%CIUs) and efficiency (CIUs/min). In exploratory analyses, for the control group %MEs were significantly correlated with CIUs/min. Significant associations among conceptual and lexical measures of informativeness were not found in the PD group. For controls, there were no significant correlations between informativeness measures and any of the demographic or speech/cognitive/language variables. In the PD group, there was a significant and positive association between CIUs/min and Dementia Rating Scale-Second Edition scores (Mattis, 2001). A significant but negative correlation was found between CIUs/min and motor severity scores. However, %MEs and verb naming were significantly and positively correlated. 
Conclusions: Individuals with PD without dementia demonstrated reduced discourse informativeness that reflects disruptions to both conceptual and lexical discourse processes. In exploratory analyses, reduced efficiency of information content was associated with global cognition and motor severity. Clinical and research implications are discussed within a Cognitivist framework of discourse production (Sheratt, 2007).
Article
Background and objective: In this work, we present a systematic review concerning recent enabling technologies as tools for the diagnosis, treatment and better quality of life of patients diagnosed with Parkinson's Disease (PD), as well as an analysis of future trends on new approaches to this end. Methods: In this review, we compile a number of works published at some well-established databases, such as Science Direct, IEEEXplore, PubMed, Plos One, Multidisciplinary Digital Publishing Institute (MDPI), Association for Computing Machinery (ACM), Springer and Hindawi Publishing Corporation. Each selected work has been carefully analyzed in order to identify its objective, methodology and results. Results: The review showed that the majority of works make use of signal-based data, which are often acquired by means of sensors. Also, we have observed an increasing number of works that employ virtual reality and e-health monitoring systems to increase the quality of life of PD patients. Despite the different approaches found in the literature, almost all of them make use of some sort of machine learning mechanism to aid the automatic PD diagnosis. Conclusions: The main focus of this survey is to consider computer-assisted diagnosis, and how effective it can be when handling the problem of PD identification. Also, the main contribution of this review is to consider very recent works only, mainly from 2015 and 2016.
Article
This paper presents an optimized cuttlefish algorithm for feature selection based on the traditional cuttlefish algorithm, which can be used for diagnosis of Parkinson's disease at its early stage. Parkinson's disease is a central nervous system disorder caused by the loss of brain cells. Parkinson's disease is incurable and could eventually lead to death, but medications can help to control symptoms and prolong the patient's life to some extent. The proposed model uses the traditional cuttlefish algorithm as a search strategy to ascertain the optimal subset of features. The decision tree and k-nearest neighbor classifiers are used to judge the selected features. The Parkinson speech dataset with multiple types of sound recordings and the Parkinson handwriting sample datasets are used to evaluate the proposed model. The proposed algorithm can be used in predicting Parkinson's disease with an accuracy of approximately 94% and can help individuals receive proper treatment at an early stage. The experimental result reveals that the proposed bio-inspired algorithm finds an optimal subset of features, maximizing the accuracy and minimizing the number of features selected, and is more stable.
Article
Symptoms of Parkinson's disease vary from patient to patient. Additionally, the progression of those symptoms also differs among patients. Most of the studies on the analysis of speech of people with Parkinson's disease do not consider such an individual variation. This paper presents a methodology for the automatic and individual monitoring of speech disorders developed by PD patients. The neurological state and dysarthria level of the patients are evaluated. The proposed system is based on individual speaker models which are created for each patient. Two different models are evaluated, the classical GMM–UBM and the i–vectors approach. These two methods are compared with respect to a baseline found with a traditional Support Vector Regressor. Different speech aspects (phonation, articulation, and prosody) are considered to model recordings of spontaneous speech and a read text. A multi-aspect coefficient is proposed with the aim of incorporating information from all of these speech aspects into a single measure. Two different scenarios are considered to assess a set with seven PD patients: (1) the longitudinal test set which consists of speech recordings captured in five recording sessions distributed from 2012 to 2016, and (2) the at-home test set which consists of speech recordings captured in the home of the same seven patients during 4 months (one day per month, four times per day). The UBM is trained with the recordings of 100 speakers (50 with Parkinson's disease and 50 healthy speakers) captured with controlled acoustic conditions and a professional audio-setting. With the aim of evaluating the suitability of the proposed approaches and the possibility of extending this kind of systems to remotely assess the speech of the patients, a total of five different communication channels (sound-proof booth, Skype®, Hangouts®, mobile phone, and land-line) are considered to train and test the system. 
Due to the reduced number of recording sessions in the longitudinal test set, the experiments that involved this set are evaluated with Pearson's correlation. The experiments with the at-home test set are evaluated with Spearman's correlation. The results estimating the dysarthria level of the patients in the at-home test set indicate a correlation of 0.55 with a modified version of the Frenchay Dysarthria Assessment scale when the GMM-UBM model is applied upon the Skype® recordings. The results in the longitudinal test set indicate a correlation of 0.77 using a model based on i-vectors with recordings captured in the sound-proof booth. The evaluation of the neurological state of the patients in the longitudinal test set shows correlations of up to 0.55 with the Movement Disorder Society - Unified Parkinson's Disease Rating Scale, also using models based on i-vectors created with Skype® recordings. These results suggest that the i–vector approach is suitable when the acoustic conditions among recording sessions differ (longitudinal test set). The GMM-UBM approach seems to be more suitable when the acoustic conditions do not change much among recording sessions (at-home test set). In particular, the best results were obtained with the Skype® calls, which can be explained by the several preprocessing stages that this codec applies to the audio signals. In general, the results suggest that the proposed approaches are suitable for tele-monitoring the dysarthria level and the neurological state of PD patients.
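The difference between the two correlation measures used in this abstract can be made concrete with a small sketch. Pearson's coefficient measures linear association, while Spearman's is Pearson's coefficient applied to ranks and therefore measures monotonic association. The helper names and the toy score lists below are hypothetical and do not reproduce any data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy scores: perfectly monotonic but not linear
predicted = [1, 2, 3, 4, 5]
clinical = [1, 4, 9, 16, 25]
r_pearson = pearson(predicted, clinical)
r_spearman = spearman(predicted, clinical)
```

On this toy input Spearman's coefficient reaches 1 because the ordering is preserved exactly, while Pearson's stays below 1 because the relation is not a straight line.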
Article
Diagnosis of Parkinson's disease at its early stage is important in proper treatment of the patients so they can lead productive lives for as long as possible. Although many techniques have been proposed to diagnose Parkinson's disease at an early stage, none of them is efficient. In this work, to improve the diagnosis of Parkinson's disease, we introduce a novel improved and optimized version of the crow search algorithm (OCSA). The proposed OCSA can be used in predicting Parkinson's disease with an accuracy of 100% and can help individuals receive proper treatment at an early stage. The performance of OCSA has been measured for 20 benchmark datasets and the results have been compared with the original chaotic crow search algorithm (CCSA). The experimental result reveals that the proposed nature-inspired algorithm finds an optimal subset of features, maximizing the accuracy and minimizing the number of features selected, and is more stable.
Article
A study is presented analyzing tremor in the voice of speakers that were diagnosed with Parkinson’s disease (PD). The examined sounds are sustained /a/s, originating from a large dysarthric speech corpus. Six measures of vocal tremor are extracted from these vowels by applying a self-developed algorithm that is based on autocorrelation of contours and implemented as a script of an open-source speech analysis program. Univariate analyses of covariance reveal significantly raised tremor magnitudes (tremor intensity indices and tremor power indices) in PD speakers off medication as compared to a control group, as well as within PD speakers in the off-medication condition as compared to on medication. No significant differences are found between the control group and PD speakers on medication, nor for tremor frequencies. However, the greater part of the variance in tremor measures is always accounted for by the speakers’ age.
Article
Background and objective: Parkinson's disease (PD) is considered a degenerative disorder that affects the motor system, which may cause tremors, micrographia, and freezing of gait. Although PD is related to the lack of dopamine, the triggering process of its development is not fully understood yet. Methods: In this work, we introduce convolutional neural networks to learn features from images produced by handwritten dynamics, which capture different information during the individual's assessment. Additionally, we make available a dataset composed of images and signal-based data to foster the research related to computer-aided PD diagnosis. Results: The proposed approach was compared against raw data and texture-based descriptors, showing suitable results, mainly in the context of early stage detection, with results close to 95%. Conclusions: The analysis of handwritten dynamics using deep learning techniques proved useful for automatic Parkinson's disease identification, and it can outperform handcrafted features.
Article
The diagnosis of Parkinson's Disease is a challenging task which might be supported by new tools to objectively evaluate the presence of deviations in patient's motor capabilities. To this respect, the dysarthric nature of patient's speech has been exploited in several works to detect the presence of this disease, but none of them has deeply studied the use of state-of-the-art speaker recognition techniques for this task. In this paper, two classification schemes (GMM-UBM and i-Vectors-GPLDA) are employed separately with several parameterization techniques, namely PLP, MFCC and LPC. Additionally, the influence of the kinetic changes, described by their derivatives, is analysed. With the proposed methodology, an accuracy of 87% with an AUC of 0.93 is obtained in the optimal configuration. These results are comparable to those obtained in other works employing speech for Parkinson's Disease detection and confirm that the selected speaker recognition techniques are a solid baseline to compare with future works. Results suggest that Rasta-PLP is the most reliable parameterization for the proposed task among all the tested features while the two employed classification schemes perform similarly. Additionally, results confirm that kinetic changes provide a substantial performance improvement in Parkinson's Disease automatic detection systems and should be considered in the future.
Article
Parkinson's Disease (PD) is a progressive degenerative disease of the nervous system that affects movement control. The Unified Parkinson's Disease Rating Scale (UPDRS) is the baseline assessment for PD and the most widely used standardized scale to assess parkinsonism. Discovering the relationship between speech signal properties and UPDRS scores is an important task in PD diagnosis. Supervised machine learning techniques have been extensively used in predicting PD through a set of datasets. However, most methods developed with supervised techniques do not support incremental updates of data. In addition, the standard supervised techniques cannot be used in an incremental situation for disease prediction and therefore need to reprocess all the training data to build the prediction models. In this paper, we take advantage of an incremental machine learning technique, the incremental support vector machine, to develop a new method for UPDRS prediction. We use the incremental support vector machine to predict Total-UPDRS and Motor-UPDRS. We also use non-linear iterative partial least squares for data dimensionality reduction and a self-organizing map for the clustering task. To evaluate the method, we conduct several experiments with a PD dataset and present the results in comparison with the methods developed in previous research. The prediction errors of the method, measured by MAE, were 0.4656 for Total-UPDRS and 0.4967 for Motor-UPDRS. The results of the experimental analysis demonstrated that the proposed method is effective in predicting UPDRS. The method has the potential to be implemented as an intelligent system for PD prediction in healthcare.
Article
The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work comprises a step forward for the development of computer aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
Article
About 1% of people older than 65 years suffer from Parkinson's disease (PD) and 90% of them develop several speech impairments, affecting phonation, articulation, prosody and fluency. Computer-aided tools for the automatic evaluation of speech can provide useful information to the medical experts to perform a more accurate and objective diagnosis and monitoring of PD patients and can help also to evaluate the correctness and progress of their therapy. Although there are several studies that consider spectral and cepstral information to perform automatic classification of speech of people with PD, so far it is not known which is the most discriminative, spectral or cepstral analysis. In this paper, the discriminant capability of six sets of spectral and cepstral coefficients is evaluated, considering speech recordings of the five Spanish vowels and a total of 24 isolated words. According to the results, linear predictive cepstral coefficients are the most robust and exhibit values of the area under the receiver operating characteristic curve above 0.85 in 6 of the 24 words.
Article
Although articulatory deficits represent an important manifestation of dysarthria in Parkinson’s disease (PD), the most widely used methods currently available for the automatic evaluation of speech performance are focused on the assessment of dysphonia. The aim of the present study was to design a reliable automatic approach for the precise estimation of articulatory deficits in PD. Twenty-four individuals diagnosed with de novo PD and twenty-two age-matched healthy controls were recruited. Each participant performed diadochokinetic tasks based upon the fast repetition of /pa/-/ta/-/ka/ syllables. All phonemes were manually labeled and an algorithm for their automatic detection was designed. Subsequently, 13 features describing six different articulatory aspects of speech including vowel quality, coordination of laryngeal and supralaryngeal activity, precision of consonant articulation, tongue movement, occlusion weakening, and speech timing were analyzed. In addition, a classification experiment using a support vector machine based on articulatory features was proposed to differentiate between PD patients and healthy controls. The proposed detection algorithm reached approximately 80% accuracy for a 5 ms threshold of absolute difference between manually labeled references and automatically detected positions. When compared to controls, PD patients showed impaired articulatory performance in all investigated speech dimensions ($p < 0.05$). Moreover, using the six features representing different aspects of articulation, the best overall classification result attained a success rate of 88% in separating PD from controls. Imprecise consonant articulation was found to be the most powerful indicator of PD-related dysarthria. We envisage our approach as the first step towards development of acoustic methods allowing the automated assessment of articulatory features in dysarthrias.
Book
Introduction to Audio Analysis serves as a standalone introduction to audio analysis, providing theoretical background to many state-of-the-art techniques. It covers the essential theory necessary to develop audio engineering applications, but also uses programming techniques, notably MATLAB®, to take a more applied approach to the topic. Basic theory and reproducible experiments are combined to demonstrate theoretical concepts from a practical point of view and provide a solid foundation in the field of audio analysis. Audio feature extraction, audio classification, audio segmentation, and music information retrieval are all addressed in detail, along with material on basic audio processing and frequency domain representations and filtering. Throughout the text, reproducible MATLAB® examples are accompanied by theoretical descriptions, illustrating how concepts and equations can be applied to the development of audio analysis systems and components. A blend of reproducible MATLAB® code and essential theory enables the reader to delve into the world of audio signals and develop real-world audio applications in various domains.
Article
This paper introduces a novel approach, Cepstral Separation Difference (CSD), for quantification of speech impairment in Parkinson’s disease (PD). CSD represents a ratio between the magnitudes of glottal (source) and supra-glottal (filter) log-spectra acquired using the source-filter speech model. The CSD-based features were tested on a database consisting of 240 clinically rated running speech samples acquired from 60 PD patients and 20 healthy controls. The Guttmann (µ2) monotonic correlations between the CSD features and the speech symptom severity ratings were strong (up to 0.78). This correlation increased with the increasing textual difficulty in different speech tests. CSD was compared with some non-CSD speech features (harmonic ratio, harmonic-to-noise ratio and Mel-frequency cepstral coefficients) for speech symptom characterization in terms of consistency and reproducibility. The high intra-class correlation coefficient (>0.9) and analysis of variance indicate that CSD features can be used reliably to distinguish between severity levels of speech impairment. Results motivate the use of CSD in monitoring speech symptoms in PD.
Article
Today, digital audio applications are part of our everyday lives. Audio classification can provide powerful tools for content management. If an audio clip can be classified automatically, it can be stored in an organised database, which can improve the management of audio dramatically. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories a number of acoustic features that include linear predictive coefficients, linear predictive cepstral coefficients and mel-frequency cepstral coefficients are extracted to characterize the audio content. The autoassociative neural network model (AANN) is used to capture the distribution of the acoustic feature vectors. The AANN model captures the distribution of the acoustic features of a class, and the backpropagation learning algorithm is used to adjust the weights of the network to minimize the mean square error for each feature vector. The proposed method also compares the performance of AANN with a Gaussian mixture model (GMM) wherein the feature vectors from each class were used to train the GMM models for those classes. During testing, the likelihood of a test sample belonging to each model is computed and the sample is assigned to the class whose model produces the highest likelihood.
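The decision rule described at the end of the abstract above (score a test sample against every class model and pick the class with the highest likelihood) can be sketched as follows. As a deliberate simplification, each class is modeled here by a single diagonal-covariance Gaussian rather than a full Gaussian mixture; the function names and the two-dimensional toy feature vectors are hypothetical and are not taken from the paper.

```python
import math


def fit_diagonal_gaussian(vectors):
    """Estimate per-dimension mean and variance (a one-component stand-in for a GMM)."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-6)  # variance floor
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(vector, model):
    """Log density of the vector under a diagonal Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(vector, means, variances)
    )


def classify(vector, models):
    """Assign the vector to the class whose model gives the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(vector, models[label]))


# Toy training vectors for two of the six classes (illustrative values only).
training = {
    "music": [[0.9, 0.1], [1.1, 0.2], [1.0, 0.0]],
    "news": [[-1.0, 2.0], [-0.8, 2.2], [-1.2, 1.9]],
}
models = {label: fit_diagonal_gaussian(vs) for label, vs in training.items()}
predicted = classify([0.95, 0.15], models)
```

Working in the log domain avoids numerical underflow when many feature dimensions or frames are combined; a full mixture model would replace `log_likelihood` with a log-sum over weighted components.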
Article
There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.
Conference Paper
Linear source-filter models have been widely used by researchers as a front-end for speaker identification systems. They use the cepstral features derived from the power spectrum of the speech signal. But it is also well known that a significant part of the acoustic information cannot be modeled by the linear source-filter model, and thus, the need for nonlinear features becomes apparent. In this paper, an attempt is made to investigate the use of the phase function in the analytic signal for deriving a representation of the frequencies present in the speech signal. The main objective of the paper is to present a novel parameterization of speech that is based on the nonlinear AM-FM speaker model in the context of closed-set speaker identification. The proposed features measure the amount of amplitude and frequency modulation and attempt to model aspects of the speaker-related information that the commonly used linear source-filter model fails to capture. To evaluate the robustness of the proposed features for speaker identification, a clean speech corpus from the TIMIT database was used, and the speech signals were combined with car noise and babble noise from the NOISEX-92 database. The proposed feature set provides significant improvements in identification accuracy over conventional methods such as MFCC under mismatched training and testing environments. The results show that better speaker identification rates are attainable under mismatched conditions, especially at low signal-to-noise ratio (SNR).
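The idea of reading frequency content from the phase of the analytic signal, as mentioned in the abstract above, can be illustrated with a small sketch. It builds the analytic signal with a direct DFT, takes the magnitude as the amplitude envelope, and takes the sample-to-sample phase advance as the instantaneous frequency. The function names, the 800 Hz sampling rate, and the test tone are assumptions chosen for illustration; this is not the feature extraction used in the paper.

```python
import cmath
import math


def analytic_signal(samples):
    """Analytic signal via a direct DFT (O(N^2), adequate for short frames)."""
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double positive bins, zero negative bins.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive_bins = range(1, n // 2)
    else:
        positive_bins = range(1, (n + 1) // 2)
    for k in positive_bins:
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def envelope_and_frequency(samples, sample_rate):
    """Amplitude envelope and instantaneous frequency (Hz) from the analytic signal."""
    z = analytic_signal(samples)
    envelope = [abs(v) for v in z]
    # Phase advance between neighbouring samples; no unwrapping is needed
    # as long as the frequency stays below half the sampling rate.
    freqs = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * sample_rate / (2.0 * math.pi)
        for t in range(len(z) - 1)
    ]
    return envelope, freqs


# Hypothetical test tone: 100 Hz cosine, 64 samples at 800 Hz (exactly 8 cycles).
sample_rate = 800.0
tone = [math.cos(2.0 * math.pi * 100.0 * t / sample_rate) for t in range(64)]
envelope, freqs = envelope_and_frequency(tone, sample_rate)
```

For a pure tone with a whole number of cycles in the frame, the envelope is constant and the instantaneous frequency equals the tone frequency; for speech, both tracks vary over time, which is the kind of modulation that AM-FM features aim to summarize.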
Article
The objective of this letter is to demonstrate the complementary nature of speaker-specific information present in the residual phase in comparison with the information present in the conventional mel-frequency cepstral coefficients (MFCCs). The residual phase is derived from speech signal by linear prediction analysis. Speaker recognition studies are conducted on the NIST-2003 database using the proposed residual phase and the existing MFCC features. The speaker recognition system based on the residual phase gives an equal error rate (EER) of 22%, and the system using the MFCC features gives an EER of 14%. By combining the evidence from both the residual phase and the MFCC features, an EER of 10.5% is obtained, indicating that speaker-specific excitation information is present in the residual phase. This information is useful since it is complementary to that of MFCCs.
Speech analysis for diagnosis of Parkinson's disease using genetic algorithm and support vector machine
  • M Shahbakhi
  • D Taheri Far
  • E Tahami
Shahbakhi M, Taheri Far D, Tahami E. Speech analysis for diagnosis of Parkinson's disease using genetic algorithm and support vector machine. J Biomed Sci Eng 2014;147-56. http://dx.doi.org/10.4236/jbise.2014.7401
Voice analysis for detecting patients with Parkinson's disease using the hybridization of the best acoustic features
  • A Benba
  • A Jilbab
  • A Hammouch
Benba A, Jilbab A, Hammouch A. Voice analysis for detecting patients with Parkinson's disease using the hybridization of the best acoustic features. Int J Electr Eng Inform 2016. http://dx.doi.org/10.15676/ijeei.8.1.20168
Spectral and cepstral analyses for Parkinson's disease detection in Spanish vowels and words
  • J R Orozco-Arroyave
  • F Hönig
  • J D Arias-Londoño
  • J F Vargas-Bonilla
  • E Nöth
Orozco-Arroyave JR, Hönig F, Arias-Londoño JD, Vargas-Bonilla JF, Nöth E. Spectral and cepstral analyses for Parkinson's disease detection in Spanish vowels and words. Expert Syst 2015;32:688-97. http://dx.doi.org/10.1111/exsy.12106
A Matlab toolbox for musical feature extraction from audio. International conference on digital audio effects
  • O Lartillot
  • P Toiviainen
Lartillot O, Toiviainen P. A Matlab toolbox for musical feature extraction from audio. International conference on digital audio effects; 2007.
Proceedings of NATO Advance Institute on Computational Hearing
  • P Cosi
Cosi P. Evidence against frame-based analysis techniques. 1998 Proceedings of NATO Advance Institute on Computational Hearing; 2009. p. 163-8. doi:10.1.1.39.4812.
A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification
  • C Jung
  • K J Han
  • H Seo
  • S S Narayanan
  • H Kang
Jung C, Han KJ, Seo H, Narayanan SS, Kang H. A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification. Perform Eval 2010;2754-7.
Voice fundamental frequency extraction algorithm based on ensemble empirical mode decomposition and entropies