Article

Parkinson disease prediction using intrinsic mode function based features from speech signal

Abstract

Parkinson’s disease (PD) is a progressive neurological disorder prevalent in old age. Previous research has shown that speech can be used as an early marker for the identification of PD. The disease affects a number of speech components, such as phonation, speech intensity, articulation, and respiration, which alters speech intelligibility. Speech feature extraction and classification have always been challenging tasks because of the non-stationarity and discontinuity of the speech signal. In this study, empirical mode decomposition (EMD) based features are shown to capture these characteristics. A new feature, the intrinsic mode function cepstral coefficient (IMFCC), is proposed to efficiently represent the characteristics of Parkinson speech. The performance of the proposed features is assessed on two different datasets, dataset 1 and dataset 2, each comprising 20 normal and 25 Parkinson-affected subjects. The results demonstrate that the proposed IMFCC feature provides superior classification accuracy on both datasets, with a significant increase of 10-20% in accuracy compared with standard acoustic and Mel frequency cepstral coefficient (MFCC) features.

... EMD is a signal decomposition method that is not subject to the Heisenberg uncertainty principle and is particularly suitable for processing nonlinear and nonstationary signals [19]. It has been demonstrated that the intrinsic mode functions (IMFs) obtained after voice signal decomposition using EMD carry information about the vocal tract and vocal folds [25]. Karan et al. [25] proposed intrinsic mode function cepstral coefficient (IMFCC) based on EMD from sustained vowels to effectively characterize the PD patients' voice, improving the accuracy by 10% over MFCC-based features. ...
... It has been demonstrated that the intrinsic mode functions (IMFs) obtained after voice signal decomposition using EMD carry information about the vocal tract and vocal folds [25]. Karan et al. [25] proposed intrinsic mode function cepstral coefficient (IMFCC) based on EMD from sustained vowels to effectively characterize the PD patients' voice, improving the accuracy by 10% over MFCC-based features. However, traditional EMD algorithm suffers from mode mixing, end effects, sensitivity to noise, and lack of complete mathematical theory [19,26]. ...
... According to previous studies, we set the number k of IMFs obtained with the traditional EMD and VMD methods to 6 and 4, respectively [22,25,26]. To determine the CEEMDAN-based k value, we divided the 120-sample dataset into training and test sets in a 7:3 ratio, then used the support vector machine (SVM), random forest, and multilayer perceptron with scikit-learn's default parameters to model the training set [31]. Figure 3 shows that both CEEMDAN-based HCCs and DMFCC features have the highest classification accuracy on the test set when k is 8. ...
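The 7:3 partition described in this excerpt can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the cited authors' code: the function name `split_indices`, the fixed seed, and the use of a plain shuffle (rather than a stratified split) are all assumptions, and the classifiers themselves are not reproduced here.

```python
import random


def split_indices(n_samples, train_parts=7, test_parts=3, seed=0):
    """Shuffle sample indices and split them train_parts:test_parts."""
    indices = list(range(n_samples))
    # The seed is an arbitrary, assumed choice to make the split repeatable.
    random.Random(seed).shuffle(indices)
    n_train = n_samples * train_parts // (train_parts + test_parts)
    return indices[:n_train], indices[n_train:]


# 120 samples split 7:3 gives 84 training and 36 test indices.
train_idx, test_idx = split_indices(120)
print(len(train_idx), len(test_idx))  # 84 36
```

With a small dataset such as this one, a stratified split that preserves the class balance in both partitions may be preferable; the excerpt does not say which variant was used.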
... Sharma et al. demonstrated the dyadic characteristics of EMD features on AM-FM voice models (Sharma et al., 2017). Karan et al. applied EMD analysis to sustained vowel recordings from a subset of the subjects and demonstrated that the dyadic characteristics of EMD features can improve the discrimination accuracy on sustained vowels by 10% (Karan et al., 2020b). Recently, Zhang et al. conducted an IMF energy-based study on sustained /a/ (Zhang et al., 2021). ...
... The original IMF features included frequency, power, and estimated signal-to-noise ratio, as described in (Rueda and Krishnan, 2018). The new set of EMD features based on dyadic filterbank characteristics was added to capture the dyadic characteristic (Flandrin et al., 2004, 2005; Sharma et al., 2017; Karan et al., 2020b). These new features included zero-crossing rate represented by IMF center frequency, ratio of the adjacent center frequencies, and energy balancing ratio in equation (3). ...
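Two of the features named in this excerpt, a zero-crossing-based center frequency and the ratio of adjacent center frequencies, can be illustrated with a short sketch. The code below is an assumption-laden example rather than the cited authors' implementation: the function names are hypothetical, pure tones stand in for real IMFs (no EMD is performed), and the energy balancing ratio is omitted because its equation (3) is not reproduced in the excerpt. For a dyadic filterbank, the ratio of adjacent center frequencies is expected to be close to 2.

```python
import math


def zero_crossing_frequency(signal, sample_rate):
    """Estimate a center frequency in Hz from the number of sign changes."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(signal) - 1) / sample_rate
    # A sinusoid crosses zero twice per period.
    return crossings / (2.0 * duration)


def adjacent_frequency_ratios(center_freqs):
    """Ratio of each center frequency to the next (lower) one."""
    return [high / low for high, low in zip(center_freqs, center_freqs[1:])]


SAMPLE_RATE = 8000  # assumed sampling rate for the synthetic example


def tone(freq_hz, n_samples=8000):
    """Synthetic stand-in for an IMF: a pure tone with a small phase offset."""
    return [
        math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE + 0.1)
        for t in range(n_samples)
    ]


# Hypothetical "IMFs" with dyadically spaced frequencies.
imfs = [tone(400), tone(200), tone(100)]
center_freqs = [zero_crossing_frequency(x, SAMPLE_RATE) for x in imfs]
ratios = adjacent_frequency_ratios(center_freqs)
print(center_freqs, ratios)
```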
Article
Empirical Mode Decomposition (EMD) was designed to analyse nonlinear and non-stationary signals. EMD voice analysis has been applied to Parkinson’s sustained vowels, but very few studies have addressed highly dynamic Diadochokinesia (DDK) utterances. This paper applies EMD’s dyadic filterbank characteristics to extract DDK features and presents an in-depth study of the efficacy of two segmentation strategies. The EMD analysis of DDK looks at the spectrum characteristics of Intrinsic Mode Functions (IMFs) and the handling of mode mixing conditions. DDK recordings of Healthy Control (HC) subjects and patients with Parkinson’s disease (PD) were segmented using various fixed frame sizes, compared with dynamic segmentation based on /pa-ta-ka/ triad length, and also with the signal envelope as a whole. An overlap of 2/3 between windows was used in the fixed frame size segmentation to augment the data and to capture redundant and transition information. No overlap was used in the /pa-ta-ka/ triad segmentation. For the fixed frame size segmentation, we found that there is a region of consistency. Within this region, the IMF center frequencies and bandwidths remained the same, but they varied outside the region. The segmentation comparisons used a basic set of EMD features with and without DeltaEMD features that capture segment-to-segment deviations. Using the basic EMD dyadic features, fixed frame size segmentation outperformed /pa-ta-ka/ triad segmentation. When DeltaEMD features were added to provide segment deviation information, /pa-ta-ka/ triad segmentation outperformed fixed frame segmentation. An additional segment-magnitude amplification factor and segment length were found to improve the performance of the /pa-ta-ka/ triad segmentation. With the added features, /pa-ta-ka/ triad segmentation outperformed the others and had an improved accuracy of 78%. Additional features also increased the envelope discrimination to 76%. The results also indicate the potential of using voice envelopes for PD analysis.
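The fixed-frame segmentation with a 2/3 overlap mentioned in this abstract can be sketched as follows. This is only an illustration of the general framing technique under stated assumptions: the function name `frame_signal` and the choice to drop a trailing partial frame are hypothetical, and the abstract does not specify frame sizes or edge handling.

```python
def frame_signal(signal, frame_size, overlap_num=2, overlap_den=3):
    """Cut a signal into fixed-size frames that overlap by overlap_num/overlap_den."""
    hop = frame_size - (frame_size * overlap_num) // overlap_den
    if hop <= 0:
        raise ValueError("overlap leaves no room to advance between frames")
    # Assumed edge handling: a trailing partial frame is discarded.
    return [
        signal[start:start + frame_size]
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]


# Toy example: 12 samples, frames of 6 samples, so each frame advances by 2.
frames = frame_signal(list(range(12)), frame_size=6)
print(frames)
```

In this toy example consecutive frames share four of their six samples, which is the 2/3 overlap; the overlap also roughly triples the number of frames compared with non-overlapping segmentation.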
... The authors of [6] reported an accuracy of 94% in detecting PD. In contrast, Karan et al. [7] suggested empirical mode decomposition and extracting features from intrinsic mode functions to efficiently describe PS features using ML algorithms, namely support vector machines (SVM) and random forest (RF), on a PS dataset [7]. Solana-Lavalle et al. [8] implemented 8 to 20 wrapper-based feature selection approaches along with four classifiers, k-nearest neighbors (KNN), multi-layer perceptron, SVM, and RF, to detect vocal-based PD on the PDC dataset; SVM achieved the best performance, with an accuracy of 94.7% [8]. ...
Article
Parkinson's disease (PD) is one of the most prevalent neurodegenerative disorders and is distinguished by motor function impairment. Diagnosing PD is a complicated task that demands the evaluation of numerous motor and non-motor signs. Vocal or speech abnormalities are notable indications that doctors should consider throughout the analysis. Early diagnosis of PD is essential for preliminary treatment, for assisting the doctor in treating the disease and limiting its spread to other brain cells, and for saving lives. This study therefore introduces an adaptive expert diagnostic system to predict PD accurately. The system uses a hybrid methodology, a two-stage mutual information and autoencoder-based dimensionality reduction approach with a genetically optimized LightGBM (MI-AE-GOLGBM) algorithm, to improve performance and predict the best outcomes. The MI-AE-GOLGBM approach comprises four methodologies: mutual information, autoencoder, genetic algorithm, and LightGBM. Mutual information and the autoencoder form a two-stage dimensionality reduction approach that selects the informative features from the input dataset and produces a reduced dataset with the most significant newly generated features. The genetic algorithm is employed to intelligently optimize the hyperparameters of the LightGBM algorithm. LightGBM then uses the newly generated features and the optimized hyperparameters to classify PD sufferers and healthy controls, enhancing the precision and reliability of the proposed system. Four real-world, publicly available Parkinson's disease datasets are employed to assess and verify the performance of the proposed methodology.
This proposed research utilizes different machine learning (ML) algorithms to compare our proposed approach's performance. The outcomes reveal that the proposed methodology can produce the best predictions based on voice data relating to the PD compared to the different ML algorithms.
... Empirical mode decomposition (EMD) [1] is a local, data-driven, and adaptive method for processing nonlinear and nonstationary signals and has been widely used in machinery, voice, geography, medicine, and other fields [2][3][4][5][6][7][8][9][10]. ...
... Almost all SDs show weak convergence of slow oscillation. Except for SD 8 ...
Article
Empirical mode decomposition (EMD) is an effective method for dealing with nonlinear, nonstationary data, but the lack of an orthogonal decomposition theory and mode-mixing are the main problems that limit its application. To solve these two problems, we propose an improved EMD method. The most important part of this improved method is to replace the mean value computed from the signal envelopes in EMD with a mean value computed by the definite integral, which enables the mean value to be expressed in a mathematically strict way. Firstly, we prove that the signal is orthogonally decomposed by the improved method. Secondly, the Monte Carlo method of white noise is used to show that the improved method can effectively alleviate mode-mixing. In addition, the improved method is adaptive and does not need any input parameters, and the intrinsic mode functions (IMFs) generated from it are robust to sifting. We have carried out experiments on a series of artificial and real data. The results show that the improved method is an orthogonal decomposition method that can effectively alleviate mode-mixing, and that it has better decomposition performance and physical meaning than EMD, ensemble EMD (EEMD), and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). The improved method is generally more time-consuming than EMD, but far less so than EEMD and CEEMDAN.
... They achieved an accuracy of 92.46% using a KNN classifier with vowels. Similarly, to characterize Parkinson's speech, an intrinsic mode function cepstral coefficient feature was used by Karan et al. [11] to classify healthy controls (HC) from PD patients. In addition, Sigcha et al. [12] used a recurrent neural network (RNN) with a single waist-worn triaxial accelerometer approach to detect freezing of gait in PD patients. ...
... The last classifier is random forest which works on the principle of unpruned decision trees. The aggregation of multitude of decision trees is considered as a forest with dependence of each tree on different random variables [11]. A total of 100 trees were used for random forest classifier analysis in this study. ...
Article
Parkinson's disease (PD), a neurodegenerative disorder characterized by rest tremors, muscular rigidity, and bradykinesia, has become a global health concern. Currently, a neurologist determines the diagnosis of Parkinson's disease by taking into account several factors. An automated decision-making system would enhance patient care and improve the outcomes for the patient. Biomarkers, such as electroencephalograms (EEGs), can aid in the diagnosis process as they have proven useful in detecting abnormalities in the brain. This study presents a novel algorithm for the automated diagnosis of Parkinson's disease from EEG signals using a flexible analytic wavelet transform (FAWT). First, these acquired EEG signals are preprocessed before decomposition into five frequency sub-bands based on the FAWT method. Several entropy parameters are computed from the decomposed sub-bands and ranked based on their significance level in detecting PD through analysis of variance (ANOVA). Various classifiers are used to identify appropriate feature sets, including support vector machines (SVM), logistics, random forests (RF), radial basis functions (RBF), and k-nearest neighbors (KNN). The proposed approach is evaluated using data collected from two centers in Malaysia (Dataset-I) and the United States (Dataset-II). In dataset-I, the KNN classifier produces accuracy, specificity, sensitivity, and area under the curve of 99%, 99.45%, 99.12%, and 0.991, respectively, while in dataset-II, these values are 95.85%, 95.88%, 96.14%, and 0.959. The proposed system would be extremely useful for neurologists during their diagnostic process, as well as for current clinical practices.
... Besides, patients in the early stage of Parkinson's disease might have speech disorders [12]. These abnormalities include dysphonia (poor verbal intelligibility), sounds monotonous (a small range of audio fluctuation), and hypophonia (vocal musculature dissonance) [9,13]. Except for the direct meaning of language, information from human's acoustical signals could be perceived and analyzed through computing [14]. ...
... Except for the direct meaning of language, information from human's acoustical signals could be perceived and analyzed through computing [14]. In consequence, it has been suggested that identifying PD by using speech classifications and marks is feasible [2,13]. ...
... IMFs have been explored in the past for Speech/Music Discrimination (SMD) task using statistical features [28]. Researchers have also found that IMFs contain different speech production information like formant tracking, glottal source information and vocal tract structure [24]. Earlier work evaluated different statistical and energy based features computed from IMFs of a signal for SMD task. ...
Article
Automatic Speech/Music classification uses different signal processing techniques to categorize multimedia content into different classes. The proposed work explores the Hilbert Spectrum (HS) obtained from different AM-FM components of an audio signal, also called Intrinsic Mode Functions (IMFs), to classify an incoming audio signal as speech or music. The HS is a two-dimensional representation of instantaneous energies (IE) and instantaneous frequencies (IF) obtained using the Hilbert Transform of the IMFs. This HS is further processed using a Mel-filter bank and the Discrete Cosine Transform (DCT) to generate novel IF and Instantaneous Amplitude (IA) based cepstral features. The results were validated using three databases: the Slaney database, GTZAN, and MUSAN. To evaluate the general applicability of the proposed features, extensive experiments were conducted on different combinations of audio files from the S&S, GTZAN and MUSAN databases, and promising results were achieved. Finally, the performance of the system is compared with that of existing cepstral features and previous works in this domain.
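The final step of the pipeline described above, turning filter-bank outputs into cepstral features with a DCT, can be sketched in general terms. This is not the cited authors' code: the function names are hypothetical, the band energies are made-up placeholders rather than Hilbert-spectrum values, the logarithm before the DCT is assumed from conventional cepstral analysis (the abstract does not mention it), and the DCT-II is left unnormalized.

```python
import math


def dct_ii(values):
    """Unnormalized type-II discrete cosine transform."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_coefficients(band_energies, num_coeffs):
    """Log-compress filter-bank energies, then keep the first DCT coefficients."""
    log_energies = [math.log(e) for e in band_energies]  # assumed log step
    return dct_ii(log_energies)[:num_coeffs]


# Placeholder energies for four hypothetical filter-bank channels.
example = cepstral_coefficients([1.0, 2.0, 4.0, 8.0], num_coeffs=3)
print(example)

# Sanity check: a flat spectrum has all of its energy in coefficient 0.
flat = cepstral_coefficients([math.e] * 4, num_coeffs=4)
print(flat)
```

The DCT decorrelates the log energies, which is why the leading coefficients summarize the overall spectral shape and are the ones usually retained.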
... Four classifiers, k-NN, MLP, SVM, and Random Forest, were the basis of the detection engine. Karan et al. (2020) proposed an SVM-based PD detection system where Intrinsic Mode Function Cepstral Coefficient (IMFCC) feature extraction is used to extract the most relevant features for classifying Parkinson's and control subjects. The authors validated their proposed approach on the two most widely used voice datasets. ...
Article
The progressive reduction of dopaminergic neurons in the human brain, especially at the substantia nigra, is one of the principal causes of Parkinson’s Disease (PD). Voice alteration is one of the earliest symptoms found in PD patients. Therefore, the impaired PD subjects’ acoustic voice signal plays a crucial role in detecting the presence of Parkinson's. This manuscript presents four distinct decision tree ensemble methods of PD detection on a trailblazing ForEx++ rule-based framework. The Systematically Developed Forest (SysFor) and Penalizing Attributes Decision Forest (ForestPA) ensemble approaches have been used for PD detection. The proposed detection schemes efficiently identify positive subjects using primary voice signal features, viz., baseline, vocal fold, and time–frequency. A novel feature selection scheme termed Feature Ranking to Feature Selection (FRFS) has also been proposed to combine filter and wrapper strategies. The proposed FRFS scheme encompasses Gel’s normality test to rank and select outstanding features from the baseline, time–frequency, and vocal fold feature groups. The SysFor and ForestPA decision forests underneath the ForEx++ rule-based framework on both FRFS feature ranking and subset selection represent Parkinson’s detection approaches, which expedite a better overall impact on segregating PD from control subjects. It has been observed that the ForestPA decision forest in the ForEx++ framework on FRFS ranked features proved to be a robust Parkinson’s detection scheme. The proposed models deliver the highest accuracy of 94.12% and a lowest mean absolute error of 0.25, resulting in an Area Under Curve (AUC) value of 0.97.
... Sakar et al. [18] concluded that sustained vowels are more suitable for building a PD prediction model. Bolanos et al. [19] evaluated noise measure-based features for classification of PD from healthy subjects using a k-nearest neighbor (k-NN) classifier and obtained an accuracy of 66.57% using vowel /i/. Recently, Karan et al. [20] proposed a PD detection system using empirical mode decomposition and a support vector machine classifier and obtained 96% accuracy. Arroyave et al. [21] presented a paper on spectral and cepstral features for Parkinson's disease identification in the Spanish language using five Spanish vowels and 24 isolated words, giving an accuracy of 84% for sustained vowels. Suman Deb and S Dandapat [22] classified the speech signal using a new feature, HPER (harmonic peak to energy ratio). ...
Article
A speech signal can be used as a marker for the identification of Parkinson's disease (PD), a progressive neurological disorder that mainly affects people in old age. Identifying relevant discriminative features from the speech signal has been a challenge in this area. In this paper, factor analysis is used to select distinguishing features from a set of features. These selected features are more effective for the detection of PD. From an empirical study on an existing dataset and a generated dataset, it was found that jitter, shimmer variants, and noise-to-harmonic ratio are the dominant features in detecting PD. Further, these features are used with a support vector machine to classify PD from healthy subjects. This method provides an average accuracy of 85%, with sensitivity and specificity of about 86% and 84%, respectively. An important outcome of this study is that sustained vowel phonation captures distinguishing information for the analysis and detection of PD.
... The main features for speech sample classification vary across languages (Eyigoz et al., 2020). Different feature extraction methods and different datasets can also obstruct the unification of features (Karan et al., 2020; Zhang et al., 2021). It is one of the main goals for related studies to reduce the number of features by choosing the most relevant for PWP detection. ...
Article
Parkinson’s disease (PD) is a neurodegenerative disorder that negatively affects millions of people. Early detection is of vital importance. Recent research has shown that the level of dysarthria provides good indicators for computer-assisted diagnosis and remote monitoring of patients at the early stages. The goal of this study is to develop an automatic detection method based on a newly collected Chinese dataset. Unlike for English, no agreement has been reached on the main features indicating language disorders due to vocal organ dysfunction. Thus, one of our approaches is to classify speech phonation and articulation with a machine learning-based feature selection model. Based on a relatively large sample, three feature selection algorithms (LASSO, mRMR, Relief-F) were tested to select the vocal features extracted from speech signals collected in a controlled setting, followed by four classifiers (Naïve Bayes, K-Nearest Neighbor, Logistic Regression and Stochastic Gradient Descent) to detect the disorder. The proposed approach shows an accuracy of 75.76%, sensitivity of 82.44%, specificity of 73.15% and precision of 76.57%, indicating the feasibility and promising future of automatic and unobtrusive detection of PD in Chinese speakers. The comparison among the three selection algorithms reveals that the LASSO selector has the best performance regardless of the type of vocal features. The best detection accuracy is obtained by the SGD classifier, while the best sensitivity is obtained by the LR classifier. More interestingly, articulation features are more representative and indicative than phonation features across all the selection and classification algorithms. The most prominent articulation features are F1, F2, DDF1, DDF2, BBE and MFCC.
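The four figures reported in this abstract (accuracy, sensitivity, specificity, precision) all derive from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name is hypothetical and the counts are invented for illustration, not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics, with PD treated as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Made-up counts: 50 PD subjects (40 detected) and 50 controls (30 cleared).
metrics = classification_metrics(tp=40, fp=20, tn=30, fn=10)
print(metrics)
```

Because sensitivity and specificity are computed per class, they can diverge sharply from accuracy when the classes are imbalanced, which is one reason studies report them together.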
... [9] statistically measured the discrete wavelet transform signals after an empirical mode decomposition (EMD). [10] also used EMD to propose intrinsic mode function related features including cepstral coefficient (IMFCC). [11] performed a fusion of the decisions of an extreme learning machine, Gaussian mixture model (GMM) and SVM, which had as input the fusion of MPEG-7 audio parameters and interlaced derivative pattern parameters. ...
Article
With the recent development of speech-enabled interactive systems using artificial agents, there has been substantial interest in the analysis and classification of voice disorders to provide more inclusive systems for people living with specific speech and language impairments. In this paper, a two-stage framework is proposed to perform an accurate classification of diverse voice pathologies. The first stage consists of speech enhancement processing based on the original premise, which considers impaired voice as a noisy signal. To put this hypothesis into practice, the noise lestral harmonic-to-noise ratio (CHNR). The second stage consists of a convolutional neural network with long short-term memory (CNN-LSTM) architecture designed to learn complex features from spectrograms of the first-stage enhanced signals. A new sinusoidal rectified unit (SinRU) is proposed to be used as an activation function by the CNN-LSTM network. The experiments are carried out by using two subsets of the Saarbruecken voice database (SVD) with different etiologies covering eight pathologies. The first subset contains voice recordings of patients with vocal cordectomy, psychogenic dysphonia, pachydermia laryngis and frontolateral partial laryngectomy, and the second subset contains voice recordings of patients with vocal fold polyp, chronic laryngitis, functional dysphonia, and vocal cord paresis. Identification of dysarthria severity levels in the Nemours and Torgo databases is also carried out. The experimental results showed that using the minimum mean square error (MMSE)-based signal enhancer prior to the CNN-LSTM network using SinRU led to a significant improvement in the automatic classification of the investigated voice disorders and dysarthria severity levels. 
These findings support the hypothesis that using an appropriate speech enhancement preprocessing has positive effects on the accuracy of the automatic classification of voice pathologies thanks to the reduction of the intrinsic noise induced by the voice impairment.
... Related works have reported various computer-assisted techniques in the diagnosis and assessment of PD. As mature voice acquisition equipment facilitates the construction of voice-based data, most studies on PD focus on speech processing [6]. Besides, approaches based on handwritten spiral images drawn by PD patients are also hotspots in recent research [7,8]. ...
Article
Long-term monitoring of resting tremor is key to assess the status of patients suffering from Parkinson’s disease (PD), which is of vital importance for reasonable medication. The detection and quantification of resting tremor in reported works rely heavily on specified movements and are not appropriate for long-term monitoring in real-life condition. The purpose of this study is to develop a detection model for long-term monitoring of resting tremor and explore an effective indicator for tremor quantification. This study included long-term acceleration data from PD patients and proposed a resting tremor detection model based on machine learning classifiers and Synthetic Minority Oversampling Technique (SMOTE). Four machine learning classifiers, K-Nearest Neighbor (KNN), Random Forest (RF), Adaptive Boosting (AdaBoost), and Support Vector Machine (SVM), were compared. Furthermore, an indicator called tremor timing ratio (TTR) was defined and calculated for tremor quantification. The detection model with RF classifier achieved the highest overall accuracy of 94.81%. The sample entropy of the acceleration signal was proved most influential in the classification by exploring the feature importance. Through the Kruskal-Wallis test and the Mann-Whitney U test, the TTR had a strong correlation with the subscore of resting tremor in Unified Parkinson Disease Rating Scale (UPDRS). Such two-step evaluation process for resting tremor can detect the tremor effectively and is expected to be applied in long-term monitoring of PD patients in daily life to realize a more comprehensive assessment of PD.
... Compared to the wavelet approach in which the number of decomposition levels and basis functions are preordained, in Empirical Mode Decomposition, they are performed axiomatically with respect to the nature of the signal. In 2018, Karan et al. [16] proposed an intrinsic mode function (IMF) based feature extraction to efficiently detect Parkinson disease and showed an improvement in accuracy of 10-20% compared with standard MFCC features. In [17], Qian et al. proposed a methodology to classify snore sounds using a random forest classifier, wherein the descriptors from Wavelet energy packet features and empirical mode decomposition (EMD) features are extracted from audio files. ...
Article
Chronic obstructive pulmonary disease is a widespread, evitable and remediable ailment accentuated by the deregulation of a stream of air in the lungs or due to pleura anomalies owing to harmful fuels. Timely detection and prevention are of utmost importance to curb the spread of these disorders. To diagnose the respiratory illness, clinicians use the traditional approach of auscultation and this led to the development of state-of-the-art technology tools for sensing the morbidities. In this pursuit, in this work, a novel deep-learning structure is framed for better classification of lung sounds with the amalgamation of features extracted using the Empirical mode decomposition technique and improved network models. In this paper, a two-stage approach is proposed to classify the acoustic files from the ICBHI benchmark dataset. At the first stage, intrinsic mode function (IMF) feature vectors are extracted from lung sounds and the best combination of IMF features to classify respiratory disorders is found. In the next stage, Gammatone filters are applied on the best combination IMF features and Gammatone cepstral coefficients (GTCC) are computed. The GTCC are input to the deep-learning model, Recurrent Neural Network-based stacked BiLSTM classifier for classification. It is observed that the IMF 3 has more meaningful information and enhances the performance in conjunction with GTCCs compared to other IMFs and MFCC. Moreover, the results demonstrate that the proposed GTCC of the third IMF component applied to the stacked BiLSTM framework excels the competing Convolutional Neural Network method of classification in terms of accuracy, specificity and sensitivity.
... background noise) as a limitation for comparing the results of different studies. [50] points out the difficulty of extrapolating results obtained with different databases due to their recording differences. However, [21] proposes an SNR level of 42 dB for perturbation measurements (jitter and shimmer) to be reliable, and estimates 30 dB as the lowest SNR level for reliable use of classifiers. ...
Article
Automatic voice condition analysis systems have been developed to automatically discriminate pathological voices from healthy ones in the context of two disorders related to exudative lesions of Reinke’s space: nodules and Reinke’s edema. The systems are based on acoustic features extracted from sustained vowel recordings. Reduced subsets of features have been obtained from a larger set by a feature selection algorithm based on Whale Optimization in combination with Support Vector Machine classification. Robustness of the proposed systems is assessed by adding noise of two different types (synthetic white noise and actual noise recorded in a clinical environment) to corrupt the speech signals. Two speech databases were used for this investigation: the Massachusetts Eye and Ear Infirmary (MEEI) database and a second one specifically collected in Hospital San Pedro de Alcántara (Cáceres, Spain) for the scope of this work (UEX-Voice database). The results show that the prediction performance of the detection systems decreases appreciably when moving from MEEI to a database recorded in more realistic conditions. For both pathologies, the prediction performance declines under noisy conditions, with the effect of white noise being more pronounced than the effect of noise recorded in the clinical environment.
... An empirical mode decomposition (EMD) has also been proposed for the extraction of vocal characteristics. These characteristics are then classified by SVM and random forest (RF), which is commonly used for binary classifications [20]. EMD method has utilized the decomposition of a non-stationary signal into a series of intrinsic mode functions and thereafter the extracted features were fed into classifiers such as SVM and RF [21]. ...
Article
The diagnosis of Parkinson's disease (PD) is important in neurological pathology for appropriate medical therapy. Algorithms based on decision tree induction (DTI) have been widely used for diagnosing PD through biomedical voice disorders. However, DTI for PD diagnosis is based on a greedy search algorithm, which causes overfitting and inferior solutions. This paper improves the performance of DTI using evolutionary-based genetic algorithms. The goal was to combine evolutionary techniques, namely a genetic algorithm (GA) and genetic programming (GP), with a decision tree algorithm (J48) to improve the classification performance. The developed model was applied to a real biomedical dataset for the diagnosis of PD. The results showed that the accuracy of J48 improved from 80.51% to 89.23% and 90.76% using the GA and GP, respectively.
... Furthermore, persons with the subject disease in its early stages might experience speech problems [10]. These include dysphonia (weak vocal fluency), repetitious echoes (a tiny assortment of audio variations), and hypophonia (vocal musculature disharmony) [7,11]. Information from human aural emissions might be detected and evaluated using a computing unit [12,13]. ...
Article
Parkinson's disease (PD) is a neurodegenerative disease that impacts the neural, physiological, and behavioral systems of the brain, in which mild variations in the initial phases of the disease make precise diagnosis difficult. A general symptom of this disease is slow movement, known as 'bradykinesia'. The symptoms of this disease appear in middle age and the severity increases as one gets older. One of the earliest signs of PD is a speech disorder. This research examines the effectiveness of supervised classification algorithms, such as support vector machine (SVM), naïve Bayes, k-nearest neighbor (K-NN), and artificial neural network (ANN), for diagnosing PD; the proposed diagnosis method consists of feature selection based on the filter method and the wrapper method, followed by classification. Since just a few clinical test features would be required for the diagnosis, a method such as this might reduce the time and expense associated with PD screening. The suggested strategy was compared to previously proposed PD diagnostic techniques and well-known classifiers. The experimental outcomes show that the accuracy of SVM is 87.17%, naïve Bayes is 74.11%, ANN is 96.7%, and K-NN is 87.17%, and it is concluded that the ANN is the most accurate. The obtained results were compared with those of previous studies, and it was observed that the proposed work offers comparable or better results.
... In the two databases, the signals are recorded at 44.1 kHz and 48.1 kHz sampling rates. It is observed from the literature [35] that most of the latent features lie within an 8 kHz bandwidth. Hence, pre-processing is done by downsampling the speech signal to 16 kHz. ...
Article
The early detection of COVID-19 is a challenging task due to its deadly spreading nature and the fear it creates in people's minds. Speech-based detection can be one of the safest tools for this purpose, as the voice of a suspected patient can be easily recorded. Mel Frequency Cepstral Coefficient (MFCC) analysis of the speech signal is one of the oldest but still potent analysis tools. The performance of this analysis mainly depends on the conversion from the normal frequency scale to the perceptual frequency scale and on the frequency range of the filters used. Traditionally, in speech recognition, these values are fixed. But the characteristics of speech signals vary from disease to disease. In the case of COVID-19 detection, mainly coughing sounds are used, whose bandwidth and properties are quite different from those of the complete speech signal. By exploiting these properties, the efficiency of COVID-19 detection can be improved. To achieve this objective, the frequency range and the conversion scale of frequencies have been suitably optimized. To further enhance detection accuracy, speech enhancement has been carried out before the extraction of features. By implementing these two concepts, a new feature called the COVID-19 Coefficient (C-19CC) is developed in this paper. Finally, the performance of these features has been compared.
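The citing excerpt for this entry describes downsampling 44.1 kHz recordings to 16 kHz so that the retained band covers 8 kHz. As a minimal sketch of the arithmetic behind such a step (the function names `resample_factors` and `nyquist_bandwidth_hz` are hypothetical and do not come from the cited work), the integer up/down factors for a rational resampler and the resulting Nyquist bandwidth can be derived as follows:

```python
from fractions import Fraction


def resample_factors(source_hz, target_hz):
    """Return (up, down) such that target_hz == source_hz * up / down."""
    # Fraction reduces the ratio to lowest terms automatically.
    ratio = Fraction(target_hz, source_hz)
    return ratio.numerator, ratio.denominator


def nyquist_bandwidth_hz(sample_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sample_rate_hz / 2


# Illustrative values taken from the excerpt: 44.1 kHz source, 16 kHz target.
up, down = resample_factors(44_100, 16_000)
print(up, down)
print(nyquist_bandwidth_hz(16_000))
```

Factors of this kind are what a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)` expects; whether the cited work used that routine is not stated in the excerpt.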
... In Equation 1, M represents the desired number of filters and f denotes the list of frequencies. The MFCC method is frequently used, especially in biomedical studies [28,29]. The block diagram of the MFCC method is given in Figure 4. ...
Article
Sleep patterns and sleep continuity have a great impact on people's quality of life. The sound of snoring both reduces the sleep quality of the snorer and disturbs other people in the environment. Interpretation of sleep signals by experts and diagnosis of the disease is a difficult and costly process. Therefore, in this study, an artificial intelligence-based hybrid model was developed for the classification of snoring sounds. In the proposed method, audio signals were first converted into images using the Mel-spectrogram method. The feature maps of these images were extracted using the Alexnet and Resnet101 architectures. After combining the feature maps from the two architectures, dimension reduction was performed using the NCA dimension reduction method. The feature map optimized using the NCA method was classified with the Bilayered Neural Network. In addition, spectrogram images were classified with 8 different CNN models to compare the performance of the proposed model. Later, in order to test the performance of the proposed model, feature maps were obtained using the MFCC method and classified with different classifiers. The accuracy value obtained with the proposed model is 99.5%.
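The excerpt for this entry refers to a mel filter bank parameterised by a number of filters M and a list of frequencies f, but the equation itself is not reproduced here. The sketch below therefore assumes the widely used formula mel = 2595 log10(1 + f/700); it is not necessarily Equation 1 of the cited work, and the function names and the choice of 26 filters over 0 to 8000 Hz are illustrative assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centres(num_filters, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of filters equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    # num_filters centres need num_filters + 2 equally spaced mel points;
    # the two end points are the band edges and are not centres themselves.
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_filters)]


# Assumed configuration: M = 26 filters spanning 0 Hz to 8 kHz.
centres = mel_filter_centres(num_filters=26, f_min_hz=0.0, f_max_hz=8000.0)
print([round(c) for c in centres[:5]])
```

Because the spacing is uniform in mels, the centres are dense at low frequencies and sparse at high frequencies, which is the perceptual warping that MFCC and Mel-spectrogram features rely on.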
... The majority of state-of-the-art automatic dysarthric speech classification techniques are based on training classical classifiers on handcrafted acoustic features characterizing different impaired speech dimensions [3][4][5][6][7][8][9]. Recently, deep learning approaches aiming to learn high-level speech representations relevant for such a task have gained attention in the research community [10][11][12][13][14][15][16][17][18]. ...
... Disease diagnosis and monitoring using voice bio-markers has precedent in some therapeutic areas, like Parkinson's disease ([13,14]). Harimoorthy et al. proposed a cloud-based system to identify Parkinson's disease by applying machine learning to voice data [15]. Similarly, Asmea et al. proposed a neural network to identify Parkinson's disease that sought to identify traits of the voice disorder, dysphonia [16]. ...
... It can be seen as a signal-to-aspiration noise ratio when other aperiodicities in the signal are comparatively low [55]. Vocal fold excitation ratio (VFER) gives the amount of noise in terms of nonlinear energy and entropy value produced due to pathological vocal fold oscillation [56]. Articulation: Articulation deficits are mainly related to changes in position of tongue, lips, velum, and other articulators involved in speech production [17]. ...
Article
Early, objective, and accurate assessment and identification of dysarthria caused by neurological diseases are essential in neurorehabilitation. This could be achieved by a robust smart system. However, developing such a system requires a standard training database that is properly labelled, which unfortunately is currently lacking. The present study aimed to establish a standardized, audio-visual integrated speech database of subacute stroke patients with dysarthria, named “The Mandarin Subacute Stroke Dysarthria Multimodal (MSDM) Database”, which included audio-visual data from 25 subacute stroke patients and 25 healthy participants. In addition, comprehensive subjective clinical assessment information of speech-motor function and ecological psychology of each patient was also provided. Based on this database, a pilot study was conducted to detect the significant acoustic and visual characteristics that revealed the severity of dysarthria related to subacute stroke. The present study offered a novel perspective to objectively quantify and identify the pathological differences in speech production. It can serve as a baseline for the development of an automatic intelligent system for assessing the severity of dysarthria. In conclusion, the establishment and analysis of a high-quality database on articulation errors associated with dysarthria will benefit clinical treatments and contribute to the realization of automatic diagnostic tools that can be implemented for clinical telehealth services.
... (GNE) and formant frequency or use spectrum and cepstrum for feature extraction. Other examples are mel-frequency cepstral coefficients (MFCC) [7], perceptual linear predictive coefficients (PLP), etc. [8]. After that, deep learning methods are used to detect dysarthria, such as convolutional neural network (CNN), CNN-LSTM (long short-term memory), and other models [9,10]. ...
Article
In recent years, due to population growth and aging, the prevalence of neurological diseases has been increasing year by year. Dysarthria often appears among patients with Parkinson’s disease, stroke, cerebral palsy, and other neurological conditions. If these dysarthria patients are not quickly detected and treated, managing the course of the disease can become difficult. When the symptoms worsen, they can also affect the patient’s psychology and physiology. Most past studies on dysarthria detection used machine learning or deep learning models as classification models. This study proposes an integrated CNN-GRU model with convolutional neural networks and gated recurrent units to detect dysarthria. The experimental results show that the CNN-GRU model proposed in this study has the highest accuracy of 98.38%, which is superior to other research models.
... In the study done by Rojas et al. [26], a new computer-aided system for diagnosing Parkinson's disease is proposed, which is based on Empirical Mode Decomposition, a method that decomposes any non-linear and non-stationary time series into a small number of oscillatory Intrinsic Mode Functions and a monotonic residue. In the study done by Karan et al. [27], empirical mode decomposition based features are used to reveal the speech characteristics, and a new feature called Intrinsic Mode Function Cepstral Coefficients (IMFCC) is introduced to find the patterns in the speech of Parkinsonian individuals. ...
Article
Parkinson’s disease (PD) is the second most common neurodegenerative disorder worldwide. Its manifestations include the motor symptoms of resting tremor, bradykinesia, and, rarely, dystonia. However, the direct use of these motor symptoms for diagnosis can be misleading, since PD can be confused with other Parkinsonisms and further disorders with similar symptoms. Therefore gait, which is an extremely complex motion with significant dynamics for the detection of PD, can be used instead. In this paper, we employed a state-of-the-art ensemble learning algorithm, called the vibes algorithm, and the Hilbert-Huang Transform (HHT) to recognize PD gait patterns. We extracted the features by processing the signals, which come from sixteen sensors on the bottom of both feet, through HHT and sixteen statistical functions. We then performed a two-stage feature selection process using the vibes algorithm and the OneRAttributeEval algorithm. Finally, we exploited the vibes algorithm with Classification and Regression Trees as a base learner to differentiate between patients with PD and the control group. The classification accuracy, sensitivity and specificity rates of the proposed method are 98.79%, 98.92%, and 98.61%, respectively. Moreover, we thoroughly compared our method with sixteen previous works. The experimental results demonstrated that our method performs well and remains stable. We also identified two previously unreported markers that could support the clinical diagnosis of PD apart from the classification task.
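The decomposition that the excerpt and abstract above rely on can be stated compactly. As a sketch in standard notation (the symbols x, c_i, r_N, a_i and θ_i are chosen here for illustration and are not taken from the cited works), EMD expresses a signal as a sum of N intrinsic mode functions plus a monotonic residue, and the Hilbert-Huang Transform then assigns each mode an instantaneous amplitude and frequency through its analytic signal:

```latex
% EMD: the signal is the sum of N intrinsic mode functions c_i(t)
% and a monotonic residue r_N(t).
x(t) = \sum_{i=1}^{N} c_i(t) + r_N(t)

% Hilbert spectral analysis of each mode, with \mathcal{H} the Hilbert transform:
z_i(t) = c_i(t) + j\,\mathcal{H}\{c_i\}(t) = a_i(t)\,e^{j\theta_i(t)},
\qquad
\omega_i(t) = \frac{d\theta_i(t)}{dt}
```

Features such as the IMFCC mentioned in the excerpt are then computed from the modes c_i(t) rather than from the raw signal; the exact feature definition is given in the cited paper and is not reproduced here.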
Article
Parkinson’s disease (PD) is a neurodegenerative disease caused by the loss of the brain cells that produce dopamine. It is the most common neurodegenerative disease after Alzheimer’s disease and is seen especially in elderly people. In the early stage of the disease, most people have been observed to suffer from speech disorders. Over the last two decades, many studies have been conducted on the analysis of vocal tremors in PD. This study explores the combined approach of Variational Mode Decomposition (VMD) and Hilbert spectrum analysis (HSA) to investigate the voice tremor of patients with PD. A new set of features, Hilbert cepstral coefficients (HCCs), is proposed in this study. The proposed features are assessed using vowels and words of the PC-GITA database. The HCC features are used to perform classification and regression analysis for PD detection. The highest average classification accuracies of up to 91% and 96% are obtained with the vowel /a/ and the word /apto/, respectively. Further, a classification accuracy of up to 82% is obtained on an independent dataset, when tested with the optimized model developed using the PC-GITA database. In dysarthria level prediction, the highest correlation of up to 0.82 is obtained using the vowel /a/, and 0.8 with the word /petaka/. The outcomes of this study indicate that the proposed articulatory features are suitable and accurate for PD assessment.
Chapter
Parkinson’s disease causes disruption in many vital functions such as speech, walking, sleeping, and movement, which are the basic functions of a human being. Early diagnosis is very important for the treatment of this disease. In order to diagnose Parkinson’s disease, doctors need brain tomography and some biochemical and physical tests. In addition, the majority of those suffering from this disease are over 60 years of age, which makes it difficult to carry out the tests necessary for the diagnosis of the disease. This difficult process of diagnosing Parkinson’s disease motivates new research. In our study, rule-based diagnosis of Parkinson’s disease with the help of acoustic sounds was the aim. For this purpose, 188 (107 Male-81 Female) individuals with Parkinson’s disease and 64 healthy (23 Male-41 Female) individuals were asked to say the letter ‘a’ three times, and their measurements were made and recorded. In this study, the resulting data set of 756 recorded measurements was used. Baseline, Time, Vocal, MFCC and Wavelet features extracted from the voice recordings were used. The data set was balanced in terms of the “Patient/Healthy” feature. Then, with the help of an Eta correlation coefficient based feature selection algorithm (E-Score), the best 20% of features were selected for each property group. For the machine learning step, the data were divided into two groups, 75% training and 25% test, with the help of the systematic sampling method. Model performance was evaluated with Sensitivity, Specificity, F-Measure, AUC and Kappa values. As a result of the study, it was found that the disease could be detected with an accuracy rate of 84.66% and a sensitivity rate of 0.96. High success rates indicate that patients can be diagnosed with Parkinson’s disease with the help of their voice recordings.
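The evaluation measures named in the abstract above (sensitivity, specificity, F-measure and kappa) all follow from a binary confusion matrix. The following is a minimal sketch of those definitions; the function name `binary_metrics` and the counts used in the example are hypothetical and are not the results of the cited study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from a 2x2 confusion matrix (positive = patient)."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)          # recall on the patient class
    specificity = tn / (tn + fp)          # recall on the healthy class
    accuracy = (tp + tn) / n
    f_measure = 2 * tp / (2 * tp + fp + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=90, fn=10, fp=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting kappa alongside accuracy is useful when the classes are imbalanced, because it discounts the agreement that a classifier would reach by chance.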
Article
Parkinson's disease (PD) is a neuron related disorder that affects the people in old age. The majority of people suffering from PD develop several voice impairments mainly related to what is known as dysarthric speech. Voice analysis can help in PD detection and in the evaluation of the dysarthria level of the patients. This study introduces time-frequency features to model discontinuities and abrupt changes that arise in the voice signal due to PD. The proposed method consists of four stages: time-frequency matrix (TFM) representation, TFM decomposition using non-negative matrix factorization (NMF), feature extraction and classification. Statistical analyses show that the proposed time-frequency features significantly differentiate between PD patients and healthy speakers. Experiments with sustained vowel phonations and isolated words of the corpus PC–GITA are conducted. The proposed method achieved average classification accuracies of up to 92 % in vowels, and 97 % in words. There is an improvement in accuracy ranging from 10% to 40 % compared to existing methods. Further, the developed models are evaluated upon an independent dataset. Results on this separate test set show accuracies ranging from 63% to 75% in vowels, and from 53% to 75% in isolated words. Regarding the dysarthria level evaluation, Spearman's correlations between original and predicted labels are around 0.81 in sustained vowels and in isolated words. The results indicate that the proposed approach is suitable and robust for the automatic detection of PD.
Chapter
Parkinson’s Disease (PD) is a progressive neurodegenerative disorder that mainly affects the central nervous system, causing cognitive, emotional and language disorders. Speech impairment is one of the earliest PD symptoms, and may be used for an automatic assessment to support the diagnosis and the evaluation of the disease severity in the two biological sexes (male and female). This study investigates the processing of voice signals for measuring the incidence of Parkinson’s disease in women and men. The approach evaluates the use of several extracted features and two learning techniques, Support Vector Machines (SVM) and Long Short-Term Memory (LSTM), to classify data obtained from four databases. Each database contains different data, each in a different language. The audio tasks were recorded using six different microphones. The results reveal that cases of Parkinson’s disease appear more in men than in women.
Article
Chronic obstructive pulmonary disease (COPD) is a global burden, which is estimated to be the third leading cause of death worldwide by 2030. The economic burden of COPD grows continuously because it is not a curable disease. These conditions make COPD an important research field for artificial intelligence (AI) techniques in medicine. In this study, an integrated approach of statistical-based fuzzy cognitive maps (SBFCM) and artificial neural networks (ANN) is proposed for predicting the length of hospital stay of patients with COPD who were admitted to the hospital with an acute exacerbation. The SBFCM method is developed to determine the input variables of the ANN model. The SBFCM conducts statistical analysis to prepare preliminary information for the experts and then collects expert opinions accordingly, to define a conceptual map of the system. The integration of SBFCM and ANN methods provides both statistical data and expert opinion in the prediction model. In the numerical application, the proposed approach outperformed the conventional approach and other machine learning algorithms with 79.95% accuracy, revealing the power of expert opinion involvement in medical decisions. A medical decision support framework is constructed for better prediction of length of hospital stay and more effective hospital management.
Article
Background: Resting tremor is an essential characteristic in patients suffering from Parkinson's disease (PD). Objective: Quantification and monitoring of tremor severity is clinically important to help achieve medication or rehabilitation guidance in daily monitoring. Methods: Wrist-worn tri-axial accelerometers were utilized to record the long-term acceleration signals of PD patients with different tremor severities rated by Unified Parkinson's Disease Rating Scale (UPDRS). Based on the extracted features, three kinds of classifiers were used to identify different tremor severities. Statistical tests were further designed for the feature analysis. Results: The support vector machine (SVM) achieved the best performance with an overall accuracy of 94.84%. Additional feature analysis indicated the validity of the proposed feature combination and revealed the importance of different features in differentiating tremor severities. Conclusion: The present work obtains a high-accuracy classification in tremor severity, which is expected to play a crucial role in PD treatment and symptom monitoring in real life.
Chapter
Correct and early diagnosis of Parkinson’s Disease (PD) is vital, as it enables the patient to receive the proper treatment required for the current stage of the disease. Early diagnosis is crucial, as certain treatments, such as levodopa and carbidopa, have been proven to be more effective if given in the early stages of PD. At present, the diagnosis of PD is based solely on the clinical assessment of a patient’s motor symptoms. By this stage, however, PD has developed to such an extent that irreversible neurological damage has already occurred, meaning the patient has no chance of recovering. By using machine learning in the assessment of a potential PD patient, the disease can be detected and diagnosed at a much earlier stage, allowing for swift intervention, which increases the chance of PD not developing to such damaging levels in the patient. Machine learning is a subfield of artificial intelligence that provides different techniques to scientists, clinicians and patients to address and detect diseases like PD at an early stage. A main symptom of PD is vocal impairment, which distinguishes patients from healthy people. In this study, we used a PD vocal dataset that has 755 features. The Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) techniques are used to reduce the dimensionality of the available Parkinson’s dataset to 8 optimal features. The study used four supervised machine learning algorithms, two of which are ensemble techniques: Random Forest, AdaBoost, Support Vector Machine and Logistic Regression. The Random Forest model with LDA and PCA shows the highest accuracy, 0.948 and 0.840 respectively.
Article
According to the World Health Organization (WHO), Parkinson’s disease (PD) is a neurodegenerative disease of the brain that causes motor symptoms including slower movement, rigidity, tremor, and imbalance in addition to other problems like Alzheimer’s disease (AD), psychiatric problems, insomnia, anxiety, and sensory abnormalities. Techniques including artificial intelligence (AI), machine learning (ML), and deep learning (DL) have been established for the classification of PD and normal controls (NC) with similar therapeutic appearances in order to address these problems and improve the diagnostic procedure for PD. In this article, we present a literature survey of research articles published up to September 2022 in order to provide an in-depth analysis of the use of datasets, various modalities, experimental setups, and architectures that have been applied in the diagnosis of PD. This analysis includes a total of 217 research publications with a list of the various datasets, methodologies, and features. These findings suggest that ML/DL methods and novel biomarkers hold promising results for application in medical decision-making, leading to a more methodical and thorough detection of PD. Finally, we highlight the challenges and provide appropriate recommendations on selecting approaches that might be used for subgrouping and connection analysis with structural magnetic resonance imaging (sMRI), DaTSCAN, and single-photon emission computerized tomography (SPECT) data for future Parkinson’s research.
Preprint
This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types such as wideband and narrowband spectrograms, and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. Methods for quantification include the ability of the proposed model to classify different pathologies and the associated disease severity. Additionally, this paper proposes a novel fusion strategy called multi-spectral fusion that combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders. The proposed models are able to classify the speech from Parkinson's disease patients with accuracy up to 95%. The proposed models were also able to assess the dysarthria severity of Parkinson's disease patients with a Spearman correlation up to 0.75. These results outperform those observed in the literature where the same problem was addressed with the same corpus.
Article
Background and Objective: Speech impairment is an early symptom of Parkinson's disease (PD). This study has summarized the literature related to speech and voice in detecting PD and assessing its severity. Methods: A systematic review of the literature from 2010 to 2021 to investigate analysis methods and signal features. The keywords “Automatic analysis” in conjunction with “PD speech” or “PD voice” were used, and the PubMed and ScienceDirect databases were searched. A total of 838 papers were found on the first run, of which 189 were selected. One hundred and forty-seven were found to be suitable for the review. The different datasets, recording protocols, signal analysis methods and features that were reported are listed. Values of the features that separate PD patients from healthy controls were tabulated. Finally, the barriers that limit the wide use of computerized speech analysis are discussed. Results: Speech and voice may be valuable markers for PD. However, large differences between the datasets make it difficult to compare different studies. In addition, speech analytic methods that are not informed by physiological understanding may alienate clinicians. Conclusions: The potential usefulness of speech and voice for the detection and assessment of PD is confirmed by evidence from the classification and correlation results.
Article
Parkinson’s disease is a neurological illness that affects individuals at the later stage of life. Most patients complain of voice or speech abnormalities during the nascent stage of this disease, and it is difficult to recognize these abnormalities. This creates a need for a speech signal-based Parkinson's detection system to aid clinicians in the diagnosis process. A hybrid Parkinson's disease detection system has been proposed in this research work. Two speech datasets have been used in the design of this system: The first is an Italian Parkinson's Voice & Speech dataset, and the other is Mobile Device Voice Recordings at King's College London dataset. Seventeen acoustic features have been generated from the voice samples available in the datasets using Parselmouth library. In addition, based on the significance of features, the eight most significant features have been used in the design of the model. These features have been selected using genetic algorithm method. Four classifiers, k-nearest neighbors, XGBoost, random forest, and logistic regression, have been used during classification stage. The accuracy, sensitivity, f-measure, specificity, and precision parameters have been used for the analysis of the designed system. The combination of a genetic algorithm-based feature selection approach and logistic regression classifier has given 100% accuracy on Italian Parkinson's Voice & Speech dataset. The same feature extraction and classifier combination on the Mobile Device Voice Recordings at King's College London dataset have attained an accuracy level of 90%. Results have shown that the proposed system has outperformed the system found in the literature.
Article
The progress prediction of Parkinson's disease (PD) is one of the most important issues in the early diagnosis of PD. Much research has been conducted in this field; however, most existing methods focus on the selection of baseline features and regressors to reduce prediction errors. Unlike previous studies, the main goal of this paper is to obtain more effective features by feature transformation of baseline features to improve the prediction performance. Therefore, this paper proposes a prediction model based on graph wavelet transform (GWT) and attention weighted random forest (RF). Firstly, a clustering algorithm is adopted to reduce the prediction error of the model. Next, a multi-scale analysis of the feature vectors by GWT is conducted to yield a frequency feature representation that is more relevant to the target value. Finally, the frequency features are input into the attention weighted RF to predict the severity of PD, allowing the results of decision trees with better predictive performance in the RF to be highlighted while reducing the risk of overfitting. The effectiveness of the method is evaluated on the Parkinson's telemonitoring dataset collected by the University of Oxford. The experimental results show that the mean absolute error and root mean squared error of the proposed method for predicting PD severity (motor- and total-UPDRS) are 1.53, 2.13 and 1.91, 2.70, respectively. Compared with the quoted optimal method, the errors are reduced by 7.27%, 4.05% and 5.45%, 1.10%, respectively. This indicates that the proposed method has better prediction performance.
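The abstract above reports mean absolute error, root mean squared error and percentage error reductions. As a minimal sketch of how these quantities are defined (the function names and the sample values are hypothetical and are not data from the cited study):

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared prediction error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction_percent(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical severity scores and predictions, for illustration only.
true_scores = [1.0, 2.0, 3.0]
predicted_scores = [2.0, 2.0, 5.0]
print(mean_absolute_error(true_scores, predicted_scores))
print(root_mean_squared_error(true_scores, predicted_scores))
print(relative_reduction_percent(2.0, 1.5))
```

RMSE penalises large individual errors more heavily than MAE, which is why the two measures are usually reported together for severity regression.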
Article
More than 90% of patients with Parkinson’s disease suffer from hypokinetic dysarthria. This paper proposes a novel end-to-end deep learning model for Parkinson’s disease detection from speech signals. The proposed model extracts time series dynamic features using time-distributed two-dimensional convolutional neural networks (2D-CNNs), and then captures the dependencies between these time series using a one-dimensional CNN (1D-CNN). The performance of the proposed model was verified on two databases. On Database-1, the proposed model outperformed expert features-based machine learning models and achieved promising results, showing accuracies of 81.6% on the speech task of sustained vowel /a/ and 75.3% on the speech task of reading a short sentence (/si shi si zhi shi shi zi/) in Chinese. On Database-2, the proposed model was assessed on multiple sound types, including vowels, words, and sentences. An accuracy of up to 92% was obtained on the speech tasks, which included reading simple (/loslibros/) and complex (/viste/) sentences in Spanish. By visualizing the features generated by the model, it was found that the learned time series dynamic features are able to capture the characteristics of the reduced overall frequency range and reduced variability of Parkinson’s disease sounds, which are important clinical evidence for detecting Parkinson’s disease patients. The results also suggest that the low-frequency region of the Mel-spectrogram is more influential and important than the high-frequency region for Parkinson’s disease detection from speech.
Article
Parkinson’s disease (PD) is the most common neurological disorder that typically affects elderly people. In the early stage of the disease, it has been seen that 90% of the patients develop voice disorders, namely hypokinetic dysarthria. As time passes and the severity of PD increases, patients have difficulty performing different speech tasks. During the progression of the disease, due to less control of articulatory organs such as the tongue, jaw, and lips, the quality of speech signals deteriorates. Periodic medical evaluations are very important for PD patients; however, having access to a medical appointment with a neurologist is a privilege in most countries. Considering that the speech recording process is inexpensive and very easy to do, we want to explore in this paper the suitability of mapping information of the dysarthria level into the neurological state of patients and vice versa. Three levels of severity are considered in a multiclass framework using time–frequency (TF) features and Random Forest along with an Error-Correcting Output Code (ECOC) approach. The multiclass classification task based on dysarthria level is performed using the estimated features with words and diadochokinetic (DDK) speech tasks. The developed model shows an unweighted average recall (UAR) of 68.49% with the DDK task /pakata/ based on the m-FDA level, and 48.8% with the word /petaka/ based on the UPDRS level using the Random Forest classifier. With the aim of evaluating the neurological states using the dysarthria level, the developed models based on the m-FDA level are used to predict the MDS-UPDRS-III level of patients. The highest matching accuracy of 42% is achieved with the word /apto/. Similarly, the multiclass classification framework based on MDS-UPDRS-III is applied to predict the dysarthria level of patients. In this case, the highest matching accuracy of 32% is obtained with the word /petaka/ and the DDK tasks /pa/ and /ta/.
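The abstract above reports unweighted average recall (UAR) for a three-level severity task. The sketch below shows how UAR is defined and why it can differ from plain accuracy when classes are imbalanced; the function names and the label sequences are hypothetical and are not data from the cited study.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, with every class weighted equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of samples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical three-level severity labels with an over-represented class 0.
true_levels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
predicted_levels = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]
print(unweighted_average_recall(true_levels, predicted_levels))
print(accuracy(true_levels, predicted_levels))
```

In this illustration the majority class is always recognised, so accuracy is higher than UAR; UAR exposes the weaker recall on the two minority classes.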
Chapter
Blockchain runs parallel to the abstract notion of trust, which is its most valued feature in the modern world. Technology remains closely connected to data migration, which is in a way the core necessity for improvement. The present generation increasingly recognizes the significance of data privacy, which blockchain technologies can sustainably support. Similar measures need to be initiated for the sensitive personal data held in public health overall. Patients have easy access to e-medicines, tele-medicines, and AI-based consulting, and the list of supporting devices grows every day. If blockchain comes to the health-tech industry, patients could get comfortable services with an easy click, and an effective personalized system could be tailored to an individual's health. The healthcare industry handles important transactions, which entail protective smart contracts requiring up-to-date and faster financial processing. Each healthcare area can engage with the technology, providing more immediate solutions for a vast population. Public health would improve in developing countries, and keeping track via the blockchain would be worthwhile. Looking forward, the implementation of blockchain technology needs to comply with robust privacy and data protection mechanisms. In addition, the biomedical sector needs solid support from blockchain technologies, which could help resolve mismanagement in government hospitals. Reliability and credibility are important factors in establishing better governance, which seems impossible without innovative technologies like blockchain.
Article
Speech plays an important role in human communication and is also a dominant medium for exchanging information in human computer interaction (HCI). Hence, it has always been an important research topic in the fields of Artificial Intelligence (AI) and Machine Learning (ML). However, in the traditional machine learning approach, when the dimension of the feature vector becomes quite large, the learning algorithms require a huge amount of storage space and processing time. To address this problem, we propose a hybrid wrapper feature selection algorithm, called CEOAS, using the clustering-based Equilibrium Optimizer (EO) and Atom Search Optimization (ASO) algorithms for recognizing different human emotions from speech signals. We extract Linear Prediction Coding (LPC) and Linear Predictive Cepstral Coefficient (LPCC) features from the audio signals. Our proposed model reduces the feature dimension and also improves the classification accuracy of the learning model. The model has been evaluated on four standard benchmark datasets, namely SAVEE, EmoDB, RAVDESS, and IEMOCAP, and impressive recognition accuracies of 98.01%, 98.72%, 84.62% and 74.25%, respectively, have been achieved, which are better than many state-of-the-art algorithms.
Article
Parkinson’s disease (PD) is an age-related neurological disease associated with dopamine deficiency and is the second most common neurological disease in the world after Alzheimer’s. Identifying PD at an early stage is technically demanding and expensive. Many researchers have investigated PD in different ways and with different approaches in order to identify it at an early stage and at low cost. Voice analysis is one such effective approach and has been an important topic in the current decade. In this paper, a novel probabilistic neural network (PNN)-based approach is proposed for analyzing PD. The major objective of this paper is to develop a highly accurate PNN-based intelligent approach for the identification and classification of PD. The inputs are 1200 sound recordings of the vowel vocalizations ‘a’, ‘e’, ‘i’, ‘o’, and ‘u’ taken at different times of the day (morning, mid-day, and night) from 62 PD and 51 non-PD individuals. The experimental analysis shows that performance on the dataset increases in proportion to the number of neurons in the hidden layer of the PNN up to seven, at which point 100% accuracy is reached with minimum time and gradient values. The proposed PNN model with seven hidden-layer neurons is a very powerful tool for early, low-cost prediction of PD. Comparative analysis with other standard machine learning approaches demonstrates the superiority of the proposed PNN model for identifying PD through voice analysis.
Conference Paper
This paper presents the analysis and classification of Parkinson disease. When people suffer from Parkinson disease, their vocal folds and vocal tract are severely affected, and thus speech characteristics are altered during phonation. In this paper, variational mode decomposition (VMD) is used to extract relevant information from the speech signal. VMD decomposes the speech signal into modes, or sub-signals. Various statistical features (mean, variance, skewness, and kurtosis), energy, and energy entropy are used for Parkinson disease detection. In the experiments, the VMD-based features outperform the Mel frequency cepstral coefficients (MFCC). The proposed features achieve a classification accuracy of 96.29%.
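The per-mode features listed in this abstract can be illustrated with a short sketch. It assumes the modes have already been obtained from some VMD implementation (the decomposition itself is not shown), uses population moments, and defines energy entropy as the Shannon entropy of the normalized mode energies. That last definition is a common convention and an assumption here, not something the abstract specifies.

```python
import math

def mode_features(mode):
    """Mean, variance, skewness, kurtosis and energy of one mode (population moments)."""
    n = len(mode)
    mean = sum(mode) / n
    var = sum((x - mean) ** 2 for x in mode) / n
    std = math.sqrt(var)
    # Guard against constant modes, whose higher moments are undefined.
    skew = sum((x - mean) ** 3 for x in mode) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum((x - mean) ** 4 for x in mode) / (n * var ** 2) if var > 0 else 0.0
    energy = sum(x * x for x in mode)
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "energy": energy}

def energy_entropy(modes):
    """Shannon entropy of the energy distribution across modes (assumed definition)."""
    energies = [sum(x * x for x in m) for m in modes]
    total = sum(energies)
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)

# Two toy "modes" standing in for VMD output (hypothetical values).
modes = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
features = [mode_features(m) for m in modes]
entropy = energy_entropy(modes)
print(features[0], round(entropy, 4))
```

Concatenating the per-mode dictionaries and the entropy value would give one feature vector per recording, which could then be passed to any classifier.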
Article
This work explores the effectiveness of the Intrinsic Mode Functions (IMFs) of the speech signal, in estimating its Glottal Closure Instants (GCIs). The IMFs of the speech signal, which are its AM–FM or oscillatory components, are obtained from two similar nonlinear and non-stationary signal analysis techniques—Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN), and Modified Empirical Mode Decomposition (MEMD). Both these techniques are advanced variants of the original technique—Empirical Mode Decomposition (EMD). MEMD is much faster than ICEEMDAN, whereas the latter curtails mode-mixing (a drawback of EMD) more effectively. It is observed that the partial summation of a certain subset of the IMFs results in a signal whose minima are aligned with the GCIs. Based on this observation, two different methods are devised for estimating the GCIs from the IMFs of ICEEMDAN and MEMD. The two methods are captioned ICEEMDAN-based GCIs Estimation (IGE) and MEMD-based GCIs Estimation (MGE). The results reveal that IGE and MGE provide consistent and reliable estimates of the GCIs, compared to the state-of-the-art methods, across different scenarios—clean, noisy, and telephone channel conditions.
Article
This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson’s disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC) and smart phone (SP) microphones. Additional modalities were obtained by splitting speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF) is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER) and the cost of log-likelihood-ratio. Essentia audio feature set was the best using the AC speech modality and YAAFE audio feature set was the best using the SP unvoiced modality, achieving EER of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of a RF-based proximity matrix into the 2D space enriched medical decision support by visualization.
Article
Aim: The research described is intended to give a description of articulation dynamics as a correlate of the kinematic behavior of the jaw-tongue biomechanical system, encoded as a probability distribution of an absolute joint velocity. This distribution may be used in detecting and grading speech from patients affected by neurodegenerative illnesses, as Parkinson Disease. Hypothesis: The work hypothesis is that the probability density function of the absolute joint velocity includes information on the stability of phonation when applied to sustained vowels, as well as on fluency if applied to connected speech. Methods: A dataset of sustained vowels recorded from Parkinson Disease patients is contrasted with similar recordings from normative subjects. The probability distribution of the absolute kinematic velocity of the jaw-tongue system is extracted from each utterance. A Random Least Squares Feed-Forward Network (RLSFN) has been used as a binary classifier working on the pathological and normative datasets in a leave-one-out strategy. Monte Carlo simulations have been conducted to estimate the influence of the stochastic nature of the classifier. Two datasets for each gender were tested (males and females) including 26 normative and 53 pathological subjects in the male set, and 25 normative and 38 pathological in the female set. Results: Male and female data subsets were tested in single runs, yielding equal error rates under 0.6% (Accuracy over 99.4%). Due to the stochastic nature of each experiment, Monte Carlo runs were conducted to test the reliability of the methodology. The average detection results after 200 Montecarlo runs of a 200 hyperplane hidden layer RLSFN are given in terms of Sensitivity (males: 0.9946, females: 0.9942), Specificity (males: 0.9944, females: 0.9941) and Accuracy (males: 0.9945, females: 0.9942). The area under the ROC curve is 0.9947 (males) and 0.9945 (females). The equal error rate is 0.0054 (males) and 0.0057 (females). 
Conclusions: The proposed methodology shows that the use of highly normalized descriptors, such as the probability distribution of kinematic variables of vowel articulation stability, which has some interesting properties in terms of information theory, boosts the potential of simple yet powerful classifiers in producing quite acceptable detection results in Parkinson Disease.
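The sensitivity, specificity and accuracy figures quoted in this abstract follow from the counts of a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and only chosen to match the size of the male subset (53 pathological, 26 normative), not taken from the paper's results.

```python
def detection_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of pathological subjects detected
    specificity = tn / (tn + fp)   # fraction of normative subjects cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical outcome of one leave-one-out run on 53 PD and 26 control subjects.
sens, spec, acc = detection_metrics(tp=52, fn=1, tn=25, fp=1)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.4f}")
```

Averaging these three values over repeated runs with different random classifier initializations would give Monte Carlo estimates of the kind the abstract reports.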
Conference Paper
Neurodegenerative syndromes such as Parkinson’s disease usually lead to speech impairments. Reduced intelligibility of spoken language is treatable with Speech and Language Therapy. A successful speech therapy implements the principles of frequency, intensity and repetition. Consequently, patients need to be highly motivated for the exercises to keep up with their training. We argue that game-based technologies are well suited to support patients in partaking in self-sustained, high-frequency training. Furthermore, studies demonstrate that game-based interventions have the potential to enhance motivation for rehabilitative exercising in patients with neurological disorders. Building on these insights, we apply successful principles of gamification to improve impaired speech in patients with neurodegenerative syndromes. With the ISi-Speech project (‘Individualisierte Spracherkennung in der Rehabilitation für Menschen mit Beeinträchtigung in der Sprechverständlichkeit’ (in German) [individual speech recognition in therapy for people with motor speech disorders]) we further integrate psychological motivation theory (self-determination) and user-driven design into the development process of a rehabilitation tool for patients with Parkinson’s disease.
Article
This work explores the utility of the time-domain signal components, or the Intrinsic Mode Functions (IMFs), of speech signals, as generated from the data-adaptive filterbank nature of Empirical Mode Decomposition (EMD), in characterizing speakers for the task of text-independent Speaker Verification (SV). A modified version of EMD, denoted as MEMD, which extracts IMFs with less mode-mixing and provides a better representation of the higher frequency spectrum of speech, is also utilized for the SV task. Three different features are extracted over 20 ms frames from the IMFs of EMD and MEMD. They are then tested individually, and in conjunction with the Mel Frequency Cepstral Coefficients (MFCCs), for SV. Two corpora, the NIST SRE 2003 corpus and the CHAINS corpus, are used for the experiments. The results evaluated on the NIST SRE 2003 database, using the i-vector framework, reveal that the features extracted from the IMFs, in conjunction with the MFCCs, enhance the performance of the SV system. Further, it is observed that only a small set of lower-order IMFs is useful and necessary for characterizing speaker-specific information. The combination of the features with the MFCCs is also found to be useful when short speech utterances of ≤ 10 s are used for testing. Similarly, the results evaluated on the CHAINS corpus, using the conventional Gaussian Mixture Model (GMM) framework, reveal that the features, in combination with the MFCCs, enhance the performance of the SV system, not only for normal speech, but also for fast and whispered speech. Again, it is observed that only the first few IMFs are needed and useful for achieving such enhanced performance.
Article
A system that is capable of automatically discriminating healthy people from people with Parkinson’s Disease (PD) from speech recordings is proposed. It is initially based on 27 features, extracted from recordings of sustained vowels. The number of characteristics has been further reduced by feature selection. The system has been tested by using a heterogeneous database, composed of 40 control subjects and 40 subjects with PD belonging to different severity stages of the disease and under prescribed treatment. Repeated measures per individual were averaged before being assigned to subject, avoiding the usual practice of considering measurements within the same subject as independent. The best overall accuracy obtained was 85.25%, with a sensitivity of 90.23% and a specificity of 80.28%. Additionally, a pilot experiment to track PD severity stages has been performed on 32 out of the 40 initial subjects with PD. To the authors’ knowledge, this is the first speech-based experiment on automatic PD tracking by using the Hoehn and Yahr’s scale (clinical metric mainly focused on postural instability). The results suggest that progression of voice impairment follows different developmental trajectories than postural instability, implying different degenerative mechanisms.
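The step of averaging repeated measures per individual, so that several recordings of one subject are not treated as independent samples, can be sketched as follows. The record layout of (subject id, feature vector) pairs is an assumption made for illustration.

```python
from collections import defaultdict

def average_per_subject(records):
    """Collapse repeated recordings into one mean feature vector per subject.

    records: iterable of (subject_id, feature_vector) pairs, where all
    vectors have the same length (assumed layout).
    """
    sums = {}
    counts = defaultdict(int)
    for subject_id, vector in records:
        if subject_id not in sums:
            sums[subject_id] = [0.0] * len(vector)
        sums[subject_id] = [s + v for s, v in zip(sums[subject_id], vector)]
        counts[subject_id] += 1
    return {sid: [s / counts[sid] for s in total] for sid, total in sums.items()}

# Toy data: subject "s1" was recorded twice, "s2" once (hypothetical values).
records = [("s1", [1.0, 2.0]), ("s1", [3.0, 4.0]), ("s2", [5.0, 6.0])]
per_subject = average_per_subject(records)
print(per_subject)
```

Working with one vector per subject also keeps recordings of the same person from landing in both the training and test folds during cross-validation.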
Conference Paper
Parkinson's disease (PD) is a neurodegenerative disorder that is characterized by the loss of dopaminergic neurons in the midbrain. It has been shown that about 90% of people with PD also develop speech impairments, exhibiting symptoms such as monotonic speech, low pitch intensity, inappropriate pauses, imprecise consonants and problems in prosody; although these problems are well identified, only 3% to 4% of patients receive speech therapy. The research community has addressed the problem of the automatic detection of PD by means of noise measures; however, in such works only the phonation of the English vowel /a/ has been considered. In this paper, the five Spanish vowels uttered by 50 people with PD and 50 healthy controls (HC) are evaluated automatically considering a set of four noise measures: Harmonics to Noise Ratio (HNR), Normalized Noise Energy (NNE), Cepstral HNR (CHNR) and Glottal to Noise Excitation Ratio (GNE). The decision on whether a speech recording is from a person with PD or from an HC is made by a k-nearest neighbors (k-NN) classifier, reaching an accuracy of 66.57% when only the vowel /i/ is considered.
Article
70% to 90% of patients with Parkinson's disease (PD) show an affected voice. Various studies have revealed that voice and prosody are among the earliest indicators of PD. The aim of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts and monologues. Classification is performed in each case by SVMs. A correlation-based feature selection was performed in order to identify the most important features for each of these systems. We report recognition results of 91% when trying to differentiate between normal speaking persons and speakers with PD in early stages with prosodic modeling. With acoustic modeling we achieved a recognition rate of 88% and with vocal modeling we achieved 79%. After feature selection these results could be greatly improved, but we expect those results to be too optimistic. We show that read texts and monologues are the most meaningful texts when it comes to the automatic detection of PD based on articulation, voice, and prosodic evaluations. The most important prosodic features were based on energy, pauses and F0. The masses and the spring compliances were found to be the most important parameters of the two-mass vocal fold model.
Chapter
Empirical Mode Decomposition is a data driven technique proposed by Huang. In this work, we explore spectral properties of the intrinsic mode functions and apply them to speech signals corresponding to real and simulated sustained vowels. For the synthetic sustained vowels we propose a phonation model that includes perturbations implied in common laryngeal pathologies. We extract features from each signal using the Burg’s standard spectral analysis of their intrinsic mode functions. Due to its well-known theoretical properties, the classic K-nearest neighbor’s classification rule is applied to real and synthetic data. We show that even using this basic pattern classification algorithm, the selected spectral features of only three intrinsic mode functions are enough to discriminate between normal and pathological voices. We have obtained a 99.00% of correct classifications between normal and pathological synthetic voices (K=1, sensitivity=0.990, specificity=0.990); while in the case of real voices the percentage of correct classification was 93.40% (K=3, sensitivity=0.925, specificity=0.926). These results strongly suggest that spectral properties of Empirical Mode Decomposition provide useful discriminative information for this task. Additionally we consider two pathologies of different etiology and treatment, which, given the similarity of their voice characteristics, are frequently misdiagnosed in clinical practice: muscular tension dysphonia and adductor spasmodic dysphonia. Preliminary results with a reduced real data base suggest that this approach could provide useful orientation to physicians and voice pathologists.
Article
We present an assessment of the practical value of existing traditional and non-standard measures for discriminating healthy people from people with Parkinson's disease (PD) by detecting dysphonia. We introduce a new measure of dysphonia, Pitch Period Entropy (PPE), which is robust to many uncontrollable confounding effects including noisy acoustic environments and normal, healthy variations in voice frequency. We collected sustained phonations from 31 people, 23 with PD. We then selected 10 highly uncorrelated measures, and an exhaustive search of all possible combinations of these measures finds four that in combination lead to overall correct classification performance of 91.4%, using a kernel support vector machine. In conclusion, we find that non-standard methods in combination with traditional harmonics-to-noise ratios are best able to separate healthy from PD subjects. The selected non-standard methods are robust to many uncontrollable variations in acoustic environment and individual subjects, and are thus well-suited to telemonitoring applications.
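An exhaustive search over feature combinations, as mentioned in this abstract, simply scores every non-empty subset and keeps the best one. With 10 candidate measures that is 2^10 - 1 = 1023 subsets, which is small enough to enumerate. The sketch below uses a made-up scoring function as a stand-in for the cross-validated classifier accuracy a real study would use; the feature names and weights are hypothetical.

```python
from itertools import combinations

def best_subset(features, score):
    """Score every non-empty subset of `features` and return the best one."""
    best, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = score(subset)
            if s > best_score:  # ties keep the earlier (smaller) subset
                best, best_score = subset, s
    return best, best_score

def count_nonempty_subsets(n):
    """Number of candidate subsets an exhaustive search has to evaluate."""
    return 2 ** n - 1

# Hypothetical per-feature usefulness, with a fixed cost per added feature.
weights = {"a": 0.30, "b": 0.10, "c": 0.25}

def toy_score(subset):
    return sum(weights[f] for f in subset) - 0.15 * len(subset)

chosen, value = best_subset(list(weights), toy_score)
print(chosen, round(value, 2), count_nonempty_subsets(10))
```

The search cost doubles with every added feature, so this approach presumably only works because the candidate set was first reduced to 10 measures.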
Article
An assessment of vocal impairment is presented for separating healthy people from persons with early untreated Parkinson's disease (PD). This study's main purpose was to (a) determine whether voice and speech disorder are present from early stages of PD before starting dopaminergic pharmacotherapy, (b) ascertain the specific characteristics of the PD-related vocal impairment, (c) identify PD-related acoustic signatures for the major part of traditional clinically used measurement methods with respect to their automatic assessment, and (d) design new automatic measurement methods of articulation. The varied speech data were collected from 46 Czech native speakers, 23 with PD. Subsequently, 19 representative measurements were pre-selected, and Wald sequential analysis was then applied to assess the efficiency of each measure and the extent of vocal impairment of each subject. It was found that measurement of the fundamental frequency variations applied to two selected tasks was the best method for separating healthy from PD subjects. On the basis of objective acoustic measures, statistical decision-making theory, and validation from practicing speech therapists, it has been demonstrated that 78% of early untreated PD subjects indicate some form of vocal impairment. The speech defects thus uncovered differ individually in various characteristics including phonation, articulation, and prosody.
Article
Parkinson's disease (PD) is a neurodegenerative disease that affects millions of people worldwide, causing mental and mainly motor dysfunctions. The negative impact on the patient's daily routine has driven research into new techniques that can reduce its negative effects and also identify the disease in individuals. One of the main motor characteristics of PD is the hand tremor faced by patients, which turns out to be crucial information for computer-aided diagnosis. In this context, we make use of handwriting dynamics data acquired from individuals while they perform tasks that measure abilities related to writing skills. This work proposes the application of recurrence plots to map the signals onto the image domain, which are further used to feed a Convolutional Neural Network for learning proper information that can help the automatic identification of PD. The proposed approach was assessed on a public dataset under several scenarios that comprise different combinations of deep-based architectures, image resolutions, and training set sizes. Experimental results showed a significant accuracy improvement compared to our previous work, with an average accuracy of over 87%. Moreover, an improvement in accuracy was observed in the classification of patients (i.e., mean recognition rates above 90%). The promising results showed the potential of the proposed approach toward the automatic identification of Parkinson's disease.
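A recurrence plot turns a one-dimensional signal into a binary image by marking every pair of time points whose values are close. The sketch below is a simplified illustration that compares raw scalar samples against a threshold `eps`; recurrence plots are often built on time-delay embedded vectors instead, and the abstract does not say which variant or threshold the authors used.

```python
def recurrence_plot(signal, eps):
    """Binary matrix R with R[i][j] = 1 when |signal[i] - signal[j]| <= eps."""
    n = len(signal)
    return [[1 if abs(signal[i] - signal[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

# Toy oscillating signal (hypothetical values standing in for a pen-sensor channel).
signal = [0.0, 1.0, 0.1, 1.1]
plot = recurrence_plot(signal, eps=0.2)
for row in plot:
    print(row)
```

The resulting matrix is symmetric with ones on the diagonal, and periodic signals produce diagonal line patterns, which is the kind of texture a convolutional network can learn from.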
Article
The prevalence of speech disorders among individuals with Parkinson's disease (PD) has been reported to be as high as 89%. Speech impairment in PD results from a combination of motor and nonmotor deficits. The production of speech depends upon the coordination of various motor activities: respiration, phonation, articulation, resonance and prosody. A speech disorder is defined as impairment in any of its inter-related components. Despite the high prevalence of speech disorders in PD, only 3-4% receive speech treatment. Treatment modalities include pharmacological intervention, speech therapy, surgery, deep brain stimulation and vocal fold augmentation. Although management of Parkinsonian dysarthria is clinically challenging, speech treatment in PD should be part of a multidisciplinary approach to patient care in this disease.
Article
Purpose: This study compared the information content and information efficiency of spoken language in individuals with Parkinson's disease (PD) to a healthy comparator group. Method: Nineteen participants with PD and 19 healthy older adults completed the prospective, cross-sectional study. In the primary analysis, 2 language samples elicited by standardized protocols were analyzed for group differences using standard discourse informativeness measures including main events (MEs; Wright, Capilouto, Wagovich, Cranfill, & Davis, 2005) analyzed as %MEs and correct information units (CIUs; Nicholas & Brookshire, 1993) analyzed as %CIUs and CIUs/min. In exploratory analyses, the following were examined: (a) associations among conceptual (%MEs) and lexical (%CIUs and CIUs/min) measures and (b) associations among informativeness measures and age, education, disease severity/duration, global cognition, speech intelligibility, and a verb confrontation naming measure. Results: In the primary analysis, the PD group differed significantly from the control group on conceptual (%MEs) and lexical measures of content (%CIUs) and efficiency (CIUs/min). In exploratory analyses, for the control group %MEs were significantly correlated with CIUs/min. Significant associations among conceptual and lexical measures of informativeness were not found in the PD group. For controls, there were no significant correlations between informativeness measures and any of the demographic or speech/cognitive/language variables. In the PD group, there was a significant and positive association between CIUs/min and Dementia Rating Scale-Second Edition scores (Mattis, 2001). A significant but negative correlation was found between CIUs/min and motor severity scores. However, %MEs and verb naming were significantly and positively correlated. 
Conclusions: Individuals with PD without dementia demonstrated reduced discourse informativeness that reflects disruptions to both conceptual and lexical discourse processes. In exploratory analyses, reduced efficiency of information content was associated with global cognition and motor severity. Clinical and research implications are discussed within a Cognitivist framework of discourse production (Sheratt, 2007).
Article
Background and objective: In this work, we present a systematic review concerning the recent enabling technologies as a tool to the diagnosis, treatment and better quality of life of patients diagnosed with Parkinson's Disease (PD), as well as an analysis of future trends on new approaches to this end. Methods: In this review, we compile a number of works published at some well-established databases, such as Science Direct, IEEEXplore, PubMed, Plos One, Multidisciplinary Digital Publishing Institute (MDPI), Association for Computing Machinery (ACM), Springer and Hindawi Publishing Corporation. Each selected work has been carefully analyzed in order to identify its objective, methodology and results. Results: The review showed the majority of works make use of signal-based data, which are often acquired by means of sensors. Also, we have observed the increasing number of works that employ virtual reality and e-health monitoring systems to increase the life quality of PD patients. Despite the different approaches found in the literature, almost all of them make use of some sort of machine learning mechanism to aid the automatic PD diagnosis. Conclusions: The main focus of this survey is to consider computer-assisted diagnosis, and how effective they can be when handling the problem of PD identification. Also, the main contribution of this review is to consider very recent works only, mainly from 2015 and 2016.
Article
This paper presents an optimized cuttlefish algorithm for feature selection based on the traditional cuttlefish algorithm, which can be used for diagnosis of Parkinson's disease at its early stage. Parkinson's disease is a central nervous system disorder caused by the loss of brain cells. It is incurable and could eventually lead to death, but medications can help to control symptoms and prolong the patient's life to some extent. The proposed model uses the traditional cuttlefish algorithm as a search strategy to ascertain the optimal subset of features. The decision tree and k-nearest neighbor classifiers are used to judge the selected features. The Parkinson speech dataset with multiple types of sound recordings and the Parkinson handwriting sample datasets are used to evaluate the proposed model. The proposed algorithm can be used to predict Parkinson's disease with an accuracy of approximately 94% and help individuals receive proper treatment at an early stage. The experimental results reveal that the proposed bio-inspired algorithm finds an optimal subset of features, maximizing the accuracy and minimizing the number of features selected, and is more stable.
Article
Symptoms of Parkinson's disease vary from patient to patient. Additionally, the progression of those symptoms also differs among patients. Most of the studies on the analysis of speech of people with Parkinson's disease do not consider such an individual variation. This paper presents a methodology for the automatic and individual monitoring of speech disorders developed by PD patients. The neurological state and dysarthria level of the patients are evaluated. The proposed system is based on individual speaker models which are created for each patient. Two different models are evaluated, the classical GMM–UBM and the i–vectors approach. These two methods are compared with respect to a baseline found with a traditional Support Vector Regressor. Different speech aspects (phonation, articulation, and prosody) are considered to model recordings of spontaneous speech and a read text. A multi-aspect coefficient is proposed with the aim of incorporating information from all of these speech aspects into a single measure. Two different scenarios are considered to assess a set with seven PD patients: (1) the longitudinal test set which consists of speech recordings captured in five recording sessions distributed from 2012 to 2016, and (2) the at-home test set which consists of speech recordings captured in the home of the same seven patients during 4 months (one day per month, four times per day). The UBM is trained with the recordings of 100 speakers (50 with Parkinson's disease and 50 healthy speakers) captured with controlled acoustic conditions and a professional audio-setting. With the aim of evaluating the suitability of the proposed approaches and the possibility of extending this kind of systems to remotely assess the speech of the patients, a total of five different communication channels (sound-proof booth, Skype®, Hangouts®, mobile phone, and land-line) are considered to train and test the system. 
Due to the reduced number of recording sessions in the longitudinal test set, the experiments that involved this set are evaluated with the Pearson's correlation. The experiments with the at-home test set are evaluated with the Spearman's correlation. The results estimating the dysarthria level of the patients in the at-home test set indicate a correlation of 0.55 with a modified version of the Frenchay Dysarthria Assessment scale when the GMM-UBM model is applied upon the Skype® recordings. The results in the longitudinal test set indicate a correlation of 0.77 using a model based on i-vectors with recordings captured in the sound-proof-booth. The evaluation of the neurological state of the patients in the longitudinal test set shows correlations of up to 0.55 with the Movement Disorder Society - Unified Parkinson's Disease Rating Scale also using models based on i-vectors created with Skype® recordings. These results suggest that the i–vector approach is suitable when the acoustic conditions among recording sessions differ (longitudinal test set). The GMM-UBM approach seems to be more suitable when the acoustic conditions do not change a lot among recording sessions (at-home test set). Particularly, the best results were obtained with the Skype® calls, which can be explained due to several preprocessing stages that this codec applies to the audio signals. In general, the results suggest that the proposed approaches are suitable for tele-monitoring the dysarthria level and the neurological state of PD patients.
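This abstract evaluates one test set with Pearson's correlation and the other with Spearman's. The difference is that Pearson measures linear agreement between the raw values, while Spearman applies the same formula to their ranks and therefore only measures monotonic agreement. The sketch below implements both with the standard definitions; the example scores are made-up values, not results from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical clinical scores and model predictions with a monotonic,
# non-linear relationship.
clinical = [1, 2, 3, 4, 5]
predicted = [1, 4, 9, 16, 25]
print(round(pearson(clinical, predicted), 4), round(spearman(clinical, predicted), 4))
```

In this toy case Spearman is exactly 1 because the ordering is preserved, while Pearson stays below 1 because the relationship is not linear.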
Article
Diagnosing Parkinson's disease at an early stage is important for proper treatment, so that patients can lead productive lives for as long as possible. Although many techniques have been proposed to diagnose Parkinson's disease at an early stage, none of them is efficient. In this work, to improve the diagnosis of Parkinson's disease, we introduce a novel improved and optimized version of the crow search algorithm (OCSA). The proposed OCSA can be used to predict Parkinson's disease with an accuracy of 100% and to help individuals receive proper treatment at an early stage. The performance of OCSA has been measured on 20 benchmark datasets and the results have been compared with the original chaotic crow search algorithm (CCSA). The experimental results reveal that the proposed nature-inspired algorithm finds an optimal subset of features, maximizing the accuracy and minimizing the number of features selected, and is more stable.
Article
A study is presented analyzing tremor in the voice of speakers that were diagnosed with Parkinson’s disease (PD). The examined sounds are sustained /a/s, originating from a large dysarthric speech corpus. Six measures of vocal tremor are extracted from these vowels by applying a self-developed algorithm that is based on autocorrelation of contours and implemented as a script of an open-source speech analysis program. Univariate analyses of covariance reveal significantly raised tremor magnitudes (tremor intensity indices and tremor power indices) in PD speakers off medication as compared to a control group, as well as within PD speakers in the off-medication condition as compared to the on-medication condition. No significant differences are found between the control group and PD speakers on medication, nor for tremor frequencies. However, the greater part of the variance in tremor measures is always accounted for by the speakers’ age.
Article
Background and objective: Parkinson's disease (PD) is considered a degenerative disorder that affects the motor system, which may cause tremors, micrographia, and freezing of gait. Although PD is related to the lack of dopamine, the triggering process of its development is not fully understood yet. Methods: In this work, we introduce convolutional neural networks to learn features from images produced by handwritten dynamics, which capture different information during the individual's assessment. Additionally, we make available a dataset composed of images and signal-based data to foster research related to computer-aided PD diagnosis. Results: The proposed approach was compared against raw data and texture-based descriptors, showing suitable results, mainly in the context of early-stage detection, with results of nearly 95%. Conclusions: The analysis of handwritten dynamics using deep learning techniques proved useful for automatic Parkinson's disease identification, and it can outperform handcrafted features.
Article
The diagnosis of Parkinson's Disease is a challenging task which might be supported by new tools to objectively evaluate the presence of deviations in patients' motor capabilities. In this respect, the dysarthric nature of patients' speech has been exploited in several works to detect the presence of this disease, but none of them has studied in depth the use of state-of-the-art speaker recognition techniques for this task. In this paper, two classification schemes (GMM-UBM and i-Vectors-GPLDA) are employed separately with several parameterization techniques, namely PLP, MFCC and LPC. Additionally, the influence of the kinetic changes, described by their derivatives, is analysed. With the proposed methodology, an accuracy of 87% with an AUC of 0.93 is obtained in the optimal configuration. These results are comparable to those obtained in other works employing speech for Parkinson's Disease detection and confirm that the selected speaker recognition techniques are a solid baseline for comparison with future works. Results suggest that Rasta-PLP is the most reliable parameterization for the proposed task among all the tested features, while the two employed classification schemes perform similarly. Additionally, results confirm that kinetic changes provide a substantial performance improvement in automatic Parkinson's Disease detection systems and should be considered in the future.
Article
Parkinson's Disease (PD) is a progressive degenerative disease of the nervous system that affects movement control. The Unified Parkinson's Disease Rating Scale (UPDRS) is the baseline assessment for PD and the most widely used standardized scale to assess parkinsonism. Discovering the relationship between speech signal properties and UPDRS scores is an important task in PD diagnosis. Supervised machine learning techniques have been extensively used in predicting PD across a range of datasets. However, most methods based on supervised learning do not support incremental updates of data. In addition, the standard supervised techniques cannot be used in an incremental setting for disease prediction, and therefore they need to reprocess all the training data to rebuild the prediction models. In this paper, we take advantage of an incremental machine learning technique, the incremental support vector machine, to develop a new method for UPDRS prediction. We use the incremental support vector machine to predict Total-UPDRS and Motor-UPDRS. We also use non-linear iterative partial least squares for data dimensionality reduction and a self-organizing map for the clustering task. To evaluate the method, we conduct several experiments with a PD dataset and present the results in comparison with the methods developed in previous research. The prediction accuracies of the method, measured by MAE, were MAE = 0.4656 for Total-UPDRS and MAE = 0.4967 for Motor-UPDRS. The results of the experimental analysis demonstrate that the proposed method is effective in predicting UPDRS. The method has the potential to be implemented as an intelligent system for PD prediction in healthcare.
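The MAE figures quoted above can be read, assuming MAE carries its usual meaning of mean absolute error, as the average magnitude of the gap between predicted and true scores. The sketch below shows that computation only; the function name and the score values are hypothetical and are not taken from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all paired samples."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical UPDRS-like scores, chosen only to make the arithmetic easy to follow.
true_scores = [20.0, 25.0, 30.0, 35.0]
predicted_scores = [20.5, 24.5, 30.5, 34.0]

mae = mean_absolute_error(true_scores, predicted_scores)
print(mae)  # absolute errors 0.5, 0.5, 0.5, 1.0 average to 0.625
```

A lower value means predictions sit closer to the clinical scores on average; the number is expressed in the same units as the scale being predicted.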
Article
The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work comprises a step forward for the development of computer aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
Article
About 1% of people older than 65 years suffer from Parkinson's disease (PD) and 90% of them develop several speech impairments, affecting phonation, articulation, prosody and fluency. Computer-aided tools for the automatic evaluation of speech can provide useful information to the medical experts to perform a more accurate and objective diagnosis and monitoring of PD patients and can help also to evaluate the correctness and progress of their therapy. Although there are several studies that consider spectral and cepstral information to perform automatic classification of speech of people with PD, so far it is not known which is the most discriminative, spectral or cepstral analysis. In this paper, the discriminant capability of six sets of spectral and cepstral coefficients is evaluated, considering speech recordings of the five Spanish vowels and a total of 24 isolated words. According to the results, linear predictive cepstral coefficients are the most robust and exhibit values of the area under the receiver operating characteristic curve above 0.85 in 6 of the 24 words.
Article
Although articulatory deficits represent an important manifestation of dysarthria in Parkinson’s disease (PD), the most widely used methods currently available for the automatic evaluation of speech performance are focused on the assessment of dysphonia. The aim of the present study was to design a reliable automatic approach for the precise estimation of articulatory deficits in PD. Twenty-four individuals diagnosed with de novo PD and twenty-two age-matched healthy controls were recruited. Each participant performed diadochokinetic tasks based upon the fast repetition of /pa/-/ta/-/ka/ syllables. All phonemes were manually labeled and an algorithm for their automatic detection was designed. Subsequently, 13 features describing six different articulatory aspects of speech including vowel quality, coordination of laryngeal and supralaryngeal activity, precision of consonant articulation, tongue movement, occlusion weakening, and speech timing were analyzed. In addition, a classification experiment using a support vector machine based on articulatory features was proposed to differentiate between PD patients and healthy controls. The proposed detection algorithm reached approximately 80% accuracy for a 5 ms threshold of absolute difference between manually labeled references and automatically detected positions. When compared to controls, PD patients showed impaired articulatory performance in all investigated speech dimensions ($p < 0.05$). Moreover, using the six features representing different aspects of articulation, the best overall classification result attained a success rate of 88% in separating PD from controls. Imprecise consonant articulation was found to be the most powerful indicator of PD-related dysarthria. We envisage our approach as the first step towards development of acoustic methods allowing the automated assessment of articulatory features in dysarthrias.
Book
Introduction to Audio Analysis serves as a standalone introduction to audio analysis, providing theoretical background to many state-of-the-art techniques. It covers the essential theory necessary to develop audio engineering applications, but also uses programming techniques, notably MATLAB®, to take a more applied approach to the topic. Basic theory and reproducible experiments are combined to demonstrate theoretical concepts from a practical point of view and provide a solid foundation in the field of audio analysis. Audio feature extraction, audio classification, audio segmentation, and music information retrieval are all addressed in detail, along with material on basic audio processing and frequency domain representations and filtering. Throughout the text, reproducible MATLAB® examples are accompanied by theoretical descriptions, illustrating how concepts and equations can be applied to the development of audio analysis systems and components. A blend of reproducible MATLAB® code and essential theory enables the reader to delve into the world of audio signals and develop real-world audio applications in various domains.
Article
This paper introduces a novel approach, Cepstral Separation Difference (CSD), for quantification of speech impairment in Parkinson’s disease (PD). CSD represents a ratio between the magnitudes of glottal (source) and supra-glottal (filter) log-spectrums acquired using the source-filter speech model. The CSD-based features were tested on a database consisting of 240 clinically rated running speech samples acquired from 60 PD patients and 20 healthy controls. The Guttmann (µ2) monotonic correlations between the CSD features and the speech symptom severity ratings were strong (up to 0.78). This correlation increased with the increasing textual difficulty in different speech tests. CSD was compared with some non-CSD speech features (harmonic ratio, harmonic-to-noise ratio and Mel-frequency cepstral coefficients) for speech symptom characterization in terms of consistency and reproducibility. The high intra-class correlation coefficient (>0.9) and analysis of variance indicates that CSD features can be used reliably to distinguish between severity levels of speech impairment. Results motivate the use of CSD in monitoring speech symptoms in PD.
Article
Today, digital audio applications are part of our everyday lives. Audio classification can provide powerful tools for content management. If an audio clip can be classified automatically, it can be stored in an organised database, which can improve the management of audio dramatically. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories, a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients and mel-frequency cepstral coefficients, are extracted to characterize the audio content. The autoassociative neural network (AANN) model is used to capture the distribution of the acoustic feature vectors of a class, and the backpropagation learning algorithm is used to adjust the weights of the network to minimize the mean square error for each feature vector. The performance of the AANN is also compared with that of a Gaussian mixture model (GMM), wherein the feature vectors from each class are used to train the GMM for that class. During testing, the likelihood of a test sample belonging to each model is computed and the sample is assigned to the class whose model produces the highest likelihood.
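The decision rule described at the end of the abstract above (score a test sample under each class model, then pick the class with the highest likelihood) can be sketched as follows. This is an illustration under a strong simplifying assumption: each class is modelled by a single diagonal-covariance Gaussian, which stands in for a full Gaussian mixture. The function names, the two-dimensional feature vectors and the variance floor are all hypothetical.

```python
import math

def fit_diag_gaussian(vectors):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # Floor the variance so a constant dimension cannot cause division by zero.
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances

def log_likelihood(x, model):
    """Log-density of x under a diagonal Gaussian (sum over dimensions)."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )

def classify(x, models):
    """Return the label whose model assigns x the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

# Hypothetical 2-D feature vectors for two of the classes named in the abstract.
training = {
    "music": [[0.9, 0.1], [1.1, 0.2], [1.0, 0.0]],
    "news": [[-1.0, 2.0], [-0.8, 2.2], [-1.2, 1.9]],
}
models = {label: fit_diag_gaussian(vecs) for label, vecs in training.items()}

print(classify([1.0, 0.1], models))
print(classify([-1.0, 2.0], models))
```

Working with log-likelihoods rather than raw likelihoods avoids numerical underflow and leaves the arg-max unchanged. A real mixture model would replace `log_likelihood` with the log of a weighted sum over several components.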
Article
There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.
Conference Paper
Linear source-filter models have been widely used by researchers as a front-end for speaker identification systems. They use cepstral features derived from the power spectrum of the speech signal. But it is also well known that a significant part of the acoustic information cannot be modeled by the linear source-filter model, and thus the need for nonlinear features becomes apparent. In this paper, an attempt is made to investigate the use of the phase function of the analytic signal for deriving a representation of the frequencies present in the speech signal. The main objective of the paper is to present a novel parameterization of speech that is based on the nonlinear AM-FM speaker model in the context of closed-set speaker identification. The proposed features measure the amount of amplitude and frequency modulation and attempt to model aspects of the speaker-related information that the commonly used linear source-filter model fails to capture. To evaluate the robustness of the proposed features for speaker identification, a clean speech corpus from the TIMIT database was used, and the speech signals were combined with car noise and babble noise from the NOISEX-92 database. The proposed feature set provides significant improvements in identification accuracy over conventional features such as MFCC under mismatched training and testing environments. The results show that better speaker identification rates are attainable under mismatched conditions, especially at low signal-to-noise ratio (SNR).