Article citation info:
Bourouhou A, Jilbab A, Nacir C, Hammouch A. Classification of cardiovascular diseases using dysphonia measurement in speech.
Diagnostyka. 2021;22(1):31-37. https://doi.org/10.29354/diag/132586
DIAGNOSTYKA, 2021, Vol. 22, No. 1
ISSN 1641-6414
e-ISSN 2449-5220
DOI: 10.29354/diag/132586
CLASSIFICATION OF CARDIOVASCULAR DISEASES USING
DYSPHONIA MEASUREMENT IN SPEECH
Abdelhamid BOUROUHOU, Abdelilah JILBAB, Chafik NACIR, Ahmed HAMMOUCH
University Mohammed V, Ecole Normale Superieure de l'Enseignement Technique, Rabat, Morocco
e-mail: abdelhamid.bourouhou@um5s.net.ma
Abstract
Cardiovascular disease is the leading cause of death worldwide. It is diagnosed with non-invasive
methods, but these are far from being comfortable, rapid, and accessible to everyone.
Speech analysis is an emerging non-invasive diagnostic tool, and many studies have shown that it is
effective in speech recognition and in detecting Parkinson's disease. Can it also be effective for
differentiating between patients with cardiovascular disease and healthy people?
The present work addresses this question with a database collected from 75 people, 35 of whom
suffer from cardiovascular diseases and 40 of whom are healthy. From each person we took three vocal
recordings of sustained vowels (aaaaa…, ooooo…, and iiiii…). By measuring dysphonia in speech, we
extracted 26 features, which we used to train three types of classifiers: the k-nearest-neighbor
classifier, the support vector machine classifier, and the naive Bayes classifier.
The methods were tested for accuracy and stability, and the best result, an accuracy of 81%, was
obtained with the k-nearest-neighbor classifier.
Keywords: Cardiovascular disease, speech analysis, dysphonia measurement, classification methods, PCA features selection
1. INTRODUCTION
A healthy human body depends on good blood
circulation, and the engine behind this
circulation is the heart, a muscle that
pumps blood throughout the body.
Cardiovascular diseases are abnormalities that
impair the normal functioning of the heart; they are
disorders that affect the heart and blood vessels.
Examples include coronary heart disease,
cerebrovascular disease, peripheral arteriopathy,
rheumatic heart disease, and venous thrombosis.
The World Heart Federation has declared that
cardiovascular disease (CVD) is responsible for
17.5 million deaths per year worldwide. This death
rate puts CVD at the top of the world's causes of
death. The World Health Organization (WHO)
predicts that by 2030 the number of deaths will
reach 23.6 million, and that 1 in 10 people aged 30
to 70 will die prematurely from
cardiovascular disease. On the other hand, 80% of
premature deaths could be avoided or delayed [1].
Fig. 1 [2] shows the top 10 causes of
death worldwide for the year 2017, published by
the Institute for Health Metrics and Evaluation
(IHME).
Avoiding these premature deaths requires early
detection of CVD. For this reason, we need a
reliable, precise, fast, and inexpensive tool that
distinguishes between cardiovascular patients
and healthy people.
Many studies have been carried out to address
this task. Some of them
tried to develop an automatic diagnosis based on
medical imaging techniques,
echocardiography, or magnetic resonance imaging
(MRI). Other studies were based on the analysis
of electrocardiogram (ECG) signals [3,4], while
others chose phonocardiogram (PCG) signals
[5,6]. The starting information and the methods
differ, but the goal remains the same: the
development of an automatic diagnostic tool. All
of these methods rely on one or more
pieces of information from the clinical examination
of the individual concerned.
It has been suggested that the characteristics of
the voice signal are associated with a number of
different disease entities, including dyslexia,
attention deficit hyperactivity disorder, Parkinson's
disease, and other neurological disorders [7, 8, 9].
Why should this not be the case with CVD?
We therefore came up with the idea of pursuing the
same goal with a simple information source
that does not require a clinical examination:
human speech. Successful research
into Parkinson's detection and speech recognition
inspired us, hence the present paper.
Clinical practice uses sustained vowels to assess
the quality of the voice: the speaker is asked to
hold a vowel for as long as possible
at a comfortable level [10]. Healthy people can
produce a steady sustained vowel, while this is
not the case for those with vocal impairments [11,
12, 13].
Distinguishing between healthy and sick
people using speech requires finding the differences
in their speech. Therefore, we must study the
speech dysfunctions of the two groups.
Fig. 1. Top 10 causes of death worldwide for
2017 from World Health Organization [2]
2. DATABASE DESCRIPTION
To conduct research on the differentiation
between healthy people and patients with
cardiovascular disease (CVD), we need a database
containing these two categories of people.
The first challenge was collecting
this database. At first, it took a long time
to convince the director of the university hospital
(CHU) and the head of the cardiology department.
Afterwards, we had to wait for cardiovascular
patients to be hospitalized, for the end of their
treatment period, and for the arrival of a new group
of patients.
Our database consists of 75 people, 40
of whom are healthy and 35 of whom suffer from
cardiovascular disease (CVD). We took 3 voice
recordings of each individual pronouncing
sustained vowels (aaaaa… / ooooo… / iiiii…).
Each recording lasts 5 s, and the
recording device is a smartphone.
The following table summarizes the
contents of the database:
Table 1. Database description
Parameter | Value
Total persons | 75
Patients | 35
Gender | Women: 48%; Men: 52%
Average age | 55.5 years (Healthy: 55 years; Patients: 56 years)
Total records | 225
Sampling frequency | 44100 Hz
3. METHODS
To distinguish between people with CVD
and healthy people using speech, we used
dysphonia measures for each of the two categories.
Before measuring dysphonia, we segmented and
filtered the recordings in order to keep only the
useful part.
Then we carried out the dysphonia
measurements, which yield 26 features
extracted from the speech of each
recording; these form the training base for our
classifiers (k-NN, Naive Bayes, SVM). Validation
was performed with the k-fold cross-validation
technique. The diagram in Fig. 2 presents all the
steps of our approach.
3.1. Signal acquisition
For this part, we used a smartphone equipped with
a MEMS microphone, model CMM-3729AB-38308-TR,
to obtain the recordings. This
microphone operates over a bandwidth of 100 Hz to
10,000 Hz, has a signal-to-noise ratio of 65 dBA
measured at 1 kHz with a 94 dBA signal, and has a
total harmonic distortion of 0.2% measured at 1 kHz
with a 94 dB signal [14].
3.2. Segmentation & filtering
After signal acquisition, we proceed in two
essential steps. The first is signal
segmentation: each of our recordings lasts between
6 and 10 seconds and contains
some additional, useless sounds. We therefore
segment each recording to keep only the 5
seconds that represent the sustained vowel.
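The segmentation step can be sketched as follows. The paper does not state how the 5-second portion was located, so the criterion used here (keep the window with the highest energy) is only an assumption, and the function name `most_energetic_window` and the toy 10 Hz sampling rate are hypothetical.

```python
def most_energetic_window(samples, fs, duration_s=5.0):
    """Return (start, end) sample indices of the highest-energy window."""
    width = int(round(duration_s * fs))
    if width <= 0 or width > len(samples):
        raise ValueError("window must fit inside the recording")
    # Energy of the first window, then update it incrementally while sliding.
    energy = sum(s * s for s in samples[:width])
    best_energy, best_start = energy, 0
    for start in range(1, len(samples) - width + 1):
        energy += samples[start + width - 1] ** 2 - samples[start - 1] ** 2
        if energy > best_energy:
            best_energy, best_start = energy, start
    return best_start, best_start + width


# Toy recording at a hypothetical 10 Hz rate:
# 2 s of silence, 5 s of "vowel", 1 s of silence.
fs = 10
recording = [0.0] * 20 + [1.0, -1.0] * 25 + [0.0] * 10
start, end = most_energetic_window(recording, fs)
segment = recording[start:end]
```

With real recordings, `fs` would be 44100 Hz and a vectorized implementation would be preferable for speed.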
The second step is filtering. Our recordings
are speech (vocal production), which
is made up of sounds with very specific
frequency components, usually described by the
fundamental frequency (F0, corresponding to the
carrier signal) and the first formants (Fi, peaks in
the spectral amplitude due to the resonances of the
vocal tract). Vowels, as in our case, are very
easily characterized by their formants. The
frequencies used by human speech lie
between 110 Hz and 7 kHz (speech communications).
We nevertheless filter the signal to keep only the
band [300-3400 Hz] because our speech
recordings contain sustained vowels, and the chosen
band includes the fundamental frequency and
the first 3 formants, which is sufficient for us.
We therefore applied a seventh-order Butterworth
high-pass filter with a cutoff frequency of 300 Hz,
followed by a seventh-order Butterworth low-pass
filter with a cutoff frequency of 3400 Hz.
We chose the Butterworth filter for its
characteristics: the cutoff frequency is the same
regardless of the filter order, the response in the
passband is very flat, and the attenuation slope can
be increased by increasing the filter order.
Fig. 2. Diagram of the used method
3.3. Dysphonia measurement
In this step, we extract 26 features from
each recording. These features form the training
dataset and are the keys to classifying entities as
healthy or as suffering from CVD.
The 26 features are linear and non-linear
parameters: frequency parameters,
amplitude parameters, harmonic parameters, and
some specific parameters dedicated to vocal
disorder analysis.
We used the “Praat” software to extract all
parameters, and the following table lists the
extracted features:
Table 2. Features extracted from the voice signal
Group | Features | Description
Pitch | 1. Median (Hz); 2. Mean (Hz); 3. Standard deviation (Hz); 4. Maximum (Hz); 5. Minimum (Hz) | Depends on the number of vibrations per second produced by the vocal cords. The main acoustic correlate of tone and intonation
Pulses | 6. Number of pulses; 7. Number of periods; 8. Mean of periods (s: seconds); 9. Standard deviation of periods (s) |
Jitter | 10. Absolute (s); 11. Local (%); 12. RAP (%); 13. PPQ5 (%); 14. DDP (%) | Fundamental frequency variation measurements
Shimmer | 15. Local (%); 16. Local (dB); 17. APQ3 (%); 18. APQ5 (%); 19. APQ11 (%); 20. DDA (%) | Amplitude variation measurements
Harmonicity | 21. Mean autocorrelation; 22. Mean noise to harmonics ratio; 23. Mean harmonics to noise ratio (dB) |
Intensity | 24. Mean (dB); 25. Maximum (dB); 26. Minimum (dB) |
The table below presents the mathematical
expressions used to extract the features:
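Table 3. Mathematical formulas of the extracted features
Jitter
Absolute (s): $\mathrm{Jitter}_{abs} = \frac{1}{N-1} \sum_{i=1}^{N-1} \left| T_i - T_{i+1} \right|$
Local (%): $\mathrm{Jitter}_{loc} = \frac{\frac{1}{N-1} \sum_{i=1}^{N-1} \left| T_i - T_{i+1} \right|}{\frac{1}{N} \sum_{i=1}^{N} T_i} \times 100$
RAP (%): $\mathrm{RAP} = \frac{\frac{1}{N-2} \sum_{i=2}^{N-1} \left| T_i - \left( \frac{T_{i-1} + T_i + T_{i+1}}{3} \right) \right|}{\frac{1}{N} \sum_{i=1}^{N} T_i} \times 100$
PPQ5 (%): $\mathrm{PPQ5} = \frac{\frac{1}{N-4} \sum_{i=3}^{N-2} \left| T_i - \left( \frac{1}{5} \sum_{k=i-2}^{i+2} T_k \right) \right|}{\frac{1}{N} \sum_{i=1}^{N} T_i} \times 100$
DDP (%): $\mathrm{DDP} = 3 \times \mathrm{RAP}$
Shimmer
Local (%): $\mathrm{Shimmer}_{loc} = \frac{\frac{1}{N-1} \sum_{i=1}^{N-1} \left| A_i - A_{i+1} \right|}{\frac{1}{N} \sum_{i=1}^{N} A_i} \times 100$
Local (dB): $\mathrm{Shimmer}_{dB} = \frac{1}{N-1} \sum_{i=1}^{N-1} \left| 20 \log_{10} \left( \frac{A_{i+1}}{A_i} \right) \right|$
APQ3 (%): $\mathrm{APQ3} = \frac{\frac{1}{N-2} \sum_{i=2}^{N-1} \left| A_i - \left( \frac{A_{i-1} + A_i + A_{i+1}}{3} \right) \right|}{\frac{1}{N} \sum_{i=1}^{N} A_i} \times 100$
APQ5 (%): $\mathrm{APQ5} = \frac{\frac{1}{N-4} \sum_{i=3}^{N-2} \left| A_i - \left( \frac{1}{5} \sum_{k=i-2}^{i+2} A_k \right) \right|}{\frac{1}{N} \sum_{i=1}^{N} A_i} \times 100$
APQ11 (%): $\mathrm{APQ11} = \frac{\frac{1}{N-10} \sum_{i=6}^{N-5} \left| A_i - \left( \frac{1}{11} \sum_{k=i-5}^{i+5} A_k \right) \right|}{\frac{1}{N} \sum_{i=1}^{N} A_i} \times 100$
DDA (%): $\mathrm{DDA} = 3 \times \mathrm{APQ3}$
Harmonicity
[expressions for the mean autocorrelation, the noise to harmonics ratio, and the harmonics to noise ratio are not recoverable from the extracted text]
where $T_i$ represents the length of the $i$-th period, $N$
represents the number of periods, and $A_i$ represents
the peak-to-peak amplitude of the $i$-th period.
[Fig. 2 diagram blocks: signal acquisition; segmentation & filtering; dysphonia measurement; features selection by PCA; training of the SVM, k-NN, and Naive Bayes classifiers; validation & decision]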
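The jitter and shimmer measures listed in Table 2 can be sketched in plain Python as below. This is only an illustration based on the usual Praat-style definitions, not the authors' extraction code (they used Praat itself); the function names and the toy period and amplitude values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _perturbation_quotient(x, width):
    """Mean |x_i - average of `width` neighbours|, relative to mean(x), in %."""
    half = width // 2
    deviations = [
        abs(x[i] - _mean(x[i - half:i + half + 1]))
        for i in range(half, len(x) - half)
    ]
    if not deviations:          # sequence shorter than the smoothing window
        return float("nan")
    return 100.0 * _mean(deviations) / _mean(x)


def jitter_features(periods):
    """Jitter measures from consecutive period lengths T_i (seconds)."""
    absolute = _mean([abs(a - b) for a, b in zip(periods, periods[1:])])
    rap = _perturbation_quotient(periods, 3)
    return {
        "absolute_s": absolute,
        "local_pct": 100.0 * absolute / _mean(periods),
        "rap_pct": rap,
        "ppq5_pct": _perturbation_quotient(periods, 5),
        "ddp_pct": 3.0 * rap,   # DDP equals three times RAP
    }


def shimmer_features(amplitudes):
    """Shimmer measures from peak-to-peak amplitudes A_i."""
    pairs = list(zip(amplitudes, amplitudes[1:]))
    apq3 = _perturbation_quotient(amplitudes, 3)
    return {
        "local_pct": 100.0 * _mean([abs(a - b) for a, b in pairs]) / _mean(amplitudes),
        "local_db": _mean([abs(20.0 * math.log10(b / a)) for a, b in pairs]),
        "apq3_pct": apq3,
        "apq5_pct": _perturbation_quotient(amplitudes, 5),
        "apq11_pct": _perturbation_quotient(amplitudes, 11),
        "dda_pct": 3.0 * apq3,  # DDA equals three times APQ3
    }


# Toy inputs (hypothetical values, not measurements from the study).
jitter = jitter_features([0.010, 0.011, 0.010, 0.011, 0.010])
steady = jitter_features([0.010] * 8)
shimmer = shimmer_features([1.0, 10.0] * 6)
```

A perfectly periodic voice gives zero jitter, which is why these measures grow with vocal instability.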
3.4. Features selection by PCA
We now have a training matrix in which the columns
represent the 26 extracted features and each row is
an observation (a recording). To get reliable
classification results, we applied a feature
selection technique, first to reduce the size of the
matrix and second to increase the accuracy of
discrimination.
The technique used is Principal Component
Analysis (PCA), one of the most widely used
statistical techniques in data analysis and data
compression. The main idea is to project the data
from the original space of D variables onto a
subspace described by d uncorrelated variables; this
subspace is spanned by the principal components and
preserves most of the information contained in the
original space.
In our case, we used a PCA selection algorithm
to reduce the number of features from 26 to 23.
The choice of 23 is not arbitrary: we ran the PCA
algorithm many times, changing the number of
principal components kept each time, and
compared the classification results of all the runs.
Note that PCA feature selection reduces the number
of features by automatically computing n new
features (n can be any number from 1 to 26) from the
26 original features. In our case, the best
projection was obtained with 23 new features
computed from the original 26 features, because it
provides the best classification result.
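The projection described above can be sketched with NumPy as follows. This is a generic PCA via singular value decomposition, not the authors' algorithm; the standardization step, the function name `pca_project`, and the random placeholder matrix standing in for the real 225 x 26 feature matrix are all assumptions.

```python
import numpy as np


def pca_project(features, n_components):
    """Project the rows of `features` onto the first principal axes."""
    # Assumption: features are standardised first, because the 26 measures
    # use very different units (Hz, s, %, dB).
    std = features.std(axis=0)
    std[std == 0] = 1.0
    z = (features - features.mean(axis=0)) / std
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return z @ vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
placeholder = rng.normal(size=(225, 26))   # stand-in for the real features
scores, explained = pca_project(placeholder, 23)
```

Each of the 23 columns of `scores` is a linear combination of all 26 original features, and the columns are mutually uncorrelated.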
3.5. Training & Validation of Classifiers
Finally, we had to train and validate our
classifiers. We used 3 types of classifiers:
the k-nearest-neighbor, the support vector machine,
and the Naive Bayes. All classifiers were subjected
to the same cross-validation technique.
3.5.1. Training phase
We have 75 people, from each of whom we took 3
recordings, so we collected 225 recordings. We then
extracted 26 features from each recording, which
were reduced to 23 by the PCA feature selection
technique. We thus built a training dataset (225
rows = 225 observations; 23 columns = 23 features)
for our classifiers.
3.5.2. The k-NN classifier
It is one of the most widely used classifiers in
machine learning. Its main parameter is k, the
number of neighbors taken into
consideration in the classification: a new entity is
assigned to the class of the majority of its k
nearest neighbors.
Which neighbors are nearest to the entity to
classify depends on the distance used, which can be
Euclidean, cosine, Minkowski, or another type of
distance. The number of neighbors k and the type of
distance must therefore be chosen carefully, because
they influence the classification accuracy.
We applied an optimization algorithm to
select the optimum parameters for training our
k-NN classifier. The chosen parameters were the
cosine distance and k = 5 neighbors.
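A minimal pure-Python sketch of a k-NN vote with the cosine distance and k = 5, the settings reported above. The function names and the two-dimensional toy data are hypothetical and only illustrate the mechanism; the study itself worked on 23 PCA components.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training rows closest to `query`."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: cosine_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D data (hypothetical values, not taken from the study).
train_x = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.1],
           [0.1, 1.0], [0.2, 1.0], [0.1, 0.9]]
train_y = ["healthy", "healthy", "healthy", "CVD", "CVD", "CVD"]
prediction = knn_predict(train_x, train_y, [0.8, 0.15], k=5)
```

Because the cosine distance depends only on the direction of the feature vector, rescaling a whole vector does not change its neighbors.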
3.5.3. The SVM classifier
The SVM is a technique intended for discrimination
and regression problems. This type of
classifier is generally used for two-class
discrimination problems but can be extended to
multiclass problems.
The concept consists in building a decision
boundary, called the separating hyperplane, in the
feature space,
maximizing the margin between samples from two
different classes. The training samples closest to
this hyperplane are the support vectors, which
determine the class of a new entity.
To compute the support vectors, the SVM
algorithm can use different kernels, each with
specific properties. Examples are the Gaussian
kernel (also known as the RBF kernel)
and the linear kernel. The choice of the kernel is
crucial for prediction accuracy, so we ran an
optimization algorithm to find the most appropriate
SVM settings, and the result was the
Gaussian kernel.
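For reference, the decision rule of a two-class SVM with a Gaussian kernel can be written as below. The notation is the standard textbook one and is not taken from the paper: $x_i$ are the support vectors, $y_i \in \{-1, +1\}$ their class labels, $\alpha_i$ and $b$ the coefficients learned during training, and $\gamma$ the kernel width parameter.

```latex
f(x) = \operatorname{sign}\left( \sum_{i \in SV} \alpha_i \, y_i \, K(x_i, x) + b \right),
\qquad
K(x, x') = \exp\left( -\gamma \, \lVert x - x' \rVert^2 \right)
```

Only the support vectors appear in the sum, which is why they alone determine the class assigned to a new entity.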
3.5.4. The Naïve Bayes classifier
The Naïve Bayes classifier is a probabilistic
classifier based on Bayes' theorem; it is
simple and belongs to the linear classifier family.
The prediction accuracy of this classifier
depends on the choice of kernel: Gaussian,
triangular, box, or Epanechnikov.
These kernels differ in the
formulas used in the algorithm.
The best result with the Naïve Bayes classifier was
obtained with the triangular kernel.
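One common way a kernel enters a Naive Bayes classifier is through a kernel density estimate of each feature within each class. The sketch below illustrates that idea with the triangular kernel; it is an assumption about the mechanism rather than the authors' implementation, and the `bandwidth` parameter, the function names, and the toy data are hypothetical.

```python
import math


def triangular_kernel(u):
    """Triangular kernel: 1 - |u| on [-1, 1], zero elsewhere."""
    return max(0.0, 1.0 - abs(u))


def kernel_density(x, samples, bandwidth):
    """Kernel density estimate of one feature at point x."""
    total = sum(triangular_kernel((x - s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)


def naive_bayes_predict(train_x, train_y, query, bandwidth=1.0):
    """Pick the class maximising log prior + sum of per-feature log densities."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        score = math.log(len(rows) / len(train_x))
        for j, value in enumerate(query):
            density = kernel_density(value, [r[j] for r in rows], bandwidth)
            # Small floor avoids log(0) when no training sample is within reach.
            score += math.log(max(density, 1e-12))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy 2-D data (hypothetical values, not taken from the study).
nb_x = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
        [2.0, 2.0], [2.2, 1.9], [1.9, 2.1]]
nb_y = ["healthy"] * 3 + ["CVD"] * 3
nb_prediction = naive_bayes_predict(nb_x, nb_y, [0.1, 0.1])
```

The "naive" part is the sum over features: each feature contributes its own density independently of the others.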
3.5.5. Validation phase
To validate our classifiers, we used k-fold
cross-validation. This type of cross-validation
randomly divides the database into
“k” subsets, keeps one of the “k” subsets for
validation, and trains the classifier on the
remaining “k-1” subsets.
We repeated the operation “k” times, so that each
subset was used exactly once as the validation set,
and we calculated the performance score each time.
The mean of the “k” average squared errors is finally
calculated to estimate the prediction error. In our
case, k = 5.
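The fold construction described above can be sketched in pure Python as follows. The function name `k_fold_indices` and the fixed random seed are assumptions added so that the example is reproducible; 225 is the number of recordings in the database.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random subdivision
    folds = [indices[i::k] for i in range(k)]   # k nearly equal subsets
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(225, k=5))
```

With 225 recordings and k = 5, each round trains on 180 recordings and validates on the remaining 45.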
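After building the cross-validation model, the
assessment consists of calculating three essential
parameters: the accuracy, the sensitivity, and the
specificity. Their formulas are given below:
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$
$\mathrm{Specificity} = \frac{TN}{TN + FP}$
where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives.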
4. RESULTS & DISCUSSION
In this section, we present the findings.
Fig. 3 presents two extracted features, plotted
together for people with CVD and for healthy people,
to show the difference between our two
classes.
The first part of the figure plots the
standard deviation of the pitch for the recordings
of people with CVD and of healthy people.
The second part of the figure plots the
standard deviation of the period for the two classes.
We can see that the area occupied by the
people with CVD is larger than, and differs
significantly from, that of the healthy people. We
therefore conclude that the distinction can be made
using speech. Table 4 gives the results of a first
classification attempt, in which all 26 features
extracted from the recordings were used to train the
classifiers.
Table 4. Classifier results without application of the features selection method
Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%)
K-Nearest Neighbor | 78.46 | 80.07 | 76.85
Support Vector Machine | 70.54 | 69.08 | 72.00
Naïve Bayes | 63.88 | 65.45 | 62.31
Fig. 3. Comparison between healthy people
& people with CVD
In this first classification attempt, using all
the extracted features gives
a training matrix of 26 features by 225
observations. We submitted this
matrix to the different classification algorithms.
The best result was reached by the k-NN classifier,
which means that it is the classifier best suited to
our database.
Fig. 4 shows the confusion matrix of the
k-nearest-neighbor classifier; it indicates
that our k-NN classifier detects true
positives at a rate of 80.07% and true negatives
at a rate of only 76.85%. These results support the
supposed relation between speech and CVD,
and they also show that the classification
accuracy has to be improved.
Fig. 4. Confusion matrix for the k-NN
Classifier before features selection method
To increase the classification accuracy, we
applied the PCA feature selection technique.
Table 5 gives the results of our second
classification attempt; this time we trained our
classifiers on the matrix generated by the PCA
algorithm, which contains 23 principal components
instead of our 26 extracted features.
Table 5. Classifier results after application of the features selection method “PCA”
Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%)
K-Nearest Neighbor | 81.51 | 82.46 | 80.56
Support Vector Machine | 75.28 | 78.50 | 72.06
Naïve Bayes | 70.45 | 70.00 | 70.09
The feature selection technique improved the
performance of all our
classifiers, and the k-NN classifier remains the
best. Fig. 5 shows the confusion matrix of the
k-nearest-neighbor classifier after the PCA feature
selection technique.
We can also plot the Receiver Operating
Characteristic (ROC) of our three classifiers to
compare the results achieved. Fig. 6 shows that
comparison.
As shown in the figure, the area under the ROC
curve of the k-NN classifier is larger than those of
the SVM and Naïve Bayes classifiers; we therefore
conclude that the k-NN classifier is better in this
case.
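The area under a ROC curve can be computed directly from classifier scores, as in the sketch below. It uses the rank interpretation of the area (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one); the function name `roc_auc` and the toy scores are hypothetical and are not the values behind Fig. 6.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve; labels are 1 (positive) or 0 (negative)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count for half
    return wins / (len(positives) * len(negatives))


# Toy scores (hypothetical, not results from the study).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
```

An area of 1.0 means perfect separation of the two classes, and 0.5 corresponds to random guessing.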
Fig. 5 Confusion matrix for the k-NN
Classifier after using "PCA" algorithm
Fig. 6. ROC Curve for all classifiers result
after PCA method
5. CONCLUSION
CVD is still the number one cause of death
worldwide, but 80% of premature deaths can be
avoided if people with CVD are detected earlier. To
ensure detection, assessment and
diagnosis must be precise, fast, and inexpensive.
To achieve this, we used speech as a diagnostic
tool to distinguish between people with
CVD and healthy ones. We collected a
database that contains multiple recordings of
different people pronouncing the sustained vowels
/a/, /o/, and /i/. Then we extracted 26 voice
features from each recording.
To improve the assessment of CVD, we
reduced the training matrix, which contains 26
features and 225 observations, to a matrix of
23 features and 225 observations using the PCA
feature selection technique.
We used 3 classifiers: the k-nearest-neighbor,
the support vector machine, and the
Naive Bayes. With k = 5 neighbors and the
cosine distance, the k-NN classifier was the most
successful classifier. The best classification
accuracy was 81.51%.
REFERENCES
1. https://www.euro.who.int/en/health-topics/noncommunicable-diseases/cardiovascular-diseases/data-and-statistics
2. https://ourworldindata.org/what-does-the-world-die-from
3. Rawther NN, Cheriyan J. Detection and classification of cardiac arrhythmias based on ECG and PCG using temporal and wavelet features. IJARCCE. 2015;4(4).
4. Bouguila Z, Moukadem A, Dieterlen A, Ahmed Benyahia A, Hajjam A, Talha S, Andres E. Autonomous cardiac diagnostic based on synchronized ECG and PCG signal. In: 7th International Joint Conference on Biomedical Engineering Systems and Technologies, ESEO, Angers. 2014.
5. Ghassemian H, Kenari AR. Early detection of pediatric heart disease by automated spectral analysis of phonocardiogram in children. J. Inf. Syst. Telecommun. 2015;3(2):66-75.
6. Nabih-Ali M, El-Dahshan E-SA, Yahia AS. Heart diseases diagnosis using intelligent algorithm based on PCG signal analysis. Circuits Syst. 2017;8(7):184-190.
7. Levanon Y, Lossos-Shifrin L, inventors; Google, assignee. Method and system for diagnosing pathological phenomenon using a voice signal. 2008. US patent 7,398,213 B1.
8. Bonneh YS, Levanon Y, Dean-Pardo O, Lossos L, Adini Y. Abnormal speech spectrum and increased pitch variability in young autistic children. Front Hum Neurosci. 2011;4:237.
9. Uma Rani K, Holi MS. Automatic detection of neurological disordered voices using mel cepstral coefficients and neural networks. In: 2013 IEEE Point-of-Care Healthcare Technologies (PHT). Bangalore, India; 2013:76-79.
10. Titze I. Phonation into a straw as a voice building exercise. Journal of Singing. 2000;57:27-28.
11. Cnockaert L, Schoentgen J, Auzou P, Ozsancak C, Defebvre L, Grenez F. Low-frequency vocal modulations in vowels produced by Parkinsonian subjects. Speech Communication. 2008;50(4):288-300. https://doi.org/10.1016/j.specom.2007.10.003
12. Little MA, McSharry PE, Hunter EJ, Spielman J, Ramig LO. Suitability of dysphonia measurements for telemonitoring of Parkinson's Disease. IEEE Transactions on Biomedical Engineering. 2009;56(4):1015-1022. https://doi.org/10.1109/TBME.2008.2005954
13. Tsanas A, Little MA, McSharry PE, Spielman J, Ramig LO. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease. IEEE Trans Biomed Eng. 2012;59(5):1264-1271. https://doi.org/10.1109/TBME.2012.2183367
14. https://www.cuidevices.com/product/resource/cmm-3729ab-38308-tr.pdf
15. Bourouhou A, Jilbab A, Nacir C, Hammouch A. Detection and localization algorithm of the S1 and S2 heart sounds. 2017 International Conference on Electrical and Information Technologies (ICEIT), Rabat. 2017:1-4. https://doi.org/10.1109/EITech.2017.8255217
16. Bourouhou A, Jilbab A, Nacir C, Hammouch A. Comparison of classification methods to detect the Parkinson disease. 2016 International Conference on Electrical and Information Technologies (ICEIT), Tangiers. 2016:421-424. https://doi.org/10.1109/EITech.2016.7519634
17. Bourouhou A, Jilbab A, Nacir C, Hammouch A. Heart Sounds classification for a medical diagnostic assistance. International Journal of Online and Biomedical Engineering (iJOE). 2019;15(11):88-103.
Received 2020-09-27
Accepted 2021-01-19
Available online 2021-01-25
Abdelhamid BOUROUHOU
was born in Rabat, Morocco, on
December 26th, 1989. He received
the Master's degree in Electrical
Engineering from ENSET,
Mohammed V University, Rabat,
Morocco, in 2014.
He is a research student in
Sciences and Technologies of
the Engineer at ENSIAS,
Research Laboratory in Electrical Engineering (LRGE),
Research Team in Computer and Telecommunication
(ERIT) at ENSET, Mohammed V University, Rabat,
Morocco. His interests are in sound classification for
medical diagnostic assistance.
Abdelilah JILBAB is a Professor at
ENSET Rabat, Morocco. He
graduated in electronic and
industrial computer aggregation in
1995. Since 2003, he has been a member
of the laboratory LRIT (unit
associated with the CNRST, FSR,
Mohammed V University, Rabat,
Morocco). He received his PhD in
Computer and Telecommunication from Mohammed V-Agdal
University, Rabat, Morocco, in 2009. His domains
of interest include signal processing and embedded
systems.
Chafik NACIR is a Teacher-Researcher
in Mathematics, former Head of the
Department of Mathematics and Computer
Science, and former member of the
Scientific Commission of ENSET, Rabat,
Morocco.
Ahmed HAMMOUCH received
the master's degree and the PhD in
Automatic, Electrical, Electronic
from the Haute Alsace University
of Mulhouse (France) in 1993
and the PhD in Signal and Image
Processing from the Mohammed V
University of Rabat in 2004.
From 1993 to 2013 he was a
professor at the Mohammed V
University in Morocco. Since 2009 he has managed the
Research Laboratory in Electronic Engineering. He is the
author of several papers in international journals and
conferences. His domains of interest include multimedia
data processing and telecommunications. He is with the
National Center for Scientific and Technical Research in
Rabat.
... In the aim to make CVDs detection and diagnosis more accurate, fast and accessible for everyone, several researches had launched to develop an automatic diagnosis by using different signal processing methods, and different source of information such as electrocardiogram "ECG" [3,4], echography and phonocardiogram "PCG" [5,6]. [2] In our previous work [7], we tried to differentiate people with CVDs and healthy people by their voices using a dysphonia measurement. After pre-processing phase which consist on segmentation and filtering, we have extracted 26 sound features from all records and we construct models by training each chosen classifier. ...
... The used database in this work was collected and used previously in BOUROUHOU et al. [7], it's about 35 CVDs patients (17 women and 18 men), and 40 healthy people (19 women and 21 men). The ranged age of patients is between 30 and 81 (average 56, standard deviation 10.79), and the age healthy people ranges between 40 and 75 (average 55, standard deviation 6.64). ...
Article
Full-text available
Heart diseases cause many deaths around the world every year, and his death rate makes him the leader of the killer diseases. But early diagnosis can be helpful to decrease those several deaths and save lives. To ensure good diagnose, people must pass a series of clinical examinations and analyzes, which make the diagnostic operation expensive and not accessible for everyone. Speech analysis comes as a strong tool that can resolve the task and give back a new way to discriminate between healthy people and cardiovascular disease patients. Our latest paper treated this task but using a dysphonia measurement to differentiate between people with cardiovascular disease and the healthy one, and we were able to reach 81.5% in prediction accuracy. This time we choose to change the method to increase the accuracy by extracting the voiceprint using 13 Mel-Frequency Cepstral Coefficients and the pitch, extracted from the people's voices provided from 75 subjects (35 has cardiovascular diseases, 40 healthy), three records of sustained vowels (aaaaa…,ooooo…and iiiiiiii….) has been collected from each one. We used the k-near-neighbor classifier to train a model and to classify the test entities. We were able to outperform the previous results, reaching 95.55% of prediction accuracy.
Article
Full-text available
Background: Normal voice production depends on the synchronized cooperation of multiple physiological systems, which makes the voice sensitive to changes. Any systematic, neurological, and aerodigestive distortion is prone to affect voice production through reduced cognitive, pulmonary, and muscular functionality. This sensitivity inspired using voice as a biomarker to examine disorders that affect the voice. Technological improvements and emerging machine learning (ML) technologies have enabled possibilities of extracting digital vocal features from the voice for automated diagnosis and monitoring systems. Objective: This study aims to summarize a comprehensive view of research on voice-affecting disorders that uses ML techniques for diagnosis and monitoring through voice samples where systematic conditions, nonlaryngeal aerodigestive disorders, and neurological disorders are specifically of interest. Methods: This systematic literature review (SLR) investigated the state of the art of voice-based diagnostic and monitoring systems with ML technologies, targeting voice-affecting disorders without direct relation to the voice box from the point of view of applied health technology. Through a comprehensive search string, studies published from 2012 to 2022 from the databases Scopus, PubMed, and Web of Science were scanned and collected for assessment. To minimize bias, retrieval of the relevant references in other studies in the field was ensured, and 2 authors assessed the collected studies. Low-quality studies were removed through a quality assessment and relevant data were extracted through summary tables for analysis. The articles were checked for similarities between author groups to prevent cumulative redundancy bias during the screening process, where only 1 article was included from the same author group. 
Results: In the analysis of the 145 included studies, support vector machines were the most utilized ML technique (51/145, 35.2%), with the most studied disease being Parkinson disease (PD; reported in 87/145, 60%, studies). After 2017, 16 additional voice-affecting disorders were examined, in contrast to the 3 investigated previously. Furthermore, an upsurge in the use of artificial neural network-based architectures was observed after 2017. Almost half of the included studies were published in last 2 years (2021 and 2022). A broad interest from many countries was observed. Notably, nearly one-half (n=75) of the studies relied on 10 distinct data sets, and 11/145 (7.6%) used demographic data as an input for ML models. Conclusions: This SLR revealed considerable interest across multiple countries in using ML techniques for diagnosing and monitoring voice-affecting disorders, with PD being the most studied disorder. However, the review identified several gaps, including limited and unbalanced data set usage in studies, and a focus on diagnostic test rather than disorder-specific monitoring. Despite the limitations of being constrained by only peer-reviewed publications written in English, the SLR provides valuable insights into the current state of research on ML-based voice-affecting disorder diagnosis and monitoring and highlighting areas to address in future research.
Thesis
Voice is one of the most promising tools in digital medicine. Combined with medical virtual companions, estimating symptoms from vocal markers will enable both home monitoring of patients with chronic neuropsychiatric diseases and access to personalized lifestyle advice for the general population. Sleepiness, which is present in many pathologies and is highly prevalent both among patients with chronic diseases and in the general population, is a prime symptom for this approach. The objective of the work presented in this manuscript is therefore to complement the information collected by virtual assistants during subjects' interactions with them, using vocal markers validated as reliable markers of sleepiness. The approach is as follows. First, we introduce the mechanisms of voice production and the range of pathologies that can interfere with the muscular and neuromuscular functions involved, with particular attention to the methodologies used to record and annotate the corpora. Next, we attempt to establish a consensus definition of sleepiness using three reference dictionaries of the French language, two text-mining approaches, and a general review of the tools designed to measure it. We then present our own corpus of patients with hypersomnia, recorded at the university sleep medicine department of the CHU de Bordeaux during a reading-aloud task and annotated with sleepiness measures that are both subjective (questionnaires) and objective (sleep onset latency on the Test Itératif de Latence d'Endormissement (TILE), the Multiple Sleep Latency Test), validated by the hospital's physicians. 
This corpus is then compared with the other state-of-the-art corpora for detecting sleepiness in voice, from which we derive recommendations for building such corpora. Then, through a perceptual study, we validate the use of the TILE corpus for detecting sleepiness in voice. On the basis of this corpus, we develop four categories of vocal descriptors that measure two dimensions of the impact of sleepiness on voice and speech production. On the one hand, we study markers of the acoustic quality of the voice; on the other hand, we design markers of reading quality, divided into three subcategories: reading errors made by patients, their automation through errors made by automatic speech recognition systems, and the durations and locations of reading pauses. These markers are validated on different forms of sleepiness (objective and subjective). Finally, we propose a methodology for training a classifier with a view to the clinical use of these vocal descriptors for detecting three symptoms related to excessive sleepiness. We provide a detailed analysis of the results obtained and of the descriptors used by the classifier. To go further, we then propose bringing the classification problem closer to the reality of clinical reasoning by classifying two syndromes derived from the preceding symptoms. Lastly, in the same direction, we propose research perspectives on symptom networks, in the context of digital medicine research on sleepiness and of digital psychiatry more generally.
Chapter
Cardiovascular disease is the leading cause of death worldwide, and predicting it from clinical data is a major goal of medical data analysis. The heart disease prediction model is examined with various combinations of features and several classification techniques. In addition to grid search cross-validation (grid search CV), a majority-voting ensemble classifier is considered to improve the accuracy of heart disease diagnosis. Data pre-processing matters for fast and accurate cardiovascular disease prediction; in this paper, the author rescales the features with Z-score normalization. Four machine learning techniques, namely K-nearest neighbor (KNN), support vector machine (SVM), stochastic gradient descent (SGD), and random forest (RF), are used in the first stage of implementation. Each of the four classification models is then compared with its grid search CV counterpart, and finally an ensemble classifier is incorporated for result analysis. The results show that the ensemble model and the random forest with grid search CV reach a high accuracy of 87% on the given dataset. The heart disease dataset used to evaluate each technique was obtained from the Department of Cardiology, Excelcare Hospitals, Guwahati, Assam.
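The Z-score normalization mentioned in this abstract can be illustrated with a minimal sketch. The function name `zscore_columns`, the toy data, and the choice of the population standard deviation are assumptions made for illustration; the abstract does not specify the implementation.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Rescale each feature column to zero mean and unit standard deviation.

    Assumption: the population standard deviation is used; the abstract
    does not say which variant the author applied.
    """
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)
        if sigma == 0:
            # A constant feature carries no information; map it to zeros.
            scaled_columns.append([0.0 for _ in col])
        else:
            scaled_columns.append([(v - mu) / sigma for v in col])
    # Transpose back so that each row is again one sample.
    return [list(r) for r in zip(*scaled_columns)]


# Hypothetical toy data: rows are patients, columns are two made-up features.
toy_rows = [[40, 60], [50, 70], [60, 80]]
scaled = zscore_columns(toy_rows)
```

After rescaling, every feature has the same scale, which matters for distance-based models such as KNN and for margin-based models such as SVM.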
Article
To assess the phonocardiogram (PCG) signal for discriminating between two classes of people, individuals with heart disease and healthy individuals, we adopted the database provided by "The PhysioNet/Computing in Cardiology Challenge 2016", which contains heart sound (PCG) recordings. This database was chosen so that our results can be compared with and validated against those already published. We then extracted 20 features from each record. For classification, we used the Generalized Linear Model (GLM) and Support Vector Machines (SVMs) with different kernel types (linear, polynomial, and MLP). The best classification accuracy obtained was 88.25%, using the SVM classifier with an MLP kernel.
Article
This paper presents an intelligent algorithm for heart disease diagnosis using the phonocardiogram (PCG). The proposed technique consists of four stages: data acquisition, pre-processing, feature extraction, and classification. The PASCAL heart sound database is used in this research. The second stage removes noise and artifacts from the PCG signals. Feature extraction is carried out using the discrete wavelet transform (DWT). Finally, an artificial neural network (ANN) is used for the classification stage, with an overall accuracy of 97%.
Article
Suggests a practice exercise using a straw. Notes that the basic intent is to alter the acoustic load, which is normally very low in comparison to the glottal impedance in a vowel. Contends that phonating into a small-diameter straw establishes an overall large positive pressure throughout the vocal tract; with the semi-occlusion at the lips, the vocal folds are kept apart, vibrating only with a small amplitude in a horizontal plane, which is healthy for the tissues while the abdominal muscles get a good workout. Includes references.
Article
We present an assessment of the practical value of existing traditional and non-standard measures for discriminating healthy people from people with Parkinson's disease (PD) by detecting dysphonia. We introduce a new measure of dysphonia, Pitch Period Entropy (PPE), which is robust to many uncontrollable confounding effects including noisy acoustic environments and normal, healthy variations in voice frequency. We collected sustained phonations from 31 people, 23 with PD. We then selected 10 highly uncorrelated measures, and an exhaustive search of all possible combinations of these measures finds four that in combination lead to overall correct classification performance of 91.4%, using a kernel support vector machine. In conclusion, we find that non-standard methods in combination with traditional harmonics-to-noise ratios are best able to separate healthy from PD subjects. The selected non-standard methods are robust to many uncontrollable variations in acoustic environment and individual subjects, and are thus well-suited to telemonitoring applications.
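The exhaustive search over combinations of 10 measures described in this abstract can be sketched as follows. The feature names and the scoring function `toy_score` are hypothetical stand-ins; in the study the score would presumably be a classification performance estimate, which is not reproduced here.

```python
from itertools import combinations


def exhaustive_subset_search(feature_names, score_fn):
    """Evaluate every non-empty subset of features and keep the best one."""
    best_subset, best_score = None, float("-inf")
    n_evaluated = 0
    for size in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, size):
            n_evaluated += 1
            score = score_fn(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score, n_evaluated


# Hypothetical names for 10 dysphonia measures.
features = [f"m{i}" for i in range(10)]

# Hypothetical toy score: rewards four "useful" measures, penalizes subset size.
useful = {"m0", "m1", "m2", "m3"}


def toy_score(subset):
    return sum(1.0 for f in subset if f in useful) - 0.1 * len(subset)


best, score, n_evaluated = exhaustive_subset_search(features, toy_score)
```

With 10 candidate measures there are 2^10 - 1 = 1023 non-empty subsets, so an exhaustive search is cheap at this scale; it becomes impractical quickly as the number of measures grows.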
Article
Children with autism spectrum disorder (ASD) who can speak often exhibit abnormal voice quality and speech prosody, but the exact nature and underlying mechanisms of these abnormalities, as well as their diagnostic power are currently unknown. Here we quantified speech abnormalities in terms of the properties of the long-term average spectrum (LTAS) and pitch variability in speech samples of 83 children (41 with ASD, 42 controls) ages 4–6.5 years, recorded while they named a sequence of daily life pictures for 60 s. We found a significant difference in the group's average spectra, with ASD spectra being shallower and exhibiting less harmonic structure. Contrary to the common impression of monotonic speech in autism, the ASD children had a significantly larger pitch range and variability across time. A measure of this variability, optimally tuned for the sample, yielded 86% success (90% specificity, 80% sensitivity) in classifying ASD in the sample. These results indicate that speech abnormalities in ASD are reflected in its spectral content and pitch variability. This variability could imply abnormal processing of auditory feedback or elevated noise and instability in the mechanisms that control pitch. The current results are a first step toward developing speech spectrum-based bio-markers for early diagnosis of ASD.
Article
Congenital heart disease is now the most common severe congenital abnormality found in live births and the cause of more than half the deaths from congenital anomalies in childhood. Heart murmurs are often the first signs of pathological changes of the heart valves, and they are usually found during auscultation in the primary health care. Auscultation is widely applied in clinical activity; nonetheless sound interpretation is dependent on clinician training and experience. Distinguishing a pathological murmur from a physiological murmur is difficult and prone to error. To address this problem we have devised a simplified approach to pediatric cardiac scanning. This will not detect all forms of congenital heart disease but will help in the diagnosis of many defects. Cardiac auscultatory examinations of 93 children were recorded, digitized, and stored along with corresponding echocardiographic diagnoses, and automated spectral analysis using discrete wavelet transforms was performed. Patients without heart disease and either no murmur or an innocent murmur (n = 40) were compared to patients with a variety of cardiac diagnoses and a pathologic systolic murmur present (n = 53). A specificity of 100% and a sensitivity of 90.57% were achieved using signal processing techniques and a k-nn as classifier.
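The sensitivity and specificity reported in this abstract can be checked with a short sketch. The group sizes (40 and 53) come from the abstract; the individual confusion-matrix counts are inferred from the reported percentages and are therefore an assumption, as are the function names.

```python
def sensitivity(tp, fn):
    """Fraction of pathological cases that are correctly detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-pathological cases that are correctly cleared."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts inferred (assumption) from the abstract:
# 40 children without pathology, all classified correctly (100% specificity);
# 53 children with a pathologic murmur, 48 detected (48/53 = 90.57%).
tn, fp = 40, 0
tp, fn = 48, 5

sens_pct = round(100 * sensitivity(tp, fn), 2)
spec_pct = round(100 * specificity(tn, fp), 2)
acc_pct = round(100 * accuracy(tp, tn, fp, fn), 2)
```

Under these inferred counts, the reported 90.57% sensitivity corresponds to 5 missed cases out of 53, and the implied overall accuracy is 88/93, a figure the abstract itself does not state.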
Conference Paper
Parkinson's disease is a chronic, degenerative neurological disease of the central nervous system that causes progressively worsening movement disorders. It is detected through a clinical diagnosis made by an expert.
Article
Low-frequency vocal modulations here designate slow disturbances of the phonatory frequency F0. They are present in all voiced speech sounds, but their properties may be affected by neurological disease. An analysis method, based on continuous wavelet transforms, is proposed to extract the phonatory frequency trace and low-frequency vocal modulation in sustained speech sounds. The method is used to analyze a corpus of vowels uttered by male and female speakers, some of whom are healthy and some of whom suffer from Parkinson’s disease. The latter present general speech problems but their voice is not perceived as tremulous. The objective is to discover differences between speaker groups in F0 low-frequency modulations. Results show that Parkinson’s disease has different effects on the voice of male and female speakers. The average phonatory frequency is significantly higher for male Parkinsonian speakers. The modulation amplitude is significantly higher for female Parkinsonian speakers. The modulation frequency is significantly higher and the ratio between the modulation energies in the frequency bands [3 Hz, 7 Hz] and [7 Hz, 15 Hz] is significantly lower for Parkinsonian speakers of both genders.
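The ratio between modulation energies in the bands [3 Hz, 7 Hz] and [7 Hz, 15 Hz] can be illustrated with a minimal sketch. Note that it substitutes a plain discrete Fourier transform for the continuous wavelet transform used in the article, and that the sampling rate of the F0 trace, the half-open band edges, and the synthetic trace are all assumptions made for illustration.

```python
import cmath
import math


def band_energy(samples, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes for bins with f_lo <= f < f_hi."""
    n = len(samples)
    mean_value = sum(samples) / n
    centered = [s - mean_value for s in samples]  # remove the average F0
    energy = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(centered)
            )
            energy += abs(coeff) ** 2
    return energy


def modulation_energy_ratio(f0_trace, fs):
    """Energy in [3, 7) Hz divided by energy in [7, 15) Hz."""
    low = band_energy(f0_trace, fs, 3.0, 7.0)
    high = band_energy(f0_trace, fs, 7.0, 15.0)
    return low / high if high > 0 else math.inf


# Synthetic F0 trace (hypothetical): 120 Hz average with equal-amplitude
# modulations at 5 Hz and 10 Hz, sampled at 100 Hz for 2 seconds.
fs = 100
trace = [
    120
    + 2 * math.sin(2 * math.pi * 5 * i / fs)
    + 2 * math.sin(2 * math.pi * 10 * i / fs)
    for i in range(200)
]
ratio = modulation_energy_ratio(trace, fs)
```

Because the two modulations have equal amplitude and fall exactly on DFT bins, the ratio for this synthetic trace is close to 1; a lower ratio would indicate relatively more modulation energy in the higher band.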
Article
There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.