Variance in Classifying Affective State via
Electrocardiogram and Photoplethysmography
1st Zachary Dair
Munster Technological University
2nd Dr Samantha Dockray
University College Cork
3rd Dr Ruairi O’Reilly
Munster Technological University
Abstract—Advances in wearable technology have signiﬁcantly
increased the sensitivity and accuracy of devices for recording
physiological signals. Commercial off-the-shelf wearable devices
can gather large quantities of physiological data unobtrusively.
This enables momentary assessments of human physiology, which
provide valuable insights into an individual’s health and psycho-
logical state. Leveraging these insights provides signiﬁcant bene-
ﬁts for human-to-computer interaction and personalised health-
care. This work contributes an analysis of variance occurring in
features representative of affective states extracted from electro-
cardiograms and photoplethysmography; subsequently identiﬁes
the cardiac measures most descriptive of affective states from
both signals and provides insights into signal and emotion-speciﬁc
cardiac measures; ﬁnally baseline performance for automated
affective state detection from physiological signals is established.
Index Terms—Affective Computing, Psychophysiology, Elec-
trocardiogram, Photoplethysmography, Affective States
A significant goal of Affective Computing is to improve
human-to-computer interaction by providing a system with a
level of emotional intelligence that aids natural communication
and is capable of including emotional components. This has
commonly been approached by deriving emotional states from
speech, facial expressions, gestures and body posture.
Utilising physiological signals to convey psychological
information is a recent exploration in the domain, likely
stemming from the growing accessibility of signals from
wearable devices.
A physiological signal represents an individual’s biological
processes, derived from core aspects of human biology.
Analysis of these signals can enable diagnostics, for instance,
analysing heart rate to detect arrhythmia. Psychological
analysis is also possible, as mental states originating from
unconscious effort typically present a noticeable physiological
change in the relevant human system. The combined
analysis enables a richer understanding of individuals in terms
of their mental and physical health.
Psychological states are complex processes comprising
several components, including feelings, cognitive reactions,
behaviour and thoughts. Mapping psychological states to a
corresponding experience of the individual provides valuable
information in the context of well-being, health (physical and
mental), social contexts, experiences and emotional responses.
Electrocardiograms (ECG) are physiological signals that
measure the electrical activity of the heart. They are typically
recorded in a clinical setting using multiple leads and
electrodes attached to the chest or whole body of the individual.
Recent improvements have seen the development of wearable ECG
monitors, predominantly limited to research-grade (RG) and
medical-grade (MG) devices, with a small number of commercial
off-the-shelf (COTS) devices. Photoplethysmography
(PPG) was developed to measure heart activity through variations
in the blood volume of the skin, using a light-emitting
diode and a photodetector. Until recently, PPG was the sole
method provided by COTS devices such as smart-rings or
smart-watches to enable individuals to monitor heart activity.
A concern with the analysis of ECG and PPG is the variance
caused by differing sensor placement and signal granularity,
as the sampling frequency of PPG is lower to
reduce battery consumption in COTS devices. These variances
motivate the investigation of the suitability and performance
of both signals to detect a range of affective states.
This work investigates the variance in classifying affective
states from physiological signals representative of heart
activity by addressing the following research aims: (i) to evaluate
signal-specific variances in standard cardiac measures utilised
for emotive classification; (ii) to highlight the precedence of
cardiac features in classifying affective state via feature
importance; (iii) to evaluate the variance in automated affective
state classification between ECG and PPG.
II. RELATED WORK
A. Psychological Constructs
Multiple psychological constructs exist to describe human
psychology. These constructs range from discrete models,
where each psychological state is an individual component,
to dimensional models, where the emotions span two or more
dimensions. For example, Basic Emotion Theory is a discrete
model containing the emotions Anger, Joy, Sadness, Surprise,
Fear and Disgust, and the Circumplex Model is a two-dimensional
model consisting of Arousal, the activation level
of the individual, commonly seen as excitement, and Valence,
the unpleasantness or pleasantness of the experience.
Existing affective state detection research has focused on
discrete and dimensional models. Additionally, specific
mental states such as Stress and Anxiety have received
substantial interest due to their health impact. This work
utilises Arousal and Valence as represented in the Circumplex
Model and discrete psychological states relating to stress,
providing analysis from both perspectives. Stress is a complex
mental state, included in some emotion models discretely or
as a combination of high arousal and negative valence.
arXiv:2207.02916v1 [cs.HC] 6 Jul 2022
B. Heart-Related Physiological Signals
The prevalence of ECG and heart-related data in wearable
health monitors stems from a desire to monitor a critical organ.
This data has clear ties to health through arrhythmia detection
and heart rate as a measure of ﬁtness . Furthermore, as the
heart is controlled involuntarily through the autonomic nervous
system (ANS), it facilitates the identiﬁcation of a relationship
between involuntary physiological changes and psychological
states. Multiple psychophysiological theories aim to explain
this relationship, such as Polyvagal Theory .
Heart activity is complex to capture. In medicine, the gold
standard is a 12-lead ECG, which provides comprehensive
data recorded from multiple electrodes on the human body.
However, in ambulatory research and daily life, this method is
not feasible. RG equipment typically uses several electrodes,
commonly a 3-lead ECG, and occasionally includes PPG as an
additional measure. COTS devices tend to rely solely on PPG
to monitor heart activity; however, with recent advances, top
of the range smart-watches (Apple Watch Series 4 and later,
Galaxy Active 2, Fitbit) now include a 1-lead ECG, which is
promising for portable ECG analysis.
The Apple Watch records the time between R waves of
the QRS complex; these RR intervals (RR) enable Heart Rate
Variability (HRV) analysis for the detection of relaxation
and stress. The statistical analysis approach presented in
 demonstrates the capability of COTS devices to suit-
ably detect the RR intervals to a high standard. Notably,
approximately 10% of heartbeats were missing from the Apple
Watch recording. This missing data impacts the computation
of HRV features and subsequently the classiﬁcation accuracy
of automated affective state detection.
Additional physiological signals such as electrodermal activity
(EDA), respiration, skin temperature, electromyogram
(EMG), and electrooculogram (EOG) have demonstrated potential
for affective state detection; however, they are excluded from
this work due to their additional sensor requirements.
C. Affective ECG Analysis
Numerous studies of affective states conduct custom data
collection, providing precise control over the psychological
domain explored. Varied stimuli have aided the elicitation
of psychological states, for example, images, movie clips,
music, and dedicated tasks to elicit stress, such as the Trier
Social Stress Test. Additionally, a large number of
open-access or on-request datasets have been created; a subset
is utilised in , namely “AMIGOS”, “DREAMER”,
“WESAD” and “SWELL-KW”. There is a distinct lack
of emotionally labelled ECG signals from COTS devices. This
is likely because only recent COTS devices provide ECG
monitoring capabilities.
ECG signals contain noise introduced by motion artefacts,
biological differences and sensor detachment. Signal
processing techniques such as Butterworth bandpass and
notch filters are utilised to reduce these noise levels.
Subsequently, features suitable for emotive classification are
extracted from the pre-processed signals. Statistical and
frequency measures of ECG are often utilised; however, arguably the
most valuable features relate directly to physiological changes
in the heart, commonly analysed through the waves P, Q, R, S and T,
each relating to a process in the heartbeat cycle. Features
derived from these waves, such as the QRS complex,
Beats-Per-Minute (BPM), and HRV, are inherently linked to
human psychology as they originate from involuntary changes
due to the ANS, as explored by Polyvagal Theory.
Recent approaches have favoured deep learning methodologies,
achieving high accuracies on multi-class
classifications. However, older studies focusing on linear and
quadratic discriminant analysis remain relevant, achieving
suitable accuracy for their respective classifications.
Furthermore, adapted ML classifiers and combinations of
ML classifiers forming ensembles have demonstrated potential
for binary classifications in emotion detection. In
comparison to other studies,  achieved the highest accuracy
for multiple emotion detection from ECG data and reported
setting the new state of the art for ECG emotion detection.
D. Affective PPG Analysis
PPG analysis provided by COTS devices has typically
focused on tracking medical conditions, physical activity, and
stress. The detrimental effects of stress on human health are a
significant motivator for physiological analysis and
preventative healthcare research.
PPG has demonstrated similar noise levels to
ECG; in addition, skin tone and environmental light
can impact the signal quality. PPG, EDA (a measure of the
electrical potential of the skin) and acceleration were recorded
in  to create a multi-modal stress detection model. Heart
rate features such as inter-beat intervals, HRV and frequency
measures alongside EDA were leveraged with a range of ML
classifiers to achieve 88.20% across all subjects.
Further stress detection studies focused on the WESAD
dataset,  utilising raw PPG data, with the aim of re-
moving the requirement for hand-crafted features. In their
“Neutral, Stress, Amusement” experiment, an LDA classiﬁer
reached 65.3% accuracy classifying 60-second windows of
PPG, demonstrating the utility of raw PPG for stress detection.
A recent approach  reports 99% accuracy on WESAD
detecting baseline, amusement, meditation and stress. This
approach applies extensive signal processing,
including windowing the PPG data into 5-second windows.
Furthermore, a complex feature extraction stage was
adopted, relying on autoencoder features and recursive feature
elimination, contributing to the high accuracy achieved.
TABLE I: Details per dataset utilised
Dataset   ECG (Hz)   PPG (Hz)   # Windows   Label
CASE      1000       1000       14650       Arousal/Valence
WESAD     700        64         9106        B, S, A, M
B: Baseline, S: Stress, A: Amusement, M: Meditation
For the purposes of this work, the focus was narrowed to RG
physiological signals due to a lack of publicly available data
for COTS devices. “The Dataset of Continuous Affect Annotations
and Physiological Signals for Emotion Analysis” (CASE)
and “The Wearable Stress and Affect Detection Dataset”
(WESAD) (see Table I) were the datasets utilised in this
work. The datasets were selected due to their inclusion of
ECG and PPG with psychological annotations; these signals
were recorded using RG devices in a laboratory environment.
CASE incorporates Arousal and Valence annotations, collected
through joystick movements made in response to emotionally
stimulating video clips. WESAD focuses on stress detection
with limited affective states: a baseline state elicited from
“neutral reading”, amusement caused by comedic video clips,
stress provoked by a Trier Social Stress Test, and a
meditation stage aimed at “de-exciting” the individual following
the amusement and stress stages.
ECG and PPG signals recorded per subject within these
datasets span the duration of the experiment, resulting in
approximately 91/58 minutes for WESAD/CASE. Using a signal
processing method known as windowing, 10-second segments
of data were isolated. A 10-second duration was selected due
to the efficient performance demonstrated in . Additionally, this
duration enables low latency, as classification occurs every
10 seconds, and contains adequate data for feature computation.
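The windowing step can be sketched as follows. This is a minimal illustration rather than the exact procedure used in this work: it assumes non-overlapping windows and discards a trailing segment shorter than one window, neither of which is specified above. The function name `window_signal` is hypothetical.

```python
def window_signal(samples, sample_rate_hz, window_seconds=10):
    """Split a 1-D signal into consecutive, non-overlapping windows.

    Assumption: a trailing segment shorter than one full window is dropped.
    """
    window_len = int(sample_rate_hz * window_seconds)
    if window_len <= 0:
        raise ValueError("window length must be at least one sample")
    n_windows = len(samples) // window_len
    return [samples[i * window_len:(i + 1) * window_len]
            for i in range(n_windows)]


# Usage: 25 seconds of a 64 Hz signal yields two full 10-second windows.
signal = list(range(25 * 64))
windows = window_signal(signal, sample_rate_hz=64)
```

Because the window length is derived from the sample rate, the same call applies to signals recorded at different rates, such as those listed in Table I.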
A Butterworth bandpass filter is used to reduce signal
noise, facilitating the extraction of selected features while
maintaining a degree of “rawness” in the signal. A simple
filter is used as it more closely aligns with COTS devices and
their reduced computational power.
Once filtered and windowed, the data is aligned with its
psychological annotations. For WESAD, annotations are
numeric values sampled at 700 Hz. The values 0-4 are
associated with the psychological states Transient, Baseline,
Stress, Amusement and Meditation, respectively. Annotations 5-7 and
Transient data are omitted as per the authors’ instructions.
Certain windows may include multiple emotive annotations;
hence, to identify the most pertinent emotion, the mean of all
annotation values per window is calculated and rounded to the
nearest annotation (1-4) using Euclidean distance. Alternative
approaches omit these windows and the neighbouring
segments to prevent confusion from mixed emotions.
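The per-window labelling rule for WESAD can be illustrated with a short sketch. It assumes that transient (0) and ignored (5-7) samples are discarded before the mean is taken, and that ties resolve to the smaller label; both are assumptions, and the names `VALID_LABELS` and `window_label` are hypothetical.

```python
VALID_LABELS = (1, 2, 3, 4)  # Baseline, Stress, Amusement, Meditation


def window_label(annotations, valid_labels=VALID_LABELS):
    """Return the valid label closest to the mean annotation of a window.

    Assumption: samples outside valid_labels (0 = transient, 5-7 = ignored)
    are removed first; None is returned when nothing remains.
    """
    kept = [a for a in annotations if a in valid_labels]
    if not kept:
        return None
    mean = sum(kept) / len(kept)
    # In one dimension the Euclidean distance is the absolute difference.
    # min() keeps the first minimum, so ties go to the smaller label.
    return min(valid_labels, key=lambda label: abs(label - mean))


# Usage: a window that is mostly Baseline with a short Stress segment.
mostly_baseline = window_label([1] * 6000 + [2] * 1000)
mostly_amusement = window_label([2] * 3000 + [3] * 4000)
only_transient = window_label([0] * 7000)
```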
TABLE II: Cardiac features extracted from ECG and PPG
Feature                                                   Abbreviation
Beats per minute                                          BPM
Interbeat interval                                        IBI
Std dev. of RR intervals                                  SDNN
Std dev. of successive differences                        SDSD
Root mean square of successive differences                RMSSD
Proportion of successive differences above 20 ms / 50 ms  pNN20 / pNN50
Median absolute deviation of RR intervals                 MAD
Low frequency spectrum (0.05-0.15 Hz)                     LF
High frequency spectrum (0.15-0.5 Hz)                     HF
High/low frequency ratio                                  HF/LF
Estimated breathing rate                                  BR
A similar procedure is required for CASE. The raw annotation
data is provided as values on an x- and y-axis representing
Arousal and Valence. These values are normalised to a
range of 0.5 to 9.5 and subsequently converted to discrete
representations, resulting in low (0.5-3.5), neutral (3.5-6.5) and
high (6.5-9.5) Arousal and Valence for each window.
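The normalisation and discretisation of a CASE annotation value can be sketched as below. The raw annotation range is not stated in this work, so it is passed in as the hypothetical parameters `raw_min` and `raw_max`; assigning the boundary values 3.5 and 6.5 to the upper bin is likewise an assumption.

```python
def rescale(value, raw_min, raw_max, new_min=0.5, new_max=9.5):
    """Linearly map value from [raw_min, raw_max] to [new_min, new_max]."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must be greater than raw_min")
    fraction = (value - raw_min) / (raw_max - raw_min)
    return new_min + fraction * (new_max - new_min)


def discretise(score):
    """Map a 0.5-9.5 score to 'low', 'neutral' or 'high'.

    Assumption: boundary values 3.5 and 6.5 fall into the upper bin.
    """
    if score < 3.5:
        return "low"
    if score < 6.5:
        return "neutral"
    return "high"


# Usage with a hypothetical raw range of 0 to 100.
mid_score = rescale(50, raw_min=0, raw_max=100)
levels = [discretise(rescale(v, 0, 100)) for v in (0, 50, 100)]
```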
C. Emotion Characteristics in Cardiac Signals
Cardiac signals provide a wide array of features, many of
which are effective indicators of emotional information in
the source signal. The Python toolkit HeartPy extracts
the features seen in Table II from 10-second windows of data.
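To make the time-domain measures in Table II concrete, the sketch below computes several of them directly from a list of RR intervals. It follows the textbook definitions and is not HeartPy's implementation; the use of the population standard deviation and of signed successive differences for SDSD are assumptions, and the function name is hypothetical.

```python
import math
import statistics


def time_domain_features(rr_ms):
    """Compute time-domain cardiac measures from RR intervals (milliseconds)."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Differences between consecutive RR intervals.
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    ibi = sum(rr_ms) / len(rr_ms)
    return {
        "bpm": 60000.0 / ibi,  # 60,000 ms per minute
        "ibi": ibi,
        "sdnn": statistics.pstdev(rr_ms),
        "sdsd": statistics.pstdev(diffs),
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "pnn20": sum(abs(d) > 20 for d in diffs) / len(diffs),
        "pnn50": sum(abs(d) > 50 for d in diffs) / len(diffs),
    }


# Usage: five RR intervals, giving four successive differences.
features = time_domain_features([800, 810, 790, 850, 800])
```

With only a handful of beats in a short window, measures such as pNN50 take very coarse values, which is one reason missing beats can noticeably affect HRV features.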
D. Signal Variation Analysis
This work measures the absolute difference between the ECG and
PPG values of the extracted features BPM, IBI and BR as
a means of evaluating ECG and PPG variance. The method
utilises the features extracted from 10-second windows of data
for each signal.
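A minimal sketch of this comparison is given below. It assumes the features of each signal are stored as one dictionary per window, with the windows of both signals aligned in time; the dictionary keys and the function name are hypothetical.

```python
def absolute_differences(ecg_features, ppg_features,
                         names=("bpm", "ibi", "br")):
    """Per-window absolute difference between ECG and PPG feature values.

    Both inputs are lists of dicts, one dict per window, in the same order.
    """
    if len(ecg_features) != len(ppg_features):
        raise ValueError("ECG and PPG must cover the same windows")
    return [
        {name: abs(ecg[name] - ppg[name]) for name in names}
        for ecg, ppg in zip(ecg_features, ppg_features)
    ]


# Usage: one window with illustrative (not measured) feature values.
ecg = [{"bpm": 72.0, "ibi": 833.0, "br": 0.25}]
ppg = [{"bpm": 75.0, "ibi": 800.0, "br": 0.5}]
differences = absolute_differences(ecg, ppg)
```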
Identifying the most influential cardiac features for
psychological classifications provides valuable insights for signal
choice. Furthermore, a variance in the importance of the
cardiac features between ECG and PPG may indicate higher
suitability of one signal over another. A game-theoretic approach
for ML explanations known as “SHapley Additive exPlanations”
(SHAP) is adopted to identify feature importance. This
approach uses SHAP values, which represent the degree of
change in the model output caused by each individual feature;
the magnitude of change and number of samples affected
indicate the impact of a given feature.
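The idea behind these values can be illustrated by computing exact Shapley values for a toy model by brute force. This sketch is not the SHAP library, which relies on efficient approximations. Representing an "absent" feature by a fixed baseline value is a simplifying assumption, and the toy model and all names are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance of a small model.

    predict  : function taking a dict of feature values, returning a number
    instance : dict of the feature values being explained
    baseline : dict of reference values standing in for "absent" features
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature "on"
            value = predict(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    # Average the marginal contributions over all feature orderings.
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    """Hypothetical model: one additive term and one interaction term."""
    return 2.0 * x["bpm"] + x["ibi"] * x["br"]


instance = {"bpm": 1.0, "ibi": 2.0, "br": 3.0}
baseline = {"bpm": 0.0, "ibi": 0.0, "br": 0.0}
values = shapley_values(toy_model, instance, baseline)
```

The values sum to the difference between the model output for the instance and for the baseline, and the interaction term is shared equally between the two features involved. The number of orderings grows factorially with the number of features, which is why practical tools approximate this computation.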
E. Automated Emotion Detection
Comparing performance measures across ten ML classifiers
provides a method for identifying the suitability of ECG and
PPG for automated affective state detection. The annotated
features for each signal are loaded individually on a per-dataset
basis. 20% of the data acts as a hold-out set of
unseen data to evaluate the final classifier. Five-fold
cross-validation is applied to the remaining 80% of the data,
which is split into “folds”, enabling a per-fold classification.
Subsequently, comparing classifier performance across folds
enables the identification
of the more robust and performant classifiers. Finally, the
selected classifier is trained on the initial 80% of data and
tested on the hold-out set.
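The data partitioning described above can be sketched with index lists alone. The sketch assumes a random shuffle before splitting; whether the split used in this work was stratified or grouped by subject is not stated, and the function names are hypothetical.

```python
import random


def holdout_and_folds(n_samples, holdout_fraction=0.2, n_folds=5, seed=0):
    """Return (holdout_indices, folds) from a shuffled index range."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random split
    n_holdout = int(round(n_samples * holdout_fraction))
    holdout, remaining = indices[:n_holdout], indices[n_holdout:]
    # Deal the remaining indices into n_folds nearly equal folds.
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return holdout, folds


def cross_validation_splits(folds):
    """Yield (train_indices, validation_indices), one pair per fold."""
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, validation


# Usage: 100 windows give a 20-window hold-out set and five folds of 16.
holdout, folds = holdout_and_folds(100)
splits = list(cross_validation_splits(folds))
```

Each candidate classifier would be fitted on the training indices and scored on the validation indices of every split, with the hold-out indices left untouched until the final evaluation.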
Fig. 1: Five-Fold cross-validation of selected models detecting
multiple affective states
Fig. 2: Absolute difference for the IBI, BPM and BR feature
values across ECG and PPG
IV. RESULTS AND DISCUSSION
A. Cardiac Feature Variance
The wearables’ sample rate disparity (see Table I) is evident
in the feature-level results displayed in Figure 2. The
reduced sample rates in WESAD result in slightly decreased
granularity of ECG data and significantly decreased granularity
of PPG data compared to CASE. Furthermore, this leads to a
visible impact
on feature computation, as demonstrated by IBI and BPM.
In CASE, IBI and BPM contain a small degree of variance
with substantial spikes relative to the average. These occur
in isolated data segments, likely caused by electrode discon-
nection, movement, or subject-speciﬁc factors. However, a
signiﬁcantly higher ﬂuctuation occurs in WESAD in terms
of magnitude and frequency, likely stemming from the high
sample rate disparity and reduced PPG granularity. Notably,
a similar degree of variance occurs in BPM and IBI due to
their inherent links to heart rate. Interestingly, the BR feature
exhibits a high deviation between ECG and PPG in both
datasets. This deviation indicates that one of the signals is
unreliably computing BR, likely caused by the wrist and finger
placement of the PPG sensors. Furthermore, the previously
identified variance spikes of IBI and BPM in CASE are present
in BR, further demonstrating that specific data segments may
benefit from additional signal processing to reduce noise and
increase classification accuracy.
Fig. 3: Classification impact of ECG and PPG features from
WESAD indicated by Shapley Additive exPlanations (SHAP)
B. Cardiac Feature Importance
Twenty data windows failed feature extraction due to a
lack of discernible heart rate. These problematic windows
demonstrated signiﬁcant signal spikes and sporadic behaviour,
akin to electrode disconnection, motion artefacts, and high
signal noise. For this analysis, the occurrence has minimal
impact. However, it will occur more frequently in ambulatory
analysis, requiring further signal processing to overcome.
Analysing the SHAP values per feature indicates that IBI,
BPM, and BR have the most impact on classiﬁcation. The
remaining features, most notably those relating to frequency,
exhibit inconsistent inﬂuence between the signals. Further-
more, the adopted feature importance approach enables the
evaluation of feature inﬂuence per affective state, as demon-
strated in Figure 3. The disparity in the inﬂuence of the
same feature across multiple affective states indicates a higher
utility for certain features to represent speciﬁc affective states,
speciﬁcally shown with RMSSD in PPG for stress detection.
C. Automated Affective State Classiﬁcation Variance
Random Forest (RF) was selected as the most perfor-
mant classiﬁer and subsequently evaluated on the hold-out
set, where ECG outperforms PPG consistently across both
datasets. The full model comparison is shown in Figure 1.
In contrast with the state of the art, the performance
achieved is much lower for ECG and PPG; however, this
work focuses on the variance between the signals for affective
analysis rather than achieving high classiﬁcation accuracy.
Fig. 4: ROC curves representing the one-vs-rest (OVR)
classification variance between ECG and PPG
Analysing the ROC curves from RF demonstrates the true
and false positive rates per signal for each affective state (see
Figure 4). On average, ECG demonstrates increased capability
for affective classification by achieving a higher ROC area than
PPG, with the difference ranging from 0.02 to 0.10. Interestingly, ECG
and PPG reach an identical area value for Stress classifications
in WESAD data. Further investigations identify a potential
trade-off between true and false positives using PPG for Stress
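The ROC area for a single affective state in a one-vs-rest setting can be sketched as below. The sketch uses the rank-based formulation of the area under the curve, in which the area equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The scores, labels and function names are illustrative assumptions and are not results from this work.

```python
def roc_auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample scores higher; ties count as half.
    """
    positives = [s for s, p in zip(scores, is_positive) if p]
    negatives = [s for s, p in zip(scores, is_positive) if not p]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(class_scores, labels, target):
    """AUC of one class against all others, given that class's scores."""
    return roc_auc(class_scores, [label == target for label in labels])


# Usage: hypothetical classifier scores for the "stress" class.
labels = ["stress", "baseline", "stress", "amusement"]
stress_scores = [0.9, 0.2, 0.6, 0.7]
stress_auc = one_vs_rest_auc(stress_scores, labels, "stress")
```

Repeating the call for each affective state and each signal gives the per-class areas that can then be compared between ECG and PPG.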
The identiﬁed variance of BPM and IBI in WESAD demon-
strates the importance of sampling rates to prevent inconsistent
computations of features due to signal granularity differences.
Furthermore, feature variance analysis enables the identiﬁca-
tion of abnormal signal activity from sensor disconnection or
motion artefacts, valuable for ambulatory analysis.
Feature importance identiﬁes IBI, BPM and BR as the most
impactful features for affective classiﬁcation across ECG and
PPG. Notably, the remaining features exhibit inconsistent im-
pacts, speciﬁcally SD1/SD2 and RMSSD, which demonstrate
a greater impact in PPG, warranting the exploration of signal-
speciﬁc features. Moreover, a variance per affective state indi-
cates that certain features provide a greater degree of emotion-
speciﬁc information beneﬁcial for tailored applications.
Standard ML classifiers achieve moderate classification
accuracy for detecting multiple affective states using
heart-activity features extracted from minimally filtered ECG and
PPG signals. This provides a baseline for automated
multi-class affective state detection and demonstrates the
validity of heart-activity features from ECG and PPG.
 Rosalind W. Picard. Affective computing: challenges. International
Journal of Human-Computer Studies, 59(1):55–64, 2003.
Eduardo José da S. Luz et al. ECG-based heartbeat classification for
arrhythmia detection: A survey. CMPB, 127:144–164, 2016.
Jonghwa Kim and E. André. Emotion recognition based on physiological
changes in music listening. IEEE PAMI, 30(12):2067–2083, 2008.
 Andrius Dzedzickis, A. Kaklauskas, and V. Bucinskas. Human emotion
recognition: Review of sensors and methods. Sensors, 20(3), 2020.
 Foteini Agraﬁoti et al. Ecg pattern analysis for emotion detection. IEEE
Trans. Affective Comput., 3(1):102–115, 2012.
Samantha Dockray, Siobhán O’Neill, and Owen Jump. Measuring the
psychobiological correlates of daily experience in adolescents. Journal
of Research on Adolescence, 29(3):595–612, 2019.
 Shadi Mahdiani et al. Is 50 hz high enough ecg sampling frequency for
accurate hrv analysis? In EMBC, pages 5948–5951, 2015.
 Paul Ekman. Basic emotions. Handbook of cognition and emotion,
pages 45–60, 1999.
 Jonathan Posner et al. The circumplex model of affect. Development
and Psychopathology, 17(3):715–734, 2005.
 Lin Shu et al. A review of emotion recognition using physiological
signals. Sensors, 18(7):2074, Jun 2018.
 Huseyin Uyarel, Ertan O, Necati C, Ahmet K, and Nese C. Effects of
anxiety on qt dispersion in healthy young men. Acta Cardio., 2006.
 Can Yekta Said et al. Continuous stress detection using wearable sensors
in real life: Algorithmic programming contest case study. Sensors, 2019.
 A. K. Johnson and E. A. Anderson. Stress and arousal. In J. T. Cacioppo
and L. G. Tassinary (Eds.), Principles of psychophysiology: Physical,
social, and inferential elements. Cambridge University Press, 1990.
S. W. Porges et al. Vagal tone and the physiological regulation of emotion.
Monogr. Soc. Res. Child Dev., 59(2-3):167–186, 1994.
 Nabeel Saghir et al. A comparison of manual electrocardiographic
interval and waveform analysis in lead 1 of 12-lead ecg and apple watch
ecg: A validation study. Cardiovascular Digital Health Journal, 2020.
David Hernando, S. Roca, J. Sancho, Á. Alesanco, and R. Bailón.
Validation of the Apple Watch for heart rate variability measurements
during relax and mental stress in healthy subjects. Sensors, 18(8), 2018.
Philip Schmidt et al. Introducing WESAD, a multimodal dataset for
wearable stress and affect detection. In ICMI 20. ACM, 2018.
 Melissa A Birkett. The trier social stress test protocol for inducing
psychological stress. J. Vis. Exp., October 2011.
 Pritam Sarkar and Ali Etemad. Self-supervised ECG representation
learning for emotion recognition. IEEE Trans. Affective Comput., 2021.
 Stamos Katsigiannis and Naeem Ramzan. Dreamer: A database for
emotion recognition through eeg and ecg signals from wireless low-cost
off-the-shelf devices. IEEE Journal of Bio. and Health Info., 2018.
 Saskia Koldijk, Maya Sappelli, Suzan Verberne, Mark A. Neerincx, and
Wessel Kraaij. The SWELL knowledge work dataset for stress and user
modeling research. In ICMI, pages 291–298. ACM, 2014.
 Mimma Nardelli, G. Valenza, A. Greco, A. Lanata, and P. Scilingo.
Recognizing emotions induced by affective sounds through heart rate
variability. IEEE Trans. Affective Comput., 6(4):385–394, 2015.
 Yu-Liang Hsu, Jeen-Shing Wang, Wei-Chun Chiang, and Chien-Han
Hung. Automatic ECG-based emotion recognition in music listening.
IEEE Trans. Affective Comput., 11(1):85–99, 2020.
 Theekshana Dissanayake, Y. Rajapaksha, R. Ragel, and I. Nawinne. An
ensemble learning approach for electrocardiogram sensor based human
emotion recognition. Sensors, 19(20), 2019.
 Yekta Said Can, Niaz Chalabianloo, Deniz Ekiz, and Cem Ersoy. Con-
tinuous stress detection using wearable sensors in real life: Algorithmic
programming contest case study. Sensors, 19(8):1849, 2019.
 Aneta Lisowska, Szymon Wilk, and Mor Peleg. Catching patient’s
attention at the right time to help them undergo behavioural change.
In Artiﬁcial Intelligence in Medicine, pages 72–82. Spr. Int. Pub., 2021.
 Nilava Mukherjee, Sumitra Mukhopadhyay, and Rajarshi Gupta. Real-
time mental stress detection technique using neural networks towards a
wearable health monitor. Meas. Sci. Technol., 33(4):044003, 2022.
Karan Sharma et al. A dataset of continuous affect annotations and
physiological signals for emotion analysis, 2018.
 Paul van Gent et al. Heartpy: A novel heart rate algorithm for the
analysis of noisy signals. Transportation Research Part F: Trafﬁc
Psychology and Behaviour, 66:368–378, 2019.
 Christoph Molnar. Interpretable Machine Learning. Lulu.com, 2022.