Molecular & Cellular Biomechanics 2024, 21(2), 397.
https://doi.org/10.62617/mcb.v21i2.397
Article
Biomechanical and machine learning approaches to automating the
identification of musical styles and emotions through human motion analysis
Yuan Ding
Nantong Normal College, Nantong 226010, China; YuanDing19@outlook.com
Abstract: This study explores the intricate relationship between biomechanical movements
and musical expression, focusing on the identification of musical styles and emotions. Violin
performance is characterized by complex interactions between physical actions, such as
bowing techniques, finger placements, and posture, and the resulting acoustic output. Recent
advances in motion capture technology and sound analysis have enabled a more objective
examination of these processes. However, the current literature frequently addresses
biomechanics and acoustic features in isolation, lacking an integrated understanding of how
physical movements translate into specific musical expressions. Machine Learning (ML),
particularly Long Short-Term Memory (LSTM) networks, provides a promising avenue for
bridging this gap. LSTM models are adept at capturing temporal dependencies in sequential
data, making them suitable for analyzing the dynamic nature of violin performance. In this
work, we propose a comprehensive model that combines biomechanical analysis with
Mel-spectrogram-based LSTM modeling to automate the identification of musical styles and
emotions in violin performances. Using motion capture systems, Inertial Measurement Units
(IMUs), and high-fidelity audio recordings, we collected synchronized biomechanical and
acoustic data from violinists performing various musical excerpts. The LSTM model was
trained on this dataset to learn the intricate connections between physical movements and the
acoustic features of each performance. Key findings from the study demonstrate the
effectiveness of this integrated approach. The LSTM model achieved a validation accuracy of
92.5% in classifying musical styles and emotions, with precision, recall, and F1-score reaching
94.3%, 92.6%, and 93.4%, respectively, by the 100th epoch. The analysis also revealed strong
correlations between specific biomechanical parameters, such as shoulder joint angle and
bowing velocity, and acoustic features, like sound intensity and vibrato amplitude.
Keywords: biomechanical movements; machine learning; musical styles; emotions; physical
movements; joint angle and bowing velocity; motion capture
1. Introduction
Music performance, particularly in instruments like the violin, is a complex
interplay of Physical Movements (PM) and Emotional Expression (EE) [1,2].
Violinists convey a wide range of emotions and musical styles through precise
control of their movements, such as bowing techniques, finger placements, and body
posture [3,4]. The study of these movements and their impact on musical output has
been a subject of interest in fields like musicology, biomechanics, and music
cognition [5]. Traditionally, research in this area has focused on qualitative analyses
and subjective interpretations to understand how musicians produce expressive
performances [6–8]. However, these methods frequently lack the precision to fully
capture the intricate biomechanical processes involved in violin playing [9].
CITATION
Ding Y. Biomechanical and machine
learning approaches to automating
the identification of musical styles
and emotions through human motion
analysis. Molecular & Cellular
Biomechanics. 2024; 21(2): 397.
https://doi.org/10.62617/mcb.v21i2.397
ARTICLE INFO
Received: 20 September 2024
Accepted: 8 October 2024
Available online: 6 November 2024
COPYRIGHT
Copyright © 2024 by author(s).
Molecular & Cellular Biomechanics
is published by Sin-Chn Scientific
Press Pte. Ltd. This work is licensed
under the Creative Commons
Attribution (CC BY) license.
https://creativecommons.org/licenses/
by/4.0/
In recent years, technological advancements have enabled more detailed
investigations into the biomechanics of musical performance [10–12]. Current studies
employ motion capture systems and Inertial Measurement Units (IMUs) to record the
PM of musicians, offering a more objective view of how different playing techniques
contribute to Musical Expression (ME) [13,14]. Similarly, acoustic analysis, mainly through the use of Mel-
spectrograms, has provided insights into the audio features that correspond with
emotional and stylistic variations in music [15,16]. Despite these advances, the
relationship between Biomechanical Movements (BM) and acoustic features remains
underexplored in an integrated and automated manner [17]. Most existing research
approaches this relationship through isolated analyses, frequently lacking a unified
framework that comprehensively connects a performance's physical and acoustic
aspects [18]. This limitation makes it challenging to understand how specific
movements influence the perceived musical output.
Machine Learning (ML), particularly Neural Network (NN) models like Long
Short-Term Memory (LSTM), offers a promising solution to this problem [19]. LSTM
networks can model complex temporal dependencies in sequential data, making them
ideal for analyzing the dynamic nature of music performance [20]. They can capture
long-term relationships between a violinist's movements and the resulting sound,
facilitating the identification of patterns corresponding to various musical styles and emotional
contexts. Unlike traditional analysis methods, ML can process large, multi-
dimensional datasets, allowing for a more nuanced and automated analysis of the
interplay between biomechanics and music [21–25]. This provides an opportunity to
move beyond subjective evaluation and uncover the underlying mechanisms of ME
with greater accuracy and detail [26–30].
The proposed work aims to address the limitations of current research by
developing a framework that integrates biomechanical analysis with a Mel-
spectrogram-based LSTM model to automate the identification of musical styles and
emotions in violin performances. The proposed work explores the intricate
relationship between BM and sound features in violin performances to automate the
identification of musical styles and emotions. By combining Motion Capture (MC)
technology, IMUs, and audio recording, detailed data on joint angles, forces, and
sound characteristics are collected from violinists performing a variety of musical
excerpts. Mel-spectrograms are used to extract time-frequency features from the
audio, providing a nuanced representation of the music's expressive elements. The
study leverages a Mel-spectrogram-based LSTM model to analyze this multi-
dimensional dataset. The LSTM, trained on synchronized biomechanical and acoustic
data, learns to recognize patterns corresponding to different musical styles and
emotional contexts. The work not only enhances the understanding of how PM
contributes to ME but also demonstrates the potential of ML in automating the analysis
of performance characteristics in music.
The paper is organized as follows: Section 2 presents the methodology, Section 3
presents the results, and Section 4 concludes the paper.
2. Methodology
2.1. Participants
The study involved 17 violinists, comprising 11 males and 6 females, aged
between 20 and 35 years, with a mean age of 27.4 years. All participants were skilled
violinists with varying experience levels, ensuring diverse performance styles and
emotional expressions. Specifically, 10 participants were professional violinists with
over five years of experience performing in various settings such as orchestras,
chamber music groups, and solo performances. The remaining 7 participants were
advanced amateurs with at least three years of dedicated practice and performance
experience, often participating in community orchestras, ensembles, or solo recitals.
This blend of professional and amateur musicians provided a rich dataset for
examining the biomechanical nuances of violin playing across different levels of
expertise [31,32].
Participants were selected based on their familiarity with a broad repertoire of
musical styles, including classical, contemporary, and folk music. This variety ensured
that their physical expressions, such as bowing techniques, posture, and body
movement, could be analyzed in relation to different musical genres and emotional
contexts. Educational backgrounds varied, with 13 participants holding formal music
education degrees, ranging from undergraduate to master's level in violin
performance. The remaining 4 participants, while not formally educated in music, had
extensive training through private lessons and had performed regularly in semi-
professional settings.
Each participant provided informed consent and completed a detailed
questionnaire outlining their performance experience, preferred musical genres, and
previous participation in biomechanical or music cognition studies. This information
helped contextualize their movement patterns during the study. By focusing solely on
violinists, the study aimed to explore the intricate relationship between a musician's
physical motion, such as bowing dynamics, hand movements, and body posture,
and the expressive qualities of their performance, thus contributing to the
understanding of how musical styles and emotions are embodied in the art of violin
playing.
2.2. Measurements
To comprehensively analyze the biomechanical and expressive elements of violin
performance, a multi-dimensional set of measurements was employed (Table 1),
capturing various aspects of the participants' movements, posture, and the resulting
musical output. Kinematic, kinetic, and acoustic parameters were the primary
categories of measurements used to gain a holistic understanding of how violinists
express musical styles and emotions.
1) Kinematic Measurements were primarily obtained through the motion capture
system and IMUs. These measurements included joint angles at the shoulder,
elbow, and wrist to assess the range of motion and the fluidity of bowing and
fingering movements. Segmental velocities, both linear and angular, of the upper
arm, forearm, and hand were also recorded to understand the speed and dynamics
of the violinist's techniques. The trajectory and angle of the bow relative to the
strings were meticulously tracked, providing insights into the nuances of bowing
style and technique that are essential for identifying variations in musical
expression. Additionally, postural adjustments of the head, torso, and lower body
were monitored to evaluate the violinist's overall posture and balance, capturing
subtle shifts in body movement that might indicate emotional expression or
stylistic interpretation.
2) Kinetic Measurements were collected using the force plate and IMUs to explore
the forces and torques involved in violin performance. Ground reaction forces
recorded by the force plate provided weight distribution and balance data during
different performance phases, highlighting how the violinist's stance and lower
body dynamics contributed to their overall expressiveness. The IMUs also
offered detailed information on the forces exerted by the arms and hands,
including the intensity of bowing and the pressure applied to the strings, which
are crucial for creating sound and emotional tone variations.
3) Acoustic Measurements were captured using a high-fidelity microphone,
synchronized with the MC and kinetic data. This allowed for a direct correlation
between PM and the resulting sound. Acoustic parameters such as sound
intensity, articulation, and vibrato were analyzed to understand how specific
biomechanical actions influenced the musical output. By integrating these
kinematic, kinetic, and sound measurements, the study aimed to unravel the
complex interplay between a violinist's physical movements and their expressive
musical performance, providing a detailed framework for understanding the
embodiment of musical styles and emotions.
Table 1. Measurements.

| Measurement type | Parameter | Description | Units |
|---|---|---|---|
| Kinematic | Joint angles | Angles at shoulder, elbow, and wrist joints. | Degrees (°) |
| Kinematic | Segmental velocities | Linear and angular velocities of the arm segments. | Meters per second (m/s), Degrees per second (°/s) |
| Kinematic | Bow trajectory and angle | Path and angle of the bow relative to violin strings. | Meters (m), Degrees (°) |
| Kinematic | Postural adjustments | Movements of the head, torso, and lower body. | Meters (m), Degrees (°) |
| Kinetic | Ground reaction forces | Forces exerted on the ground during the performance. | Newtons (N) |
| Kinetic | Arm and hand forces | Forces applied by arms and hands. | Newtons (N) |
| Acoustic | Sound intensity | The volume of the sound produced. | Decibels (dB) |
| Acoustic | Articulation | Clarity and distinctness of musical notes. | Qualitative (categorical) |
| Acoustic | Vibrato characteristics | Frequency and amplitude of vibrato. | Hertz (Hz), Amplitude (mm) |
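For illustration only, one way to carry the measurement schema of Table 1 through a processing pipeline is as a typed record per synchronized frame; the field names and example values below are hypothetical and simply mirror the table, not the study's actual data format.

```python
from dataclasses import dataclass

@dataclass
class PerformanceFrame:
    """One synchronized sample of kinematic, kinetic, and acoustic measurements."""
    # Kinematic (degrees, m/s, metres)
    shoulder_angle_deg: float
    elbow_angle_deg: float
    wrist_angle_deg: float
    bow_velocity_mps: float
    bow_angle_deg: float
    posture_displacement_m: float
    # Kinetic (newtons)
    ground_reaction_force_n: float
    arm_force_n: float
    hand_force_n: float
    # Acoustic (dB, Hz, mm, categorical)
    sound_intensity_db: float
    vibrato_frequency_hz: float
    vibrato_amplitude_mm: float
    articulation: str

# Example values loosely echoing the means reported later in Table 2.
frame = PerformanceFrame(47.0, 72.0, 23.0, 0.74, 12.0, 0.03,
                         347.6, 13.1, 7.1, 74.8, 5.7, 2.9, "legato")
print(frame.bow_velocity_mps)
```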
2.3. Conceptual framework
The conceptual framework of this study (Figure 1) integrates biomechanics,
music cognition, and ML to explore how violinists' movements are linked to musical
styles and emotional expression. The framework is built on the premise that the PM
involved in violin playing, such as bowing techniques, finger placements, and
postural adjustments, are not merely mechanical features but are deeply intertwined
with ME. By analyzing these movements alongside the resulting sound, the study aims
to decode the underlying patterns that characterize different musical styles and
emotional contexts.
Figure 1. Framework of the study.
Central to this framework is the dual analysis of biomechanical data and audio
output. The biomechanical component involves capturing the violinists' movements,
including joint angles, segmental velocities, and force dynamics, to understand the
physical expression of their performances. The acoustic aspect focuses on the audio
features of the performance, with a specific emphasis on how the energy distribution
across frequencies reflects the nuances of the music being played. By combining these
two aspects, the framework seeks to uncover the complex relationship between a
musician's physical movements and the expressive qualities of their performance.
The ML component is then employed to automate the identification of musical
styles and emotions. By processing both biomechanical and audio data, the framework
leverages advanced algorithms to identify patterns and correlations within this multi-
dimensional dataset. The aim is to develop a system that recognizes and categorizes
ME by analyzing how violinists embody different styles and emotions through their
movements. This integrated approach offers a holistic view of ME, advancing the
understanding of how PM and sound interact in violin playing.
2.4. Data collection
Data collection was meticulously designed to capture both the BM of the
violinists and the corresponding acoustic output during the performance. Participants
were instructed to perform a series of musical excerpts that spanned a range of styles
and emotional contexts. High-fidelity audio recordings were made of each
performance, which were then processed to extract detailed acoustic features. This
included capturing the music's frequency, intensity, and timbral features, providing
a nuanced view of the expressive elements in each piece.
In parallel, biomechanical data were collected using a motion capture system with
12 high-speed infrared cameras and wearable IMUs. Reflective markers were placed
on key anatomical landmarks, such as the head, shoulders, elbows, and wrists, to
capture the complex motions involved in violin playing. This setup enabled the precise
measurement of kinematic parameters, including joint angles, segmental velocities,
and bowing trajectories. The IMUs complemented this data by providing additional
insights into angular velocities and forces exerted by the arms and hands. A force plate
embedded in the floor measured ground reaction forces, showing the participants'
balance and weight distribution during the performance.
To ensure a comprehensive dataset, the audio and motion data were
synchronized, allowing for integrated analysis of how specific movements influenced
the acoustic output. Each musical excerpt was performed multiple times to capture
various expressive variations. The collected data were then processed into a format
suitable for ML analysis, with sequences representing temporal performance windows.
This comprehensive data collection approach aimed to provide a rich foundation for
examining the interplay between the biomechanical features of violin playing and the
resulting musical expression, facilitating an in-depth exploration of how physical
movement and sound are interconnected in conveying musical styles and emotions.
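As a rough illustration of how such synchronized, windowed sequences might be assembled, the sketch below pairs log-Mel-spectrogram frames (computed with librosa; the Mel-spectrogram itself is detailed in Section 2.5) with motion features resampled onto the audio frame grid, then cuts fixed-size windows. The synthetic signal, feature dimensions, hop length, and window size are placeholder assumptions, not the study's settings.

```python
import numpy as np
import librosa

# Synthetic stand-ins: 5 s of audio and motion features sampled at 100 Hz
# (in the study these come from the recordings and the motion-capture/IMU pipeline).
sr, hop_length, duration = 22050, 512, 5.0
t = np.linspace(0, duration, int(sr * duration), endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t)                          # placeholder audio
motion = np.random.randn(int(100 * duration), 12)              # placeholder motion features

# Log-scaled Mel-spectrogram frames, one column per hop (~23 ms).
mel = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length, n_mels=64)
mel_db = librosa.power_to_db(mel).T                            # shape (n_audio_frames, 64)

# Resample motion features onto the audio frame grid by nearest-index lookup.
n_frames = mel_db.shape[0]
idx = np.linspace(0, len(motion) - 1, n_frames).astype(int)
fused = np.hstack([motion[idx], mel_db])                       # (n_frames, 12 + 64)

# Cut fixed-size windows for the sequence model.
win = 100
windows = np.stack([fused[i:i + win] for i in range(0, n_frames - win + 1, win)])
print(windows.shape)                                           # (n_windows, win, n_features)
```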
2.5. Mel-Spectrogram-based LSTM
The Mel-spectrogram-based LSTM was employed to analyze the temporal
relationships between violinists' BM and the audio features of their performances. The
model utilized Mel-spectrograms, which provide a time-frequency representation of
the audio signal, combined with biomechanical data to capture patterns corresponding
to different musical styles and emotions. The LSTM was chosen due to its ability to
learn and model long-term dependencies in sequential data, making it ideal for
understanding the dynamic and expressive nature of violin performances.
1) Input to the LSTM: The input to the LSTM consisted of sequences of feature
vectors representing biomechanical and audio data. Each input sequence was
structured as follows:

$X = \{(b_1, m_1), (b_2, m_2), \ldots, (b_T, m_T)\}$   (1)

where $b_t$ is the biomechanical feature vector at time $t$, including joint angles,
velocities, and forces, and $m_t$ is the Mel-spectrogram frame at the same time $t$. Mel-spectrograms were derived from the audio signal using the Short-Time Fourier
Transform (STFT):

$\mathrm{STFT}\{x\}(k, l) = \sum_{n} x[n]\, w[n - lH]\, e^{-j 2\pi k n / N}$   (2)

where $w$ is the analysis window, $H$ the hop size, and $N$ the transform length. The Mel-spectrogram $M(f, l)$ was then calculated by mapping the frequency
components onto the Mel scale:

$M(f, l) = \sum_{k} \left|\mathrm{STFT}\{x\}(k, l)\right|^{2} H_{f}(k)$   (3)

where $H_{f}(k)$ is the Mel filter bank. This transformation provided a feature-rich
representation of the audio that captured the expressive elements of the musical
performance.
2) LSTM: The model's core comprised multiple layers of LSTM units. Each LSTM
unit maintained an internal cell state $c_t$ and a hidden state $h_t$ at each time step, and used
three gates (input, forget, and output) to regulate the flow of information:

Input gate $i_t$:

$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$   (4)

Forget gate $f_t$:

$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$   (5)

Output gate $o_t$:

$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$   (6)

Cell state update:

$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$   (7)

$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$   (8)

Hidden state $h_t$:

$h_t = o_t \odot \tanh(c_t)$   (9)

Here, $\sigma$ is the sigmoid activation function, $\tanh$ is the hyperbolic tangent
function, and $x_t = (b_t, m_t)$ is the input at time $t$. $W_i$, $W_f$, $W_o$, and $W_c$ are the weight matrices, while $b_i$, $b_f$, $b_o$, and $b_c$ are the
bias terms. These LSTM layers processed the input sequences, learning the temporal
dependencies between the biomechanical movements and the Mel-spectrogram
features.
3) Fully Connected Layer and Softmax Classification: After processing the
sequences through the LSTM layers, the output was passed to a fully connected
(dense) layer. This dense layer aggregated the learned features from the LSTM
layers, refining the representation of the data:

$z = W_{\text{dense}} h_T + b_{\text{dense}}$   (10)

where $W_{\text{dense}}$ and $b_{\text{dense}}$ are the weights and biases of the dense layer, and $h_T$ is the
final hidden state from the LSTM layers. The dense layer's output $z$ was then fed
into a softmax activation function to produce a probability distribution over the
classes:

$\hat{y}_c = \dfrac{e^{z_c}}{\sum_{c'} e^{z_{c'}}}$   (11)

where $\hat{y}_c$ is the predicted probability for class $c$. The softmax function ensured that
the output values summed to 1, making them interpretable as probabilities for
the different musical styles and emotional states.
4) Model Training: The model was trained using the categorical cross-entropy loss
function, which measured the discrepancy between the predicted class
probabilities and the true labels:

$\mathcal{L} = -\sum_{c} y_c \log \hat{y}_c$   (12)

where $y_c$ is the true label and $\hat{y}_c$ is the predicted probability for each class. The
model's parameters, including the weights of the LSTM and dense layers, were
updated using Backpropagation Through Time (BPTT) to minimize this loss. The
training process involved iteratively updating the model weights to improve its ability to classify
the input sequences accurately (a small numerical sketch of these updates is given after this list).
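To make Eqs. (4)–(12) concrete, the following NumPy sketch runs a single LSTM layer over a short synthetic sequence and applies the dense/softmax head and the cross-entropy loss. The dimensions, random weights, and synthetic inputs are illustrative assumptions rather than values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: x_t concatenates biomechanical (b_t) and Mel (m_t) features.
input_dim, hidden_dim, num_classes = 40, 64, 4

# One weight matrix and bias per gate, acting on [h_{t-1}, x_t] (Eqs. 4-7).
W = {g: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim)) for g in "ifoc"}
b = {g: np.zeros(hidden_dim) for g in "ifoc"}
W_dense = rng.normal(scale=0.1, size=(num_classes, hidden_dim))
b_dense = np.zeros(num_classes)

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM update, Eqs. (4)-(9)."""
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["i"] @ z + b["i"])        # input gate
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate
    o = sigmoid(W["o"] @ z + b["o"])        # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state
    c = f * c_prev + i * c_tilde            # Eq. (8)
    h = o * np.tanh(c)                      # Eq. (9)
    return h, c

# Run a short synthetic sequence through the cell.
T = 10
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for t in range(T):
    x_t = rng.normal(size=input_dim)        # stand-in for (b_t, m_t)
    h, c = lstm_step(x_t, h, c)

# Dense + softmax head, Eqs. (10)-(11).
z_out = W_dense @ h + b_dense
probs = np.exp(z_out - z_out.max())
probs /= probs.sum()

# Categorical cross-entropy against a one-hot label, Eq. (12).
y_true = np.eye(num_classes)[2]
loss = -np.sum(y_true * np.log(probs + 1e-12))
print("class probabilities:", np.round(probs, 3), "loss:", round(float(loss), 3))
```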
The final output layer provided a probability distribution over the classes,
allowing the model to classify each input sequence into one of the predefined
categories representing musical styles and emotions (Algorithm 1). By leveraging the
sequential nature of biomechanical and Mel-spectrogram data, the LSTM model could
recognize complex patterns that characterize different expressive nuances in violin
performances.
Algorithm 1 Mel-spectrogram-based LSTM model
1: Input:
2: Biomechanical Data: B = {b_1, ..., b_T}, where b_t is the biomechanical feature vector at time t.
3: Mel-Spectrogram Data: M = {m_1, ..., m_T}, where m_t is the Mel-spectrogram frame at time t.
4: Number of Epochs: E
5: Learning Rate: η
6: Output:
7: Predicted class probabilities for musical styles and emotions.
8: Steps:
9: Data Preprocessing:
10: 1.1. Clean and normalize biomechanical data to obtain features such as joint angles, segmental velocities, and forces.
11: 1.2. Convert audio recordings into Mel-spectrograms using the STFT and Mel filter banks.
12: 1.3. Synchronize biomechanical and Mel-spectrogram data into sequences X = {(b_1, m_1), (b_2, m_2), ..., (b_T, m_T)}.
13: 1.4. Normalize and segment the data into fixed-size windows.
14: Initialize LSTM Model Parameters:
15: 2.1. Randomly initialize weights W and biases b for the LSTM layers.
16: 2.2. Initialize weights W_dense and biases b_dense for the fully connected layer.
17: Training Phase:
18: For epoch = 1 to E:
19: 3.1. For each training sequence X = {(b_t, m_t)}:
20: 3.2. Initialize the LSTM cell state c_0 and hidden state h_0.
21: For t = 1 to T:
22: Compute LSTM gates: i_t = σ(W_i[h_{t−1}, x_t] + b_i), f_t = σ(W_f[h_{t−1}, x_t] + b_f), o_t = σ(W_o[h_{t−1}, x_t] + b_o)
23: Update cell state c_t: c̃_t = tanh(W_c[h_{t−1}, x_t] + b_c), c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
24: Update hidden state h_t: h_t = o_t ⊙ tanh(c_t)
25: End For
26: 3.3. Obtain the final hidden state h_T and pass it to the fully connected layer: z = W_dense h_T + b_dense
27: 3.4. Apply softmax to obtain class probabilities: ŷ_c = e^{z_c} / Σ_{c'} e^{z_{c'}}
28: 3.5. Compute the loss using categorical cross-entropy: L = −Σ_c y_c log ŷ_c
29: 3.6. Backpropagate the error through the LSTM and fully connected layers using BPTT to compute gradients.
30: 3.7. Update weights and biases using gradient descent: W ← W − η ∂L/∂W
31: 3.8. End For
32: Prediction Phase:
33: 4.1. For a new input sequence X_test, compute the class probabilities using the trained LSTM model following steps 3.1 to 3.4.
34: 4.2. Assign the class with the highest probability as the predicted label for musical style or emotion.
35: End Algorithm
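For comparison, the same architecture can be expressed with a high-level framework. The sketch below uses TensorFlow/Keras and assumes the preprocessing of steps 1.1–1.4 has already produced fixed-length windows X and one-hot labels y; the layer sizes, optimizer, epoch count, and validation split are illustrative choices, not settings reported in the paper.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative shapes: 200 windows, 100 time steps, 40 fused features
# (biomechanical + Mel-spectrogram), 4 classes (e.g., Calm/Energetic/Joyful/Melancholic).
num_seq, timesteps, num_features, num_classes = 200, 100, 40, 4
X = np.random.randn(num_seq, timesteps, num_features).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(num_classes, size=num_seq), num_classes)

model = models.Sequential([
    layers.Input(shape=(timesteps, num_features)),
    layers.LSTM(64, return_sequences=True),            # stacked LSTM layers model temporal dependencies
    layers.LSTM(64),                                    # final hidden state h_T
    layers.Dense(num_classes, activation="softmax"),    # Eqs. (10)-(11)
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",          # Eq. (12); gradients computed via BPTT
              metrics=["accuracy"])

model.fit(X, y, validation_split=0.2, epochs=5, batch_size=16, verbose=0)

# Prediction phase: assign the class with the highest probability (step 4.2).
probs = model.predict(X[:3], verbose=0)
print("predicted classes:", probs.argmax(axis=1))
```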
3. Results
3.1. Descriptive statistics
The descriptive statistics for the kinematic data, as shown in Table 2 and Figure
2, indicate a mean shoulder joint angle of 47° with a Standard Deviation (SD) of 12°,
ranging from 26° to 67°. The elbow joint angle averaged 72° with an 11° SD, spanning
51° to 89°, while the wrist joint angle had a mean of 23° with an 8° deviation, ranging
from 11° to 39°. Bowing velocity showed a mean of 0.74 m/s, with a SD of 0.21 m/s,
and varied between 0.32 m/s and 1.18 m/s. The bowing trajectory deviation averaged
7° with a 4° SD, ranging from 2° to 11°. For the kinetic data, the ground reaction
force had a mean of 347.6 N and an SD of 64.9 N, ranging from 251.3 N to 468.7 N.
The arm force averaged 13.1 N with a 3.3 N SD, from 7.2 N to 18.9 N. Hand force
showed a mean of 7.1 N with an SD of 2.3 N, ranging from 3.4 N to 10.7 N. Regarding
acoustic data, the sound intensity had a mean of 74.8 dB and an SD of 5.4 dB, with
values ranging from 66.1 dB to 87.6 dB. Vibrato frequency averaged 5.7 Hz with a
0.7 Hz SD, ranging from 4.3 Hz to 6.9 Hz. Vibrato amplitude had a mean of 2.9 mm with
a 1.2 mm deviation, ranging from 1.1 mm to 4.4 mm. Articulation clarity, measured
on a scale of 1 to 5, averaged 4.1 with an SD of 0.6, ranging from 3.1 to 4.9.
Table 2. Results for the descriptive statistics.

| Parameter | Mean | Standard deviation | Minimum | Maximum |
|---|---|---|---|---|
| Kinematic data | | | | |
| Shoulder joint angle (°) | 47 | 12 | 26 | 67 |
| Elbow joint angle (°) | 72 | 11 | 51 | 89 |
| Wrist joint angle (°) | 23 | 8 | 11 | 39 |
| Bowing velocity (m/s) | 0.74 | 0.21 | 0.32 | 1.18 |
| Bowing trajectory deviation (°) | 7 | 4 | 2 | 11 |
| Kinetic data | | | | |
| Ground reaction force (N) | 347.6 | 64.9 | 251.3 | 468.7 |
| Arm force (N) | 13.1 | 3.3 | 7.2 | 18.9 |
| Hand force (N) | 7.1 | 2.3 | 3.4 | 10.7 |
| Acoustic data | | | | |
| Sound intensity (dB) | 74.8 | 5.4 | 66.1 | 87.6 |
| Vibrato frequency (Hz) | 5.7 | 0.7 | 4.3 | 6.9 |
| Vibrato amplitude (mm) | 2.9 | 1.2 | 1.1 | 4.4 |
| Articulation clarity (scale 1–5) | 4.1 | 0.6 | 3.1 | 4.9 |
Figure 2. Results of descriptive statistics.
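Summary statistics of the kind reported in Table 2 can be reproduced from a per-window feature table with a few lines of pandas; the sketch below uses hypothetical column names and synthetic values rather than the study's data.

```python
import numpy as np
import pandas as pd

# Hypothetical per-window feature table; in the study these values would come
# from the synchronized motion-capture, IMU, force-plate, and audio pipeline.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "shoulder_angle_deg": rng.normal(47, 12, 300),
    "bowing_velocity_mps": rng.normal(0.74, 0.21, 300),
    "sound_intensity_db": rng.normal(74.8, 5.4, 300),
})

# Mean, SD, minimum, and maximum per parameter, as laid out in Table 2.
summary = df.agg(["mean", "std", "min", "max"]).T.round(2)
print(summary)
```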
3.2. Biomechanical analysis
The kinematic findings across different musical styles and emotional contexts are
displayed in Table 3 and Figure 3, which show that the shoulder joint angle had a
mean of 44° (SD: 10°) in “Calm” performances, ranging from 28° to 61°, while in
“Energetic” performances, it averaged 53° (SD: 14°) with a range of 35° to 72°. In
“Joyful” contexts, the mean was 48° (SD: 12°) with a range from 32° to 65°, and in
“Melancholic” contexts, the mean was 42° (SD: 9°) with a range of 27° to 55°. The
elbow joint angle showed a mean of 70° (SD: 8°) for “Calm,” with a range from 55°
to 83°, while “Energetic” performances had a higher mean of 76° (SD: 10°) ranging
from 58° to 89°. In “Joyful” contexts, the mean was 74° (SD: 9°) with a range from
57° to 88°, and “Melancholic” had the lowest mean at 68° (SD: 7°), ranging from 51°
to 82°. Wrist joint angle in “Calm” performances had a mean of 22° (SD: 7°), ranging
from 10° to 37°, while in “Energetic” contexts, it averaged 27° (SD: 9°) with a range
of 13° to 41°. For “Joyful,” the mean was 25° (SD: 8°) with a range of 12° to 39°, and
in “Melancholic,” the mean was 20° (SD: 6°) with a range of 9° to 32°. Bowing
velocity had a mean of 0.61 m/s (SD: 0.18 m/s) for “Calm” performances, ranging
from 0.35 to 0.93 m/s. In “Energetic” contexts, it increased to a mean of 0.89 m/s (SD:
0.24 m/s), ranging from 0.48 to 1.23 m/s. “Joyful” performances showed a mean of
0.78 m/s (SD: 0.21 m/s), ranging from 0.44 to 1.11 m/s, and “Melancholic” had the
lowest mean at 0.58 m/s (SD: 0.16 m/s), with a range of 0.30 to 0.82 m/s. Bowing
trajectory deviation in “Calm” performances had a mean of 6° (SD: 2°), with a range
from 2° to 10°. In “Energetic” contexts, the mean increased to 9° (SD: 3°), ranging
from 4° to 14°. “Joyful” performances had a mean of 8° (SD: 3°) with a range of 3° to
12°, while “Melancholic” performances had the lowest mean at 5° (SD: 2°), ranging
from 1° to 9°.
Table 3. Kinematic findings across different musical styles and emotional contexts.

| Kinematic parameter | Statistic | Calm | Energetic | Joyful | Melancholic |
|---|---|---|---|---|---|
| Shoulder joint angle (°) | Mean | 44 | 53 | 48 | 42 |
| | SD | 10 | 14 | 12 | 9 |
| | Range | 28–61 | 35–72 | 32–65 | 27–55 |
| Elbow joint angle (°) | Mean | 70 | 76 | 74 | 68 |
| | SD | 8 | 10 | 9 | 7 |
| | Range | 55–83 | 58–89 | 57–88 | 51–82 |
| Wrist joint angle (°) | Mean | 22 | 27 | 25 | 20 |
| | SD | 7 | 9 | 8 | 6 |
| | Range | 10–37 | 13–41 | 12–39 | 9–32 |
| Bowing velocity (m/s) | Mean | 0.61 | 0.89 | 0.78 | 0.58 |
| | SD | 0.18 | 0.24 | 0.21 | 0.16 |
| | Range | 0.35–0.93 | 0.48–1.23 | 0.44–1.11 | 0.30–0.82 |
| Bowing trajectory deviation (°) | Mean | 6 | 9 | 8 | 5 |
| | SD | 2 | 3 | 3 | 2 |
| | Range | 2–10 | 4–14 | 3–12 | 1–9 |
Figure 3. Kinematic findings for: (a) shoulder joint angle (°); (b) elbow joint angle (°); (c) wrist joint angle (°); (d)
bowing velocity (m/s); (e) bowing trajectory deviation (°).
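Per-context summaries such as those in Tables 3 and 4 amount to a group-by aggregation over the emotional-context label; a minimal pandas sketch with hypothetical column names and synthetic data follows.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format table: one row per performance window,
# labelled with its emotional context.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "context": rng.choice(["Calm", "Energetic", "Joyful", "Melancholic"], size=400),
    "shoulder_angle_deg": rng.normal(47, 12, 400),
    "bowing_velocity_mps": rng.normal(0.74, 0.21, 400),
})

# Mean, SD, and range per context, mirroring the layout of Tables 3 and 4.
stats = (df.groupby("context")
           .agg(["mean", "std", "min", "max"])
           .round(2))
print(stats)
```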
3.3. Kinetic findings
The kinetic findings across different musical styles and emotional contexts are
shown in Table 4 and Figure 4, and they indicate that the ground reaction force in
“Calm” performances had a mean of 312 N (SD: 47 N), ranging from 257 N to 416 N.
In “Energetic” performances, the mean increased to 368 N (SD: 59 N), ranging from
275 N to 474 N. “Joyful” contexts showed a mean of 342 N (SD: 53 N), with a range
of 263 N to 435 N, while “Melancholic” had the lowest mean at 298 N (SD: 44 N),
ranging from 240 N to 386 N. Arm force had a mean of 11.2 N (SD: 3.1 N) for “Calm”
performances, ranging from 6.5 N to 16.8 N. In “Energetic” contexts, the mean
increased to 14.7 N (SD: 3.8 N), ranging from 8.4 N to 19.5 N. “Joyful” performances
had a mean of 13.3 N (SD: 3.4 N), ranging from 7.1 N to 17.6 N, while “Melancholic”
had the lowest mean arm force at 10.6 N (SD: 2.9 N), with a range of 6.2 N to 15.0 N.
Hand force in “Calm” performances had a mean of 5.8 N (SD: 1.7 N), ranging from
3.4 N to 9.2 N. In “Energetic” contexts, the mean was 7.6 N (SD: 2.1 N), ranging from
4.0 N to 10.7 N. “Joyful” performances showed a mean of 6.9 N (SD: 1.9 N), with a
range from 3.8 N to 9.9 N. “Melancholic” had the lowest hand force mean at 5.2 N
(SD: 1.6 N), ranging from 3.0 N to 8.3 N. Force plate balance in “Calm” performances
had a mean distribution of 48/52% (SD: 3/3%), with a range from 44/56% to 51/49%.
In “Energetic” contexts, the mean balance was 45/55% (SD: 4/4%), ranging from
40/60% to 50/50%. The mean balance for “Joyful” performances was 47/53% (SD:
3/3%), ranging from 43/57% to 51/49%. “Melancholic” performances exhibited a
balanced mean of 50/50% (SD: 2/2%), ranging from 47/53% to 52/48%.
Table 4. The kinetic findings across different musical styles and emotional contexts.

| Kinetic parameter | Statistic | Calm | Energetic | Joyful | Melancholic |
|---|---|---|---|---|---|
| Ground reaction force (N) | Mean | 312 | 368 | 342 | 298 |
| | SD | 47 | 59 | 53 | 44 |
| | Range | 257–416 | 275–474 | 263–435 | 240–386 |
| Arm force (N) | Mean | 11.2 | 14.7 | 13.3 | 10.6 |
| | SD | 3.1 | 3.8 | 3.4 | 2.9 |
| | Range | 6.5–16.8 | 8.4–19.5 | 7.1–17.6 | 6.2–15.0 |
| Hand force (N) | Mean | 5.8 | 7.6 | 6.9 | 5.2 |
| | SD | 1.7 | 2.1 | 1.9 | 1.6 |
| | Range | 3.4–9.2 | 4.0–10.7 | 3.8–9.9 | 3.0–8.3 |
| Force plate balance (%) | Mean | 48/52 | 45/55 | 47/53 | 50/50 |
| | SD | 3/3 | 4/4 | 3/3 | 2/2 |
| | Range | 44/56–51/49 | 40/60–50/50 | 43/57–51/49 | 47/53–52/48 |
Figure 4. Kinetic findings for: (a) ground reaction force (N); (b) arm force (N); (c) hand force (N); (d) force plate
balance (%) for left balance; (e) force plate balance (%) for right balance.
3.4. Correlation with audio features
The correlations between key biomechanical parameters and acoustic features are
shown in Table 5 and Figure 5, which reveal that shoulder joint angle had a moderate
positive correlation with sound intensity across all contexts, being strongest in
“Energetic” (0.68) and weakest in “Melancholic” (0.38). The correlation between
shoulder joint angle and vibrato amplitude was also highest in “Energetic” (0.60) and
lowest in “Melancholic” (0.30). The elbow joint angle showed a moderate correlation
with sound intensity, peaking in “Energetic” (0.63) and lowest in “Melancholic”
(0.35). Its correlation with vibrato frequency was highest in “Energetic” (0.58) and
lowest in “Melancholic” (0.32). Bowing velocity exhibited a strong correlation with
sound intensity, particularly in “Energetic” (0.75) and a moderate one in
“Melancholic” (0.50). Its correlation with articulation clarity was highest in
“Energetic” (0.70) and lowest in “Melancholic” (0.46). Ground reaction force showed
a moderate correlation with sound intensity, being highest in “Energetic” (0.57) and
lowest in “Melancholic” (0.33). Arm force had a moderate to strong correlation with
vibrato amplitude, strongest in “Energetic” (0.65) and weakest in “Melancholic”
(0.37). Its correlation with sound intensity was highest in “Energetic” (0.69) and
lowest in “Melancholic” (0.39). Hand force had a moderate correlation with
articulation clarity, peaking in “Energetic” (0.63) and lowest in “Melancholic” (0.40).
Table 5. The correlations between key biomechanical (kinematic and kinetic) parameters and acoustic features across
different musical styles and emotional contexts.

| Biomechanical parameter | Acoustic feature | Calm | Energetic | Joyful | Melancholic |
|---|---|---|---|---|---|
| Shoulder joint angle (°) | Sound intensity (dB) | 0.45 | 0.68 | 0.52 | 0.38 |
| | Vibrato amplitude (mm) | 0.34 | 0.60 | 0.41 | 0.30 |
| Elbow joint angle (°) | Sound intensity (dB) | 0.40 | 0.63 | 0.48 | 0.35 |
| | Vibrato frequency (Hz) | 0.37 | 0.58 | 0.44 | 0.32 |
| Bowing velocity (m/s) | Sound intensity (dB) | 0.56 | 0.75 | 0.62 | 0.50 |
| | Articulation clarity | 0.48 | 0.70 | 0.55 | 0.46 |
| Ground reaction force (N) | Sound intensity (dB) | 0.38 | 0.57 | 0.49 | 0.33 |
| Arm force (N) | Vibrato amplitude (mm) | 0.42 | 0.65 | 0.50 | 0.37 |
| | Sound intensity (dB) | 0.44 | 0.69 | 0.54 | 0.39 |
| Hand force (N) | Articulation clarity | 0.46 | 0.63 | 0.51 | 0.40 |
Figure 5. Correlations between: (a) shoulder joint angle (°) and sound intensity (dB); (b) shoulder joint angle (°)
and vibrato amplitude (mm); (c) elbow joint angle (°) and sound intensity (dB); (d) elbow joint angle (°) and vibrato frequency
(Hz); (e) bowing velocity (m/s) and sound intensity (dB); (f) bowing velocity (m/s) and articulation clarity; (g) ground
reaction force (N) and sound intensity (dB); (h) arm force (N) and vibrato amplitude (mm); (i) arm force (N) and sound
intensity (dB); (j) hand force (N) and articulation clarity.
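Context-wise correlations of this kind can be computed directly from the synchronized feature table; the sketch below uses pandas and SciPy with hypothetical column names and synthetic data, and is not the study's analysis code.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical synchronized per-window features with an emotional-context label.
rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "context": rng.choice(["Calm", "Energetic", "Joyful", "Melancholic"], size=n),
    "bowing_velocity_mps": rng.normal(0.74, 0.21, n),
    "sound_intensity_db": rng.normal(74.8, 5.4, n),
})

# Pearson r between a biomechanical parameter and an acoustic feature,
# computed separately for each context (as in Table 5).
for context, group in df.groupby("context"):
    r, p = pearsonr(group["bowing_velocity_mps"], group["sound_intensity_db"])
    print(f"{context:12s} r = {r:+.2f} (p = {p:.3f})")
```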
3.5. Audio analysis
The Mel-spectrogram features across different musical styles and emotional
contexts are shown in Table 6 and Figure 6, and they indicate that peak frequency
was highest in “Energetic” performances with a mean of 947 Hz (SD: 113 Hz), ranging
from 779 Hz to 1103 Hz, and lowest in “Melancholic” with a mean of 553 Hz (SD: 68
Hz), ranging from 423 Hz to 668 Hz. In “Calm” and “Joyful” contexts, the means were
603 Hz (SD: 77 Hz) and 807 Hz (SD: 92 Hz), respectively. The spectral centroid,
representing the “brightness” of the sound, was highest in “Energetic” performances
with a mean of 1093 Hz (SD: 117 Hz), ranging from 903 Hz to 1253 Hz, and lowest
in “Melancholic” at 647 Hz (SD: 71 Hz), with a range from 523 Hz to 779 Hz. “Calm”
and “Joyful” had means of 703 Hz (SD: 82 Hz) and 953 Hz (SD: 98 Hz), respectively.
Spectral bandwidth, indicating the range of frequencies, was broadest in “Energetic”
with a mean of 273 Hz (SD: 41 Hz), ranging from 203 Hz to 319 Hz, and narrowest
in “Melancholic” with a mean of 163 Hz (SD: 24 Hz), ranging from 121 Hz to 199
Hz. In “Calm” and “Joyful,” the means were 183 Hz (SD: 31 Hz) and 227 Hz (SD: 36
Hz), respectively. Spectral contrast was highest in “Energetic” performances with a
mean of 31 dB (SD: 6 dB), ranging from 25 dB to 37 dB, and lowest in “Melancholic”
at 21 dB (SD: 2 dB), ranging from 17 dB to 27 dB. “Calm” and “Joyful” had means
of 23 dB (SD: 3 dB) and 27 dB (SD: 4 dB), respectively. Spectral flatness, which
indicates the noisiness of the sound, was highest in “Melancholic” with a mean of 0.34
(SD: 0.02), ranging from 0.31 to 0.39, and lowest in “Energetic” with a mean of 0.22
(SD: 0.03), ranging from 0.17 to 0.28. “Calm” and “Joyful” had means of 0.33 (SD:
0.04) and 0.27 (SD: 0.05), respectively. The zero-crossing rate, indicating the rate of
signal changes, was highest in “Energetic” with a mean of 0.13 (SD: 0.03), ranging
from 0.09 to 0.15, and lowest in “Melancholic” with a mean of 0.06 (SD: 0.01),
ranging from 0.05 to 0.08. “Calm” and “Joyful” contexts had means of 0.09 (SD: 0.02)
and 0.11 (SD: 0.02), respectively.
Table 6. Mel-spectrogram features across different musical styles and emotional contexts.

| Mel-spectrogram feature | Statistic | Calm | Energetic | Joyful | Melancholic |
|---|---|---|---|---|---|
| Peak frequency (Hz) | Mean | 603 | 947 | 807 | 553 |
| | SD | 77 | 113 | 92 | 68 |
| | Range | 482–719 | 779–1103 | 654–963 | 423–668 |
| Spectral centroid (Hz) | Mean | 703 | 1093 | 953 | 647 |
| | SD | 82 | 117 | 98 | 71 |
| | Range | 552–849 | 903–1253 | 751–1103 | 523–779 |
| Spectral bandwidth (Hz) | Mean | 183 | 273 | 227 | 163 |
| | SD | 31 | 41 | 36 | 24 |
| | Range | 131–229 | 203–319 | 181–279 | 121–199 |
| Spectral contrast (dB) | Mean | 23 | 31 | 27 | 21 |
| | SD | 3 | 6 | 4 | 2 |
| | Range | 18–29 | 25–37 | 23–33 | 17–27 |
| Spectral flatness | Mean | 0.33 | 0.22 | 0.27 | 0.34 |
| | SD | 0.04 | 0.03 | 0.05 | 0.02 |
| | Range | 0.26–0.39 | 0.17–0.28 | 0.21–0.33 | 0.31–0.39 |
| Zero-crossing rate | Mean | 0.09 | 0.13 | 0.11 | 0.06 |
| | SD | 0.02 | 0.03 | 0.02 | 0.01 |
| | Range | 0.06–0.10 | 0.09–0.15 | 0.08–0.13 | 0.05–0.08 |
Figure 6. Mel-spectrogram attributes for: (a) peak frequency (Hz); (b) spectral centroid (Hz); (c) spectral bandwidth
(Hz); (d) spectral contrast (dB); (e) spectral flatness; (f) zero-crossing rate.
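Spectral descriptors such as those in Table 6 are standard outputs of audio-analysis libraries; the librosa sketch below computes them on a synthetic tone rather than the study's recordings, with peak frequency taken as the strongest FFT bin.

```python
import numpy as np
import librosa

# Synthetic stand-in for a recorded excerpt: 1 s of a 440 Hz tone plus noise.
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(sr)

spectrum = np.abs(np.fft.rfft(y))
features = {
    "peak_frequency_hz": np.fft.rfftfreq(len(y), d=1 / sr)[np.argmax(spectrum)],
    "spectral_centroid_hz": librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
    "spectral_bandwidth_hz": librosa.feature.spectral_bandwidth(y=y, sr=sr).mean(),
    "spectral_contrast_db": librosa.feature.spectral_contrast(y=y, sr=sr).mean(),
    "spectral_flatness": librosa.feature.spectral_flatness(y=y).mean(),
    "zero_crossing_rate": librosa.feature.zero_crossing_rate(y).mean(),
}
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```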
3.6. Sound and expressiveness
The sound features and their association with expressive elements across
different musical styles and emotional contexts are depicted in Table 7 and Figure 7,
which reveal that sound intensity was highest in “Energetic” performances with a
mean of 83.7 dB (SD: 6.3 dB), ranging from 72.5 dB to 92.4 dB, and lowest in
“Melancholic” with a mean of 71.9 dB (SD: 4.9 dB), ranging from 63.2 dB to 80.8
dB. In “Calm” and “Joyful” contexts, the means were 74.2 dB (SD: 5.2 dB) and 78.6
dB (SD: 5.7 dB), respectively. Vibrato frequency, indicating the rate of pitch variation,
was highest in “Energetic” performances with a mean of 6.8 Hz (SD: 0.9 Hz), ranging
from 5.4 Hz to 7.9 Hz, and lowest in “Melancholic” with a mean of 4.8 Hz (SD: 0.6
Hz), ranging from 3.9 Hz to 5.8 Hz. “Calm” and “Joyful” had means of 5.4 Hz (SD:
0.7 Hz) and 6.1 Hz (SD: 0.8 Hz), respectively. Vibrato amplitude, reflecting the extent
of pitch variation, was widest in “Energetic” performances with a mean of 3.4 mm
(SD: 1.3 mm), ranging from 2.0 mm to 5.6 mm, and narrowest in “Melancholic” with
a mean of 2.3 mm (SD: 0.8 mm), ranging from 1.0 mm to 3.6 mm. “Calm” and
“Joyful” contexts had means of 2.7 mm (SD: 1.0 mm) and 3.1 mm (SD: 1.1 mm),
respectively. Articulation clarity, measured on a scale of 1 to 5, was highest in
“Energetic” performances with a mean of 4.6 (SD: 0.5), ranging from 4.0 to 5.0, and
lowest in “Melancholic” with a mean of 3.9 (SD: 0.5), ranging from 3.0 to 4.8. “Calm”
and “Joyful” had means of 4.1 (SD: 0.6) and 4.4 (SD: 0.7), respectively. Dynamic
range, indicating the difference between the loudest and softest sounds, was largest
in “Energetic” performances with a mean of 24.9 dB (SD: 4.5 dB), ranging
from 19.0 dB to 29.8 dB, and smallest in “Melancholic” with a mean of 17.3 dB
(SD: 3.2 dB), ranging from 13.1 dB to 22.6 dB. “Calm” and “Joyful” contexts had
means of 18.4 dB (SD: 3.7 dB) and 21.7 dB (SD: 4.0 dB), respectively.
Table 7. Sound features and their association with expressive elements across different musical styles and emotional
contexts.

| Sound characteristic | Statistic | Calm | Energetic | Joyful | Melancholic |
|---|---|---|---|---|---|
| Sound intensity (dB) | Mean | 74.2 | 83.7 | 78.6 | 71.9 |
| | SD | 5.2 | 6.3 | 5.7 | 4.9 |
| | Range | 65.1–84.3 | 72.5–92.4 | 69.4–87.5 | 63.2–80.8 |
| Vibrato frequency (Hz) | Mean | 5.4 | 6.8 | 6.1 | 4.8 |
| | SD | 0.7 | 0.9 | 0.8 | 0.6 |
| | Range | 4.3–6.5 | 5.4–7.9 | 4.9–7.2 | 3.9–5.8 |
| Vibrato amplitude (mm) | Mean | 2.7 | 3.4 | 3.1 | 2.3 |
| | SD | 1.0 | 1.3 | 1.1 | 0.8 |
| | Range | 1.2–4.5 | 2.0–5.6 | 1.5–4.8 | 1.0–3.6 |
| Articulation clarity (scale 1–5) | Mean | 4.1 | 4.6 | 4.4 | 3.9 |
| | SD | 0.6 | 0.5 | 0.7 | 0.5 |
| | Range | 3.2–4.9 | 4.0–5.0 | 3.3–5.0 | 3.0–4.8 |
| Dynamic range (dB) | Mean | 18.4 | 24.9 | 21.7 | 17.3 |
| | SD | 3.7 | 4.5 | 4.0 | 3.2 |
| | Range | 12.5–24.7 | 19.0–29.8 | 15.6–26.4 | 13.1–22.6 |
Figure 7. Sound characteristics and their association with expressive elements: (a) sound intensity (dB); (b) vibrato frequency (Hz); (c)
vibrato amplitude (mm); (d) articulation clarity; (e) dynamic range (dB).
3.7. ML model performance
As displayed in Table 8 and Figure 8, during the ML model's training and
validation process, the key performance metrics show a steady decrease in training
and validation loss over the epochs, with training loss starting at 1.203 in epoch 1 and
reducing to 0.277 by epoch 100. Validation loss followed a similar trend, decreasing
from 1.315 to 0.459 over the same period. Training accuracy increased consistently
from 58.4% at epoch 1 to 96.5% at epoch 100, while validation accuracy improved
from 54.3% to 92.5%. Precision started at 55.6% in epoch 1 and rose to 94.3% by
epoch 100. Recall followed a similar pattern, beginning at 53.7% and reaching 92.6%.
The F1-score, reflecting the harmonic mean of precision and recall, improved from
54.6% at the start of training to 93.4% at epoch 100. These metrics indicate a consistent
improvement in the model's ability to accurately identify musical styles and emotions,
with both training and validation metrics stabilizing in the latter epochs, suggesting
effective learning and generalization.
Table 8. Performance metrics during the ML model's training and validation process.

| Epoch | Training loss | Validation loss | Training accuracy (%) | Validation accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|---|---|
| 1 | 1.203 | 1.315 | 58.4 | 54.3 | 55.6 | 53.7 | 54.6 |
| 5 | 0.854 | 0.926 | 72.1 | 68.9 | 70.2 | 67.8 | 69.0 |
| 10 | 0.672 | 0.705 | 81.4 | 78.3 | 79.1 | 77.5 | 78.3 |
| 15 | 0.551 | 0.622 | 86.5 | 82.7 | 84.6 | 81.2 | 82.9 |
| 20 | 0.497 | 0.583 | 88.3 | 84.6 | 86.1 | 83.4 | 84.7 |
| 25 | 0.459 | 0.548 | 89.7 | 86.1 | 87.8 | 85.2 | 86.5 |
| 30 | 0.430 | 0.524 | 90.8 | 87.5 | 89.1 | 86.9 | 88.0 |
| 35 | 0.410 | 0.510 | 91.6 | 88.2 | 89.7 | 87.4 | 88.5 |
| 40 | 0.391 | 0.501 | 92.3 | 88.7 | 90.4 | 88.1 | 89.2 |
| 45 | 0.376 | 0.492 | 92.8 | 89.1 | 90.9 | 88.6 | 89.7 |
| 50 | 0.362 | 0.486 | 93.3 | 89.6 | 91.5 | 89.2 | 90.3 |
| 55 | 0.349 | 0.481 | 93.7 | 90.1 | 91.8 | 89.9 | 90.8 |
| 60 | 0.337 | 0.476 | 94.1 | 90.4 | 92.2 | 90.3 | 91.2 |
| 65 | 0.327 | 0.473 | 94.5 | 90.8 | 92.6 | 90.7 | 91.6 |
| 70 | 0.318 | 0.470 | 94.8 | 91.0 | 92.9 | 91.1 | 92.0 |
| 75 | 0.310 | 0.467 | 95.1 | 91.3 | 93.1 | 91.4 | 92.3 |
| 80 | 0.302 | 0.465 | 95.4 | 91.6 | 93.4 | 91.7 | 92.5 |
| 85 | 0.295 | 0.463 | 95.7 | 91.8 | 93.7 | 91.9 | 92.8 |
| 90 | 0.288 | 0.461 | 96.0 | 92.1 | 93.9 | 92.2 | 93.0 |
| 95 | 0.282 | 0.460 | 96.2 | 92.3 | 94.1 | 92.4 | 93.2 |
| 100 | 0.277 | 0.459 | 96.5 | 92.5 | 94.3 | 92.6 | 93.4 |
Figure 8. Performance of the ML model: (a) training and validation loss; (b) training and validation accuracy;
(c) precision, recall, and F1-score.
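Accuracy, precision, recall, and F1-score of the kind reported in Table 8 can be obtained from predicted and true labels with scikit-learn; the sketch below uses synthetic labels, and macro averaging is an assumption since the paper does not state the averaging scheme.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic stand-ins for validation-set labels over four style/emotion classes.
rng = np.random.default_rng(4)
y_true = rng.integers(0, 4, size=200)
y_pred = np.where(rng.random(200) < 0.9, y_true, rng.integers(0, 4, size=200))

print("accuracy :", round(accuracy_score(y_true, y_pred), 3))
print("precision:", round(precision_score(y_true, y_pred, average="macro"), 3))
print("recall   :", round(recall_score(y_true, y_pred, average="macro"), 3))
print("F1-score :", round(f1_score(y_true, y_pred, average="macro"), 3))
```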
4. Conclusion and future work
The findings of this study demonstrate the effectiveness of integrating
biomechanical analysis with ML, specifically Mel-spectrogram-based LSTM
modeling, to automate the identification of musical styles and emotions in violin
performances. By analyzing synchronized biomechanical and acoustic data, the study
reveals a strong relationship between a violinist’s physical movements and the
expressive qualities of their performance. Key biomechanical parameters such as joint
angles, bowing velocity, and force dynamics correlate significantly with audio features
like sound intensity, vibrato frequency, and articulation clarity, highlighting the
nuanced ways musicians convey different styles and emotions. The LSTM model
showed high accuracy in classifying musical styles and emotions, achieving a
validation accuracy of 92.5%, with precision, recall, and F1-score exceeding 92%.
These results indicate that ML can effectively capture the complex temporal patterns
linking PM and ME, contributing to a more objective and comprehensive
understanding than traditional qualitative methods. This automated approach has
significant implications for music analysis, education, and interactive performance
systems, providing a framework for real-time feedback and training tools for
musicians. Despite the promising outcomes, this study also highlights areas for further
exploration. The model’s performance, while robust, could be enhanced by
incorporating additional features such as emotional context in real-time performance
settings.
Future work could extend this framework to other musical instruments and
genres, furthering our understanding of the interplay between biomechanics and ME.
Overall, this study contributes to bridging the gap between the artistic and scientific
perspectives on music performance, demonstrating the potential of combining
biomechanics and machine learning to enrich our understanding of violin playing.
Ethical approval: Not applicable.
Conflict of interest: The author declares no conflict of interest.
References
1. Kucherenko S, Sediuk I. Aesthetic Experience and Its Expressions in Music Performance. International Review of the Aesthetics and Sociology of Music. 2020; 51(1): 19–28.
2. Bedoya D, Arias P, Rachman L, et al. Even violins can cry: specifically vocal emotional behaviours also drive the perception of emotions in non-vocal music. Philosophical Transactions of the Royal Society B: Biological Sciences. 2021; 376(1840). doi: 10.1098/rstb.2020.0396
3. Trollmo S. A method-based approach to historical violin playing: performance practice from a contemporary perspective [PhD thesis]. Boston University; 2020.
4. Kim N. Expressivity in the Melodic Line: Developing Musicality in Violin Students [PhD thesis]. The Florida State University; 2020.
5. Meissner H. Theoretical Framework for Facilitating Young Musicians' Learning of Expressive Performance. Frontiers in Psychology. 2021; 11: 584171. doi: 10.3389/fpsyg.2020.584171
6. Lerch A, Arthur C, Pati A, Gururani S. An interdisciplinary review of music performance analysis. Transactions of the International Society for Music Information Retrieval. 2021; 3(1): 221–245. doi: 10.5334/tismir.53
7. Wolf E, Möller D, Ballenberger N, et al. Marker-Based Method for Analyzing the Three-Dimensional Upper Body Kinematics of Violinists: Reproducibility. Medical Problems of Performing Artists. 2022; 37(3): 176–191. doi: 10.21091/mppa.2022.3025
8. Turner C, Visentin P, Oye D, et al. Pursuing Artful Movement Science in Music Performance: Single Subject Motor Analysis with Two Elite Pianists. Perceptual and Motor Skills. 2021; 128(3): 1252–1274. doi: 10.1177/00315125211003493
9. Kyriakou T, de la Campa Crespo MÁ, Panayiotou A, et al. Virtual Instrument Performances (VIP): A Comprehensive Review. Computer Graphics Forum. 2024; 43(2). doi: 10.1111/cgf.15065
10. Erdem C, Lan Q, Jensenius AR. Exploring relationships between effort, motion, and sound in new musical instruments. Human Technology. 2020; 16(3): 310–347.
11. Freire S, Santos G, Armondes A, et al. Evaluation of Inertial Sensor Data by a Comparison with Optical Motion Capture Data of Guitar Strumming Gestures. Sensors. 2020; 20(19): 5722. doi: 10.3390/s20195722
12. Jensenius AR. Sound actions: Conceptualizing musical instruments. MIT Press; 2022.
13. Sharma G. Audio Texture Analysis of Biomedical Audio Signals [PhD thesis]. Toronto Metropolitan University; 2023.
14. Gupta C, Li H, Goto M. Deep learning approaches in topics of singing information processing. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2022: 2422–2451.
15. Alfaras M, Primett W, Umair M, et al. Biosensing and Actuation – Platforms Coupling Body Input-Output Modalities for Affective Technologies. Sensors. 2020; 20(21): 5968. doi: 10.3390/s20215968
16. Napier T, Ahn E, Allen-Ankins S, et al. Advancements in preprocessing, detection and classification techniques for ecoacoustic data: A comprehensive review for large-scale Passive Acoustic Monitoring. Expert Systems with Applications. 2024; 252: 124220. doi: 10.1016/j.eswa.2024.124220
17. Van Houdt G, Mosquera C, Nápoles G. A review on the long short-term memory model. Artificial Intelligence Review. 2020; 53(8): 5929–5955. doi: 10.1007/s10462-020-09838-1
18. Chikkamath S, Nirmala SR. Melody generation using LSTM and BI-LSTM Network. In: Proceedings of the 2021 International Conference on Computational Intelligence and Computing Applications (ICCICA); 2021. pp. 1–6.
19. Sarangi V. Biological and biomimetic machine learning for automatic classification of human gait [PhD thesis]. University of York; 2020.
20. Indumathi N, et al. Impact of Fireworks Industry Safety Measures and Prevention Management System on Human Error Mitigation Using a Machine Learning Approach. Sensors. 2023; 23(9): 4365. doi: 10.3390/s23094365
21. Parkavi K, et al. Effective Scheduling of Multi-Load Automated Guided Vehicle in Spinning Mill: A Case Study. IEEE Access. 2023. doi: 10.1109/ACCESS.2023.3236843
22. Ran Q, et al. English language teaching based on big data analytics in augmentative and alternative communication system. International Journal of Speech Technology. 2022. doi: 10.1007/s10772-022-09960-1
23. Ngangbam PS, et al. Investigation on characteristics of Monte Carlo model of single electron transistor using Orthodox Theory. Sustainable Energy Technologies and Assessments. 2021; 48: 101601. doi: 10.1016/j.seta.2021.101601
24. Huang H, et al. Emotional intelligence for board capital on technological innovation performance of high-tech enterprises. Aggression and Violent Behavior. 2021: 101633. doi: 10.1016/j.avb.2021.101633
25. Sudhakar S, et al. Cost-effective and efficient 3D human model creation and re-identification application for human digital twins. Multimedia Tools and Applications. 2021. doi: 10.1007/s11042-021-10842-y
26. Prabhakaran N, et al. Novel Collision Detection and Avoidance System for Mid-vehicle Using Offset-Based Curvilinear Motion. Wireless Personal Communications. 2021. doi: 10.1007/s11277-021-08333-2
27. Balajee A, et al. Modeling and multi-class classification of vibroarthographic signals via time domain curvilinear divergence random forest. Journal of Ambient Intelligence and Humanized Computing. 2021. doi: 10.1007/s12652-020-02869-0
28. Omnia SN, et al. An educational tool for enhanced mobile e-Learning for technical higher education using mobile devices for augmented reality. Microprocessors and Microsystems. 2021; 83: 104030. doi: 10.1016/j.micpro.2021.104030
29. Firas TA, et al. Strategizing Low-Carbon Urban Planning through Environmental Impact Assessment by Artificial Intelligence-Driven Carbon Foot Print Forecasting. Journal of Machine and Computing. 2024; 4(4). doi: 10.53759/7669/jmc202404105
30. Shaymaa HN, et al. Genetic Algorithms for Optimized Selection of Biodegradable Polymers in Sustainable Manufacturing Processes. Journal of Machine and Computing. 2024; 4(3): 563–574. doi: 10.53759/7669/jmc202404054
31. Hayder MAG, et al. An open-source MP + CNN + BiLSTM model-based hybrid model for recognizing sign language on smartphones. International Journal of System Assurance Engineering and Management. 2024. doi: 10.1007/s13198-024-02376-x
32. Bhavana Raj K, et al. Equipment Planning for an Automated Production Line Using a Cloud System. In: Innovations in Computer Science and Engineering. ICICSE 2022. Lecture Notes in Networks and Systems, vol. 565. Springer, Singapore. pp. 707–717. doi: 10.1007/978-981-19-7455-7_57
... As a result, there appears to be a limited dialogue between the studies examining the impact of performance and abstract parameters related to music expression on pianists' kinematics and muscle activity. Ding (2024) reported a similar observation in violin performance research related to music expression, where biomechanical elements (e.g., muscle activity, joint kinematics) and musical elements (e.g., acoustical features) have been generally addressed in isolation. ...
Article
Full-text available
Bodily gestures are essential in piano performance. They allow sound production and, at the same time, facilitate the communication of the expressive content of music. From pianists’ perspective, music expression-related parameters include not only single performance parameters (timing, sound intensity, articulation, etc.), but also more complex parameters (named hereafter abstract parameters), such as music structure features (e.g., phrasing) and extra-musical ideas (e.g., emotions, narratives, etc.). This systematic review aimed to investigate the impact of both performance and abstract parameters related to music expression on kinematics and muscle activity of expert pianists. As complementary objectives, we documented ontological and methodological differences between the studies included, and we addressed how music expression-related parameters affect pianists’ exposure to risk factors of injuries. The search strategy consisted of using concepts and keywords in Medline, Embase, SPORTDiscus, and Web of Science databases, and we followed the PRISMA guidelines. Sixteen studies were included. Eleven studies focused on performance parameters, four studies focused on abstract parameters, and one study addressed both performance and abstract parameters. Performance and abstract music expression-related parameters impacted pianists’ kinematics and muscle activity in a variety of ways. The specific effects were dependent on the type of task and the gestural variable investigated by studies. Important differences in ontological (performance or abstract parameters studied, gestural variable investigated) and methodological choices (experimental task and instrument used, data acquisition and processing procedures) prevent the establishment of a thorough dialogue between music research studies and biomechanics and motor control studies. A set of performance parameters (playing loud, playing fast, staccato articulation, large handspan chords) were identified as potential risk factors of injuries. Further interdisciplinary research mixing methods from empirical music research and biomechanics would help enhance knowledge on the impact of music expression on pianists’ gestures for both performance and injury prevention purposes.
Article
Full-text available
Sustainable Manufacturing Practices (SMP), particularly in the selection of materials, have become essential due to environmental issues caused by the expansion of industry. Compared to conventional polymers, biodegradable Polymer Materials (BPM) are growing more commonly as an approach to reducing trash pollution. Suitable materials can be challenging due to numerous considerations, like ecological impact, expenditure, and material properties. When addressing sophisticated trade-offs, standard approaches drop. To compete with such challenges, employing Genetic Algorithms (GA) may be more successful, as they have their foundation in the basic concepts of biological development and the natural selection process. With a focus on BPM, this study provides a GA model for optimal packaging substance selection. Out of the four algorithms for computation used for practical testing—PSO, ACO, and SA—the GA model is the most effective. The findings demonstrate that GA can be used to enhance SMP and performs well in enormous search spaces that contain numerous different combinations of materials.
Article
Full-text available
The communication barriers experienced by deaf and hard-of-hearing individuals often lead to social isolation and limited access to essential services, underlining a critical need for effective and accessible solutions. Recognizing the unique challenges this community faces—such as the scarcity of sign language interpreters, particularly in remote areas, and the lack of real-time translation tools. This paper proposes the development of a smartphone-runnable sign language recognition model to address the communication problems faced by deaf and hard-of-hearing persons. This proposed model combines Mediapipe hand tracking with particle filtering (PF) to accurately detect and track hand movements, and a convolutional neural network (CNN) and bidirectional long short-term memory based gesture recognition model to model the temporal dynamics of Sign Language gestures. These models use a small number of layers and filters, depthwise separable convolutions, and dropout layers to minimize the computational costs and prevent overfitting, making them suitable for smartphone implementation. This article discusses the existing challenges handled by the deaf and hard-of-hearing community and explains how the proposed model could help overcome these challenges. A MediaPipe + PF model performs feature extraction from the image and data preprocessing. During training, with fewer activation functions and parameters, this proposed model performed better to other CNN with RNN variant models (CNN + LSTM, CNN + GRU) used in the experiments of convergence speed and learning efficiency.
Article
Driven by recent advancements in Extended Reality (XR), the hype around the Metaverse, and real-time computer graphics, the transformation of the performing arts, particularly the digitization and visualization of musical experiences, is an ever-evolving landscape. This transformation offers significant potential for promoting inclusivity, fostering creativity, and enabling live performances in diverse settings. Despite this immense potential, however, the field of Virtual Instrument Performances (VIP) has remained relatively unexplored due to numerous challenges. These challenges arise from the complex and multi-modal nature of musical instrument performances; the need for high-precision motion capture under occlusion, including the intricate interactions of a musician's body and fingers with the instrument; the precise synchronization and seamless integration of various sensory modalities; the need to accommodate variations in musicians' playing styles and facial expressions; and instrument-specific nuances. This comprehensive survey delves into the intersection of technology, innovation, and artistic expression in the domain of virtual instrument performances. It explores multi-modal musical performance databases and investigates a wide range of data acquisition methods, encompassing diverse motion capture techniques, facial expression recording, and various approaches for capturing audio and MIDI (Musical Instrument Digital Interface) data. The survey also explores Music Information Retrieval (MIR) tasks, with particular emphasis on Musical Performance Analysis (MPA), and offers an overview of work in Musical Instrument Performance Synthesis (MIPS), encompassing recent advancements in generative models. The ultimate aim of this survey is to unveil technological limitations, initiate a dialogue about current challenges, and propose promising avenues for future research at the intersection of technology and the arts.
Article
In the fireworks industry (FI), accidents and explosions frequently happen due to human error (HE). Human factors (HFs) always play a dynamic role in the incidence of accidents in workplace environments, and preventing HE is a central challenge for safety and precaution in the FI. Clarifying the relationships between HFs can help identify the correlation between unsafe behaviors and the factors that influence hazardous chemical warehouse accidents. This paper investigates the impact of the HFs that contribute to HE and that have caused FI disasters, explosions, and incidents in the past, examining why and how HEs contribute to the most severe accidents that occur while storing and using hazardous chemicals. The impact of fireworks and match industry disasters motivated the mitigation planning in this proposal. The analysis uses machine learning (ML) and recommends an expert system (ES). Many significant correlations were found between individual behaviors and the likelihood of HE occurring. The paper proposes an ML-based prediction model for the fireworks and match work industries in Sivakasi, Tamil Nadu. Questionnaire responses from 500 participants in the fireworks and match industries in Tamil Nadu were reviewed for accuracy and coded. The Chief Inspectorate of Factories in Chennai and the Training Centre for Industrial Safety and Health in Sivakasi, Tamil Nadu, India, contributed significantly to the collection of accident datasets for the FI in Tamil Nadu. The data are analyzed and presented according to this study's objectives: the effects of physical, psychological, and organizational factors. Comparing the ML models, the support vector machine (SVM), random forest (RF), and Naïve Bayes (NB) achieved accuracies of 86.45%, 91.6%, and 92.1%, respectively, while Extreme Gradient Boosting (XGBoost) achieved the best classification accuracy of the ML models at 94.41%. This research aims to create a new ES to mitigate HE risks in the fireworks and match work industries. The proposed ES reduces HE risk and improves safety in unsafe, uncertain workplaces, and proper safety management systems (SMS) can prevent deaths and injuries from events such as fires and explosions.
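The general model-comparison workflow described above can be sketched as follows with scikit-learn and xgboost; the feature matrix here is a synthetic stand-in for the coded questionnaire responses, so the toy accuracies it prints are unrelated to the figures reported in the abstract.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Synthetic stand-in for coded questionnaire responses (physical,
# psychological, and organizational factors) and a binary HE-risk label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = (X[:, :4].sum(axis=1) + 0.5 * rng.normal(size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Naive Bayes": GaussianNB(),
    "XGBoost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
}

# Train each classifier and report held-out accuracy for comparison.
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")
```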
Article
In a Flexible Manufacturing System (FMS), where material processing is carried out as tasks passed from one department to another, Automated Guided Vehicles (AGVs) play a significant role. The application of multiple-load AGVs can boost FMS throughput by multiple orders of magnitude. An AGV is a mobile robot that offers extraordinary industrial capabilities for transporting materials and items inside a warehouse or manufacturing plant. The technique of allocating AGVs to tasks while taking into account the cost and time of operations is known as AGV scheduling. Most research has addressed only single-objective optimization, whereas multi-objective AGV scheduling is a complex combinatorial process that, unlike single-objective scheduling, has no single solution. This paper presents the integrated Local Search Probability-based Memetic Water Cycle (LSPM-WC) algorithm, using a spinning mill as a case study. The scheduling model's goal is to maximize machine efficiency. Statistical tests demonstrated the applicability of the proposed model in lowering makespan and fitness values. The mean AGV operating efficiency was higher than that of the other models evaluated, and LSPM-WC surpassed the competing algorithms to produce the best result.
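The LSPM-WC algorithm itself is not detailed in the abstract, but the hedged sketch below illustrates the underlying scheduling objective: assigning transport tasks to AGVs and improving the assignment with a plain local search that reduces makespan. The task durations, fleet size, and search procedure are illustrative assumptions only and do not reproduce the authors' method.

```python
import random

def makespan(assignment, task_times, num_agvs):
    """Completion time of the busiest AGV for a given task-to-AGV assignment."""
    loads = [0.0] * num_agvs
    for task, agv in enumerate(assignment):
        loads[agv] += task_times[task]
    return max(loads)

def local_search_schedule(task_times, num_agvs, iterations=5000, seed=0):
    rng = random.Random(seed)
    # Start from a random assignment of tasks to AGVs.
    assignment = [rng.randrange(num_agvs) for _ in task_times]
    best = makespan(assignment, task_times, num_agvs)
    for _ in range(iterations):
        task = rng.randrange(len(task_times))
        old_agv = assignment[task]
        assignment[task] = rng.randrange(num_agvs)   # try moving one task
        new = makespan(assignment, task_times, num_agvs)
        if new <= best:
            best = new                               # keep non-worsening moves
        else:
            assignment[task] = old_agv               # revert worsening moves
    return assignment, best

if __name__ == "__main__":
    # Hypothetical transport times (minutes) for 12 material-handling tasks.
    times = [4, 7, 3, 9, 5, 6, 2, 8, 4, 5, 7, 3]
    plan, span = local_search_schedule(times, num_agvs=3)
    print("Assignment:", plan, "makespan:", span)
```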
Article
Addressing the associated rise in Carbon Emissions (CE) as smart cities expand becomes paramount, and effective low-carbon urban planning demands robust, precise assessments. This research introduces a cutting-edge solution via an Artificial Intelligence (AI)-driven Carbon Footprint (CF) impact assessment. A detailed dataset, collected over 3 years, was harnessed to gather insights into vital urban factors, including CE, Energy Consumption (EC) patterns, variations in land use, transportation dynamics, and changes in air quality. The cornerstone of this research is the development of the Multi-modal Stacked VAR-LSTM model, which aims to provide accurate CF predictions for urban environments by merging the capabilities of Vector Autoregression (VAR) with Long Short-Term Memory (LSTM) neural networks. The process encompasses dedicated assessments for each data segment, harnessing VAR to delineate interdependencies and refining these predictions with the LSTM network using the residuals from the VAR analysis. By interweaving AI-driven carbon footprint impact assessments into the urban planning discourse, this study underscores the vast potential of sculpting future urban development strategies that are sustainable and sensitive to carbon impact.
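A minimal sketch of the VAR-then-LSTM-on-residuals idea is given below, using statsmodels for the VAR stage and tf.keras for the residual model; the indicator names, lag order, window length, and the random placeholder data are assumptions, not the study's urban dataset or exact architecture.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
from tensorflow.keras import layers, models

# Random-walk placeholders standing in for monthly urban indicators (assumed names).
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(360, 4)).cumsum(axis=0),
                    columns=["CE", "EC", "transport", "AQI"])

# Stage 1: VAR captures linear interdependencies between the indicators.
var_fit = VAR(data).fit(maxlags=6)
residuals = np.asarray(var_fit.resid)[:, 0]   # what VAR leaves unexplained for CE

# Stage 2: an LSTM is trained on sliding windows of the CE residuals.
WINDOW = 12
X = np.array([residuals[i:i + WINDOW] for i in range(len(residuals) - WINDOW)])
y = residuals[WINDOW:]
X = X[..., np.newaxis]                        # (samples, timesteps, 1 feature)

lstm = models.Sequential([
    layers.LSTM(32, input_shape=(WINDOW, 1)),
    layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Combined forecast: VAR prediction plus the LSTM's residual correction.
var_step = var_fit.forecast(data.values[-var_fit.k_ar:], steps=1)[0, 0]
correction = float(lstm.predict(X[-1:], verbose=0)[0, 0])
print("Hybrid CE forecast:", var_step + correction)
```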
Conference Paper
Lean Manufacturing (LM) reduces waste, maximises labour efficiency, and offers lean production advantages in operational costs and product quality. A production line includes workstations linked via transmission and electrical control. An automated manufacturing line reduces production and labour costs, improves output quality, and minimises human error. Integrated machines automate industrial processes so that the process is computer-controlled. Adopting automated manufacturing together with lean and automated Logistics Methods (LM) may enhance product quality, minimise waste, and increase agility. A survey indicates a new strategy for optimum manufacturing and distribution applications. Case studies, theoretical analysis, and real-world examples of cutting-edge technology all have an impact on application and production processes, suggesting that LM and automation are essential in manufacturing. Practical examples are the starting point for lean and advanced automation applications that allow deployment of a Lean Automation System (LAS) across the whole production process and LM. Keywords: Cloud computing; Automation; Manufacturing process; Lean
Book
A techno-cognitive look at how new technologies are shaping the future of musicking. “Musicking” encapsulates both the making of and perception of music, so it includes both active and passive forms of musical engagement. But at its core, it is a relationship between actions and sounds, between human bodies and musical instruments. Viewing musicking through this lens and drawing on music cognition and music technology, Sound Actions proposes a model for understanding differences between traditional acoustic “sound makers” and new electro-acoustic “music makers.” What is a musical instrument? How do new technologies change how we perform and perceive music? What happens when composers build instruments, performers write code, perceivers become producers, and instruments play themselves? The answers to these pivotal questions entail a meeting point between interactive music technology and embodied music cognition, what author Alexander Refsum Jensenius calls “embodied music technology.” Moving between objective description and subjective narrative of his own musical experiences, Jensenius explores why music makes people move, how the human body can be used in musical interaction, and how new technologies allow for active musical experiences. The development of new music technologies, he demonstrates, has fundamentally changed how music is performed and perceived.
Article
Background: Recently, Wolf et al. proposed a novel, marker-based method to analyze the three-dimensional upper-body kinematics of high string players for clinical application. The method provides an objective evaluation of high string players' motor strategies, especially in the shoulder complex, by distinguishing between the scapulothoracic (ST) and glenohumeral (GH) joints while minimizing skin-movement artifacts, marker occlusions, and limitations due to instrument placement. Nevertheless, reproducibility of kinematic measurements is crucial for clinical applications, and the aim of this study was to assess the method's reproducibility in terms of reliability and repeatability. Methods: One healthy professional violinist underwent a total of nine bowing trials in three different laboratory sessions, each trial conducted by one of two different examiners. A biomechanical model was applied to motion capture data of the pelvis, thorax, spine, and head, as well as both upper limbs (consisting of the scapula, upper arm, forearm, and hand). Reproducibility was assessed by calculating inter- and intra-tester, inter-session, and intra-subject measurement errors for each rotational degree of freedom of the upper-body segments and joints. Findings: Small measurement errors were taken as good indicators of reproducibility. Intra- and inter-tester errors were found to be small (< 3° for the most part), while inter-session and intra-subject errors were somewhat larger (< 5° for the most part). Interpretation: Overall, this study showed the novel, marker-based method to have good reproducibility for a healthy violinist, indicating that it is a reliable tool for quantifying upper-body movements during violin playing across subjects, examiners, laboratories, and motion capture systems.
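As an illustration of how such reproducibility errors might be quantified, the snippet below computes intra-session and inter-session spreads of a time-normalized joint-angle curve across repeated trials; the synthetic angle data and the choice of error metric are assumptions for illustration, not the protocol of Wolf et al.

```python
import numpy as np

# Hypothetical shoulder-elevation angles (degrees), time-normalized to 101 points
# per bowing cycle, for 3 sessions x 3 trials each (9 trials total).
rng = np.random.default_rng(7)
base = 40 + 15 * np.sin(np.linspace(0, 2 * np.pi, 101))
trials = np.stack([base + rng.normal(scale=2.0, size=101) for _ in range(9)])
sessions = trials.reshape(3, 3, 101)          # (session, trial, time point)

# Intra-session (repeatability) error: spread of trials within each session,
# averaged over time points and sessions.
intra_session = np.mean(sessions.std(axis=1))

# Inter-session error: spread of the session-mean curves across sessions.
inter_session = np.mean(sessions.mean(axis=1).std(axis=0))

print(f"Intra-session error: {intra_session:.2f} deg")
print(f"Inter-session error: {inter_session:.2f} deg")
```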