Applied Mathematics and Nonlinear Sciences, 9(1) (2024) 1-14
Applied Mathematics and Nonlinear Sciences
https://www.sciendo.com
†Corresponding author.
Email address: 15855616959@163.com
ISSN 2444-8656
https://doi.org/10.2478/amns-2024-1361
© 2023 Bo Zhang, published by Sciendo.
This work is licensed under the Creative Commons Attribution alone 4.0 License.
The use and effective analysis of vocal spectrum analysis method in vocal music
teaching
Bo Zhang1,†
1. Department of Art and Design, Tongcheng Teachers College, Tongcheng, Anhui, 231400, China.
Submission Info
Communicated by Z. Sabir
Received February 12, 2024
Accepted April 14, 2024
Available online June 3, 2024
Abstract
As computer science and technology continue to evolve and become more pervasive, their application in analyzing the
audio spectrum of vocalizations offers valuable insights for vocal music education. This study introduces a method
utilizing Fourier transform analysis to examine time-frequency domain signals in vocal teaching. Initially, voice
frequencies are collected during vocal music instruction. Subsequently, these frequencies are processed to extract
characteristic sequences, which are then reduced in scale to develop a model for voice spectrum recognition tailored to
vocal music education. This model facilitates detailed spectral analysis, enabling the investigation of its auxiliary benefits
in vocal music teaching, particularly in identifying prevalent instructional challenges. Our findings indicate that during
training on vowels “a” and “i,” professional singers' spectral level at 4 kHz declined to between −15 and −18 dB, whereas
students' varied around ±6 dB, trending upwards. In cases of air leakage, significant gaps were observed at frequencies of
5500 Hz, 10,500 Hz, and 14,500 Hz. At the same time, students exhibited missing frequencies at 7 kHz, 12 kHz, and 14 kHz
during glottal tone production, with pronounced, abrupt peaks occurring when the vocal folds were tightly constricted,
lacking natural transitions. This research substantiates the theoretical and practical benefits of digital spectrum technology
in enhancing vocal music education, thereby providing scientific support.
Keywords: Spectrum analysis; Fourier transform; Audio feature sequence; Vocal spectrum recognition; Vocal music
teaching.
AMS 2010 codes: 68T05
1 Introduction
All sounds have unique frequencies. Sound is generated by vibration, transmitted through a
medium, and received by the human ear. During transmission, whether the sound is music or noise,
people cannot intuitively grasp and analyze its acoustic characteristics. A spectrum analyzer solves
this problem by mapping the sound onto spectral coordinates for analysis [1-3]. Through the resulting
image, the signal magnitude in every frequency band can be read intuitively, so the bands that need
practice can be identified and filtered efficiently to achieve sound adjustment [4-6].
The human voice can be considered a musical instrument in the broadest sense of the word. Yet the
human vocal organs are composed of soft muscles, unlike the hard materials of other musical
instruments, and the nerves that control these muscles differ from those that control the fingers,
lips, and tongue [7-9]. It is because of these characteristics that vocal music seems so abstract
and raw, resting to a large degree on intuitive, subjective feeling rather than objective clarity [10-12].
By analyzing the spectra of different voices, the technical requirements, voice aesthetics, and singing
state embodied in a vocalist's singing can be explored, visualizing the abstract vocal art and forming
intuitive images [13-15].
With the continuous development of vocal art, more and more people have begun to study the nature
of sound. With the popularization of computers, sound can be understood and analyzed by scientific
and technological means, revealing the unseen and untouchable side of vocal music from another
perspective [16-18]. In vocal music learning, spectral analysis makes it possible to quickly find the
strengths and weaknesses of the voice, so as to clearly set the direction of adjustment and improve
learning efficiency [19].
In this paper, assuming that the reception vector of a given vocal frequency signal has been
determined, spectral sequence features are extracted from the detected vocal frequency signal, and
the trained audio-signal feature sequence is used to identify unknown signals; on this basis, a
vocal frequency dataset is constructed. The Fourier transform is used to determine the vocal spectra
in vocal music teaching and to analyze the signal in the time and frequency domains. The model is
used to analyze the spectra of “a” and “i” vocalizations of professional singers and vocal students,
in order to compare the differences and to test its positive auxiliary effect in vocal music teaching.
Finally, the three typical problems of “air leakage,” “glottal sound,” and “vocal fold squeezing” are
analyzed spectrally to identify the causes of students' incorrect vocalizations, so as to improve the
efficiency of vocal teaching.
2 Construction of the human voice spectral analysis model
2.1 Spectrum analysis algorithm
2.1.1 Time-frequency analysis of vocal signals
The Fourier transform is a linear integral transform. It converts a time-domain signal into a
frequency-domain signal so that the intrinsic properties of the signal can be analyzed and understood
from a frequency-domain perspective.
Equation (1) is the mathematical representation of the Fourier series. Here $e^{jk\omega t}$ is the complex sinusoidal basis signal, $X(k)$ represents the amplitude of the $k$-th sinusoidal component, and $\omega$ represents the frequency of the complex sinusoidal signal. The decomposition expresses the actual periodic signal $y(t)$ as a superposition of an infinite number of sinusoidal signals; from the expression, we know that $y(t)$ is a periodic function with period $T = 2\pi/\omega$:

$$y(t) = \sum_{k=-\infty}^{\infty} X(k)\,e^{jk\omega t} \qquad (1)$$
The DFT maps between discrete time-domain and discrete frequency-domain signals, which allows a computer system to represent the transform digitally. Actual signals are of infinite length, but a computer cannot store an infinite-length signal, so the original signal is truncated, and the truncated segment is periodically extended to form the main sequence of the DFT. Let the length of sequence $x(n)$ be $M$. The $N$-point DFT is defined as:

$$X(k) = \mathrm{DFT}[x(n)] = \sum_{n=0}^{N-1} x(n)\,e^{-j2\pi kn/N}, \quad k = 0, 1, \ldots, N-1 \qquad (2)$$
where $N$ is called the interval length of the discrete Fourier transform. For ease of writing, let $W_N = e^{-j2\pi/N}$. The $N$-point DFT is then usually denoted as:

$$X(k) = \mathrm{DFT}[x(n)] = \sum_{n=0}^{N-1} x(n)\,W_N^{kn}, \quad k = 0, 1, \ldots, N-1 \qquad (3)$$
The process of recovering the time-domain signal from its corresponding frequency-domain representation is the inverse Fourier transform. The $N$-point inverse discrete Fourier transform (IDFT) of $X(k)$ is defined as:

$$x(n) = \mathrm{IDFT}[X(k)] = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,W_N^{-kn}, \quad n = 0, 1, \ldots, N-1 \qquad (4)$$
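As a concrete check of Eqs. (3) and (4), the following sketch implements the direct DFT and IDFT by building the $W_N^{kn}$ matrix explicitly (fine for small $N$; the fast algorithm of Section 2.1.2 is needed for realistic lengths). The test signal is illustrative, not from the paper's data.

```python
import numpy as np

def dft(x):
    """Direct N-point DFT following Eq. (3): X[k] = sum_n x[n] * W_N^{kn}."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)  # matrix of W_N^{kn}
    return W @ x

def idft(X):
    """Inverse DFT following Eq. (4): x[n] = (1/N) sum_k X[k] * W_N^{-kn}."""
    N = len(X)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(2j * np.pi * k * n / N)   # matrix of W_N^{-kn}
    return (W @ X) / N

# Round trip on a short test signal
x = np.array([1.0, 2.0, 0.0, -1.0])
X = dft(x)
assert np.allclose(idft(X), x)          # IDFT(DFT(x)) recovers x
assert np.allclose(X, np.fft.fft(x))    # matches NumPy's FFT
```

The direct computation costs $O(N^2)$, which is exactly the operation count discussed in Section 2.1.2.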
The DFT also has many useful properties. If $x_1(n)$ and $x_2(n)$ are discrete $N$-point sequences and $X_1(k)$ and $X_2(k)$ are their corresponding DFTs, then:

$$\mathrm{DFT}[a x_1(n) + b x_2(n)] = aX_1(k) + bX_2(k) \qquad (5)$$
This is called the linearity of the DFT: if the original signal is a linear combination of the signals $x_1(n)$ and $x_2(n)$, then its DFT is the same linear combination of the DFTs corresponding to $x_1(n)$ and $x_2(n)$. The shift property of the DFT means that if the original sequence $x(n)$ is shifted left or right by $m$ sampling points, the corresponding DFT equals the DFT of the $x(n)$ sequence multiplied by a coefficient related to $m$ and $k$, i.e.:

$$\mathrm{DFT}[x(n-m)] = W_N^{km}X(k), \qquad \mathrm{DFT}[x(n+m)] = W_N^{-km}X(k) \qquad (6)$$
The odd, even, imaginary, and real symmetries of the DFT are the theoretical basis for the filter design and the optimization of the real FFT/IFFT algorithm in this work. For a complex sequence $x(n)$, if $x^*(n)$ is the complex conjugate of the original signal, then its DFT satisfies the following relationship with the DFT of the original signal:

$$\mathrm{DFT}[x^*(n)] = X^*(N-k) \qquad (7)$$
In practice, the source sequences of the DFT are often purely real rather than complex; for example, the sound-pressure signals in this work are discrete real sequences. When $x(n)$ is real, the DFT has the following additional symmetries:

$$X(k) = X^*(N-k), \quad |X(k)| = |X(N-k)|, \quad X_R(k) = X_R(N-k), \quad X_I(k) = -X_I(N-k), \quad \arg X(k) = -\arg X(N-k) \qquad (8)$$

where $X_R(k)$ and $X_I(k)$ are the real and imaginary parts of the Fourier transform $X(k)$, respectively. Moreover, when $x(n)$ is a real even function, its Fourier transform is a purely real sequence; if $x(n)$ is a real odd function, its Fourier transform is a purely imaginary sequence.
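These real-sequence symmetries are easy to verify numerically. The short sketch below (with an arbitrary random test signal, not the paper's data) checks the conjugate symmetry $X(k) = X^*(N-k)$ and the purely real spectrum of a real even sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # a real-valued sequence
X = np.fft.fft(x)
N = len(x)
k = np.arange(1, N)                   # skip k = 0, where N - k wraps to N

# X(k) = X*(N - k): real part even, imaginary part odd about N/2
assert np.allclose(X[k], np.conj(X[N - k]))
assert np.allclose(X[k].real, X[N - k].real)
assert np.allclose(X[k].imag, -X[N - k].imag)

# A real even sequence (x[n] == x[(N - n) % N]) has a purely real spectrum
xe = np.array([4.0, 1.0, 2.0, 3.0, 2.0, 1.0])
assert np.allclose(np.fft.fft(xe).imag, 0.0)
```

This redundancy is what lets real-input FFT routines store and compute only about half the spectrum.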
2.1.2 Fast implementation of the Fourier transform
Completing the DFT of an $N$-point sequence requires $N^2$ complex multiplications and $N(N-1)$ complex additions. In turn, a complex multiplication requires four real multiplications, and a complex addition requires two real additions. For a sequence of length $N = 2048$, a total of 16,777,216 real multiplications and 8,384,512 real additions are therefore necessary. The Fast Fourier Transform requires the DFT transform interval to have length $N = 2^M$, where $M$ is a natural number. According to the defining equation (3) of the DFT, the sum can be decomposed by the parity of $n$ as:
$$X(k) = \sum_{l=0}^{N/2-1} x(2l)\,W_N^{2kl} + \sum_{l=0}^{N/2-1} x(2l+1)\,W_N^{(2l+1)k} \qquad (9)$$

Let $x_1(l) = x(2l)$ and $x_2(l) = x(2l+1)$. Since $W_N^{2kl} = W_{N/2}^{kl}$, the above equation can be written as:
$$X(k) = \sum_{l=0}^{N/2-1} x_1(l)\,W_{N/2}^{kl} + W_N^k\sum_{l=0}^{N/2-1} x_2(l)\,W_{N/2}^{kl}, \quad k = 0, 1, 2, \ldots, N-1 \qquad (10)$$
Denote by $X_1(k)$ and $X_2(k)$ the $N/2$-point DFTs of $x_1(l)$ and $x_2(l)$, respectively, i.e.:

$$X_1(k) = \mathrm{DFT}[x_1(l)] = \sum_{l=0}^{N/2-1} x_1(l)\,W_{N/2}^{kl}, \quad k = 0, 1, \ldots, N/2-1$$

$$X_2(k) = \mathrm{DFT}[x_2(l)] = \sum_{l=0}^{N/2-1} x_2(l)\,W_{N/2}^{kl}, \quad k = 0, 1, \ldots, N/2-1 \qquad (11)$$
Using the identity $W_N^{k+N/2} = -W_N^k$ and the implied periodicity of $X_1(k)$ and $X_2(k)$, one obtains:

$$X(k) = X_1(k) + W_N^kX_2(k), \qquad X\!\left(k + \frac{N}{2}\right) = X_1(k) - W_N^kX_2(k), \quad k = 0, 1, \ldots, \frac{N}{2}-1 \qquad (12)$$
When $N = 2^3 = 8$, the first parity-extraction decomposition can be represented by a butterfly diagram. It can be shown that after one such decomposition, for $N \gg 1$, the work of an $N$-point DFT is approximately halved; the decomposition should therefore be continued. After $M$ levels of time-domain parity extraction, the transform decomposes into $N$ 1-point DFTs and $M$ levels of butterfly computation with $N/2$ butterflies per level, and the 1-point DFT is simply the 1-point time-domain sequence itself.
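The decimation-in-time recursion of Eqs. (9)–(12) can be sketched directly. The function below splits the input into even- and odd-indexed halves, recurses down to 1-point DFTs, and recombines with the butterfly of Eq. (12); it assumes the input length is a power of two, as the text requires.

```python
import numpy as np

def fft_radix2(x):
    """Recursive decimation-in-time FFT for N = 2^M, per Eq. (12):
       X[k]       = X1[k] + W_N^k * X2[k]
       X[k + N/2] = X1[k] - W_N^k * X2[k]
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:                       # a 1-point DFT is the sample itself
        return x
    X1 = fft_radix2(x[0::2])         # even-indexed subsequence x1(l) = x(2l)
    X2 = fft_radix2(x[1::2])         # odd-indexed subsequence  x2(l) = x(2l+1)
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors W_N^k
    return np.concatenate([X1 + W * X2, X1 - W * X2])

# N = 2^3 = 8, the case illustrated by the butterfly diagram in the text
x = np.random.default_rng(1).standard_normal(8)
assert np.allclose(fft_radix2(x), np.fft.fft(x))
```

This recursion performs $O(N \log N)$ work instead of the $O(N^2)$ of the direct DFT, which is the speedup the operation counts above motivate.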
2.2 Vocal Spectrum Recognition in Vocal Music Teaching
2.2.1 Detection of vocal frequency signals
Assuming that the received vector for a given audio signal of the human voice has been determined, we traverse all transmit-channel candidates, calculate the Euclidean distance between each candidate and the received vector, and take the vector with the smallest distance as the detected audio-signal vector. At a given transmission rate at the transmitter, the audio signal $x$ is estimated at the receiver as:

$$\hat{x} = \underset{x \in A^{N_i}}{\arg\min}\ \|y - Gx\|_2^2 \qquad (13)$$

where $N_i$ denotes the number of transmitting antennas, $y$ denotes the target (received) vector, $A$ denotes the amplitude set of the audio signal, and $G$ denotes the channel matrix.
In order to control the complexity of the algorithm, a cost function is introduced into the detection algorithm, expressed as:

$$\varphi(x) = x^HG^HGx - 2\,\mathrm{Re}\!\left(y^HGx\right) \qquad (14)$$

After the above calculation is completed, $\varphi(x)$ takes a negative value; the smaller the result, the better the performance of the algorithm, and vice versa.
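A minimal sketch of the exhaustive minimum-distance detection of Eq. (13) follows. The amplitude set $A = \{-1, +1\}$ and the random channel matrix are illustrative assumptions, not values from the paper; the search enumerates every candidate vector, which is only feasible for small $N_i$ and small $A$ (the cost function of Eq. (14) exists precisely to tame this complexity).

```python
import itertools
import numpy as np

def min_distance_detect(y, G, A):
    """Exhaustive detection per Eq. (13): try every candidate vector x with
    entries drawn from amplitude set A, keep the one minimizing ||y - Gx||^2."""
    Ni = G.shape[1]                       # number of transmit channels
    best_x, best_cost = None, np.inf
    for cand in itertools.product(A, repeat=Ni):
        x = np.array(cand, dtype=float)
        cost = np.sum((y - G @ x) ** 2)   # squared Euclidean distance
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 3))           # illustrative 4x3 channel matrix
x_true = np.array([1.0, -1.0, 1.0])       # entries from A = {-1, +1}
y = G @ x_true                            # noise-free observation for illustration
assert np.array_equal(min_distance_detect(y, G, [-1.0, 1.0]), x_true)
```

With a noise-free observation the true vector attains distance zero, so the search recovers it exactly; with noise, the nearest candidate is returned instead.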
2.2.2 Audio signal feature sequence extraction
Spectral sequence feature extraction is performed on the detected vocal frequency signal to obtain a high-dimensional feature vector set $B = \{b_1, b_2, \ldots, b_N\}$, represented in the form of a multidimensional spectrogram, where $N$ in the vector set denotes the frame index range. For the spectral sequence, the difference distance between frame $i$ and each of the remaining $N-1$ frames is computed as:

$$D_{ij} = \|b_i - b_j\|, \quad i, j = 1, \ldots, N \qquad (15)$$

Combining all the computed distance vectors $D_i$ gives the $N$-th order matrix $H$, i.e.:

$$H = \begin{pmatrix} 0 & D_{12} & D_{13} & \cdots & D_{1N} \\ D_{12} & 0 & D_{23} & \cdots & D_{2N} \\ D_{13} & D_{23} & 0 & \cdots & D_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ D_{1N} & D_{2N} & D_{3N} & \cdots & 0 \end{pmatrix} \qquad (16)$$
where $i$ and $j$ denote the spectral frame indexes, $D_{ij}$ denotes the difference between two spectrograms, $b_i$ and $b_j$ denote the spectral sequences of frames $i$ and $j$, respectively, and $H$ denotes the vector matrix consisting of all the difference values. Since $H$ and $H^T$ contain identical spectral information, the upper triangular matrix is used to represent the entire matrix in subsequent operations to reduce the computational complexity.

In order to obtain all the information from the spectrogram when constructing the audio-signal feature sequence, the average of the multidimensional spectrogram feature set and the distance between each frame's spectrogram feature and that average are calculated as follows:

$$b_{ave} = \frac{1}{N}\sum_{i=1}^{N} b_i \qquad (17)$$

$$diag_i = \|b_i - b_{ave}\| \qquad (18)$$

where $b_{ave}$ denotes the average of the set of spectrogram features and $diag_i$ denotes the distance difference; these are combined to form the matrix $H_m$:
The use and effective analysis of vocal spectrum analysis method in vocal music teaching
7
$$H_m = \begin{pmatrix} diag_1 & D_{12} & D_{13} & \cdots & D_{1N} \\ 0 & diag_2 & D_{23} & \cdots & D_{2N} \\ 0 & 0 & diag_3 & \cdots & D_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & diag_N \end{pmatrix} \qquad (19)$$
The spectral sequence extracted through the above process is the feature sequence of the vocal audio
signal. After obtaining the target signal feature sequence, it is used as a training sample to recognize
the vocal audio signal.
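The construction of Eqs. (15)–(19) can be sketched compactly in NumPy. The tiny feature set below is illustrative only; in practice each row of `B` would be a per-frame spectral vector from the spectrogram.

```python
import numpy as np

def distance_matrices(B):
    """Build the pairwise-distance matrix H of Eq. (16) and the combined
    upper-triangular matrix H_m of Eq. (19) from per-frame features B (N x d)."""
    # H[i, j] = ||b_i - b_j||: zero diagonal, symmetric (H == H.T)
    H = np.linalg.norm(B[:, None, :] - B[None, :, :], axis=2)
    b_ave = B.mean(axis=0)                      # Eq. (17): mean feature vector
    diag = np.linalg.norm(B - b_ave, axis=1)    # Eq. (18): distance to the mean
    # Eq. (19): pairwise distances above the diagonal, diag_i on the diagonal
    H_m = np.triu(H, k=1) + np.diag(diag)
    return H, H_m

B = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])   # three 2-D "frames"
H, H_m = distance_matrices(B)
assert np.allclose(H, H.T) and np.allclose(np.diag(H), 0.0)
assert np.isclose(H[0, 1], 5.0)                       # ||(0,0) - (3,4)|| = 5
```

Keeping only the upper triangle plus the diagonal exploits the symmetry $H = H^T$ noted above, halving storage and downstream computation.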
2.2.3 Signal Recognition
In the human-voice audio signal recognition stage, the trained sequence of audio-signal features is used to recognize unknown signals. The recognition task over $C$ classes is divided into binary classification tasks, yielding $C(C-1)/2$ binary classification models, denoted as:

$$\omega = \sum_{i=1}^{N} R_i\lambda_i\left(B_iS_i\right) \qquad (20)$$

$$E = \sum_{i=1}^{N} \lambda_i\left(R_iB_iS_i\right)\left(R_jB_jS_j\right) \qquad (21)$$

where $\omega$ denotes the normal vector of the binary classification model, $E$ denotes the bias, $R_i$ denotes the regularization parameter, and $\lambda_i$ denotes the indicator function. For an input feature vector $H_m$, the judgment process of the binary classification model is:

$$\mu = S\left(B, E, R_i; \omega\right) \qquad (22)$$

where $\mu$ is the recognition result of the binary classification model. This completes the design of the human-voice frequency signal recognition algorithm based on the multidimensional spectrogram.
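The one-vs-one scheme behind Eqs. (20)–(22) can be illustrated with a deliberately simple stand-in: here each pairwise model is a linear hyperplane (normal vector `w`, bias `E`) placed between the two class centroids, and prediction is by majority vote over the $C(C-1)/2$ models. This centroid-based rule is an assumption for illustration, not the paper's exact training procedure.

```python
import numpy as np
from itertools import combinations

def train_pairwise(features, labels):
    """One-vs-one scheme: C classes yield C(C-1)/2 binary linear models.
    Each model (w, E) is the hyperplane bisecting the two class centroids."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        ma = features[labels == a].mean(axis=0)
        mb = features[labels == b].mean(axis=0)
        w = ma - mb                       # normal vector of the boundary
        E = -w @ (ma + mb) / 2            # bias: midpoint lies on the boundary
        models[(a, b)] = (w, E)
    return models

def predict(models, h):
    """Each binary model votes; the class with the most votes wins."""
    votes = {}
    for (a, b), (w, E) in models.items():
        winner = a if w @ h + E > 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

X = np.array([[0.0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]])
y = np.array([0, 0, 1, 1, 2, 2])
models = train_pairwise(X, y)
assert len(models) == 3                   # C(C-1)/2 with C = 3 classes
assert predict(models, np.array([5.0, 5.5])) == 1
```

Any stronger binary learner (e.g. a margin-based classifier with regularization parameter $R_i$, as the symbols in Eqs. (20)–(21) suggest) can be dropped in for the centroid rule without changing the voting structure.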
2.2.4 Building a Human Voice Frequency Dataset
Open-source MIDI sheet music is used as the recognition target for extracting the audio-map features. The sampling rate is set to 44 kHz and the bit depth to 16 bits; the start and stop times of each note in the score are extracted, and WAV files are synthesized as the standard template sequences for the experiment. For a given pitch sub-band signal $s$, the short-time energy of the signal at sample $n \in N$ is calculated as:

$$W(n) = \sum_{r=n-d/2}^{n+d/2} s(r)^2 \qquad (23)$$

where $d$ denotes the rectangular window length and $r$ indexes the samples within each window.
After processing, the standard human voice frequency template sequence is obtained to complete the
construction of the experimental dataset.
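The short-time energy of Eq. (23) amounts to a sliding sum of squared samples over a rectangular window. A minimal sketch (assuming an odd window length $d$ and zero-padding at the edges, which the paper does not specify):

```python
import numpy as np

def short_time_energy(s, d):
    """Short-time energy per Eq. (23): for each center n, sum s(r)^2 over a
    rectangular window of length d centered on n. Assumes d is odd; the
    signal edges are zero-padded so the output has the input's length."""
    half = d // 2
    padded = np.pad(np.asarray(s, dtype=float) ** 2, half)
    # Sliding-window sum via a cumulative sum (O(len(s)) instead of O(len(s)*d))
    c = np.concatenate(([0.0], np.cumsum(padded)))
    return c[d:] - c[:-d]

s = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
E = short_time_energy(s, d=3)
assert len(E) == len(s)
assert np.isclose(E[2], 1 + 4 + 1)   # window over s[1..3]: 1^2 + 2^2 + 1^2
```

The cumulative-sum trick keeps the cost linear in the signal length, which matters when energy curves are computed for every note template in the dataset.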
3 Results and Discussion
3.1 Vocal Spectrum Analysis as an Aid to Vocal Teaching and Learning
After a comparative study of singer and student samples, the method can be put to use in actual
vocal teaching. A group of vocal students from a training school was selected to participate in the
new teaching experience. First, the beginner vocalists who took part in the course practiced
pronouncing “a” and “i,” and the spectrum analyzer was used to display the pronunciation graphs in
real time. The results of two tests of the same pronunciation at the same tuning position, and of
the same pronunciation at different pitches, are shown in Fig. 1.
The singer's spectrum is shown in blue and the vocal student's in orange. The comparison shows that
during pronunciation, the vocal learner's loudness at low frequencies is clearly lacking, while the
mid- and high-frequency positions are significantly higher than in the singer's spectrum. This
phenomenon becomes especially obvious as the pitch rises: the throaty “cough” component, which
should decline at 4 kHz, is maintained or even increased at different pitches. While the professional
singer's spectral curve at 4 kHz drops to a range of −15 dB to −18 dB, the student's remains in the
range of ±6 dB and even tends to rise further, indicating a significant problem with the student's
larynx.
Figure 1. Professional singers contrast with vocal students singing
After determining the direction of the problem, the vocal teacher can correct it through a series of
targeted exercises, such as relaxing the jaw, keeping the student's oral cavity active, preventing
the lower part of the throat from exerting force, and relaxing the tongue, shifting most of the
deliberate effort from the laryngeal position to the breath. The throat sound was audibly reduced,
and the student's spectrum moved closer to the singer's; the corrected and improved spectrum is
shown in Figure 2. Although the student still did not reach the singer's standard at 4 kHz, the
original upward trend had changed to a downward trend in the −6 dB to −3 dB range. This fully
demonstrates the substantial, effective role of spectrum analysis in vocal teaching.
Figure 2. Professional singers and vocal students singing contrast (improved)
Another student's articulation was similar to the first, with a tight laryngeal sound. The same
sampling method was used to obtain a comparison chart; Fig. 3 compares the student's and the
professional singer's vocalization spectra.

The figure clearly shows that at the same loudness, the student's spectrum lacks the mid- and
high-frequency content of the singer's and carries the regional characteristics of a laryngeal
problem: around 3 kHz, the singer's spectral curve drops from the −6 dB to −9 dB range down to
−15 dB to −18 dB, while the student's only drops to −9 dB to −12 dB. In traditional teaching,
because this articulation problem sounds close to a breath problem, an inexperienced teacher may
look for the breakthrough point in the breath; the spectrum, however, shows that the main
characteristics fall under the nasal-sound category, so emphasizing the breath would be
counterproductive. The analysis indicates that the student's laryngeal problem was caused by an
excessive nasal sound. The voice teacher therefore focused on solving the nasal-sound problem
through a series of methods, such as adjusting the internal state of the mouth, singing in a
speech-like state, concentrating the voice, and practicing mainly with closed-mouth vocalization
supplemented by open-mouth vocalization. After correction, the student also made significant
progress.
Figure 3. Professional singers contrast with vocal students singing
3.2 Spectral analysis of common problems in vocal music
In the process of vocal learning, singers may develop voice problems before they have mastered
correct vocal technique. These problems are subtle in their impact on the student and usually need
a teacher to be identified and corrected; when learning without a teacher nearby, singers often do
not realize that their voices have such problems. With spectrum analysis software, the voice can be
analyzed spectrally. In vocal teaching, the teacher can assist in judging a student's incorrect
vocal habits by observing the spectrum charts, in order to correct mistakes in time.
3.2.1 Spectral curves of sound leakage and their solutions
For students who have not mastered the correct method of vocalization, air leakage at the vocal
cords is one of the more likely problems: the voice carries an airflow sound and is unfocused and
scattered. Beginners, afraid of singing badly, find it easiest to “hide” the vocal cords, so the
teacher should guide students to engage the vocal cords confidently rather than hide them; the more
the vocal cords are hidden, the more problems occur. Figure 4 shows the spectrogram of the correct
state, and Figure 5 the spectrogram of the leaking state. The comparison shows that in the leaking
state, certain frequency bands, such as those near 5500 Hz, 10,500 Hz, and 14,500 Hz, are missing,
owing to the lack of full vibration of the vocal folds.
Figure 4. Vocal cord spectrum without air leakage
Figure 5. Spectrum of acoustic leakage
There are two general causes of vocal fold air leakage. The first is a damaged, diseased, or
severely defective vocal fold: conditions such as vocal nodules, vocal polyps, and acute
inflammation of the vocal folds seriously affect the folds at closure, producing uneven force and
incomplete closure, which leads to air leakage. The second is that the singer's vocal folds are not
strong enough to close while the breath is too strong, so the folds cannot close and block the
airflow. In the first case, singers should seek medical treatment promptly, preferably resting the
voice, because healthy vocal folds are fundamental to good sound quality, and singers need to
protect them. To avoid the second case, singers should adhere to the principle of gradual, orderly
progress in daily practice, increasing the difficulty step by step to avoid overusing the vocal
cords.
3.2.2 Spectral curves and solutions for laryngeal sounds
Laryngeal (guttural) voice is another problem likely to occur in beginning voice students: the
voice is heavy, rough in tone, and dull. Why does the guttural voice sound dull and heavy? Figure 6
shows a typical frequency curve with a guttural-voice problem. The lower positions are louder and
the higher frequencies are lacking, with missing values at 7 kHz, 12 kHz, and 14 kHz, so such a
voice sounds dull and lackluster.
Figure 6. The spectral curve of the guttural sound
Beginners may not correctly understand what an “open throat” means, and this misunderstanding is
the main cause of an over-opened throat; deliberately imitating the voice of a singer in a recording
and pursuing volume and thickness when learning to sing also produce the guttural sound. When
students cannot send the voice on the breath into the nasopharyngeal cavity to obtain true
resonance, the larynx tenses while singing, the voice is blocked in the throat, and the breath
cannot flow. Some students, in pursuit of resonance, also hold the throat open, making the voice
and breath incoherent. The vocal cords strain severely when singing this way, which easily leads to
vocal fatigue.

To address the guttural voice, beginners should first listen and observe more rather than merely
imitating a voice, which is a bad habit. Second, they should know their own level and not attempt
works that are too difficult for them. A correct concept should be established so that the entire
throat stays naturally relaxed: relax the root of the tongue and the muscles around the throat,
avoid consciously holding the throat open, and strengthen breath training so that the breath flows
up and down in front of the pharyngeal wall, driving the vocal cords to vibrate and producing a
feeling of relaxation in the throat.
3.2.3 Spectral curves of vocal folds jammed and their solutions
The vocal folds become “squeezed” usually because the breath is not smooth: the folds have no
breath support yet still must produce sound, so they rely only on the surrounding muscles to
vibrate, and the jaw, throat, and chest cavity feel very tense. Such a voice sounds stiff,
constrained, and pale rather than free. A vocal teacher can analyze a student's voice with the help
of spectral analysis software to help determine what problems the student may have. Figure 7 shows
the spectrum of a typical vocal fold squeeze, with very pronounced, abrupt peaks and no natural
transitions. The voice is straight and does not vibrate as it normally does when relaxed.
Figure 7. The spectral curve of vocal fold tension
To solve the “squeezing” problem, the breath must be addressed first: let the breath come before
the sound and carry the sound on the breath. Only when the breath problem is solved can the excess
tension around the vocal folds be released. A correct understanding of the sound must then be
established: singing cannot blindly pursue a high or bright tone; only a voice grounded in proper
vocal technique can express the emotion of the song, and that is the beautiful sound.
4 Conclusion
In this paper, we constructed a vocal spectrum analysis model. We verified its auxiliary effect on
vocal teaching based on the comparison of professional singers’ and students’ vocal spectra. We
explored the performance of common vocal teaching problems in the spectrum and came to the
following conclusions:
1) In the vocal training of “a” and “i,” the professional singer's spectral curve at 4 kHz
dropped to the range of −15 dB to −18 dB, while the student's remained in the range of ±6 dB and
was still rising. Guided by the spectrum analysis, the student's curve was corrected and now trends
downward into the −6 dB to −3 dB range, which confirms the effective auxiliary role of spectrum
analysis in vocal music teaching.

2) Spectrum analysis of the three typical problems in vocal music teaching (air leakage, guttural
sound, and vocal fold squeezing) shows that in the air-leakage state, the frequency bands near
5500 Hz, 10,500 Hz, and 14,500 Hz are missing. With a guttural sound, there are missing values at
7 kHz, 12 kHz, and 14 kHz. When the vocal folds are squeezed, the spectral peaks are very
pronounced and abrupt, with no natural transitions.
References
[1] Cao, W. (2022). Evaluating the vocal music teaching using backpropagation neural network. Mobile
Information Systems.
[2] Liu, W., & Shapii, A. B. (2022). Study on aesthetic teaching methods in ethnic music teaching in
universities in the context of intelligent internet of things. Scientific programming(Pt.16), 2022.
[3] Ma, X. (2021). Analysis on the application of multimedia-assisted music teaching based on ai technology.
Advances in multimedia(Pt.1), 2021.
[4] Bittner, R. M., Demetriou, A., Gulati, S., Humphrey, E. J., Reddy, S., & Seetharaman, P., et al. (2019).
An introduction to signal processing for singing-voice analysis: high notes in the effort to automate the
understanding of vocals in music. IEEE Signal Processing Magazine.
[5] Xu, Y. (2021). Systematic study on expression of vocal music and science of human body noise based on
wireless sensor node. Mobile information systems.
[6] Huang, M., & Zhang, Y. (2021). Design and construction of a pbl based evaluation index system for
classroom music education. International Journal of Emerging Technologies in Learning (iJET), 16(17),
107.
[7] Dimitrova-Grekow, T., Klis, A., & Igras-Cybulska, M. (2019). Speech emotion recognition based on
voice fundamental frequency. Archives of acoustics, 44(2), 277-286.
[8] Takeuchi, M., Soejima, Y., Ahn, J., Lee, K., Takaki, K., & Ifukube, T., et al. (2022). Development of a
hands-free electrolarynx for obtaining a human-like voice using the lpc residual wave. Electrical
engineering in Japan.
[9] Xiang, X., Zhang, X., & Chen, H. (2021). Acquisition and enhancement of remote human vocal signals
based on doppler radar. IEEE sensors journal(21-18).
[10] Han, Jae HyunKwak, Jun-HyukJoe, Daniel JuhyungHong, Seong KwangWang, Hee SeungPark, Jung
HwanHur, ShinLee, Keon Jae. (2018). Basilar membrane-inspired self-powered acoustic sensor enabled
by highly sensitive multi tunable frequency band. Nano Energy, 53.
[11] Karthika, Vijayan, Haizhou, Li, Tomoki, & Toda. (2018). Speech-to-singing voice conversion: the
challenges and strategies for improving vocal conversion processes. IEEE Signal Processing Magazine.
[12] Raymundo, A. A., Akhtar, M. Z., Felipe, S. J., Douglas, O., & Henrique, F. T. (2018). Feature pooling of
modulation spectrum features for improved speech emotion recognition in the wild. IEEE Transactions
on Affective Computing, PP, 1-1.
[13] Zhang, Y., & Yi, D. (2021). A new music teaching mode based on computer automatic matching
technology. International Journal of Emerging Technologies in Learning (iJET)(16).
[14] Li, W. (2019). Design and implementation of music teaching assistant platform based on internet of things.
Transactions on Emerging Telecommunications Technologies.
[15] Nam, J., Choi, K., Lee, J., Chou, S. Y., & Yang, Y. H. (2019). Deep learning for audio-based music
classification and tagging: teaching computers to distinguish rock from bach. IEEE Signal Processing
Magazine.
[16] Xia, X., & Yan, J. (2021). Construction of music teaching evaluation model based on weighted nave bayes.
Scientific Programming.
[17] Zhang, Y., & Li, Z. (2021). Automatic synthesis technology of music teaching melodies based on
recurrent neural network. Scientific programming(Pt.13), 2021.
[18] Gan, L., Wang, D., Wang, C., Xiao, D., & Li, F. (2021). Design and implementation of multimedia
teaching platform for situational teaching of music appreciation course based on virtual reality.
International Journal of Electrical Engineering Education, 002072092098609.
[19] Neilsen, T. B., Vongsawad, C. T., & Onwubiko, S. G. (2020). Teaching musical acoustics with simple
models and active learning. The Journal of the Acoustical Society of America, 148(4), 2528-2528.