Citation: Bagri, I.; Tahiry, K.; Hraiba, A.; Touil, A.; Mousrij, A. Vibration Signal Analysis for Intelligent Rotating Machinery Diagnosis and Prognosis: A Comprehensive Systematic Literature Review. Vibration 2024, 7, 1013–1062. https://doi.org/10.3390/vibration7040054
Academic Editor: Aleksandar Pavic
Received: 26 August 2024; Revised: 28 September 2024; Accepted: 22 October 2024; Published: 31 October 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Systematic Review
Vibration Signal Analysis for Intelligent Rotating Machinery
Diagnosis and Prognosis: A Comprehensive Systematic
Literature Review
Ikram Bagri 1,*,†,‡, Karim Tahiry 2,‡, Aziz Hraiba 1,‡, Achraf Touil 1,‡ and Ahmed Mousrij 1,‡
1 Laboratory of Engineering, Industrial Management and Innovation, Settat 577, Morocco; hraibaz@gmail.com (A.H.); ac.touil@uhp.ac.ma (A.T.); mousrij@gmail.com (A.M.)
2 Laboratory of Information Technology and Management, Settat 577, Morocco; karim.tahiry@gmail.com
* Correspondence: i.bagri@uhp.ac.ma
† Current address: Electrical and Mechanical Engineering Department, Faculty of Science and Technology, Hassan 1st University, Settat 577, Morocco.
‡ These authors contributed equally to this work.
Abstract: Many industrial processes, from manufacturing to food processing, incorporate rotating
elements as principal components in their production chain. Failure of these components often leads
to costly downtime and potential safety risks, further emphasizing the importance of monitoring
their health state. Vibration signal analysis is now a common approach for this purpose, as it provides
useful information related to the dynamic behavior of machines. This research aimed to conduct a
comprehensive examination of the current methodologies employed in the stages of vibration signal
analysis, which encompass preprocessing, processing, and post-processing phases, ultimately leading
to the application of Artificial Intelligence-based diagnostics and prognostics. An extensive search was
conducted in various databases, including ScienceDirect, IEEE, MDPI, Springer, and Google Scholar,
from 2020 to early 2024 following the PRISMA guidelines. Articles that aligned with at least one of the
targeted topics cited above and provided unique methods and explicit results qualified for retention,
while those that were redundant or did not meet the established inclusion criteria were excluded.
Subsequently, 270 articles were selected from an initial pool of 338. The review results highlighted
several deficiencies in the preprocessing step and the experimental validation, with implementation
rates of 15.41% and 10.15%, respectively, in the selected prototype studies. Examination of the
processing phase revealed that time scale decomposition methods have become essential for accurate
analysis of vibration signals, as they facilitate the extraction of complex information that remains
obscured in the original, undecomposed signals. Combining such methods with time–frequency
analysis methods was shown to be an ideal combination for information extraction. In the context
of fault detection, support vector machines (SVMs), convolutional neural networks (CNNs), Long
Short-Term Memory (LSTM) networks, k-nearest neighbors (KNN), and random forests have been
identified as the five most frequently employed algorithms. Meanwhile, transformer-based models
are emerging as a promising avenue for the prediction of RUL values, along with data transformation.
Given the conclusions drawn, future researchers are urged to investigate the interpretability and
integration of the diagnosis and prognosis models developed with the aim of applying them in
real-time industrial contexts. Furthermore, there is a need for experimental studies to disclose the
preprocessing details for datasets and the operational conditions of the machinery, thereby improving
the data reproducibility. Another area that warrants further investigation is differentiation of the
various types of fault information present in vibration signals obtained from bearings, as the defect
information from the overall system is embedded within these signals.
Keywords: vibration signal analysis; rotating machinery; machine learning; signal processing;
Maintenance 4.0
1. Introduction
Numerous industries, including power generation, industrial manufacturing, and
transportation systems, have come to depend on rotating machinery. These machines bring
real value in an industrial setting; however, they are prone to breakdowns, which lead
to downtimes and increasing maintenance costs. Consequently, the management of the
health status of rotating machinery is crucial to sustainable productivity. This management
involves both diagnosis and prognosis. On one hand, diagnosis entails detecting existing
failures or abnormal functions in machinery to ensure a proactive approach and circumvent
a reactive response to a catastrophic failure. On the other hand, prognosis forecasts the state
of the equipment based on historical data and trends, allowing for accurate scheduling of
maintenance operations, as well as replacement programs, to extend the life cycle of the
equipment [1–16].
To maintain an accurate understanding of the critical component conditions within an
industrial environment, the machinery is to be inspected while operational, and any harm
to the equipment is to be avoided. On account of this, the monitoring of rotating machinery
relies on non-destructive techniques (NDTs), namely Ultrasound Testing (UT), infrared
(IR) images, Acoustic Sound-Based Condition Monitoring (ASCM), Electrical Signature
Analysis (ESA), and vibration signal analysis (VSA). Figure 1 provides the concepts for
each of the cited techniques.
Vibration signal analysis (VSA) is frequently employed for rotating machinery moni-
toring, as it visualizes vibration shocks and changes in movement patterns. Any imbalance,
misalignment, bearing wear, or gear damage will cause the vibration signature to deviate
from the baseline signature of a normal operating machine. As a consequence, a system
response in the form of pulsations appears in the vibration signal, indicating particular
types of faults [17–20]. Through a comprehensive analysis of the vibration signal, faults
that are still obscure or have not fallen into the range of detection with other techniques
are detected. Furthermore, with the advent of Artificial Intelligence (AI), mainly Machine
Learning (ML), the analysis of vibration signals has been facilitated, allowing researchers
and engineers to explore the information encapsulated in signals further [21].
Regardless of the specific non-intrusive testing method chosen, the fault detection
process generally consists of similar steps, albeit with adjustments based on the type of
data utilized (Figure 2).
Signal data and image data are differentiated in this context. The initial phase involves
preprocessing the collected data and then converting raw data into numerical data for fur-
ther analysis using signal processing or image processing techniques. This analysis yields a
set of features derived from the processed numerical signal, which are then refined in the
post-processing stage. Here, the most relevant features are chosen, potentially enhanced,
and combined into a final feature vector suitable for classification via AI algorithms for
diagnostic and/or prognostic purposes. Ultimately, these AI algorithms determine the
machinery’s current state and/or predict its future state.
In this review, a novel synthesis of the latest developments in the monitoring of
rotating machinery using vibration signal analysis is presented. It distinguishes itself from
prior reviews by concentrating on studies published in the last four years that have not
been thoroughly examined elsewhere in the literature. By adopting an interdisciplinary
approach, our review targets the comprehensive phases of the global monitoring process,
from data retrieval to fault detection. Hence, the content provided offers an exhaustive
understanding of the issue at hand, which has been insufficiently addressed in earlier works
that tended to isolate specific components. In the retrieval process of studies to review, the
selection is extended to Electroencephalograms (EEGs) given that they pertain to the same
class of signals as vibration signals. Both are transient and non-linear and contain multi-
scale information; thereby, the information extraction process of EEGs can be applied to
vibration signals. This approach sheds light on a broader set of signal processing techniques
that can potentially extract more intricate information from vibration signals. In addition, a
comparative analysis of a range of emerging methodologies is conducted. It targets specific
algorithms and frameworks, providing a critical assessment of their respective advantages,
drawbacks, and potential avenues to assist researchers who are focused on this particular
area of study. The review is concluded by drawing tangible conclusions from the substantial
number of articles analyzed and providing specific directions for the implementation
of academic research work and its integration into Industry 4.0 systems.
In Section 2, the methodology of the review is explained, and a preliminary analysis
of the selected studies is presented. Section 3 discusses the findings of the review in terms
of the different phases of the monitoring process. Sections 4 and 5, respectively, contain the
conclusions drawn and the future directions identified.
Figure 1. Non-intrusive monitoring approaches for rotating machinery.
Figure 4 represents a flow diagram where each phase of the review process is detailed
further.
Figure 4. Overall flowchart of the methodology.
Studies eligible for inclusion were determined based on the accessibility of their full
texts and their pertinence to at least one of the primary topics specified in Table 1.
Table 1. Scope of the review.

| Main Objective | Question Asked |
|---|---|
| Preprocessing | Were the data preprocessed? |
| Processing | How were the data processed? |
| Post-processing | How were the results of the processing optimized? |
| Diagnosis | Which AI tools were used? |
| Prognosis | Which AI tools were used? |
| Experimental validation | Were the results validated experimentally? |
The primary data for this review were collected from a variety of databases, including
ResearchGate, ScienceDirect, IEEE Xplore, MDPI, SpringerLink, and Google Scholar. The
latter was particularly useful as a gateway to other databases owing to its extensive indexing
capabilities and the significant impact factors of the sources it encompasses. The review
specifically targeted studies published from 2020 to early 2024 to ensure the inclusion
of the latest developments in the field. The studies conducted during this time frame
are grounded in the research deficiencies recognized in previous years, suggesting that
extending the review period would yield minimal information. In terms of the keywords
researched, a total of 14 keywords were generated through collaborative brainstorming
sessions, guided by the conceptual framework depicted in Figure 5.
Figure 5. Concept map of the term concepts researched.
In addition, two search queries were generated through the PICO search strategy and
synonym mapping, as detailed in Tables 2 and 3.
Table 2. First search string strategy.

| PICO Element | Corresponding Element | Synonym 1 | Synonym 2 | Synonym 3 | Synonym 4 |
|---|---|---|---|---|---|
| P (Problem) | Rotating machinery | Rotating equipment | Turbines | Motors | Generators |
| I (Intervention/Technology) | Vibration signal analysis | Vibration analysis | Vibration signal processing | Condition monitoring | |
| C (Comparison/Techniques) | Intelligent solutions | Artificial Intelligence | Deep learning | Intelligent algorithms | Machine learning |
| O (Outcome) | Diagnosis and prognosis of machinery conditions | Fault diagnosis | Failure prediction | Prognosis | Predictive maintenance |
Table 3. Second search string strategy.

| PICO Element | Corresponding Element | Synonym 1 | Synonym 2 |
|---|---|---|---|
| P (Population/Problem) | Rotating machinery | Rotating equipment | Bearing |
| I (Intervention/Technology) | Vibration signal processing | | |
| C (Comparison/Techniques) | Intelligent solutions | Deep learning | Machine learning |
| O (Outcome/Goal) | Prognosis of machinery conditions | RUL prediction | Failure prediction |
The resulting search strings are shown in Table 4.
Table 4. Search queries.

Search Query 1:
("rotating machinery" OR "rotating equipment" OR "turbines" OR "motors" OR "generators") AND ("vibration analysis" OR "vibration monitoring" OR "vibration signal processing") AND ("fault diagnosis" OR "failure prediction" OR "prognosis" OR "predictive maintenance") AND ("artificial intelligence" OR "machine learning" OR "intelligent algorithms" OR "deep learning")

Search Query 2:
("rotating machinery" OR "rotating equipment" OR "bearing") AND ("vibration analysis" OR "vibration signal processing") AND ("RUL Prediction" OR "failure prediction") AND ("machine learning" OR "deep learning")
The initial screening process was carried out independently by the reviewers, who
evaluated the titles and abstracts of the studies retrieved to determine their relevance to the
objectives of the review. Any discrepancies that arose were addressed through discussion,
leading to a consensus on which articles warranted further investigation. During this stage,
20 studies were identified as duplicates, while 21 studies were excluded due to their lack
of alignment with the research focus and 27 studies were excluded due to the absence of
explicit results. Consequently, 266 studies advanced to a second-level screening, where the
reviewers conducted a systematic analysis of each paper’s content, extracting pertinent
data in accordance with the structured format presented in Table 5.
Table 5. Data extraction format.

| Article Reference | Type of Study | Preprocessing | Processing Domain | Processing Method | Post-Processing | Diagnosis | Prognosis | Experimental Validation |
|---|---|---|---|---|---|---|---|---|
| [23] | Prototype | No | Time scale, time, and frequency domains | CEEMDAN decomposition, statistical features, and FFT | Chi-square-RFE method | SVM, ELM, DBN, and DNN | No | No |
In order to reduce bias, three independent reviewers utilized the CASP (Critical
Appraisal Skills Programme) checklists, with any discrepancies addressed by a fourth
reviewer. The results of the selection process are depicted in various figures, illustrating the
distribution of the studies according to type (Figure 6) and keywords researched (Figure 7).
Figure 6. Types of studies retrieved.
Figure 7. Studies retrieved per keyword.
Additionally, an UpSet plot was created to visualize the distribution of the data across
the combinations of categories (Figure 8), complemented by a chronological chart that
illustrates the progression of research relevant to the review’s focus (Figure 9).
Figure 8. UpSet plot of the distribution of data across different categories.
Figure 9. Evolution of the studies retrieved.
Ultimately, the studies chosen were organized into a table and categorized by the
subjects they covered and their respective years of publication (Tables 6 and 7).
Table 6. Study characterization.

| Research Objective | <2020 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|
| Preprocessing | [24–26] | [27–41] | [42–55] | [56–67] | [20,68–86] | |
| Processing | [24,26,87–96] | [27–33,36–41,97–141] | [42–55,142–178] | [23,56–62,64–67,179–214] | [20,68–75,78,79,81–83,215–238] | [86,239–241] |
| Post-processing | [24,26,94] | [27–29,38,40,41,98,105,106,114,116,118,126,129,130,134,136,242,243] | [42–44,46,47,51,53,144,145,147,154,157,158,162,165,167,244] | [23,56,60,61,65,66,183–185,188,198,208,214,245] | [68,72,74,75,80–83,216,225,227,246,247] | [86,239–241,248] |
Table 7. Study characterization of fault detection, fault prediction, and experimental validation.

| Research Objective | <2020 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|
| Diagnosis | [24–26,91,94,95] | [28–31,33,34,36–38,40,41,97–100,104,106–108,112,114–116,118–120,127,129,133–136,139–141,242,249–255] | [42–44,46,47,50–55,142,144–147,153,157,158,160–166,172–177,244,256–262] | [23,57,60–62,64–67,180,182,183,186–190,193,195,196,199–207,212–214,245,263–267] | [68,72,74–79,216,217,221–223,225,229–233,236,237,268–274] | [240,241,248,275] |
| Prognosis | [25,90,91] | [27,39,100,109–111,121,126,136,138,243,254,276,277] | [167,169–171,258,259,278,279] | [56,63,184,185,198,211,212,277] | [68,73,80,83–85,220,233,235,246,247] | [86,239,241] |
| Experimental validation | [26,89–91] | [29,30,36,39,40,99,101,112,113,116,128,129,131,140,141] | [42,45–48,51,55,145,148,155,162,166,172–174,177,261] | [58,61,64,66,67,199,200,202] | [78,80,215,217,222,223,234,236,237] | [240] |
3. Results and Discussion
In the subsequent sections, the findings of the review pertaining to the main research
objectives, established in Table 1, are presented.
3.1. Signal Preprocessing
Signal analysis fundamentally depends on the signals gathered during data acquisition.
These arrive as continuous analog time series that cannot be processed or analyzed directly
by digital systems, which operate only on discrete, numerically encoded data. On
account of this, preprocessing techniques convert a signal from the analog
continuous dimension into the discrete numerical dimension. The sampling technique
retrieves samples of a continuous amplitude at regular intervals, generating a sequence
of discrete time values. Through a low-pass filter and interpolation, the signal can be
reconstructed. However, the sampled signal is often contaminated by noise, which causes
distortion of the information content that was carried by the original signal. Techniques
such as filtering, adaptive filtering, and sparse representation aid in denoising the sampled
signal. This ensures the subsequent processes of signal interpretation and analysis are
based on valid and reliable data [280–282].
In this context, we evaluated the proportion of studies that incorporated a prepro-
cessing phase into their methodology. Our analysis revealed that among the 212 studies
examined, only 47 of them integrated a preprocessing phase, accounting for 15.41% of the
total studies. Although this percentage may not provide a comprehensive overview of the
research practices in this domain, it is of significant importance, as the preprocessing phase
is the basis upon which the evaluation of vibration signals from rotating machinery is built.
The lack of signal preprocessing results in information loss and erroneous diagnoses.
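As an illustrative sketch of this denoising step (all parameters here, such as the sampling rate, filter order, and cutoff frequency, are assumed for the example and not taken from any reviewed study), a zero-phase Butterworth low-pass filter can suppress broadband noise while preserving the low-frequency content:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Assumed, illustrative parameters
fs = 10_000                                   # sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)

# Synthetic "vibration" signal: a 100 Hz component plus broadband noise
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 100 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)

# Zero-phase low-pass Butterworth filter (order 4, 500 Hz cutoff)
b, a = butter(N=4, Wn=500, btype="low", fs=fs)
denoised = filtfilt(b, a, noisy)              # filtfilt avoids phase distortion

# The filtered signal is measurably closer to the clean reference
rmse_before = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_after = np.sqrt(np.mean((denoised - clean) ** 2))
```

In practice the cutoff must be chosen above the fault frequencies of interest, which this toy example does not attempt to model.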
3.2. Signal Processing
Analysis of the data contained in a signal is conducted through signal processing
techniques. These techniques examine the signal in four domains: time, frequency, time
frequency, and time scale. Each domain offers different aspects of information (Figure 10).
Figure 10. Signal processing methods.
In this part of the research, we review the techniques applied in signal processing for
vibration signals and EEG signals. The heatmap in Figure 11 illustrates the distribution of
the various signal processing techniques examined throughout the studied time period.
Figure 11. Heatmap of signal processing techniques and their use from 2020 to 2023 in the se-
lected studies.
The reason we extended the research to the analysis of EEG signals is that they share
many characteristics with vibration signals; for instance, both are transient and non-linear
and contain multiscale temporal information. Consequently, research advances in EEG
signal processing can benefit researchers in the field of vibration signals.
3.2.1. Time Domain Analysis
Time domain analysis revolves around the temporal characteristics of a signal, such as
amplitude and phase. Statistical metrics such as the RMS, variance, standard deviation,
peak-to-peak, crest factor, impulse factor, form factor, shape factor, clearance factor, kurtosis,
skewness, and high-order statistics are frequently used to assess the temporal evolution of
transient signals, mainly vibration signals [23,27–30,44,46,53,69,73,83,102,105,111,113,142,144,146,167,183,185,203,204,217,231,233].
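Most of these indicators reduce to one-line computations on the sampled signal. A minimal sketch using the standard definitions (the sine test signal is an assumed example; a pure sine has a crest factor of √2 and a Pearson kurtosis of 1.5):

```python
import numpy as np
from scipy import stats

def time_domain_features(x: np.ndarray) -> dict:
    """Common time-domain indicators used for vibration monitoring."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    return {
        "rms": rms,
        "peak_to_peak": np.ptp(x),
        "crest_factor": peak / rms,                      # peak over RMS
        "impulse_factor": peak / np.mean(np.abs(x)),     # peak over mean abs
        "shape_factor": rms / np.mean(np.abs(x)),        # RMS over mean abs
        "clearance_factor": peak / np.mean(np.sqrt(np.abs(x))) ** 2,
        "kurtosis": stats.kurtosis(x, fisher=False),     # 3.0 for a Gaussian
        "skewness": stats.skew(x),
    }

# Assumed test signal: 500 full periods of a 50 Hz sine
t = np.linspace(0, 1, 10_000, endpoint=False)
feats = time_domain_features(np.sin(2 * np.pi * 50 * t))
```

Impulsive faults typically raise the kurtosis and crest factor well above these baseline values, which is why these two indicators are so widely used.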
In [20], the authors privileged filter-based methods and stochastic and advanced
analytical techniques for feature extraction from vibration data, while the authors in [142]
used Detrended Fluctuation Analysis (DFA).
The Hilbert transform, used in the studies referenced as [87,97–101,215], offered
insights for the detection of rolling bearing faults, as well as wind turbine faults.
Several studies have employed different techniques, namely inter-channel correlation [102],
time synchronous averaging [42], the time domain strain-life method [179],
time-varying variance, time-varying kurtosis, the time-varying Kolmogorov–Smirnov test
and autocorrelation [143], and the Higuchi fractal dimension (HFD) [43,216].
In [68], a combination of time domain features was extracted through an autoencoder
neural network for fault classification in bearings and estimation of their remaining useful
life. In another work [180], periodic pulse information was extracted using the kurtosis
of the unbiased autocorrelation of the squared envelope of a demodulated signal for the
same purpose.
3.2.2. Frequency Domain Analysis
The inspection of a signal in the frequency domain involves examining its frequency
components instead of its time-based characteristics. This approach allows us to analyze
how fast the signal changes. In the following parts, the frequency domain techniques that
were used in the studies reviewed are presented.
• The Fast Fourier Transform (FFT):
The Fast Fourier Transform (FFT) is a mathematical tool and algorithm that decom-
poses a signal into a combination of sinusoidal functions possessing different fre-
quencies. This method is a classical signal processing technique with prominence
in identifying the generic frequency characteristics of signals. The studies refer-
enced as [
20
,
23
,
27
–
30
,
44
–
46
,
69
,
88
,
99
,
103
–
108
,
110
–
112
,
143
–
146
,
179
–
185
,
216
–
220
] lever-
aged the advantages of the FFT for different applications, namely for rotating ma-
chinery diagnosis, predicting the remaining useful life (RUL) of bearings, and the
maintenance of electrical machines.
Nevertheless, the assumptions on which the FFT is founded give rise to several restric-
tions. First, it presumes that all the signals to be analyzed are continuous. The signals,
however, are mostly discrete and sampled after undergoing a preprocessing phase.
Another assumption in the Fourier transform is that the signal should be stationary.
That is, its statistical properties remain constant with time. Again, these assumptions
are often violated by real-life signals, which show non-stationary behavior. Furthermore,
the FFT treats its input signal as periodic and linear. While this may hold for some
signals, in most cases it fails to capture the complexity of real-world signals.
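A minimal sketch of the basic FFT workflow (the two-tone test signal and its parameters are assumed for illustration): the one-sided magnitude spectrum of a real signal exposes its dominant spectral line.

```python
import numpy as np

# Assumed, illustrative parameters
fs = 2_000                                      # sampling rate in Hz
t = np.arange(0, 2.0, 1 / fs)
x = 0.8 * np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 310 * t)

spectrum = np.abs(np.fft.rfft(x)) / x.size      # one-sided magnitude spectrum
freqs = np.fft.rfftfreq(x.size, d=1 / fs)       # frequency axis in Hz

dominant = freqs[np.argmax(spectrum)]           # strongest spectral line
```

Here both tones fall exactly on FFT bins; with arbitrary frequencies or non-stationary content, leakage and smearing appear, which is precisely the limitation discussed above.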
• Power spectral density (PSD):
Power spectral density (PSD) refers to the representation of the amount of power that
exists at each frequency band of a signal [283]. It is suitable for analyzing signals
whose energy is spread over a range of frequencies rather than being concentrated
within a few frequencies. The area under the PSD curve over a frequency range gives
the total power of the signal in that range. In the papers cited as [88,113,142], PSD was
useful for the analysis of transient signals such as EEG and speech.
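A short sketch of a PSD estimate (Welch's method is one common estimator; the tone, noise level, and segment length are assumed for the example): integrating the PSD over frequency recovers the total signal power, as noted above.

```python
import numpy as np
from scipy.signal import welch

# Assumed test signal: a tone of power 1.0 plus weak white noise
fs = 4_000
t = np.arange(0, 4.0, 1 / fs)
rng = np.random.default_rng(1)
x = np.sqrt(2) * np.sin(2 * np.pi * 250 * t) + 0.1 * rng.standard_normal(t.size)

freqs, psd = welch(x, fs=fs, nperseg=2048)      # one-sided PSD estimate

peak_freq = freqs[np.argmax(psd)]               # frequency carrying most power
total_power = np.sum(psd) * (freqs[1] - freqs[0])   # area under the PSD curve
```

The integrated PSD should come out near 1.01 (tone power 1.0 plus noise variance 0.01), illustrating the "area under the curve equals power" property stated above.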
• Cepstrum:
The cepstrum is a signal processing technique used when the frequency-based infor-
mation of the signal, like the harmonics, needs to be examined separately. It is obtained
by taking the inverse Fourier transform of the logarithm of the Fourier transform of
a signal. On account of this, the signal is studied in the “quefrency” domain, where
the quefrency denotes delays or periodicity in the original signal. This technique was
used in the works of [100,142,147,148] to extract Mel frequency cepstral coefficients
(MFCCs) as characteristics for the evaluation of signal health.
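The definition above can be sketched directly (the harmonic test signal is assumed for illustration): the real cepstrum of a signal with a 100 Hz fundamental shows a rahmonic peak at a quefrency of 1/100 s.

```python
import numpy as np

# Assumed test signal: 8 harmonics of a 100 Hz fundamental (period 10 ms)
fs = 8_000
t = np.arange(0, 0.2, 1 / fs)
x = sum(np.sin(2 * np.pi * 100 * k * t) / k for k in range(1, 9))

# Real cepstrum: inverse FFT of the log magnitude spectrum
log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)   # regularized to avoid log(0)
cepstrum = np.fft.ifft(log_mag).real
quefrency = np.arange(x.size) / fs                # quefrency axis in seconds

# Locate the rahmonic peak away from quefrency zero
search = (quefrency > 0.002) & (quefrency < 0.015)
peak_q = quefrency[search][np.argmax(cepstrum[search])]   # ≈ 0.01 s
```

The comb of harmonics in the log spectrum collapses to a single quefrency peak, which is why the cepstrum is effective for separating harmonic families such as gear-mesh sidebands.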
• Envelope spectrum analysis:
Envelope spectrum analysis (ESA) is a highly effective method for detecting modulat-
ing patterns within a signal. By applying a Fourier transform to the envelope signal,
a smoothed and time-variant version of the signal’s amplitude is obtained. In prac-
tical applications, envelope spectrum analysis, including spectral kurtosis analysis,
has been found to be more advantageous than traditional raw vibration analysis for
early-stage fault detection and anomaly identification. This is because many types
of machinery faults and defects manifest as fluctuations in amplitude in vibration
signals. Several studies, as referenced in [47,56,89,90,97,98,149,221], have leveraged
ESA for fault diagnosis and prediction in bearings.
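A hedged sketch of envelope spectrum analysis (the 3 kHz carrier and 37 Hz modulation rate are assumed values, loosely imitating a structural resonance excited at a bearing defect rate): the envelope spectrum exposes the low-frequency modulation rather than the high-frequency carrier.

```python
import numpy as np
from scipy.signal import hilbert

# Assumed, illustrative parameters
fs = 20_000
t = np.arange(0, 1.0, 1 / fs)
fault_rate = 37.0                               # hypothetical defect frequency, Hz
carrier = np.sin(2 * np.pi * 3_000 * t)         # resonance band
modulation = 1 + 0.8 * np.sin(2 * np.pi * fault_rate * t)
x = modulation * carrier                        # amplitude-modulated signal

envelope = np.abs(hilbert(x))                   # instantaneous amplitude
env_spec = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(envelope.size, d=1 / fs)

detected = freqs[np.argmax(env_spec)]           # ≈ fault_rate, not 3 kHz
```

A raw spectrum of `x` would be dominated by the 3 kHz carrier and its sidebands; demodulating first makes the defect rate directly readable, which is the advantage claimed above.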
• Other:
In the study [57], Singular Spectrum Analysis (SSA) was used to decompose a
signal into its fundamental parts to investigate trends, oscillations, and noise for the
diagnosis of bearing faults, while the study [106] used the Fractional Fourier Transform
to extract the properties of vibration signals.
Another study [27] used the Butterworth filter to refine the EEG signal by removing
unwanted frequencies and thereby predicted epileptic seizures. Moreover, the
implementation of modulation signal bispectrum analysis (MSB) in [91] yielded precise
information regarding the modulation properties of the signal for gear monitoring.
3.2.3. Time–Frequency Analysis
Frequency domain analysis is certainly useful, but it has its limits. It can be susceptible
to noise and interference, leading to inaccurate analysis. Furthermore, it is not well suited
to transient signals since it analyzes the signal during a steady-state response of the system.
Much of frequency domain theory assumes that the signal is stationary; in other words, its
statistical properties do not change over time. However, this assumption will not hold for
other signals that vary over time, such as vibration signals. Here, the use of time–frequency
analysis techniques provides more insights into the varying metrics of a transient signal.
These methods are based on the concept of time–frequency distribution (TFD), where
an estimate of the amount of energy in a signal is calculated from the signal under inspection
and its complex conjugate. The TFD visualizes the evolution of the frequency content of a
signal over time.
• The Short-Time Fourier Transform (STFT):
The Short-Time Fourier Transform (STFT) is a mathematical technique that operates
by dividing a longer time signal into shorter segments of equal length and then
computing the Fourier transform separately for each short segment, expressing the
variation in the signal frequency in that segment over time [284]. This process
captures both the temporal and frequency information, providing a three-dimensional
representation of the signal. The STFT is recognized for its straightforwardness and
clear physical explanation, which validates its application in the research works cited
as [29,31,92,99,100,111,112,114,115,147,182,187,218,222,223].
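A minimal STFT sketch (the test signal and window length are assumed): a frequency step that a single full-record FFT would blur is localized in time by the segmented transform.

```python
import numpy as np
from scipy.signal import stft

# Assumed test signal: frequency steps from 50 Hz to 200 Hz at t = 1 s
fs = 1_000
t = np.arange(0, 2.0, 1 / fs)
x = np.where(t < 1.0,
             np.sin(2 * np.pi * 50 * t),        # 50 Hz in the first second
             np.sin(2 * np.pi * 200 * t))       # 200 Hz in the second

freqs, times, Z = stft(x, fs=fs, nperseg=256)   # segmented Fourier transform

# Dominant frequency in each time slice (the spectrogram "ridge")
ridge = freqs[np.argmax(np.abs(Z), axis=0)]
early = ridge[times < 0.8].mean()               # ≈ 50 Hz
late = ridge[times > 1.2].mean()                # ≈ 200 Hz
```

The choice of `nperseg` embodies the time–frequency trade-off discussed later: longer windows sharpen the frequency estimate but smear the transition instant.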
• The S-transform:
The S-transform, an extension of the Short-Time Fourier Transform (STFT), addresses
certain limitations of the STFT, including the cross-term problem. It is obtained by
convolving the FFT-transformed signal with a Gaussian window function. Researchers
used the S-transform in [106,189], respectively, for fault diagnosis in rotating machinery
and for the classification of high-frequency oscillations in intracranial EEG signals.
Nevertheless, the S-transform may pose computational challenges, particularly with
extensive datasets, and may exhibit redundancy by offering excessive information.
• Wavelets in the time–frequency domain:
Signal transformation using wavelets is considered a versatile tool that can be adapted
to each of the analysis domains. For adaptive time–frequency analysis of non-
stationary signals, the wavelet transform (WT) decomposes the signal of interest into
a set of basic localized waveforms called wavelets. A signal is analyzed by examining
the coefficients of its wavelets [285]. In the time–frequency domain, wavelet analysis
reveals the frequency components of signals, just like the Fourier transform, but it also
identifies where a certain frequency exists in the temporal or spatial domain. Addi-
tionally, wavelet transforms have the capability to compress or denoise a signal with
minimal loss in quality. Numerous studies [27,88,105,106,142,150,188,215,224,286]
have utilized the wavelet transform as a valuable tool to characterize their signals and
obtain high-quality representations in the time–frequency domain.
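For illustration, a bare-bones Continuous Wavelet Transform with a complex Morlet mother wavelet can be built directly with NumPy (dedicated packages such as PyWavelets would normally be used; the normalization and parameters here are simplified assumptions):

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """CWT of a real signal at the requested analysis frequencies (sketch)."""
    out = np.empty((len(freqs), len(x)), dtype=complex)
    for i, f in enumerate(freqs):
        scale = w0 * fs / (2 * np.pi * f)            # scale matching frequency f
        n = int(10 * scale) | 1                      # odd wavelet support length
        tau = (np.arange(n) - n // 2) / scale
        wavelet = np.exp(1j * w0 * tau) * np.exp(-tau ** 2 / 2)
        wavelet /= np.sqrt(scale)                    # approximate normalization
        # Correlation of the signal with the scaled wavelet
        out[i] = np.convolve(x, np.conj(wavelet)[::-1], mode="same")
    return out

# A 40 Hz tone should respond most strongly in the 40 Hz row
fs = 1_000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 40 * t)
freqs = np.array([10.0, 20.0, 40.0, 80.0])
coeffs = morlet_cwt(x, fs, freqs)
power = np.abs(coeffs).mean(axis=1)
best = freqs[np.argmax(power)]
```

Each row of `coeffs` is the signal seen at one scale, which is exactly the multi-resolution view the text describes: fine scales localize transients, coarse scales resolve low frequencies.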
• The Wigner–Ville distribution (WVD):
In contrast to linear representations, the Wigner–Ville distribution (WVD) is a quadratic
time–frequency distribution with broad applications in signal processing and spectral
analysis. It is defined as the expected value of the product of two versions of a signal
that are shifted in time and frequency [287]. Belonging to the Cohen class of time–frequency
distributions, the Wigner–Ville distribution offers advantageous properties,
such as marginal distribution and localization in the time–frequency domain [288].
Although the WVD is functionally similar to a spectrogram, it outperforms it in terms
of its temporal and frequency resolutions. This distinction stems from the time–frequency
uncertainty principle, which does not constrain the bilinear Wigner–Ville
transform, as it is not based on segmentation [289]. This method was
used by the studies referenced as [32,100,225] for fault detection in wind turbine
induction generators, as well as the classification of episodic memory and the prediction
of heart disease in medical research.
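The definition above can be sketched as an FFT over the lag variable of the instantaneous autocorrelation x(t + τ)x*(t − τ); this is a truncated, textbook-style discrete approximation with assumed parameters, using the analytic signal to limit aliasing:

```python
import numpy as np
from scipy.signal import hilbert

def wvd(x, fs, nfft=256):
    """Truncated discrete Wigner-Ville distribution (illustrative sketch)."""
    z = hilbert(x)                                   # analytic signal
    n = len(z)
    W = np.zeros((nfft, n))
    for i in range(n):
        tau_max = min(i, n - 1 - i, nfft // 2 - 1)   # lags available at time i
        tau = np.arange(-tau_max, tau_max + 1)
        kernel = np.zeros(nfft, dtype=complex)
        kernel[tau % nfft] = z[i + tau] * np.conj(z[i - tau])
        W[:, i] = np.fft.fft(kernel).real            # WVD is real-valued
    freqs = np.arange(nfft) * fs / (2 * nfft)        # bin k -> k * fs / (2 * nfft)
    return freqs, W

# A linear chirp: the WVD ridge should track the instantaneous frequency
fs = 1_000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * (50 * t + 50 * t ** 2))       # IF sweeps 50 -> 150 Hz

freqs, W = wvd(x, fs)
ridge = freqs[np.argmax(W, axis=0)]                  # dominant frequency per instant
mid = ridge[len(ridge) // 2]                         # IF at t = 0.5 s is ~100 Hz
```

For a single component the ridge is sharp, but with multiple components the bilinear product generates the cross-terms alluded to earlier, which is why smoothed variants are often preferred in practice.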
• The Hilbert–Huang Transform (HHT):
The Hilbert–Huang Transform (HHT) is a valuable signal processing method that lever-
ages the advantages of Empirical Mode Decomposition (EMD), which will be discussed
further in the time scale domain section, and Hilbert Spectral Analysis (HSA). The signal
is decomposed into a limited set of intrinsic mode functions (IMFs), along with a trend
component, through EMD. Hilbert Spectral Analysis (HSA) is applied to each IMF to
determine the instantaneous frequency and amplitude [290]. Noteworthy studies that
have utilized this method include references [88,101,151,218,226].
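The HSA half of the pipeline can be sketched on a single simulated intrinsic mode function (EMD itself is omitted here; the assumed chirp stands in for an extracted mode):

```python
import numpy as np
from scipy.signal import hilbert

# Assumed mono-component "IMF": a chirp whose IF sweeps 20 -> 100 Hz
fs = 2_000
t = np.arange(0, 1.0, 1 / fs)
imf = np.sin(2 * np.pi * (20 * t + 40 * t ** 2))

analytic = hilbert(imf)                             # analytic signal of the IMF
amplitude = np.abs(analytic)                        # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))
inst_freq = np.diff(phase) * fs / (2 * np.pi)       # instantaneous frequency, Hz

mid_f = inst_freq[len(inst_freq) // 2]              # IF at t = 0.5 s ≈ 60 Hz
```

Repeating this for every IMF and stacking amplitude against instantaneous frequency over time yields the Hilbert spectrum that the HHT uses in place of a fixed-basis transform.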
• The Gabor transform:
The Gabor transform is a mathematical operation that examines the sinusoidal fre-
quency and phase characteristics of a signal across time. By employing a Gaussian win-
dow function, it is able to analyze the signal in both the time and frequency domains
concurrently, incorporating shifting, modulation, and power integration [291,292].
The studies referenced as [88,116,152], respectively, used this method for seizure
detection, fault diagnosis in bearings, and distinguishing heart sounds. However,
despite its numerous benefits, this method is not suitable for transient events, as it is
specifically tailored to stationary signals.
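In practice, the Gabor transform can be approximated by an STFT with a Gaussian window. A small SciPy-based sketch follows; the tone frequencies and window width are illustrative assumptions, not values from the cited studies.

```python
import numpy as np
from scipy.signal import stft

fs = 1000
t = np.arange(0, 1, 1 / fs)
# two stationary tones; the Gaussian window gives the STFT its Gabor character
x = np.sin(2 * np.pi * 60 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)

# ('gaussian', 32) selects a Gaussian window with a 32-sample standard deviation
f, tt, Z = stft(x, fs=fs, window=('gaussian', 32), nperseg=256)
power = np.mean(np.abs(Z) ** 2, axis=1)   # time-averaged spectrum
```

The time-averaged power spectrum peaks at the dominant 60 Hz tone, while each column of `Z` localizes both tones in time, illustrating the joint time and frequency view the transform provides for stationary content.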
3.2.4. Time Scale Analysis
In the time–frequency domain, signals are analyzed in terms of their frequency content
over time, showing how the signal’s energy is distributed across different frequencies.
Traditional time–frequency methods suffer from a limitation in that the time–frequency
resolution is restricted by the selection of the window or wavelet, which can impact the
adaptivity of the analysis and the readability of the time–frequency representation. This
constraint arises from the Heisenberg–Gabor uncertainty principle, where a small temporal
window, associated with good time localization, leads to poor frequency resolution, and
vice versa. Time scale techniques, such as those used for frequency super-resolution
and intrinsic time scale decomposition, offer solutions to these limitations by providing
enhanced adaptivity and resolution in the time–frequency domain, thereby allowing for
more precise analysis of signals that are non-stationary and that contain multi-temporal
scale information. They are significantly robust to noise and interference and provide
an accurate representation of a signal’s frequency content over time when coupled with
time–frequency distribution (TFD) techniques.
• Wavelets and the time scale domain:
Time scale analysis is commonly linked with the wavelet transform. Depending on
the choice of the mother wavelet, the signal is viewed across different scales. The
Continuous Wavelet Transform (CWT) and the Discrete Wavelet Transform (DWT) are
two commonly used techniques in wavelet analysis. The CWT provides a continuous
representation of a signal in the time scale domain, while the DWT offers a discrete
representation that is often preferred in practice due to its computational efficiency. The
studies referenced as [45,59,71,89,100,102,117–120,143,151–156,187,192,193,215,218,227]
used these wavelet transforms to analyze wind turbine signals and diagnose faults in
rolling bearings, as well as extract features from EEG signals.
In contrast to the standard wavelet transform, which only decomposes the
approximation (low-frequency) part of the signal at each level, the Wavelet Packet
Transform (WPT) decomposes both the low-frequency (approximation) and
high-frequency (detail) components at each level, thereby providing a richer
representation of signals. This method was employed for fault diagnosis and
prognosis in rolling bearings [121,157,238], as well as for detecting drowsiness on
the basis of EEG signals [43].
On the other hand, the Empirical Wavelet Transform (EWT) adaptively decomposes a
signal into different frequency bands, taking into account its specific spectral content.
The Fourier spectrum of the signal is divided into multiple sub-bands depending
on where significant transitions occur, thus defining the boundaries between the
different frequency components. A corresponding wavelet filter is constructed for
each segmented sub-band. Ultimately, the signal is decomposed using empirical
wavelet filters, which results in a set of wavelet coefficients that represent the different
frequency bands. This method demonstrated its effectiveness in the studies referenced
as [60,148,159] for fault diagnosis in planetary gearboxes and for seizure detection
from EEG signals.
Additionally, other wavelet-based techniques, such as dyadic and binary-tree wavelet
filters [92], the second-order synchroextracting wavelet transform [48], Maximum
Correlated Kurtosis Deconvolution (MCKD) [56], and Grossmann–Morlet time scale
wavelets, have also shown promising results in extracting relevant time scale features.
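The structural difference between the DWT and the WPT discussed above can be illustrated with a minimal Haar-based sketch; the Haar filters stand in for a general mother wavelet, and the input is a toy sequence.

```python
import numpy as np

def haar_step(x):
    """One orthonormal Haar analysis step: low-pass (approximation) and
    high-pass (detail) outputs, each half the input length."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def dwt(x, levels):
    """Standard DWT: only the approximation branch is split further."""
    details = []
    a = x
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return a, details

def wpt(x, levels):
    """Wavelet Packet Transform: both branches are split at every level,
    yielding a full binary tree of 2**levels sub-bands."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes
```

Because the Haar filters are orthonormal, both decompositions conserve the signal energy; the WPT simply distributes it over a uniform grid of sub-bands, which is why it yields the richer representation noted above.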
• Cyclostationarity:
Cyclostationarity is a fundamental concept in signal processing that pertains to periodic
fluctuations in the statistical characteristics of a signal. Techniques for cyclostationarity
analysis serve as robust methods for identifying and understanding cyclostationary
signals, which display periodic statistical behaviors. They rely on the identification
of frequency shifts to identify periodic patterns in signals that are referred to as cyclic
frequencies. These methods are especially beneficial in scenarios like fault diagnosis in
bearings, aiding in the detection and examination of cyclostationary patterns in signals
associated with equipment malfunctions [49,112,114,122,123,143,149,181,194,226,228].
Within cyclostationarity analysis, cyclic spectral coherence (CSCoh) is used as a sta-
tistical metric to assess the second-order cyclostationarity of signals. It measures
the linear correlation between two signals in the frequency domain, enabling the
detection of cycle frequencies in diverse datasets and the identification of significant
cycle frequencies in signals with cyclostationary attributes. Researchers delving into
the realm of cyclostationarity have utilized CSCoh to pinpoint key cycle frequencies
within signals, as noted in studies [42,106,124,195,196]. This analysis has led to a more
profound comprehension of the cyclostationary nature of these signals.
The CSCoh function is closely connected to the cyclic spectral correlation function
(CSC), which acts as a cross-correlation function. This metric reveals the similarity
between a spectrum and its adjacent spectra, shedding light on how spectra vary
across positions, a point highlighted in the study [124]. Furthermore, a separate study
outlined in reference [72] introduced the Cyclic Spectral Covariance Matrix (CSCM) as
a tool to glean insights into the cyclostationary characteristics of transient signals.
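The notion of a cyclic frequency can be made concrete with a minimal estimator of the cyclic autocorrelation; the amplitude-modulated noise below is a toy stand-in for a modulated bearing signal, and the 25 Hz modulation rate is an illustrative assumption.

```python
import numpy as np

def cyclic_autocorr(x, alpha, fs, tau=0):
    """Estimate the cyclic autocorrelation R_x(alpha, tau): the Fourier
    coefficient of the lag product x[n+tau]*conj(x[n]) at cyclic frequency alpha."""
    n = np.arange(len(x) - abs(tau))
    lag_product = x[n + abs(tau)] * np.conj(x[n])
    return np.mean(lag_product * np.exp(-2j * np.pi * alpha * n / fs))

fs = 1000
t = np.arange(0, 4, 1 / fs)
rng = np.random.default_rng(0)
# amplitude modulation at 25 Hz makes the variance of x periodic, i.e. a
# second-order cyclostationary signal with cyclic frequency 25 Hz
x = (1 + 0.8 * np.cos(2 * np.pi * 25 * t)) * rng.standard_normal(len(t))
```

Evaluating the estimator at the true cyclic frequency yields a clearly non-zero value, while off-cycle frequencies average out toward zero; scanning `alpha` in this way is the basic mechanism behind cyclic spectral correlation and coherence maps.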
• Adaptive decomposition techniques in the time scale domain:
Adaptive decomposition techniques in signal processing are useful in complex envi-
ronments with multiple sources operating on similar spectrum segments [293]. Their
particularity resides in their ability to automatically adjust to the input signal’s char-
acteristics, offering a more flexible and effective approach compared to traditional
decomposition methods. For instance, Empirical Mode Decomposition (EMD) identifies
intrinsic modes, and Variational Mode Decomposition (VMD) separates modes
variationally, with each having distinct advantages and limitations [293].
Empirical Mode Decomposition (EMD), being a data-driven method, decomposes
a signal into a set of intrinsic mode functions (IMFs) based on its local characteris-
tics. It is particularly suited to analyzing non-linear and non-stationary signals, such
as vibration data, because it does not require a linear or stationary base as Fourier
or wavelet transforms do. The flexibility of EMD in handling signals with time-
varying frequencies and amplitudes has allowed researchers to effectively identify
the different oscillatory modes of signals in various studies, namely vibration
signals [30,56,59,73,74,87,93,94,101,125,126,142,160,188,192,197,198,215,225]. Moreover,
EMD-based methods such as Ensemble Empirical Mode Decomposition (EEMD) [116],
Complete Ensemble Empirical Mode Decomposition (CEEMD) [199], and Noise-Assisted
MEMD (NAMEMD) [128] have been shown to improve the accuracy of fault
diagnosis in mechanical systems by isolating and localizing faults better. However,
EMD is an empirical method that lacks a solid mathematical background.
Variational mode decomposition (VMD) is commonly applied in the analysis of vi-
bration signals due to its ability to adaptively and non-recursively segregate non-
stationary signals into their fundamental modes [294] and mitigate mode mixing [295],
a prevalent challenge encountered in other decomposition methodologies such as
Empirical Mode Decomposition (EMD). Furthermore, VMD formulates the decompo-
sition problem as a variational optimization problem. It extracts band-limited intrinsic
mode functions (IMFs) adaptively by optimizing the objective functions related to the
signal’s frequency content [296].
The studies referenced as [61,129,130] used this method to perform a time scale analysis
of vibration signals and ultimately assess the health of rotating machinery. Other
studies have used other VMD-based methods, such as adaptive variational mode
decomposition in [131] and Recursive Variational Mode Extraction (RVME) [200], to
diagnose faults in rolling bearings.
In another study [132], VMD was coupled with other decomposition techniques,
including EMD, local mean decomposition, local characteristic scale decomposition,
Hilbert vibration decomposition, the EWT, and adaptive local iterative filtering, to an-
alyze non-stationary signals from rotating machinery. Consequently, time–frequency
representations (TFRs) with no interference of the cross-terms and auto-terms and a
fine resolution were obtained.
In the same context of decomposition techniques, the authors in [50] proposed a
novel feature adaptive extraction method for time scale analysis consisting of a slope
and threshold adaptive activation function with the tanh function (STAC-tanh) for
diagnosing bearing faults. Additionally, in [178], the authors used Adaptive Periodic
Mode Decomposition for the same purpose.
Finally, multivariate variational mode decomposition (MVMD) was used in [297] to
recognize human emotions in EEG signals, and enhanced symplectic geometry mode
decomposition (ESGMD) was used in [186] to diagnose faults in rotating machinery
under variable speed conditions.
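The core of EMD, the sifting step, can be sketched as follows; this is a single sift on a synthetic two-component signal (a full EMD would iterate the sift to convergence and peel off successive IMFs), and the component frequencies are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One EMD sifting iteration: fit cubic-spline envelopes through the
    local maxima and minima and subtract their mean from the signal."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return x - (upper + lower) / 2

t = np.linspace(0, 1, 1000)
fast = np.sin(2 * np.pi * 30 * t)          # oscillatory component (target IMF)
slow = 0.8 * np.sin(2 * np.pi * 3 * t)     # slow trend to be removed
candidate = sift_once(fast + slow, t)
```

Away from the record ends (where spline extrapolation degrades, the well-known EMD boundary effect), a single sift already recovers the fast oscillation almost exactly, which is the data-driven behavior that makes EMD attractive for non-stationary vibration data.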
3.2.5. Other Approaches to Signal Processing
While signal processing methods are frequently used for feature extraction, other
studies have leveraged AI algorithms to this end. These AI algorithms include a kernel
extreme-learning-based multi-layer perceptron (KExL MLP) [201], convolutional neural
networks (CNNs) [51,75,161,162,229], deep neural networks (DNNs) [163], and
fault-oriented support vector machines (FO-SVMs) [52]. In [95], a hybrid genetic
algorithm (GA) and a
deep belief network (DBN) coupled with Particle Swarm Optimization (PSO) were em-
ployed to diagnose bearing faults and their severity. Additionally, some studies have
opted to use image processing and texture analysis to process signals. For instance, one
study [164] considered the initial signal as a two-dimensional image instead of a
one-dimensional time series, while another study [33] converted it into a gray-scale image.
Local Binary Patterns (LBPs) were used in another study [133] for texture-based feature
extraction. In another study [202], snowflake-like symmetric images were generated from
the signals using the Symmetrical Dot Pattern (SDP) technique, which served as feature
patterns for diagnosing bearing faults. Locally stationary processes (LSPs) were employed
in the study [32] to assess the time-varying characteristics of non-stationary signals. Sparse
Regularity Tensor Train (SR-TT) decomposition was utilized in another study [230] to
analyze high-dimensional EEG data. Finally, geometrical features, as well as chaotic and
fractal dimensions, were extracted in [165,203] to detect health anomalies.
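One simple way to realize a signal-to-image conversion of the kind just described (not necessarily the exact procedure of the cited study) is a min-max normalization followed by a row-wise reshape; the segment length and image size below are illustrative assumptions.

```python
import numpy as np

def signal_to_grayscale(sig, size=64):
    """Map a 1-D vibration segment onto a size x size gray-scale image by
    min-max normalizing the amplitudes to 0..255 and reshaping row-wise."""
    seg = np.asarray(sig[: size * size], dtype=float)
    lo, hi = seg.min(), seg.max()
    img = np.round(255 * (seg - lo) / (hi - lo)).astype(np.uint8)
    return img.reshape(size, size)
```

The resulting image turns periodic impacts into visible texture, which is what allows image-processing tools such as LBPs, or 2-D CNNs, to be applied to vibration data.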
3.3. Signal Post-Processing
The concept of signal post-processing refers to the process of refining the data that
are intended to be utilized by machine learning algorithms for classification purposes. To
effectively carry out this process, signal post-processing techniques can be classified into
three distinct categories—feature selection techniques, feature fusion techniques, and data
augmentation techniques—as shown in Figure 12. Feature selection techniques specifically
aim to identify the most informative features from a set of features that have been extracted
during the processing phase, while feature fusion techniques focus on structuring and
combining the selected set of features into the optimal format that can be easily analyzed
by the machine learning algorithm. On the other hand, data augmentation techniques
are employed to expand and diversify the learning process when the original dataset is
limited in size. These post-processing steps facilitate the transition from signal processing
to machine learning, as they not only contribute to a reduction in computational costs but
also enhance the predictive accuracy.
3.3.1. Feature Selection
Feature selection aims to eliminate unnecessary or repetitive attributes, which can de-
crease overfitting and improve generalization. A variety of techniques can be implemented
for this purpose, ranging from machine learning algorithms and correlation analysis to
dimensionality reduction methods. The feature selection algorithms used in the studies
selected in our review are presented in Table 8. The correlation analysis methods and
dimensionality reduction methods are shown in Tables 9 and 10, respectively.
Figure 12. Signal post-processing methods.
Table 8. Feature selection algorithms.
Feature Selection Approach Algorithm Studies
Fuzzy-Logic- and Kernel-Based Fuzzy Logic Embedded RBF-Kernel-Based ELM (FRBFELM) [60]
Neural-Network- and CNN-Based Convolutional Neural Networks (CNNs) [29,51,116]
Heuristic Search Methods Gray Wolf Optimizer [147]
Cuckoo Search (CS) [56]
Monotony Evaluation [86,167]
Adaptive Multiscale Convolutions Stacked Residual Adaptive Multiscale Convolution (Res AM) Blocks [75]
Multiscale Convolutional Strategy [162]
Genetic Algorithms Genetic Algorithms (GAs) [41,94,134]
Sequential and Recursive Selection Sequential Forward Floating Selection (SFFS) [27]
Recursive Feature Elimination (RFE) [23,188,225]
Sequential Backward Feature Selection (SBFS) [44]
Optimization-Based Approaches Expectation Selection Maximization (ESM) [286]
Correlation-Based Feature Selection (CFS) [157]
Discriminant Regularizer with Gradient Descent [145]
Table 9. Correlation analysis methods for feature selection.
Feature Selection Approach Method Studies
Statistical Correlation Analysis Statistical Correlation [42,53,68,80]
Pearson’s Correlation [198,246]
Connection Weights and Fisher’s Criterion Connection Weights [43,245]
Fisher’s Criterion [129]
Correlation Coefficients Correlation Coefficients Calculated for Intrinsic Mode Functions (IMFs) [126]
Correlation Metrics [65,86]
Advanced Correlation Techniques
Gray Relation Analysis (GRA) [185]
Differential Evolution [118]
Discriminating Capability [165]
Frequency Spectrum Averaging [47]
Acceleration Responses and Wavelet Scalograms [227]
Novel Techniques Frequency Band Entropy (FBE) and Envelope Power Spectrum Analysis for Selecting the Optimal Intrinsic Mode Functions (IMFs) [130]
Table 10. Dimensionality reduction methods for feature selection.
Feature Selection Approach Method Studies
PCA Variants Principal Component Analysis (PCA) [24,28,40,74,86,214,216]
Kernel Principal Component Analysis (KPCA) [28,60,136]
Weighted Principal Component Analysis (WPCA) [286]
Singular Value Decomposition Multi-Weight Singular Value Decomposition (MWSVD) [157]
Singular Value Decomposition (SVD) [68,136]
Neighborhood Component Analysis Neighborhood Component Analysis (NCA) [183]
Non-Linear Dimensionality Reduction T-Distributed Stochastic Neighbor Embedding (t-SNE) [105]
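As an illustration of the PCA-based reduction listed in Table 10, a minimal SVD implementation on a synthetic feature matrix follows; the matrix shape and noise level are illustrative assumptions.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the (samples x features) matrix X onto its k leading principal
    components, computed from the SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)   # variance ratio per component
    return Xc @ Vt[:k].T, explained[:k]
```

On a feature matrix whose columns are driven by a single latent factor plus a little noise, the first component captures nearly all of the variance, which is exactly the redundancy-removal behavior that motivates PCA for feature selection.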
3.3.2. Data Augmentation
When the data availability is limited, data augmentation techniques enable the ex-
tension of the size and diversity of the dataset. Such techniques involve creating more
data from existing data by applying different transformations. The significance of data
augmentation is particularly evident for signal data, which are typically high-dimensional
and scarce, posing challenges for training accurate classification models.
Our research indicates that data augmentation is carried out using various methods,
including Gaussian white noise [106], one-dimensional deep convolutional generative
adversarial networks (1D-DCGANs) [244], cubic B-spline interpolation algorithms [167],
and variational autoencoders [243]. Furthermore, a study referenced as [242] combined
sample-based and dataset-based approaches, incorporating techniques such as using ad-
ditional Gaussian noise, masking noise, signal translation, amplitude shifting, and time
stretching to improve the enhancement process.
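A sample-level augmentation pipeline combining several of the transformations mentioned above might be sketched as follows; the SNR, shift range, and scaling bounds are illustrative parameter choices, not values from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(7)

def augment(sig, snr_db=10.0, max_shift=50, amp_range=(0.8, 1.2)):
    """Sample-level augmentation: additive Gaussian noise at a target SNR,
    a random circular time shift, and a random amplitude scaling."""
    noise_power = np.mean(sig ** 2) / 10 ** (snr_db / 10)
    noisy = sig + rng.normal(0.0, np.sqrt(noise_power), sig.shape)
    shifted = np.roll(noisy, int(rng.integers(-max_shift, max_shift + 1)))
    return shifted * rng.uniform(*amp_range)
```

Each call produces a distinct but physically plausible variant of the original segment, so a small measured dataset can be expanded many-fold before training a classifier.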
3.3.3. Feature Fusion
Feature fusion techniques improve the discriminative power of the classification mod-
els in signal data classification by combining information from different feature extraction
methods. This integration of diverse features allows the model to create more robust
representations that can accurately distinguish between various classes or categories in
the dataset. In [46], the researchers explored different combinations of fused features to
identify the most appropriate set, while the studies [184,246] focused on standardizing the
features prior to classification. For the same purpose, the authors in [167] used an SAE
(stacked autoencoder).
In contrast to the methodology in [46], ref. [82] performed feature fusion before feature
reduction by merging the time–frequency content from individual channels with deep
features extracted separately using a convolutional neural network (CNN). Furthermore,
in [81], a multi-feature fusion network (MFFNet) was introduced after feature extraction to
enhance the effectiveness of the model training.
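A minimal fusion-by-concatenation sketch with per-dimension standardization follows; it is one simple realization of the standardization step mentioned above, not the exact procedure of the cited studies.

```python
import numpy as np

def fuse_features(*feature_sets):
    """Concatenate per-sample feature vectors from several extraction methods
    and z-score each fused dimension so that no single method dominates."""
    fused = np.hstack(feature_sets)
    mu = fused.mean(axis=0)
    sd = fused.std(axis=0)
    sd[sd == 0] = 1.0                     # guard against constant features
    return (fused - mu) / sd
```

Without the z-scoring, a feature set with a large numeric range (e.g., raw spectral power) would dominate one with a small range (e.g., normalized entropies) in any distance-based classifier.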
3.4. Diagnosis
Recent advancements in Artificial Intelligence (AI) have enabled the development of a
wide variety of classification algorithms, thereby facilitating the detection of component
defects in rotating machinery. They are broadly classified into four categories based on their
learning approach: classical learning, ensemble learning, reinforcement learning, and deep
learning [298]. Classical learning techniques are categorized into supervised or
unsupervised learning algorithms. Through supervised learning, the input data are
mapped to the output data with labels, while unsupervised learning infers hidden
patterns or structures in the data without labeling the output [299]. The ensemble
learning approach is built
upon meta-learning and aims to combine the strengths of certain base learners in deriving
a more robust, accurate predictive model by aggregating their predictions. Meanwhile,
the reinforcement learning (RL) approach aims to train software to make decisions that
provide the maximum reward signal possible from a specific environment [299]. Finally,
deep learning (DL) algorithms are developed using artificial neural networks, which are
designed to imitate the composition and operations of the human brain. Such models are
typically composed of multiple layers, each of which transforms its input non-linearly in
order to learn the complicated patterns or relationships in the data [298]. The different
algorithms in each category are shown in
Figure 13.
Figure 13. Machine learning methods.
The analysis of the studies we collected in this review shows that certain algorithms
are privileged for the detection of industrial anomalies through vibration signals, as shown
in Figure 14.
Support vector machines (SVMs), convolutional neural networks (CNNs), Long Short-Term
Memory (LSTM), k-nearest neighbors (kNN), and deep belief networks (DBNs),
respectively, outrank other machine learning algorithms in terms of their use for intelligent
diagnoses in rotating machinery. In the subsequent sections, the most relevant studies that
used these algorithms are discussed.
Figure 14. Frequency of use of machine learning algorithms in the studies reviewed.
3.4.1. Support Vector Machines
A support vector machine (SVM) is a supervised learning algorithm used for classifi-
cation and regression tasks. It aims to find the optimal hyperplane in an N-dimensional
space to separate different classes and maximize the margin between the closest points in
different classes [300].
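The margin-maximization idea can be illustrated with a minimal linear SVM trained by subgradient descent on the regularized hinge loss; kernels and library implementations are deliberately omitted, and the toy data below are an assumption for demonstration.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Minimal linear SVM: subgradient descent on the L2-regularized hinge
    loss. Labels y must be -1 or +1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # points inside the margin
        grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)
```

On two well-separated clusters, the learned hyperplane classifies the training data essentially perfectly; in fault diagnosis practice, the same objective is usually optimized with an RBF or other kernel so that inseparable feature sets become separable in a higher-dimensional space.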
In [23], a comparison of the performance of different fault identification models (an
SVM, an ELM, a DBN, and a DNN) was performed for the diagnosis of faults in wind
turbine bearings. The SVM model was composed of two layers; features with large chi-
square value were the inputs to the algorithm, and their labels were the output. The weights
of the features were calculated, and the features with the least weight were removed. The
SVM classifier achieved an 99.5% accuracy with the time domain features that exceeded
that of the ELM and DBN classifiers; however, the DNN classifier ranked first. Using
the frequency domain features, the SVM and DNN algorithms ranked first in terms of
their accuracy.
The survey in [257] shows that SVMs are quite useful for analyzing failures in bearings,
as they provide accurate results and can handle complex data distributions. However, they
may have shortcomings in terms of probabilistic findings.
The study in [157] used a support vector machine (SVM) with a Gaussian kernel
function to convert inseparable data points from a low-dimensional space into a high-
dimensional space. A genetic algorithm determined the optimal parameters for the SVM
within the training set following a five-fold cross-validation process. The SVM model
conducted fault identification on the basis of a feature matrix extracted through the Wavelet
Packet Transform (WPT) and Multi-Weight Singular Value Decomposition (MWSVD). WPT-
MWSVD+SVM outperformed the other methods with a penalty parameter of 25.15 and a
kernel parameter of 212.47. This study concluded that an SVM coupled with the proposed
feature extraction methods was proficient in diagnosing inner race and outer race faults.
In [183], the inputs to the SVM algorithm were feature vectors comprising various
statistical features. Instead of a conventional hidden layer, the algorithm employed a
radial basis function (RBF) as a kernel function to convert the input data into a higher-
dimensional space. The output data were represented by a single neuron that generated a
binary classification (0 or 1) based on whether the input data belonged to a specific class of
either faulty or non-faulty bearings. Comparisons were made with KNN, ANN, and Naive
Bayes classifiers, resulting in an average accuracy rate of 86.11% for the SVM algorithm.
In contrast, the KNN, ANN, and NB classifiers achieved accuracy rates of 95.37%, 92.59%,
and 83.33%, respectively.
In the study [253], a comparison was made between an SVM and other models, such
as a CNN, a CNN + MMD (Maximum Mean Discrepancy), a CNN + CMMD (Conditional
Maximum Mean Discrepancy), an MDDAN (Multi-Scale Deep Domain Adaptive Network),
a DIAN (Deep Intra-class Adaptation Network), and an MDIAN (Multi-Scale Deep Intra-
Class Adaptive Network), for diagnosing bearing faults. This study revealed that the SVM,
as a traditional machine learning method, struggles to effectively handle the difference in
the distribution between the source and target domains, which is crucial in transfer learning
tasks. The SVM assumes that the source and target domains share the same distribution,
resulting in its inferior performance compared to that of more advanced models like the
MDIAN, which outperformed all the other methods in this study.
3.4.2. Convolutional Neural Networks (CNNs)
Convolutional neural networks (CNNs) in signal processing are specialized neural
network architectures designed to process and analyze signals such as audio, speech, and
time series data. In this context, CNNs apply convolutional operations to extract features
from signals, enabling tasks like classification, denoising, and pattern recognition. By
leveraging filters and pooling layers, CNNs can capture the temporal dependencies and
spatial patterns within signals, making them effective tools for various signal processing
applications. The hierarchical feature enhancement methods employed in CNNs allow for
the effective conversion of feature signals into two-dimensional spaces, enhancing these
networks’ ability to process transient signals efficiently. This justifies their wide use in
diagnosing faults in rotating machinery [301].
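The convolution, activation, and pooling operations that make up such networks can be illustrated in isolation; the step signal and difference kernel below are toy assumptions chosen to make the mechanics visible.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(x):
    """Non-linear activation: pass positives, zero out negatives."""
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Non-overlapping max pooling for downsampling."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

# a step in the signal; the difference kernel fires at the transition
signal = np.concatenate([np.zeros(8), np.ones(8)])
feature_map = max_pool(relu(conv1d(signal, np.array([-1.0, 1.0]))))
```

The single non-zero entry in the pooled feature map marks the location of the transient, at half the original resolution; stacking many such learned kernels, activations, and pooling stages is what gives a CNN its hierarchical feature extraction ability.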
In [242], two-dimensional spectrograms served as the input data for the CNN developed.
To simulate practical noise levels, white Gaussian noise was introduced into the
raw signals at SNRs of 4 dB and 0 dB. The integrated CNN approach demonstrated an im-
pressive average accuracy of 99.02%, surpassing the performance of other time–frequency
analysis methods such as the STFT and WT across different working conditions. The model
was compared with a basic neural network (NN), a recurrent neural network (RNN), and
an SVM. The NN algorithm was less efficient in capturing the data information compared
with the CNN and the RNN, while the testing accuracies achieved by the SVM method
were not very promising either.
In [268], a review of the application of deep learning to intelligent fault diagnosis for
rotating machinery affirmed that CNNs have strong data compatibility, a strong feature
extraction ability, fewer model parameters than fully connected networks, and flexible and
changeable structures. However, there is a problem of information loss, and the quality of
the extracted features is affected, along with a significantly long training time.
The research in [244] explored the utilization of a one-dimensional convolutional
neural network (1D-CNN) for diagnosing faults in rotating machinery. The 1D-CNN
architecture in this study was composed of an input layer, a feature extraction layer, and
a classification layer. The feature extraction layer was made up of three convolutional
layers and three pooling layers, which extracted features from the original vibration signal
and reduced the dimensionality of the feature vector. The classification layer consisted of
two fully connected layers, with the second fully connected layer having the same number
of neurons as the fault labels for classification. This study utilized a Softmax regression
classifier for output classification. By comparing the performance of the proposed method
with that of two other methods (Markov chain and a variational autoencoder (VAE)),
the results demonstrated that the method proposed surpassed the two other methods,
achieving a higher accuracy rate in fault identification.
The authors in [256] proposed the use of ensemble adaptive convolutional neural
networks (ECNNs) composed of ten individual CNNs with different properties. Each CNN
consisted of convolutional layers for feature extraction; Batch Normalization (BN) layers to
normalize the activations of the convolutional layers; pooling layers to downsample the
output of the convolutional layers through max pooling and average pooling layers; and
fully connected layers for classification. This research employed a range of optimization
algorithms, such as stochastic gradient descent (SGD), RMSProp, Adagrad, Adadelta, and
Adam. To excel in diagnostic tasks, deep neural networks (DNNs) with multiple hidden
layers are used. However, it should be noted that incorporating more hidden layers may re-
sult in decreased computational efficiency. In order to enhance the stability during the later
stages of training, an adaptive learning rate algorithm called EDLR was employed, which
gradually decreased the learning rate as the iteration progressed. Additionally, parameter
transfer was employed in this study to minimize the training time. This technique involved
pre-training one model and then exploiting its parameters as the initial parameters for the
other models. A comparison with other existing methods, such as AdaBoost, random forest
(RF), and Ensemble Deep Autoencoders (EDAEs), concluded that the ECNNs and EDAEs
achieved a 100% precision rate and 94% and 98% F-1 scores, respectively.
In [33], the Deep Fully Convolutional Neural Network (DFCNN) designed contained
several layers, with each one fulfilling specific functions. The convolutional layers executed
convolution operations on the local regions of the input signals through the use of convolu-
tion kernels and extracted the signal characteristics. The convolution window weights were
the same and were not modified when sliding over the entire image; therefore, overfitting
was eliminated and the memory requirements were minimized in training. The activation
layers non-linearly remapped each value of the output of the convolution to help the CNN
converge. The implementation used the activation function Leaky ReLU. The additional
Batch Normalization (BN) layers helped decrease the internal covariance shift, which ex-
pedited the training process, enhanced the network efficiency, and further increased the
generalization ability. Finally, the pooling layers performed downsampling operations to
reduce the parameters of the neural network. A 6 × 6 input feature window was considered
in this study and max-pooled to a 3 × 3 output feature map using a 2 × 2 pooling
operation with a stride of 2. To benchmark the DFCNN, the performance of the
method was compared with that of various other methods: among others, a support vector
machine (SVM), a multi-layer perceptron (MLP), and a deep belief network (DBN). The
results showcased that the SVM, MLP, and DBN had very little adaptability, with average
accuracies of 66.6%, 75.9%, and 75.7% across the six cases. In the meantime, the DFCNN
method portrayed the best accuracy among all methods, with an average accuracy of 90.5%.
As discussed in the SVM section, the study in [253] developed a transfer learning
model for bearing fault diagnosis based on a Multi-Scale Deep Intra-Class Adaptive
Network (MDIAN) approach and compared it to different machine learning algorithms. This study
used a CNN model that included a modified ResNet-50, a multiple-scale feature extractor,
and a classifier. Originally intended for low-level feature extraction, ResNet-50 was adjusted
by eliminating its last two layers and replacing them with the multiple-scale feature
extractor. The extractor captured high-level features from the low-level features provided
by ResNet-50. The classifier based its fault diagnosis on the high-level features. The
proposed CNN algorithm surpassed traditional machine learning approaches such as an
SVM and basic CNN models, demonstrating enhanced accuracy and efficiency in reducing
the distribution gap between the source and target domains. However, the proposed CNN
network was still outranked by an MDIAN, which is particularly effective in identifying
faults in rollers and outer rings.
A review conducted in [260] covered four traditional types of deep learning models,
deep belief networks (DBNs), autoencoders (AEs), convolutional neural networks (CNNs),
and recurrent neural networks (RNNs), and their use in the detection of motor faults. It
highlighted that CNNs offer tremendous mass data processing capabilities, as well as local
perception, shared weights, and spatial or temporal downsampling, all of which help to
lower the number of network parameters and avoid network overfitting.
3.4.3. Long Short-Term Memory (LSTM)
LSTM is a type of recurrent neural network (RNN) designed to handle the chal-
lenges in processing sequential data in signal processing. LSTM is particularly effective
in capturing the long-term dependencies and temporal relationships within signals. It
is designed to address the vanishing gradient problem present in traditional RNNs. It
offers robust classification capabilities, especially when combined with time–frequency and
time–space properties.
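The gate structure underlying this behavior can be sketched as a single-step LSTM cell in NumPy; the dimensions and random weights below are illustrative assumptions. Note the additive cell-state update, the mechanism through which gradients flow without vanishing:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8  # input and hidden sizes (illustrative)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per gate (input, forget, output, candidate),
# each acting on the concatenated [h_prev, x].
W = {g: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for g in "ifog"}
b = {g: np.zeros(n_hid) for g in "ifog"}

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    i = sigmoid(W["i"] @ z + b["i"])   # input gate
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate
    o = sigmoid(W["o"] @ z + b["o"])   # output gate
    g = np.tanh(W["g"] @ z + b["g"])   # candidate cell state
    c = f * c_prev + i * g             # additive update: gradients flow through `+`
    h = o * np.tanh(c)                 # hidden state exposed to the next layer
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
x = rng.standard_normal(n_in)
h, c = lstm_step(x, h, c)
print(h.shape, c.shape)  # (8,) (8,)
```

Peephole variants, as used in some of the studies below, additionally let the previous cell state `c_prev` feed into the three gate computations.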
The study conducted in [107] used an LSTM model based on instance transfer learning
to study failures in bearings. The LSTM model incorporated memory cells and three gates
(an inputting gate, a forgetting gate, and an outputting gate) within its network structure.
The model used data from the frequency spectra obtained through the fast Fourier transform
(FFT) to enhance the clarity of the information on bearing faults. To train the LSTM model,
datasets from different probability distributions (Dsrc-I and Dtar-I) were used to learn the
mapping relationship between the source domain (Dsrc) and the target domain (Dtar). The
model structure also included peephole connections, which allowed the cell state at the
last moment to influence the three gates, thereby enhancing the control and information
processing. In a comparative analysis with a CNN, a DBN, and a stacked autoencoder (SAE),
LSTM ranked third, with the CNN and the DBN achieving higher accuracies.
In a study [30] conducted on diagnosing bearing faults, a novel approach was proposed
using a multi-scale CNN and an LSTM model consisting of two modules. The first module
involved two one-dimensional CNNs with varying kernel sizes and depths, which were
simultaneously applied to raw signals to extract features from different frequency domains.
The feature vectors obtained from the CNNs were then fused using element-wise products.
The second module, known as the classifier, comprised a hierarchical LSTM and a fully
connected layer. The hidden states of LSTM1 served as the input for LSTM2, and the outputs
of LSTM2 were fed into the fully connected layer. This study’s results demonstrated that
the combined model achieved an average accuracy rate of 98.46%, while the LSTM network
alone achieved a comparatively lower accuracy rate of 66.39%.
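The element-wise fusion of two convolutional branches with different kernel sizes can be sketched as follows; the kernel sizes, pooling scheme, and random filters are our own illustrative assumptions, not the exact architecture of [30]:

```python
import numpy as np

rng = np.random.default_rng(1)
signal = rng.standard_normal(256)  # stand-in for a raw vibration segment

def conv_branch(x, kernel_size, n_out=16):
    """One 1-D convolution branch followed by chunk-wise max pooling."""
    k = rng.standard_normal(kernel_size) * 0.1
    fmap = np.convolve(x, k, mode="valid")        # 1-D convolution
    # Pool the feature map into n_out chunks -> fixed-length feature vector.
    chunks = np.array_split(np.abs(fmap), n_out)
    return np.array([c.max() for c in chunks])

small_scale = conv_branch(signal, kernel_size=3)   # fine temporal detail
large_scale = conv_branch(signal, kernel_size=31)  # coarser trends
fused = small_scale * large_scale                  # element-wise product fusion
print(fused.shape)  # (16,)
```

The fused vector would then be passed to the hierarchical LSTM classifier described above.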
The authors in [147] employed a modified transformer architecture incorporating a
Bidirectional LSTM (BiLSTM) network for the classification of vibration signals. In contrast
to the traditional LSTM, the BiLSTM used an update gate to handle long-term dependency
information. Within the modified transformer network, the BiLSTM served as a branch
layer coupled with Global MaxPooling for extracting high-level non-sequential features
from various perspectives. Experiments confirmed its efficacy: this particular BiLSTM
branch improved the performance of the modified transformer by approximately
2 percentage points compared to relying solely on the attention mechanism. During the
experiments, the BiLSTM was equipped with 128 hidden units. The model underwent
testing with a CNN in place of the LSTM, demonstrating that the LSTM-based model
attained higher accuracy and a lower standard deviation than the CNN-based variant.
3.4.4. KNN
K-nearest neighbors (KNN) is a non-parametric lazy learning algorithm used in signal
processing to classify data points based on the ‘k’ closest training examples in the feature
space. It leverages the proximity of the data points to make predictions or classifications.
KNN is particularly effective for tasks like classifying transient power quality
disturbances based on the signal features extracted from the data.
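A minimal sketch of the core KNN rule — Euclidean distances to the training set, then a majority vote among the k closest samples — is shown below on toy two-class data of our own making:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to all samples
    nearest = np.argsort(d)[:k]               # indices of the k closest samples
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]         # majority vote

# Toy feature vectors: two "healthy" and two "faulty" machine states.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
y = ["healthy", "healthy", "faulty", "faulty"]
print(knn_predict(X, y, np.array([4.8, 5.0])))  # faulty
```

Distance-weighted voting, as used in some studies, replaces the plain `Counter` tally with weights inversely proportional to each neighbor's distance.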
In [160], an EMD-KNN method was applied to the analysis of wind power rolling bearings.
It incorporated the KNN algorithm to identify the frequency characteristics of various states
using complex signal data. To gauge the similarity between the data points, the algorithm
used the Euclidean distance as the distance measurement method. Prior to inputting data
into the KNN algorithm, data normalization was carried out to standardize them. Weighted
voting based on K-nearest neighbors was employed to make classification decisions
regarding the data points. This study showcased that the KNN classifier achieved 100%
accuracy in fault diagnosis, with a significantly shorter processing time (0.449198 s) than
random forest (RF), Naive Bayes (NB), and a Discriminant Analysis Classifier (DAC).
In the KNN algorithm employed in [134], the positions of the training samples remain
fixed, and when a new data point is introduced, the distances between this data point and
all the training samples are computed. Subsequently, K samples with the shortest distances
are pinpointed within the training set. By examining these distances, the algorithm selects
the neighbors closest to the new data sample. The KNN algorithm achieved an accuracy
score of 99.2%, outperforming a decision tree classifier at 98.5%, while a random forest
(RF) model achieved the highest accuracy, at 99.5%.
3.4.5. Random Forest
Random forest is an ensemble learning method that combines multiple decision trees
to improve the accuracy and robustness of the predictions. In signal processing, a set of
decision trees classifies signals and makes predictions from the extracted input features; by
aggregating the predictions of the individual trees, random forest produces the final
classification output.
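The bootstrap-and-vote principle can be sketched with a toy ensemble of decision stumps (one-split trees) — a deliberately simplified stand-in for full decision trees, with all data and sizes being illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_stump(X, y):
    """Find the (feature, threshold, flip) split with the best training accuracy."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = (X[:, j] > t).astype(int)
            acc = max((pred == y).mean(), (1 - pred == y).mean())
            if best is None or acc > best[0]:
                flip = (1 - pred == y).mean() > (pred == y).mean()
                best = (acc, j, t, flip)
    return best[1:]

def stump_predict(stump, X):
    j, t, flip = stump
    pred = (X[:, j] > t).astype(int)
    return 1 - pred if flip else pred

def forest_predict(stumps, X):
    votes = np.array([stump_predict(s, X) for s in stumps])
    return (votes.mean(axis=0) > 0.5).astype(int)   # majority vote

# Toy one-feature dataset with two classes.
X = np.array([[0.1], [0.2], [0.9], [1.0], [0.15], [0.95]])
y = np.array([0, 0, 1, 1, 0, 1])
stumps = []
for _ in range(5):
    idx = rng.integers(0, len(X), len(X))           # bootstrap sample per tree
    stumps.append(fit_stump(X[idx], y[idx]))
print(forest_predict(stumps, X))
```

A full random forest additionally restricts each split to a random subset of features and grows trees to a stopping criterion, as described in the studies below.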
The study referenced in [134] explored the optimization parameters for a genetic
algorithm (GA) and three distinct classifiers—k-nearest neighbors (KNN), decision trees
(DTs), and random forest (RF)—in order to enhance the performance in diagnosing bearing
faults in induction motors. In this study, each tree in the architecture of the random forest
(RF) algorithm was trained on a random subset of features and samples from the training
data. The class predicted by the individual trees was based on the mode of the class
labels, while the mean value was predicted for regression tasks. This collaborative learning
method aimed to address the issue of overfitting that can arise with single decision trees.
The random forest model achieved an accuracy of 99.5%, the decision tree model reported
98.5%, and the k-nearest neighbors algorithm achieved an accuracy of 99.2%.
In [129], a diagnosis model was proposed for a centrifugal multi-level impeller blower.
This model utilized VMD, MSDIs, Fisher’s criterion, and RF. This approach involved
decomposing the vibration signals using VMD and constructing six types of MSDIs from
the decomposed signals. The top-ranked MSDIs were then selected as the fault features
using Fisher’s criterion. The RF classification process included bootstrap-sampling the
training set to reduce overfitting and random feature selection for each node to enhance
the model’s performance. Additionally, tree construction was based on the best features
selected from a random subset, with node splitting conducted until a stopping criterion
was reached. The aggregation of the classification outputs was built upon a majority voting
strategy. Notably, the model achieved a classification accuracy of 95.58%.
3.4.6. Deep Belief Networks (DBNs)
Deep belief networks (DBNs) are generative neural networks that combine unsupervised
learning principles with deep architectures; they are composed of multiple layers of
restricted Boltzmann machines (RBMs) [302]. In signal processing, DBNs are used to learn
hierarchical representations of the input signals [303].
The study in [23] combined 10,000 samples into a feature set to train and test four
classification models (an ELM, SVM, DBN, and DNN) using a 7:3 train–test split. Interestingly,
the DBN exhibited superior performance when frequency domain features were used in
conjunction with a second FFT. However, despite this advantage, the DBN was slightly
outperformed by the other classifiers in the study.
In another study, referenced as [145], a DBN algorithm with five hidden layers was
used for bearing fault diagnosis. The first two layers consisted of 512 neurons, the third
layer of 128 neurons, the fourth layer of 64 neurons, and the final output layer of 9 neurons.
Remarkably, this approach achieved an accuracy rate of 94.07%. The DBN algorithm
outperformed a CNN, an SVM, and RF in the proposed framework, which incorporated a
novel deep autoencoder method with discriminative information fusion.
In their research, the authors in [95] employed adaptive deep belief networks in
conjunction with Dempster–Shafer theory to develop a technique for diagnosing the severity
of bearing faults. The DBN employed a stacked architecture comprising multiple restricted
Boltzmann machines (RBMs). Each hidden layer from the previous RBM served as the
visible layer for the subsequent RBM. The RBMs in the DBN were structured so that the
visible layer (V) and the first hidden layer (H1) constituted the first RBM, while H1 and
the second hidden layer (H2) constituted the second RBM, and so on. The neurons in the
visible layer were interconnected with the neurons in the hidden layer in each RBM through
a weight matrix. The DBN training process involved two main steps: a pre-training process
that sequentially trained the weights in each stacked RBM and a fine-tuning process that
optimized the entire network after pre-training. Furthermore, a classifier was implemented
on the topmost layer of the DBN network to generate probabilistic outputs and enable
multi-class classification. A Softmax regression model in the final layer of the DBN structure
mapped the non-normalized output to the probability distribution of the predicted output
classes and facilitated multi-class classification and probabilistic output for subsequent D-S
evidence fusion. This study demonstrated that the combined approach of using an adap-
tive parameter-optimized DBN and a D-S-based information fusion scheme significantly
improved the accuracy of the results in diagnosing bearing faults.
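The greedy layer-wise pre-training described above can be sketched with two stacked RBMs trained by one-step contrastive divergence (CD-1); the layer sizes, learning rate, and epoch count are illustrative assumptions, and biases are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hid, lr=0.1, epochs=20):
    """Train one RBM on visible data V with CD-1 (biases omitted for brevity)."""
    n_vis = V.shape[1]
    W = rng.standard_normal((n_vis, n_hid)) * 0.01
    for _ in range(epochs):
        # Positive phase: visible -> hidden probabilities, then sample.
        ph = sigmoid(V @ W)
        h = (rng.random(ph.shape) < ph).astype(float)
        # Negative phase: one Gibbs step back down and up again.
        pv = sigmoid(h @ W.T)
        ph2 = sigmoid(pv @ W)
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
    return W

X = (rng.random((64, 12)) < 0.5).astype(float)  # toy binary feature vectors
layer_sizes = [8, 4]                            # two stacked RBMs: V-H1, H1-H2
weights, data = [], X
for n_hid in layer_sizes:
    W = train_rbm(data, n_hid)
    weights.append(W)
    data = sigmoid(data @ W)  # hidden activations become the next visible layer
print([w.shape for w in weights])  # [(12, 8), (8, 4)]
```

Fine-tuning would then optimize the whole stack with backpropagation, with a Softmax classifier placed on the topmost layer as in [95].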
The study in [260] evaluated deep learning algorithms and their performance in fault
detection in electric motors. The review emphasized that DBNs are able to learn data
features adaptively, without the need for a formal mathematical model. One of the key
advantages of the DBN is its hidden multi-layer structure, which effectively tackles
high-dimensionality problems. Additionally, the DBN’s semi-supervised
training method proves to be highly effective in addressing the limitations of the standard
neural network training methods when dealing with problems in multi-layer networks.
3.4.7. Other Machine Learning Algorithms
Beyond the algorithms discussed above, Table 11 presents the additional classifiers that
have been employed for rotating machinery diagnosis in the studies reviewed.
Table 11. Other classifiers used for rotating machinery diagnoses.
Category of Classifiers Algorithm Studies
Neural-Network-Based Classifiers
Graph Neural Networks (GNNs) [229]
Deep Neural Networks (DNNs) [23]
Recurrent Neural Networks (RNNs) [260,268]
Generative Adversarial Networks (GANs) [268]
Spiking Neural Networks (SNNs) [57]
Stacked Inverted Residual Convolutional Neural Network (SIRCNN) [108]
Class-Level Matching Transfer Learning Network [273]
Statistical- and Clustering-Based Classifiers
Fuzzy-Logic-Based Confidence Decision [78]
K-Means Clustering [153]
Discriminant Analysis Classifier (DAC) [160]
Gaussian Mixture Model (GMM) [286]
Local Binary Pattern (LBP) [133]
In the medical field, the classifiers in Table 12 have achieved significant results in pattern
recognition for EEGs and similar signals. Given the similarities between such signals and
vibration signals, an alternative avenue would be to investigate these classifiers for
machinery diagnosis.
Table 12. Classifiers for EEG and ECG signal classification.
Category of Classifiers Algorithm Studies
Neural-Network- and Attention-Based Bayesian Quadratic Discriminant Transfer Neural Network (BQDTNN) [201]
Multi-Head-Attention-Based Long Short-Term Memory (MHA-LSTM) [263]
Fuzzy Logic and Statistical Methods Fuzzy RBF-ELM Classifier [60]
Gaussian Discriminant Analysis (GDA) [28]
Ensemble Methods EBT Classifiers [190]
Ensemble Classifier Algorithms [44,203]