Citation: Liu, Y.; Gao, W.; Zhao, T.; Wang, Z.; Wang, Z. A Rapid Bridge Crack Detection Method Based on Deep Learning. Appl. Sci. 2023, 13, 9878. https://doi.org/10.3390/app13179878
Academic Editor: Andrea Carpinteri
Received: 25 July 2023
Revised: 26 August 2023
Accepted: 30 August 2023
Published: 31 August 2023
Copyright: © 2023 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
A Rapid Bridge Crack Detection Method Based on Deep Learning
Yifan Liu 1,2, Weiliang Gao 3,*, Tingting Zhao 1,2, Zhiyong Wang 1,2,* and Zhihua Wang 1,2
1 Institute of Applied Mechanics, College of Mechanical and Vehicle Engineering, Taiyuan University of Technology, Taiyuan 030024, China; firstname.lastname@example.org (Y.L.); email@example.com (T.Z.)
2 Shanxi Key Laboratory of Material Strength and Structural Impact, Taiyuan University of Technology, Taiyuan 030024, China
3 Institute of Defense Engineering, Academy of Military Sciences (AMS), People's Liberation Army (PLA), Beijing 100850, China
* Correspondence: firstname.lastname@example.org (W.G.); email@example.com (Z.W.)
Abstract: The aim of this study is to enhance the efficiency and lower the expense of detecting cracks
in large-scale concrete structures. A rapid crack detection method based on deep learning is proposed.
A large number of artiﬁcial samples from existing concrete crack images were generated by a deep
convolutional generative adversarial network (DCGAN), and the artiﬁcial samples were balanced
and feature-rich. Then, the dataset was established by mixing the artiﬁcial samples with the original
samples. You Only Look Once v5 (YOLOv5) was trained on this dataset to implement rapid detection
of concrete bridge cracks, and the detection accuracy was compared with the results using only
the original samples. The experiments show that DCGAN can mine the potential distribution of
image data and extract crack features through the deep transposed convolution layer and down
sampling operation. Moreover, the light-weight YOLOv5 increases channel capacity and reduces
the dimensions of the input image without losing pixel information. This method maintains the
generalization performance of the neural network and provides an alternative solution with a low
cost of data acquisition while accomplishing the rapid detection of bridge cracks with high precision.
Keywords: crack detection; concrete; DCGAN; YOLOv5
Appl. Sci. 2023, 13, 9878. https://doi.org/10.3390/app13179878 https://www.mdpi.com/journal/applsci

1. Introduction
Concrete is widely used in dams, bridges, and other large-scale engineering structures. For these structures, maintenance, monitoring, and life assessment are very important tasks during the post-construction period. Among the various forms of damage that can occur during the maintenance period of a bridge, cracks often appear first. They arise from uneven settlement of the bridge foundation in the vertical direction and displacement in the horizontal direction, which induce internal stresses in the concrete structure. For foundations that are built in phases or subjected to frost in cold areas, deformation of the structure and cracks can also occur. Bridge cracks can cause significant harm: (1) Cracks can result in leaks, allowing water to flow into the interior of the bridge and degrading the concrete's mechanical and physical properties. When the water freezes, it forms deeper and wider cracks, which destabilize the main structure of the bridge and lower its safety level. Water in the cracks also drives their further development and expansion under the influence of gravity and pressure. (2) Cracks lead to carbonation of the concrete. The concrete reacts with CO2 from the environment in the presence of moisture to produce calcium carbonate, which reduces the safety and mechanical properties of the concrete structure. (3) Cracks destroy the passivation film of steel bars and metal components, which then corrode under the simultaneous penetration and action of external air and water. The rust generated by the corrosion of steel bars occupies more than ten times the initial volume, which reduces the stability of the reinforced concrete structure. In summary, it is very important to obtain the location, length, width, and extent of bridge cracks in time through detection technology.
However, as the number of completed bridges and the lengths of bridge spans continue to increase, crack detection becomes increasingly demanding and its economic cost rises. With the accelerated growth of computing capabilities in recent years, advancements in deep learning technology have motivated us to develop a novel approach to address these problems. Li et al. presented an approach for bridge crack detection that enhances the encoder–decoder framework and uses a mixed pooling module. The encoder employs dilated convolutions to extract the fundamental characteristics of crack images, which preserves the resolution of the feature maps and provides a wide receptive field. Their experimental results showed significantly higher precision and recall. In another study, Li et al. proposed using deep learning for bridge crack detection with unmanned aerial vehicles (UAVs). They adopted the faster region-based convolutional neural network (Faster R-CNN) algorithm with VGG16 transfer learning to detect bridge cracks. Their experimental results indicated that automatic detection of bridge cracks using UAVs and Faster R-CNN could improve efficiency without compromising accuracy. By leveraging depthwise separable convolution, atrous convolution, and the atrous spatial pyramid pooling module, Xu et al. presented an end-to-end convolutional neural network (CNN) crack detection method. Their study showed that the model outperformed conventional classification models and could be integrated into any convolutional network as an efficient feature extraction module. Nevertheless, obtaining large amounts of high-quality crack data remains time-consuming and costly. It is also worth discussing which form of neural network is most suitable for the rapid detection of bridge cracks.
Therefore, a DCGAN is employed to generate a large number of artificial crack samples to expand the dataset. As a representative artificial neural network (ANN), DCGAN can mine the latent distribution of image data and fit it, so that the network can produce high-quality, realistic images based on the learned image features. Generative adversarial networks, a popular means of dataset augmentation and generation, have been widely used in medicine, safety inspection, agricultural production, and other fields. Luo et al. proposed an imbalanced fault diagnosis method based on the DCGAN generative model to address the problem of limited datasets. To expand fake fingerprint data, Choi et al. investigated whether fake fingerprints generated by a DCGAN were similar to the fake fingerprints in the dataset. At present, image generation technology is still rarely applied in the field of roads, and the intelligent detection of road cracks is also in its exploratory stage. For example, collected road images often have complex backgrounds and are strongly affected by lighting, so cracks are often difficult to distinguish. It is therefore valuable to study bridge crack data augmentation methods based on DCGAN.
YOLOv5, which is considered an efficient and stable object detection network, is used to achieve the rapid detection of bridge cracks. In recent years, the YOLOv5 architecture has gained increasing attention in computer vision due to its strong performance in detecting various objects in real-time scenarios. Using YOLOv5 for bridge crack detection brings several advantages. Firstly, YOLOv5 offers high accuracy in detecting and localizing cracks on bridge surfaces. With its advanced anchor-based and anchor-free mechanisms, YOLOv5 can identify and delineate fine cracks regardless of their length, width, or orientation, which gives engineers and researchers precise information on crack locations and sizes. Secondly, the real-time capability of YOLOv5 enables rapid crack detection. Its speed-optimized architecture allows inspectors to assess the condition of bridges quickly, reducing the time and resources required for manual inspections. Moreover, YOLOv5 can adapt to varying crack features and textures, giving consistent and reliable crack detection across different bridge types. This flexibility removes the need for multiple detection techniques and simplifies the crack detection process. This study shows that the combination of DCGAN and YOLOv5 can detect engineering defects rapidly and accurately.
The remainder of this paper is structured as follows: Section 2 presents the establishment of the dataset and the structure and theory of DCGAN and YOLOv5. Section 3 presents the training process, training environment, and predicted results. Section 4 discusses the research motivation, limitations of this work, and future research directions. Finally, Section 5 provides the concluding remarks.
2. Materials and Methods
2.1. Establishment of the Dataset
A total of 2068 bridge crack images with various morphologies were collected, as shown in Figure 1. The crack dataset is feature-rich: Figure 1a–d shows vertical cracks, cross cracks, horizontal cracks, and wider cracks, respectively. These 2068 images were used as the training data for the DCGAN.
Figure 1. Partial bridge crack image data: (a–p) The representative cracks.
2.2. The Theory of DCGAN
The GAN was first proposed by Goodfellow et al. in 2014. It aims to generate samples whose distribution is nearly consistent with that of the real data, and it belongs to the category of unsupervised learning. GANs have attracted a large number of researchers and have produced many results in computer vision tasks such as image synthesis, style transfer, image repair, and object detection. DCGAN combines a CNN with a GAN, which greatly improves the training stability of the GAN and the quality of its outputs. A DCGAN consists of a generator and a discriminator.
The primary objective of the generator is to learn the attributes present in the training data. It accomplishes this by aligning the distribution of its outputs, which are produced from random noise, with the actual distribution of the training data under the guidance of the discriminator. This process enables the generator to produce synthetic data that closely resemble the training dataset. The training objective function can be expressed as follows:

$$G^{*} = \arg\min_{G} \operatorname{Div}(P_{G}, P_{data}) \tag{1}$$

In the equation, $P_{data}$ and $P_{G}$ are the real and generated data distributions, respectively, and $\operatorname{Div}(\cdot)$ is the difference between the distributions.
The primary role of the discriminator is to differentiate between real data and data produced by the generator while providing feedback to the generator. Both networks are trained iteratively, and their capabilities grow together until the generator is capable of producing data that can deceive the discriminator. This process refines the discerning ability of the discriminator and enhances the generative ability of the generator. Finally, a balance is reached between the capabilities of the generator and the discriminator. The training objective function can be expressed as follows:

$$D^{*} = \arg\max_{D} V(G, D) \tag{2}$$

The value of the discriminator objective function reflects how likely the input sample is to be real, which is a binary classification problem. The objective function can be written as follows:

$$V(G, D) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{z}(z)}[\log(1 - D(G(z)))] \tag{3}$$

In the equation, $\mathbb{E}_{x \sim P_{data}(x)}$ and $\mathbb{E}_{z \sim P_{z}(z)}$ are the expectations over the real data $x$ and the noise $z$, respectively; $D(x)$ is the discriminator output for the real data $x$; and $D(G(z))$ is the discriminator output for the sample generated from the noise $z$.
In a continuous space, Equation (3) can be expressed as:

$$V(G, D) = \int_{x} \left[ P_{data}(x) \log D(x) + P_{G}(x) \log(1 - D(x)) \right] dx \tag{4}$$

For the integrand function $F(x)$,

$$F(x) = P_{data}(x) \log D(x) + P_{G}(x) \log(1 - D(x)) \tag{5}$$

If $P_{data}(x)$ and $P_{G}(x)$ are arbitrary non-zero real numbers, Equation (5) attains its maximum value when Equation (6) is satisfied. Equation (6) can be expressed in the following form:

$$D^{*}(x) = \frac{P_{data}(x)}{P_{data}(x) + P_{G}(x)} \tag{6}$$

Equation (7) can be obtained by substituting Equation (6) into Equation (4):

$$V(G, D^{*}) = -2 \log 2 + 2\,\mathrm{JSD}(P_{data} \,\|\, P_{G}) \tag{7}$$

In the equation, $\mathrm{JSD}$ is the Jensen–Shannon (JS) divergence. The more similar $P_{data}$ and $P_{G}$ are, the smaller the JS divergence; conversely, the more different $P_{data}$ and $P_{G}$ are, the larger the JS divergence.

Considering the generator and the discriminator together, the objective function of DCGAN can be expressed as:

$$\min_{G} \max_{D} V(G, D) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{z}(z)}[\log(1 - D(G(z)))] \tag{8}$$
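The relationship between the optimal discriminator of Equation (6) and the JS divergence of Equation (7) can be checked numerically. The following is a minimal sketch on a discrete three-point distribution, where the integral of Equation (4) becomes a sum; the probability values and all function names are illustrative assumptions and are not taken from the study.

```python
import math

def discriminator_value(p_data, p_g, d):
    """Discrete analogue of Equation (4): sum of p_data*log(d) + p_g*log(1 - d)."""
    return sum(pd * math.log(di) + pg * math.log(1.0 - di)
               for pd, pg, di in zip(p_data, p_g, d))

def optimal_discriminator(p_data, p_g):
    """Equation (6): D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the mixture (p + q) / 2."""
    mixture = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)

# Illustrative (assumed) real and generated distributions over three outcomes.
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.2, 0.6]

d_star = optimal_discriminator(p_data, p_g)
v_star = discriminator_value(p_data, p_g, d_star)
identity = -2.0 * math.log(2.0) + 2.0 * js_divergence(p_data, p_g)

# A perturbed discriminator should score lower than the optimal one.
d_other = [di + 0.05 for di in d_star]
v_other = discriminator_value(p_data, p_g, d_other)
```

Under these assumptions, `v_star` agrees with `identity` up to floating-point error, and `v_other` is smaller than `v_star`, which is what Equations (6) and (7) predict.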
2.3. The Architecture of the DCGAN
Figure 2 shows the generator architecture used in this study. The generator is supplied with a 100-dimensional random noise vector, which is up-sampled by deep transposed convolution layers to obtain feature maps at different scales. Transposed convolution is a special kind of forward convolution that first expands the dimensions of its input. This process, also called up-sampling, converts images to higher resolutions. Every up-sampling operation is followed by a batch normalization layer and an activation function layer. The last layer uses the Tanh function, and the other activation function layers use the ReLU function. The initial random noise vector undergoes a series of seven up-sampling operations, resulting in an image with dimensions of 256 × 256 × 1. The complete network structure and the parameters of each layer of the generator are shown in Table 1.
Figure 2. The architecture of the DCGAN.
Figure 2 also shows the discriminator architecture used in this study. The discriminator undertakes seven down-sampling operations when presented with either a real bridge crack image or a generated crack image. These operations reduce both the size of the images and the dimensions of their features. Each convolution layer employs 4 × 4 convolution kernels. The numbers of convolution kernels in the respective layers are 64, 128, 256, 512, 1024, and 1. Each down-sampling operation is followed by a batch normalization layer and a Leaky ReLU layer with a negative slope coefficient of 0.2. The complete down-sampling process thus extracts the features of the images, and the final loss value is derived from these features. When a generated (fake) bridge crack image or a real bridge crack image is received, the loss value approaches 1 or 0, respectively. The complete network structure and the parameters of each layer of the discriminator are shown in Table 2.
Table 1. The complete network structure of the generator.
Layer  Output Shape  Parameters
Reshape_1  (1 × 1 × 100)  0
Conv2D_transpose_1  (4 × 4 × 1024)  1,639,424
BN_1  (4 × 4 × 1024)  4096
Activation_1  (4 × 4 × 1024)  0
Conv2D_transpose_2  (8 × 8 × 512)  8,389,120
BN_2  (8 × 8 × 512)  2048
Activation_2  (8 × 8 × 512)  0
Conv2D_transpose_3  (16 × 16 × 256)  2,097,408
BN_3  (16 × 16 × 256)  1024
Activation_3  (16 × 16 × 256)  0
Conv2D_transpose_4  (32 × 32 × 128)  524,416
BN_4  (32 × 32 × 128)  512
Activation_4  (32 × 32 × 128)  0
Conv2D_transpose_5  (64 × 64 × 64)  131,136
BN_5  (64 × 64 × 64)  256
Activation_5  (64 × 64 × 64)  0
Conv2D_transpose_6  (128 × 128 × 32)  32,800
BN_6  (128 × 128 × 32)  128
Activation_6  (128 × 128 × 32)  0
Conv2D_transpose_7  (256 × 256 × 1)  513
Activation_7  (256 × 256 × 1)  0
Table 2. The complete network structure of the discriminator.
Layer  Output Shape  Parameters
Conv2D_1  (128 × 128 × 64)  1088
BN_1  (128 × 128 × 64)  256
Leaky ReLU_1  (128 × 128 × 64)  0
Conv2D_2  (64 × 64 × 128)  131,200
BN_2  (64 × 64 × 128)  512
Leaky ReLU_2  (64 × 64 × 128)  0
Conv2D_3  (32 × 32 × 256)  524,544
BN_3  (32 × 32 × 256)  1024
Leaky ReLU_3  (32 × 32 × 256)  0
Conv2D_4  (16 × 16 × 512)  2,097,664
BN_4  (16 × 16 × 512)  2048
Leaky ReLU_4  (16 × 16 × 512)  0
Conv2D_5  (8 × 8 × 1024)  8,389,632
BN_5  (8 × 8 × 1024)  4096
Leaky ReLU_5  (8 × 8 × 1024)  0
Conv2D_6  (4 × 4 × 1)  16,385
Flatten_1  16  0
Dense_1  1  17
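The parameter counts in Tables 1 and 2 can be reproduced with simple arithmetic. The sketch below assumes 4 × 4 kernels with one bias per output channel for every convolution (the text states the kernel size for the discriminator only, so its use for the generator is an assumption), and four parameters per channel for batch normalization, which is how Keras reports them. The function and variable names are illustrative.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a (transposed) 2D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels

def batch_norm_params(channels):
    """Keras-style count: gamma, beta, moving mean, and moving variance per channel."""
    return 4 * channels

KERNEL = 4  # 4 x 4 kernels; stated for the discriminator, assumed for the generator

# Channel counts read from the output shapes in Table 1 (noise vector first).
generator_channels = [100, 1024, 512, 256, 128, 64, 32, 1]
generator_conv = [conv_params(KERNEL, c_in, c_out)
                  for c_in, c_out in zip(generator_channels, generator_channels[1:])]

# Channel counts read from Table 2 (single-channel input image first).
discriminator_channels = [1, 64, 128, 256, 512, 1024, 1]
discriminator_conv = [conv_params(KERNEL, c_in, c_out)
                      for c_in, c_out in zip(discriminator_channels, discriminator_channels[1:])]

# Flatten_1 yields 16 values, and Dense_1 maps them to one unit with a bias.
dense_params = 16 * 1 + 1

for name, counts in (("generator", generator_conv), ("discriminator", discriminator_conv)):
    print(name, counts)
```

With these assumptions, the computed values match the convolution rows of both tables, which suggests that the generator also uses 4 × 4 kernels.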
2.4. The Architecture of YOLOv5
YOLOv5 is an efficient object detection algorithm. Similar to previous generations of YOLO algorithms, YOLOv5 adopts the concept of grids; that is, the image is divided into multiple grid cells, each of which is responsible for predicting one or more objects. Each grid cell produces prediction boxes (i.e., anchors), and templates for three prediction boxes are generally pre-stored. Each anchor has a preset width, height, coordinates, and confidence level. The confidence level indicates the probability that an object is present in the grid cell: when the grid cell in which the anchor is located contains an object, the confidence level is 1; otherwise, it is 0. If the differences in width, height, and coordinates between the anchor and the real box are regarded as regression losses, and the binary cross entropy is used as the confidence loss, then the problem of object detection is greatly simplified into a regression and classification problem.
Figure 3 shows the YOLOv5 architecture used in this study. YOLOv5 mainly consists of the Backbone network, the Neck network, and the Head network. The Backbone is mainly used for feature extraction and for progressively reducing the size of the feature map. The Neck network fuses shallow graphic features with deep semantic features, and the Head network is the classifier and regressor of YOLOv5. The main modules of YOLOv5 include Focus, CBL, CSP1_X, CSP2_X, and SPP. The Focus module operates as part of the initial processing before the Backbone network. It takes a value from every other pixel of the image, creating an effect akin to nearest-neighbor down-sampling. The original 640 × 640 × 3 image is input into the Focus module, where a slicing operation first produces a 320 × 320 × 12 feature map, and a convolution operation then yields a 320 × 320 × 32 feature map. The CBL module mainly consists of a convolutional layer, a batch normalization layer, and a Leaky ReLU function. The CSP1_X module uses the CSPNet structure and consists of three convolutional layers and X Res unit modules, while the CSP2_X module replaces the Res units with CBL modules. The SPP module carries out multi-scale fusion with Maxpool operations of 5 × 5, 9 × 9, and 13 × 13.
Figure 3. The architecture of YOLOv5.
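The slicing step of the Focus module can be illustrated without a deep learning framework. The following is a minimal sketch that works on nested Python lists and moves each 2 × 2 pixel block into the channel dimension, so the spatial size halves while every pixel value is kept. The function names and the order in which the four pixels are concatenated are assumptions made for illustration; the actual YOLOv5 code operates on tensors.

```python
def focus_slice(image):
    """Rearrange an H x W x C image into an (H/2) x (W/2) x 4C feature map.

    Each output position concatenates the channels of the four pixels of one
    2 x 2 block, so no pixel value is discarded.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    # (row, column) offsets inside each 2 x 2 block; this ordering is an assumption.
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    return [
        [
            [value
             for d_row, d_col in offsets
             for value in image[row + d_row][col + d_col]]
            for col in range(0, width, 2)
        ]
        for row in range(0, height, 2)
    ]

def focus_output_shape(height, width, channels):
    """Shape of the sliced feature map for a given input shape."""
    return height // 2, width // 2, 4 * channels

# A tiny 4 x 4 single-channel image whose pixel values are 0..15.
tiny = [[[4 * row + col] for col in range(4)] for row in range(4)]
sliced = focus_slice(tiny)
print(sliced[0][0])                      # channels gathered from the top-left block
print(focus_output_shape(640, 640, 3))   # shape after slicing a 640 x 640 x 3 image
```

For a 640 × 640 × 3 input, `focus_output_shape` gives 320 × 320 × 12, which is the size quoted for the sliced feature map before the convolution.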
3. Results
3.1. The Training Process and Results of DCGAN
The experimental environment is presented in Table 3. The network was trained for 2000 epochs with a batch size of 64, and the weight parameters of both the discriminator and the generator were automatically saved every 50 epochs. The loss function is the binary cross entropy, which can be expressed by Equation (9). The loss curves of the generator and discriminator are presented in Figure 4. The training process is mainly divided into two phases: an unstable phase, in which the loss values fluctuate greatly, and a stable phase, in which the loss curves gradually converge. Real-time visualization allows for visual inspection of the samples. Figure 5 shows the training results at four representative epochs. In the early phase of training (approximately epochs 0–1200), the generated cracks have obvious defects, and the network has learned the crack morphology only slightly. From approximately epoch 1200 to 1600, the crack morphology gradually appears, but the generated results are still unstable. The quality of the generated cracks improves notably as the training progresses into its later stages.
Table 3. Parameters of the computer environment.
System Windows 10
CPU Intel Core i7-11800H CPU @ 2.3 GHz
Memory 8 GB
Graphics card NVIDIA GeForce RTX 3060
Environment Python 3.6, TensorFlow 2.8.0, Keras 2.8.0, NumPy 1.22.2
Configuration CUDA 11.2
Figure 4. The loss curves of the DCGAN: (a) The generator loss curve of the training process; (b) The discriminator loss curve of the training process.
Figure 5. The partial real-time visualization.
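$$\mathrm{Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i} \log \hat{y}_{i} + (1 - y_{i}) \log(1 - \hat{y}_{i}) \right] \tag{9}$$

In this equation, $N$ is the output size of the predicted results of the neural network, $y_{i}$ is the true value of the $i$-th label, and $\hat{y}_{i}$ is the predicted value of the $i$-th label.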
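As an illustration of the binary cross entropy used as the loss function, the following minimal sketch evaluates it for a list of labels and predicted probabilities. The function name and the clipping constant `eps` are assumptions added for numerical safety; they are not part of the study's training code.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy over N labels.

    y_true holds the true labels (0 or 1) and y_pred the predicted
    probabilities. Predictions are clipped to [eps, 1 - eps] so that the
    logarithm is always defined (an implementation assumption).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# An undecided discriminator (0.5 everywhere) versus confident correct and wrong ones.
undecided = binary_cross_entropy([1, 0], [0.5, 0.5])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
mistaken = binary_cross_entropy([1, 0], [0.1, 0.9])
```

Here `undecided` equals log 2, while `confident` is lower and `mistaken` is higher, which matches the intuition that the loss rewards correct, confident predictions.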
The best training results are selected from the images generated at the 1950th training epoch, and the corresponding weight parameters are saved so that bridge cracks can be generated efficiently in future use. Under our computer hardware conditions, generating 1000 artificial crack samples takes on the order of 10 s. Figure 6 shows some of the original and generated crack images. The generated cracks show diverse shapes, including vertical cracks, cross cracks, horizontal cracks, and X-type cracks, which is consistent with the real dataset. From a subjective perspective, the bridge cracks produced by the trained DCGAN capture the primary features of the original cracks, and it can be challenging to distinguish the generated samples from the original ones. Finally, 90 generated bridge crack images are used in the dataset for the following experiments and analysis.
Figure 6. The partial original samples and generated samples: (a–d) The original crack samples; (e–h) The generated crack samples.
3.2. The Training Process and Results of YOLOv5
The experimental environment is presented in Table 4. The network was trained for 300 epochs with a batch size of 4. The binary cross entropy, box loss, and object loss are employed as the loss functions of the neural network. The classification loss is evaluated by the binary cross entropy; that is, it measures whether the class predicted for the anchor matches the labeled class. The box loss evaluates the localization error between the anchor and the labeled box, and can be expressed by Equation (10). The object loss is calculated from the IOU, which is the ratio of the intersection to the union of the real and predicted boxes.
Table 4. Parameters of the computer environment.
System Windows 10
CPU Intel Core i7-11800H CPU @ 2.3 GHz
Memory 8 GB
Graphics card NVIDIA GeForce RTX 3060
Environment Python 3.8.5, PyTorch 1.8.0, NumPy 1.21.5
Conﬁguration CUDA 11.2
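$$L_{CIOU} = 1 - IOU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v \tag{10}$$

$$\alpha = \frac{v}{(1 - IOU) + v} \tag{11}$$

$$v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2} \tag{12}$$

In Equations (10)–(12), $w$ and $w^{gt}$ are the widths of the predicted and real boxes, respectively; $h$ and $h^{gt}$ are the heights of the predicted and real boxes, respectively; $b$ and $b^{gt}$ are the central points of the predicted and real boxes, respectively; $\rho$ represents the Euclidean distance between the two central points; and $c$ is the diagonal length of the smallest enclosing box that covers both the predicted and real boxes.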
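To make the terms of the CIoU box loss concrete, the following minimal sketch evaluates it for two axis-aligned boxes given as (center x, center y, width, height). The function name, the box format, and the handling of the degenerate case in which both the IOU gap and the aspect-ratio term are zero are assumptions made for illustration.

```python
import math

def ciou_loss(pred, target):
    """CIoU loss for two boxes in (center_x, center_y, width, height) format."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

    # Intersection over union.
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    intersection = inter_w * inter_h
    union = pw * ph + tw * th - intersection
    iou = intersection / union

    # Squared center distance and squared diagonal of the smallest enclosing box.
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    c2 = ((max(p_x2, t_x2) - min(p_x1, t_x1)) ** 2
          + (max(p_y2, t_y2) - min(p_y1, t_y1)) ** 2)

    # Aspect-ratio consistency term v and its weight alpha.
    v = (4.0 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    denominator = (1.0 - iou) + v
    alpha = v / denominator if denominator > 0 else 0.0  # identical boxes: no penalty

    return 1.0 - iou + rho2 / c2 + alpha * v

same = ciou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
apart = ciou_loss((0.0, 0.0, 2.0, 2.0), (5.0, 0.0, 2.0, 2.0))
stretched = ciou_loss((0.0, 0.0, 4.0, 1.0), (0.0, 0.0, 2.0, 2.0))
```

Identical boxes give a loss of zero. Unlike a plain IOU loss, disjoint boxes still receive a distance-dependent penalty above 1, and boxes with mismatched aspect ratios receive an extra penalty even when their centers coincide.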
The learning rate is a very important hyperparameter of the neural network, as it influences the accuracy and speed of the training process. The learning rate curve of the YOLOv5 training process is shown in Figure 7. The warm-up algorithm is used in the initial phase of training (phase 1 in Figure 7). Its purpose is as follows: at the beginning of training, the weights of the model are randomly initialized, and a large learning rate may make the model unstable (oscillation). The warm-up algorithm therefore uses a small learning rate for the first several epochs so that the model can gradually stabilize, and then switches to the pre-set learning rate for the subsequent training, which makes the model converge faster and perform better. The cosine annealing algorithm is used throughout the remainder of the training (phase 2 in Figure 7). Cosine annealing uses the cosine function to reduce the learning rate: as x increases, the cosine value first decreases slowly, then decreases more rapidly, and finally decreases slowly again, which suits the learning rate requirements of the gradient descent algorithm. In this study, the initial learning rate and the pre-set learning rate are set to 0.001 and 0.01, respectively.
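The two-phase schedule described above can be sketched as a single function of the epoch index. The initial and pre-set learning rates (0.001 and 0.01) and the 300 training epochs come from the text; the number of warm-up epochs, the linear shape of the warm-up, and the final learning rate fraction are assumptions chosen only to make the example concrete.

```python
import math

def learning_rate(epoch, total_epochs=300, warmup_epochs=3,
                  initial_lr=0.001, preset_lr=0.01, final_fraction=0.01):
    """Learning rate with linear warm-up followed by cosine annealing.

    warmup_epochs and final_fraction are illustrative assumptions; the
    study reports only the initial and pre-set learning rates.
    """
    if epoch < warmup_epochs:
        # Phase 1: rise linearly from the initial to the pre-set learning rate.
        return initial_lr + (preset_lr - initial_lr) * epoch / warmup_epochs
    # Phase 2: cosine annealing from the pre-set rate down to the final rate.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    final_lr = preset_lr * final_fraction
    return final_lr + (preset_lr - final_lr) * cosine

schedule = [learning_rate(epoch) for epoch in range(301)]
```

The resulting `schedule` starts at 0.001, reaches 0.01 when the warm-up ends, and then decays slowly, quickly, and slowly again, following the cosine shape described in the text.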
The original dataset (180 training samples and 10 validation samples) and the extended dataset (180 training samples and 10 validation samples) were established. The samples in the original dataset are all real images, while half of the training samples in the extended dataset are generated. The detection accuracy and model performance were evaluated using Precision, Recall, MAP-0.5, and MAP-0.5:0.95. The precision and recall can be expressed by Equations (13) and (14), respectively. AP is the average precision; that is, the area under the PR curve (recall on the horizontal axis and precision on the vertical axis) of a specific class over all images. MAP is the mean of the AP values of all classes. MAP-0.5 is the value of MAP when the IOU threshold is equal to 0.5, and MAP-0.5:0.95 is the average of the MAP values obtained at IOU thresholds of 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95.

$$Precision = \frac{TP}{TP + FP} \tag{13}$$

In Equation (13), TP is the number of positive samples (IOU ≥ threshold) with the correct classification, and FP is the number of negative samples (IOU < threshold) that are wrongly classified as positive.

$$Recall = \frac{TP}{TP + FN} \tag{14}$$

In Equation (14), FN is the number of positive samples that are wrongly classified as negative.
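The precision and recall definitions, together with the IOU thresholds used for MAP-0.5:0.95, can be written out in a few lines. This is a minimal sketch; the function names and the example counts are assumptions and are not results from the study.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP) and Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def map_iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """IOU thresholds over which MAP-0.5:0.95 is averaged."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]

def mean_over_thresholds(map_values):
    """Average of the MAP values computed at each IOU threshold."""
    return sum(map_values) / len(map_values)

# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed cracks.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
thresholds = map_iou_thresholds()
```

With these hypothetical counts, the precision is 0.8 and the recall is 8/12, and `thresholds` lists the ten IOU values from 0.5 to 0.95.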
Figure 7. The learning rate curve of YOLOv5.
The loss curves and evaluation indices of the training process are shown in Figure 8. The loss and accuracy curves of the original and extended datasets almost coincide, and the final converged loss and accuracy values are also close. Therefore, the YOLOv5 models trained on the original dataset and on the extended dataset have similar detection accuracies. Moreover, two representative YOLOv5 models (YOLOv5m and YOLOv5l) were trained on the extended dataset and compared with the presented model. Figure 9 shows that the training losses and detection accuracies of the three models are close. However, YOLOv5m and YOLOv5l are larger models, leading to longer training times and greater GPU memory requirements. Figure 10 shows the detection results of the YOLOv5 trained on the original dataset (Figure 10a–c) and on the extended dataset (Figure 10d–f). Under our computer hardware conditions, detecting one crack image takes on the order of 10 ms. Zheng et al. used an improved YOLOv5 for automatic concrete pavement crack detection, and detecting one crack image took approximately 5.1 ms. Pei et al. also employed a deep learning method to extend the dataset and used Faster R-CNN to detect the cracks. Their average precision was approximately 0.86, but they used a large dataset containing 1000 real images and 3000 generated cracks. These results indicate that the proposed method is effective.
Figure 8. The loss and accuracy curves of YOLOv5: (a) The box loss curves of the training dataset; (b) The box loss curves of the validation dataset; (c) The object loss curves of the training dataset; (d) The object loss curves of the validation dataset; (e) The Precision curves of the training process; (f) The Recall curves of the training process; (g) The MAP-0.5 curves of the training process; (h) The MAP-0.5:0.95 curves of the training process.
Figure 9. Comparison of three different YOLOv5 models: (a) The object loss curves of the validation dataset; (b) The MAP-0.5:0.95 curves of the training process.
Figure 10. The detection results of YOLOv5: (a–c) The detection results based on the original dataset; (d–f) The detection results based on the extended dataset.
4. Discussion
Large-scale engineering projects play a pivotal role in modern society, ranging from infrastructure development to environmental protection. However, the success and sustainability of these projects rely heavily on their post-construction maintenance and monitoring. Effective post-construction management is important for ensuring the long-term functionality and safety of such projects. One primary reason for the significance of post-construction maintenance and monitoring is the dynamic nature of engineering structures and systems. As these structures are exposed to changing environmental conditions, they undergo various forms of deterioration over time. Without proper maintenance, this deterioration can escalate, resulting in impaired functionality, decreased performance, and compromised safety. Regular inspections, repairs, and replacements are essential to rectify damage, avoid catastrophic failures, and extend the lifespan of large-scale engineering projects. Additionally, post-construction monitoring provides crucial data for understanding the behavior and performance of these projects in real-world conditions. By collecting and analyzing information on structural stresses, vibrations, deflections, environmental impacts, and defects, it is possible to identify potential issues and optimize the design and operation of similar future projects. Such monitoring also facilitates prompt interventions, allowing for early identification of problems and effective maintenance.
Bridge crack detection is an important task in large-scale engineering project monitoring, as cracks on bridges may pose significant hazards. The severity and potential hazards associated with cracks depend on various factors, including the type, size, and location of the crack, as well as the overall condition of the bridge. The damage to the bridge caused by cracks can also be diverse [50,51]:
• Cracks can weaken the structure, making it susceptible to sudden failure, particularly under heavy traffic loads or extreme weather conditions.
• Cracks can accelerate the degradation and deterioration of bridge materials. Moisture penetration through cracks can promote corrosion in reinforced concrete or steel elements, further compromising their structural integrity.
• Cracks can affect the dynamic behavior of bridges, leading to decreased stability and increased vulnerability to external forces, such as earthquakes or strong winds. Moreover, fatigue cracks caused by repetitive loading and stress cycles introduce a gradual deterioration process that can eventually lead to catastrophic failure.
• A key concern related to cracks on bridges is their impact on the safety of transportation.
• Settlement cracks, arising from the differential settlement of bridge foundations, can cause misalignments and deformations that affect the overall stability and functionality of the structure.
To address the hazards posed by cracks on bridges, effective maintenance and repair
strategies are essential. Routine inspections and monitoring programs play a crucial role in
detecting cracks in early stages and evaluating their severity. Depending on the size and
extent of a crack, appropriate repair techniques, such as crack injection or grouting, need to
be implemented to restore the structural integrity. In conclusion, cracks pose significant
hazards to bridges, jeopardizing their structural integrity and safety. Understanding the
causes, types, detection methods, and consequences of cracks in bridges is crucial for
designing appropriate maintenance and repair strategies.
However, traditional methods are costly for massive bridges spanning large distances.
Further research and technological advances in rapid crack detection are therefore
instrumental for ensuring the long-term sustainability of bridge infrastructure and the
safety of the public. With the rapid development of computer technology, deep learning
provides a new way of solving this problem [ ]. Firstly, drone technology can be used to
sample the entire bridge [ ]. This technique has the following advantages: (1) It is safe
and reliable. The use of drones for inspecting bridges eliminates the need for manual
inspection, avoids casualties, improves operational safety, and saves inspection costs.
(2) The accuracy of unmanned bridge inspection is high, since the drone itself can carry
high-definition cameras to take images of cracks. (3) Drone-based bridge inspection is more
efficient. Drone sampling technology is relatively mature. Jiang et al. [ ] proposed
collision-tolerant UAVs together with a two-stage inspection method for bridge coatings.
Junwon et al. [ ] extensively described the principles of drone-facilitated inspections as
well as key factors to consider for optimal data collection.
The images of the entire bridge captured by the UAV can be divided into many small
pictures and fed into a computer for crack detection. Even though rapid detection of
cracks can be achieved using deep learning techniques, a large dataset is still required
to train the neural network. Additionally, it should be pointed out that the existing
small-sample training strategy is not sufficient to solve the issue of data scarcity [ ].
In order to further reduce the cost of data acquisition and manual detection, the DCGAN
can be employed to generate artificial crack samples from existing crack images. Peng et
al. [ ] used the DCGAN to reconstruct the oil reservoir fracture model. The results showed
that the reconstructed model could predict the pressure distribution accurately.
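As an illustration of the tiling step described above, the following is a minimal Python sketch, assuming NumPy is available, of how a large UAV image could be divided into fixed-size tiles before detection. The function name `split_into_tiles`, the zero-padding of edge regions, and the tile size are illustrative assumptions and are not taken from the present study.

```python
import numpy as np


def split_into_tiles(image, tile_size):
    """Split an H x W (x C) image array into non-overlapping square tiles.

    Assumption (not from the paper): edge regions that do not fill a whole
    tile are zero-padded on the bottom/right so that no pixel is dropped.
    Returns a list of ((top, left), tile) pairs, where (top, left) is the
    position of the tile in the original image.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile_size  # rows needed to reach a multiple of tile_size
    pad_w = (-w) % tile_size  # columns needed to reach a multiple of tile_size
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="constant")

    tiles = []
    for top in range(0, padded.shape[0], tile_size):
        for left in range(0, padded.shape[1], tile_size):
            tile = padded[top:top + tile_size, left:left + tile_size]
            tiles.append(((top, left), tile))
    return tiles


if __name__ == "__main__":
    # Hypothetical 5 x 7 RGB image, split into 4 x 4 tiles.
    demo = np.arange(5 * 7 * 3).reshape(5, 7, 3)
    for (top, left), tile in split_into_tiles(demo, 4):
        print((top, left), tile.shape)
```

Each tile can then be passed to the detector independently, and the stored (top, left) offsets allow detected cracks to be mapped back to the full image. Taking the order-of-magnitude figure of 10 ms per image reported in this work, an image split into 1000 tiles would need on the order of 10 s for detection; this is a rough estimate rather than a measured result.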
Finally, YOLOv5 is used to carry out the detection process. YOLOv5 applies adaptive
anchors. Therefore, YOLOv5 learns the best anchor boxes for the dataset during the
training process, without having to run the K-means clustering algorithm offline to obtain
the K anchor boxes and modify the head network parameters. Overall, the training process
of YOLOv5 is simple and automated. Moreover, YOLOv5 uses the Focus module, which increases
the channel count (the channel count has a minimal influence on the amount of computation)
and reduces the spatial dimensions of the input image without losing pixel information.
As a result, the amount of computation in the model is greatly reduced.
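The slicing performed by the Focus module can be illustrated with a short NumPy sketch. It assumes a channel-first array of shape (C, H, W) with even H and W; the function name `focus_slice` is hypothetical, and the convolution that follows the slicing in YOLOv5 is omitted. Every second pixel is taken along each spatial axis at four offsets, and the four sub-images are stacked along the channel axis, so the channel count is multiplied by four while the height and width are halved and no pixel value is discarded.

```python
import numpy as np


def focus_slice(x):
    """Rearrange a (C, H, W) array into (4C, H/2, W/2) by strided slicing.

    Sketch of the space-to-depth step only; the convolution applied
    afterwards in the Focus module is not included here.
    """
    c, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    return np.concatenate(
        [
            x[:, 0::2, 0::2],  # even rows, even columns
            x[:, 1::2, 0::2],  # odd rows, even columns
            x[:, 0::2, 1::2],  # even rows, odd columns
            x[:, 1::2, 1::2],  # odd rows, odd columns
        ],
        axis=0,
    )


if __name__ == "__main__":
    # Hypothetical 3-channel input of size 4 x 6.
    image = np.arange(3 * 4 * 6).reshape(3, 4, 6)
    sliced = focus_slice(image)
    print(image.shape, "->", sliced.shape)
```

Because the output holds exactly the same values as the input, only rearranged, subsequent layers operate on feature maps with a quarter of the spatial positions.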
This study has several limitations. The quality of the samples generated by the
DCGAN needs to be improved; it is influenced by many factors, such as the size of the
training dataset, the computing power of the computer, and hyperparameter optimization.
During the training of the DCGAN, problems such as mode collapse and image collapse
may also be encountered [ ]. In future work, we will investigate more efficient DCGAN
training strategies and achieve a higher-quality reconstruction of the cracks.
A method that combines DCGAN and YOLOv5 to detect bridge cracks rapidly and accurately
was proposed. The theory, training environment, parameter settings, and neural
architecture of the DCGAN and YOLOv5 were described. This work investigated the
performance of the proposed model and compared the training results of the extended
dataset with those of the original dataset. The main findings are as follows:
(1) The trained DCGAN can learn the characteristics of cracks and quickly generate
a large number of artificial bridge crack images, which are used to extend the real
dataset. The time cost of generating 1000 artificial crack samples was on the order of
10 s. The generated images were balanced and feature-rich.
(2) The YOLOv5 target detection neural network can perform crack identification and
rapid detection. The time cost of detecting one crack image is on the order of 10 ms.
(3) The results indicate that when YOLOv5 was trained on the extended dataset, it
achieved a detection accuracy similar to that obtained when it was trained on the
original (real) dataset, which provides a new idea for the cost control of maintenance
and monitoring of large-scale concrete structures.
The proposed method of combining DCGAN and YOLOv5 has been shown to be
acceptable, especially in terms of cost-effectiveness. However, the quality of the images
generated by the DCGAN and the detection accuracy of YOLOv5 need to be improved.
Both are influenced by factors such as the computing power of the computer,
hyperparameter optimization, and training strategy. In future work, advanced training
strategies and optimization algorithms for neural networks will be the focus of our research.
Author Contributions: Conceptualization, Y.L. and W.G.; Methodology, Y.L. and W.G.; Software,
T.Z.; Validation, W.G., T.Z., Z.W. (Zhiyong Wang) and Z.W. (Zhihua Wang); Formal analysis, Y.L.;
Investigation, Y.L.; Data curation, Y.L.; Writing—original draft, Y.L.; Writing—review & editing, W.G.,
T.Z., Z.W. (Zhiyong Wang) and Z.W. (Zhihua Wang); Supervision, Z.W. (Zhiyong Wang) and Z.W.
(Zhihua Wang); Funding acquisition, Z.W. (Zhiyong Wang) and Z.W. (Zhihua Wang). All authors
have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (Grant Nos.
12102294, 12272257), the Natural Science Foundation of Shanxi Province (202203021211169), and the
special fund for Science and Technology Innovation Teams of Shanxi Province (Nos. 202204051002006).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
1. Khem, F.C.; Kai, S.W.; Jee, K.H.; Jee, H.L.; Foo, W.L.; Yee, L.L. Experimental and numerical study of the strength performance of
deep beams with perforated thin mild steel plates as shear reinforcement. Appl. Sci. 2023, 13, 8217.
2. Jack, M.; Marcus, P.; Christos, V.; Lorena, B.; Brenden, L. Robotic spray coating of self-sensing metakaolin geopolymer for concrete
monitoring. Automat. Constr. 2021, 121, 103415.
3. Zhang, C.Y.; Wang, M.; Liu, R.T.; Li, X.H.; Yan, J.; Du, H.J. Enhancing self-healing efficiency of concrete using multifunctional
granules and PVA fibers. J. Build. Eng. 2023, 76, 107314. [CrossRef]
4. Gabriele, B.; Mario, F.; Luca, G.; Marzia, M. Preliminary investigation on steel jacketing retrofitting of concrete bridges half-joints.
Appl. Sci. 2023, 13, 8181.
5. Jang, K.; Jung, H.; An, Y. Automated bridge crack evaluation through deep super resolution network-based hybrid image
matching. Automat. Constr. 2022, 137, 104229. [CrossRef]
6. Zhang, T.J. Analysis on the causes of cracks in bridges. J. Constr. Res. 2018, 1, 13–26. [CrossRef]
7. Huang, Y.F.; Chen, Y.G.; Deng, F.M.; Wang, X.M. Design of CB-PDMS flexible sensing for monitoring of bridge cracks. Sensors
2022, 22, 9817. [CrossRef] [PubMed]
8. Yu, S.; He, F.C.; Zhang, J.R. Experimental PIV radial splitting study on expansive soil during the drying process. Appl. Sci.
13, 8050. [CrossRef]
9. Hawley, C.J.; Gräbe, P.J. Water leakage mapping in concrete railway tunnels using LiDAR generated point clouds. Constr. Build.
Mater. 2022, 361, 129644. [CrossRef]
10. Jiang, C.; Gu, X.L.; Huang, Q.H.; Zhang, W.P. Carbonation depth predictions in concrete bridges under changing climate
conditions and increasing traffic loads. Cement. Concrete Comp. 2018, 93, 140–154. [CrossRef]
11. Oday, I.M.; Salah, S.A.; Alaa, S.A.; Hassane, L.; Belkheir, H.; Abdelkarim, C.; Young, G.K. On the development of an intelligent
Poly(aniline-co-o-toluidine)/Fe3O4/Alkyd coating for corrosion protection in carbon steel. Appl. Sci. 2023, 13, 8189.
12. Li, G.; Liu, T.; Fang, Z.Y.; Shen, Q.; Ali, J. Automatic bridge crack detection using boundary refinement based on real-time
segmentation network. Struct. Control. Health Monitor. 2022, 29, 2991. [CrossRef]
13. Sepasdar, R.; Karpatne, A.; Shakiba, M. A data-driven approach to full-field nonlinear stress distribution and failure pattern
prediction in composites using deep learning. Comput. Method. Appl. Mech. Eng. 2022, 397, 115126. [CrossRef]
14. Yaser, G.; Jonny, N.; Taufik, N.; Andrzej, C. Formwork pressure prediction in cast-in-place self-compacting concrete using deep
learning. Automat. Constr. 2023, 151, 104869.
15. Masi, F.; Stefanou, I. Multiscale modeling of inelastic materials with Thermodynamics-based Artificial Neural Networks (TANN).
Comput. Method. Appl. Mech. Eng. 2022, 398, 115190. [CrossRef]
16. Li, G.; Fang, Z.Y.; Mohammed, A.M.; Liu, T.; Deng, Z.H. Automated bridge crack detection based on improving encoder–decoder
network and strip pooling. J. Infrastruct. Syst. 2023, 29, 218. [CrossRef]
17. Li, R.X.; Yu, J.Y.; Li, F.; Yang, R.T.; Wang, Y.D.; Peng, Z.H. Automatic bridge crack detection using unmanned aerial vehicle and
Faster R-CNN. Constr. Build Mater. 2023, 362, 129659. [CrossRef]
18. Xu, H.Y.; Su, X.; Wang, Y.; Cai, H.Y.; Cui, K.R.; Chen, X.D. Automatic bridge crack detection using a convolutional neural network.
Appl. Sci. 2019, 9, 2867. [CrossRef]
19. Oh, K.; Kim, E.; Park, C.Y.; Chen, X.D. A physical model-based data-driven approach to overcome data scarcity and predict
building energy consumption. Sustainability 2022, 14, 9464. [CrossRef]
20. Abdelhalim, I.S.A.; Mohamed, M.F.; Mahdy, Y.B. Data augmentation for skin lesion using self-attention based progressive
generative adversarial network. Expert. Syst. Appl. 2021, 165, 113922. [CrossRef]
21. Pawar, S.P.; Talbar, S.N. LungSeg-Net: Lung field segmentation using generative adversarial network. Biomed. Signal Process.
Control 2021, 64, 102296. [CrossRef]
22. Kazuhiro, K.; Werner, R.A.; Toriumi, F.; Javadi, M.S.; Pomper, M.G.; Solnes, L.B.; Verde, F.; Higuchi, T.; Rowe, S.P. Generative
adversarial networks for the creation of realistic artificial brain magnetic resonance images. Tomography
23. Luo, J.; Huang, J.; Li, H. A case study of conditional deep convolutional generative adversarial networks in machine fault
diagnosis. J. Intell. Manuf. 2021, 32, 407–425. [CrossRef]
24. Choi, S.H.; Jung, S.H.; Li, H. Similarity analysis of actual fake fingerprints and generated fake fingerprints by DCGAN. Int. J.
Fuzzy Log. Intell. Syst. 2019, 19, 40–47. [CrossRef]
25. Zhang, K.G.; Zhang, Y.T.; Cheng, H.D. CrackGAN: Pavement crack detection using partially accurate ground truths based on
generative adversarial learning. IEEE Trans. Intell. Transp. 2021, 22, 1306–1319. [CrossRef]
26. Yang, H.Y.; Yang, L.N.; Wu, T.; Meng, Z.Q.; Huang, Y.J.; Wang, P.S.; Li, P.; Li, X.C. Automatic detection of bridge surface crack
using improved Yolov5s. Int. J. Pattern. Recogn. 2022, 36, 2250047. [CrossRef]
27. Mahaur, B.; Mishra, K.K. Small-object detection based on Yolov5 in autonomous driving systems. Pattern. Recogn. Lett.
28. Zhou, S.; Bi, Y.; Wei, X.; Liu, J.; Ye, Z.; Li, F.; Du, Y. Automated detection and classification of spilled loads on freeways based on
improved YOLO network. Mach. Vis. Appl. 2021, 32, 44. [CrossRef]
29. Hu, W.X.; Xiong, J.T.; Liang, J.H.; Xie, Z.M.; Liu, Z.Y.; Huang, Q.Y.; Yang, Z.G. A method of citrus epidermis defects detection
based on an improved YOLOv5. Biosyst. Eng. 2023, 227, 19–35. [CrossRef]
30. Tang, Z.; Zhou, L.; Qi, F.; Chen, H.R. An improved lightweight and real-time YOLOv5 network for detection of surface defects on
indocalamus leaves. J. Real-Time Image Process. 2023, 20, 14. [CrossRef]
31. Jiang, Q.; Li, H. Silicon energy bulk material cargo ship detection and tracking method combining YOLOv5 and DeepSort. Energy.
Rep. 2023, 9, 151–158. [CrossRef]
32. Lan, J.G.; Jean, P.; Mehdi, M.; Xu, B.; David, W.; Sherjil, O.; Aaron, C.; Yoshua, B. Generative Adversarial Networks. CoRR.
33. Won, U.Y.; An, V.Q.; Park, S.B.; Park, M.H.; Dam, D.V.; Park, H.J.; Yang, H.; Lee, Y.; Yu, W.J. Multi-neuron connection using
multi-terminal floating-gate memristor for unsupervised learning. Nat. Commun. 2023, 14, 3070. [CrossRef]
34. Koochak, R.; Sayyafzadeh, M.; Nadian, A.; Bunch, M.; Haghighi, M. A variability aware GAN for improving spatial
representativeness of discrete geobodies. Comput. Geosci. 2022, 166, 105188. [CrossRef]
35. Kenshi, M.; Isal, N.; Yasuhiro, W. Transposed convolution as alternative preprocessor for brain-computer interface using
electroencephalogram. Appl. Sci. 2023, 13, 3578.
36. Swamy, A.S.A.; Shylashree, N. HDR Image compression by multi-scale down sampling of intensity levels. Int. J. Image Graph
2021, 21, 2150048. [CrossRef]
37. Hamzah, N.A.b.A.; Hadhrami, B.A.G.; Al-Selwi, H.F.; Hassan, N.; Aziz, A.b.A. Facial Mask Detection and Energy Monitoring
Dashboard Using YOLOv5 and Jetson Nano. In Proceedings of the Multimedia University Engineering Conference (MECON 2022);
Atlantis Press: Amsterdam, The Netherlands, 2022.
38. Liu, Y.F.; Zhang, J.; Zhao, T.T.; Wang, Z.Y.; Wang, Z.H. Reconstruction of the meso-scale concrete model using a deep convolutional
generative adversarial network (DCGAN). Constr. Build Mater. 2023, 370, 130704. [CrossRef]
39. Xu, X.; Qiao, H.B.; Ma, X.M.; Yin, G.H.; Wang, Y.K.; Zhao, J.P.; Li, H.Y. An automatic wheat ear counting model based on the
minimum area intersection ratio algorithm and transfer learning. Measurement 2023, 216, 112849. [CrossRef]
40. Yevick, D.; Melko, R. The accuracy of restricted Boltzmann machine models of Ising systems. Comput. Phys. Commun.
41. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. CoRR. 2015, 1–12. [CrossRef]
42. Llya, L.; Frank, H. Decoupled weight decay regularization. CoRR. 2019, 1–11. [CrossRef]
43. Huang, W.X.; Huo, Y.; Yang, S.C.; Liu, M.J.; Li, H.; Zhang, M. Detection of Laodelphax striatellus (small brown planthopper)
based on improved YOLOv5. Comput. Electron. Agric. 2023, 206, 107657. [CrossRef]
44. Zheng, X.; Qian, S.R.; Wei, S.D.; Zhou, S.Y.; Hou, Y. The combination of transformer and you only look once for automatic concrete
pavement crack detection. Appl. Sci. 2023, 13, 9211. [CrossRef]
45. Pei, L.L.; Sun, Z.Y.; Sun, J.; Li, W.; Zhang, H. Generation method of pavement crack images based on deep convolutional
generative adversarial networks. J. Cent. South Univ. (Sci. Technol.) 2021, 52, 2899–3906.
46. Marwan, H.; Khaled, K. Studying the effectiveness of changing parameters in pavement management systems on optimum
maintenance strategies of low-volume paved roads. J. Transp. Eng. Part B Pavements 2021, 147, 04020075.
47. La, M.L.; Oddo, M.C.; Cucchiara, C.; Granata, M.F.; Barile, S.; Pappalardo, F.; Pennisi, A. Experimental investigation on innovative
stress sensors for existing masonry structures monitoring. Appl. Sci. 2023, 13, 3712.
48. Orlowsky, J.; Beßling, M.; Kryzhanovskyi, V. Prospects for the use of textile-reinforced concrete in buildings and structures
maintenance. Buildings 2023, 13, 189. [CrossRef]
49. Wang, D.; Zhao, Y.; Wang, J.F.; Wang, Q.; Liu, X.D.; Pappalardo, F.; Pennisi, A. Establishment and effect analysis of traffic load for
long-span bridge via fusion of parameter correlation. Structure 2023, 55, 1992–2002.
50. Eslam, M.A.; Osama, M.; Mohamed, M.; Tarek, Z. Entropy-Based automated method for detection and assessment of spalling
severities in reinforced concrete bridges. J. Perform. Constr. Fac. 2021, 35, 04020132.
51. Barros, J.A.O.; Baghi, H.; Ventura-Gouveia, A. Assessing the applicability of a smeared crack approach for simulating the
behaviour of concrete beams flexurally reinforced with GFRP bars and failing in shear. Eng. Struct.
,227, 111391. [CrossRef]
52. Qian, H.J.; Li, Y.; Yang, J.F.; Xie, L.H.; Tang, K.H. Segmentation and analysis of cement particles in cement paste with deep
learning. Cement Concrete Comp. 2023, 136, 104819. [CrossRef]
53. Dai, X.; Nagahara, M. Platooning control of drones with real-time deep learning object detection. Adv. Robot.
54. Jiang, S.; Wu, Y.Q.; Zhang, J. Bridge coating inspection based on two-stage automatic method and collision-tolerant unmanned
aerial system. Automat. Constr. 2023, 146, 104685. [CrossRef]
55. Junwon, S.; Luis, D.; Jim, W. Drone-enabled bridge inspection methodology and application. Automat. Constr.
56. Zhang, X.Y.; Zhao, T.T.; Liu, Y.F.; Chen, Q.Q.; Wang, Z.Y.; Wang, Z.H. A data-driven model for predicting the mixed-mode stress
intensity factors of a crack in composites. Eng. Fract. Mech. 2023, 288, 109385. [CrossRef]
57. Peng, X.Y.; Rao, X.; Zhao, H.; Xun, Y.F.; Zhong, X.; Zhen, W.T.; Huang, L.Y. A proxy model to predict reservoir dynamic pressure
profile of fracture network based on deep convolutional generative adversarial networks (DCGAN). J. Petrol. Sci. Eng.
58. Klinwichit, P.; Yookwan, W.; Limchareon, S.; Chinnasarn, K.; Jang, J.S.; Onuean, A. BUU-LSPINE: A thai open lumbar spine
dataset for spondylolisthesis detection. Appl. Sci. 2023, 13, 8646. [CrossRef]
The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.