Automatic Crack Detection on Road Pavements Using Encoder-Decoder Architecture

MDPI
Materials
Zhun Fan 1, Chong Li 1,2, Ying Chen 1, Jiahong Wei 1, Giuseppe Loprencipe 2,*, Xiaopeng Chen 3 and Paola Di Mascio 2
1 Key Lab of Digital Signal and Image Processing of Guangdong Province, Department of Electronic and Information Engineering, College of Engineering, Shantou University, Shantou 515063, China; zfan@stu.edu.cn (Z.F.); chongli1217@163.com (C.L.); 19ychen1@stu.edu.cn (Y.C.); 19jhwei@stu.edu.cn (J.W.)
2 Department of Civil, Constructional and Environmental Engineering, Sapienza University of Rome, 00184 Rome, Italy; paola.dimascio@uniroma1.it
3 Department of Industrial Engineering, Pusan National University, Busan 609735, Korea; xiaopengchen388@gmail.com
* Correspondence: giuseppe.loprencipe@uniroma1.it
Received: 28 May 2020; Accepted: 30 June 2020; Published: 2 July 2020


Abstract:
Automatic crack detection from images is an important task for ensuring the safety and durability of Portland cement concrete (PCC) and asphalt concrete (AC) pavements. Pavement failure has a number of causes, including water intrusion, stress from heavy loads, and climate effects. Cracks are generally the first distress to appear on road surfaces, so proper monitoring and maintenance to prevent cracks from forming or spreading is important. Conventional methods for identifying cracks on road pavements are extremely time-consuming and costly. Many cracks show complicated topological structures, oil stains, poor continuity, and low contrast, which make crack features difficult to define. An automated crack detection algorithm is therefore a key tool for improving the results. Inspired by the development of deep learning in computer vision and object detection, the proposed algorithm uses an encoder-decoder architecture with hierarchical feature learning and dilated convolution, named U-Hierarchical Dilated Network (U-HDN), to perform crack detection end to end. Crack characteristics with multiple context information are learned automatically. A multi-dilation module embedded in the encoder-decoder architecture is proposed: crack features of multiple context sizes are integrated in this module through dilated convolutions with different dilation rates, which captures much more crack information. Finally, a hierarchical feature learning module is designed to obtain multi-scale features from the high- to low-level convolutional layers, which are integrated to predict cracks pixel-wise. Experiments were performed on public crack databases using 118 images, and the results were compared with those obtained by other methods on the same images. The results show that the proposed U-HDN method achieves high performance because, unlike the other algorithms, it can extract and fuse features of different context sizes and from different levels of feature maps.
Keywords: pavement cracking; automatic crack detection; encoder-decoder; deep learning; U-net; hierarchical feature; dilated convolution
1. Introduction
1.1. Motivation
Cracks are common distresses in both concrete and asphalt pavements. Different types of cracks can be observed, arising from different causes: road surface aging, climate, and traffic load. The pavement management systems (PMS) currently used for roads and airports [1,2] generally rely on the classification of cracks provided by Shahin [3] and adopted by the international standard of the American Society for Testing and Materials (ASTM) [4]. The classification is based on crack characteristics and causes, as listed in Table 1 and Figure 1.
Materials 2020, 13, 2960; doi:10.3390/ma13132960 www.mdpi.com/journal/materials
Table 1. Types of cracks in road pavements.
| Flexible pavements: distress | Cause | Rigid pavements: distress | Cause |
|---|---|---|---|
| Alligator Cracking | load | Corner Break | load |
| Block Cracking, Slippage Cracking | traffic | Shattered Slab/Intersecting Cracks | load |
| Longitudinal Cracking | climate | Durability (“D”) Cracking | climate |
| Transverse Cracking | climate | Longitudinal, Transverse, and Diagonal Cracking | load |
| Joint Reflection Cracking | climate | Shrinkage Cracks | climate |
Figure 1. Some different crack types. Top row (from left to right): alligator cracking, block cracking, slippage cracking, longitudinal cracking, transverse cracking, and joint reflection cracking. Bottom row (from left to right): corner break, shattered slab/intersecting cracks, durability (“D”) cracking, longitudinal, transverse, and diagonal cracking, and shrinkage cracks.
Cracks can shorten the service life of roads: water penetrating through them can reduce the compaction of the materials in the deeper layers of the pavement, with the consequence of a decrease in the load-bearing capacity of the whole structure. In addition, this increases the unevenness of the road surface, which is a potential threat to road safety [5–11]. Crack detection is therefore a significant step in pavement management for maintaining the pavement in good condition. This step can be performed by either visual inspection or automatic survey. Both methods give good results in terms of distress analysis, but automatic crack detection is more efficient, quicker, and less costly than traditional visual inspection. Automatic crack detection has therefore attracted much attention from the scientific and technical community in recent years.
1.2. Monitoring System
In the past few decades, many researchers have worked on structural health monitoring [12–17]. Yu et al. [18] proposed an integrated robot-based system for crack detection, which includes a mobile manipulator system and a crack detection system. The mobile manipulator system is used to ensure the distance from the objects, and the crack detection system is employed to obtain pavement crack information. Oh et al. [19] proposed a bridge inspection system comprising a specially designed car, a robot system, and a machine vision system. Lim et al. [20] designed a crack inspection system that consists of three parts: a mobile robot, a vision system, and an algorithm. The camera is mounted on the mobile robot to collect crack images, and the Laplacian of Gaussian algorithm is applied to extract crack information.
Li et al. [21] used laser-imaging techniques to construct 3D point clouds of the road surface. The collected laser point cloud images are divided into small patches, each of which is classified as containing cracks or not. A minimum spanning tree is then employed to extract the cracks from the
image patches. Zou et al. [22] proposed a path-voting technique to perform crack detection on laser range images. First, local grouping is performed with a path-voting algorithm on the 3D point cloud images. Then, crack seeds are used in a graph representation to extract crack information. Fernandes et al. [23] proposed a crack detection system using a light field imaging sensor (Lytro Illum camera), which exploits disparity information to detect cracks on the road.
1.3. Crack Detection Algorithms
Existing visual-based crack detection algorithms can be roughly divided into two branches:
traditional crack detection methods and artificial intelligence.
1.3.1. Traditional Crack Detection Methods
Wavelet transform: Zhou et al. [24] used a wavelet transform to perform crack detection. Different frequency sub-bands are used to distinguish cracks in images, with high and low amplitudes defined as cracks and noise, respectively. Subirats et al. [25] proposed a 2-D wavelet transform to separate crack and non-crack regions.
Image thresholding: In some studies [26–28], a threshold value is applied to segment crack regions, followed by morphological operations to refine the processed crack images. The method in [26] preprocesses the images with a morphological filter to reduce the variance of pixel intensities, followed by dynamic thresholding to detect the cracks. These methods have low efficiency. Oliveira [26,29] proposed a threshold-based segmentation method. In CrackIT [30], threshold-based segmentation is used to distinguish crack blocks in the image; this work was later extended into the CrackIT toolbox [29]. The latest improvement [31] uses connectivity as a post-processing step consisting of two stages, selection of prominent “crack seeds” and binary pixel classification, which improves the segmentation results.
Hand-crafted features and classification: Hand-crafted feature descriptors are applied to extract crack information from images, followed by a patch classifier [32–34]. Quintana et al. [34] proposed a computer vision algorithm with three parts: hard shoulder detection, region proposal, and crack classification. The Hough transform (HT) was used to detect the hard shoulder; Hough transform features (HTF) and local binary patterns (LBP) were employed in the region proposal step; finally, a classifier was used to detect the cracks. This crack detection procedure has low efficiency, and it cannot perform automatic crack detection.
Edge detection-based methods: Other authors applied the Canny [35] and Sobel [36] edge detectors to extract crack information. Maode et al. [37] used a modified median filter to remove noise from crack images, and morphological filters were adopted to detect cracks.
Minimal path-based methods: These algorithms take brightness and connectivity into consideration for crack detection. Kaul et al. [38] used the minimal path selection (MPS) method, which is based on the fast-marching algorithm, to find open and closed curves without requiring prior knowledge of endpoints or topology. In addition, the method is fairly robust to added noise. Baltazart et al. [39] proposed three ongoing improvements to MPS, concerning the selection of crack endpoints, the path-finding strategy, and the selection of the minimum path cost; these improve MPS performance in both segmentation and computation time. Nguyen et al. [40] took brightness and connectivity into consideration simultaneously using free-form anisotropy (FFA). Amhaz et al. [41] introduced a labelled MPS, which relies on localizing minimal paths with Dijkstra's algorithm or the A* family and provides robust and precise results. By contrast, Kass et al. [42] used the theory of active contours (“snakes”), which uses the L2 norm for constrained minimization.
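The idea shared by the minimal path-based methods can be illustrated with a small sketch. It is not the implementation of any of the cited works: it assumes that the two endpoints are already known, that the pixel intensity itself is used as a non-negative cost (dark crack pixels are cheap), and that pixels are 8-connected. The function name `minimal_path` is hypothetical.

```python
import heapq
import numpy as np

def minimal_path(cost, start, end):
    """Dijkstra's algorithm on an 8-connected pixel grid.

    cost  : 2-D array of non-negative per-pixel costs (assumption: pixel intensity).
    start : (row, col) of the first endpoint, assumed known in advance.
    end   : (row, col) of the second endpoint, assumed known in advance.
    Returns the list of pixels on the cheapest path and its total cost.
    """
    height, width = cost.shape
    dist = np.full((height, width), np.inf)
    prev = {}
    dist[start] = cost[start]
    heap = [(float(cost[start]), start)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == end:
            break
        if d > dist[i, j]:
            continue  # stale heap entry, a cheaper route was already found
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if di == 0 and dj == 0:
                    continue
                ni, nj = i + di, j + dj
                if 0 <= ni < height and 0 <= nj < width:
                    nd = d + float(cost[ni, nj])
                    if nd < dist[ni, nj]:
                        dist[ni, nj] = nd
                        prev[(ni, nj)] = (i, j)
                        heapq.heappush(heap, (nd, (ni, nj)))
    # Walk back from the end pixel to the start pixel.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], float(dist[end])
```

On an image in which a dark, connected crack joins the two endpoints, the cheapest path follows the crack, which is how brightness and connectivity are combined in a single criterion.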
1.3.2. Artificial Intelligence
Wang et al. [43] proposed a multi-class classification method that applied a support vector machine (SVM) and data fusion to inspect aircraft skin cracks. Shi et al. [44] proposed the CrackForest method, which describes crack features with random structured forests; the accompanying public CFD database of road crack images has become very popular among scholars and researchers. However, these methods rely excessively on feature descriptors, which are difficult for humans to design for different types of crack images.
Recently, with the development of deep learning, a branch of machine learning based on artificial neural networks (ANN) [45], which are inspired by the structure of the brain, many algorithms have been proposed for object detection and image classification tasks. ANNs have been employed to solve many civil engineering problems [46–50]. Gao and Mosalam [51] applied transfer learning to detect structural damage in images; this reduces the computational cost by using a pre-trained neural network model, although the network still needs to be fine-tuned to perform crack detection. Local patch information was used to inspect cracks with convolutional neural networks (CNN) in [52]. CrackNet [53] improved pixel-perfect accuracy with a CNN by discarding pooling layers. In CrackNet-R [54], a recurrent neural network (RNN) is deployed to perform automatic crack detection on asphalt roads. Cha et al. [55] adopted a CNN-based sliding window to scan for and detect road cracks. Fan et al. [56] proposed a structured prediction method to detect crack pixels with a CNN; small structured image patches (27 × 27 pixels) are input into the neural network, which may overload the computer memory. An ensemble network was proposed in [57] to detect and measure pavement cracks. Maeda et al. [58] adopted an object detection network architecture to detect cracks in images, and the network can be transferred to a smartphone to perform road crack detection. Cha et al. [59] used Faster R-CNN to inspect road cracks. Yang et al. [60] adopted a fully convolutional network (FCN) to inspect road pavement cracks at pixel level, which can perform crack detection by end-to-end training. Li et al. [61] employed the you-only-look-once v3 (YOLOv3)-Lite method to inspect aircraft structures; depthwise separable convolution and a feature pyramid were adopted to design the network architecture and to join low- and high-resolution features for crack detection. Jenkins et al. [62] presented an encoder-decoder architecture for road crack detection, in which the encoder layers reduce the size of the input image to generate lower-level feature maps and the decoder layers recover the resolution of the input data by up-sampling. Tisuchiya et al. [63] proposed a data augmentation method based on YOLOv3 for crack detection, which can increase the accuracy effectively.
The feature maps become increasingly coarse after several convolution and pooling operations in a CNN. At the same time, detailed and abstract features are represented in the large-scale and small-scale layers, respectively. Liu et al. [64] proposed an algorithm that fuses features of different scales to improve object detection performance. For image segmentation, U-net was proposed in [65] to perform semantic segmentation with an encoder-decoder architecture and improve accuracy. Dilated convolution with multiple rates was proposed in [66–68] to enlarge the context and obtain deeper features, improving network performance.
1.4. Contribution
Inspired by the above observations, this paper proposes a new network, called U-HDN, that fuses multi-scale features in an encoder-decoder network based on U-net for crack detection. The flowchart and the proposed U-HDN architecture are shown in Figures 2 and 3. The proposed method consists of three components: the U-net architecture, a multi-dilation module (MDM), and a hierarchical feature (HF) learning module. First, the U-net is divided into encoder and decoder networks, which have the same scale at each stage. The encoder network extracts crack features through convolution and pooling layers. The decoder network restores the image size through a series of up-sampling and convolution layers.
Figure 2. Flowchart for detecting pavement cracks.
Figure 3. The proposed U-HDN architecture consists of three components: U-net architecture, multi-dilation module, and hierarchical feature learning module. The red dotted box shows the modified U-net; the green dotted box is the multi-dilation module; the blue dotted box shows the hierarchical feature learning module.
Then, a multi-dilation module (MDM) is designed and embedded into the encoder-decoder architecture to obtain crack features of multiple context sizes. These features are integrated in the MDM through dilated convolutions with different dilation rates, which captures much more crack information. Next, a hierarchical feature (HF) learning module is designed to obtain multi-scale features from the high- to low-level convolutional layers. The single-scale features of each convolutional stage are used to predict cracks pixel-wise at a side output. Finally, the single-scale features at the side outputs are concatenated to produce a final fused feature map. Both the side outputs and the fused result are supervised by deeply-supervised nets (DSN) [69].
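The fusion of side outputs under deep supervision can be sketched as follows. This is only an illustration under several assumptions that are not stated in the text above: the maps are square, side outputs are up-sampled by nearest-neighbour repetition, the fusion is a 1 × 1 convolution written as a weighted sum over the stacked side outputs, and each output is supervised with a binary cross-entropy loss of equal weight. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def upsample_nearest(x, factor):
    """Nearest-neighbour up-sampling of a 2-D map by an integer factor (assumption)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def fuse_side_outputs(side_logits, fuse_weights, out_size):
    """Bring every side output to out_size x out_size, stack them, and fuse them.

    The weighted sum over the stacked maps plays the role of a 1 x 1 convolution.
    """
    ups = [upsample_nearest(s, out_size // s.shape[0]) for s in side_logits]
    stacked = np.stack(ups, axis=0)                      # (n_sides, H, W)
    fused = np.tensordot(fuse_weights, stacked, axes=1)  # (H, W)
    return ups, fused

def deep_supervision_loss(side_logits, fuse_weights, target):
    """Sum of the losses of all side outputs plus the loss of the fused output."""
    ups, fused = fuse_side_outputs(side_logits, fuse_weights, target.shape[0])
    side_losses = [bce(sigmoid(u), target) for u in ups]
    fused_loss = bce(sigmoid(fused), target)
    return sum(side_losses) + fused_loss, side_losses, fused_loss
```

Supervising every side output as well as the fused map gives each convolutional stage its own training signal, which is the purpose of deep supervision in this setting.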
The contributions of U-HDN are the following:
1. A new automatic road crack detection method based on U-net, called U-HDN, is designed, and encoder-decoder networks are introduced to perform end-to-end training for crack detection. The hierarchical features of cracks can be learned effectively across multiple scales and scenes.
2. The U-net architecture is modified. First, pool4, conv9, conv10, and up-conv1 of the U-net model are removed. Second, to implement end-to-end training, zero-padding is performed in each convolution and up-convolution.
3. The MDM is proposed to learn crack features of multiple context sizes, which are integrated in the MDM through dilated convolutions with different dilation rates.
4. The HF learning module is designed to obtain multi-scale features from the high-level to the low-level convolutional layers. Fusing the hierarchical convolutional features gives better performance in inferring crack information.
The rest of this paper is organized as follows. The details of the proposed U-HDN are described in Section 2 (Methods). Comprehensive experiments that show the performance of U-HDN and compare it with state-of-the-art algorithms are presented and discussed in Section 3 (Experiments and Results). Finally, Section 4 reports the conclusions of the research and proposes some possible future improvements of the method.
2. Methods
In this section, the details of the proposed method, which form the core components of U-HDN, are introduced. An end-to-end classification approach based on an encoder-decoder network is employed to perform road crack detection.
From the point of view of deep learning, image features are selected automatically during the convolution operations, and this selection is based on the pixel information of the image. The feature maps are computed in the convolution operations, so the proposed method is designed around the number of feature maps. In this paper, the feature maps are computed in the spatial domain, and their numbers are shown in Figure 3 (on the green boxes).
Deep learning tends to learn image features through convolution operations without pre-processing (such as filtering, noise reduction, and data augmentation), guided by the ground truth, a regression function, and activation functions. This gives wider generalization ability on the database and makes automatic object detection or semantic segmentation possible with end-to-end training. The neural network learns and extracts crack features automatically through convolution operations, according to the parameter settings and the ground truth.
2.1. U-Net Architecture
In this paper, the backbone of U-HDN is based on the U-net architecture, which is divided into two parts: a contracting path (encoder) and an expansive path (decoder), located on the left and right sides, respectively [65].
As shown in Figure 3, the red dotted box marks the modified U-net. Each stage of the contracting path consists of two 3 × 3 convolution layers, each followed by the rectified linear unit (ReLU) activation function [70], and a 2 × 2 max pooling layer for down-sampling.
Each stage of the expansive path consists of a 2 × 2 up-convolution that up-samples the features, a concatenation with the cropped features from the contracting path, and two 3 × 3 convolution layers, each followed by ReLU. In this U-net architecture, the components pool4, conv9, conv10, and up-conv1 were removed. In addition, to implement end-to-end training, zero-padding was applied in each convolution and up-convolution. For background on convolutional neural networks, readers are referred to [71].
Convolution layer: a convolutional layer has k filters (or kernels) with weights w. In the convolution process, the input image is convolved with the filters and a bias b is added, which yields k feature maps. To increase the nonlinearity of the output, ReLU is employed as the activation function after the convolution.
Max pooling layer: max pooling takes the maximum value of each subarray during down-sampling, which reduces the computational complexity.
Activation function: the ReLU activation function was used to increase the nonlinearity of the convolution layers' output. The sigmoid function was adopted to distinguish crack and non-crack pixels in the final output [72].
Zero-padding: the input matrix is padded with zeros around the border so that the filter can be applied to the bordering elements of the input image matrix. Zero-padding ensures that the output image has the desired size during the up-sampling process [65,73].
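The four building blocks described above can be written out for a single-channel image as a minimal NumPy sketch. It is meant only to make the shapes concrete and assumes an odd filter size, stride 1, and one filter; it is not the training code of U-HDN, and the function names are hypothetical.

```python
import numpy as np

def conv2d_same(image, kernel, bias=0.0):
    """Single-channel convolution (cross-correlation form) with zero-padding.

    The border is padded with zeros so that the output has the same
    height and width as the input (assumes an odd kernel size, stride 1).
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    height, width = image.shape
    out = np.empty((height, width), dtype=float)
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel) + bias
    return out

def relu(x):
    """Rectified linear unit: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """2 x 2 max pooling with stride 2 (an odd last row or column is dropped)."""
    height, width = x.shape
    x = x[:height - height % 2, :width - width % 2]
    return x.reshape(height // 2, 2, width // 2, 2).max(axis=(1, 3))

def sigmoid(x):
    """Maps the final response to (0, 1), read as a per-pixel crack probability."""
    return 1.0 / (1.0 + np.exp(-x))
```

With zero-padding, a 3 × 3 convolution leaves the spatial size unchanged, so only the 2 × 2 pooling halves the resolution in the encoder; this is what allows the decoder to return to the input size by up-sampling alone.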
2.2. Multi-Dilation Module (MDM)
In the encoder network of U-net, only one type of convolutional filter, and hence one receptive field, is used to extract crack features, which has a negative influence on detecting cracks of different types, such as vertical, horizontal, and other topologies.
Therefore, an MDM that operates on the encoder features to obtain features of multiple context sizes was designed [66–68], as shown in Figure 4. Dilated convolution expands the size of the convolution filters without using larger filters or down-sampling. The MDM performs better at extracting and detecting cracks with multiple context sizes.
Figure 4. The overview of the multi-dilation module.
In a 2-D signal, the dilation convolution is defined as the following equation [66]:
$$ y[i] = \sum_{k=1}^{K} x[i + r \cdot k]\, w[k] \tag{1} $$
where $x[i]$ and $y[i]$ are the input and output signals at each location $i$, respectively, and $w[k]$ is the filter of length $K$. The dilation rate $r$ corresponds to the stride for sampling the input signal. It is necessary to insert $r - 1$ zeros between two consecutive filter values along each spatial dimension in the convolution operation. The standard convolution operation corresponds to $r = 1$.
Assuming a convolution filter size equal to $k$, the dilation convolution filter size is $k_d$ [66]:
$$ k_d = k + (k - 1) \times (r - 1) \tag{2} $$
As shown in Figure 5, different dilation rates are designed for the convolution filters. Although dilation convolution expands the feature context size, inserting the r − 1 zeros does not increase the amount of computation.
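Equations (1) and (2) can be checked numerically with a short sketch. It is an illustration under stated assumptions rather than the authors' code: positions outside the input are treated as zeros, and the filter length of 3 and the toy signal are hypothetical.

```python
def dilated_conv1d(x, w, r):
    """y[i] = sum_{k=1..K} x[i + r*k] * w[k], following Equation (1).

    Positions that fall outside `x` are treated as zeros (an assumption).
    """
    K = len(w)
    y = []
    for i in range(len(x)):
        total = 0.0
        for k in range(1, K + 1):          # k is 1-based, as in Equation (1)
            j = i + r * k
            if j < len(x):
                total += x[j] * w[k - 1]   # Python lists are 0-based
        y.append(total)
    return y

def dilated_filter_size(k, r):
    """Effective filter size k_d = k + (k - 1) * (r - 1), Equation (2)."""
    return k + (k - 1) * (r - 1)

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # hypothetical input
weights = [1.0, 1.0, 1.0]                      # hypothetical filter, K = 3
standard = dilated_conv1d(signal, weights, r=1)
dilated = dilated_conv1d(signal, weights, r=2)
sizes = [dilated_filter_size(3, r) for r in (1, 2, 4, 8, 16)]
```

With r = 2 the same three weights cover five input positions, so the context grows while the number of multiplications per output stays at K.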
Figure 5. Convolution filters with different dilation rates.
Because road images are complex and cracks vary in topology and width, standard convolution captures only one context size, which cannot effectively handle both thin, simple cracks and wide, complex cracks.
Therefore, a multi-dilation module (MDM) was proposed to address these problems. The module extracts crack features at different context sizes and fuses them into multiple-context features. First, four dilation rates are defined: 2, 4, 8, and 16. These four dilation convolution operations extract crack features with different context sizes. The five different crack features are then combined by concatenation. Next, a 1 × 1 convolution is used to change the number of features from 512 × 5 to 1024. This convolution completes the multi-dilation module, whose output features perform better for various crack types.
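The channel bookkeeping of the fusion step can be sketched as follows. This is a toy illustration, not the MDM itself: the helper names are hypothetical, the per-branch vectors and weights are made-up values, and a 1 × 1 convolution is modelled as a linear map over the channels of a single pixel.

```python
def concat_channels(branch_pixels):
    """Concatenate the channel vectors that each branch produces for one pixel."""
    fused = []
    for channels in branch_pixels:
        fused.extend(channels)
    return fused

def conv1x1(pixel, weight):
    """A 1x1 convolution is a linear map over channels, applied per pixel."""
    return [sum(w * v for w, v in zip(row, pixel)) for row in weight]

CHANNELS_PER_BRANCH = 512
NUM_BRANCHES = 5
concat_width = CHANNELS_PER_BRANCH * NUM_BRANCHES   # 512 x 5 channels after concatenation
OUT_CHANNELS = 1024                                 # channels after the 1x1 convolution

# Toy version: 2 channels per branch, 5 branches, 4 output channels.
toy_branches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
toy_pixel = concat_channels(toy_branches)            # length 10
toy_weight = [[0.1] * len(toy_pixel) for _ in range(4)]
toy_out = conv1x1(toy_pixel, toy_weight)             # length 4
```

The same pattern scales to the sizes given in the text, where 2560 concatenated channels are reduced to 1024.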
2.3. Hierarchical Feature (HF) and Loss Function
High-level feature maps carry more complex context information than low-level ones because they result from deeper convolution operations. Therefore, an HF learning network was adopted (side outputs 1, 2, 3, 4, 5, and fused), in which each output can perform crack detection individually. A real example is shown in Figure 6, which compares the ground truth for an input image with the fused feature maps at different scales.
Figure 6. A real example of crack detection based on U-HDN. It shows the comparison between the ground truth for the input image and the fused feature maps at different scales.
Each side output and the fused output are supervised by DSN [69] with holistically-nested edge detection (HED) [74], an edge detection method that is applied here to crack detection. A training database is defined as $S = \{(X_n, Y_n),\ n = 1, \ldots, N\}$, where $X_n$ and $Y_n$ are the raw input image and the ground truth crack map, respectively. For convenience, the subscript $n$ is dropped in the subsequent paragraphs. $W$ denotes the network parameters and $M$ the number of side networks.
Each side network is followed by a classifier, and the weights of the side networks are denoted as $w = (w^{(1)}, \ldots, w^{(M)})$. The following equation is the loss function for the side networks [74]:
$$ L_{side}(W, w) = \sum_{m=1}^{M} \alpha_m\, l_{side}^{(m)}(W, w^{(m)}) \tag{3} $$
where $l_{side}$ is the image-level loss function for each side network. The parameter $\alpha_m$ is a hyperparameter for the loss weight at each side-output layer. In this project, $M = 5$. During end-to-end training, a classifier divides the image pixels into crack and non-crack pixels. Therefore, crack detection can be treated as a binary classification problem. A sigmoid activation function is applied to distinguish non-crack from crack pixels. Furthermore, the sigmoid cross-entropy loss function was modified to address the problem of imbalanced samples. This weighted sigmoid loss function [65] is shown in Equation (4):
$$ l_{side} = \frac{1}{N} \sum_{i=1}^{N} \left\{ \beta\, y_i \log \hat{y}_i + \gamma\, (1 - y_i) \log(1 - \hat{y}_i) \right\} \tag{4} $$
where $\beta$ and $\gamma$ are hyperparameters, $N$ is the number of pixels in one image, and $y_i$ and $\hat{y}_i$ are the ground truth and the predicted output at the $i$-th pixel, respectively.
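The weighted loss of Equation (4) can be sketched in plain Python as below. Two points are assumptions and not taken from the paper: the sum is negated, as is conventional for cross entropy so that the loss is non-negative and can be minimized (Equation (4) as printed shows no leading minus sign), and the predictions are clamped by a small `eps` to avoid log(0). The values of beta and gamma are hypothetical.

```python
import math

def weighted_sigmoid_loss(y_true, y_pred, beta, gamma, eps=1e-7):
    """Weighted sigmoid cross entropy averaged over the N pixels of one image.

    The leading minus sign is an assumption (see the text above the code).
    """
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total += beta * y * math.log(p) + gamma * (1.0 - y) * math.log(1.0 - p)
    return -total / n

# Toy image: 1 crack pixel among 4; beta > gamma up-weights the rare crack class.
y_true = [1.0, 0.0, 0.0, 0.0]
y_pred = [0.5, 0.5, 0.5, 0.5]
balanced = weighted_sigmoid_loss(y_true, y_pred, beta=1.0, gamma=1.0)
crack_heavy = weighted_sigmoid_loss(y_true, y_pred, beta=3.0, gamma=1.0)
```

Raising beta increases the penalty for errors on crack pixels, which is one way to counter the imbalance between crack and non-crack pixels.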
Each side network generates a prediction feature map, which has a single output loss. All outputs of the side networks are fused by concatenation to generate the final prediction result, and the fused loss function is equal to $l_{side}$:
$$ l_{fuse} = l_{side} \tag{5} $$
Finally, the total loss function of the entire network is defined by the following equation:
$$ L_{total} = L_{side} + l_{fuse} \tag{6} $$
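How Equations (3) and (6) combine the individual losses can be sketched as follows. The per-output loss values and the equal weights alpha_m = 1 are hypothetical; the text does not state the weights that were used.

```python
def side_loss(per_side_losses, alphas):
    """L_side = sum over m of alpha_m * l_side^(m), as in Equation (3)."""
    if len(per_side_losses) != len(alphas):
        raise ValueError("one weight alpha_m is needed per side output")
    return sum(a * l for a, l in zip(alphas, per_side_losses))

def total_loss(per_side_losses, alphas, fuse_loss):
    """L_total = L_side + l_fuse, as in Equation (6)."""
    return side_loss(per_side_losses, alphas) + fuse_loss

# Hypothetical per-image losses for the M = 5 side outputs and the fused output.
losses = [0.9, 0.7, 0.5, 0.4, 0.3]
alphas = [1.0] * 5            # assumed equal weights; the text gives no values
fused = 0.25
total = total_loss(losses, alphas, fused)
```

Because every side output contributes its own term, each scale is supervised directly rather than only through the fused prediction.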
3. Experiments and Results
In this part, the implementation details of the proposed U-HDN are described. Then, the evaluation metrics and the compared methods are presented. Finally, the experimental results are analyzed.
3.1. Implementation Details
The proposed U-HDN is implemented with the PyTorch library [75] as the deep learning framework. Training and testing were run on a Google Colaboratory (free, with time limitation) GPU workstation equipped with a Tesla P100-PCIE-16GB with 16,280 MB of memory.
The public databases CFD [44] and AigleRN [76] were used to train and test the proposed network; neither database reports the visual conditions of image collection. The CFD database contains 118 color images (320 × 480 pixels), which were collected with an iPhone 5 smartphone in Beijing, China. In this project, 72 images were used to train the method and 46 images were used to test the proposed U-HDN. The AigleRN database includes 38 gray images (of two sizes: 991 × 462 pixels and 311 × 462 pixels), which were obtained from a sample of pavements located in France. Of these, 24 images were used to train and 14 images to test the U-HDN. In this paper, some procedures were performed to extract the crack pixels and to distinguish crack from non-crack pixels. The images of both public databases have a resolution equal to 600 ppi; this means that the images were acquired with each pixel corresponding to approximately 1 mm² of the real road pavement.
The images of these two databases were collected at vertical incidence [44]. This research does not aim to demonstrate the effect of the visual conditions; for this reason, this information is not considered important and is not reported.
At present, the proposed method cannot measure crack widths; the calculation of this important characteristic will be added in the next upgrade of the model. In this paper, the method extracts the crack pixels and distinguishes crack from non-crack pixels.
The training time is about 5 h and 20 min for CFD and about 3 h for AigleRN.
3.1.1. Parameters Setting
The hyperparameters are: batch size (4 images for CFD, 1 image for AigleRN), optimizer (Adam), learning rate (0.001), minimum learning rate (0.000001), learning rate scheduler (plateau), patience (10), and factor (0.95), set through two functions of the PyTorch library [75] (torch.optim.lr_scheduler.ReduceLROnPlateau and torch.optim.Adam). These are parameters intrinsic to the training of the neural network, such as the learning rate. When training on CFD, 4 images are fed to the neural network at a time; when training on AigleRN, 1 image is fed at a time. This setting allows crack detection to reach its best segmentation performance. The parameter settings are kept fixed for both databases during training.
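To make the role of the patience, factor, and minimum learning rate concrete, the following is a simplified, self-contained re-implementation of a reduce-on-plateau rule in plain Python. It is a sketch, not PyTorch's ReduceLROnPlateau, which offers further options (such as a threshold and a cooldown) that are not modelled here; the class name and the constant validation loss are hypothetical.

```python
class PlateauSchedule:
    """Simplified reduce-on-plateau rule (not PyTorch's implementation)."""

    def __init__(self, lr=0.001, factor=0.95, patience=10, min_lr=0.000001):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Update the learning rate from one epoch's validation loss."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            # Multiply by the factor, but never go below the minimum rate.
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr

schedule = PlateauSchedule()
# The loss stops improving after the first epoch.
history = [schedule.step(0.5) for _ in range(12)]
```

With a patience of 10, the rate stays at 0.001 for the first eleven epochs and is multiplied by 0.95 only once the loss has failed to improve for more than ten consecutive epochs.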
3.1.2. Evaluate Metrics
The models considered in this study were evaluated by three performance measures: the precision ($Pr$), the recall ($Re$), and the F1 score ($F1$). The precision and recall [77] are calculated by Equations (7) and (8) as below:
$$ Pr = \frac{TP}{TP + FP} \tag{7} $$
$$ Re = \frac{TP}{TP + FN} \tag{8} $$
where $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives, respectively. $F1$ is employed to evaluate the overall performance of the crack detection; it is the harmonic mean of precision and recall [77], calculated by Equation (9):
$$ F1 = \frac{2 \times Pr \times Re}{Pr + Re} \tag{9} $$
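As a worked example of Equations (7) to (9), the sketch below computes the three measures from pixel counts. The counts are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not one stated in the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (Pr, Re, F1) from Equations (7)-(9); empty denominators give 0.0."""
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
    return pr, re, f1

# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives.
pr, re, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Here precision is 0.9 and recall is 0.75, and the harmonic mean lies closer to the lower of the two values.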
Specifically, two different metrics based on $F1$ are adopted in the evaluation: the best $F1$ on the public database for a fixed threshold (ODS), and the aggregate $F1$ on the public database for the best threshold in each image (OIS) [78].
The definitions of the ODS and OIS are reported in Equations (10) and (11):
$$ ODS = \max\left\{ \frac{2 \times Pr_t \times Re_t}{Pr_t + Re_t} : t = 0.001, 0.002, \ldots, 0.999 \right\} \tag{10} $$
$$ OIS = \frac{1}{N_{img}} \sum_{i}^{N_{img}} \max\left\{ \frac{2 \times Pr_t^i \times Re_t^i}{Pr_t^i + Re_t^i} : t = 0.001, 0.002, \ldots, 0.999 \right\} \tag{11} $$
The values $t$, $i$, and $N_{img}$ are the threshold, the image index, and the number of images, respectively. $Pr_t$ and $Re_t$ are the precision and recall at threshold $t$, and $Pr_t^i$ and $Re_t^i$ are the precision and recall at threshold $t$ for the $i$-th image.
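The difference between ODS and OIS in Equations (10) and (11) can be illustrated with a small sketch. It assumes that the database-level precision and recall at a threshold are obtained by pooling the pixel counts of all images, which the text does not specify; the two toy images and all function names are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), algebraically equal to Equation (9)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def counts(probs, truth, t):
    """TP, FP, FN for one image whose probability map is binarised at t."""
    tp = sum(1 for p, y in zip(probs, truth) if p >= t and y == 1)
    fp = sum(1 for p, y in zip(probs, truth) if p >= t and y == 0)
    fn = sum(1 for p, y in zip(probs, truth) if p < t and y == 1)
    return tp, fp, fn

def ods_ois(images, thresholds):
    """images: list of (probs, truth) pairs of flat, equal-length lists."""
    # ODS: one threshold for the whole database (counts pooled over images).
    ods = 0.0
    for t in thresholds:
        pooled = [sum(c) for c in zip(*(counts(p, y, t) for p, y in images))]
        ods = max(ods, f1_from_counts(*pooled))
    # OIS: best threshold chosen separately per image, then averaged.
    best = [max(f1_from_counts(*counts(p, y, t)) for t in thresholds)
            for p, y in images]
    return ods, sum(best) / len(best)

thresholds = [k / 1000 for k in range(1, 1000)]      # 0.001 ... 0.999
images = [
    ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),     # best separated near t = 0.5
    ([0.95, 0.9, 0.85, 0.1], [1, 1, 0, 0]),   # best separated near t = 0.9
]
ods, ois = ods_ois(images, thresholds)
```

In this toy case each image can be segmented perfectly at its own threshold, but no single threshold suits both, so the OIS is higher than the ODS.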
For the proposed U-HDN, the transitional areas between non-crack and crack pixels were considered before computing $TP$, $FP$, and $FN$. Because manual ground truth labels are subjective, a transitional area (2 pixels distance) between crack and non-crack pixels is accepted in several papers [41,56,57,79,80]. Therefore, a distance of 2 pixels is also accepted in this project. The decision threshold is set to 0.5 to obtain a binary output.
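One possible way to count TP, FP, and FN with a 2-pixel tolerance is sketched below. The matching rule is an assumption, since the text does not define it: a predicted crack pixel counts as a true positive if a ground truth crack pixel lies within a square window of the given margin, and a ground truth crack pixel counts as a false negative only if no predicted crack pixel lies within the same window. All names and the toy maps are hypothetical.

```python
def binarize(prob_map, threshold=0.5):
    """Turn a 2-D probability map into a 0/1 crack map."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]

def near_crack(mask, r, c, margin):
    """True if `mask` has a crack pixel within `margin` pixels of (r, c)."""
    rows, cols = len(mask), len(mask[0])
    for rr in range(max(0, r - margin), min(rows, r + margin + 1)):
        for cc in range(max(0, c - margin), min(cols, c + margin + 1)):
            if mask[rr][cc] == 1:
                return True
    return False

def tolerant_counts(pred, truth, margin=2):
    """Count TP, FP, FN while accepting predictions within `margin` pixels."""
    tp = fp = fn = 0
    for r in range(len(pred)):
        for c in range(len(pred[0])):
            if pred[r][c] == 1:
                if near_crack(truth, r, c, margin):
                    tp += 1
                else:
                    fp += 1
            elif truth[r][c] == 1 and not near_crack(pred, r, c, margin):
                fn += 1
    return tp, fp, fn

# Ground truth crack in column 2; the prediction is shifted by one pixel
# and contains one isolated false alarm at the far right.
truth = [[0, 0, 1, 0, 0, 0, 0] for _ in range(3)]
probs = [[0.1, 0.1, 0.2, 0.9, 0.1, 0.1, 0.1],
         [0.1, 0.1, 0.2, 0.8, 0.1, 0.1, 0.7],
         [0.1, 0.1, 0.2, 0.9, 0.1, 0.1, 0.1]]
pred = binarize(probs)
strict = tolerant_counts(pred, truth, margin=0)
tolerant = tolerant_counts(pred, truth, margin=2)
```

Without tolerance the one-pixel shift turns every predicted pixel into an error, whereas the 2-pixel margin accepts the shifted crack and still rejects the isolated false alarm.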
3.2. Discussion for Multi-Dilation Module (MDM)
The dilation rate presented in Equation (1) plays an important role in varying the context size in the MDM of the U-HDN. A large dilation rate yields a large context size, as shown in Figures 2 and 3. Different dilation rates give different context sizes, which can produce different prediction results. To analyze the effect of the dilation rates, an experiment was performed to validate the hyperparameter settings in the MDM.
Three groups of dilation rates, {1, 2, 3, 4}, {1, 2, 4, 8}, and {2, 4, 8, 16}, were tested on the public databases CFD and AigleRN. As shown by the experimental results in Tables 2 and 3, the group {2, 4, 8, 16} obtains the highest accuracy on both databases. The reason is that a large dilation rate captures more context information for relatively wide or thin crack structures, which improves the crack detection accuracy.
Table 2. Experimental results for different dilation rates on the CFD database.

| Dilation Rates | Precision | Recall | F1 Score |
|---|---|---|---|
| {1, 2, 3, 4} | 0.943 | 0.933 | 0.935 |
| {1, 2, 4, 8} | 0.944 | 0.934 | 0.937 |
| {2, 4, 8, 16} | 0.945 | 0.936 | 0.939 |

Table 3. Experimental results for different dilation rates on the AigleRN database.

| Dilation Rates | Precision | Recall | F1 Score |
|---|---|---|---|
| {1, 2, 3, 4} | 0.914 | 0.921 | 0.915 |
| {1, 2, 4, 8} | 0.919 | 0.923 | 0.921 |
| {2, 4, 8, 16} | 0.921 | 0.931 | 0.924 |
3.3. Experimental Results on CFD
Some example detection results on CFD are shown in Figure 7 and Table 4. It is clear that Canny and local thresholding are sensitive to noise, which has a negative influence on crack detection.
Compared with the ground truth, the CrackForest algorithm over-measures the number of cracks and extracts wider cracks, with a high recall of 0.9514, as shown in Table 4. As shown in Figure 7, although structured prediction and U-net perform better for crack detection, these methods wrongly detect several non-crack pixels. Although the ensemble network (threshold = 0.6) achieves high precision, recall, and F1 score, it produces resource redundancy and also misses detections in the images, as shown in Figure 7. In addition, this method cannot be trained end-to-end. The values for the two images in Figure 7 are Pr: 0.978, Re: 0.973, F1: 0.975 (top image) and Pr: 0.977, Re: 0.966, F1: 0.971 (bottom image).
Figure 7. Comparison of the proposed U-HDN with other methods on the public database (from left to right: input image, ground truth, Canny, local threshold, CrackForest, structured prediction, U-net, ensemble network, and proposed U-HDN).
Table 4. Crack detection results on CFD.

| Methods | Tolerance Margin | Pr | Re | F1 |
|---|---|---|---|---|
| Canny [35] | 2 | 0.4377 | 0.7307 | 0.457 |
| Local thresholding [26] | 2 | 0.7727 | 0.8274 | 0.7418 |
| CrackForest [44] | 2 | 0.7466 | 0.9514 | 0.8318 |
| CrackForest [44] | 5 | 0.8228 | 0.8944 | 0.8517 |
| MFCD [81] | 5 | 0.899 | 0.8947 | 0.8804 |
| Method [79] | 2 | 0.907 | 0.846 | 0.87 |
| Structured prediction [56] | 2 | 0.9119 | 0.9481 | 0.9244 |
| Ensemble network (threshold = 0.6) [57] | 2 | 0.9552 | 0.9521 | 0.9533 |
| Ensemble network (threshold = 0.5) [57] | 2 | 0.9256 | 0.9611 | 0.934 |
| U-net [65] | 2 | 0.9325 | 0.932 | 0.928 |
| U-net + HF | 2 | 0.933 | 0.933 | 0.931 |
| U-net + MDM | 2 | 0.9302 | 0.931 | 0.93 |
| U-HDN | 2 | 0.945 | 0.936 | 0.939 |
The proposed U-HDN can be trained end-to-end and also obtains satisfactory accuracy compared with the other algorithms (Pr: 0.945, Re: 0.936, F1: 0.939). The main reason is that, unlike the other algorithms, U-HDN extracts and fuses feature maps of different context sizes (based on the MDM) and of different levels (high-level and low-level, based on HF). Table 5 shows that the proposed U-HDN achieves superior performance compared with the other algorithms in terms of ODS and OIS.
Table 5. The ODS and OIS of the comparison methods on CFD.

| Methods | ODS | OIS |
|---|---|---|
| HED [74] | 0.593 | 0.626 |
| RCF [64] | 0.542 | 0.607 |
| FCN [82] | 0.585 | 0.609 |
| CrackForest [44] | 0.104 | 0.104 |
| FPHBN [78] | 0.683 | 0.705 |
| U-net [65] | 0.901 | 0.897 |
| U-HDN | 0.935 | 0.928 |
3.4. Experimental Results on AigleRN
Some example detection results on the AigleRN database, which includes 38 images, are shown in Figure 8 and Table 6. As shown in Figure 8, the two traditional methods (Canny and local thresholding) cannot extract the crack skeleton or detect continuous cracks, and are susceptible to noise. FFA and MPS are able to detect local and small cracks but also fail to extract the crack skeleton and find continuous cracks. Although the structured prediction method can extract a rough skeleton and detect cracks, it also misses detections in the images. The ensemble network obtains a better crack skeleton than structured prediction, but it cannot find the more continuous cracks. The values for the two images in Figure 8 are Pr: 0.915, Re: 0.961, F1: 0.937 (top image) and Pr: 0.924, Re: 0.981, F1: 0.952 (bottom image).
Meanwhile, FFA detects thicker cracks than the proposed method and cannot extract the crack skeleton, which causes its low precision, as shown in Table 6. The proposed method is able to extract the crack skeleton. Secondly, the method produces many more false positives than false negatives, which leads to a recall higher than the precision. Then, the 2-pixel distance also helps to improve the precision. Finally, averaging the values over the test database can improve the global precision.
The proposed U-HDN method achieves superior performance compared with the other algorithms, as shown in Figure 8 and Table 6 (Pr: 0.921, Re: 0.931, F1: 0.924). The main reason is that, unlike the other algorithms, U-HDN extracts and fuses feature maps of different context sizes (based on the MDM) and of different levels (high-level and low-level, based on HF). Hence, U-HDN achieves high accuracy.
Figure 8. Comparison of the proposed U-HDN with other methods on the public database (from left to right: input image, ground truth, Canny, local threshold, FFA, MPS, structured prediction, ensemble network, and proposed U-HDN).
Table 6. Crack detection results on AigleRN.

| Methods | Tolerance Margin | Pr | Re | F1 |
|---|---|---|---|---|
| Canny [35] | 2 | 0.1989 | 0.6753 | 0.2881 |
| Local thresholding [26] | 2 | 0.5329 | 0.9345 | 0.667 |
| FFA [43] 12 | 2 | 0.7688 | 0.6812 | 0.6817 |
| MPS [42] | 2 | 0.8263 | 0.841 | 0.8195 |
| CrackForest [44] | 2 | 0.8424 | 0.801 | 0.8233 |
| CrackForest [44] | 5 | 0.9028 | 0.8658 | 0.8839 |
| Structured prediction [40] | 2 | 0.9178 | 0.8812 | 0.8954 |
| Method [67] | 2 | 0.869 | 0.9304 | 0.8986 |
| Ensemble network (threshold = 0.6) [57] | 2 | 0.9302 | 0.9266 | 0.9238 |
| Ensemble network (threshold = 0.5) [57] | 2 | 0.9334 | 0.8879 | 0.9211 |
| U-net [65] | 2 | 0.9127 | 0.9076 | 0.91 |
| U-net + HF | 2 | 0.911 | 0.922 | 0.913 |
| U-net + MDM | 2 | 0.9138 | 0.9245 | 0.914 |
| U-HDN | 2 | 0.921 | 0.931 | 0.924 |
3.5. AigleRN Dataset Generalization
As reported above, the AigleRN database includes 38 images (two resolutions: 991 × 462 and 311 × 462). The ESAR database (resolution 768 × 512), which was collected by a static acquisition system, contains 15 images. The LCMS database includes 5 images. Because each of these databases has a small number of images, they are combined into a new database, named AEL, with a total of 38 + 15 + 5 = 58 images. Table 7 shows that the proposed U-HDN achieves high performance compared with the other algorithms in terms of ODS and OIS.
Table 7. The ODS and OIS of the comparison methods on AEL.

| Methods | ODS | OIS |
|---|---|---|
| HED [74] | 0.042 | 0.626 |
| RCF [64] | 0.462 | 0.607 |
| FCN [82] | 0.322 | 0.609 |
| CrackForest [44] | 0.231 | 0.104 |
| FPHBN [78] | 0.492 | 0.705 |
| U-net [65] | 0.752 | 0.897 |
| U-HDN | 0.783 | 0.928 |
| U-HDN (only using AigleRN) | 0.927 | 0.912 |
4. Conclusions
The analysis and survey of pavement cracks play an important role in road and airport pavement management systems. In this project, the proposed U-HDN method achieves high precision and accuracy for pavement crack detection. An MDM and an HF module based on U-net are developed in this paper. The MDM extracts feature maps of different context sizes by using different dilation rates. The HF module obtains multi-scale (high-level and low-level) feature maps, which are integrated to predict pixel-wise crack detection at the side outputs. By combining the MDM and HF in the U-net, U-HDN achieves satisfactory performance.
Although the proposed U-HDN performs satisfactorily compared with other methods, the neural network has a complicated structure that contains redundant feature maps, which causes high computational cost and low efficiency. These issues will be addressed in future work.
To remove the redundant feature maps, channel pruning and automatic neural network design will be explored to improve computational efficiency and accuracy.
Some methods study crack detection in static images. However, detection in video streams is also important for road cracks. Therefore, we will study this direction in future work.
We plan to propose a new method to address cement concrete crack detection, evaluate the global surface waterproofing, and repair water-leakage cracks.
Because F1 is sensitive to the pixel margin, it is not appropriate to compare the performance of segmentation algorithms whose authors do not give all the details of the metric. Therefore, we will try to contact some authors to obtain and analyze their source code, and then explore and construct an integrated crack detection system.
Author Contributions: Conceptualization, Z.F.; methodology, C.L.; software, Y.C., X.C., and J.W.; validation,
X.C.; formal analysis, C.L.; investigation, Y.C. and J.W.; resources, C.L.; data curation, C.L.; writing-original
draft preparation, C.L.; writing-review and editing, Z.F., P.D.M. and G.L.; visualization, C.L.; supervision, Z.F.,
P.D.M. and G.L.; project administration, Z.F., G.L. All authors have read and agreed to the published version of
the manuscript.
Funding: This work was supported by the Science and Technology Planning Project of Guangdong Province
of China under grant 180917144960530, by the Project of Educational Commission of Guangdong Province of
China under grant 2017KZDXM032, by the State Key Lab of Digital Manufacturing Equipment and Technology
under grant DMETKF2019020, by the Project of Robot Automatic Design Platform combining Multi-Objective
Evolutionary Computation and Deep Neural Network under grant 2019A050519008, and by the China Scholarship
Council (CSC) in 2019.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Di Mascio, P.; Moretti, L. Implementation of a pavement management system for maintenance and rehabilitation of airport surfaces. Case Stud. Constr. Mater. 2019, 11, e00251.
2. Bonin, G.; Polizzotti, S.; Loprencipe, G.; Folino, N.; Oliviero Rossi, C.; Teltayev, B.B. Development of a road asset management system in Kazakhstan. In Transport Infrastructure and Systems: Proceedings of the AIIT International Congress on Transport Infrastructure and Systems, TIS 2017; CRC Press/Balkema: Leiden, The Netherlands, 2017; pp. 537–545. ISBN 9781138030091.
3. Shahin, M.Y. Pavement Management for Airports, Roads, and Parking Lots, 2nd ed.; Springer: New York, NY, USA, 2005; ISBN 0387234640.
4. Systems, P.; Management, P. Standard Practice for Roads and Parking Lots Pavement Condition Index Surveys. ASTM Int. 2011, D6433, 49.
5. Sayeed Ahmed, G.M.; Algahtani, A.; Mahmoud, E.R.I.; Badruddin, I.A. Experimental Evaluation of Interfacial Surface Cracks in Friction Welded Dissimilar Metals through Image Segmentation Technique (IST). Materials (Basel) 2018, 11, 2460.
6. Zou, Q.; Zhang, Z.; Li, Q.; Qi, X.; Wang, Q.; Wang, S. Deepcrack: Learning hierarchical convolutional features for crack detection. IEEE Trans. Image Process. 2018, 28, 1498–1512.
7. Vien, B.S.; Rose, L.R.F.; Chiu, W.K. Experimental and computational studies on the scattering of an edge-guided wave by a hidden crack on a racecourse shaped hole. Materials (Basel) 2017, 10, 732.
8. Sun, W.; Yao, B.; He, Y.; Chen, B.; Zeng, N.; He, W. Health state monitoring of bladed machinery with crack growth detection in BFG power plant using an active frequency shift spectral correction method. Materials (Basel) 2017, 10, 925.
9. Pantuso, A.; Loprencipe, G.; Bonin, G.; Teltayev, B.B. Analysis of pavement condition survey data for effective implementation of a network level pavement management program for Kazakhstan. Sustainability 2019, 11, 901.
10. Loprencipe, G.; Pantuso, A. A Specified Procedure for Distress Identification and Assessment for Urban Road Surfaces Based on PCI. Coatings 2017, 7, 65.
11. Di Mascio, P.; Loprencipe, G.; Moretti, L. Technical and Economic Criteria to Select Pavement Surfaces of Port Handling Plants. Coatings 2019, 9, 126.
12. Farrar, C.R.; Doebling, S.W. Structural health monitoring at Los Alamos National Laboratory. In Proceedings of the IEE Colloquium on Condition Monitoring: Machinery, External Structures and Health (Ref. No. 1999/034), Birmingham, UK, 22–23 April 1999; pp. 2/1–2/4.
13. Sazonov, E.; Janoyan, K.; Jha, R. Wireless intelligent sensor network for autonomous structural health monitoring. In Proceedings of the Smart Structures and Materials 2004: Smart Sensor Technology and Measurement Systems, San Diego, CA, USA, 15–17 March 2004; Volume 5384, pp. 305–314.
14. Sheng, W.; Chen, H.; Xi, N. Navigating a miniature crawler robot for engineered structure inspection. IEEE Trans. Autom. Sci. Eng. 2008, 5, 368–373.
15. Loprencipe, G.; Zoccali, P. Ride Quality Due to Road Surface Irregularities: Comparison of Different Methods Applied on a Set of Real Road Profiles. Coatings 2017, 7, 59.
16. Loprencipe, G.; Cantisani, G. Evaluation methods for improving surface geometry of concrete floors: A case study. Case Stud. Struct. Eng. 2015, 4, 14–25.
17. Moretti, L.; Di Mascio, P.; Loprencipe, G.; Zoccali, P. Theoretical analysis of stone pavers in pedestrian areas. Transp. Res. Procedia 2020, 45, 169–176.
18. Yu, S.-N.; Jang, J.-H.; Han, C.-S. Auto inspection system using a mobile robot for detecting concrete cracks in a tunnel. Autom. Constr. 2007, 16, 255–261.
19. Oh, J.K.; Jang, G.; Oh, S.; Lee, J.H.; Yi, B.J.; Moon, Y.S.; Lee, J.S.; Choi, Y. Bridge inspection robot system with machine vision. Autom. Constr. 2009, 18, 929–941.
20. Lim, R.S.; La, H.M.; Sheng, W. A robotic crack inspection and mapping system for bridge deck maintenance. IEEE Trans. Autom. Sci. Eng. 2014, 11, 367–378.
21. Li, Q.; Zhang, D.; Zou, Q.; Lin, H. 3D Laser imaging and sparse points grouping for pavement crack detection. In Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, 28 August–2 September 2017; pp. 2036–2040.
22. Zou, Q.; Li, Q.; Zhang, F.; Xiong Qian Wang, Z.; Wang, Q. Path voting based pavement crack detection from laser range images. Int. Conf. Digit. Signal Process. DSP 2016, 0, 432–436.
23. Fernandes, D.; Correia, P.L.; Oliveira, H. Road surface crack detection using a light field camera. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018;
pp. 2135–2139.
24.
Zhou, J.; Huang, P.S.; Chiang, F.-P. Wavelet-based pavement distress detection and evaluation. Opt. Eng.
2006,45, 27007. [CrossRef]
25.
Subirats, P.; Dumoulin, J.; Legeay, V.; Barba, D. Automation of pavement surface crack detection using the
continuous wavelet transform. In Proceedings of the International Conference on Image Processing (ICIP),
Atlanta, GA, USA, 8–11 October 2006; pp. 3037–3040.
26.
Oliveira, H.; Correia, P.L. Automatic road crack segmentation using entropy and image dynamic thresholding.
In Proceedings of the European Signal Processing Conference, Glasgow, UK, 24–28 August 2009; pp. 622–626.
27.
Tang, J.; Gu, Y. Automatic crack detection and segmetnation using a hybrid algorithm for road distress
analysis. In Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics (SMC),
Manchester, UK, 13–16 October 2013; pp. 3026–3030.
Materials 2020,13, 2960 16 of 18
28.
Li, Q.; Liu, X. Novel approach to pavement image segmentation based on neighboring dierence histogram
method. In Proceedings of the 2008 Congress on Image and Signal Processing, Sanya, China, 27–30 May
2008; Volume 2, pp. 792–796.
29.
Oliveira, H.; Correia, P.L. CrackIT—An image processing toolbox for crack detection and characterization.
In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30
October 2014; pp. 798–802.
30.
Oliveira, H.; Correia, P.L. Automatic road crack detection and characterization. IEEE Trans. Intell. Transp. Syst.
2012,14, 155–168. [CrossRef]
31.
Oliveira, H.; Correia, P.L. Road surface crack Detection: Improved segmentation with pixel-based refinement.
In Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, 28 August–2
September 2017; pp. 2026–2030.
32.
Kapela, R.; ´
Sniatała, P.; Turkot, A.; Rybarczyk, A.; Po˙zarycki, A.; Rydzewski, P.; Wyczałek, M.; Błoch, A.
Asphalt surfaced pavement cracks detection based on histograms of oriented gradients. In Proceedings of
the 2015 22nd International Conference Mixed Design of Integrated Circuits & Systems (MIXDES), Torun,
Poland, 25–27 June 2015; pp. 579–584.
33. Hu, Y.; Zhao, C. A novel LBP based methods for pavement crack detection. J. Pattern Recognit. Res. 2010,5,
140–147. [CrossRef]
34.
Quintana, M.; Torres, J.; Men
é
ndez, J.M. A simplified computer vision system for road surface inspection
and maintenance. IEEE Trans. Intell. Transp. Syst. 2015,17, 608–619. [CrossRef]
35.
Zhao, H.; Qin, G.; Wang, X. Improvement of canny algorithm based on pavement edge detection.
In Proceedings of the 2010 3rd International Congress on Image and Signal Processing (CISP), Yantai,
China, 16–18 October 2010; Volume 2, pp. 964–967.
36.
Attoh-Okine, N.; Ayenu-Prah, A. Evaluating pavement cracks with bidimensional empirical mode
decomposition. EURASIP J. Adv. Signal Process. 2008,2008, 1–7.
37.
Maode, Y.; Shaobo, B.; Kun, X.; Yuyao, H. Pavement crack detection and analysis for high-grade highway.
In Proceedings of the 2007 8th International Conference on Electronic Measurement and Instruments, Xi’an,
China, 16–18 August 2007; pp. 4–548.
38.
Kaul, V.; Yezzi, A.; Tsai, Y. Detecting curves with unknown endpoints and arbitrary topology using minimal
paths. IEEE Trans. Pattern Anal. Mach. Intell. 2011,34, 1952–1965. [CrossRef] [PubMed]
39.
Baltazart, V.; Nicolle, P.; Yang, L. Ongoing Tests and Improvements of the MPS algorithm for the automatic
crack detection within grey level pavement images. In Proceedings of the 2017 25th European Signal
Processing Conference (EUSIPCO), Kos, Greece, 28 August–2 September 2017; pp. 2016–2020.
40.
Nguyen, T.S.; Begot, S.; Duculty, F.; Avila, M. Free-form anisotropy: A new method for crack detection
on pavement surface images. In Proceedings of the International Conference on Image Processing (ICIP),
Brussels, Belgium, 11–14 September 2011; pp. 1069–1072.
41.
Amhaz, R.; Chambon, S.; Idier, J.; Baltazart, V. Automatic Crack Detection on Two-Dimensional Pavement Images:
An Algorithm Based on Minimal Path Selection. IEEE Trans. Intell. Transp. Syst. 2016,17, 2718–2729. [CrossRef]
42.
Kass, M.; Witkin, A.; Terzopoulos, D. Snakes: Active contour models. Int. J. Comput. Vis.
1988
,1, 321–331. [CrossRef]
43.
Wang, C.; Wang, X.; Zhou, X.; Li, Z. The Aircraft Skin Crack Inspection Based on Dierent-Source Sensors
and Support Vector Machines. J. Nondestruct. Eval. 2016,35, 46. [CrossRef]
44.
Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic road crack detection using random structured forests.
IEEE Trans. Intell. Transp. Syst. 2016,17, 3434–3445. [CrossRef]
45.
Hassoun, M.H.Fundamentals of ArtificialNeural Networks; MIT Press: Cambridge, MA, USA, 1995; ISBN 9780262082396.
46.
Adeli, H. Neural networks in civil engineering: 1989–2000. Comput. Civ. Infrastruct. Eng.
2001
,16, 126–142.
[CrossRef]
47.
Adeli, H.; Hung, S.L. Machine Learning-Neural Networks, Genetic Algorithms and Fuzzy Systems. Kybernetes
1999,28, 317–318. [CrossRef]
48.
Adeli, H.; Karim, A. Neural network model for optimization of cold-formed steel beams. J. Struct. Eng.
1997
,
123, 1535–1543. [CrossRef]
49.
Adeli, H.; Samant, A. An adaptive conjugate gradient neural network–wavelet model for trac incident
detection. Comput. Civ. Infrastruct. Eng. 2000,15, 251–260. [CrossRef]
50.
Adeli, H.; Yeh, C. Perceptron learning in engineering design. Comput. Civ. Infrastruct. Eng.
1989
,4, 247–256.
[CrossRef]
Materials 2020,13, 2960 17 of 18
51.
Gao, Y.; Mosalam, K.M. Deep transfer learning for image-based structural damage recognition. Comput. Civ.
Infrastruct. Eng. 2018,33, 748–768. [CrossRef]
52.
Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network.
In Proceedings of the 2016 IEEE international conference on image processing (ICIP), Phoenix, AZ, USA,
25–28 September 2016; pp. 3708–3712.
53.
Zhang, A.; Wang, K.C.P.; Li, B.; Yang, E.; Dai, X.; Peng, Y.; Fei, Y.; Liu, Y.; Li, J.Q.; Chen, C. Automated
Pixel-Level Pavement Crack Detection on 3D Asphalt Surfaces Using a Deep-Learning Network. Comput. Civ.
Infrastruct. Eng. 2017,32, 805–819. [CrossRef]
54.
Zhang, A.; Wang, K.C.P.; Fei, Y.; Liu, Y.; Chen, C.; Yang, G.; Li, J.Q.; Yang, E.; Qiu, S. Automated Pixel-Level
Pavement Crack Detection on 3D Asphalt Surfaces with a Recurrent Neural Network. Comput. Civ. Infrastruct. Eng.
2019,34, 213–229. [CrossRef]
55.
Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional
Neural Networks. Comput. Civ. Infrastruct. Eng. 2017,32, 361–378. [CrossRef]
56.
Fan, Z.; Wu, Y.; Lu, J.; Li, W. Automatic Pavement Crack Detection Based on Structured Prediction with the
Convolutional Neural Network. arXiv 2018, arXiv:1802.02208.
57.
Fan, Z.; Li, C.; Chen, Y.; Mascio, P.D.; Chen, X.; Zhu, G.; Loprencipe, G. Ensemble of Deep Convolutional Neural
Networks for Automatic Pavement Crack Detection and Measurement. Coatings 2020,10, 152. [CrossRef]
58.
Maeda, H.; Sekimoto, Y.; Seto, T.; Kashiyama, T.; Omata, H. Road damage detection and classification using
deep neural networks with smartphone images. Comput. Civ. Infrastruct. Eng. 2018,33, 1127–1141. [CrossRef]
59.
Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous Structural Visual Inspection
Using Region-Based Deep Learning for Detecting Multiple Damage Types. Comput. Civ. Infrastruct. Eng.
2018,33, 731–747. [CrossRef]
60.
Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic Pixel-Level Crack Detection and Measurement
Using Fully Convolutional Network. Comput. Civ. Infrastruct. Eng. 2018,33, 1090–1109. [CrossRef]
61.
Li, Y.; Han, Z.; Xu, H.; Liu, L.; Li, X.; Zhang, K. YOLOv3-lite: A lightweight crack detection network for
aircraft structure based on depthwise separable convolutions. Appl. Sci. 2019,9, 3781. [CrossRef]
62.
Jenkins, M.D.; Carr, T.A.; Iglesias, M.I.; Buggy, T.; Morison, G. A deep convolutional neural network for
semantic pixel-wise segmentation of road and pavement surface cracks. In Proceedings of the 2018 26th
European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018; pp. 2120–2124.
63.
Tsuchiya, H.; Fukui, S.; Iwahori, Y.; Hayashi, Y.; Achariyaviriya, W.; Kijsirikul, B. A method of data augmentation
for classifying road damage considering influence on classification accuracy. Procedia Comput. Sci.
2019
,159,
1449–1458. [CrossRef]
64.
Liu, Y.; Cheng, M.-M.; Hu, X.; Wang, K.; Bai, X. Richer convolutional features for edge detection. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017;
pp. 3000–3009.
65.
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation.
In Proceedings of the Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics), Munich, Germany, 5–9 October 2015; Springer: Cham,
Switzerland, 2015; Volume 9351, pp. 234–241.
66.
Chen, L.-C.; Papandreou, G.; Schro, F.; Adam, H. Rethinking atrous convolution for semantic image
segmentation. arXiv 2017, arXiv:1706.05587.
67.
Holschneider, M.; Kronland-Martinet, R.; Morlet, J.; Tchamitchian, P. A real-time algorithm for signal analysis
with the help of the wavelet transform. In Wavelets; Springer: Berlin/Heidelberg, Germany, 1990; pp. 286–297.
68.
Yu, F.; Koltun, V.; Funkhouser, T. Dilated residual networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 472–480.
69.
Lee, C.-Y.; Xie, S.; Gallagher, P.; Zhang, Z.; Tu, Z. Deeply-supervised nets. In Proceedings of the Artificial
Intelligence and Statistics, San Diego, CA, USA, 9–12 May 2015; pp. 562–570.
70.
Nair, V.; Hinton, G.E. Rectified linear units improve Restricted Boltzmann machines. In Proceedings of the
27th International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010; pp. 807–814.
71.
Britz, D. Understanding Convolutional Neural Networks for NLP—WildML. Available online:
http://www.wildml.com/2015/11/understanding-convolutional-neural-networks- for-nlp/%0Ahttps:
//www.kdnuggets.com/2015/11/understanding-convolutional-neural-networks-nlp.html/3(accessed on 16
June 2020).
Materials 2020,13, 2960 18 of 18
72.
Nam, J.; Kim, J.; Loza Menc
í
a, E.; Gurevych, I.; Fürnkranz, J. Large-scale multi-label text
classification—Revisiting neural networks. In Proceedings of the Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Nacy,
France, 15–19 September 2014; Springer: Cham, Switzerland, 2014; Volume 8725, pp. 437–452.
73. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015,521, 436–444. [CrossRef] [PubMed]
74.
Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on
Computer Vision, Las Condes, Chile, 11–18 December 2015; pp. 1395–1403.
75.
Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A.
Automatic differentiation in pytorch. In Proceedings of the NIPS-W, Long Beach, CA, USA, 4–9 December 2017.
76.
Amhaz, R.; Chambon, S.; Idier, J.; Baltazart, V. Automatic Crack Detection on 2D Pavement Images:
An Algorithm Based on Minimal Path Selection. Available online: https://www.irit.fr/~{}Sylvie.Chambon/
Crack_Detection_Database.html (accessed on 23 June 2020).
77.
Powers, D.M.W. Ailab Evaluation: From precision, recall and F-measure to ROC, informedness, markedness
and correlation. Inf. Markedness Correl. 2011,2, 37–63.
78.
Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting
network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2019,21, 1525–1535. [CrossRef]
79.
Ai, D.; Jiang, G.; Siew Kei, L.; Li, C. Automatic Pixel-Level Pavement Crack Detection Using Information of
Multi-Scale Neighborhoods. IEEE Access 2018,6, 24452–24463. [CrossRef]
80.
König, J.; Jenkins, M.D.; Barrie, P.; Mannion, M.; Morison, G. A convolutional neural network for pavement
surface crack segmentation using residual connections and attention gating. In Proceedings of the 2019 IEEE
International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1460–1464.
81.
Li, H.; Song, D.; Liu, Y.; Li, B. Automatic pavement crack detection by multi-scale image fusion. IEEE Trans.
Intell. Transp. Syst. 2018,20, 2025–2036. [CrossRef]
82.
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015;
pp. 3431–3440.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Article
Full-text available
This paper compared two approaches used to analyze a modular pedestrian pavement made of hexagonal basalt pavers. In presence of occasional heavy traffic roads, the pavement should be verified using methods currently used for road pavements. Different loading conditions were examined varying the geometry of the blocks, and the magnitude of the vertical load. In all cases, the results obtained from the analytical theory of Westergaard were higher than those obtained from a finite element model (FEM). Therefore, a parametric study was performed in order to use the analytical method as an alternative to the costly FEM approach. The results of comparison gave a correction factor, valid for hexagonal pavers: it permits to analytically estimate with good approximation the stresses induced by heavy loads applied to natural stone blocks.
Article
Full-text available
Automated pavement crack detection and measurement are important road issues. Agencies have to guarantee the improvement of road safety. Conventional crack detection and measurement algorithms can be extremely time-consuming and low efficiency. Therefore, recently, innovative algorithms have received increased attention from researchers. In this paper, we propose an ensemble of convolutional neural networks (without a pooling layer) based on probability fusion for automated pavement crack detection and measurement. Specifically, an ensemble of convolutional neural networks was employed to identify the structure of small cracks with raw images. Secondly, outputs of the individual convolutional neural network model for the ensemble were averaged to produce the final crack probability value of each pixel, which can obtain a predicted probability map. Finally, the predicted morphological features of the cracks were measured by using the skeleton extraction algorithm. To validate the proposed method, some experiments were performed on two public crack databases (CFD and AigleRN) and the results of the different state-of-the-art methods were compared. To evaluate the efficiency of crack detection methods, three parameters were considered: precision (Pr), recall (Re) and F1 score (F1). For the two public databases of pavement images, the proposed method obtained the highest values of the three evaluation parameters: for the CFD database, Pr = 0.9552, Re = 0.9521 and F1 = 0.9533 (which reach values up to 0.5175 higher than the values obtained on the same database with the other methods), for the AigleRN database, Pr = 0.9302, Re = 0.9166 and F1 = 0.9238 (which reach values up to 0.7313 higher than the values obtained on the same database with the other methods). The experimental results show that the proposed method outperforms the other methods. 
For crack measurement, the crack length and width can be measure based on different crack types (complex, common, thin, and intersecting cracks.). The results show that the proposed algorithm can be effectively applied for crack measurement.
Article
Full-text available
This paper proposes a method for augmenting learning data of road damage dataset considering the influence of the augmented data on classification accuracy. Data augmentation is a very important task in the field of machine learning because more learning data causes increasing the accuracy of classification accuracy in general. The quality of the augmented data influences the accuracy of the classification. Effective data augmentation method for increasing classification accuracy is needed. The proposed method generates learning data by selecting effective data augmentation methods depending on the class of road damage. The method uses You Only Look Once v3 (YOLOv3) for detection and classification of road damage in an image. It is tuned by data adding the data augmented by the proposed method to the road damage dataset presented to the public. The experimental results show that the proposed method can increase the accuracy efficiently and effectively. The proposed selection of data augmentation methods improves remarkably mean Average Precision (mAP) which is one of the accuracy indices.
Article
Full-text available
Due to the high proportion of aircraft faults caused by cracks in aircraft structures, crack inspection in aircraft structures has long played an important role in the aviation industry. The existing approaches, however, are time-consuming or have poor accuracy, given the complex background of aircraft structure images. In order to solve these problems, we propose the YOLOv3-Lite method, which combines depthwise separable convolution, feature pyramids, and YOLOv3. Depthwise separable convolution is employed to design the backbone network for reducing parameters and for extracting crack features effectively. Then, the feature pyramid joins together low-resolution, semantically strong features at a high-resolution for obtaining rich semantics. Finally, YOLOv3 is used for the bounding box regression. YOLOv3-Lite is a fast and accurate crack detection method, which can be used on aircraft structure such as fuselage or engine blades. The result shows that, with almost no loss of detection accuracy, the speed of YOLOv3-Lite is 50% more than that of YOLOv3. It can be concluded that YOLOv3-Lite can reach state-of-the-art performance.
Article
Full-text available
Pavement crack detection is a critical task for insuring road safety. Manual crack detection is extremely time-consuming. Therefore, an automatic road crack detection method is required to boost this progress. However, it remains a challenging task due to the intensity inhomogeneity of cracks and complexity of the background, e.g., the low contrast with surrounding pavements and possible shadows with a similar intensity. Inspired by recent advances of deep learning in computer vision, we propose a novel network architecture, named feature pyramid and hierarchical boosting network (FPHBN), for pavement crack detection. The proposed network integrates context information to low-level features for crack detection in a feature pyramid way, and it balances the contributions of both easy and hard samples to loss by nested sample reweighting in a hierarchical way during training. In addition, we propose a novel measurement for crack detection named average intersection over union (AIU). To demonstrate the superiority and generalizability of the proposed method, we evaluate it on five crack datasets and compare it with the state-of-the-art crack detection, edge detection, and semantic segmentation methods. The extensive experiments show that the proposed method outperforms these methods in terms of accuracy and generalizability. Code and data can be found in https://github.com/fyangneil/pavement-crack-detection.
Article
Full-text available
A port is an intermodal system in which many logistics activities requiring properly constructed areas occur. The large extension of these areas poses a major problem in choosing materials with technical and economic implications. Choice and design of pavements are directly related to the port handling systems and procedures for the disposal of the cargo units. The paper presents the design and verification procedures for three equivalent pavements for a handling pavement in an Italian medium-sized port trafficked by reach stackers moving containers. An asphalt pavement, a concrete pavement, and a concrete block pavement have been considered during the 20-year service life. Empirical and analytical methods have been adopted to design and verify the pavements. The structures have been examined in terms of economic concerns during the overall service life, considering both construction and maintenance costs, in order to determine the most cost-effective option. The results demonstrate the inappropriateness of asphalt pavement, in the examined case, from a construction costs point of view. Furthermore, the overall discounted costs show an inversion of convenience between block concrete pavement and cast in situ concrete: the latter is the cheaper solution. The proposed methodology can balance often conflicting objectives in matters of durability and funds management, providing answers to a complex topic.
Article
Pavement roads and transportation systems are crucial assets for promoting political stability, as well as economic and sustainable growth in developing countries. However, pavement maintenance backlogs and the high capital costs of road rehabilitation require the use of pavement evaluation tools to assure the best value of the investment. This research presents a methodology for analyzing the collected pavement data for the implementation of a network level pavement management program in Kazakhstan. This methodology, which could also be suitable in other developing countries’ road networks, focuses on the survey data processing to determine cost-effective maintenance treatments for each road section. The proposed methodology aims to support a decision-making process for the application of a strategic level business planning analysis, by extracting information from the survey data.
Article
Airport pavements should ensure safe and regular aircraft operations; thus, it is necessary to monitor these surfaces and implement expensive maintenance and rehabilitation works. The Airport Pavement Management System (APMS) is an approach to monitor the pavement condition, determine the priorities for intervention, and plan and allocate resources through procedures. The method for monitoring pavement conditions is currently adopted by the airport management company because it is necessary for airport operability. The study focuses on the paved network of an Italian airport that is composed of a runway, a parallel taxiway, and five exit taxiways. Measures of load bearing capacity, transversal and longitudinal evenness, pavement-tire adherence, and pavement distresses were collected and merged to identify the needed maintenance and rehabilitation works. The results revealed the presence of critical sections with several structural and functional distresses. The needed structural and functional works involved the greater parts of the runway and the parallel taxiway, and two exit taxiways. Given the high operational impact of the needed works, they were scheduled to be conducted in three phases in order to minimize the impact on traffic, reducing the closure period to 15 consecutive days. In general, the results summarized approaches typical of different conditions: the article has highlighted that the Pavement Management System (PMS) requires multiple analyses to consider various indices and correctly manage existing pavements, and that different competences are needed to conduct comprehensive and appropriate analyses.
Keywords: Airport pavement, Pavement management system, ACN, PCN, IRI, PCI, Rutting