International Journal on Recent and Innovation Trends in Computing and Communication
ISSN: 2321-8169 Volume: 11 Issue: 4s
DOI: https://doi.org/10.17762/ijritcc.v11i4s.6310
Article Received: 24 December 2022 Revised: 26 January 2023 Accepted: 02 February 2023
IJRITCC | March 2023, Available @ http://www.ijritcc.org
Automatic Detection of Road Cracks using
EfficientNet with Residual U-Net-based
Segmentation and YOLOv5-based Detection
Satheesh Kumar Gooda1*, Narender Chinthamu2, Dr. S. Tamil Selvan3, Dr. V. Rajakumareswaran4, Gokila Brindha
Paramasivam5
1Senior Manager, WESCO International, USA, Student of Osmania University, India.
e-mail: skgooda@gmail.com
2Enterprise Architect, MIT CTO Candidate
e-mail: Narender.chinthamu@gmail.com
3Assistant Professor, Department of Computer Science and Engineering,
Erode Sengunthar Engineering College,
Thudupathi, Perundurai - 638 057, India.
e-mail: Stamilselvancse@gmail.com
4Assistant Professor, Department Of Computer Science and Engineering,
Erode Sengunthar Engineering College, Thuduppathi, Tamil Nadu, India,
e-mail: mailtoraja@gmail.com
5Assistant Professor Sr.G, Department of Computer Technology UG,
Kongu Engineering College,
Perundurai, Tamil Nadu-638060, India.
Email: brindha.ctug@kongu.edu
Abstract: Pavement damage is the main factor affecting road performance, and pavement cracking is one of the main difficulties in
maintaining roads. Credible and reliable inspection of heritage structural health relies heavily on crack detection on road surfaces.
Intelligent crack detection is essential to traffic safety and to intelligent operation and maintenance. The detection of road pavement
cracks using computer vision has gained popularity in recent years, and recent breakthroughs in general deep learning algorithms have
improved results in crack detection. In this paper, two techniques are proposed, one for crack segmentation and one for crack detection.
EfficientNet with a residual U-Net is used for segmentation, while the YOLO v5 algorithm is used for crack detection. A crack
segmentation network is used to separate the pavement cracks correctly. Road crack identification and segmentation accuracy were
improved by optimising the model's hyperparameters and strengthening the feature extraction structure. The performance of the
proposed algorithm is compared with state-of-the-art algorithms. The proposed work achieves 99.35% accuracy.
Keywords- Crack, Segmentation, Object detection, Deep learning, EfficientNet, U-Net, YOLO v5.
I. INTRODUCTION
Detecting crack damage in industrial and civil
constructions has long been an issue. Traditional crack
detection technologies use manual and machine detection
methods. Manual detection is generally considered slower
and less reliable. In recent years, machine detection
methods based on ultrasonic, microwave, or other signals
have improved rapidly [1-3]. Because of the limitations of
these approaches, both commercial and academic institutions
have been researching autonomous crack detection techniques
[4]. Given the widespread availability of smartphones and
cameras, image-based approaches are considered highly
cost-effective [5].
The effectiveness of computer vision methods has been
demonstrated in automating the image-based crack detection
approach, and their use has become a research problem in
recent decades [6]. As a result, image-based crack localization
investigations are broadly classified as manual or automatic
feature extraction-based methodologies. A computer vision
method for crack detection begins by identifying crack
sensitive features, which can be accomplished using deep
learning approaches or image processing techniques (IPTs).
The employment of techniques like edge detectors,
morphological processes, and thresholding was the subject of
early study.
In addition to noise, varying illumination conditions can
degrade crack identification techniques that rely on manual
feature extraction. For this reason, deep architectures,
which do not require handcrafted features, are now used for
crack detection problems [7]. A deep architecture uses
several layers to extract high-level features from raw
inputs, which is why it is described as the "next
generation" of neural networks. In computer vision, objects
of interest in images may be located using bounding boxes.
The sliding window method was found to be less accurate at
localizing and recognizing objects in images.
In crack detection, bounding boxes are used to locate crack
areas in input photographs [9]. Computer vision approaches
used for object recognition (OR) include region-based
convolutional neural networks (R-CNNs) [9], you only look
once (YOLO) [10], and single shot detectors (SSDs) [11].
Members of the R-CNN family are frequently used for crack
detection. SSD and YOLO designs have only been used once as
primary frameworks for crack detection, according to the
authors. Identifying cracks in photographs may be viewed as
an object detection and classification problem.
Consequently, a deep learning-based model can be used to
detect surface cracks in pavement and bridges. Building an
automatic crack detection system involves four steps: image
collection, image pre-processing, image segmentation, and
crack detection. The contributions of this research are:
- The YOLOv5 model is used for object detection.
- An EfficientNet with residual U-Net is used to segment
the cracks in the road images.
The organization of the work is as follows. Section 2
presents the related works, Section 3 describes the
methodology, Section 4 discusses the results and Section 5
concludes the work.
II. RELATED WORKS
Numerous studies have used a multi-class classification
strategy that addresses both crack identification and crack
type categorization [12-14]. Park et al. [14] used a
CNN-based multi-class classification technique to categorize
road images into crack, intact regions, and road markers.
The study categorized cracks into five types, drew on the
AlexNet and LeNet networks, and assessed and compared four
CNNs of varying depths. The works in [15,16] compared crack
detection techniques based on deep architectures with those
based on handcrafted features. Deep architectures were
compared with a number of edge detectors, including Sobel,
Canny, Prewitt, Butterworth, and others.
Kim et al. [17] compared crack recognition based on
speeded-up robust features (SURF), which rely on the Hessian
matrix and Haar wavelets, with feature extraction by a
convolutional neural network. In [18], features were learned
using VGG-16 and AlexNet models pretrained on the ImageNet
dataset. As an alternative to conventional machine learning
approaches, fully connected layers and softmax layers were
used to classify the features. Li et al. [19] used a
multiscale defect region proposal network (RPN) that
generates candidate bounding boxes at various levels to
increase detection accuracy. The authors used a second deep
architecture and a geotagged picture database to perform
geolocalization. Notably, the geolocalization module and the
crack detection network are part of the same network.
An improved version of the fast R-CNN has been proposed
for crack detection. To speed up training, a CNN is combined
with a sensitivity detection network to extract deep
features. Deng et al. [21], on the other hand, evaluated a
concrete dataset that included pictures with handwritten
scripts. They concluded that handwritten characters may be
considered excessive noise in concrete images. Maeda et al.
[22] used an OR setup built on an SSD architecture to find
cracks in a large data set. MobileNet and Inception V2 form
the backbone of the SSD framework's feature extraction. It
is important to emphasise that the data set was collected
and annotated by the researchers.
Ni et al. [23] used GoogLeNet and ResNet to identify crack
patches. To segment the detected crack locations, Otsu's
thresholding was used, accompanied by median filtering and
Hessian matrices. These steps reduce the impact of
brightness and enhance crack structures. In [24], transfer
learning with a framework pre-trained on the ImageNet data
set was used to identify crack patches. Fast blockwise
segmentation and tensor voting curve detection were then
used to produce the crack mask and increase crack
localization accuracy. GoogLeNet was used in [25] to predict
crack patches. To segment the cracks, the detected patches
were passed through a feature fusion component and a number
of convolution layers. Zhang et al. [26] proposed a
Sobel-edge adaptive sliding window strategy for obtaining
crack patches that is computationally more efficient than
the traditional sliding window method.
III. METHODOLOGY
A crack detecting approach based on a deep learning
approach is proposed in this research. The image database is
first organised, and image noise in the dataset is filtered away
to improve the contrast between road cracks and backgrounds.
Following that, the filtered photos are sent into a crack
prediction model for training. Figure 1 depicts the flow of the
proposed work.
Figure 1. Flow of the proposed road crack detection
3.1 CFD Data Set
The CFD dataset was collected and made publicly available by
Shi et al. [27]. According to the authors, the data
generally reflect the condition of urban road surfaces in
Beijing, China. The photos have been carefully labelled down
to the pixel level. The CFD comprises 118 RGB road photos of
480 × 320 pixels, with noise such as shadows, water stains,
and oil patches, as well as various lighting conditions. The
images were captured with an iPhone 5 with a focal length of
4 mm, an aperture of f/2.4, and an exposure time of 1/134 s.
The crack width varies from 1 to 3 mm. Figure 2 displays a
few photos of road cracks.
Figure 2. Road crack images
3.2 Image preprocessing
Three-channel colour photographs of road cracks were
used. Each colour pixel is made up of red, green, and blue
components. The three components at a given spatial location
form a vector, so the colour image is represented as a field
of vectors. Two colour augmentation techniques (contrast and
sharpness) were used to process the colour photographs.
Contrast augmentation was employed to expand the gray-level
range and enhance image clarity, addressing the low contrast
caused by the crack picture's constrained gray-level range.
The probability approach was used to equalize the intensity
and saturation components of the hue-saturation-intensity
(HSI) colour model, resulting in a uniform distribution.
Eqs. (1) and (2) give the computation for the intensity
(brightness) and saturation components, respectively.
$$F(x_{I_k}) = P\{x_I \le x_{I_k}\} \tag{1}$$
$$F(x_{S_t} \mid x_{I_k}) = \frac{F(x_{I_k}, x_{S_t})}{F(x_{I_k})} = \frac{P\{x_I \le x_{I_k},\ x_S \le x_{S_t}\}}{P\{x_I \le x_{I_k}\}} \tag{2}$$
where k = 0, 1, . . . , L-1 and t = 0, 1, . . . , M-1; L and M
denote the numbers of discrete levels of intensity and
saturation, respectively. $X = (x_H, x_S, x_I)^T$ is the
vector representing a colour pixel of the image. F(.) is a
probability distribution function, and
$F(x_{I_k}, x_{S_t}) = P\{x_I \le x_{I_k},\ x_S \le x_{S_t}\}$.
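As an illustration of the probability-based equalization described above, the following minimal sketch equalizes a single intensity channel through its empirical cumulative distribution. It is not the authors' implementation: the function name `equalize_channel`, the 256-level assumption, and the synthetic test image are hypothetical, and the conditional equalization of the saturation component and the RGB-to-HSI conversion are omitted.

```python
import numpy as np

def equalize_channel(channel: np.ndarray, levels: int = 256) -> np.ndarray:
    """Equalize one integer-valued channel using its empirical CDF.

    Each discrete level k is mapped to round((levels - 1) * F(k)),
    where F(k) = P{x <= k} is estimated from the image histogram.
    """
    hist = np.bincount(channel.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / channel.size              # F(k), rises to 1.0
    lookup = np.round((levels - 1) * cdf).astype(np.uint8)
    return lookup[channel]

# Hypothetical low-contrast intensity channel: values only in [100, 140).
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(64, 64)).astype(np.uint8)
equalized = equalize_channel(low_contrast)

print("input range :", low_contrast.min(), low_contrast.max())
print("output range:", equalized.min(), equalized.max())
```

The narrow input range is stretched over almost the full set of levels, which is the contrast expansion that the text attributes to this step.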
Sharpening reduces blurring and delineates the objects in a
picture by boosting the contrast between neighbouring
pixels. For the crack image, Laplace sharpening uses
gradient values computed with the Laplace operator. Eq. (3)
gives the enhancement technique based on the Laplace
operator.
$$p(a,b) = q(a,b) - \begin{bmatrix} \nabla^2 R(a,b) \\ \nabla^2 G(a,b) \\ \nabla^2 B(a,b) \end{bmatrix} \tag{3}$$
where p(a,b) represents the sharpened crack picture, q(a,b)
the original crack picture, and $\nabla^2 R(a,b)$,
$\nabla^2 G(a,b)$, and $\nabla^2 B(a,b)$ are the Laplace
operators applied to the red, green, and blue components of
the colour picture, respectively.
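The Laplace-based sharpening above can be sketched per colour channel as follows. This is only an illustrative sketch under stated assumptions: it uses the common 4-neighbour discrete Laplacian with edge replication and subtracts it from the original image, and the names `laplacian` and `sharpen_rgb` are hypothetical rather than taken from the paper.

```python
import numpy as np

def laplacian(channel: np.ndarray) -> np.ndarray:
    """4-neighbour discrete Laplacian with replicated borders."""
    p = np.pad(channel, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * channel)

def sharpen_rgb(image: np.ndarray) -> np.ndarray:
    """Subtract the Laplacian of each of the R, G, B channels."""
    img = image.astype(np.float64)
    out = np.stack(
        [img[..., c] - laplacian(img[..., c]) for c in range(3)], axis=-1
    )
    return np.clip(out, 0, 255).astype(np.uint8)

# Hypothetical test image: a vertical step edge from 100 to 150.
step = np.full((4, 8, 3), 100, dtype=np.uint8)
step[:, 4:, :] = 150
sharp = sharpen_rgb(step)
print(sharp[0, :, 0])  # values on both sides of the edge are pushed apart
```

Flat regions are left unchanged because their Laplacian is zero, while the pixels on either side of the edge move apart, which increases local contrast.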
3.3 Segmentation using EfficientNet with residual U-Net
Architecture
The proposed Efficient-U-Net network consists of an
encoder and a decoder, as illustrated in Figure 4. Due to
limited resources, we use a modified EfficientNetB4 encoder.
The encoder has 9 stages, comprising a 3x3 convolutional
layer, 32 mobile inverted bottleneck convolution (MBConv)
structures, and a 1x1 convolutional layer. The decoder
comprises five upsampling operations and a sequence of
convolutions. To produce the segmentation result, the
decoder restores the original picture size from the
extracted features. An attention gate added to the skip
connection limits the noise response and concentrates on the
relevant features of the segmented crack. Including the
residual structure allows the network to be deepened. After
each convolution, the residual block applies batch
normalization (BN) and ReLU activation. Batch normalization
mitigates exploding and vanishing gradients and accelerates
network convergence. The non-linear ReLU increases the
network's capacity to represent non-linear functions.
Figure 3 displays the segmented image.
Figure 3. Original and segmented images
The MBConv structure incorporates a 1x1 convolution that
raises the channel dimension, a depthwise convolution, an SE
module, a 1x1 convolution that reduces the channel
dimension, and a dropout layer. BN and Swish activation are
applied after the first 1x1 convolution and the depthwise
convolution, whereas only BN is applied after the second 1x1
convolution. A shortcut connection adds the input feature
information to the output; it is present only when the input
and output feature matrices of the MBConv structure have the
same shape. The SE module has greatly improved the accuracy
of target recognition, picture segmentation, and image
classification. In this work it consists of a global average
pooling, two fully connected layers, and a Sigmoid
activation function, with Swish activation between the two
fully connected layers. Global pooling compresses an H×W×C
feature map into a 1x1xC vector, the fully connected layers
turn this vector into per-channel weights, and the input
feature map is multiplied by these weights channel by
channel. In this way, the SE module allows the network to
learn more about crack-related features. An attention gate
is an attention mechanism that automatically focuses on a
target region, suppresses the response of irrelevant
regions, and enhances the feature information that is
essential to the task.
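The channel re-weighting performed by the SE module can be sketched in a few lines of NumPy. This is a minimal sketch, not the network used in the paper: the reduction from 8 to 2 channels, the random weights, and the function name `squeeze_excite` are illustrative assumptions.

```python
import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation on an (H, W, C) feature map."""
    squeezed = feature_map.mean(axis=(0, 1))    # global average pool -> (C,)
    hidden = swish(squeezed @ w1 + b1)          # first FC layer + Swish
    weights = sigmoid(hidden @ w2 + b2)         # second FC layer + Sigmoid
    return feature_map * weights                # per-channel scaling

# Hypothetical sizes: 8 channels squeezed to 2 hidden units.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(4, 4, 8))
w1, b1 = rng.normal(size=(8, 2)), np.zeros(2)
w2, b2 = rng.normal(size=(2, 8)), np.zeros(8)
out = squeeze_excite(fmap, w1, b1, w2, b2)
print(out.shape)
```

Because the Sigmoid output lies between 0 and 1, each channel is attenuated by a learned factor rather than amplified, so informative channels are kept relatively stronger than uninformative ones.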
Figure 4. The architecture of the proposed EfficientNet-U-Net
3.4 Object detection using YOLOv5
There are four versions of YOLOv5, which differ in
network depth and width: YOLOv5s, YOLOv5m, YOLOv5l, and
YOLOv5x. With a weight size of 13.7 M and 7.0 M parameters,
YOLOv5s is the fastest and smallest model. The framework of
the algorithm is illustrated in Figure 5 and consists of
three parts: the backbone, the neck, and the detection part.
Four modules make up the backbone network: the focus module
(Focus), the standard convolution module (Conv), the C3
module, and the spatial pyramid pooling module (SPP). The
YOLOv5 network structure is scaled by adjusting two
parameters: the depth factor and the width factor. Because
it is a one-stage network with multilayer feature map
prediction, the YOLOv5s approach provides high accuracy and
detection speed. It can be used effectively in industry and
satisfies the requirements of pavement crack detection,
especially in terms of speed. Considering accuracy,
parameter count, and computational cost, the lightweight
network structure of YOLOv5s improves model performance,
reduces model size, and increases detection accuracy.
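The effect of the depth and width factors can be illustrated with a short sketch. The scaling rule and the multiplier values below follow the publicly available YOLOv5 model configuration files as best recalled, and should be read as assumptions rather than as part of this paper; `scale_layer` is a hypothetical helper name.

```python
import math

def make_divisible(value: float, divisor: int = 8) -> int:
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(value / divisor) * divisor

def scale_layer(num_blocks: int, out_channels: int,
                depth_multiple: float, width_multiple: float):
    """Scale the block repeat count and channel width of one layer."""
    if num_blocks > 1:
        num_blocks = max(round(num_blocks * depth_multiple), 1)
    return num_blocks, make_divisible(out_channels * width_multiple, 8)

# Assumed (depth, width) multipliers of the four model sizes.
MODEL_SCALES = {
    "YOLOv5s": (0.33, 0.50),
    "YOLOv5m": (0.67, 0.75),
    "YOLOv5l": (1.00, 1.00),
    "YOLOv5x": (1.33, 1.25),
}

# A base layer with 9 repeated blocks and 512 output channels.
for name, (depth, width) in MODEL_SCALES.items():
    print(name, scale_layer(9, 512, depth, width))
```

Under these assumptions, the same base layer becomes 3 blocks with 256 channels in YOLOv5s and 12 blocks with 640 channels in YOLOv5x, which is why the small model is both faster and lighter.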
The Backbone is a convolutional neural network that
aggregates image features at different granularities. The
Neck is a sequence of layers that mixes and combines image
features and passes them on for prediction; it uses a path
aggregation network (PANet) together with a Feature Pyramid
Network (FPN). The Head takes the features from the Neck and
performs box and class prediction. The FPN structure
improves detection of multi-scale objects while providing an
effective trade-off between identification speed and
accuracy. The Focus layer and Cross-Stage Partial (CSP)
connections are the most important features of YOLOv5. The
Focus layer was developed to improve forward and backward
speed and to reduce the number of layers, parameters, FLOPS,
and CUDA memory, with minimal effect on mAP.
The latest YOLOv5 version differs from its predecessor in
two key ways. First, the Focus layer is replaced with a
6 x 6 Conv2d layer, which behaves as a conventional 2D
convolution layer without a separate space-to-depth
operation. A convolution layer with a kernel size of six and
a stride of two is equivalent to a Focus layer with a kernel
size of three. Second, the SPP layer is replaced with the
SPPF layer. These changes are reported to nearly triple
computational speed, making this variant quicker and more
efficient. The Conv layer, the main layer of the original
YOLOv5 structure, was analyzed and changed. In the original
Conv layer, the SiLU (Sigmoid-Weighted Linear Unit)
activation function is used.
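The space-to-depth slicing behind the Focus layer, and the reason a stride-two 6 x 6 convolution can take its place, can be checked with a small shape calculation. This is a sketch under assumptions: the padding values (1 for the 3 x 3 convolution after slicing, 2 for the 6 x 6 convolution) are chosen for illustration, and only output sizes are compared, not learned weights.

```python
import numpy as np

def focus_slice(x: np.ndarray) -> np.ndarray:
    """Space-to-depth: (C, H, W) -> (4C, H/2, W/2) by 2x2 pixel interleaving."""
    return np.concatenate(
        [x[:, ::2, ::2], x[:, 1::2, ::2], x[:, ::2, 1::2], x[:, 1::2, 1::2]],
        axis=0,
    )

def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

image = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)
sliced = focus_slice(image)

# Focus: slicing followed by a 3x3 convolution with stride 1.
focus_hw = conv_output_size(sliced.shape[1], kernel=3, stride=1, padding=1)
# Replacement: a single 6x6 convolution with stride 2 on the raw image.
conv6_hw = conv_output_size(image.shape[1], kernel=6, stride=2, padding=2)

print(sliced.shape, focus_hw, conv6_hw)
```

Both paths halve the spatial resolution, and a 3 x 3 window on the sliced tensor covers a 6 x 6 window of the original image, which is consistent with the equivalence stated in the text.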
ReLU (Rectified Linear Unit) is frequently employed as the
activation function of the Conv layer. It requires minimal
computation, so learning is fast and implementation is
straightforward. Its drawback is that when the input is less
than zero, the output and the gradient are zero, so the
affected weights are likely to stop updating during
learning, which makes learning ineffective. As a
consequence, we altered the structure of the Conv layer. The
ELU activation function is a variation of the ReLU
activation function. It shortens training time and improves
the performance of the neural network on test datasets. The
ELU function and its derivative are given in Eqs. (4) and
(5).
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases} \tag{4}$$
$$f'(x) = \begin{cases} 1, & x > 0 \\ f(x) + \alpha, & x \le 0 \end{cases} \tag{5}$$
where $\alpha > 0$ is a constant that sets the saturation
value for negative inputs.
Figure 5. Structure of YOLOv5 network
IV. EXPERIMENTAL RESULT AND DISCUSSION
Experiments were carried out on the CFD dataset using an
NVIDIA Tesla P100 GPU and 16 GB of RAM. The proposed method
was implemented using PyTorch and TensorFlow in a Python
environment on a Linux platform. To test the models, six
performance measures were computed using Equations (6) to
(11), namely the Jaccard coefficient, the Dice index
(Sørensen similarity coefficient), accuracy, precision,
recall, and IoU. In their standard forms, with A the
predicted crack region, B the ground-truth crack region, and
TP, TN, FP, and FN the numbers of true-positive,
true-negative, false-positive, and false-negative pixels,
these measures are:
$$\text{Jaccard} = \frac{|A \cap B|}{|A \cup B|} \tag{6}$$
$$\text{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} \tag{7}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{10}$$
$$\text{IoU} = \frac{TP}{TP + FP + FN} \tag{11}$$
The proposed work is evaluated by comparing its
performance with several state-of-the-art algorithms. All
models were trained for a total of 100 epochs. The CrackNet,
Deep Crack, Deep ResU-Net, and ResU-Net++ models, as well as
the proposed Efficient-U-Net model, were trained with a
batch size of 32.
Figure 6. Accuracy comparison of the algorithms
Figure 7. Precision comparison of the algorithms
Figure 8. Recall comparison of the algorithms
Figure 9. Dice score comparison of the algorithms
Figure 10. IoU score comparison of the algorithms
Figure 11. Jaccard score comparison of the algorithms
Figure 6 compares the accuracy of the algorithms to
analyze the performance of the proposed system in crack
detection. The proposed Efficient-U-Net obtains an accuracy
of 99.35%, which is 0.97 percentage points higher than
ResU-Net++ (98.38%), 1.60 points higher than Deep ResU-Net
(97.75%), 2.42 points higher than Deep Crack (96.93%), and
2.97 points higher than CrackNet (96.38%). The residual unit
improves performance because feature accumulation with
recurrent residual convolutional layers offers improved
feature representation for segmentation tasks. It enables a
superior U-Net architecture with the same number of network
parameters and improved picture segmentation performance.
Figure 7 presents the precision of the algorithms. The
proposed Efficient-U-Net obtains the highest precision,
95.47%, while CrackNet, Deep Crack, Deep ResU-Net, and
ResU-Net++ obtain 87.37%, 88.78%, 92.28%, and 94.76%,
respectively. Figure 8 shows the recall obtained by the
different algorithms. The recall obtained by CrackNet, Deep
Crack, Deep ResU-Net, ResU-Net++, and Efficient-U-Net is
88.91%, 89.83%, 93.49%, 95.28%, and 96.97%, respectively.
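The accuracy margins quoted above are differences in percentage points and can be reproduced from the reported values. The short check below only re-derives the arithmetic from the figures stated in the text; the dictionary name is illustrative.

```python
# Accuracy values (%) as reported in the text.
reported_accuracy = {
    "Efficient-U-Net": 99.35,
    "ResU-Net++": 98.38,
    "Deep ResU-Net": 97.75,
    "Deep Crack": 96.93,
    "CrackNet": 96.38,
}

proposed = reported_accuracy["Efficient-U-Net"]
gaps = {
    name: round(proposed - accuracy, 2)
    for name, accuracy in reported_accuracy.items()
    if name != "Efficient-U-Net"
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} percentage points below the proposed model")
```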
Preprocessing enhances the performance of the proposed
approach when dealing with blurred pictures, raising the
Efficient-U-Net's Dice score to 96.97%. The Dice scores of
CrackNet, Deep Crack, Deep ResU-Net, and ResU-Net++ are
87.47%, 88.32%, 91.78%, and 93.26%, respectively. The key
benefit of using the Efficient-U-Net model for crack
segmentation is its capacity to identify small cracks and
unlabeled cracks. Additionally, the Efficient-U-Net model
outperforms the other models in detecting cracks in blurry
images and cracks at image edges. The model can also
recognise cracks in images with shadows caused by changing
lighting conditions. The IoU scores of CrackNet, Deep Crack,
Deep ResU-Net, ResU-Net++, and Efficient-U-Net are 88.34%,
89.71%, 92.53%, 94.32%, and 95.87%, respectively. The
Jaccard scores obtained by CrackNet, Deep Crack, Deep
ResU-Net, ResU-Net++, and Efficient-U-Net are 87.39%,
88.48%, 89.16%, 91.67%, and 93.88%, respectively. The
investigation shows that the proposed approach performs
better than the compared models on every measure. The
segmentation results are greatly enhanced by the use of two
concatenated encoder-decoder designs.
V. CONCLUSION
The need for intelligent monitoring technology is growing
because road mileage is rising rapidly, which makes it
impossible for conventional road crack monitoring to keep up
with demand. In this work, deep learning techniques for
segmentation and object detection are applied to crack
detection on roads. This research presents a segmentation
network that identifies road cracks using an EfficientNet
with a residual attention-based U-Net architecture. The
Efficient-U-Net network is used as the segmentation model
and the YOLO v5 network as the detection model, so that
cracks are identified accurately while being segmented at
the same time. Combining the segmentation model with the
detection model resolves the issue of inaccurate crack
localization in the road crack detection network. The
experimental results show that the proposed model
outperforms the other models in accuracy, precision, and
recall. The accuracy of the proposed system is 99.35%, which
is higher than that of the compared existing methods.
REFERENCES
[1] Lacidogna, G.; Piana, G.; Accornero, F.; Carpinteri, A.
Multi-technique damage monitoring of concrete beams:
Acoustic Emission, Digital Image Correlation, Dynamic
Identification. Constr. Build. Mater. 2020, 242, 118114
[2] Zhao, S.; Sun, L.; Gao, J.; Wang, J. Uniaxial ACFM
detection system for metal crack size estimation using
magnetic signature waveform analysis. Measurement 2020,
164, 108090.
[3] Zhang, X.; Wang, K.; Wang, Y.; Shen, Y.; Hu, H. Rail
crack detection using acoustic emission technique by joint
optimization noise clustering and time window feature
detection. Appl. Acoust. 2020, 160, 107141.
[4] Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.;
Büyüköztürk, O. Autonomous Structural Visual Inspection
Using Region-Based Deep Learning for Detecting Multiple
Damage Types. Comput.-Aided Civ. Infrastruct. Eng. 2017,
33, 731-747.
[5] Fang, F.; Li, L.; Gu, Y.; Zhu, H.; Lim, J.H. A novel hybrid
approach for crack detection. Pattern Recognit. 2020, 107,
107474.
[6] Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling,
H. Feature Pyramid and Hierarchical Boosting Network for
Pavement Crack Detection. IEEE Trans. Intell. Transp. Syst.
2020, 21, 1525-1535.
[7] Fang, F.; Li, L.; Gu, Y.; Zhu, H.; Lim, J.H. A novel hybrid
approach for crack detection. Pattern Recognit. 2020, 107,
107474.
[8] Hsieh, Y.A.; Tsai, Y.J. Machine Learning for Crack
Detection: Review and Model Performance Comparison. J.
Comput. Civ. Eng. 2020, 34, 04020038.
[9] Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature
hierarchies for accurate object detection and semantic
segmentation. arXiv 2013, arXiv:1311.2524.
[10] Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only
Look Once: Unified, Real-Time Object Detection. arXiv
2016, arXiv:1506.02640.
[11] Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.;
Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector.
arXiv 2016, arXiv:1512.02325.
[12] Li, B.;Wang, K.C.P.; Zhang, A.; Yang, E.;Wang, G.
Automatic classification of pavement crack using deep
convolutional neural network. Int. J. Pavement Eng. 2020,
21, 457463.
[13] Feng, C.; Liu, M.Y.; Kao, C.C.; Lee, T.Y. Deep Active
Learning for Civil Infrastructure Defect Detection and
Classification. Comput. Civ. Eng. 2017, 2017, 298306.
[14] Park, S.; Bang, S.; Kim, H.; Kim, H. Patch-Based Crack
Detection in Black Box Images Using Convolutional Neural
Networks. J. Comput. Civ. Eng. 2019, 33, 04019017.
[15] Nhat-Duc, H.; Nguyen, Q.L.; Tran, V.D. Automatic
recognition of asphalt pavement cracks using metaheuristic
optimized edge detection algorithms and convolution neural
network. Autom. Constr. 2018, 94, 203213.
[16] Dorafshan, S.; Thomas, R.J.; Maguire, M. Comparison of
deep convolutional neural networks and edge detectors for
image-based crack detection in concrete. Constr. Build.
Mater. 2018, 186, 10311045.
[17] Kim, H.; Ahn, E.; Shin, M.; Sim, S.H. Crack and Non crack
Classification from Concrete Surface Images Using
Machine Learning. Struct. Health Monit. 2019, 18, 725
738.
[18] Kim, B.; Cho, S. Automated Vision-Based Detection of
Cracks on Concrete Surfaces Using a Deep Learning
Technique. Sensors 2018, 18, 3452.
[19] Li, R.; Yuan, Y.; Zhang, W.; Yuan, Y. Unified Vision-
Based Methodology for Simultaneous Concrete Defect
Detection and Geolocalization. Comput.-Aided Civ.
Infrastruct. Eng. 2018, 33, 527544.
[20] Huyan, J.; Li,W.; Tighe, S.; Zhai, J.; Xu, Z.; Chen, Y.
Detection of sealed and unsealed cracks with complex
backgrounds using deep convolutional neural network.
Autom. Constr. 2019, 107, 102946.
[21] Deng, J.; Lu, Y.; Lee, V.C.S. Concrete crack detection with
handwriting script interferences using faster region-based
convolutional neural network. Comput.-Aided Civ.
Infrastruct. Eng. 2020, 35, 373388.
[22] Maeda, H.; Sekimoto, Y.; Seto, T.; Kashiyama, T.; Omata,
H. Road Damage Detection and Classification Using Deep
Neural Networks with Smartphone Images. Comput.-Aided
Civ. Infrastruct. Eng. 2018, 33, 11271141.
[23] Ni, F.; Zhang, J.; Chen, Z. Zernike-moment measurement of
thin-crack width in images enabled by dual-scale deep
learning. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34,
367384.
[24] Zhang, K.; Cheng, H.D.; Zhang, B. Unified Approach to
Pavement Crack and Sealed Crack Detection Using
Preclassification Based on Transfer Learning. J. Comput.
Civ. Eng. 2018, 32, 04018001.
[25] Ni, F.; Zhang, J.; Chen, Z. Pixel-level crack delineation in
images with convolutional feature fusion. Struct. Control
Health Monit. 2019, 26, e2286.
[26] Zhang, X.; Rajan, D.; Story, B. Concrete crack detection
using context-aware deep semantic segmentation network.
Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 951–971.
[27] Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic Road
Crack Detection Using Random Structured Forests. IEEE
Trans. Intell. Transp. Syst. 2016, 17, 3434–3445.
... The issue of inaccurate crack localization in road crack detection is addressed by integrating a segmentation model with a detection model. The suggested system's accuracy is 99.35%, which is higher than that of any existing method [5]. ...
Article
This study explores the application of Artificial Intelligence (AI) and Machine Learning (ML) techniques for object detection in real-world scenarios, with a particular focus on Albania. The purpose of the research is to develop and evaluate advanced object detection models that can enhance accuracy and reliability in various applications such as security and road safety. The study employs an experimental approach, leveraging the YOLO algorithm and Convolutional Neural Networks (CNNs) to train and evaluate customized object detection models using diverse datasets. Comparative analyses are conducted to identify the most effective methodologies. The findings demonstrate that larger, high-quality datasets significantly enhance model performance, as evidenced by a maximum F1-score of 0.96 achieved with 80 training images and 50 epochs. The research highlights the transformative potential of AI-driven object detection in improving processing speed and accuracy for critical applications. Challenges such as computational resource limitations and dataset constraints are identified as barriers to broader implementation. The study concludes with practical recommendations for improving model scalability and reliability, emphasizing the importance of integrating AI with complementary technologies for real-world deployment. These insights have implications for policymakers, developers, and industries aiming to leverage AI for enhanced safety and efficiency in infrastructure and beyond.
... EfficientNet [26] used a composite scaling strategy to improve performance while maintaining model efficiency. Satheesh et al. [27] used EfficientNet for crack segmentation. The transformer model [28] has been gradually applied to image-processing tasks due to its success in natural language processing. ...
Article
Concrete surface crack detection is a critical problem in the health monitoring and maintenance of engineering structures. The existence and development of cracks may lead to the deterioration of structural performance, potentially causing serious safety accidents. However, detecting cracks accurately remains challenging due to various factors such as uneven lighting, noise interference, and complex backgrounds, which often lead to incomplete or false detections. Traditional manual inspection methods are subjective, inefficient, and costly, while existing deep learning-based approaches still suffer from insufficient precision and completeness. Therefore, this paper proposes AG-TransUNet, a new crack detection model based on an improved TransUNet, in which an adaptive multi-head self-attention mechanism and a gated mechanism-based decoding module (GRU-T) are introduced to improve the accuracy and completeness of crack detection. Experimental results show that AG-TransUNet outperforms the original TransUNet with a 4.05% increase in precision, a 2.59% improvement in F1-score, and a 0.36% enhancement in IoU on the CFD dataset. On the concrete crack dataset, AG-TransUNet achieves a 2.21% increase in precision, a 5.63% improvement in F1-score, and a 9.07% enhancement in IoU. In addition, to quantitatively analyze the crack width, the orthogonal skeleton method is used to calculate the maximum width of a single crack to provide a reference for engineering maintenance. Experiments show that the maximum error between the real values and the detection results is about 5%. Therefore, the proposed method better meets the needs of crack detection in practical engineering applications and provides a solution for improving the efficiency of crack detection.
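The precision, F1-score, and IoU figures quoted in the abstract above are standard pixel-level segmentation metrics. As a minimal sketch, assuming binary crack masks stored as lists of rows of 0/1 values (the function name `segmentation_scores` and the mask layout are illustrative assumptions, not taken from the cited paper), they could be computed as follows:

```python
def segmentation_scores(pred, truth):
    """Return precision, recall, F1 and IoU for two binary masks.

    `pred` and `truth` are lists of rows, each row a list of 0/1 values,
    where 1 marks a crack pixel. This layout is an assumption made for
    illustration only.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # crack pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as crack
            elif t and not p:
                fn += 1          # crack pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    print(segmentation_scores(predicted, ground_truth))
```

IoU penalises false positives and false negatives together, so it is usually lower than F1 for the same prediction. This may be one reason the reported gains differ across the three metrics.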
... The decision to employ YOLO v5 for object detection and U-Net for semantic segmentation is grounded in numerous studies [66][67][68][69] that advocate for these technologies as the most representative of each technique when it comes to comparisons or complementary applications. For instance, a study on road crack detection demonstrated the effectiveness of the YOLO v5 model for object detection and the U-Net structure for segmentation, achieving an impressive accuracy of 99.35% [70]. Such findings underscore the prominence and reliability of these models in diverse applications, further justifying their selection for our research. ...
Article
The counting and characterization of neurons in primary cultures have long been areas of significant scientific interest due to their multifaceted applications, ranging from neuronal viability assessment to the study of neuronal development. Traditional methods, often relying on fluorescence or colorimetric staining and manual segmentation, are time consuming, labor intensive, and prone to error, raising the need for the development of automated and reliable methods. This paper delves into the evaluation of three pivotal deep learning techniques: semantic segmentation, which allows for pixel-level classification and is suited solely for characterization; object detection, which focuses on counting and locating neurons; and instance segmentation, which combines the features of the other two but employs more intricate structures. The goal of this research is to discern which technique, or combination of techniques, yields the optimal results for automatic counting and characterization of neurons in images of neuronal cultures. Following rigorous experimentation, we conclude that instance segmentation stands out, providing superior outcomes for both challenges. Graphical abstract: Identifying the optimal pathway for characterizing neurons in complex cultures through structured experimentation.
... While single CNNs have made progress in detecting cracks within concrete bridge images, ensemble learning (EL) techniques provide prominent advantages by harnessing both combined learning and diversity among group member models. By training various CNNs on the same bridge imagery data through techniques such as data augmentation, rotations and scaling, specialized CNNs trained within the ensemble gain unique crack detection abilities to contribute collectively beyond the detection ability of any single model [22,23]. The model ensemble then puts together individual predictions using either a stacking approach, where the single models are inputs for a meta-model to deliver the final output. ...
Article
Automatic image-based crack detection of concrete bridge decks contributes to safer bridge operation and bridge health monitoring. Existing models suffer from overfitting and low generalization abilities. Moreover, their performances highly depend on the model architecture, training method, data source, etc. To address these challenges, several hybrid self-designed and transfer learning ensemble models have been introduced for the efficient and accurate intelligent crack detection. Firstly, some self-designed convolutional neural networks (CNNs) are constructed from scratch using labeled crack and non-crack images from modified existing bridge deck image dataset. Secondly, some pretrained transfer learning models, namely the VGG16, VGG19, ResNet50, MobileNetV3Small Model, InceptionResNetV2, EfficientNetV2B0, Xception, and InceptionV3 are adopted to check the efficiency of transfer learning in detecting cracks in bridge decks images. Using the developed CNNs and transfer learning models, several ensemble learning models between the self-designed CNNs, transfer learning CNNs, as well as hybrid self-designed CNNs and transfer learning models are developed. The ensemble learning strategies including the weighted average, stacking, Adaboost, Gradient boosting, and XGBoost ensembles are utilized to construct the ensemble learning models aiming to increase the prediction accuracy and improve generalization ability. Results indicate that the hybrid ensemble learning between the self-designed CNNs and the transfer learning models highly improve the precision and accuracy of the individual models and can be well implemented for image-based bridge deck crack detection.
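Of the ensemble strategies listed in the abstract above, the weighted average is the simplest to illustrate. The following is a minimal sketch, assuming each member model outputs one crack probability per image and that the weights are chosen by the user (for instance from validation accuracy). The function names, weights, and 0.5 threshold are hypothetical and not taken from the cited paper:

```python
def weighted_average_ensemble(probabilities, weights):
    """Combine per-model crack probabilities with a weighted average.

    `probabilities[m][i]` is model m's predicted probability that image i
    contains a crack; `weights[m]` is the (assumed, user-chosen) weight of
    model m. Returns one fused probability per image.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_images = len(probabilities[0])
    return [
        sum(w * model_probs[i] for w, model_probs in zip(weights, probabilities)) / total
        for i in range(n_images)
    ]


def classify(fused_probabilities, threshold=0.5):
    """Label each image 1 (crack) or 0 (non-crack) using an assumed threshold."""
    return [int(p >= threshold) for p in fused_probabilities]


if __name__ == "__main__":
    # Three hypothetical models scoring two images.
    model_outputs = [[0.9, 0.2],
                     [0.6, 0.4],
                     [0.3, 0.7]]
    fused = weighted_average_ensemble(model_outputs, weights=[2.0, 1.0, 1.0])
    print(fused, classify(fused))
```

Stacking and boosting ensembles differ in that the combination rule is itself learned rather than fixed in advance, which this sketch does not attempt to show.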
... DL-based approaches for RDD and IE usually include three processing steps: 1) data collection; 2) data annotation; and 3) training [16]. Various DL techniques, such as SSD [17], Faster R-CNN [18], [19], YOLOV7 [20], [21], EfficientNet [22], and EfficientDet [23] have been used for RDD and IE [24], [25], [26]. Reference [27] also used DL techniques to detect multiple types of RD, such as potholes, alligator cracks, etc. ...
Article
Road damage detection (RDD) through computer vision and deep learning techniques can ensure the safety of vehicles and humans on the roads. Integrating unmanned aerial vehicles (UAVs) in RDD and infrastructure evaluation (IE) has also emerged as a key enabler, contributing significantly to data acquisition and real-time monitoring of road damages such as potholes, cracks, and surface anomalies, facilitating proactive maintenance and improved road conditions. These UAVs are low-powered and resource-constrained devices that work autonomously to perform pattern detection and decision-making leveraging tiny machine learning (Tiny ML) algorithms. These Tiny ML algorithms are designed to run on edge devices, IoT devices, UAVs, etc. In this study, the RDD2022 dataset, collected using UAVs and vehicle dashboard cameras, was used to train pure and mixed models. The dataset exhibits class instance imbalance in certain classes, which is addressed by implementing data augmentation as a regularization technique. A state-of-the-art two-stage detector (Faster R-CNN ResNet101) and one-stage detectors (SSD MobileNet V1 FPN, YOLOv5, and EfficientDet D1) are employed. The results indicate that the two-stage detector achieved an impressive mAP of 88.49% overall and 96.62% for focused classes. Notably, the state-of-the-art EfficientDet D1 approach achieved a competitive mAP of 86.47% overall and 95.12% for focused classes, with significantly lower computational cost. These findings highlight the potential of advanced object detection techniques, particularly EfficientDet D1, to enhance the accuracy and efficiency of RDD systems, thereby improving passenger safety and overall performance.
Article
With the increasing automation in today's world, the need for finding and labelling objects in images and videos has grown exponentially. Be it managing traffic, self-driving cars or medical imaging, object detection is being used everywhere around us. Traditional methods for object detection, like SIFT or HOG features, are efficient but no longer meet today's needs, because images must be processed in real time, which these methods cannot do. These methods also make training and preparing a model complex, and they work only with well-lit, front-facing, full-view images of objects, which are not always possible to obtain. So deep learning methods for object detection, like R-CNN, YOLO or RetinaNet, were introduced. These methods are being used worldwide to detect objects and make object detection automated and simpler. In this paper, we provide a review of both machine learning and deep learning approaches for object detection. Our review begins with an introduction to object detection, then focuses on the methods used for object detection: the machine learning approach and the deep learning approach. We then move on to the advantages, challenges and applications of object detection. To conclude, we discuss the future scope of the field.
Article
Computer‐vision and deep‐learning techniques are being increasingly applied to inspect, monitor, and assess infrastructure conditions including detection of cracks. Traditional vision‐based methods to detect cracks lack accuracy and generalization to work on complicated infrastructural conditions. This paper presents a novel context‐aware deep convolutional semantic segmentation network to effectively detect cracks in structural infrastructure under various conditions. The proposed method applies a pixel‐wise deep semantic segmentation network to segment the cracks on images with arbitrary sizes without retraining the prediction network. Meanwhile, a context‐aware fusion algorithm that leverages local cross‐state and cross‐space constraints is proposed to fuse the predictions of image patches. This method is evaluated on three datasets: CrackForest Dataset (CFD) and Tomorrows Road Infrastructure Monitoring, Management Dataset (TRIMMD) and a Customized Field Test Dataset (CFTD) and achieves Boundary F1 (BF) score of 0.8234, 0.8252, and 0.7937 under 2‐pixel error tolerance margin in CFD, TRIMMD, and CFTD, respectively. The proposed method advances the state‐of‐the‐art performance of BF score by approximately 2.71% in CFD, 1.47% in TRIMMD, and 4.14% in CFTD. Moreover, the averaged processing time of the proposed system is 0.7 s on a typical desktop with Intel® Quad‐Core™ i7‐7700 CPU@3.6 GHz Processor, 16GB RAM and NVIDIA GeForce GTX 1060 6GB GPU for an image of size 256 × 256 pixels.
Article
Pavement crack detection is a critical task for ensuring road safety. Manual crack detection is extremely time-consuming. Therefore, an automatic road crack detection method is required to speed up this process. However, it remains a challenging task due to the intensity inhomogeneity of cracks and complexity of the background, e.g., the low contrast with surrounding pavements and possible shadows with a similar intensity. Inspired by recent advances of deep learning in computer vision, we propose a novel network architecture, named feature pyramid and hierarchical boosting network (FPHBN), for pavement crack detection. The proposed network integrates context information to low-level features for crack detection in a feature pyramid way, and it balances the contributions of both easy and hard samples to loss by nested sample reweighting in a hierarchical way during training. In addition, we propose a novel measurement for crack detection named average intersection over union (AIU). To demonstrate the superiority and generalizability of the proposed method, we evaluate it on five crack datasets and compare it with the state-of-the-art crack detection, edge detection, and semantic segmentation methods. The extensive experiments show that the proposed method outperforms these methods in terms of accuracy and generalizability. Code and data can be found in https://github.com/fyangneil/pavement-crack-detection.
Article
Cracks in civil structures are important signs of structural degradation and may even indicate the inception of catastrophic failure. Image-based crack detection, which bears the potential of replacing human-based inspection, has been attempted in research communities. Among many methodologies, deep learning-based crack detection has been actively explored in recent years. However, how to automatically extract cracks quickly and accurately at a pixel level, that is, crack delineation (including both detection and segmentation), is a challenging issue. This article proposes a convolutional neural network-based framework that automates this task through convolutional feature fusion and pixel-level classification. The resulting network architecture with an empirically optimal fusion strategy, termed the crack delineation network, is trained and tested based on a concrete crack image database. The results show that the proposed framework can delineate cracks accurately and rapidly in images, a step towards a fully autonomous machine vision approach to structural crack detection.
Article
A single-axis alternating current field measurement (ACFM) detection system is proposed for crack size estimation. A single tunneling magneto-resistive (TMR) sensor is used to detect Bz signal, which is able to determine both the length and depth of a crack simultaneously. First, a theoretical analysis is presented to evaluate the crack length and especially depth using Bz signature waveform. The underlying physics principle is supported using a finite element analysis (FEA) method. In the simulation, the cracks with various lengths and depths are analyzed for a given crack width, and corresponding Bzmax values are obtained. Next, a Bzmax characteristic polynomial surface is developed, which can be represented by a fitted polynomial interpolation equation. The crack depth can be inversed by this equation with reference to the measured Bzmax value and crack length. Finally, real ACFM experiments are conducted to demonstrate that the crack lengths and depths can be readily estimated.
Article
Acoustic Emission (AE), Digital Image Correlation (DIC) and Dynamic Identification (DI) techniques are used to analyse crack formation and propagation in plain concrete pre-notched beam specimens subject to three-point bending. Four dimensional scales are considered and scale effects on fracture energy, bending strength, and AE energy per unit area are investigated. The energy brittleness numbers are calculated for the different beam sizes to characterise the fragility of the specimens. For the larger sample, the principal strain directions obtained by DIC are put in relation with the crack path, while the analysis of resonant frequencies is used to correlate the bending stiffness reduction to the crack advancement.
Article
Recently, acoustic emission (AE) technology has been investigated to detect rail cracks. However, AE signals of cracks are often submerged in heavy noises in practical application, and these serious noise interferences should be eliminated to obtain a reliable detection result. Based on the joint optimization clustering and time window feature, an improved detection method of rail crack signal is proposed by using AE technology in this paper. The joint optimization method based on Long Short-Term Memory (LSTM) encoder-decoder network and k-means clustering is utilized to achieve a better clustering result of noise signals. Then, the distance thresholds of noise clusters are selected to suppress most of the noise signals. After that, the detection method based on crack duration time feature of time window is further proposed to eliminate false detection and improve the accuracy of crack detection. The detection ability of the proposed method is verified by the signals which are acquired from the real noise environment of railway. Meanwhile, the effectiveness of the proposed method is also demonstrated by comparing with the previous study. The results clearly illustrate that the improved method is effective in detecting rail crack signals under serious noise interference.
Article
Crack Deep Network (CrackDN) is proposed in this research with the purpose of detecting sealed and unsealed cracks with complex road backgrounds. CrackDN is based on the Faster Region Convolutional Neural Network (Faster-RCNN) architecture, embedding a sensitivity detection network parallel to the feature extraction Convolutional Neural Network (CNN), both of which are then connected to the Region Proposal Refinement Network (RPRN) for classification and regression. The state-of-the-art aspect of this research lies in the fusion of the sensitivity detection network, which enables CrackDN to detect sealed and unsealed cracks against severely complex backgrounds. Four kinds of background conditions are considered for both sealed and unsealed crack analysis: normal and unbalanced illuminations, with markings and shadings. The raw pavement images are first processed simultaneously by the sensitivity extraction network, which is formed by a batch of line filters at different angles for sensitive region extraction, and by the CNN, which uses ZF-Net. The extracted sensitive maps and feature maps are then applied as the input of the RPRN, from which the prediction scores together with the bounding boxes are obtained. The performance of CrackDN is compared with Faster-RCNN and SSD300 architectures for sealed and unsealed crack detection. Results demonstrate that CrackDN can achieve a detection mean average precision higher than 0.90, which outperforms both Faster-RCNN and SSD300. The detection speed of CrackDN (around 6 fps) is slightly lower than that of SSD300 but significantly higher than that of Faster-RCNN. Meanwhile, the performance of sealed crack detection is better than unsealed crack detection for most background conditions. Moreover, sealed and unsealed cracks on the markings are the most difficult conditions for detection. However, CrackDN can still obtain a detection accuracy above 0.85.
Article
The current bridge maintenance practice generally involves manual visual inspection, which is highly subjective and unreliable. A technique that can automatically detect defects, for example, surface cracks, is essential so that early warnings can be triggered to prevent disaster due to structural failure. In this study, to permit automatic identification of concrete cracks, an ad-hoc faster region-based convolutional neural network (faster R-CNN) was applied to contaminated real-world images taken from concrete bridges with complex backgrounds, including handwriting. A dataset of 5,009 cropped images was generated and labeled for two different objects, cracks and handwriting. The proposed network was then trained and tested using the generated image dataset. Four full-scale images that contained complex disturbance information were used to assess the performance of the trained network. The results of this study demonstrate that faster R-CNN can automatically locate cracks in raw images, even in the presence of handwriting scripts. For a comparative study, the proposed network is also compared with the You Only Look Once v2 detection technique.
Article
Although crack inspection is a routine practice in civil infrastructure management (especially for highway bridge structures), it is time‐consuming and safety‐concerning to trained engineers and costly to the stakeholders. To automate this in the near future, the algorithmic challenge at the onset is to detect and localize cracks in imagery data with complex scenes. The rise of deep learning (DL) sheds light on overcoming this challenge through learning from imagery big data. However, how to exploit DL techniques is yet to be fully explored. One primary component of practical crack inspection is that it is not merely detection via visual recognition. To evaluate the potential risk of structural failure, it entails quantitative characterization, which usually includes crack width measurement. To further facilitate the automation of machine‐vision‐based concrete crack inspection, this article proposes a DL‐enabled quantitative crack width measurement method. In the detection and mapping phase, dual‐scale convolutional neural networks are designed to detect cracks in complex scene images with validated high accuracy. Subsequently, a novel crack width estimation method based on the use of Zernike moment operator is further developed for thin cracks. The experimental results based on a laboratory loading test agree well with the direct measurements, which substantiates the effectiveness of the proposed method for quantitative crack detection.