Forests 2021, 12, 217. https://doi.org/10.3390/f12020217 www.mdpi.com/journal/forests
Article
A Forest Fire Detection System Based on Ensemble Learning
Renjie Xu 1, Haifeng Lin 1, Kangjie Lu 1, Lin Cao 2 and Yunfei Liu 1,*
1 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China; xurenjie@njfu.edu.cn (R.X.); haifeng.lin@njfu.edu.cn (H.L.); lukangjie@njfu.edu.cn (K.L.)
2 Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China; lincao@njfu.edu.cn
* Correspondence: lyf@njfu.com.cn; Tel.: +86-139-1389-5117
Abstract: Due to the various shapes, textures, and colors of fires, forest fire detection is a challenging task. Traditional image processing methods rely heavily on hand-crafted features, which are not universally applicable to all forest scenarios. To solve this problem, deep learning technology is applied to learn and extract features of forest fires adaptively. However, the limited learning and perception ability of individual learners is not sufficient for them to perform well in complex tasks. Furthermore, learners tend to focus too much on local information, namely the ground truth, and ignore global information, which may lead to false positives. In this paper, a novel ensemble learning method is proposed to detect forest fires in different scenarios. Firstly, two individual learners, Yolov5 and EfficientDet, are integrated to accomplish the fire detection process. Secondly, another individual learner, EfficientNet, is responsible for learning global information to avoid false positives. Finally, detection results are made based on the decisions of the three learners. Experiments on our dataset show that the proposed method improves detection performance by 2.5% to 10.9% and decreases false positives by 51.3%, without any extra latency.
Keywords: forest fire detection; deep learning; ensemble learning; Yolov5; EfficientDet; EfficientNet
1. Introduction
With the change of the earth's climate, forest fires occur frequently all over the world. They not only cause serious economic losses and destroy the ecological environment, but also pose a great threat to the safety of human life.
Forest fires usually spread quickly and are difficult to control in a short time. It is therefore imperative to detect an early forest fire before it spreads, but traditional detection methods have obvious drawbacks in open forest areas. Sensor-based detection systems [1–3] perform well in indoor spaces, but they are difficult to install outdoors, considering the high coverage cost [4,5]. In addition, they cannot provide the important visual information that helps firefighters promptly grasp the situation of the fire scene. Infrared or ultraviolet detectors [6,7] are easily disturbed by the environment and, given their short detection distance, are not suitable for large open areas. Satellite remote sensing [8] is good at detecting large-scale forest fires, but it cannot detect early regional fires.
Driven by the rise of computer vision technology, researchers have started to seek efficient and effective fire detection models based on image processing. Chen et al. [9] proposed an RGB (red, green, blue) model based on chromatic and disorder measurement for extracting fire pixels in video: the color information extracts fire pixels, and dynamic information verifies whether it is a real fire. Töreyin et al. [10] used a 1D temporal wavelet transform to detect flame flicker and a 2D spatial wavelet transform to identify moving fire regions; this method, which integrated color and temporal variation information, reduced false alarms in real-world scenes. Çelik et al. [11] studied diverse video sequences and images, and proposed a fuzzy color model using
statistical analysis. Combined with motion analysis, the model achieves good discrimination between fire and fire-like objects. Teng et al. [12] analyzed fire characteristics and proposed a real-time fire detection method based on hidden Markov models (HMMs), which extracted candidate fire pixels using moving-pixel detection, fire-color inspection, and pixel clustering. Chino et al. [13] found that most algorithms were designed for video, which had obvious limitations; to solve this problem, they proposed a novel fire detection method named BowFire, which combines color features with superpixel texture discrimination to detect fire in still images. In conclusion, most traditional fire detection methods based on image processing focused on creating artificial features like color, motion, and texture to detect fires.
However, powerful deep learners have begun to replace human intelligence. They are better at learning features than humans, and the features they extract contain much deeper semantic information than hand-crafted ones. Recently, deep learning has outperformed traditional hand-crafted features in many fields and has been widely used in fire detection. Zhang et al. [14] created a forest fire benchmark and used Faster R-CNN (region-based convolutional neural network) [15], Yolo (you only look once) [16–19], and SSD (single shot multibox detector) [20] to detect fire. They found that SSD was better regarding efficiency, detection accuracy, and early fire detection ability, and they proposed an improved tiny-Yolo by adjusting the network architecture. Kim et al. [21] employed Faster R-CNN to detect fire and non-fire regions based on their spatial features; in addition, long short-term memory (LSTM) was used to verify the reliability of fire alarms. Lee et al. [22] proposed a video-based fire detection model, which used Faster R-CNN to generate a fire candidate region for each frame; structural similarity (SSIM) and mean square error (MSE) were then calculated to determine the similarity between adjacent frames, and final fire regions were determined based on spatial and temporal features. Pan et al. [23] proposed a camera-based wildfire detection system via transfer learning, in which a block-based analysis strategy was used to improve fire detection accuracy; redundant filters with low-energy impulse responses were removed to ensure the model's efficiency on edge devices. Wu et al. [24] applied principal component analysis (PCA) to process forest fire images before feeding them into the training network; the combination of the two models was shown to enhance localization results. In conclusion, when faced with a fire detection task, most researchers tend to assign only individual learners to perform the object detection task, which is considered unreliable, since it may lead to false negatives.
In this paper, a novel method based on ensemble learning for forest fire detection is proposed. First, forest fire detection is a complicated and difficult task, making it highly impractical for individual learners to detect fires in diverse scenarios. Every individual learner has its own expertise and can extract different features from the image, so integrating different individual learners can significantly improve the robustness of the model and enhance detection performance. Therefore, two individual object detectors, Yolov5 [25] and EfficientDet [26], are integrated to detect fire in parallel. These two learners work synergistically in detecting different types of forest fires, thereby improving detection accuracy. Second, object detectors only care about what fire looks like, so they do not take the whole image into consideration; in this case, fire-like objects will inevitably affect the detection results. To solve this problem, the EfficientNet image classifier [27] is incorporated into our model, whose role is to enable the model to take full advantage of global information. Final detection results are made through a decision strategy according to the results of these three learners, which efficiently increases detection accuracy and decreases false positives.
2. Materials and Methods
2.1. Datasets
To ensure our learners can handle different kinds of forest fires (ground fire, trunk fire, and canopy fire), we collected images from multiple public fire datasets: BowFire [28], FD-dataset [29], ForestryImages [30], VisiFire [31], etc. After manual filtration, we created a single integrated forest fire dataset containing 10,581 images, with 2976 forest fire images and 7605 non-fire images. Representative samples of our dataset are shown in Figures 1–3.
Figure 1. Representative forest fire images in the fire section of our dataset, including (a) ground fire 1, (b) ground fire 2, (c) trunk fire, and (d) canopy fire.
Figure 2. Representative normal forest images in the non-fire section of our dataset, including (a) normal forest scene 1, (b) normal forest scene 2, (c) normal forest scene 3, and (d) normal forest scene 4. (a–d) illustrate normal forest scenes without fire objects.
Figure 3. Representative images in the non-fire section of our dataset, including (a) wild scene with sun 1, (b) wild scene with sun 2, (c) wild scene with sun 3, and (d) wild scene with sun 4. (a–d) illustrate normal wild scenes containing fire-like objects (e.g., the sun).
2.2. Yolov5
Yolo is a state-of-the-art, real-time object detector, and Yolov5 builds on Yolov1–Yolov4. Continuous improvements have enabled it to achieve top performance on two official object detection datasets: Pascal VOC (visual object classes) [32] and Microsoft COCO (common objects in context) [33].
The network architecture of Yolov5 is shown in Figure 4. There are three reasons why we chose Yolov5 as our first learner. Firstly, Yolov5 incorporates the cross stage partial network (CSPNet) [34] into Darknet, creating CSPDarknet as its backbone. CSPNet solves the problem of repeated gradient information in large-scale backbones and integrates the gradient changes into the feature map, thereby decreasing the parameters and FLOPs (floating-point operations) of the model, which not only ensures inference speed and accuracy but also reduces the model size. In a forest fire detection task, detection speed and accuracy are imperative, and a compact model size also determines inference efficiency on resource-poor edge devices. Secondly, Yolov5 applies the path aggregation network (PANet) [35] as its neck to boost information flow. PANet adopts a new feature pyramid network (FPN) structure with an enhanced bottom-up path, which improves the propagation of low-level features. At the same time, adaptive feature pooling, which links the feature grid and all feature levels, is used to make useful information in each feature level propagate directly to the following subnetwork. PANet improves the utilization of accurate localization signals in lower layers, which can markedly enhance the location accuracy of the object. Thirdly, the head of Yolov5, namely the Yolo layer, generates three sizes of feature maps (18 × 18, 36 × 36, 72 × 72) to achieve multi-scale [18] prediction, enabling the model to handle small, medium, and big objects. A forest fire usually develops from a small-scale fire (ground fire) to a medium-scale fire (trunk fire), then to a big-scale fire (canopy fire). Multi-scale detection ensures that the model can follow size changes in the process of fire evolution.
Figure 4. The network architecture of Yolov5. It consists of three parts: (1) Backbone: CSPDarknet, (2) Neck: PANet, and
(3) Head: Yolo Layer. The data are first input to CSPDarknet for feature extraction, and then fed to PANet for feature
fusion. Finally, Yolo Layer outputs detection results (class, score, location, size).
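As an illustration of how such a detector is used at inference time, the sketch below loads a Yolov5 checkpoint through the torch.hub interface published by the ultralytics/yolov5 repository; the file names fire_best.pt and forest_scene.jpg are hypothetical placeholders, not artifacts released with this paper.

```python
import torch

# Load a custom Yolov5 checkpoint via torch.hub (ultralytics/yolov5 interface).
# "fire_best.pt" is a hypothetical checkpoint fine-tuned on forest fire images.
model = torch.hub.load("ultralytics/yolov5", "custom", path="fire_best.pt")
model.conf = 0.25                      # confidence threshold for reported boxes

results = model("forest_scene.jpg")    # letterboxing, inference, and NMS in one call
detections = results.xyxy[0]           # tensor of (x1, y1, x2, y2, confidence, class)
print(detections)
```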
2.3. EfficientDet
EfficientDet is a family of object detectors developed by Google that consistently achieves better efficiency than prior art across a wide spectrum of resource constraints. Similar to Yolov5, EfficientDet has achieved remarkable performance on the Pascal VOC and Microsoft COCO tasks and is widely used in real-world applications.
The network architecture of EfficientDet is shown in Figure 5. There are three reasons why we chose EfficientDet as our second learner. Firstly, EfficientDet employs the state-of-the-art network EfficientNet [27] as its backbone, giving the model sufficient ability to learn the complex features of diverse forest fires. Secondly, it applies an improved PANet, named the bi-directional feature pyramid network (Bi-FPN), as its neck to allow easy and fast multi-scale feature fusion. Bi-FPN introduces learnable weights, enabling the network to learn the importance of different input features, and repeatedly applies top-down and bottom-up multi-scale feature fusion. Compared with Yolov5's neck PANet, Bi-FPN performs better with fewer parameters and FLOPs. Meanwhile, a different feature fusion strategy carries different semantic information, thereby producing different detection results. Thirdly, similar to EfficientNet, it integrates a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks at the same time, which ensures maximum accuracy and efficiency under limited computing resources; with more available resources, accuracy is consistently improved. Our second learner, EfficientDet, with a different backbone, neck, and head, can learn information that Yolov5 cannot.
Figure 5. The network architecture of EfficientDet. It consists of three parts: (1) Backbone: EfficientNet, (2) Neck: Bi-FPN,
(3) Head. Similar to Yolov5, the data are first input to EfficientNet for feature extraction, and then fed to Bi-FPN for feature
fusion. Finally, head outputs detection results (class, score, location, size).
2.4. EfficientNet
EfficientNet is an efficient network proposed by Google. It applies a novel model scaling strategy, namely the compound scaling method, to balance network depth, network width, and image resolution for better accuracy at a fixed resource budget. With this, EfficientNet outperformed other popular networks such as ResNet [36], DenseNet [37], and ResNeXt [38], achieving the highest Top-1 accuracy on the ImageNet image classification task.
The network architecture of EfficientNet is shown in Figure 6. The reason we chose EfficientNet as our third learner is that it achieves a superior trade-off between accuracy and efficiency. In our model, the third learner plays the most important role: it is responsible for learning the whole image to guide the detection, meaning that its decisions directly determine the final results. Meanwhile, it must be highly efficient, otherwise it will slow down the entire model.
Figure 6. The network architecture of EfficientNet. It can output a feature map with deep semantic information after the
input data flows through the multi-layer network.
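For illustration, a binary fire/non-fire classifier of this kind can be instantiated as follows. The paper does not name a specific EfficientNet variant or implementation, so the use of timm and the efficientnet_b0 variant here are assumptions, not the authors' exact setup.

```python
import timm
import torch

# Binary fire / non-fire classifier. "efficientnet_b0" from timm is used
# purely as an illustration; the paper does not specify a variant.
model = timm.create_model("efficientnet_b0", pretrained=True, num_classes=2)
model.eval()

image = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed frame
with torch.no_grad():
    p_fire = model(image).softmax(dim=1)[0, 1].item()  # probability of "fire"
print(f"fire probability: {p_fire:.3f}")
```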
2.5. Our Model
In a real-world forest fire detection task, we need to handle different types of forest fires, such as ground fire, trunk fire, and canopy fire. These fires, influenced by the environment, are diverse in shape, texture, and even color, making it very difficult for an individual learner to extract effective features. By careful observation, we find that Yolov5 is better at learning long-area fires (Figure 7), but it sometimes misses objects (Figure 8). Meanwhile, even though EfficientDet is not sensitive to long-area fires (Figure 7), it is more careful than Yolov5, meaning that EfficientDet can make a complementary detection (Figure 8). Therefore, we consider that integrating these two efficient learners with different specialties to make detections together can improve detection accuracy.
Figure 7. Yolov5 is better at detecting long-area fires than EfficientDet. (a) True positive of Yolov5; (b) true positive of Yolov5; (c) false negative of EfficientDet; (d) false negative of EfficientDet. (a,b) illustrate that Yolov5 detects long-area fires successfully, while (c,d) show that EfficientDet fails to detect them.
Figure 8. EfficientDet is a more careful object detector than Yolov5, meaning that it seldom loses potential objects. (a) Yolov5 fails to cover all fire areas; (b) Yolov5 misses two fire objects; (c) EfficientDet covers all fire areas; (d) EfficientDet detects four fire objects.
Another issue is that the ability of an object detector is limited. It only learns the fire region, which is just a local pattern of the whole image, and ignores other information such as the background. As a result, an object detector may treat fire-like objects (e.g., the sun) as fires (Figure 9), thereby raising false alarms. Therefore, a good leader, EfficientNet, that has a full understanding of the whole image is needed to guide the detection process.
Figure 9. Object detectors Yolov5 and EfficientDet are easily deceived by fire-like objects (e.g., the sun). (a) False positive of Yolov5 (confidence score: 0.63); (b) false positive of Yolov5 (confidence score: 0.59); (c) false positive of EfficientDet (confidence score: 0.84); (d) false positive of EfficientDet (confidence score: 0.71).
To address the above two issues and make our model robust to diverse scenarios, three deep learners are integrated to make decisions together (Figure 10). The first and second learners, Yolov5 and EfficientDet, act as object detectors that locate fires in images by generating candidate boxes. The non-maximum suppression algorithm [39] (Algorithm 1) is then employed to eliminate redundant boxes, preserving the boxes with top confidence. The third learner, EfficientNet, acts as a binary classifier responsible for learning the whole image to determine whether it contains fire objects. Finally, the object detection results and the image classification results are sent to a decision strategy module: if the image is considered to contain fire objects, the object detection results are retained; otherwise, they are discarded, as shown in the sketch below.
In addition, integrating multiple learners does not affect the overall efficiency of the model, because the three learners are structurally independent and the whole model is executed as multiple processes, with a separate process responsible for each learner.
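The following minimal sketch illustrates this gating logic under stated assumptions: each detector's output is taken to be a dict of xyxy boxes and scores, torchvision's nms merges the two box sets, and fire_thresh is a hypothetical classifier threshold (the paper does not specify one).

```python
import torch
from torchvision.ops import nms

def ensemble_detect(yolo_out, effdet_out, p_fire, iou_thresh=0.5, fire_thresh=0.5):
    """Merge boxes from both detectors, then gate the result on the classifier."""
    boxes = torch.cat([yolo_out["boxes"], effdet_out["boxes"]])     # (N, 4) xyxy
    scores = torch.cat([yolo_out["scores"], effdet_out["scores"]])  # (N,)
    keep = nms(boxes, scores, iou_thresh)   # drop redundant overlapping boxes
    if p_fire < fire_thresh:
        # EfficientNet judges the whole image non-fire: discard all detections
        return boxes.new_zeros((0, 4)), scores.new_zeros(0)
    return boxes[keep], scores[keep]
```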
Figure 10. Structure of the proposed model. Three deep learners are ensembled in parallel: two object detectors, Yolov5 and EfficientDet, are integrated to perform the object detection task, and the classifier EfficientNet is in charge of discriminating whether the image contains fire objects. Final detection results are made based on the decisions of the three learners.
Algorithm 1. Non-Maximum Suppression (NMS)
INPUT: B = {b1, …, bN}, S = {s1, …, sN}, Nt
  B is the list of initial detection boxes
  S contains the corresponding detection scores
  Nt is the NMS threshold
Begin:
  D ← { }
  while B ≠ empty do
    m ← argmax S
    M ← bm
    D ← D ∪ M; B ← B − M
    for bi in B do
      if iou(M, bi) ≥ Nt then
        B ← B − bi; S ← S − si
      end
    end
  end
  Return D, S
End
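For reference, a direct PyTorch transcription of Algorithm 1 could look like the following sketch; in practice, torchvision.ops.nms provides an optimized equivalent.

```python
import torch

def nms(boxes: torch.Tensor, scores: torch.Tensor, iou_threshold: float) -> torch.Tensor:
    """Greedy NMS over (N, 4) xyxy boxes: keep the best box, drop overlapping ones."""
    keep = []
    order = scores.argsort(descending=True)   # indices sorted by score, best first
    while order.numel() > 0:
        best = order[0]
        keep.append(best.item())
        if order.numel() == 1:
            break
        rest = boxes[order[1:]]
        # Intersection rectangle between the best box and all remaining boxes
        lt = torch.max(boxes[best, :2], rest[:, :2])
        rb = torch.min(boxes[best, 2:], rest[:, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_best + area_rest - inter)
        order = order[1:][iou < iou_threshold]   # keep only boxes below the threshold
    return torch.tensor(keep, dtype=torch.long)
```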
2.6. Model Evaluation
We evaluate models using the Microsoft COCO criteria (Table 1), which are widely used in object detection tasks. However, fire is a special object that is diverse in shape, texture, and color. The bounding box generated by an object detector may differ slightly from the ground truth (Figure 11), thereby influencing the calculation of average precision, even though the detector does identify the fire areas successfully. Therefore, to evaluate models more comprehensively, we introduce two additional evaluation metrics, namely frame accuracy (FA) and false positive rate (FPR). For one image, if the detector misses any fire object, we call it a frame false (FF), otherwise a frame true (FT). If the detector treats any fire-like object as fire, we call it a false positive (FP), otherwise a true positive (TP). Note that FA is calculated on the test set containing 476 forest images, and FPR is calculated on our challenging non-fire dataset containing 641 images with fire-like objects (e.g., the sun). FA and FPR are calculated as Equations (1) and (2), respectively:
FA = FT / (FT + FF) × 100, (1)
FPR = FP / (FP + TP) × 100. (2)
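These two metrics translate directly to code; a minimal sketch, assuming the frame-level counts FT, FF, FP, and TP come from the bookkeeping described above:

```python
def frame_accuracy(ft: int, ff: int) -> float:
    # Equation (1): percentage of test images in which no fire object is missed
    return ft / (ft + ff) * 100

def false_positive_rate(fp: int, tp: int) -> float:
    # Equation (2): percentage of detections on fire-like images that are wrong,
    # following the paper's definition of FP and TP
    return fp / (fp + tp) * 100
```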
Figure 11. Bounding boxes generated by (a) Yolov5, (b) EfficientDet, and (c) our model (3 learners) differ from (d) the ground truth, but the detectors still show good detection performance.
Table 1. Microsoft COCO criteria—commonly used in object detection tasks for evaluating model precision and recall across multiple scales.

Average Precision (AP):
  AP0.5: AP at IoU = 0.5
AP Across Scales:
  APS: AP0.5 for small objects: area < 32²
  APM: AP0.5 for medium objects: 32² < area < 96²
  APL: AP0.5 for big objects: area > 96²
Average Recall (AR):
  AR0.5: AR at IoU = 0.5
AR Across Scales:
  ARS: AR0.5 for small objects: area < 32²
  ARM: AR0.5 for medium objects: 32² < area < 96²
  ARL: AR0.5 for big objects: area > 96²
3. Results
3.1. Training
We applied different strategies to train our three learners: Yolov5, EfficientDet, and EfficientNet. The object detectors, Yolov5 and EfficientDet, are trained with 2381 forest fire images and tested with 476 forest fire images. The image classifier, EfficientNet, is trained with 2381 forest fire images and 5804 non-fire images, and tested with 476 forest fire images and 1160 non-fire images. Note that the non-fire images contain normal images and images with fire-like objects (e.g., the sun). Each model is built with Pytorch [40] and trained on an NVIDIA GTX 2080TI. The details of our training strategy are shown in Table 2.
Table 2. Detailed training strategies of the models.

Model         Train  Test  Optimizer    LR        Batch Size  Epochs
Yolov5        2381   476   SGD [41,42]  1 × 10^−  8           300
EfficientDet  2381   476   AdamW [43]   1 × 10^−  4           300
EfficientNet  8185   1636  SGD          1 × 10^−  8           300

LR: learning rate, SGD: stochastic gradient descent, AdamW: Adam with decoupled weight decay.
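As an illustration of Table 2's optimizer choices, the sketch below sets up the two optimizer families in PyTorch; the learning-rate and momentum/weight-decay values are placeholders rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

net = nn.Linear(10, 2)   # stand-in for a detector or classifier network

# SGD for Yolov5 / EfficientNet, AdamW for EfficientDet (per Table 2);
# lr values below are placeholders, not the paper's settings.
sgd = torch.optim.SGD(net.parameters(), lr=1e-2, momentum=0.9)
adamw = torch.optim.AdamW(net.parameters(), lr=1e-4, weight_decay=1e-2)
```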
3.2. Comparison
We compare our model with typical one-stage object detectors. As shown in Table 3, even though Yolov5 and EfficientDet are the most powerful detectors in this task, their high false positive rates and missed detections cannot be ignored. By integrating them (2 learners), all evaluation metrics are significantly improved, but the false positive rate increases to 51.6%, since false positives come from both Yolov5 and EfficientDet. Under the guidance of our third learner, EfficientNet, the false positive rate is reduced to 0.3%. It is also worth mentioning that, after introducing the third learner, some metrics decrease slightly. This is because EfficientNet wrongly treats some fire images as non-fire ones and then discards the object detection results, but we consider it worthwhile to sacrifice a tiny decrease in average precision and recall for a substantial improvement in the false positive rate. To sum up, our model (3 learners) is superior in AP0.5, APS, APM, APL, AR0.5, ARS, ARM, ARL, FPR, and FA compared with other typical object detectors. These comprehensive improvements give the model better performance in detecting different types of forest fires: small-scale fires, medium-scale fires, big-scale fires, ground fires, trunk fires, canopy fires, and fires at night (Figures 12 and 13). Faced with fire-like objects (e.g., the sun), our model is not misled (Figure 14).
Table 3. Experiments on our dataset—evaluating models using the Microsoft COCO criteria, FPR, FA, and latency.

Model              AP0.5  APS   APM   APL   AR0.5  ARS   ARM   ARL   FPR   FA    Latency (ms)
SSD                66.8   37.8  42.4  78.6  70.1   39.1  45.7  82.7  45.6  92.6  88.8
Yolov3             66.4   26.0  44.6  78.1  71.1   26.1  52.5  82.5  22.9  88.0  15.6
Yolov3-SPP         68.3   56.3  49.9  76.7  73.9   60.9  56.6  81.9  30.7  93.3  15.6
Yolov4             69.6   53.7  48.9  78.4  75.5   60.9  57.5  83.9  61.9  94.1  20.5
Yolov5             70.5   51.9  53.7  79.2  75.6   56.5  61.2  83.0  22.6  94.7  28.0
EfficientDet       75.7   63.7  58.5  83.0  79.2   65.2  63.9  86.5  41.8  95.5  65.6
Ours (2 learners)  79.7   72.2  65.6  85.5  84.1   76.1  73.1  89.3  51.6  99.4  66.8
Ours (3 learners)  79.0   72.2  64.9  84.7  83.8   76.1  72.6  88.9  0.3   98.9  66.8

Note that AP0.5, APS, APM, APL, AR0.5, ARS, ARM, ARL, FPR, and FA are all percentages. The best figure for each metric is highlighted in bold.
Figure 12. Our ensemble model (3 learners) performs better on ground fires, trunk fires, and canopy fires. (a) Four ground fires detected by Yolov5; (b) Yolov5 fails to detect the trunk fire; (c) three canopy fires detected by Yolov5; (d) four ground fires detected by EfficientDet; (e) the trunk fire detected by EfficientDet; (f) two canopy fires detected by EfficientDet; (g) six ground fires detected by our model; (h) the trunk fire detected by our model; (i) three canopy fires detected by our model.
Figure 13. Our improved model performs better on small-scale, medium-scale, and big-scale fires at night. (a) Medium-scale and big-scale fires detected by Yolov5; (b) medium-scale and big-scale fires detected by Yolov5; (c) small-scale, medium-scale, and big-scale fires detected by Yolov5; (d) medium-scale and big-scale fires detected by EfficientDet; (e) medium-scale and big-scale fires detected by EfficientDet; (f) small-scale, medium-scale, and big-scale fires detected by EfficientDet; (g) medium-scale and big-scale fires detected by our model; (h) medium-scale and big-scale fires detected by our model; (i) small-scale, medium-scale, and big-scale fires detected by our model.
Figure 14. Under the guidance of EfficientNet, our ensemble model discriminates well between fire and fire-like objects (e.g., the sun). (a) True negative of Yolov5; (b) false positive of Yolov5 (confidence score: 0.59); (c) false positive of EfficientDet (confidence score: 0.71); (d) true negative of EfficientDet; (e) true negative of our model; (f) true negative of our model.
4. Discussion
Compared with other common objects that have a fixed form, forest fire is a dynamic object [44]. In a real-world scenario, a forest fire usually starts as a small-scale fire, develops into a medium-scale fire, and then into a big-scale fire [45]. In terms of type, it starts as a ground fire, then spreads to the trunk, and finally to the canopy [46]. The various shapes, sizes, textures, and colors of forest fires make fire evolution a complex process and bring great difficulty to fire detection.
Therefore, it is highly imperative for detectors to be sensitive to different types of fires. Through careful experimental comparisons, we find that no single detector can handle all kinds of fires; they have respective advantages and disadvantages. Yolov5 is excellent at detecting long-area fires (Figure 7), but it frequently misses objects (Figure 8). EfficientDet is a more careful detector than Yolov5; even though it performs poorly on long-area fires (Figure 7), it can detect fires that Yolov5 cannot (Figure 8), meaning that it is a good partner for Yolov5. Our model, which efficiently integrates the decisions of these two powerful learners, boosts detection performance by 2.5–10.9% in terms of AP0.5, APS, APM, APL, AR0.5, ARS, ARM, and ARL. The significant improvements in average precision and average recall for small, medium, and big objects make the model more sensitive to the size changes of fires, thereby enhancing detection performance on the different types of forest fires in the fire evolution: ground fire, trunk fire, canopy fire, and fires at night (Figures 12 and 13).
Another problem is that the false positive rate of the improved model (2 learners) becomes higher, rising from 22.6% to 51.6%, since the model also integrates wrong detection results from both learners. To address this issue, we use 8185 images, containing 2381 forest fire images and 5804 non-fire images (fire-like images and normal forest images), to train our third learner, EfficientNet. This sufficient training set enables EfficientNet to discriminate well between fire objects and fire-like objects, with 99.6% accuracy on 476 fire images and 99.7% accuracy on 676 fire-like images. With the help of the leader learner EfficientNet, wrong detection results are eliminated, and the false positive rate drops significantly, to 0.3% (Figure 14). Noticeably, the addition of EfficientNet reduces AP0.5, APM, APL, AR0.5, ARM, and ARL by roughly 1%, because EfficientNet wrongly rejects 2 fire images containing medium-scale and big-scale fire objects.
In terms of latency, the Yolo family is superior to EfficientDet and SSD. Excellent inference speed makes the Yolo family widely used in real-world applications, but the experimental results show that they cannot deliver satisfactory performance on forest fire detection tasks alone. The latency of EfficientDet is 65.6 ms, over twice that of Yolov5 (28.0 ms), but EfficientDet outperforms Yolov5 by over 5% in detection performance. We ensemble the three learners Yolov5 (28.0 ms), EfficientDet (65.6 ms), and EfficientNet (31.3 ms) in parallel to make sure that our model achieves the best performance without any extra latency. The final latency of our model (3 learners) is 66.8 ms, which shows that an excellent trade-off between detection performance and efficiency has been achieved, and that the model is applicable to real-time detection tasks.
For further improvement, we plan to study the labeling strategy for forest fires, since the quality of training data directly determines detection performance. Another interesting extension is to investigate the network architectures of the backbones and modify them so that they are specially designed for the forest fire detection task. Additionally, we will work on developing a forest fire tracking system that can classify different types of forest fires (ground fire, trunk fire, and canopy fire) to track the evolution and spread of forest fires.
5. Conclusions
The successful application of convolutional neural networks has significantly improved the performance of object detection. However, forest fire is a dynamic object with no fixed form, which an individual object detector cannot handle. In addition, object detectors are easily deceived by fire-like objects and generate false positives due to their limited visual field. To address these problems, a novel ensemble learning method for real-time forest fire detection is proposed in this paper. Two powerful object detectors (Yolov5 and EfficientDet) with different expertise are integrated to make the whole model more robust to diverse forest fire scenarios. Then, a leader (EfficientNet) is introduced to guide the detection process and reduce false positives. Experimental results show that, compared with other popular object detectors, our model achieves a superior trade-off among average precision, average recall, false positive rate, frame accuracy, and latency. These significant improvements make it possible for the model to perform well in real-world forestry applications.
Author Contributions: R.X. devised the programs and drafted the initial manuscript. H.L. and K.L. helped with data collection, data analysis, and figures and tables. L.C. contributed to fund acquisition and writing embellishment. Y.L. designed the project and revised the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Key R&D Program of China (grant number
2017YFD0600904) and the Priority Academic Program Development of Jiangsu Higher Education
Institutions (PAPD).
Data Availability Statement: Publicly available datasets were analyzed in this study. The data can
be found here: BowFire [28], FD-dataset [29], ForestryImages [30], VisiFire [31].
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Zhang, J.; Li, W.; Yin, Z.; Liu, S.; Guo, X. Forest fire detection system based on wireless sensor network. In Proceedings of the
4th IEEE Conference on Industrial Electronics and Applications (ICIEA 2009), Xi’an, China, 25–27 May 2009; pp. 520–523.
2. Yu, L.; Wang, N.; Meng, X. Real-time forest fire detection with wireless sensor networks. In Proceedings of the International
Conference on Wireless Communications, Networking and Mobile Computing (WiCOM 2005), Wuhan, China, 26 September
2005; pp. 1214–1217.
3. Chen, S.J.; Hovde, D.C.; Peterson, K.A.; Marshall, A.W. Fire detection using smoke and gas sensors. Fire Saf. J. 2007, 42, 507–515,
doi:10.1016/j.firesaf.2007.01.006.
4. Zhang, F.; Zhao, P.; Xu, S.; Wu, Y.; Yang, X.; Zhang, Y. Integrating multiple factors to optimize watchtower deployment for
wildfire detection. Sci. Total Environ. 2020, 737, 139561, doi:10.1016/j.scitotenv.2020.139561.
5. Zhang, F.; Zhao, P.; Thiyagalingam, J.; Kirubarajan, T. Terrain-influenced incremental watchtower expansion for wildfire
detection. Sci. Total Environ. 2018, 654, 164–176, doi:10.1016/j.scitotenv.2018.11.038.
6. Lee, B.; Kwon, O.; Jung, C.; Park, S. The development of UV/IR combination flame detector. J. KIIS 2001, 16, 1–8.
7. Kang, D.; Kim, E.; Moon, P.; Sin, W.; Kang, M. Design and analysis of flame signal detection with the combination of UV/IR
sensors. J. Korean Soc. Int. Inf. 2013, 14, 45–51, doi:10.7472/jksii.2013.14.2.45.
8. Fernandes, A.M.; Utkin, A.B.; Lavrov, A.V.; Vilar, R.M. Development of neural network committee machines for automatic
forest fire detection using lidar. Pattern Recognit. 2004, 37, 2039–2047, doi:10.1016/j.patcog.2004.04.002.
9. Chen, T.H.; Wu, P.H.; Chiou, Y.C. An early fire-detection method based on image processing. In Proceedings of the IEEE
International Conference on Image Processing (ICIP 2004), Singapore, 24–27 October 2004; pp. 1707–1710.
10. Töreyin, B.U.; Dedeoğlu, Y.; Güdükbay, U.; Cetin, A.E. Computer vision based method for real-time fire and flame detection.
Pattern Recognit. Lett. 2006, 27, 49–58, doi:10.1016/j.patrec.2005.06.015.
11. Çelik, T.; Özkaramanlı, H.; Demirel, H. Fire and smoke detection without sensors: Image processing based approach. In
Proceedings of the IEEE 15th European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, 3–7 September 2007;
pp. 1794-1798.
12. Teng, Z.; Kim, J.H.; Kang, D.J. Fire detection based on hidden Markov models. Int. J. Control Autom. Syst. 2010, 8, 822–830,
doi:10.1007/s12555-010-0414-2.
13. Chino, D.Y.; Avalhais, L.P.; Rodrigues, J.F.; Traina, A.J. Bowfire: Detection of fire in still images by integrating pixel color and
texture analysis. In Proceedings of the 28th SIBGRAPI Conference on Graphics, Patterns and Images, Salvador, Brazil, 26–29
August 2015; pp. 95–102.
14. Wu, S.; Zhang, L. Using popular object detection methods for real time forest fire detection. In Proceedings of the 11th
International Symposium on Computational Intelligence and Design (ISCID 2018), Hangzhou, China, 8–9 December 2018; pp.
280–284.
15. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans.
Pattern Anal. Mach. Intell. 2016, 39, 1137–1149, doi:10.1109/TPAMI.2016.2577031.
16. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–
788.
17. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR 2017), Honolulu, Hawaii, 21–26 July 2017; pp. 7263–7271.
18. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv: 1804.02767.
19. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:
2004.10934.
20. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
21. Kim, B.; Lee, J. A video-based fire detection using deep learning models. Appl. Sci. 2019, 9, 2862, doi:10.3390/app9142862.
22. Lee, Y.; Shim, J. False Positive Decremented Research for Fire and Smoke Detection in Surveillance Camera using Spatial and
Temporal Features Based on Deep Learning. Electronics 2019, 8, 1167, doi:10.3390/electronics8101167.
23. Pan, H.; Badawi, D.; Cetin, A.E. Computationally Efficient Wildfire Detection Method Using a Deep Convolutional Network
Pruned via Fourier Analysis. Sensors 2020, 20, 2891, doi:10.3390/s20102891.
24. Wu, S.; Guo, C.; Yang, J. Using PCA and one-stage detectors for real-time forest fire detection. J. Eng. 2020, 2020, 383–387, doi:10.1049/joe.2019.1145.
25. Ultralytics-Yolov5. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 January 2021).
26. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR 2020), Washington, DC, USA, 14–19 June 2020; pp. 10781–10790.
27. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International
Conference on Machine Learning (ICML 2019), Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
28. BoWFire Dataset. Available online: https://bitbucket.org/gbdi/bowfire-dataset/downloads/ (accessed on 1 January 2021).
29. FD-Dataset. Available online: http://www.nnmtl.cn/EFDNet/ (accessed on 1 January 2021).
30. ForestryImages. Available online: https://www.forestryimages.org/browse/subthumb.cfm?sub=740 (accessed on 1 January 2021).
31. VisiFire. Available online: http://signal.ee.bilkent.edu.tr/VisiFire/ (accessed on 1 January 2021).
32. Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes challenge:
A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136, doi:10.1007/s11263-014-0733-5.
33. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
34. Wang, C.Y.; Mark Liao, H.Y.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020), Washington, DC, USA, 14–19 June 2020; pp. 390–391.
35. Wang, K.; Liew, J.H.; Zou, Y.; Zhou, D.; Feng, J. Panet: Few-shot image semantic segmentation with prototype alignment. In
Proceedings of the IEEE International Conference on Computer Vision (ICCV 2019), Seoul, Korea, 20–26 October 2019; pp. 9197–
9206.
36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
37. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, Hawaii, 21–26 July 2017; pp. 4700–4708.
38. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1492–1500.
39. Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern
Recognition (ICPR 2006), Hong Kong, China, 20–24 August 2006; pp. 850–855.
40. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. Pytorch: An
imperative style, high-performance deep learning library. In Proceedings of the Neural Information Processing Systems (NIPS
2019), Vancouver, BC, Canada, 8–14 December 2019; pp. 8026–8037.
41. Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of the 19th International Conference
on Computational Statistics (COMPSTAT 2010), Paris, France, 22–27 August 2010; pp. 177–186.
42. Zinkevich, M.; Weimer, M.; Li, L.; Smola, A.J. Parallelized stochastic gradient descent. In Proceedings of the Neural Information
Processing Systems (NIPS 2010), Vancouver, BC, Canada, 6–11 December 2010; pp. 2595–2603.
43. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv: 1711.05101.
44. Merino, L.; Caballero, F.; Martínez-de-Dios, J.R.; Maza, I.; Ollero, A. An unmanned aircraft system for automatic forest fire
monitoring and measurement. J. Intell. Robot. Syst. 2012, 65, 533–548, doi:10.1007/s10846-011-9560-x.
45. Serón, F.J.; Gutiérrez, D.; Magallón, J.; Ferragut, L.; Asensio, M.I. The evolution of a wildland forest fire front. Vis. Comput. 2005,
21, 152–169, doi:10.1007/s00371-004-0278-7.
46. Pimont, F.; Dupuy, J.L.; Linn, R.R.; Dupont, S. Impacts of tree canopy structure on wind flows and fire propagation simulated
with FIRETEC. Ann. For. Sci. 2011, 68, 523–530, doi:10.1007/s13595-011-0061-7.
... The network achieved an accuracy of 87 percent if we agree within 1 of the actual growth stage for these species, while the average accuracy for these species was 70 percent. Renjie Xu et al. [9] Suggest a novel ensemble learning approach for detecting forest fires in various scenarios in this paper. To begin, two separate learners, Yolov5 and EfficientDet, are combined to complete the fire detection process. ...
... Yolov5's structure is based on a single-stage detection structure [9]. An overview of the modified weed detection model is shown in Figure 4, subdivided into the stages discussed below. ...
... The maximum pooling method of 1×1, 5×5, 9×9, and 13×13 is used for multi-scale fusion [13]. As shown in figure 7, the input is 512x20x20, the output is 256x20x20 after the 1x1 convolutional layer, and it is then downsampled by three parallel Max Pools of different kernel sizes (5,9,13). It is important to note that the max pool's padding is consistent throughout. ...
Research
Full-text available
According to a recent study and analysis in agriculture, various factors influence crop yield. Weeds are the most significant threat to crop yield. Weed control is a worldwide issue that has received much coverage in recent years. This paper presents a method for developing a deep convolutional neural network (CNN) for weed identification based on the modified YOLO architecture with several pre-processing techniques. An image labeler using the Roboflow framework is used to locate the regions of interest as part of the image processing. We have used novel Mosaic data augmentation in this model to address the well-known "small object detection problem." To train the developed model, we created 3600 images with different sizes of weed. Sizes of YOLO anchor box were calculated from the training dataset using a k-means clustering approach. The model that resulted was tested on 10% of the images. We may justify that the established model could detect weed with an appropriate recall rate and mAP based on the experimental results. This method determines whether an object on the farm is a weed by drawing a bounding box around it and assigning a label to it.
... The first ICP measure was performed by Queckenstedt in 1916 by a lumbar puncture [20]. However, it was Lundberg who performed the first catheterization to monitor and treat the ICP [20]. ...
... The first ICP measure was performed by Queckenstedt in 1916 by a lumbar puncture [20]. However, it was Lundberg who performed the first catheterization to monitor and treat the ICP [20]. This method was accepted in the 1970s and has since 2007 been a key part of managing serve traumatic brain injury [4]. ...
Article
Full-text available
Purpose Intracranial pressure (ICP) control is important to avoid secondary brain injury in patients with intracranial pathologies. Current methods for measuring ICP are invasive and carry risks of infection and hemorrhage. Previously we found correlation between ICP and the arteriole-venous ratio (A/V ratio) of retinal vessels in an outpatient setting. This study investigated the usability of fundoscopy for non-invasive ICP estimation with the addition of intraocular pressure (IOP) in patients in a neuro-intensive care unit (NICU). Methods This single-center prospective cohort study was conducted at the NICU of Odense University Hospital from September 2020 to May 2021. Adult patients with a Glasgow Coma Score of 8 or less, who underwent invasive pressure neuromonitoring were included. Fundoscopy videos were captured daily and analyzed using deep learning algorithms. The A/V ratio was calculated and correlated with ICP. The data was analyzed using mixed-effect linear regression models. Results Forty patients were enrolled. Fifteen were included in the final analysis. ICP ranged from -1 to 31 mmHg (mean: 10.9, SD: 5.7), and IOP ranged from 4 to 13 mmHg (mean: 7.4, SD: 2.1). The A/V ratio showed a significant negative correlation with ICP > 15 mmHg (regression slope: -0.0659, 95%-CI: [-0.0665;-0.0653], p < 0.001). No significant change in A/V ratio was observed for ICP ≤ 15 mmHg. A similar significant correlation was found for ICP > IOP (regression slope: -0.0055, 95%-CI: [-0.0062;-0.0048], p < 0.001). Taking the IOP into account did not improve the model. The sensitivity analysis showed a sensitivity of 80.08% and a specificity of 22.51%, with an AUC of 0.6389. Conclusion In line with our previous work, non-invasive fundoscopy is a potential tool for detecting elevated ICP. However, challenges such as image quality and diagnostic specificity remains. Further research with larger, multi-center studies are needed to validate the utility. Standardization may enhance the technique's clinical applicability.
... The second is the neck, which fuses features utilizing PANet. Lastly, the YOLO layer, which represents the head, produces the detection results and provides class labels, confidence scores, item locations, and sizes [43]. This structured architecture enables YOLO to efficiently perform object detection tasks with high levels of accuracy and efficiency [44]. ...
... YOLOv5 architecture adapted from[43]. ...
Article
Full-text available
Security has been paramount to many organizations for many years, with access control being one of the critical measures to ensure security. Among various approaches to access control, vehicle plate number recognition has received wide attention. However, its application to boom gate access has not been adequately explored. This study proposes a method to access the boom gate by optimizing vehicle plate number recognition. Given the speed and accuracy of the YOLO (You Only Look Once) object detection algorithm, this study proposes using the YOLO deep learning algorithm for plate number detection to access a boom gate. To identify the gap and the most suitable YOLO variant, the study systematically surveyed the publication database to identify peer-reviewed articles published between 2020 and 2024 on plate number recognition using different YOLO versions. In addition, experiments are performed on four YOLO versions: YOLOv5, YOLOv7, YOLOv8, and YOLOv9, focusing on vehicle plate number recognition. The experiments, using an open-source dataset with 699 samples in total, reported accuracies of 81%, 82%, 83%, and 73% for YOLO V5, V7, V8, and V9, respectively. This comparative analysis aims to determine the most appropriate YOLO version for the task, optimizing both security and efficiency in boom gate access control systems. By optimizing the capabilities of advanced YOLO algorithms, the proposed method seeks to improve the reliability and effectiveness of access control through precise and rapid plate number recognition. The result of the analysis reveals that each YOLO version has distinct advantages depending on the application’s specific requirements. In complex detection conditions with changing lighting and shadows, it was revealed that YOLOv8 performed better in terms of reduced loss rates and increased precision and recall metrics.
... The results of this layer are then passed to the neck, which is based on PANnet and combines image features, passing them to the head. The head basically constitutes convolution layers responsible for generating predictions and bounding boxes [24]. Although more recent and sophisticated versions are available such as YOLOv8, this architecture is quite similar to YOLOv5, which, after training on a custom dataset, proved to be sufficient as a component of our pipeline. ...
Article
Full-text available
Edible flowers, with their increasing demand in the market, face a challenge in labor-intensive hand-picking practices, hindering their attractiveness for growers. This study explores the application of artificial intelligence vision for robotic harvesting, focusing on the fundamental elements: detection, pose estimation, and plucking point estimation. The objective was to assess the adaptability of this technology across various species and varieties of edible flowers. The developed computer vision framework utilizes YOLOv5 for 2D flower detection and leverages the zero-shot capabilities of the Segmentation Anything Model for extracting points of interest from a 3D point cloud, facilitating 3D space flower localization. Additionally, we provide a pose estimation method, a key factor in plucking point identification. The plucking point is determined through a linear regression correlating flower diameter with the height of the plucking point. The results showed effective 2D detection. Further, the zero-shot and standard machine learning techniques employed achieved promising 3D localization, pose estimation, and plucking point estimation.
... In comparison, YOLOv5 architecture consists of three parts, namely the back (CSP Darknet), neck (PANet), and head (YOLO). On the head, YOLO layer generates 3 different sizes (18x18, 36x36, and 72x72) enabling the model to perform multi-scale predictions [15]. ...
Article
Full-text available
This study underscores the critical role of accurate Chaetodontidae fish abundance observations, particularly in assessing coral reef health. By integrating deep learning algorithms (Faster R-CNN, SSD-MobileNet, and YOLOv5) into Autonomous Underwater Vehicles (AUVs), the research aims to expedite fish identification in aquatic environments. Evaluating the algorithms, YOLOv5 emerges with the highest accuracy, followed by Faster R-CNN and SSD-MobileNet. Despite this, SSD-MobileNet showcases superior computational speed with a mean average precision (mAP) of around 92.21% and a framerate of about 1.24 fps. Furthermore, employing the Coral USB Accelerator enhances computational speed on the Raspberry Pi 4, enabling real-time detection capabilities. This study incorporates centroid tracking, facilitating accurate counting by assigning unique IDs to identified objects per class. Ultimately, the real-time implementation of the system achieves 87.18% accuracy and 87.54% precision at 30 fps, empowering AUVs to conduct real-time fish detection and tracking, thereby significantly contributing to underwater research and conservation efforts.
... Network architecture for YOLOv5 of[55] (best viewed when zoomed in). ...
Article
Full-text available
Unmanned aerial vehicle (UAV) detection in real-time is a challenging task despite the advances in computer vision and deep learning techniques. The increasing use of UAVs in numerous applications has generated worries about possible risks and misuse. Although vision-based UAV detection methods have been proposed in recent years, a standing open challenge and overlooked issue is that of adverse weather. This work is the first, to the best of our knowledge, to investigate the impact of adverse weather conditions and image distortions on vision-based UAV detection methods. To achieve this, a custom training dataset was curated with images containing a variety of UAVs in diverse complex backgrounds. In addition, this work develops a first-of-its-kind dataset, to the best of our knowledge, with UAV-containing images affected by adverse conditions. Based on the proposed datasets, a comprehensive benchmarking study is conducted to evaluate the impact of adverse weather and image distortions on the performance of popular object detection methods such as YOLOv5, YOLOv8, Faster-RCNN, RetinaNet, and YOLO-NAS. The experimental results reveal the weaknesses of the studied models and the performance degradation due to adverse weather, highlighting avenues for future improvement. The results show that even the best UAV detection model’s performance degrades in mean average precision (mAP) by −50.62 points in torrential rain conditions, by −52.40 points in high noise conditions, and by −77.0 points in high motion blur conditions. To increase the selected models’ resilience, we propose and evaluate a strategy to enhance the training of the selected models by introducing weather effects in the training images. For example, the YOLOv5 model with the proposed enhancement strategy gained +35.4, +39.3, and +44.9 points higher mAP in severe rain, noise, and motion blur conditions respectively. The findings presented in this work highlight the advantages of considering adverse weather conditions during model training and underscore the significance of data enrichment for improving model generalization. The work also accentuates the need for further research into advanced techniques and architectures to ensure more reliable UAV detection under extreme weather conditions and image distortions.
... Hình 2: Kiến trúc của Yolov5 [5] Kiến trúc: Kiến trúc của YOLOv5 gồm ba phần chính: ...
Article
Deep learning-học sâu lấy ý tưởng từ bộ não sinh học, các mô hình học sâu xây dụng các thuật toán giúp máy duy nghĩ và xử lý thông tin giống như bộ não con người. Các mô hình, thuật toán học sâu phát triển ngày càng rộng rãi và được ứng dụng nhiều vào thực tiễn nhằm giảm thiểu tối đa sức lao động của con người. Bài báo trình bày về các vấn đề liên quan đến mô hình Yolov5, bao gồm nguyên lý hoạt động, áp dụng mô hình để đào tạo cho dữ liệu từ đó nhận diện biển số xe và đánh giá mô hình. Kết quả chỉ ra mô hình có độ chính xác cao chứng tỏ tính khả thi khi ứng dụng trong thực tế.
Article
With the increasing use of light alloy castings in automobiles, ensuring quality control is essential for safety. X-ray imaging offers a practical approach to detecting internal defects in cast components. This study proposes a method to automatically identify, in real time, the location, type, and size of internal defects in aluminum parts produced via high-pressure casting. The proposed two-stage method can detect, segment, and grade defects without expensive hardware in less than a second. Using the YOLOv5 algorithm for defect detection in the first stage, a mean average precision (mAP) of 0.971 was achieved. In the second stage, defects are graded through segmentation, enabling classification in accordance with international standards without requiring additional training. The methodology provides real-time and highly accurate quality control for internal defects and can be applied to different metals and standards. The dataset used in this study contains over 5,000 labelled X-ray images of aluminum cast parts and is made available as open access to support the NDT community and researchers.
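The second-stage grading, classifying each segmented defect by size, can be illustrated simply. The sketch below is hedged: the pixel-to-millimetre scale, threshold method, and grade limits are invented for illustration and are not taken from the standard or the paper.

```python
import cv2
import numpy as np

# Illustrative grade thresholds in mm^2; real limits come from the
# applicable international standard, not from this sketch.
GRADES = [(1.0, "grade 1"), (4.0, "grade 2"), (float("inf"), "grade 3")]
MM_PER_PIXEL = 0.1  # assumed image calibration

def grade_defect(xray_crop):
    """Segment a detected defect crop and grade it by physical area."""
    # Otsu thresholding separates the darker porosity from the casting body.
    _, mask = cv2.threshold(xray_crop, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    area_mm2 = np.count_nonzero(mask) * MM_PER_PIXEL ** 2
    for limit, label in GRADES:
        if area_mm2 <= limit:
            return area_mm2, label

# Usage on a detector-produced crop (the file name is hypothetical).
crop = cv2.imread("defect_crop.png", cv2.IMREAD_GRAYSCALE)
print(grade_defect(crop))
```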
Article
Full-text available
This study investigated the potential of principal component analysis (PCA) to improve real-time forest fire detection with popular algorithms such as YOLOv3 and SSD. Before YOLOv3/SSD training, the authors utilised PCA to extract features. Results showed that PCA with YOLOv3 increased the mean average precision (mAP) by 7.3%, and PCA with SSD increased the mAP by 4.6%. These results suggest that PCA is a robust tool for improving different object detection networks.
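As a concrete illustration of PCA-based feature extraction ahead of detector training, here is a minimal sketch with scikit-learn; the component count, image size, and random stand-in data are illustrative, since the paper does not publish its preprocessing code.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-in for a training set: 200 grayscale 64x64 fire images,
# flattened to vectors (real frames would be loaded from disk).
images = np.random.rand(200, 64 * 64).astype(np.float32)

# Keep the leading components; 50 is an illustrative choice.
pca = PCA(n_components=50)
codes = pca.fit_transform(images)

# Project back to image space: a denoised, feature-compressed version
# that could be fed to YOLOv3/SSD training in place of the raw frames.
reconstructed = pca.inverse_transform(codes).reshape(-1, 64, 64)
print(reconstructed.shape,
      f"variance kept: {pca.explained_variance_ratio_.sum():.2f}")
```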
Article
Full-text available
In this paper, we propose a deep convolutional neural network for camera-based wildfire detection. We train the neural network via transfer learning and use a window-based analysis strategy to increase the fire detection rate. To achieve computational efficiency, we calculate the frequency response of the kernels in the convolutional and dense layers and eliminate filters with low-energy impulse responses. Moreover, to reduce storage requirements on edge devices, we compare the convolutional kernels in the Fourier domain and discard similar filters using the cosine similarity measure in the frequency domain. We test the network's performance on a variety of wildfire video clips; the pruned system performs as well as the regular network in daytime wildfire detection and also works well on several night wildfire video clips.
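The pruning criterion, dropping low-energy kernels and near-duplicate kernels in the Fourier domain, is easy to sketch. Below is a hedged example on a single PyTorch convolutional layer; the energy and similarity thresholds are illustrative, not the paper's values.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3)  # stand-in layer

# Frequency response of each kernel: 2-D FFT over the spatial dims,
# with energy summed over input channels.
freq = torch.fft.fft2(conv.weight.detach())            # complex, (64, 3, 3, 3)
energy = freq.abs().pow(2).sum(dim=(1, 2, 3))          # per-output-filter energy

# Rule 1: keep filters whose energy exceeds a fraction of the maximum.
keep = energy > 0.05 * energy.max()                    # threshold is illustrative

# Rule 2: among kept filters, discard near-duplicates by cosine
# similarity of their frequency-domain magnitudes.
mags = freq.abs().flatten(1)                           # (64, 27)
mags = mags / mags.norm(dim=1, keepdim=True)
sim = mags @ mags.T                                    # pairwise cosine similarity
for i in range(len(sim)):
    if not keep[i]:
        continue
    for j in range(i + 1, len(sim)):
        if keep[j] and sim[i, j] > 0.98:               # near-duplicate cutoff
            keep[j] = False

print(f"kept {int(keep.sum())} of {len(keep)} filters")
```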
Article
Full-text available
Fire must be extinguished early, as it leads to economic losses and the loss of precious lives. Vision-based methods face many algorithmic difficulties due to the atypical nature of fire flames and smoke. In this study, we introduce a novel smoke detection algorithm that reduces false positive detections using spatial and temporal features learned with deep learning from factory-installed surveillance cameras. First, we calculated the global frame similarity and mean square error (MSE) to detect the movement of fire flames and smoke in the input camera streams. Second, we extracted the candidate flame and smoke areas using a deep learning algorithm (Faster Region-based Convolutional Network (R-CNN)). Third, the final flame and smoke areas were decided by local spatial and temporal information: frame difference, color, similarity, wavelet transform, coefficient of variation, and MSE. This research proposes a new algorithm that uses global and local frame features to represent object information well and thereby reduce the false positives of the deep learning method. Experimental results show that the proposed algorithm reduced false positive detections by about 99.9% while maintaining smoke and fire detection performance, confirming that the proposed method suppresses false detections effectively.
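The first step, gating on global frame similarity and MSE before running the expensive detector, is the cheapest part of such a pipeline. A minimal sketch follows; the MSE threshold and video path are illustrative.

```python
import cv2
import numpy as np

MSE_THRESHOLD = 20.0  # illustrative motion threshold

def frame_mse(a, b):
    """Mean squared error between two grayscale frames."""
    return float(np.mean((a.astype(np.float32) - b.astype(np.float32)) ** 2))

cap = cv2.VideoCapture("factory_cam.mp4")  # path is hypothetical
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read video")
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if frame_mse(prev, gray) > MSE_THRESHOLD:
        # Only frames with enough global change are passed to the
        # (much more expensive) Faster R-CNN candidate stage.
        pass  # run_detector(frame)  -- placeholder for the detector call
    prev = gray
```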
Article
Full-text available
Fire is an abnormal event which can cause significant damage to lives and property. In this paper, we propose a deep learning-based fire detection method using a video sequence, which imitates the human fire detection process. The proposed method uses a Faster Region-based Convolutional Neural Network (R-CNN) to detect suspected regions of fire (SRoFs) and of non-fire based on their spatial features. Then, the summarized features within the bounding boxes in successive frames are accumulated by a Long Short-Term Memory (LSTM) network to classify whether or not there is a fire in a short-term period. The decisions for successive short-term periods are then combined by majority voting into the final decision for a long-term period. In addition, the areas of both flame and smoke are calculated and their temporal changes are reported to interpret the dynamic fire behavior alongside the final fire decision. Experiments show that the proposed long-term video-based method successfully improves fire detection accuracy compared with still-image-based or short-term video-based methods by reducing both false detections and misdetections.
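The long-term decision rule, majority voting over successive short-term outputs, reduces to a few lines. A hedged sketch is shown below; the window length is illustrative, not the paper's setting.

```python
from collections import Counter, deque

WINDOW = 9  # short-term decisions per long-term vote (illustrative)
recent = deque(maxlen=WINDOW)

def long_term_decision(short_term_label):
    """Accumulate per-period labels ('fire' / 'no_fire') and vote."""
    recent.append(short_term_label)
    if len(recent) < WINDOW:
        return None  # not enough evidence yet
    label, votes = Counter(recent).most_common(1)[0]
    return label if votes > WINDOW // 2 else "no_fire"

# Example: seven 'fire' votes out of nine yields a long-term 'fire' decision.
for lbl in ["fire"] * 7 + ["no_fire"] * 2:
    decision = long_term_decision(lbl)
print(decision)
```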
Article
Traditional human-vision-based watchtower systems are gradually being replaced by machine-vision-based ones. The visual range of a machine-vision-based watchtower is smaller than that of a traditional human-vision-based watchtower, which has led to a sharp increase in the number of towers that must be deployed. Consequently, overlapping areas between watchtowers are larger and overlaps more frequent than in conventional watchtower networks. This poses an urgent challenge: identifying the optimal locations for deployment. If the number of watchtowers must be increased to extend detection coverage, overlaps among watchtowers are inevitable and result in viewshed redundancy. However, this redundancy in watchtower viewshed resources has not been exploited in the design of fire detection systems. Moreover, fire ignition factors, such as climatic factors, fuels, and human behaviour, cause the fire occurrence risk to differ among forest areas, so the area's fire risk map should also be considered in watchtower deployment. A fire risk model is first used to produce a fire risk map, on which we build a new watchtower deployment model that optimizes the watchtower system by integrating viewshed analysis, location allocation, and multi-coverage of high-fire-risk areas under budget constraints. We use a real dataset of a forest park to evaluate the applicability of our approach, comparing its performance against the FV-NB (Full coVerage with No Budget constraint) and XV-B (maXimum possible coVerage with a Budget constraint) algorithms. The evaluation results demonstrate that our approach achieves higher coverage gain and excellent multiple-coverage of fire-risk areas by integrating the viewshed and the fire risk level into location allocation while satisfying the overall coverage and budget requirements. The approach is most suitable in environments with moderate watchtower density, where overlapping areas are frequent, and improves multiple-coverage of high-fire-risk areas by as much as 8.9–17.3%.
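The core of such a deployment model, choosing tower sites that maximize risk-weighted coverage under a budget, is an instance of weighted maximal coverage. Below is a hedged greedy sketch; the grid, risk map, tower cost, and flat-terrain viewshed radius are all invented for illustration and stand in for a real viewshed analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
risk = rng.random((50, 50))               # illustrative fire-risk map
candidates = [(r, c) for r in range(0, 50, 5) for c in range(0, 50, 5)]
TOWER_COST, BUDGET, RADIUS = 1.0, 8.0, 7  # all illustrative

def viewshed(site):
    """Cells within RADIUS of the site (a flat-terrain stand-in
    for real viewshed analysis)."""
    r0, c0 = site
    rr, cc = np.ogrid[:50, :50]
    return (rr - r0) ** 2 + (cc - c0) ** 2 <= RADIUS ** 2

covered = np.zeros((50, 50), bool)
chosen, spent = [], 0.0
while spent + TOWER_COST <= BUDGET:
    # Pick the site adding the most not-yet-covered risk.
    gains = [(risk[viewshed(s) & ~covered].sum(), s)
             for s in candidates if s not in chosen]
    gain, best = max(gains)
    if gain <= 0:
        break
    chosen.append(best)
    covered |= viewshed(best)
    spent += TOWER_COST

print(chosen, f"risk covered: {risk[covered].sum():.1f}")
```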
Article
Optimizing the effectiveness of early wildfire detection systems is of significant interest to the community. To this end, watchtower-based wildfire observation remains practical, often in conjunction with state-of-the-art technologies such as automated vision systems and sensor networks. One of the major challenges the community faces is the optimal expansion of existing systems, particularly in multiple stages, due to various practical, political, and financial constraints. Incrementally expanding a watchtower network while preserving, or making minimal changes to, the existing system is a challenging task, particularly under coverage and financial constraints. Conventionally, this problem has been treated as a multi-objective optimization problem, and currently employed methods predominantly re-solve the full-fledged optimization problem at every stage of the expansion process. In this paper, for the first time, we instead treat the expansion as a submodular set-function maximization problem. After theoretically proving that the expansion problem is submodular, we provide four different models and matching algorithms to handle the various cases that arise during incremental expansion. Our evaluation on a practical dataset from a forest park in China, namely the NanJing forest park, shows that our algorithms provide excellent coverage by integrating visibility analysis and location allocation while meeting stringent budgetary requirements. The proposed approach can be adapted to areas in other countries.
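Because coverage gain is submodular, the incremental expansion admits the classic greedy heuristic with its well-known approximation guarantee. The sketch below keeps the existing towers fixed and adds candidates by marginal coverage gain per unit cost; the viewsheds, costs, and budget are invented for illustration and do not reproduce the paper's four models.

```python
import numpy as np

rng = np.random.default_rng(1)
N_CELLS = 1000
# Illustrative viewsheds: each candidate tower sees a random cell subset.
viewsheds = {i: set(rng.choice(N_CELLS, 80, replace=False)) for i in range(40)}
costs = {i: float(rng.uniform(1.0, 3.0)) for i in range(40)}

existing = {0, 1}  # towers already deployed (held fixed during expansion)
covered = set().union(*(viewsheds[t] for t in existing))
budget = 6.0

# Greedy submodular maximization: repeatedly add the candidate with the
# best marginal coverage gain per unit cost that still fits the budget.
added, spent = [], 0.0
while True:
    best, best_ratio = None, 0.0
    for t, shed in viewsheds.items():
        if t in existing or t in added or spent + costs[t] > budget:
            continue
        ratio = len(shed - covered) / costs[t]
        if ratio > best_ratio:
            best, best_ratio = t, ratio
    if best is None:
        break
    added.append(best)
    covered |= viewsheds[best]
    spent += costs[best]

print(f"added towers {added}, coverage {len(covered)}/{N_CELLS}, "
      f"cost {spent:.1f}")
```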