Article
An Improved Faster R-CNN Method to Detect Tailings Ponds
from High-Resolution Remote Sensing Images
Dongchuan Yan 1,2,3, Guoqing Li 1,*, Xiangqiang Li 3, Hao Zhang 1, Hua Lei 3, Kaixuan Lu 1, Minghua Cheng 3
and Fuxiao Zhu 3


Citation: Yan, D.; Li, G.; Li, X.; Zhang, H.; Lei, H.; Lu, K.; Cheng, M.; Zhu, F. An Improved Faster R-CNN Method to Detect Tailings Ponds from High-Resolution Remote Sensing Images. Remote Sens. 2021, 13, 2052. https://doi.org/10.3390/rs13112052
Academic Editor: Naoto Yokoya
Received: 28 March 2021
Accepted: 17 May 2021
Published: 23 May 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; yandc@radi.ac.cn (D.Y.); zhanghao612@radi.ac.cn (H.Z.); lukx@radi.ac.cn (K.L.)
2 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
3 Institute of Mineral Resources Research, China Metallurgical Geology Bureau, Beijing 101300, China; lixiangqiang@cmgb.cn (X.L.); leihua@cmgb.cn (H.L.); chengminghua@cmgb.cn (M.C.); zhufuxiao@cmgb.cn (F.Z.)
* Correspondence: ligq@aircas.ac.cn
Abstract:
Dam failure of tailings ponds can result in serious casualties and environmental pollution. Therefore, timely and accurate monitoring is crucial for managing tailings ponds and preventing damage from tailings pond accidents. Remote sensing technology facilitates the regular extraction and monitoring of tailings pond information. However, traditional remote sensing techniques are inefficient and have low levels of automation, which hinders the large-scale, high-frequency, and high-precision extraction of tailings pond information. Moreover, research into the automatic and intelligent extraction of tailings pond information from high-resolution remote sensing images is relatively rare. The deep learning end-to-end model offers a solution to this problem. This study proposes an intelligent, high-precision method for extracting tailings pond information from high-resolution images that improves a deep learning target detection model: the faster region-based convolutional neural network (Faster R-CNN). A comparison study was conducted and the model input size with the highest precision was selected. The feature pyramid network (FPN) was adopted to obtain multiscale feature maps with rich context information, the attention mechanism was used to improve the FPN, and the contribution degrees of feature channels were recalibrated. Model test results based on Google Earth high-resolution remote sensing images indicate a significant increase in the average precision (AP) and recall of tailings pond detection over those of Faster R-CNN, by 5.6% and 10.9%, reaching 85.7% and 62.9%, respectively. Considering the current rapid increase in high-resolution remote sensing images, this method will be important for the large-scale, high-precision, and intelligent monitoring of tailings ponds and will greatly improve decision-making efficiency in tailings pond management.
Keywords: tailings pond; deep learning; object detection; Faster R-CNN
1. Introduction
Tailings ponds are typically storage sites enclosed by dams and located around valley mouths or on flat terrain, where tailings or other industrial waste discharged after ore extraction are deposited by metal and nonmetal mining companies [1]. Tailings ponds are therefore a source of high potential environmental risk, with accidents leading to serious damage to the surrounding environment [2]. Therefore, tailings pond monitoring has become a focal point of environmental emergency supervision. In the past century, the collapse of tailings dams and the resulting mud-rock flows have caused nearly 2000 deaths [3]. Moreover, there has been a high incidence of environmental emergencies caused by tailings ponds in recent years, which have resulted in a large number of casualties and serious environmental pollution [4]. Therefore, to improve the emergency management of
Remote Sens. 2021,13, 2052. https://doi.org/10.3390/rs13112052 https://www.mdpi.com/journal/remotesensing
tailings ponds and enable early warning of disasters, a rapid, accurate, and comprehensive
method for identifying the location and status of tailings ponds and providing high-
frequency, regular information updates is urgently required.
Early methods of tailings pond monitoring often relied on manpower. As tailings ponds are typically located in remote mountainous areas, these methods suffered from being time-consuming and labor-intensive, with low efficiency and low precision [5]. Remote sensing technology is an important data acquisition method with the advantages of rapid, large-scale, and continuous dynamic observation, and it is less limited by ground conditions. It can therefore compensate for the shortcomings of traditional monitoring methods, making it an important monitoring approach for environmental protection [6–8]. For example, Liu et al. [9] used Thematic Mapper (TM) images for rapid and efficient monitoring of the water pollution status of a tailings pond in the Hushan mining area. Moreover, Zhao [10] applied remote sensing monitoring to tailings ponds in Taershan, Shanxi Province to extract the number, area, mineral type, and other information of tailings ponds over a large area and in a short time. Based on the composition, structure, and spectral characteristics of tailings, Hao et al. [11] developed tailing indexes and a tailing extraction model, then extracted mine tailing information using Landsat 8 data from Hubei Province, China. Ma et al. [12] extracted tailings pond data from the Changhe mining area in Hebei Province based on the spectral and textural characteristics of Landsat 8 OLI images. Xiao et al. [13] monitored the distribution of tailings ponds in Zhangjiakou and their environmental risks using object-oriented image analysis technology and drone images. Furthermore, Riaza et al. [14] mapped pyrite waste and dumps in the mining areas of the Iberian Pyrite Belt using Hyperion and aerial HyMap hyperspectral data.
Therefore, multisource remote sensing data have already been used in the identifica-
tion and monitoring of tailings ponds. However, these methods are limited by a heavy
workload and low level of automation. Owing to relatively large disparities in the scale,
shape, background, and other aspects of tailings ponds on remote sensing images, it is
challenging to achieve large-scale, high-frequency, and intelligent identification and moni-
toring of tailings ponds. Despite rapid increases in the number of high-resolution remote
sensing images, studies on the automatic and intelligent extraction of tailings ponds are
relatively rare. However, the deep learning end-to-end model provides a solution to this
problem. Target detection technology based on deep learning can not only determine the
category of the target but also predict its location. For example, Li et al. [15] used a deep learning-based target detection model (Single Shot MultiBox Detector, SSD [16]) to extract and analyze tailings pond distributions in the Jing–Jin–Ji (Beijing–Tianjin–Hebei) region of China. Their study proved the effectiveness of the deep learning method for target detection with high-resolution remote sensing images, which greatly improved the automation level and efficiency of tailings pond identification over traditional methods. With the rapid development of deep learning technology in recent years, a series of convolutional neural networks (AlexNet [17], VGGNet [18], ResNet [19], DenseNet [20]) have achieved continuous progress and success in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This has established the leading position of deep learning technology in the field of computer vision and provided pretrained feature extraction networks for deep learning-based target detection models.
Compared with traditional methods, the end-to-end target detection method based on deep learning has notable advantages in terms of precision, efficiency, and automation level [21]. Deep learning-based target detection methods can be divided into two types: one-stage detectors and two-stage detectors. Two-stage detectors generate a series of region proposals in the first stage, then perform category classification and accurate position regression on the region proposals in the second stage. At present, the majority of two-stage detectors are developed and optimized based on the region-based convolutional neural network (R-CNN) [22], including Fast R-CNN [23] and Faster R-CNN [24]. Faster R-CNN is a classic two-stage target detection model that automatically generates region proposals through the region proposal network (RPN), thereby integrating feature extraction,
region proposal generation, bounding box classification, and position regression into one
network structure, which significantly improves the precision and calculation speed of
target detection. One-stage networks regard all positions in the image as potential targets and perform classification prediction and position regression directly at each position on the feature map. One-stage detector models in the You Only Look Once (YOLO) series, including YOLO [25], YOLOv3 [26], and YOLO9000 [27], are extremely fast owing to their simple structures. However, their detection precision is lower than that of two-stage detectors. The SSD model has a slower detection speed than YOLO and a detection precision between that of YOLO and two-stage detectors. In summary, compared with one-stage detectors, two-stage detectors have high detection precision and a low false detection rate but a relatively slow detection speed and poor real-time performance. One-stage detectors have simple network structures and fast detection speeds but relatively low detection precision and poor detection performance for small and dense targets, which is likely to generate positioning errors [28]. Mask R-CNN [29] extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition, which also enhances target detection performance. Li et al. [30] proposed a novel framework based on Mask R-CNN to extract new and old rural buildings even when labels are scarce, achieving a much higher mean average precision (mAP) than the orthodox Mask R-CNN model. Bhuiyan et al. [31] applied Mask R-CNN to automatically detect and classify ice-wedge polygons on the North Slope of Alaska and found promising model performance for all candidate image scenes with varying tundra types. Zhao, Kang, et al. [32] presented a method combining Mask R-CNN with building boundary regularization whose performance is comparable to that of Mask R-CNN. Mask R-CNN is an instance segmentation model, which further improves the performance of target detection. However, its training samples require the accurate boundary of each target to be marked. Unlike buildings and other targets, tailings ponds have complex boundaries, some of which are difficult to identify, so marking their accurate boundaries is difficult and requires a great deal of work. Therefore, a target detection model, which only requires marking the bounding box of each tailings pond, was selected for this study.
To detect tailings pond targets from high-resolution remote sensing images, two-
stage detectors satisfy the requirements of detection speed and exhibit better detection
precision than one-stage detectors. Therefore, a two-stage detector is adopted in this study
for the automatic identification of tailings pond targets. The Faster R-CNN model is a two-stage detector; however, when applied to target identification in high-resolution remote sensing images with complex backgrounds, its detection precision is relatively low [33,34]. Therefore, further improvement through fast-developing technologies related to deep learning is required to enhance the detection precision of tailings ponds. This study presents an improved Faster
R-CNN model that significantly increases the detection precision of tailings ponds with
high-resolution remote sensing images. Considering the rapid increase in the number of
high-resolution remote sensing images, this method has important applications for the
large-scale, high-precision, and intelligent identification of tailings ponds. This improved
method will greatly improve the decision-making efficiency of tailings pond management.
2. Materials and Methods
2.1. Sampling Data Generation
Hebei Province, Shanxi Province, and Liaoning Province in northern China, which
have a large number of tailings ponds, were selected as the study area for sample labeling.
By selecting tailings pond samples in a relatively large area, the limitation of sample
specificity in small areas can be reduced to a certain extent, thereby enhancing the model’s
generalization ability in large-scale applications. Based on Google Earth high-resolution remote sensing image data, a total of 1200 tailings ponds were labeled as sample data to train and test the models of interest. The Google Earth high-resolution images are level-18 data with a spatial resolution of 0.5 m, comprising three bands (red, green, and blue) at an 8-bit radiometric depth. The geographical distribution of the tailings pond samples is shown in Figure 1.
Figure 1. Geographical distribution map of tailings pond samples.
The shape of tailings pond facilities on the ground is determined by the natural landform features as well as artificial and engineering features [35]. Due to the influence of topography and geomorphology, mineral resource mining, mining technology, operation scale, and other factors, tailings ponds can be classified into four types: cross-valley, hillside, stockpile, and cross-river [15]. Cross-valley tailings ponds are formed by building a dam at
a valley mouth. Their main characteristics are a relatively short initial dam and a relatively
long reservoir area (Figure 2a). Hillside tailings ponds are surrounded by a dam built at
the foot of a mountain slope. Their main characteristics are a relatively long initial dam and
a relatively short reservoir area (Figure 2b). Stockpile tailings ponds are formed by a dam
at the periphery of a flat area. Their characteristics are a high engineering workload for the
initial dam and subsequent dams of the tailings ponds and a relatively low tailings dam
height (Figure 2c). Cross-river tailings ponds are formed by dams built to the upstream
and downstream of the riverbed. Their main characteristics are a large upstream catchment
area and a complex tailings pond and upstream drainage system. As cross-river tailings
ponds are rarely distributed in China, the sample tailings ponds labeled in this study only
included the other three types.
Based on the characteristics of the three types of tailings pond and their remote sensing
image features, a total of 1200 tailings pond samples were labeled in this study, 80% of
which were used as training samples, with the remaining 20% used as test samples. To
improve sample labeling efficiency, the samples were first marked as the external polygon
vector of the tailings pond. Thereafter, they were uniformly processed into an external
rectangle, which was used as the final detection labeling target, based on the program.
The red boxes in Figure 2 indicate the labeled ground truth bounding boxes. Due to computational limitations such as memory and GPU video memory, the remote sensing image data were sliced into image blocks of appropriate sizes and then resampled before being input to the model to complete the calculation. According to a statistical analysis of the labeled tailings pond samples, their lengths and widths typically ranged from 60 m to 1300 m, and the resolution of the image data was 0.5 m. To ensure the integrity of tailings ponds in the image slices as much as possible, the image slice size was set to 2600 × 2600 pixels for slice processing in this study. An overlapping area of 512 pixels was set between the image slices, and after processing, image slices without tailings pond information were eliminated. Thus, a total of 1697 effective training slices and 429 test slices were finally generated. The sample set information is listed in Table 1.
Figure 2.
Remote sensing image of sample tailings pond features, with the ground truth bounding
boxes shown in red: (a) cross-valley type, (b) hillside type, and (c) stockpile type.
Table 1. The sample set information.
Sample Set Spatial Resolution (m) Size (Pixels) Number of Slices
Train set 0.5 2600 × 2600 1697
Test set 0.5 2600 × 2600 429
2.2. Methodology
2.2.1. Proposed Optimized Method
Faster R-CNN is a classic deep learning-based target detection model in the field of computer vision [36], which exhibits relatively high recognition precision and efficiency for large target areas. With the continuous development of deep learning technology, there is still room to improve the precision of the Faster R-CNN model for the detection of tailings pond targets in high-resolution remote sensing images. In this study, an improved Faster R-CNN model was developed, whose structure is shown in Figure 3. First, after resizing, the remote sensing image slices were input to ResNet-101 for feature extraction, and multilevel features were output. Second, the multilevel features were input into the feature pyramid network (FPN) [37] with the attention mechanism (AM) for feature fusion to generate multiscale feature maps with rich context information. Third, the feature maps were input into the RPN to generate region proposals after predicting the category and bounding box. Fourth, the feature maps and region proposals were input into the ROIPooling layer to generate proposal feature maps. Finally, the proposal feature maps were sent to the subsequent fully connected (FC) layers to determine the target category and obtain the precise position of the target bounding box.
Figure 3. Proposed optimized network structure.
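The five steps above can be summarized schematically; the callable names below are placeholders for the modules in Figure 3 (this is a sketch of the data flow, not the authors' implementation).

```python
def detect(image_slice, backbone, fpn_am, rpn, roi_pool, head):
    """Schematic data flow of the improved Faster R-CNN pipeline."""
    feats = backbone(image_slice)             # ResNet-101 multilevel features
    pyramid = fpn_am(feats)                   # FPN fused with channel attention
    proposals = rpn(pyramid)                  # candidate boxes from the RPN
    roi_feats = roi_pool(pyramid, proposals)  # fixed-size proposal features
    return head(roi_feats)                    # FC layers: class + refined box
```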
Compared with the Faster R-CNN, the proposed model exhibits the following im-
provements: (1) ResNet-101 was used as the feature extraction network to enhance the
image feature extraction capability, and the FPN was adopted to perform feature fusion on
the multilevel feature output from the ResNet-101 to obtain feature maps with rich semantic
and location information; (2) the AM was adopted to improve the FPN. The contribution
degrees of feature channels were recalibrated so that features with high contribution de-
grees were enhanced and features with low contribution degrees were suppressed, thereby
further improving FPN performance; (3) the image slice size was set according to the
statistical results of the tailings pond samples, where the integrity of the tailings ponds in
the image slices was maintained as much as possible. In addition, the model input size
with the highest precision was selected by conducting a comparison study.
Attention Mechanism (AM)
The visual AM is a brain signal processing mechanism unique to human vision. In
focus target areas, more attention resources will be allocated to obtain more detailed
information, whereas information in other areas will be suppressed. Thus, high-value
information can be acquired rapidly from a large amount of information, which greatly
improves the information processing efficiency of the brain. Therefore, the AM has become an important concept in neural networks in recent years [38], as it can greatly improve network performance by focusing on processing only key information or information of interest among large amounts of input information. In normal cases, the feature layer extracted by a deep CNN is used, where each channel represents a different feature and also makes a different contribution to network performance. SENet [39] uses the AM to learn the contribution weight of each channel of the feature layer and automatically obtain
the importance of each feature channel. According to the importance level, features with
high contributions are then enhanced and those with low contributions are suppressed,
thereby improving network performance. Therefore, the channel attention mechanism
block was adopted in the design of the FPN in this study, which further improved the
detection precision for tailings pond targets.
As shown in Figure 4, the input F of the channel attention mechanism block represents
the feature map, H represents the height of the feature map, W represents the width of
the feature map, and C represents the number of channels in the feature map. First, F was compressed into a 1 × 1 × C one-dimensional vector via global average pooling (GAP). Each value in the vector has a global receptive field, characterizing the global distribution of responses on the feature channels. The two subsequent FC layers were used to model the correlations between channels. The first FC layer reduces the number of feature channels to C/r, where r is the scaling factor. After passing through the ReLU activation function, the second FC layer increases the number of feature channels back to the original C. Then, the Sigmoid function was used to obtain normalized weights representing the input feature contributions. Finally, through the Scale operation, the input feature was multiplied by the weight, which was extended to an equal dimension, to output the result A. Two FC layers can add more nonlinearity; however, if the scaling factor r of the first layer is too small, more parameters will be added and the calculation amount will increase; if it is too large, more features will be lost and network performance will be reduced. After balancing the amount of calculation against network performance, the value of r was set to 4 in this study.
Figure 4. Schematic of the channel attention mechanism block.
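A minimal NumPy sketch of the block in Figure 4 (GAP → FC/ReLU → FC/Sigmoid → Scale); the weight matrices `W1` and `W2` stand in for the two learned FC layers and are assumptions for illustration, not trained parameters from the study.

```python
import numpy as np

def channel_attention(F, W1, W2):
    """Recalibrate the channels of F (shape C x H x W).
    W1: (C//r, C) squeeze FC; W2: (C, C//r) excite FC."""
    z = F.mean(axis=(1, 2))                  # GAP -> 1x1xC global descriptor
    s = np.maximum(W1 @ z, 0.0)              # first FC + ReLU, C -> C/r
    w = 1.0 / (1.0 + np.exp(-(W2 @ s)))      # second FC + Sigmoid, C/r -> C
    return F * w[:, None, None]              # Scale: per-channel reweighting
```

With r = 4 as chosen in this study, a 256-channel feature map would use a 64-dimensional bottleneck, i.e., `W1` of shape (64, 256) and `W2` of shape (256, 64).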
Proposed Feature Pyramid Network (FPN)
With the continuous development of deep learning technology in recent years, many
convolutional neural networks have overcome the problems of gradient dispersion and
gradient explosion caused by an increase in network depth to exhibit powerful feature
extraction capabilities, for example, ResNet and DenseNet [40]. However, for single-scale
features, although deep features have rich semantic information, there is a serious loss of
location information. In target detection applications, location information is crucial. In
comparison, shallow features have weak semantic information but are sensitive to location
information. Therefore, the FPN was used to fuse deep and shallow multiscale features
to fully exploit the feature semantics and location information, thereby further improving
the network performance. As well as using ResNet-101 to improve the feature extraction
capabilities, this study also adopted the channel attention mechanism block and designed
an improved FPN, which fused features at different levels to obtain a more informative
multiscale feature map, thereby greatly improving the detection precision of the model.
The improved FPN is shown in Figure 5.
Figure 5. Feature pyramid network (FPN) structure. AB represents the channel attention mechanism block, 2× up represents two-times upsampling, 256 represents the number of output channels, and ⊕ represents element-wise addition.
As shown in Figure 5, ResNet-101 was used as the feature extraction network in this study. C1, C2, C3, C4, and C5 in the network were used to extract different levels of features, with C2–C5 selected for feature fusion; their numbers of channels were 256, 512, 1024, and 2048, respectively. Combining features of different levels via the FPN requires the same number of feature channels. Therefore, a 1 × 1 convolution operation was used to reduce the dimensionality of C2–C5. The corresponding outputs were CC2–CC5, each with 256 channels. The channel attention mechanism block was used to calculate the contribution weight of each channel of CC2–CC5, which were redistributed according to their weights. Thus, the contributions of important feature channels were further enhanced. The corresponding outputs were A2–A5, and the number of channels remained unchanged at 256. When performing feature fusion via the FPN, pixels corresponding to features of different levels were added; in addition to the same number of feature channels, the numbers of rows and columns in the feature layers must also be the same. Therefore, the nearest-interpolation method was applied in this study to perform two-times upsampling on A5, A4, and A3. Subsequently, element-wise addition was performed with A4, A3, and A2, respectively, to complete the level-by-level feature fusion, where the outputs were AA2, AA3, and AA4 and the number of channels was 256. A 3 × 3 convolution operation was performed on A5, where the output was P5 and the number of channels was 256. Maximum pooling of 1 × 1 with a stride of 2 was performed on P5; the output was P6, with 256 channels. A 3 × 3 convolution operation was performed on AA2, AA3, and AA4, where the outputs were P2, P3, and P4 and the number of channels was 256. The feature maps output by the FPN were {P2, P3, P4, P5, P6}.
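A single fusion step from Figure 5 (nearest-interpolation two-times upsampling followed by element-wise addition) can be sketched in NumPy; the function names are ours, and matching 256-channel inputs are assumed.

```python
import numpy as np

def upsample2x_nearest(x):
    """Nearest-interpolation 2x upsampling: (C, H, W) -> (C, 2H, 2W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(higher, lateral):
    """Element-wise addition of an upsampled higher-level map (e.g. A5)
    with the next lateral map (e.g. A4); channel counts must match."""
    return upsample2x_nearest(higher) + lateral
```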
Region Proposal Network (RPN)
The most prominent contribution of Faster R-CNN is the RPN, which uses a CNN
instead of the traditional selective search method to generate candidate regions, thereby
significantly improving network speed and precision. The RPN is used to generate region proposals. In this study, the multiscale feature maps {P2, P3, P4, P5, P6} output from the FPN were used to replace the single-scale feature map to generate region proposals. The areas of the anchors for the different scale features were set to {32², 64², 128², 256², 512²}, and the anchor aspect ratios were set to {1:2, 1:1, 2:1}.
In this study, the feature maps input into ROIPooling with the region proposals include {P2, P3, P4, P5} rather than a single-scale feature map. In other words, each region proposal needs to slice its proposal feature map from {P2, P3, P4, P5}. The following formula was used to select the feature map with the most appropriate scale for each region proposal:

$$k = k_0 + \left\lfloor \log_2\!\left(\sqrt{wh}/H\right) \right\rfloor \quad (1)$$

where $k$ represents the level of the feature map corresponding to the region proposal and is rounded down during calculation; $k_0$ was set as the highest level of the feature maps (there were four levels of feature maps in this study, so $k_0$ was set to four); $w$ and $h$ represent the width and height of the region proposal, respectively; and $H$ represents the model input height (the height and width are equal in this study) after resize processing of the image slices. This is a reasonable approach because a large region proposal corresponds to a high-level feature map when generating the proposal feature map, which better detects large targets; similarly, a small region proposal corresponds to a low-level feature map, which better detects small targets.
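The level-assignment rule translates directly into code. Two details below are assumptions made for illustration: the square root over the proposal area wh (following the standard FPN level-assignment convention) and the clamping of k to the available levels P2–P5.

```python
import math

def assign_level(w, h, H, k0=4):
    """Pick the FPN level for a w x h region proposal, per Equation (1)."""
    k = k0 + math.floor(math.log2(math.sqrt(w * h) / H))
    return max(2, min(5, k))  # clamp to the P2..P5 feature maps
```

For a model input of 800 pixels, a proposal as large as the whole input lands on the top level (k0 = 4), while proposals a quarter of that size drop to P2.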
2.2.2. Accuracy Assessment
In the field of deep learning, precision and recall are commonly used evaluation indicators for model performance [41]. When evaluating target detection results, the ground truth bounding box (GT) is the true bounding box of the predicted target, whereas the predicted bounding box (PT) is the predicted bounding box of the target. The area encompassed by both the predicted bounding box and the ground truth is denoted as the area of union, the intersection is denoted as the area of overlap, and the intersection over union (IOU) is calculated as follows:

$$\mathrm{IOU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} \quad (2)$$

TP (true positive) refers to the number of detection boxes with correct detection results and an IOU > 0.5; FP (false positive) refers to the number of detection boxes with incorrect detection results and an IOU ≤ 0.5; and FN (false negative) refers to the number of GTs that are not detected. The model evaluation indicators used in this study were precision and recall. Precision refers to the ratio of the number of correct detection boxes to the total number of detection boxes, whereas recall refers to the ratio of the number of correct detection boxes to the total number of true bounding boxes. Their corresponding calculation formulas are as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (3)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (4)$$
The average precision (AP) of the target, the precision–recall curve (PRC), and the mean average precision (mAP) are three common indicators widely applied to evaluate the performance of object detection methods [42]. AP is typically the area under the PRC, and mAP is the average of the AP values over all classes; the larger the mAP value, the better the object detection performance. As this study detects only one target class, namely tailings ponds, AP was used as the main model evaluation indicator, with recall and the time consumption of a single iteration used as reference indicators.
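Equations (2)–(4) in runnable form, with boxes given as (x1, y1, x2, y2) corners; the helper names are ours, written for illustration.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)       # area of overlap
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)   # area of union
    return inter / union if union else 0.0

def precision_recall(tp, fp, fn):
    """Equations (3) and (4) from the TP/FP/FN counts."""
    return tp / (tp + fp), tp / (tp + fn)
```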
2.2.3. Loss Function
The loss function is calculated as follows [24]:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \alpha \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \quad (5)$$

where $N_{cls}$ represents the number of anchors in the mini-batch, $N_{reg}$ represents the number of anchor locations, $\alpha$ represents the weight balance parameter (set to 10 in this study), and $i$ represents the index of an anchor in the mini-batch.
Furthermore, p_i represents the predicted classification probability of the anchor, and p_i* is its ground-truth label: when the anchor was positive, p_i* = 1, and when it was negative, p_i* = 0. Anchors that met either of the following two conditions were considered positive: (1) the anchor has the highest intersection-over-union (IOU) overlap with a ground truth box; or (2) the IOU overlap of the anchor with a ground truth box is > 0.7. Conversely, when the IOU overlap of the anchor with every ground-truth box was < 0.3, the anchor was considered negative. Anchors that were neither positive nor negative were not included in the training.
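The positive/negative/ignored assignment described above can be sketched as follows; the function name, the nested-list IOU layout (one row per anchor, one column per ground-truth box), and the 1/0/−1 label encoding are illustrative assumptions:

```python
def label_anchors(anchor_gt_ious, hi=0.7, lo=0.3):
    """Label each anchor 1 (positive), 0 (negative), or -1 (ignored).

    Condition (1): the anchor with the highest IOU for some ground-truth
    box is positive. Condition (2): any anchor with IOU > `hi` is positive.
    Anchors whose IOU with every ground truth is < `lo` are negative.
    """
    num_gt = len(anchor_gt_ious[0])
    # best IOU achieved for each ground-truth box, over all anchors
    best_per_gt = [max(row[j] for row in anchor_gt_ious) for j in range(num_gt)]
    labels = []
    for row in anchor_gt_ious:
        m = max(row)
        is_best = any(row[j] == best_per_gt[j] and best_per_gt[j] > 0
                      for j in range(num_gt))
        if m > hi or is_best:
            labels.append(1)
        elif m < lo:
            labels.append(0)
        else:
            labels.append(-1)   # excluded from training
    return labels
```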
L_cls(p_i, p_i*) = −log[p_i p_i* + (1 − p_i)(1 − p_i*)]    (6)

L_reg(t_i, t_i*) = Σ_{i∈{x,y,w,h}} SmoothL1(t_i − t_i*)    (7)

SmoothL1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise    (8)
For the bounding box regression, we adopted the parameterization of four coordinates, defined as follows:

t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a)

t_x* = (x* − x_a)/w_a,  t_y* = (y* − y_a)/h_a,  t_w* = log(w*/w_a),  t_h* = log(h*/h_a)
where x and y represent the coordinates of the center of the bounding box, and w and h represent its width and height, respectively. Furthermore, x, x_a, and x* correspond to the predicted box, anchor box, and ground truth box, respectively, and similarly for y, w, and h.
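The parameterization and its inverse can be sketched as follows, with boxes given as (center x, center y, w, h) tuples; the function names are illustrative:

```python
import math

def encode_box(box, anchor):
    """Parameterize a (cx, cy, w, h) box relative to an anchor box."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))

def decode_box(t, anchor):
    """Invert the parameterization to recover the box from its offsets."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya,
            wa * math.exp(tw), ha * math.exp(th))
```

Encoding then decoding against the same anchor is an exact round trip, which is what lets the network regress offsets rather than raw coordinates.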
2.2.4. Training and Optimization
As Faster R-CNN was employed as the baseline network, the hyperparameters were set according to Faster R-CNN. This study adopted a transfer learning strategy: the base network was ResNet-101, initialized with weights pretrained on ImageNet, and all new layers were initialized with Kaiming normal initialization. The network was trained on a 64-bit Ubuntu 20.04 LTS operating system with an NVIDIA GeForce RTX 3080 GPU, a Xeon E5 CPU, and CUDA version 11.1. The model was trained for 70 epochs on the training set. Stochastic gradient descent was used as the optimizer; the initial learning rate was set to 0.02, the momentum to 0.9, the weight decay to 0.0001, and the batch size to 2. The hyperparameter settings are listed in Table 2.
Table 2. Hyperparameter settings.
Hyperparameter Learning Rate Momentum Weight_Decay Batch Size
Value 0.02 0.9 0.0001 2
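As a worked illustration of how the Table 2 values enter a single SGD update, here is a scalar sketch. The convention of folding weight decay into the gradient before the momentum accumulation follows the common PyTorch-style update; that convention, and the function itself, are assumptions for illustration, not details stated in the paper:

```python
def sgd_step(w, grad, velocity, lr=0.02, momentum=0.9, weight_decay=0.0001):
    """One SGD-with-momentum update for a single scalar parameter.

    g = grad + weight_decay * w   (L2 penalty folded into the gradient)
    v = momentum * v + g          (momentum accumulation)
    w = w - lr * v                (parameter step)
    """
    g = grad + weight_decay * w
    v = momentum * velocity + g
    return w - lr * v, v
```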
3. Results and Discussion
In this study, the channel attention mechanism block was adopted to design an
improved FPN on the basis of the Faster R-CNN model. The improved model exhibits a
significant improvement in the detection performance of tailings pond targets compared to
the Faster R-CNN model. Based on the data set of tailings pond samples constructed in
this study, the model input size greatly affected the detection precision. The results show that when resize = [800, 800], the detection precision for tailings ponds is the highest, and both the AP and recall of tailings pond detection increase significantly in the improved model, by 5.6% and 10.9%, reaching 85.7% and 62.9%, respectively. These results are analyzed in detail in the following sections.
3.1. Effect of Different Input Sizes
Based on the Faster R-CNN model, the model detection precision was compared for
different model input sizes. It was assumed that the size of the input image slice was [W,
H, C], where W, H, and C are the width of the slice, height of the slice, and number of
channels in the slice, respectively. The size of the image slice in the tailings pond sample
data set was [2600, 2600, 3]. According to the bilinear interpolation resampling method, the
image slices were used as the model input after resize processing in W and H dimensions,
during which the number of channels C remained unchanged. After downsampling, five W and H sizes were tested: resize = [400, 400], [600, 600], [800, 800], [1000, 1000], and [1200, 1200]. The resize with the highest precision was selected as the model input size.
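The bilinear resampling step can be sketched per channel as follows. This is a plain-Python illustration using pixel-center alignment, which is one common convention; libraries differ in corner handling, and per-channel application to a [W, H, C] slice is assumed:

```python
def bilinear_resize(img, out_h, out_w):
    """Resample a 2-D grid (one channel) to (out_h, out_w) by bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # map the output pixel center back into input coordinates, clamped
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # interpolate horizontally on the two bracketing rows, then vertically
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```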
According to the training loss curves in Figure 6, the trends of the model loss curves
are approximately the same for different resize sizes, the loss values are similar, and all
values converge well. However, the test precision curves of the model (Figure 7) indicate
that the model exhibits the strongest generalization ability and maintains the highest test
precision when resize = [800, 800]. The model evaluation indicator results for different
resize sizes are listed in Table 3. When resize = [800, 800], the model AP reaches a maximum
of 80.1%. Compared with resize = [600, 600], the recall is slightly smaller (1%) but the
AP is 2.8% higher. However, as the resize size either increases or decreases, both the AP
and recall of the model decrease, especially for resize = [1200, 1200], where AP and recall
drop to their minimum values of 69.3% and 41.8%, respectively. According to the time
consumption of a single iteration, the calculation amount of the model increases as the
resize size increases, resulting in a longer calculation time. Compared to resize = [400,
400], when resize = [800, 800], the iteration time only increases slightly (0.081 s) but the
AP increases by 5.8% and the recall increases by 0.2%. Overall, a resize value of [800, 800]
generates optimal model performance.
Table 3. Test results for different resize sizes.
Resize AP (%) Recall (%) Iteration Time (s)
[400, 400] 74.3 51.8 0.105
[600, 600] 77.3 53.0 0.128
[800, 800] 80.1 52.0 0.186
[1000, 1000] 77.5 47.3 0.259
[1200, 1200] 69.3 41.8 0.345
Figure 6. Training loss curves for different resize sizes.
Figure 7. Test precision curves for different resize sizes.
3.2. Analysis of Model Improvement Results
The optimal model input size was selected through a comparison study. After obtaining the optimal performance using the Faster R-CNN model, further improvements were made to the model. First, the FPN was introduced, and the corresponding model
were made to the model. First, the FPN was introduced, and the corresponding model
was represented by Faster R-CNN + FPN. Then, the channel attention mechanism block
was adopted to further improve the FPN, and the corresponding model was represented
by Faster R-CNN + FPN + AB. According to the loss curves, all models exhibit good
convergence (Figure 8). In addition, after improving the model with FPN and AB, the
model exhibits the best convergence and the lowest loss value. Furthermore, according to
the model test precision curves, the improved final model has the highest test precision
(Figure 9). The evaluation indicator results of each model are listed in Table 4, which show
that both the AP and recall indicators of the model are greatly improved by using the
FPN, increasing by 4.2% and 10.6%, respectively. This indicates that the model detection
capability is significantly improved by combining features of different scales, although
the increased calculation amount and number of parameters results in an increase in the
time required for a single iteration. After further adoption of AB, both the AP and recall of
the model increase by 1.4% and 0.3%, respectively, whereas the time required for a single iteration only increases by 0.006 s. Thus, through application of the channel attention mechanism, the detection performance is further improved while the calculation amount and iteration time increase only slightly. Compared with the Faster R-CNN
model, the AP and recall of the final improved model increase by 5.6% and 10.9%, reaching
85.7% and 62.9%, respectively.
Figure 8. Loss curves of different network models.
Figure 9. Test precision curves of different network models.
Table 4. Test results of different network models.
Network AP (%) Recall (%) Iteration Time (s)
Faster R-CNN 80.1 52.0 0.186
Faster R-CNN + FPN 84.3 62.6 0.273
Faster R-CNN + FPN + AB 85.7 62.9 0.279
In summary, the improved model exhibits a significant increase in the detection
precision of tailings ponds compared to Faster R-CNN, as well as more accurate location
positioning. Furthermore, cases of missed detection and false detection are also reduced.
As shown in Figure 10, (a) is the image of a tailings pond, (b) is the feature heat map extracted by the Faster R-CNN model, and (c) is the feature heat map extracted by the improved model. The characteristics of the tailings pond extracted by the improved model are clearly improved in terms of shape and contour. In Figures 11–13, the green bounding boxes represent the tailings ponds predicted by the models. Figure 11a shows the prediction result of Faster R-CNN, which has a prediction score of 0.97; however, the predicted bounding box position is erroneous, as the upper right corner of the tailings pond is not included. Figure 11b shows the prediction result of the improved model, where the score increases to 1.0 and the accuracy of the bounding box position is significantly improved. Moreover, the red arrow in Figure 12a indicates a non-detected tailings pond, whereas this tailings pond is accurately detected by the improved model. Additionally, the improved model exhibits significantly better scores and location accuracy for the other detected tailings pond targets than Faster R-CNN. Finally, as shown in Figure 13, the improved model also avoids the false detection of a tailings pond by Faster R-CNN.
Figure 10. Feature extraction capability improved after model improvement: (a) the image of a tailings pond, (b) feature heat map of Faster R-CNN, and (c) feature heat map of the improved model.
Figure 11. Increased position prediction accuracy after model improvement: (a) prediction results
of Faster R-CNN and (b) prediction results of the improved model.
Figure 12. Improvement in missed detections of tailings ponds after model improvement: (a) prediction results of Faster R-CNN and (b) prediction results of the improved model; the red arrow indicates a non-detected tailings pond.
Figure 13. Improvement in false detections of tailings ponds after model improvement: (a) prediction results of Faster R-CNN and (b) prediction results of the improved model; the red arrow indicates a falsely detected tailings pond.
4. Conclusions
This study improved the Faster R-CNN model and proposed an intelligent identification method for tailings ponds based on high-resolution remote sensing images, which significantly improves the detection precision of tailings pond targets. Based on the data set of tailings pond samples constructed in this study, it was found that the model input size greatly affected the detection precision; the results show that the detection precision for tailings ponds is highest when resize = [800, 800]. To improve the image feature extraction capabilities of the model, using ResNet-101 as the feature extraction network, the channel attention mechanism block was adopted and an improved FPN was designed. The improved model recalibrates the contribution degrees of the feature channels while fusing features at different levels, thereby enhancing features with high contribution degrees and suppressing those with low contribution degrees. The test results show that both the AP and recall of tailings pond detection increased significantly in the improved model, by 5.6% and 10.9%, reaching 85.7% and 62.9%, respectively. Considering the rapid growth in high-resolution remote sensing images, this method has important applications for the large-scale, high-precision, and intelligent identification of tailings ponds, which will greatly improve the decision-making efficiency of tailings pond management.
Author Contributions:
Conceptualization, G.L. and D.Y.; methodology, D.Y.; software, D.Y. and
K.L.; validation, D.Y., X.L. and H.Z.; formal analysis, M.C.; investigation, H.L.; resources, F.Z.; data
curation, X.L.; writing—original draft preparation, D.Y.; writing—review and editing, G.L. and D.Y.;
visualization, D.Y.; supervision, G.L.; project administration, G.L.; funding acquisition, G.L. and H.Z.
All authors have read and agreed to the published version of the manuscript.
Funding:
This research was funded by National Key Research and Development Program of China
from Ministry of Science and Technology, grant number 2016YFB0501504; the National Natural
Science Foundation of China, grant number 41771397.
Acknowledgments:
The authors would like to thank the editors and the anonymous reviewers for
their helpful suggestions.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
ILSVRC ImageNet large-scale visual recognition challenge
CNN convolutional neural network
FC fully connected layers
RPN region proposal network
FPN feature pyramid network
AM attention mechanism
AB channel attention mechanism block
GT ground truth bounding box
PT predicted bounding box
IOU intersection over union
AP average precision
mAP mean average precision
PRC precision-recall curve
References
1. Wang, T.; Hou, K.P.; Guo, Z.S.; Zhang, C.L. Application of analytic hierarchy process to tailings pond safety operation analysis. Rock Soil Mech. 2008, 29, 680–687.
2. Xiao, R.; Lv, J.; Fu, Z.; Sheng, W.; Xiong, W.; Shi, Y.; Cao, F.; Yu, Q. The Application of Remote Sensing in the Environmental Risk Monitoring of Tailings pond in Zhangjiakou City, China. Remote Sens. Technol. Appl. 2014, 29, 100–105.
3. Santamarina, J.C.; Torres-Cruz, L.A.; Bachus, R.C. Why coal ash and tailings dam disasters occur. Science 2019, 364, 526. [CrossRef] [PubMed]
4. Jie, L. Remote Sensing Research and Application of Tailings Pond—A Case Study on the Tailings Pond in Hebei Province; China University of Geosciences: Beijing, China, 2014.
5. Gao, Y.; Hou, J.; Chu, Y.; Guo, Y. Remote sensing monitoring of tailings ponds based on the latest domestic satellite data. J. Heilongjiang Inst. Technol. 2019, 33, 26–29.
6. Tan, Q.L.; Shao, Y. Application of remote sensing technology to environmental pollution monitoring. Remote Sens. Technol. Appl. 2000, 15, 246–251.
7. Dai, Q.W.; Yang, Z.Z. Application of remote sensing technology to environment monitoring. West. Explor. Eng. 2007, 4, 209–210.
8. Wang, Q. The progress and challenges of satellite remote sensing technology applications in the field of environmental protection. Environ. Monit. China 2009, 25, 53–56.
9. Liu, W.T.; Zhang, Z.; Peng, Y. Application of TM image in monitoring the water quality of tailing reservoir. Min. Res. Dev. 2010, 30, 90–92.
10. Zhao, Y.M. Monitor Tailings based on 3S Technology to Tower Mountain in Shanxi Province. Master's Thesis, China University of Geosciences, Beijing, China, 2011; pp. 1–46.
11. Hao, L.; Zhang, Z.; Yang, X. Mine tailing extraction indexes and model using remote-sensing images in southeast Hubei Province. Environ. Earth Sci. 2019, 78, 493. [CrossRef]
12. Ma, B.; Chen, Y.; Zhang, S.; Li, X. Remote sensing extraction method of tailings ponds in ultra-low-grade iron mining area based on spectral characteristics and texture entropy. Entropy 2018, 20, 345. [CrossRef]
13. Xiao, R.; Shen, W.; Fu, Z.; Shi, Y.; Xiong, W.; Cao, F. The application of remote sensing in the environmental risk monitoring of tailings pond: A case study in Zhangjiakou area of China. SPIE Proc. 2012, 8538.
14. Riaza, A.; Buzzi, J.; García-Meléndez, E.; Vázquez, I.; Bellido, E.; Carrère, V.; Müller, A. Pyrite mine waste and water mapping using Hymap and Hyperion hyperspectral data. Environ. Earth Sci. 2012, 66, 1957–1971. [CrossRef]
15. Li, Q.; Chen, Z.; Zhang, B.; Li, B.; Lu, K.; Lu, L.; Guo, H. Detection of tailings dams using high-resolution satellite imagery and a single shot multibox detector in the Jing–Jin–Ji Region, China. Remote Sens. 2020, 12, 2626. [CrossRef]
16. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
17. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc.: New York, NY, USA, 2012; pp. 1106–1114.
18. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
20. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. arXiv 2016, arXiv:1608.06993.
21. Wu, X.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64. [CrossRef]
22. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
23. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
24. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; MIT Press: Cambridge, MA, USA, 2016; pp. 91–99.
25. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
26. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
27. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
28. Li, Y.; Huang, Q.; Pei, X.; Jiao, L.; Shang, R. RADet: Refine feature pyramid network and multi-layer attention network for arbitrary-oriented object detection of remote sensing images. Remote Sens. 2020, 12, 389. [CrossRef]
29. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
30. Li, Y.; Xu, W.; Chen, H.; Jiang, J.; Li, X. A Novel Framework Based on Mask R-CNN and Histogram Thresholding for Scalable Segmentation of New and Old Rural Buildings. Remote Sens. 2021, 13, 1070.
31. Bhuiyan, M.A.E.; Witharana, C.; Liljedahl, A.K. Use of Very High Spatial Resolution Commercial Satellite Imagery and Deep Learning to Automatically Map Ice-Wedge Polygons across Tundra Vegetation Types. J. Imaging 2020, 6, 137. [CrossRef]
32. Zhao, K.; Kang, J.; Jung, J.; Sohn, G. Building extraction from satellite images using Mask R-CNN with building boundary regularization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018.
33. Bai, T.; Pang, Y.; Wang, J.; Han, K.; Luo, J.; Wang, H.; Lin, J.; Wu, J.; Zhang, H. An Optimized Faster R-CNN Method Based on DRNet and RoI Align for Building Detection in Remote Sensing Images. Remote Sens. 2020, 12, 762. [CrossRef]
34. Liu, Y.; Cen, C.; Che, Y.; Ke, R.; Ma, Y.; Ma, Y. Detection of Maize Tassels from UAV RGB Imagery with Faster R-CNN. Remote Sens. 2020, 12, 338. [CrossRef]
35. Yu, G.; Song, C.; Pan, Y.; Li, L.; Li, R.; Lu, S. Review of new progress in tailing dam safety in foreign research and current state with development trend in China. Chin. J. Rock Mech. Eng. 2014, 33, 3238–3248.
36. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497. [CrossRef]
37. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. arXiv 2017, arXiv:1612.03144.
38. Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An attentive survey of attention models. arXiv 2020, arXiv:1904.02874.
39. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
40. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
41. Buckland, M.; Gey, F. The relationship between Recall and Precision. J. Am. Soc. Inf. Sci. 1994, 45, 12–19. [CrossRef]
42. Han, J.; Zhou, P.; Zhang, D.; Cheng, G.; Guo, L.; Liu, Z.; Bu, S.; Wu, J. Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding. ISPRS J. Photogramm. Remote Sens. 2014, 89, 37–48. [CrossRef]
... Researchers have introduced advanced detection algorithms into the application of remote sensing images, the goal being to overcome the difficulties in satellite remote sensing image detection. Yan et al. [21] used an improved Faster R-CNN model to detect tailing ponds in remote sensing images. Luz et al. [22] used an unsupervised model to detect fire-prone areas. ...
... Yan et al. [21] proposed an improved Faster R-CNN model to detect tailing ponds in remote sensing images. They used an attention mechanism in the Faster R-CNN, and the average precision (AP) of detecting tailing ponds reached 85.7%. ...
... Researchers have tried to improve the feature fusion of the neck network to improve the performance of the model. Yan et al. [21] used the feature pyramid network (FPN) in Faster RCNN to achieve multi-scale feature fusion, and used channel attention in the FPN structure to achieve high accuracy in detecting tailing ponds in remote sensing images. Qu et al. [54] introduced the ASFF structure in the FPN of YOLOv3, which strengthened the feature fusion of FPN, and achieved good detection results. ...
Preprint
Full-text available
Automatic object detection by satellite remote sensing images is of great significance for resource exploration and natural disaster assessment. To solve existing problems in remote sensing image detection, this article proposes an improved YOLOX model for satellite remote sensing image automatic detection. This model is named RS-YOLOX. To strengthen the feature learning ability of the network, we used Efficient Channel Attention (ECA) in the backbone network of YOLOX and combined the Adaptively Spatial Feature Fusion (ASFF) with the neck network of YOLOX. To balance the numbers of positive and negative samples in training, we used the Varifocal Loss function. Finally, to obtain a high-performance remote sensing object detector, we combined the trained model with an open-source framework called Slicing Aided Hyper Inference (SAHI). This work evaluated models on three aerial remote sensing datasets (DOTA-v1.5, TGRS-HRRSD, and RSOD). Our comparative experiments demonstrate that our model has the highest accuracy in detecting objects in remote sensing image datasets.
... For an input RGB image, standardization processing is applied to each dimension of its feature matrix. Yan, Li [27] highlighted that the traditional object detection algorithm, Faster Regionbased Convolutional Neural Network (R-CNN), could result in significant computational expenses. Integrating the lightweight design of MobileNet v2 network into Faster R-CNN can effectively reduce the demand for computational resources, helping to address the issue of limited computing resources. ...
Article
Full-text available
This study aims to explore the integration of the Faster R-CNN (Region-based Convolutional Neural Network) algorithm from deep learning into the MobileNet v2 architecture, within the context of enterprises aiming for carbon neutrality in their development process. The experiment develops a marine oil condition monitoring and classification model based on the fusion of MobileNet v2 and Faster R-CNN algorithms. This model utilizes the MobileNet v2 network to extract rich feature information from input images and combines the Faster R-CNN algorithm to rapidly and accurately generate candidate regions for oil condition monitoring, followed by detailed feature fusion and classification of these regions. The performance of the model is evaluated through experimental assessments. The results demonstrate that the average loss value of the proposed model is approximately 0.45. Moreover, the recognition accuracy of the model for oil condition on the training and testing sets reaches 90.51% and 93.08%, respectively, while the accuracy of other algorithms remains below 90%. Thus, the model constructed in this study exhibits excellent performance in terms of loss value and recognition accuracy, providing reliable technical support for offshore oil monitoring and contributing to the promotion of sustainable utilization and conservation of marine resources.
... In mining appications, [22] eveloped a high-precision method for extracting information on TSFs using the Faster R-CNN model with high-resolution images. This approach achieved a precision of 85.7%, demonstrating the efficacy of deep learning models in the precise detection of MWSFs. ...
Article
Full-text available
MineWaste Storage Facilities (MWSFs) in Chile present substantial environmental and safety risks due to their extensive scale and the hazardous nature of their contents. This study proposes an automated detection approach that integrates Sentinel-2 satellite imagery with advanced deep learning models to address these critical issues. A central contribution of this research is the development of MineWasteCL_DB, a comprehensive public dataset comprising over 30,000 annotated images and 320,093 labels for diverse MWSF types, including Tailings Storage Facilities (TSFs), Waste Rock Dumps (WRDs), and Leaching Waste Dumps (LWDs). The study employs the YOLOv8x-seg model, selected for its high precision, to validate the presence of 96.15% of officially registered TSFs. Furthermore, it identified 141 WRDs and 112 LWDs in the Antofagasta Refgion, facilities absent from any official national registry. These findings underscore the methodology’s potential for widespread application and the necessity for routine monitoring across additional regions. The results provide a robust framework for advancing the understanding and management of MWSFs, thereby improving regulatory oversight and promoting environmental safety. The methodology supports not only the efficient monitoring of registered facilities but also the preliminary identification and prospective registration of unregistered sites. This capability enhances the oversight capacities of regulatory authorities while fostering the protection of environmental and public safety.
... Their results demonstrated that deep learning methods are highly effective for detecting complex feature types in RS images. Yan et al. [16] used the Faster R-CNN model and added the feature pyramid network [17] to monitor tailings ponds in high-resolution RS images. The results showed that the method improved both the average precision and recall rate, making it highly significant for the large-scale, high precision, and intelligent identification of tailings ponds. ...
Article
Full-text available
Tailings ponds are used to store tailings or industrial waste discharged after beneficiation. Identifying these ponds in advance can help prevent pollution incidents and reduce their harmful impacts on ecosystems. Tailings ponds are traditionally identified via manual inspection, which is time-consuming and labor-intensive. Therefore, tailings pond identification based on computer vision is of practical significance for environmental protection and safety. In the context of identifying tailings ponds in remote sensing, a significant challenge arises from high-resolution images, which capture extensive feature details (shape, location, and texture) that are complicated by the mixing of tailings with other waste materials. This results in substantial intra-class variance and limited inter-class variance, making accurate recognition more difficult. To monitor tailings ponds, this study therefore utilized an improved version of DeepLabv3+, a widely recognized deep learning model for semantic segmentation. We introduced the multi-scale attention modules ResNeSt and SENet into the DeepLabv3+ encoder. The split-attention module in ResNeSt captures multi-scale information when processing multiple sets of feature maps, while the SENet module focuses on channel attention, improving the model’s ability to distinguish tailings ponds from other materials in images. Additionally, the tailings pond semantic segmentation dataset NX-TPSet was established based on Gaofen-6 imagery. Ablation experiments show that the recognition accuracy (intersection over union, IoU) of the RST-DeepLabV3+ model improved by 1.19% over DeepLabV3+, reaching 93.48%. The multi-attention module enables the model to integrate multi-scale features more effectively, which not only improves segmentation accuracy but also contributes directly to more reliable and efficient monitoring of tailings ponds. The proposed approach achieves top performance on two benchmark datasets, NX-TPSet and TPSet, demonstrating its effectiveness as a practical and superior method for real-world tailings pond identification.
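The SENet channel-attention recalibration mentioned in the abstract above follows the standard squeeze-and-excitation pattern: global-average-pool each channel, pass the result through a small bottleneck, and rescale the channels. A minimal NumPy sketch, not the paper's code; the weight matrices `w1` and `w2` stand in for the learned fully connected layers:

```python
import numpy as np

def se_channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation: recalibrate channels of a (C, H, W) feature map.

    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights,
    where r is the bottleneck reduction ratio.
    """
    # Squeeze: global average pooling over the spatial dims -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid -> per-channel weights in (0, 1)
    s = np.maximum(w1 @ z, 0.0)            # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # sigmoid, shape (C,)
    # Recalibrate: scale each channel by its learned importance weight
    return feature_map * s[:, None, None]
```

With zero weights the sigmoid outputs 0.5, so every channel is uniformly halved; training pushes the weights so that informative channels are amplified and uninformative ones suppressed.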
... The integration of AI in the detection and analysis of MWSF has achieved remarkable progress in recent years, particularly for TSF. Object detection and segmentation models such as YOLO and Faster R-CNN have significantly improved the automatic identification of these mining facilities through satellite imagery [2], [17]. These models not only identify TSF with high precision but also enable the extraction of geometric and geotechnical parameters related to PS evaluation. ...
Article
Full-text available
Chile’s mining industry, a global leader in copper production, faces challenges due to increasing volumes of mining waste, particularly Waste Rock Dumps (WRD) and Leaching Waste Dumps (LWD). The National Service of Geology and Mining (SERNAGEOMIN) requires assessment of the physical stability (PS) of these facilities, but current methods are hindered by data scarcity and resource constraints. This study proposes a simplified evaluation methodology using first-order parameters from open-access data. By integrating Geographic Information Systems (GIS) and Artificial Intelligence (AI), utilizing models such as YOLOv11 and convolutional neural networks, we automate the detection and characterization of WRD and LWD from satellite imagery, extracting critical parameters for PS assessment. This approach reduces analysis time and minimizes human error. Validated in the Antofagasta Region, Chile’s primary mining area, we identified and evaluated 70 WRD and 54 LWD. The results demonstrate the effectiveness of prioritizing deposits based on potential risk, enhancing SERNAGEOMIN’s capacity for supervision. The successful application suggests scalability to other mining regions and adaptability to different facility types, including tailings storage facilities. This work offers a practical tool to improve safety and risk management in the mining industry, addressing critical challenges in PS evaluation under current regulatory constraints.
... To effectively and efficiently extract tailings ponds, a target detection method based on the single-shot multibox detector deep learning technique was developed (Li et al. 2020). Yan et al. (2021) devised an improved method based on the Faster R-CNN target detection model, incorporating an attention mechanism and feature pyramid networks (FPNs) for tailings pond identification. Lyu et al. (2021) proposed a method combining "You Only Look Once" version 4 (YOLOv4) and the random forest algorithm to extract boundary information of tailings ponds. ...
Article
Full-text available
This paper proposes a framework that combines the improved "You Only Look Once" version 5 (YOLOv5) and SegFormer to extract tailings ponds from multi-source data. Points of interest (POIs) are crawled to capture potential tailings pond regions. The Jeffries-Matusita distance is used to evaluate the optimal band combination. The improved YOLOv5 replaces the backbone with the PoolFormer to form a PoolFormer backbone. The neck introduces the CARAFE operator to form a CARAFE feature pyramid network neck (CRF-FPN). The head is substituted with an efficient decoupled head. POIs and classification data optimize the improved YOLOv5 results. After that, the SegFormer is used to delineate the boundaries of tailings ponds. Experimental results demonstrate that the mean average precision of the improved YOLOv5s has increased by 2.78% compared to the YOLOv5s, achieving 91.18%. The SegFormer achieves an intersection over union of 88.76% and an accuracy of 94.28%.
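The Jeffries-Matusita distance used above to rank band combinations has a closed form when each class is modeled as a multivariate Gaussian: JM = 2(1 − e^(−B)), where B is the Bhattacharyya distance between the two class distributions. A minimal NumPy sketch, assuming per-class means and covariances estimated from training pixels (not code from the paper):

```python
import numpy as np

def jeffries_matusita(m1, cov1, m2, cov2):
    """JM distance between two classes modeled as multivariate Gaussians.

    Returns a value in [0, 2]; values close to 2 indicate that the chosen
    band combination separates the two classes well.
    """
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = (cov1 + cov2) / 2.0
    diff = m1 - m2
    # Bhattacharyya distance B = mean term + covariance term
    mean_term = 0.125 * diff @ np.linalg.inv(cov) @ diff
    cov_term = 0.5 * np.log(np.linalg.det(cov) /
                            np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    b = mean_term + cov_term
    return 2.0 * (1.0 - np.exp(-b))
```

Band selection then amounts to computing this score for each candidate band combination and keeping the one that maximizes the minimum pairwise JM distance across classes.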
... Weng et al. introduce a rotating object detection method [17] to tackle the challenge posed by the random orientation of remote sensing objects. Yan [18] employed attention mechanisms to enhance the Feature Pyramid Network (FPN) and extract pertinent information regarding tailings ponds. Yin [19] applies multi-scale training to Fast R-CNN to improve the network's robustness. ...
Article
Full-text available
With the advancement of satellite and sensor technologies, remote sensing images are playing crucial roles in both civilian and military domains. This paper addresses challenges such as complex backgrounds and scale variations in remote sensing images by proposing a novel attention mechanism called ESHA. This mechanism effectively integrates multi-scale feature information and introduces a multi-head self-attention (MHSA) to better capture contextual information surrounding objects, enhancing the model’s ability to perceive complex scenes. Additionally, we optimized the C2f module of YOLOv8, which enhances the model’s representational capacity by introducing a parallel multi-branch structure to learn features at different levels, resolving feature scarcity issues. During training, we utilized focal loss to handle the issue of imbalanced target class distributions in remote sensing datasets, improving the detection accuracy of challenging objects. The final network model achieved training accuracies of 89.1%, 91.6%, and 73.2% on the DIOR, NWPU VHR-10, and VEDAI datasets, respectively.
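The focal loss used above to counter imbalanced target classes has a simple closed form, FL(p_t) = −α_t (1 − p_t)^γ log(p_t): the (1 − p_t)^γ factor down-weights well-classified examples so that hard, rare targets dominate the gradient. A minimal NumPy sketch of the binary case; the default α and γ follow the common RetinaNet settings, an assumption rather than values from the paper:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Mean binary focal loss.

    p: predicted foreground probabilities; y: binary labels (0 or 1).
    gamma > 0 suppresses the loss of easy examples; alpha balances classes.
    """
    p = np.clip(np.asarray(p, float), 1e-7, 1.0 - 1e-7)  # avoid log(0)
    y = np.asarray(y, float)
    p_t = np.where(y == 1, p, 1.0 - p)          # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

A confidently correct prediction (p_t near 1) contributes almost nothing, while a misclassified rare object keeps a near-full cross-entropy penalty.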
Article
Since unmanned aerial vehicles (UAVs) provide real-time monitoring of vast areas, their rapid development has been crucial to the advancement of surveillance applications. However, in complex environments, present surveillance systems frequently lack efficiency, scalability, and adaptability. To detect and track security threats in real time, this study develops an AI-based aerial surveillance framework that makes use of CNNs and Fast R-CNNs. It trains and validates object identification models on publicly accessible UAV datasets against key parameters such as robustness, processing speed, and accuracy. The proposed augmented-intelligence object detection framework thus applies to contemporary surveillance systems, which must be reliable, resilient, and able to satisfy current security requirements. The study presents a novel, highly efficient Faster R-CNN designed specifically to tackle the difficult object-localization problem in aerial images; the algorithm performs well at pinpointing the precise location of objects of interest. According to the results, average accuracy increased significantly to above 70%. With an F1-score of 92.7%, the Fast R-CNN model achieved precision and recall scores of 93.1% and 92.4%, respectively, with an overall average of 94.7%.
Article
Full-text available
Mapping new and old buildings is of great significance for understanding socio-economic development in rural areas. In recent years, deep neural networks have achieved remarkable building segmentation results in high-resolution remote sensing images. However, scarce training data and varying geographical environments have posed challenges for scalable building segmentation. This study proposes a novel framework based on Mask R-CNN, named Histogram Thresholding Mask Region-Based Convolutional Neural Network (HTMask R-CNN), to extract new and old rural buildings even when labels are scarce. The framework adopts the result of single-object instance segmentation from the original Mask R-CNN. It then classifies the rural buildings into new and old ones based on a dynamic grayscale threshold inferred from the result of a two-object instance segmentation task for which training data are scarce. We found that the framework can extract more buildings and achieve a much higher mean average precision (mAP) than the original Mask R-CNN model. We tested the novel framework’s performance with increasing training data and found that it converged even when the training samples were limited. This framework’s main contribution is to allow scalable segmentation using significantly fewer training samples than traditional machine learning practices, making the mapping of China’s new and old rural buildings viable.
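The abstract above does not spell out how the dynamic grayscale threshold is inferred. A minimal sketch under the simplifying assumption that the threshold is the midpoint between the mean grayscale values of a few labeled new and old instances (the paper's actual inference may differ, e.g. a histogram-based method such as Otsu):

```python
import numpy as np

def dynamic_threshold(new_means, old_means):
    """Infer a grayscale threshold separating 'new' (bright roofs) from 'old'
    buildings, given mean grayscale values of a few labeled instances per class.

    Simplification: midpoint between the two class means.
    """
    return (np.mean(new_means) + np.mean(old_means)) / 2.0

def classify_buildings(mean_grays, threshold):
    """Label each detected building 'new' if its mean grayscale exceeds the
    inferred threshold, else 'old'."""
    return ["new" if g >= threshold else "old" for g in mean_grays]
```

Because the threshold is re-derived from whatever small labeled sample is available for a given scene, the classification adapts to local illumination instead of relying on a fixed global cutoff.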
Article
Full-text available
We developed a high-throughput mapping workflow, which centers on deep learning (DL) convolutional neural network (CNN) algorithms on high-performance distributed computing resources, to automatically characterize ice-wedge polygons (IWPs) from sub-meter resolution commercial satellite imagery. We applied a region-based CNN object instance segmentation algorithm, namely the Mask R-CNN, to automatically detect and classify IWPs in North Slope of Alaska. The central goal of our study was to systematically expound the DLCNN model interoperability across varying tundra types (sedge, tussock sedge, and non-tussock sedge) and image scene complexities to refine the understanding of opportunities and challenges for regional-scale mapping applications. We corroborated quantitative error statistics along with detailed visual inspections to gauge the IWP detection accuracies. We found promising model performances (detection accuracies: 89% to 96% and classification accuracies: 94% to 97%) for all candidate image scenes with varying tundra types. The mapping workflow discerned the IWPs by exhibiting low absolute mean relative error (AMRE) values (0.17-0.23). Results further suggest the importance of increasing the variability of training samples when practicing transfer-learning strategy to map IWPs across heterogeneous tundra cover types. Overall, our findings demonstrate the robust performances of IWPs mapping workflow in multiple tundra landscapes.
Article
Full-text available
The timely and accurate mapping and monitoring of mine tailings dams is crucial to the improvement of management practices by decision makers and to the prevention of disasters caused by failures of these dams. Owing to complex topography, varying geomorphological characteristics, diverse ore types and mining activities, and the range of scales and production processes involved, tailings dams vary in remote sensing imagery in terms of scale, color, shape, and surrounding background. The application of high-resolution satellite imagery for automatic detection of tailings dams at large spatial scales has rarely been reported. In this study, a target detection method based on deep learning was developed to identify the locations of tailings ponds and obtain their geographical distribution from high-resolution satellite imagery automatically. Training samples were produced based on the characteristics of tailings ponds in satellite images. According to the sample characteristics, the Single Shot Multibox Detector (SSD) model was fine-tuned during model training. The results showed that a detection accuracy of 90.2% and a recall rate of 88.7% could be obtained. Based on the optimized SSD model, 2221 tailings ponds were extracted from Gaofen-1 high-resolution imagery in the Jing–Jin–Ji region of northern China. In this region, the majority of tailings ponds are located at high altitudes in remote mountainous areas. At the city level, the tailings ponds were found to be located mainly in Chengde, Tangshan, and Zhangjiakou. The results prove that the deep learning method is very effective at detecting complex land-cover features from remote sensing images.
Article
Full-text available
In recent years, the increase of satellites and UAV (unmanned aerial vehicles) has multiplied the amount of remote sensing data available to people, but only a small part of the remote sensing data has been properly used; problems such as land planning, disaster management and resource monitoring still need to be solved. Buildings in remote sensing images have obvious positioning characteristics; thus, the detection of buildings can not only help the mapping and automatic updating of geographic information systems but also have guiding significance for the detection of other types of ground objects in remote sensing images. Aiming at the deficiency of traditional building remote sensing detection, an improved Faster R-CNN (region-based Convolutional Neural Network) algorithm was proposed in this paper, which adopts DRNet (Dense Residual Network) and RoI (Region of Interest) Align to utilize texture information and to solve the region mismatch problems. The experimental results showed that this method could reach 82.1% mAP (mean average precision) for the detection of landmark buildings, and the prediction box of building coordinates was relatively accurate, which improves the building detection results. Moreover, the recognition of buildings in a complex environment was also excellent.
Article
Full-text available
Object detection has made significant progress in many real-world scenes. Despite this remarkable progress, the common use case of detection in remote sensing images remains challenging even for leading object detectors, due to complex backgrounds, objects with arbitrary orientation, and large differences in object scale. In this paper, we propose a novel rotation detector for remote sensing images, mainly inspired by Mask R-CNN, namely RADet. RADet obtains the rotated bounding box of an object from the shape mask predicted by the mask branch, which is a novel, simple, and effective way to derive rotated bounding boxes. Specifically, a refined feature pyramid network is devised with an improved building block for constructing top-down feature maps, to address the large differences in scale. Meanwhile, a position attention network and a channel attention network are jointly explored, modeling the spatial position dependence between global pixels and highlighting object features, to detect small objects surrounded by complex backgrounds. Extensive experiments on two public remote sensing datasets, DOTA and NWPU VHR-10, show our method outperforms existing leading object detectors in the remote sensing field.
Article
Full-text available
Maize tassels play a critical role in plant growth and yield. Extensive RGB images obtained using unmanned aerial vehicle (UAV) and the prevalence of deep learning provide a chance to improve the accuracy of detecting maize tassels. We used images from UAV, a mobile phone, and the Maize Tassel Counting dataset (MTC) to test the performance of faster region-based convolutional neural network (Faster R-CNN) with residual neural network (ResNet) and a visual geometry group neural network (VGGNet). The results showed that the ResNet, as the feature extraction network, was better than the VGGNet for detecting maize tassels from UAV images with 600 × 600 resolution. The prediction accuracy ranged from 87.94% to 94.99%. However, the prediction accuracy was less than 87.27% from the UAV images with 5280 × 2970 resolution. We modified the anchor size to [85², 128², 256²] in the region proposal network according to the width and height of pixel distribution to improve detection accuracy up to 89.96%. The accuracy reached up to 95.95% for mobile phone images. Then, we compared our trained model with TasselNet without training their datasets. The average difference of tassel number was 1.4 between the calculations with 40 images for the two methods. In the future, we could further improve the performance of the models by enlarging datasets and calculating other tassel traits such as the length, width, diameter, perimeter, and the branch number of the maize tassels.
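Reading the modified RPN anchor sizes above as squared areas (85², 128², 256²), anchor generation at a single feature map location can be sketched as below. This is a generic illustration of how Faster R-CNN enumerates anchors from areas and aspect ratios, not the paper's code; the aspect ratios are the common defaults, assumed here:

```python
import numpy as np

def generate_anchors(areas, ratios, cx, cy):
    """Generate RPN anchors as (x1, y1, x2, y2) boxes centered at (cx, cy).

    areas: anchor box areas in pixels^2, e.g. [85**2, 128**2, 256**2];
    ratios: height/width aspect ratios, e.g. [0.5, 1.0, 2.0].
    For each (area, ratio) pair, solve w * h = area with h = ratio * w.
    """
    boxes = []
    for area in areas:
        for r in ratios:
            w = np.sqrt(area / r)
            h = r * w
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)
```

Shrinking the smallest area toward the typical tassel footprint, as the authors did, gives the RPN proposals that overlap small objects well enough to survive the IoU-based matching step.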
Article
Full-text available
Southeast Hubei province is an important iron–copper production base in China, which has produced a large number of mine tailings from mining activities. Although they contain a certain amount of iron or copper as secondary mineral resources, the mine tailings and related acid wastewater can lead to environmental pollution through sand blowing or seepage. For effective resource utilization and environmentally conscious development, rapid evaluations of the spatial distribution, type, and age of mine tailings are of national importance. Using spectral features, which are determined by the structure and composition of tailings, we develop an all-band tailings index, a modified normalized difference tailings index (MNTI), and a normalized difference tailings index for Fe-bearing minerals (NDTIFe). The all-band tailings index reflects the micro-structure and overall high reflectivity of mine tailings by comprehensively utilizing information from each band of Landsat 8 data. The MNTI and NDTIFe provide enhanced tailings composition information from the perspective of anion (carbonate and hydroxyl) and cation (mainly ferric ion) contents, respectively. A tailings extraction model (TEM) is built using these three indexes to extract mine tailings information in Huangshi city. The TEM proposed in this paper can successfully and rapidly extract mine tailings information with an extraction precision of 84% in the research area.
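Indices such as NDTIFe and MNTI follow the normalized-difference pattern common in remote sensing: the reflectance contrast between two bands, normalized by their sum so the result is bounded in [−1, 1]. A minimal sketch; the specific Landsat 8 bands used by the paper are not given in the abstract, so band selection is left to the caller:

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Generic normalized-difference index (A - B) / (A + B) per pixel.

    band_a, band_b: reflectance arrays of the same shape.
    A small epsilon guards against division by zero over dark pixels.
    """
    a = np.asarray(band_a, float)
    b = np.asarray(band_b, float)
    return (a - b) / (a + b + 1e-10)
```

Thresholding such an index (or combining several, as the TEM above does) then yields a binary tailings mask per scene.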
Article
The attention model has become an important concept in neural networks and has been researched across diverse application domains. This survey provides a structured and comprehensive overview of developments in modeling attention. In particular, we propose a taxonomy that groups existing techniques into coherent categories. We review salient neural architectures in which attention has been incorporated and discuss applications in which modeling attention has shown a significant impact. We also describe how attention has been used to improve the interpretability of neural networks. Finally, we discuss some future research directions for attention. We hope this survey will provide a succinct introduction to attention models and guide practitioners while developing approaches for their applications.
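Most of the attention variants covered by the survey above build on the scaled dot-product form, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch of that core computation:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q: (n_q, d_k) queries; k: (n_k, d_k) keys; v: (n_k, d_v) values.
    Each output row is a value-vector average weighted by query-key similarity.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # Row-wise softmax, shifted by the row max for numerical stability
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Channel attention (as in SENet) and spatial attention reweight features along one axis instead, but the same softmax-weighted aggregation idea underlies them.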
Article
Object detection is a fundamental visual recognition problem in computer vision and has been widely studied in the past decades. Visual object detection aims to find objects of certain target classes with precise localization in a given image and assign each object instance a corresponding class label. Due to the tremendous successes of deep learning based image classification, object detection techniques using deep learning have been actively studied in recent years. In this paper, we give a comprehensive survey of recent advances in visual object detection with deep learning. By reviewing a large body of recent related work in literature, we systematically analyze the existing object detection frameworks and organize the survey into three major parts: (i) detection components, (ii) learning strategies, and (iii) applications & benchmarks. In the survey, we cover a variety of factors affecting the detection performance in detail, such as detector architectures, feature learning, proposal generation, sampling strategies, etc. Finally, we discuss several future directions to facilitate and spur future research for visual object detection with deep learning.