
STN PLAD: A Dataset for Multi-Size Power Line Assets Detection in High-Resolution UAV Images

Authors:
André Luiz Buarque Vieira-e-Silva, Heitor de Castro Felix, Thiago de Menezes Chaves,
Francisco Paulo Magalhães Simões∗†, Veronica Teichrieb, Michel Mozinho dos Santos,
Hemir da Cunha Santiago, Virginia Adélia Cordeiro Sgotti, and Henrique Baptista Duffles Teixeira Lott Neto§
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
{albvs,hcf2,tmc2,vt}@cin.ufpe.br
Departamento de Computação, Universidade Federal Rural de Pernambuco, Recife, Brazil
francisco.simoes@ufrpe.br
In Forma Software, Recife, Brazil
{mmozinho,hsantiago}@informasoftware.com.br, vsgotti@informa.com.br
§Sistema de Transmissão Nordeste - STN, Recife, Brazil
hlott@stnordeste.com.br
Abstract—Many power line companies are using UAVs to per-
form their inspection processes instead of putting their workers
at risk by making them climb high voltage power line towers,
for instance. A crucial task for the inspection is to detect and
classify assets in the power transmission lines. However, public
data related to power line assets are scarce, preventing a faster
evolution of this area. This work proposes the STN Power Line
Assets Dataset, containing high-resolution and real-world images
of multiple high-voltage power line components. It has 2,409
annotated objects divided into five classes: transmission tower,
insulator, spacer, tower plate, and Stockbridge damper, which
vary in size (resolution), orientation, illumination, angulation, and
background. This work also presents an evaluation with popular
deep object detection methods and MS-PAD, a new pipeline for
detecting power line assets in hi-res UAV images. The latter
outperforms the other methods, achieving 89.2% mAP, which still
leaves considerable room for improvement. The STN PLAD dataset is
publicly available at https://github.com/andreluizbvs/PLAD.
I. INTRODUCTION
Nowadays, practically all human activities depend on the
constant availability of electricity. Power transmission lines,
which have an essential role in this task, are constantly ex-
posed to the depreciating action of the environment. They have
components that may break, rust, loosen or even go missing.
The malfunction of such equipment affects the electricity grid,
causing inefficiency in the power transmission and, some-
times, blackouts. According to [1], most of the power grids
today are interconnected. Thus, these blackouts can initiate
others, affecting even larger regions, like a cascade effect [2].
That can trigger catastrophic consequences such as shutting
down hospitals, production at water supplies companies, and
telecommunication services [3], which leads to significant
economic losses for the energy company and, ultimately,
severe social impacts [4], [5]. According to Bruch et al. [4],
a power cut of only 30 minutes in the USA results in an
average loss of over 15 thousand US dollars for midsize and
large industrial clients and a loss of more than 90 thousand
US dollars for an eight-hour interruption [1].
Fig. 1. A few image clippings from the proposed dataset.
Given this scenario, constant maintenance of the equipment
is necessary, replacing defective ones before they can cause
any loss. Classically, a team has to check each station person-
ally. That implies having someone climbing the transmission
tower and checking the condition of its components, which
is dangerous and time-consuming [6]. In other words, there
is a cost to move people, and it takes time to get up at a
station, apart from safety issues associated with the service.
For example, Rahmani et al. [7] showed that there were 119
injured workers due to accidents between 2006 and 2012 in
an Iranian electricity distribution company, with seven deaths.
Recent developments in computer vision can be applied in
the field of Maintenance & Inspection, improving safety and
productivity. As an example of application in the safety area,
automatically detecting power line assets via UAVs eliminates
much of the safety risk of manual inspection, as inspectors
would not need to climb towers as often as before. Instead,
much of the inspection could occur through a direct analysis
of the detected components, performed either by a human
inspector in a safe environment or by a fault classification
method [8]. In addition to the safety gains, there are also
financial and time gains, as the frequency of moving teams
and providing the necessary equipment would decrease.
arXiv:2108.07944v3 [cs.CV] 2 Sep 2021
The dataset plays a major role in the training of deep
learning networks, where its quality directly influences the
accuracy. That is why we should pay more attention to data, as
stated by [9]. Furthermore, public datasets play a fundamental
role in a rapidly advancing area. They allow researchers to
propose ideas and perform experiments even in scenarios
where they cannot get the data by themselves. Also, those
datasets usually serve as a benchmark for a specific task,
providing a fair comparison among new techniques.
It is evident how quickly certain areas of computer vision
evolved after the introduction of datasets such as CIFAR10
[10], ImageNet [11] and MNIST [12]. In that sense, public
datasets on power line assets for object detection are extremely
scarce, and the existing ones are quite limited, typically only
supporting one asset type or two, at most [13]–[15]. That
happens because most of the works in the area are privately
funded by companies that want to maintain a competitive
advantage by not making their datasets available.
As the main contribution, this paper introduces a new real-
world, high-resolution, and multi-category dataset for multi-
size power line assets recognition, the Power Line Assets
Dataset, or STN PLAD. It serves as publicly available develop-
ment data and benchmark for the computer vision community
working on automatic power line inspection. In addition,
experiments with state-of-the-art techniques show the dataset’s
strengths and limitations. These experiments used two popular
general-purpose object detectors, namely SSD and Faster R-
CNN. Based on its analysis, a variation of the training pipeline,
which we call MS-PAD, is proposed to improve the overall
object detection performance in the STN PLAD dataset.
This paper is organized as follows. The prior works are
presented in section II. The properties of the new STN Power
Line Assets Dataset are described in section III. Section IV
shows the methods used to evaluate the proposed dataset. Next,
comparative and performance results of techniques applied
in STN PLAD are presented in section V, followed by a
discussion about what was seen in the tests in section VI and,
lastly, the final remarks in section VII.
II. RELATED WORKS
This section is divided into two parts. First, the public
datasets closely related to power lines are presented, along
with their characteristics and limitations. Then, the existing
methods that attempt to detect multi-size power line assets in
high-resolution images are shown.
A. Public datasets related to power lines
A common problem found in the literature when using Deep
Learning to detect power line objects is finding data. There are
not enough publicly available datasets to feed detectors based
on deep learning methods, or they do not cover enough power
line components [1], [16]. Many similar works use private
datasets, generally provided by the companies or government
agencies financing the projects, which tend not to publish them
[1], [17]–[20]. Nevertheless, in the literature search presented
by Liu et al. [16] and in the work of Abdelfattah et al. [15],
a few publicly available datasets were found but with many
limitations. Table I summarizes the main public image datasets
related to power line assets for the object detection task. The
last line shows the proposed dataset for comparison purposes.
As evidenced in Table I, there is a minimal amount of
public datasets related to power line asset detection. They
target distinct tasks that can be detection, classification, or
segmentation. For instance, the one in [21] is specifically
related to conductor wires in low-resolution images for bi-
nary classification. Zhang et al. [22] propose two datasets
of binarized masks of conductor wires of power lines in
urban and mountain scenarios, respectively. Abdelfattah et al.
[15] propose a dataset containing pixel-wise annotation (a.k.a.
instance segmentation) of both transmission towers and power
lines. However, the main competing datasets are CPLID [13],
and the one from Tomaszewski et al. [14] as they are the only
ones that use bounding box annotations.
CPLID [13] is a dataset related only to a specific type of
insulator with a specific shape and size. Although it also has
annotations of defects in some of those insulators, it lacks a
diversity of data since there is only one type of power line
asset. The defective samples are also limited because all of
them are from data augmentation, i.e., a single faulty insulator
was cropped from an image and then pasted into a limited set
of backgrounds, as seen in Figure 2(a). On the other hand,
STN PLAD provides asset variability in diverse scenarios.
The dataset in Tomaszewski et al. [14] has even more lim-
itations. They mainly target data quantity (they reported 2630
images) rather than data variability, as they video recorded a
ceramic long rod insulator hanging on an apparatus built by
them and then extracted some of the frames using a stationary
camera. These images only contain one of nine different
backgrounds that do not correspond to real-world power line
scenarios. An image from one of these nine variations is
shown in Figure 2(b). From the perspective of deep learning
techniques, this dataset has a very limited variability of scenes.
In summary, although this dataset has a reasonable amount of
data, the images taken in the same scenario have a high degree
of similarity. Thus, the dataset ends up not being independent
and identically distributed (IID) [23], [24], an essential dataset
property. In practical terms, it is not feasible to use it in most
techniques based on deep learning since it poses little to no
challenge to these techniques. In STN PLAD, the images are
collected by a drone in the field, providing several real-world
scenarios with multiple objects (size, appearance, position,
orientation, self-occlusion, background).
B. Detection of power line assets in high-resolution images
A few works address the issue of detecting objects related
to power lines in high-resolution images. The first one, Zhang
et al. [19], is a study on object detection in high-resolution
images captured through Unmanned Aerial Vehicles (UAVs),
using deep learning techniques. In their work, the authors
propose the MOHR dataset. This private dataset has over ten
thousand high-resolution UAV images with five classes: car,
truck, building, collapse, and flood damage. The UAV altitudes
are high, ranging from 200 to 400 meters, making the objects
look quite small. The authors apply six general-purpose object
detectors, SSD [25] and Faster R-CNN [26] included, to the
MOHR dataset. The results suggest that detecting small object
instances in high-resolution UAV images remains challenging
since they perform poorly. The best mean Average Precision
(mAP) achieved was 43.94%, yielded by RFCN-DF [27].
Those results reinforce the relevance of this task.
TABLE I
MAIN PUBLIC IMAGE DATASETS RELATED TO POWER LINES ASSET DETECTION.
Dataset                  #Assets  Instances/image (average)  Image size              Instances  Images    Background variation
CPLID [13]               1        1.9                        1152×864                1569       848       Limited
Tomaszewski et al. [14]  1        1                          5616×3744               2630 (a)   2630 (b)  Very limited
STN PLAD                 5        18.1                       5472×3078 or 5472×3648  2409       133       Diverse
(a) All the instances correspond to the same object.
(b) Images from this dataset are extremely similar and captured from just nine different points of view.
Fig. 2. Sample images from public datasets as compared to our dataset STN
PLAD in which (a) CPLID [13] with an asset inserted in a background, (b)
[14] with an asset in a non-real-world scenario, and (c) our dataset (STN
PLAD) with multi-objects in real-world scenes (zoomed in image).
The works of Kong et al. [28] and Zhu et al. [29] share
some of their authors and contents, indicating that one is an
incremental improvement over the other. The former proposes
a technique to detect small objects in high-resolution images.
The technique, based on Faster R-CNN, is tested on a private
dataset of 3700 high-resolution images. However, that
approach is limited and prone to issues since it can only detect
small objects that lie within a larger one. For
instance, dampers are usually small independent objects far
from larger ones, which such a scheme would miss. Another issue is the low Average Precision
(AP) of some classes, such as the tower plate (73.2%), which
the authors justify by saying they are too small.
Finally, Zhu et al. [29] attempt to improve the efficiency
of their previous work by merging the two stages in order to
share early convolutional layers. They also use a private dataset
with high-resolution images with six classes: electric tower,
vibration damper, spacer, insulator, bird’s nest, and tower plate.
Despite the efficiency of this method, the same issues and
limitations regarding object detection remain.
The only difference is that the objects are not as
small as in their previous work, which is one of the main factors
that positively impact the mAP.
These last two works, [28] and [29], are not reproducible
since the datasets are private, and they are not open-source.
III. STN PLAD: DATASET DESCRIPTION
The images were captured using a DJI Phantom 4 Pro ¹,
and Figure 1 shows some image clippings. A set of policies
for data collection was proposed to ensure data variability and
consistency. First, the drone was handled by certified drone
pilots, who were instructed to capture the images, always
maintaining a similar distance to the transmission tower in
a wide shot due to the high-resolution nature of the camera.
In addition, the drone’s viewing angles were varied to ensure
better learning by models based on neural networks and
diverse daytime, weather, angulation, and illumination con-
ditions. Finally, several transmission towers were captured to
obtain background and component variation. This data capture
protocol provides images with a wide range and number of
power line assets in each one of them, with a mean of 18.1
instances per captured image as can be seen in Table I .
The equipped camera is a DJI FC6310, and it can take
pictures with a resolution of 5472×3078 (16:9) or 5472×3648
(3:2). Both aspect ratios were used during data collection. For
annotating the 2409 objects in all 133 captured images, the
LabelImg tool was used [30]. Two annotators were responsible
for carefully surrounding each object with a bounding box.
Each person took, on average, 10 minutes to annotate one im-
age. Each image is assigned to only one annotator to perform
all its annotations. To maintain the annotation consistency
between different annotators, they labeled each assigned image
with their highest possible scrutiny and were in touch during
the entire annotation process.
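As an illustration of how such bounding-box annotations can be consumed, the sketch below reads one annotation, assuming it is stored in the Pascal VOC XML layout that LabelImg can export. The actual file format of the released dataset should be checked in its repository. The sample XML string, the file name inside it, and the function name `read_voc_boxes` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC XML layout (an assumption,
# not copied from the dataset).
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>5472</width><height>3648</height><depth>3</depth></size>
  <object>
    <name>damper</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>160</xmax><ymax>250</ymax></bndbox>
  </object>
  <object>
    <name>tower</name>
    <bndbox><xmin>1500</xmin><ymin>300</ymin><xmax>3200</xmax><ymax>3500</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = [int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes


boxes = read_voc_boxes(SAMPLE_XML)
for label, xmin, ymin, xmax, ymax in boxes:
    # Box area in pixels, the quantity summarized per class in the dataset statistics.
    print(label, (xmax - xmin) * (ymax - ymin))
```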
The total amount of images in STN PLAD may appear small
but, considering the employed data collection protocol, the
camera’s resolution, and, more importantly, the total amount of
object instances, it can be seen that it has a reasonable amount
of data. Images from STN PLAD have considerably more
information than regular images from common datasets, such
as ImageNet [11] and MSCOCO [31]. On average, the STN
PLAD has more than 18 objects per image with an average
area of at least 2.89×10³ pixels. This density of 18 objects per image
is well above that of the related datasets, as seen in Table I. Finally,
the STN Power Line Assets Dataset is publicly available at
https://github.com/andreluizbvs/PLAD ².
¹ https://www.dji.com/phantom-4-pro
² In case the article is accepted, the dataset will be posted on a web page
with a structured presentation.
TABLE II
STN PLAD STATISTICS.
Class name          Label      Instances  Instances per image  Average area (px)  Standard deviation (px)
Transmission tower  tower      253        1.9                  2.61×10⁶           3.12×10⁶
Insulator           insulator  312        2.3                  8.84×10⁴           8.55×10⁴
Spacer              spacer     253        1.9                  2.82×10⁴           2.41×10⁴
Tower plate         plate      86         0.6                  9.42×10³           1.11×10⁴
Stockbridge damper  damper     1505       11.3                 2.89×10³           5.78×10³
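The density figures in Table II can be rederived from the instance counts and the number of images. The short sketch below does so; only the counts and the 133-image total come from the paper, and the variable names are illustrative.

```python
# Instance counts per class label, copied from Table II.
instances = {
    "tower": 253,
    "insulator": 312,
    "spacer": 253,
    "plate": 86,
    "damper": 1505,
}
num_images = 133  # number of images in STN PLAD

total = sum(instances.values())
per_image = {label: count / num_images for label, count in instances.items()}
overall_density = total / num_images

for label, value in per_image.items():
    print(f"{label:10s} {value:5.2f} instances per image")
print(f"total {total} instances, {overall_density:.2f} per image")
```

The per-class values agree with the table to one decimal place, and the damper class alone accounts for well over half of all annotated objects.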
Fig. 3. Examples of all five classes of power line assets in STN PLAD. Each column shows instances from one class. From left to right: Transmission tower,
Insulator, Spacer, Tower plate, and Stockbridge damper.
IV. METHODS
This section describes two techniques that are often used
to validate object detection datasets [13], SSD and Faster R-
CNN. Their performance on STN PLAD is presented in the
next section and discussed later. The observed limitations in
dealing with the proposed dataset inspired the creation of a
pipeline called MS-PAD, which is also detailed here.
A. Single Shot MultiBox Detector (SSD)
In the context of power line inspection, SSD is one of the
suggested techniques of two recent reviews [1], [16] to target
the problem of detecting assets on power transmission towers.
Both reviews have the same context as this work, focusing on
inspecting power line assets from UAV images. Moreover, they
analyze Deep Learning techniques applied to solve problems
in the area. Some of the mentioned problems are asset
detection, asset segmentation, and asset fault identification.
The parameters of the SSD used are the same as in the
original work by Liu et al. [25], such as the backbone, VGG16,
and all the parameters and dimensions for the convolutional
layers. In the original work, two different input layers were
proposed, 300×300 and 512×512. In our experiment, the latter
was used since it achieved better accuracy than the former,
according to the original results. Also, the images used for
this experiment have a much higher resolution than either
input size, so the larger the input layer, the less
the resizing degrades the input image. Finally,
weights pre-trained with the COCO Dataset [31] were used.
B. Faster R-CNN
The objective of including the Faster R-CNN [26] in the
tests was to use a recent technique of object detection to obtain
results close to the current state of the art. Faster R-CNN-
based networks are also suggested to detect and inspect power
transmission towers according to the same reviews mentioned
in subsection IV-A. They are also well consolidated, have
performed well in object detection competitions, and are used
by similar works [1], [16].
The network used for this experiment was the Feature Pyra-
mid Network (FPN) Faster R-CNN [32]. FPN aims to improve
the detection of small objects, as it uses multi-scale feature
maps and higher resolution layers to build new semantically
rich layers. Thus, information from the initial layers is used.
These layers are traditionally less condensed, but even so,
they already have a high semantic level. ResNet-101 was
used as a backbone, which had the best detection result in
its publication [32]. Also, the input resolution was chosen in
order to decrease the image resizing impact. The input image
is resized to 2736×1824, which holds 4 times fewer pixels than
a 5472×3648 original (half the width and half the height). That
is a much milder reduction than in the SSD experiment, whose
512×512 input holds approximately 19 times fewer pixels than
this resized image. All other parameters were kept as the original.
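The relative sizes of these network inputs can be checked with a few lines of arithmetic. The sketch below compares pixel counts for the larger of the two capture resolutions. Measuring the reduction by pixel count rather than per side is an assumption made here, and the function name is illustrative.

```python
def pixel_ratio(src, dst):
    """How many times fewer pixels dst has than src; sizes are (width, height)."""
    return (src[0] * src[1]) / (dst[0] * dst[1])


original = (5472, 3648)        # larger capture resolution of the camera
faster_rcnn_in = (2736, 1824)  # Faster R-CNN input size
ssd_in = (512, 512)            # SSD input size

orig_to_frcnn = pixel_ratio(original, faster_rcnn_in)
frcnn_to_ssd = pixel_ratio(faster_rcnn_in, ssd_in)
orig_to_ssd = pixel_ratio(original, ssd_in)

print(f"original -> Faster R-CNN input: {orig_to_frcnn:.1f}x fewer pixels")
print(f"Faster R-CNN input -> SSD input: {frcnn_to_ssd:.1f}x fewer pixels")
print(f"original -> SSD input: {orig_to_ssd:.1f}x fewer pixels")
```

Under this convention the Faster R-CNN input holds 4 times fewer pixels than the original, and the 512×512 SSD input holds about 19 times fewer pixels than the Faster R-CNN input.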
C. MS-PAD
After observing the results related to the SSD and Faster
R-CNN methods, it was noticed that a simple pipeline modi-
fication could enhance the overall performance of power line
assets detection in STN PLAD. This approach takes advantage
of the images’ high resolution, where information is lost after
resizing. In the Multi-Size Power line Asset Detection (MS-
PAD) workflow, represented in Figure 4, two independent
networks are trained separately. The SSD was chosen because
it performed better, as will be shown in the next section.
The first of these two networks uses the original pipeline
that resizes its input, but this time trained without the Stock-
bridge damper class. The damper asset was excluded because
it has a much smaller size, being harder to identify after
image resizing. This strategy can be applied to other assets
not contemplated in this work that are small. An important
note is that, although the tower plate is also small, it still had
enough features after resizing that made it highly recognizable.
The second network is responsible for detecting small
objects, in this case, the Stockbridge damper class. It goes
through a different process, where the initial image is split
following a grid. The image is divided into 16 smaller ones
with a fixed resolution of 1368×769 or 1368×912, depending
on the original image, which can be 5472×3078 (16:9) or
5472×3648 (3:2). This 4×4 division is constant since the
drone pilots followed a data collection protocol, in which the
drone had to stay at similar distances from the transmission
towers, as described in section III.
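The grid split described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `grid_tiles` is hypothetical, and the handling of a height that is not divisible by four (3078 / 4 = 769.5) is an assumption, since the paper only reports the nominal tile sizes.

```python
def grid_tiles(width, height, rows=4, cols=4):
    """Return (left, top, right, bottom) boxes covering the image in a rows x cols grid.

    Tile edges use integer division, so when a side is not divisible the
    tiles differ by at most one pixel and together still cover the image.
    """
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(rows)
        for c in range(cols)
    ]


tiles_tall = grid_tiles(5472, 3648)
tiles_wide = grid_tiles(5472, 3078)

print(len(tiles_tall), tiles_tall[0], tiles_tall[-1])
print(len(tiles_wide), tiles_wide[0], tiles_wide[-1])
```

Each box uses the (left, upper, right, lower) convention, so it could be passed directly to a cropping routine such as Pillow's `Image.crop`.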
In summary, only the Stockbridge damper class is submitted
to the second step of MS-PAD, which divides the high-
resolution image into a 4×4 grid. This choice is based on
the average area of the classes of Table II and the AP in the
original image-resizing approach. The Stockbridge damper has
the lowest average area and was not well detected using the
previously mentioned methods, indicating the need to receive
extra attention compared to other classes.
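The paper does not spell out how the outputs of the two networks are combined. One plausible way, sketched below purely as an assumption, is to shift each tile-local box by the top-left corner of its tile and append it to the detections of the resizing branch. The function names, the tuple layouts, and the sample detections are all hypothetical.

```python
def to_image_coords(box, tile_origin):
    """Shift a tile-local (xmin, ymin, xmax, ymax) box by the tile's top-left corner."""
    ox, oy = tile_origin
    xmin, ymin, xmax, ymax = box
    return (xmin + ox, ymin + oy, xmax + ox, ymax + oy)


def merge_branches(resized_dets, tiled_dets):
    """Combine detections from both branches in full-image coordinates.

    resized_dets: list of (label, score, box), boxes already rescaled to the
                  full image (assumed to be done by the caller).
    tiled_dets:   list of (tile_origin, label, score, box), boxes tile-local.
    """
    merged = list(resized_dets)
    for origin, label, score, box in tiled_dets:
        merged.append((label, score, to_image_coords(box, origin)))
    return merged


# Hypothetical detections for illustration only.
resized = [("tower", 0.97, (1500, 300, 3200, 3500))]
tiled = [((1368, 912), "damper", 0.88, (10, 20, 60, 70))]
merged = merge_branches(resized, tiled)
print(merged)
```

A real pipeline would also have to decide what to do with dampers cut by a tile border, which this sketch leaves out.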
V. EXPERIMENTAL RESULTS
Fig. 4. MS-PAD pipeline. The input image is resized for the first network.
Meanwhile, the image is divided into a grid, resulting in the second network’s
inputs that generate different bounding boxes.
This section presents the two conducted experiments and
their results. The first one is responsible for comparing the
performance of the two mentioned object detectors in the
proposed dataset. The second one demonstrates another way
to deal with the input data, using MS-PAD, which was
detailed in subsection IV-C. In all experiments, the standard
metric of Average Precision (AP) is used to evaluate object
detection performance on STN PLAD. In order to validate
and obtain a greater degree of confidence in the results of the
proposed MS-PAD, the Monte Carlo cross-validation method
[33] was chosen and implemented in its experiments presented
in subsection V-B. This method creates k random splits of
the whole dataset into train and test sets. Then, the model is
trained and tested on each of the k splits, and the final result is the
average over them. In the end, this section shows which object detector
and which pipeline present the best results in the described
scenario according to the considered metric.
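Monte Carlo cross-validation as described here can be sketched in a few lines. The example below draws independent random train/test partitions of 133 image indices with an 80/20 proportion and k = 5, which are the values reported in this section. The seed, the rounding of the test-set size, and the function name are assumptions.

```python
import random


def monte_carlo_splits(items, k=5, test_fraction=0.2, seed=0):
    """Yield k independent random (train, test) partitions of items."""
    rng = random.Random(seed)
    n_test = round(len(items) * test_fraction)
    for _ in range(k):
        shuffled = list(items)
        rng.shuffle(shuffled)
        # The first n_test shuffled items form the test set, the rest the train set.
        yield shuffled[n_test:], shuffled[:n_test]


image_ids = list(range(133))
splits = list(monte_carlo_splits(image_ids))

for i, (train, test) in enumerate(splits, start=1):
    print(f"split {i}: {len(train)} train images, {len(test)} test images")
```

Unlike k-fold cross-validation, the test sets of different splits may overlap, because each split is drawn independently from the whole dataset.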
For the experiments, STN PLAD was split in a standard
80/20 proportion for the training and test sets, respectively.
Also, to consider that an object was correctly detected, the
Intersection over Union (IoU) between the ground truth and
the predicted bounding box had to be equal to or larger
than 0.5. Also, data augmentation is already an embedded
stage in both implementations of the techniques. Finally, the
experiments were performed on a desktop running the Ubuntu
18.04 Operating System, powered by an Nvidia RTX 2080 Ti
GPU (11 GB of VRAM) and an Intel Core i7 - 4790K CPU
@ 4.00 GHz with 32 GB of available RAM.
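The matching criterion above can be written down directly. The sketch below computes the IoU of two axis-aligned boxes and applies the 0.5 threshold stated in the text. The box layout (xmin, ymin, xmax, ymax) and the function names are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes give an empty intersection.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred_box, gt_box, threshold=0.5):
    """A prediction counts as correct when IoU with the ground truth is >= threshold."""
    return iou(pred_box, gt_box) >= threshold


ground_truth = (0, 0, 100, 100)
half_overlap = (0, 0, 100, 50)     # covers exactly half of the ground truth
shifted = (50, 50, 150, 150)       # overlaps only one quarter of it

print(iou(ground_truth, half_overlap), is_correct_detection(half_overlap, ground_truth))
print(iou(ground_truth, shifted), is_correct_detection(shifted, ground_truth))
```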
A. SSD and Faster R-CNN results
For this test, both detectors were trained once and for the
same period, about two days. The mAP results of using the
MS-PAD approach for SSD and Faster R-CNN were 90.2%
and 88.6%, respectively, showing that SSD has a slight advan-
tage over Faster R-CNN. These methods were also applied to
the two main dataset competitors of the proposed STN PLAD.
In CPLID [13], SSD and Faster R-CNN achieved 98.17% and
98.31% mAP, respectively. Regarding Tomaszewski et al. [14],
both detectors reached 100% mAP.
Figure 5 and Figure 6 show the visual results regarding
the detected objects by the SSD and the Faster R-CNN
methods, respectively, using the original approach, which only
resizes the input image. In the images, the bounding boxes’
colors correspond to the dataset classes: blue is for
the Insulators; yellow is for the Spacers; green is for the
Stockbridge dampers; red is for the Tower plate; white is for
the Transmission tower. It is possible to see in both figures how
small the Stockbridge dampers are relative to the other objects.
These images also illustrate failure cases, like the middle
insulator in Figure 5 and the transmission tower in Figure 6.
Fig. 5. Qualitative detection results of the SSD technique using the original
image-resizing approach.
Fig. 6. Qualitative detection results of the Faster R-CNN technique using the
original image-resizing approach.
B. MS-PAD results
For this experiment, k = 5, so five splits with randomly
selected samples were used, in which each split is used
twice: one time for the original image-resizing approach and
another time for MS-PAD. The total amount of iterations for
each training session is fixed at 20,000. The obtained results
regarding AP for each split are shown in Table III. In addition,
Table IV shows the average results from Table III reached by
each approach side-by-side, in a direct comparison.
Figure 7 and Figure 8 present the qualitative performance
of MS-PAD for big and small objects, respectively. The color
codes previously mentioned are maintained for these images.
VI. DISCUSSION
This section details and discusses all results presented in
section V, also giving insights into the usage of the MS-PAD
approach in the proposed STN Power Line Asset Dataset.
Fig. 7. Qualitative detection results of the MS-PAD pipeline for the big object
classes.
Fig. 8. Qualitative detection results of the MS-PAD pipeline for the small
object classes.
A. STN PLAD strengths & limitations
The proposed STN PLAD is the first public power line
assets dataset with multiple objects in real-world scenarios. It
contains five classes of entirely different objects with multiple
instances each, in varied real backgrounds. The data collection
protocol allows for a balance in data quantity and variability
since the captured images vary in illumination, backgrounds,
and weather conditions. Also, the drone position is not fixed,
in order to obtain object data from different perspectives.
Another challenging characteristic of STN PLAD is that there
are many objects per image (18.1, on average) compared to
the related public datasets, which have low
instances-per-image rates (1 and 1.9, on average). Thanks to this
process, STN PLAD poses a reasonable challenge to recent
deep learning techniques, as observed in section V, in
which the best of the tested approaches achieved an 89.2%
mAP, leaving considerable room for improvement.
Although the proposed STN PLAD provides new grounds in
the power line area and stimulates the development of power
line asset detection methods, it still has limitations. The main
one is related to its total amount of images. That prevents some
data-hungry object detectors from performing successfully
since they would require a more extensive dataset. Another
disadvantage is that the images were only collected from one
private transmission line. Even though different transmission
lines tend to be similar, it would be better to have images of
several transmission lines in other places to reduce the bias
of background, environment, and electrical assets appearance.
Finally, the images belong to a power line in Brazil, so the
dataset may not be representative of power lines in other countries.
TABLE III
COMPARISON OF DETECTION RESULTS OF THE ORIGINAL IMAGE-RESIZING APPROACH (ORIGINAL) AND THE MS-PAD PIPELINE (OURS) OF EACH
MONTE CARLO CROSS-VALIDATION SPLIT (k) RELATIVE TO THE AVERAGE PRECISION (AP). THE BEST mAP RESULTS ARE IN BOLD.
                    k = 1            k = 2            k = 3            k = 4            k = 5
Class               Original  Ours   Original  Ours   Original  Ours   Original  Ours   Original  Ours
Transmission tower  0.885     0.905  0.883     0.901  0.875     0.874  0.920     0.883  0.945     0.938
Insulator           0.825     0.938  0.924     0.866  0.931     0.893  0.839     0.884  0.874     0.889
Spacer              0.917     0.810  0.789     0.850  0.914     0.910  0.863     0.805  0.853     0.905
Tower plate         0.932     0.994  0.830     1.00   0.941     0.990  0.984     0.997  0.995     0.876
Stockbridge damper  0.189     0.829  0.201     0.882  0.189     0.870  0.264     0.824  0.227     0.787
mAP                 0.750     0.895  0.725     0.900  0.770     0.907  0.774     0.879  0.779     0.879
TABLE IV
DETECTION AVERAGE RESULTS FROM TABLE III OF BOTH APPROACHES,
SIDE-BY-SIDE. THE BEST RESULTS FOR EACH CLASS AND THE mAP ARE IN
BOLD.
Class               Original  Ours
Transmission tower  0.902     0.900
Insulator           0.879     0.894
Spacer              0.867     0.856
Tower plate         0.936     0.971
Stockbridge damper  0.214     0.838
mAP                 0.755     0.892
B. SSD and Faster R-CNN comparison in STN PLAD
This discussion is related to the experiment in subsec-
tion V-A. According to the proposed methodology, when
performing this experiment, it was expected that the results
related to the Faster R-CNN would surpass the results from
the SSD network considering the applied metrics. However, it
can be observed in the results reported in subsection V-A that
it did not occur. This result was obtained due to the limitations
of the used data. Deep learning techniques benefit from the use
of large amounts of data. According to Ng [34], [35], when the
amount of data limits a deep learning technique, shallower
techniques can obtain comparable or even better results than
deeper techniques. The used Faster R-CNN is much deeper
and has more trainable parameters than the used SSD network.
Therefore, for a limited amount of data, the learning of the
used Faster R-CNN is limited.
The other results in this comparison were related to the SSD
and Faster R-CNN performance in the competing datasets. In
[14], both methods achieved 100% mAP, as expected due to
the reasons presented in subsection II-A. In [13], the original
SSD and Faster R-CNN obtained performances above 98%
mAP. The high mAP values obtained by both object detectors
in both competing datasets showed how well-resolved their
challenges already are.
C. MS-PAD in STN PLAD
It is possible to see in Table III that the mAP values reached
by MS-PAD are higher than those of the Original
approach in every split, by at least ten percentage points (k = 5) and at
most 17.5 percentage points (k = 2). Table IV shows a direct
comparison, in which the values for each approach are an
average of the five splits shown in Table III. The best values
for each class AP and mAP are in bold. MS-PAD yields the
best AP result in three out of the five total classes, and there
is a gap of 13.7 percentage points between mAPs. That gap
is primarily due to the Stockbridge damper AP improvement,
which grew 62.4 percentage points using MS-PAD.
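The per-split gaps quoted in this paragraph can be recomputed from the mAP row of Table III. The sketch below copies those values and reports the difference in percentage points for each split. The variable names are illustrative.

```python
# mAP row of Table III as (Original, Ours) for each Monte Carlo split k.
map_per_split = {
    1: (0.750, 0.895),
    2: (0.725, 0.900),
    3: (0.770, 0.907),
    4: (0.774, 0.879),
    5: (0.779, 0.879),
}

# Gap in percentage points, rounded to one decimal to hide float noise.
gaps_pp = {
    k: round((ours - original) * 100, 1)
    for k, (original, ours) in map_per_split.items()
}
smallest = min(gaps_pp, key=gaps_pp.get)
largest = max(gaps_pp, key=gaps_pp.get)

for k, gap in gaps_pp.items():
    print(f"k = {k}: MS-PAD ahead by {gap} percentage points")
print(f"smallest gap at k = {smallest}, largest gap at k = {largest}")
```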
It is noteworthy that the performance of large objects
changes when comparing the Original and the MS-PAD.
That may happen because during the MS-PAD resize branch
training, one less class is considered (Stockbridge damper).
That directly influences how the network learns since there is
a different amount of objects and classes, directly impacting
the final performance of large assets. Also, it is important to
note that there is no guarantee that the performance impact
will be positive or negative when training with one less class.
VII. CONCLUSIONS
This work proposes a new public, real-world, high-resolution
power line asset dataset with multiple asset categories, called
STN Power Line Assets Dataset (PLAD). Its images were
captured by an Unmanned Aerial Vehicle (UAV) following
a data collection protocol to ensure data variability in order
to benefit deep learning models. STN PLAD contains 2409
annotated objects across 133 images divided into five classes
with different shapes and sizes. It has the largest number
of power line asset types among all public power line asset
datasets, with the highest density of objects per image among
them as well. The latter is possible due to its images having
far above average resolutions, 5472×3078 and 5472×3648,
more precisely. After evaluating STN PLAD in recent general-
purpose object detectors, a different pipeline called MS-PAD
is proposed. This pipeline contains a simple modification that
allows for an mAP improvement from 75.5% to 89.2%. STN
PLAD is publicly available to mitigate the lack of data in
the power line inspection area and provide a new challenge
to the computer vision community in order to stimulate the
proposition of new asset detection methods for power lines.
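The figures quoted in this section can be checked with a short calculation. The sketch below uses only numbers stated in the text (2,409 objects, 133 images, and the two image resolutions); the variable names are illustrative.

```python
# Numbers taken from the conclusions of this paper.
num_objects = 2409
num_images = 133
resolutions = [(5472, 3078), (5472, 3648)]

# Average number of annotated objects per image.
objects_per_image = round(num_objects / num_images, 1)

# Pixel count of each resolution, in megapixels.
megapixels = [round(w * h / 1e6, 1) for w, h in resolutions]

print(objects_per_image)  # about 18.1 objects per image
print(megapixels)         # roughly 16.8 and 20.0 megapixels
```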
ACKNOWLEDGMENT
The authors acknowledge the financial support of STN -
Sistema de Transmissão Nordeste S.A. through the ANEEL R&D
Program for the development of the research project entitled
“PD-04825-0006/2019: Inspeção com Drones por Meio do
Acoplamento Eletrostático para Carregamento de Baterias em Voo
e Uso de Aprendizagem de Máquina para Classificação Automática
de Defeitos”.
This research was funded in part by the Coordenação de
Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) -
Finance Code 001 and by the Conselho Nacional de
Desenvolvimento Científico e Tecnológico (CNPq).
REFERENCES
[1] V. N. Nguyen, R. Jenssen, and D. Roverso, “Automatic autonomous
vision-based power line inspection: A review of current status
and the potential role of deep learning,” International Journal of
Electrical Power & Energy Systems, vol. 99, pp. 107–120,
2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0142061517324444
[2] Y. Pradeep, S. A. Khaparde, and R. K. Joshi, “High level event ontology
for multiarea power system,” IEEE Transactions on Smart Grid, vol. 3,
no. 1, pp. 193–202, 2012.
[3] A. Castillo, “Risk analysis and management in power outage and
restoration: A literature survey,” Electric Power Systems Research, vol.
107, pp. 9–15, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0378779613002435
[4] M. Bruch, V. Münch, M. Aichinger, M. Kuhn, M. Weymann, and
G. Schmid, “Power blackout risks,” in CRO Forum, 2011, p. 28.
[5] L. Li, H. Wu, Y. Song, and Y. Liu, “A state-failure-network method to
identify critical components in power systems,” Electric Power Systems
Research, vol. 181, p. 106192, 2020.
[6] Y. Hu and K. Liu, Inspection and Monitoring Technologies of Transmis-
sion Lines with Remote Sensing. Academic Press, 2017.
[7] A. Rahmani, M. Khadem, E. Madreseh, H.-A. Aghaei, M. Raei, and
M. Karchani, “Descriptive study of occupational accidents and their
causes among electricity distribution company workers at an eight-year
period in Iran,” Safety and Health at Work, vol. 4, no. 3, pp. 160–165,
2013.
[8] J. Li, H. Wu, C. Hu, and C. Yu, “A fault diagnosis system based on
case decision technology for UAV inspection of power lines,” in IOP
Conference Series: Earth and Environmental Science, vol. 632, no. 4.
IOP Publishing, 2021, p. 042077.
[9] N. Sambasivan, S. Kapania, H. Highfill, D. Akrong, P. Paritosh, and
L. M. Aroyo, “‘Everyone wants to do the model work, not the data
work’: Data cascades in high-stakes AI,” in Proceedings of the 2021
CHI Conference on Human Factors in Computing Systems, 2021, pp.
1–15.
[10] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features
from tiny images,” Master’s thesis, Department of Computer Science,
University of Toronto, 2009.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” Advances in Neural
Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
[12] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning
applied to document recognition,” Proceedings of the IEEE, vol. 86,
no. 11, pp. 2278–2324, 1998.
[13] X. Tao, D. Zhang, Z. Wang, X. Liu, H. Zhang, and D. Xu, “Detection
of power line insulator defects using aerial images analyzed with
convolutional neural networks,” IEEE Transactions on Systems, Man,
and Cybernetics: Systems, 2018.
[14] M. Tomaszewski, B. Ruszczak, and P. Michalski, “The collection of
images of an insulator taken outdoors in varying lighting conditions
with additional laser spots,” Data in Brief, vol. 18, pp. 765–768,
2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2352340918302701
[15] R. Abdelfattah, X. Wang, and S. Wang, “TTPLA: An aerial-image dataset
for detection and segmentation of transmission towers and power lines,”
in Proceedings of the Asian Conference on Computer Vision, 2020.
[16] X. Liu, X. Miao, H. Jiang, and J. Chen, “Review of data analysis in
vision inspection of power lines with an in-depth discussion of deep
learning technology,” 2020.
[17] X. Lei and Z. Sui, “Intelligent fault detection of high voltage line
based on the Faster R-CNN,” Measurement, vol. 138, pp. 379–385, 2019.
[Online]. Available: https://www.sciencedirect.com/science/article/pii/S0263224119300831
[18] Y. Yang, L. Wang, Y. Wang, and X. Mei, “Insulator self-shattering
detection: A deep convolutional neural network approach,” Multimedia
Tools and Applications, vol. 78, no. 8, pp. 10097–10112, 2019.
[19] H. Zhang, M. Sun, Y. Ji, S. Xu, and W. Cao, “Learning-based object
detection in high resolution UAV images: An empirical study,” in 2019
IEEE 17th International Conference on Industrial Informatics (INDIN),
vol. 1, 2019, pp. 886–889.
[20] Z. A. Siddiqui, U. Park, S.-W. Lee, N.-J. Jung, M. Choi, C. Lim, and
J.-H. Seo, “Robust powerline equipment inspection system based on a
convolutional neural network,” Sensors, vol. 18, no. 11, p. 3837, 2018.
[21] Y. Ö. Emre, G. Ö. Nezih et al., “Powerline image dataset (infrared-ir
and visible light-vl),” Mendeley Data, vol. 7, 2017. [Online]. Available:
https://data.mendeley.com/datasets/n6wrv4ry6v/7
[22] H. Zhang, W. Yang, H. Yu, H. Zhang, and G.-S. Xia, “Detecting
power lines in uav images with convolutional features and structured
constraints,” Remote Sensing, vol. 11, no. 11, p. 1342, 2019.
[23] L. Bottou, “Large-scale machine learning with stochastic gradient de-
scent,” in Proceedings of COMPSTAT’2010. Springer, 2010, pp. 177–
186.
[24] A. Rakhlin, O. Shamir, and K. Sridharan, “Making gradient descent
optimal for strongly convex stochastic optimization,” in Proceedings
of the 29th International Conference on Machine Learning, 2012,
pp. 1571–1578.
[25] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and
A. C. Berg, “SSD: Single shot multibox detector,” in Computer Vision –
ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham:
Springer International Publishing, 2016, pp. 21–37.
[26] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time
object detection with region proposal networks,” in Advances in Neural
Information Processing Systems, C. Cortes, N. Lawrence, D. Lee,
M. Sugiyama, and R. Garnett, Eds., vol. 28. Curran Associates, Inc.,
2015, pp. 91–99. [Online]. Available: https://proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf
[27] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable
convolutional networks,” in Proceedings of the IEEE International
Conference on Computer Vision (ICCV), Oct 2017.
[28] L. Kong, X. Zhu, and G. Wang, “Context semantics for small target
detection in large-field images with two cascaded Faster R-CNNs,”
Journal of Physics: Conference Series, vol. 1069, p. 012138, Aug. 2018.
[Online]. Available: https://doi.org/10.1088%2F1742-6596%2F1069%2F1%2F012138
[29] X. Zhu, L. Kong, G. Wang, Z. Hu, and S. Li, “Multi-size object
detection assisting fault diagnosis of power systems based on improved
cascaded faster R-CNNs,” in Tenth International Conference on Digital
Image Processing (ICDIP 2018), X. Jiang and J.-N. Hwang, Eds., vol.
10806, International Society for Optics and Photonics. SPIE, 2018,
pp. 342–351. [Online]. Available: https://doi.org/10.1117/12.2503064
[30] Tzutalin, “Labelimg,” Git code, 2015,
https://github.com/tzutalin/labelImg.
[31] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan,
P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in
context,” in European conference on computer vision. Springer, 2014,
pp. 740–755.
[32] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie,
“Feature pyramid networks for object detection,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2017, pp.
2117–2125.
[33] W. Dubitzky, M. Granzow, and D. P. Berrar, Fundamentals of data
mining in genomics and proteomics. Springer Science & Business
Media, 2007.
[34] A. Ng, Machine learning yearning. Stanford Press, 2017,
http://www.mlyearning.org/(96).
[35] A. Tang, R. Tam, A. Cadrin-Chênevert, W. Guest, J. Chong, J. Barfett,
L. Chepelev, R. Cairns, J. R. Mitchell, M. D. Cicero et al., “Canadian
association of radiologists white paper on artificial intelligence in
radiology,” Canadian Association of Radiologists Journal, vol. 69, no. 2,
pp. 120–135, 2018.