Deep Learning-based Application for Fault Location Identification and Type Classification
in Active Distribution Grids
V. Rizeakos, A. Bachoumis∗, N. Andriopoulos, M. Birbas, A. Birbas
Dep. of Electrical and Computer Engineering, University of Patras, Rio Campus, 26504, Patras, Greece
Abstract
The high penetration of distributed energy resources, especially weather-dependent sources, even at the edge of the distribution
grids, has increased the power system uncertainties and drastically shifted the operational status quo for the system operators. For
the operators to ensure the uninterrupted electricity supply of the end-consumers, the fast and accurate response to fault events
is of critical importance. This paper proposes a data-driven fault location identification and type classification application based
on the continuous wavelet transformation and convolutional neural networks optimally configured through Bayesian optimization.
This application leverages the proliferation of high-resolution measurement devices in distribution networks. It can locate the exact
position of the short-circuit faults and classify them into eleven different types. Its intrinsic models capture the spatial characteristics, as well as the frequency-domain representation of the temporal ones, of the three-phase voltage and current timeseries measurements stemming from the field devices, thus increasing the operators' real-time visibility of their networks. We conduct simulations through
synthetic data, which we provide in an open-source repository, that replicate a wide range of fault occurrence scenarios with
eleven different types, with the resistance ranging from 50 Ω to 2 kΩ and with duration from 20 ms to approximately 2 s, under noise
conditions injected by devices and load variability. The results showcase the efficacy of the proposed method reaching an accuracy
of 91.4% for fault detection, 93.77% for correct branch identification, 94.93% for fault type classification, and RMSE value of
2.45% for location calculation.
Keywords: Active distribution grids, CNNs, Deep learning, Fault detection and location identification, Wavelet transformation
1. Introduction
In the ever-evolving environment of the active distribution
networks, the uninterrupted and high-quality supply of the end-
customer is threatened due to the intermittent nature of the Re-
newable Energy Resources (RES). Grid operators shall ensure
power system reliability by maintaining high grid observability
even at the edge of the Low-Voltage Distribution Grid (LVDG)
in real-time conditions to achieve rapid recovery after the emer-
gence of contingencies. The advancements in Information and
Communication Technology (ICT) through the emergence of
low-latency telecommunication networks and Advanced Meter-
ing Infrastructure (AMI) [1], along with the accelerated progress
in Machine Learning (ML) and especially in Deep Neural Net-
works (DNNs) [2], can act as a catalyst for alleviating the prob-
lems arising in rich distribution networks, i.e., highly RES-
penetrated with a significant number of prosumers [3, 4].
⋆This research has been financed by the European Union, under the Horizon
2020 project 864537: Flexible Energy Production, Demand and Storage-based
Virtual Power Plants for Electricity Markets and Resilient DSO Operation –
“FEVER” H2020-LC-SC3-2018-2019-2020.
∗Corresponding author
Email addresses: up1053537@upatras.gr (V. Rizeakos),
abachoumis@ece.upatras.gr (A. Bachoumis),
nadriopoulos@ece.upatras.gr (N. Andriopoulos),
mbirbas@ece.upatras.gr (M. Birbas), birbas@ece.upatras.gr (A.
Birbas)
Active and smart LVDGs can self-heal in case a fault oc-
curs. Several strategies are followed by the grid operators to
react to contingencies and apply self-healing control practices
[5–8]. The cornerstone of these practices is the execution of
accurate and fast actions for fault diagnosis, i.e., fault detection
and identification of the specific location and type. However,
this activity is not trivial in LVDGs, due to particular character-
istics that impede the traditionally used methods. Specifically,
LVDGs have a high number of branches, are multi-phase, and
usually have an unbalanced operation due to single-phase con-
nected loads. Different types of conductors do exist, connecting
the nodes with different characteristics and lengths, leading to
a wide range of resistance (R) and reactance (X) values with a
high R/X ratio. In addition, a limited number of AMI devices exist, reducing the overall observability, and the networks have a radial structure and operation [9]. Therefore, the development of a fault
diagnosis method for LVDGs shall consider from the initial de-
sign process all the above-mentioned inherent characteristics of
the LVDGs.
1.1. Literature
The approaches used for fault diagnosis in power systems
can be classified into three categories. The first category in-
cludes the classical approaches that use direct modeling tech-
niques, with the most important being the impedance-based
[10–12] and traveling-wave methods [13, 14]. System opera-
tors have widely used these two methods over the last decades
to perform fault diagnosis in high and medium voltage networks
and are dependent on the line parameters. However, for both
methods, their efficiency in LVDGs decreases significantly. For
the former, the ability to accurately detect the fault location de-
teriorates due to the high number of branches at the LVDGs
[15]. The latter method lacks accuracy in LVDGs due to the presence of several branches, which impedes the differentiation between waves [16]. Identifying the exact fault location is crucial in rich LVDGs to ensure the security of supply for the end-consumers.
The second category consists of data-driven methods with
a wide spectrum of techniques from signal processing to Ar-
tificial Intelligence (AI) domains. In both [17] and [18], fault
diagnosis methods using Support Vector Machines (SVMs) in real distribution feeders are presented. [19] proposes a fault
diagnosis method in LVDGs based on gradient boosting trees.
Moreover, several works with applications both at the trans-
mission and distribution levels leverage domain transformation-
based methods, such as wavelet transform, to conduct fault de-
tection [20–22]. In [23], a hybrid clustering algorithm based
on k-Nearest Neighbors (k-NN) and k-Means is developed, us-
ing as a preprocessing method the Matching Pursuit Decom-
position (MPD). A complementary clustering technique, the
Density-Based Spatial Clustering of Applications with Noise
(DBSCAN), has been implemented in [24] for fault diagnosis.
The feature selection and dimensionality reduction methods of
Principal Component Analysis (PCA) and Random Forest (RF)
have been used for active LVDGs in [25] and [26], respectively.
A Multi-Layer Perceptron (MLP) along with an Extreme Learning
Machine (ELM) are deployed in [27], particularly for radial
grid topologies. DNNs are employed in [28] for identifying the
fault location and type for radial topologies in LVDGs. Finally,
Convolutional Neural Networks (CNNs) have also been lever-
aged for fault diagnosis in both transmission and distribution
levels [29, 30].
The last category includes methods that use a hybrid ap-
proach for fault diagnosis. These methods combine data-driven
and model-based approaches to conduct fault location and type
identification processes. In [31], both the impedance-based method
and DNNs are employed, whereas in [32], a rule-based fuzzy
logic is developed, which can detect the difference between
simulated data and the actual measurements to extract the ac-
curate short-circuit fault location.
1.2. Work Contributions
From the above-conducted literature review, we can con-
clude that in the context of LVDGs, data-driven methods have
gained popularity mainly for two reasons: (i) the limitations
that model-driven methods experience at the edge of the grid
and (ii) the accelerating integration of AMI even at the residen-
tial end-consumer level. However, the amount of data-driven
works that focus on the LVDGs is limited [33]. By further con-
tributing towards that direction, this work proposes an applica-
tion for smart, rich, and radial LVDGs for Fault Location Iden-
tification and Type Classification, called hereafter the FLITC
application, based on advanced DNN architectures. Specifi-
cally, Continuous Wavelet Transformation (CWT) and CNN ar-
chitectures are employed to consider the spatio-temporal char-
acteristics of the AMI measurements. This work aims to detect
the fault occurrence, its type, and exact location in real-time
conditions. This information is used as an input into an ad-
vanced self-healing application aiming at repairing the contin-
gencies and performing energy restoration efforts to reduce the
impact of energy interruption on the consumers, by decreasing
the number (SAIFI) and duration (SAIDI) of the interruptions
[34].
The main contributions of this work can be concisely de-
scribed as follows:
•Proposal of a DNN-based application that identifies, in real-time conditions, eleven different types of short-circuit faults in active LVDGs and finds the exact location (feeder and branch) and the distance from the root node, considering the constraints of AMI device resolution and computational time,
•Leverage the CWT along with Dynamic Mode Decom-
position (DMD) techniques as preprocessing stages into
the CNNs, constituting the application’s cornerstone. To
the authors’ knowledge, it is the first time across litera-
ture that this data-driven approach, empowered by meth-
ods of the signal processing domain, is used for the fault
detection and location identification application in active
LVDGs, under noisy measurements that are introduced
by the AMI and the variability of loads and distributed
generation,
•Dataset generation that simulates the LVDG operation
under fault occurrences for tuning and evaluation of the
application,
•Optimal hyperparameters tuning in each model based on
the Bayesian Optimization (BO) algorithm and particu-
larly on the Tree Parzen Estimator (TPE),
•Demonstration of the superiority of the proposed models compared to benchmark models existing in the literature, and
•Empowerment of the research results' reproducibility and transparency across academia, by making available
the source code both for the dataset generation and the
FLITC application, in an open-source repository [35].
1.3. Paper Outline
The paper is structured as follows: Section 2 includes a
thorough description of the problem. Section 3 presents the
mathematical building blocks and concepts upon which the FLITC
application is constructed. Section 4 provides the proposed
FLITC application, a deep analysis of the DNN-based archi-
tecture, the algorithm used for hyperparameter tuning, and the
employed loss functions. Section 5 includes a description of the
use case serving for validation purposes. Section 6 introduces
the extensive exploration of the proposed application’s results
to showcase its efficiency and applicability. We conclude our
work in Section 7, where a summary of the proposed applica-
tion and recommendations for further work are given.
2. Problem statement and specifications
This section includes the description of the fault detection
problem and the minimum requirements the FLITC application
has to meet to perform with high accuracy. Initially, a descrip-
tion of the faults considered in this work and their physical
meaning takes place. Then, the specifications of the AMI and
the corresponding reporting rate are presented. Finally, based
on this analysis, we derive the functionalities of the FLITC ap-
plication.
2.1. Fault types
As reviewed in [36], failures in electrical grids are catego-
rized based on their origin. Natural causes are the most
common type of occurring faults in power grids and mainly
comprise disturbances due to extreme weather phenomena. An-
other category of faults emerges from malfunctions in electric
grid equipment, as well as from human failure. Lastly, a portion
of the occurring faults is associated with man-made hazards, i.e.,
either cyber-attacks that aim to affect the grid integrity or other
forms of intentional attacks.
Based on the fault origin, its severity has a wide range.
On the one hand, a lightning strike can cause instability in the
power flow of the electric grid, while on the other hand, it can
create a localized blackout due to the destruction of the grid
infrastructure. The aftermath of a fault occurring to the grid
might overlap with two or more of the possible causes dis-
cussed above, e.g., a localized blackout might stem from both
a lightning strike and a substation malfunction. This work in-
vestigates a subset of all the possible types of faults. More
specifically, the proposed methodology emphasizes the detec-
tion of single-phase (1Φ), double-phase (2Φ), and three-phase
(3Φ) short-circuits with respect to ground and phase-to-phase
ones. According to [37], the origin of those faults is mainly at-
tributed to physical contact between one or more phases with
the ground (i.e., tree fall, broken insulators, natural phenom-
ena such as lightning storms, hurricanes, floods, heatwaves,
etc.), overloading, corrosion or lack of maintenance of solar
plants and wind turbines, generators’ overheating, short-circuit
of generator rotor windings, etc.
According to the EN 50160 [38], IEEE Std 1159-1995 [39]
and IEEE Std 1250-1995 standards [40], the duration of the oc-
curring fault can range from half a cycle to up to three minutes,
during short interruptions, depending on both the fault cause
and the grid security level. However, most disruptions due to
phase-to-phase or phase-to-ground faults do not exceed 1 s, be-
yond which normal operation is restored. Power grid outages
longer than 3 minutes are usually attributed to scheduled mainte-
nance of the grid’s components, or construction works in the
LVDG district.
The frequency of occurrence of each short-circuit fault de-
pends on its type. According to [41], the most common type
is the 1Φ short-circuit with the ground at a rate of 70%, while around 5% concerns 3Φ short-circuits. The remaining 25% in-
cludes phase-to-phase short-circuits, and double-phase short-
circuits with the ground. For the purposes of this work, it is
permissible to consider that the respective frequency for each
Type      Frequency   Severity
L-G       70%         Low
L-L       7.5%        Medium
L-L-G     17.5%       Medium
L-L-L     2%          High
L-L-L-G   3%          High
Table 1: Occurring frequency and severity of different line fault types
of the five types of fault is as follows: 70% for 1Φ to ground, 17.5% for 2Φ to ground, 7.5% for line-to-line, 3% for 3Φ to ground, and 2% for the 3Φ fault. When one or two lines contribute to a short-circuit, a significant imbalance is created in a most likely already unsymmetrical power grid, where each phase's power consumption differs; when more phases are involved, the fault instead restricts the power flow of all the affected lines. Thus,
it is acceptable to assume that the severity of a fault increases
with each contributing phase (Table 1).
2.2. Metering infrastructure
AMI enables the collection of data stemming from the grid’s
buses, such as 3Φ voltage (V_RMS) and 3Φ current (I_RMS), henceforth denoted as 3Φ-V and 3Φ-I, respectively, and active power
(P), and transmits them to the operators’ data management sys-
tem. Since most of the faults’ duration does not exceed 1 s, the
proposed model is able to simulate faults ranging from a cycle,
i.e., from 20 ms up to 2 s. The AMI should be able to support
transmitting data measurements from each bus. In addition to
that, since data arriving at such a high frequency are difficult to handle, it is considered that the data are sent in packages,
e.g., every 5 or 20 s. Therefore, a framework is created for ev-
ery measurement unit at the LVDG nodes with a sample rate
of 20 ms, which accumulates the information in either 5 s or
20 s intervals. Data with high granularity can be provided by
measuring devices, such as either Phasor Measurement Units
(PMUs) or Power Quality Meters (PQMs), which have a sam-
pling rate of up to 2,880 and 100,000 samples/s, respectively.
2.3. FLITC specifications
Based on the above-described problem statement, the fol-
lowing requirements and specifications of the FLITC applica-
tion are derived:
•Data handling and decision timeframe: Because data ar-
riving at high frequency are difficult to handle, it is
considered that the application shall be capable of receiv-
ing and analyzing data in packages, e.g., every 5 s or 20 s.
Then, a decision (model inference) is taken in real-time
conditions (sub-minute or sub-second scale based on the
dimensions of the grid topology).
•Faulty feeder detection: The model has the capability of
detecting the feeder in which the fault took place and dif-
ferentiating that feeder from the healthy ones,
•Faulty branch detection: The model has the capability of
accurately detecting which branch is the one that the fault
has emerged from,
•Faulty class identification: The model can identify the
type of the short-circuit fault that occurred in the LVDG.
Eleven faulty classes are included: A-G, B-G, C-G, A-
B, B-C, A-C, A-B-G, B-C-G, A-C-G, A-B-C, A-B-C-G,
where A, B, and C denote the corresponding phases and
G the ground,
•Fault branch location: The model can calculate the dis-
tance from the root node.
3. Technical Preliminaries
This section presents the theoretical framework that the FLITC
application is built upon. Initially, CNNs are introduced, i.e., the employed DNN architecture, which is adept at capturing spatial changes in the input data. They are commonly used in im-
age and video recognition algorithms and have further varia-
tions depending on the specific task. Then, the Dynamic Mode
Decomposition (DMD) method is described, i.e., a completely
data-driven dimensionality reduction technique independent of
the underlying dynamics, firstly proposed in the hydrodynam-
ics domain. Finally, the CWT method is presented that is used
in the data preprocessing stage for data transformation from the
time to the frequency domain.
3.1. Convolutional Neural Networks
CNNs are a widely used architecture of DNNs, which are
primarily used in image and video recognition due to their ad-
vantage of determining spatial features in the given data. CNNs
are feed-forward neural networks whose primary feature is the
convolution of the input data with a shared weight kernel. The
kernels slide across the input data and extract feature matrices
for each convolution. Depending on the datasets’ dimensions,
the CNN’s kernels or filters have a corresponding dimension.
At the same time, the output of their matrix multiplications pro-
duces a feature map with lower dimensions through employing
pooling. This process of consecutive convolution and pooling
layers can continue to the point of creating a vector layer, also
known as a dense layer. The input of a CNN model has dimen-
sions of the following structure:
Dataset Length × Input Height × Input Width × Input Channels
In the case of image classification, for example, the dimen-
sions correspond to the size of the dataset fed into the network,
the picture height in pixels, the picture width in pixels, and the
three-color channels (R, G, B) respectively. Then, a kernel with
dimensions smaller than the above slides through the input and
carries out matrix multiplication. The hyperparameters of ker-
nel size, number of filters, stride, and padding of the kernel are
all set by the user, according to the nature of the dataset. The in-
put of the following stage is the feature map extracted from the
previous one. This is passed through a pooling layer, which re-
duces the dimensionality of the processed data, either by using
max or average pooling, which yields the kernel’s maximum or
average value, respectively. The flattened data from the pooling
layer are fed into dense layers with a depth size determined by
the user.
The selection of the CNNs in this work is based on two fac-
tors: (i) the CNN layer can capture the spatial information in-
herently included in the measured 3Φ timeseries data stemming from different locations of the LVDG, and (ii) the dimensions of the grid's branch data are similar to the CNN model input dimensions. Hence, the CNN can classify measured V timeseries data to a particular fault type by extracting features from the grid's 3Φ measurements.
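To make the above concrete, the following is a minimal Keras sketch of a CNN classifier of the kind described here; the input shape (a 32×32 scalogram with three channels, one per phase), the layer sizes, and the dropout rate are illustrative assumptions and not the tuned FLITC models of Table 2.

```python
# Minimal Keras sketch of a CNN classifier of the kind described above.
# Input shape and layer sizes are illustrative assumptions, not the tuned FLITC models.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_classifier(input_shape=(32, 32, 3), n_classes=11):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Convolution: kernels slide over the input and extract feature maps
        layers.Conv2D(filters=16, kernel_size=3, padding="same", activation="relu"),
        # Pooling reduces the spatial dimensions of the feature maps
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        # Flatten into a vector and pass through dense layers
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.05),
        layers.Dense(n_classes, activation="softmax"),
    ])
    return model

model = build_cnn_classifier()
model.summary()
```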
3.2. Dynamic mode decomposition
DMD is a dimensionality reduction technique that computes a set of modes, each of which has a fixed oscillation frequency and a decay/growth rate [42]. These modes and frequencies
are analogous to the normal modes of the system, but more
generally, they are approximations of the modes and eigenval-
ues of the composition operator. Due to the intrinsic temporal
behaviors associated with each mode, DMD differs from di-
mensionality reduction methods such as PCA, which computes
orthogonal modes that lack predetermined temporal behaviors.
Because its modes are not orthogonal, DMD-based representa-
tions can be less parsimonious than those generated by PCA.
However, they can also be more physically meaningful because
each mode is associated with a damped sinusoidal behavior in
time.
Given time series data X of size N-by-T, where N is the number of variables and T is the number of time steps, in any time step t the first-order vector autoregression takes the form:

x_t = A x_{t-1} + \epsilon_t    (1)

where x_t denotes the snapshot vector at time t with size N x 1, A is the coefficient matrix of size N x N, and \epsilon_t is the error term. To find a well-behaved coefficient matrix and use it to represent temporal correlations, we reformulate the above equation as:

X_2 \approx A X_1    (2)

Then, we employ the singular value decomposition of X_1 to factorize it, writing:

X_1 = U \Sigma V^T    (3)

where U consists of left singular vectors, V consists of right singular vectors, and \Sigma is diagonal. We define \tilde{A} as:

\tilde{A} = U^T X_2 V \Sigma^{-1}    (4)

The eigenvalues and eigenvectors of \tilde{A} are then computed from the following equation:

\tilde{A} y = \Lambda y    (5)

Finally, the DMD mode \Phi corresponding to the DMD eigenvalue \Lambda is given by:

\Phi = U y    (6)
Figure 2: Indicative example of CWT transformation for Vrms timeseries.
This work uses the DMD method to reduce the dimensions of
the input dataset of the CNNs to facilitate the training and in-
ference of the models by decreasing the requirements both for
computational power and memory without significantly affect-
ing the model’s accuracy.
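As an illustration of Eqs. (2)-(6), the following is a minimal NumPy sketch of the DMD computation; the toy snapshot matrix and the truncation rank r are assumptions made only for demonstration.

```python
# Minimal NumPy sketch of the DMD steps in Eqs. (1)-(6); the data matrix X and
# the rank r used here are illustrative assumptions.
import numpy as np

def dmd_modes(X, r=None):
    """Return DMD eigenvalues and modes of a data matrix X (N variables x T snapshots)."""
    X1, X2 = X[:, :-1], X[:, 1:]                        # time-shifted snapshot matrices, Eq. (2)
    U, S, Vh = np.linalg.svd(X1, full_matrices=False)   # Eq. (3)
    if r is not None:                                   # optional truncation for dimensionality reduction
        U, S, Vh = U[:, :r], S[:r], Vh[:r, :]
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / S)   # Eq. (4)
    eigvals, W = np.linalg.eig(A_tilde)                 # Eq. (5)
    Phi = U @ W                                         # DMD modes, Eq. (6)
    return eigvals, Phi

# toy example: two damped oscillations sampled over 200 time steps
t = np.linspace(0, 4 * np.pi, 200)
X = np.vstack([np.sin(t) * np.exp(-0.05 * t), np.cos(2 * t) * np.exp(-0.02 * t)])
lam, Phi = dmd_modes(X, r=2)
print(lam)
```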
3.3. Continuous Wavelet Transformation
The CWT of a signal s(t) can be defined as:

Z(\alpha, \beta) = \frac{1}{\sqrt{\alpha}} \int_{-\infty}^{\infty} s(t) \Psi^{*}\left(\frac{t - \beta}{\alpha}\right) dt    (7)

where s(t) is a finite energy signal, \Psi^{*} is the complex conjugate of the mother wavelet, and \alpha and \beta are the scaling and translational factors of the wavelet, respectively. Large scale values expand the wavelet in time, revealing low-frequency information in the signal, while smaller scale values shrink the wavelet and reveal the high frequencies present in the signal. The CWT is calculated by continuously varying \alpha and \beta over the range of scales and the length of the signal, respectively.
Over the last years, many works across academia provide solutions to classification and regression problems using CWT as a preprocessing stage of CNN models, especially for applications in the medical engineering domain [43, 44]. In this work, the advantage of this approach is to transform the noisy 3Φ-V and 3Φ-I timeseries signals into the frequency domain and thus decompose complex information and patterns into elementary forms, boosting the ability of the CNN models to capture the faulty states/occurrences. This work uses the Morlet mother wavelet; an indicative example of this CWT conversion is illustrated in Fig. 2.
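For illustration, the following is a minimal sketch of such a CWT preprocessing step using the PyWavelets library with the Morlet mother wavelet; the sampling rate, the scale grid, and the synthetic test signal are assumptions and do not reproduce the exact FLITC preprocessing pipeline.

```python
# Minimal sketch of the CWT preprocessing step using PyWavelets; the scale grid,
# sampling rate and test signal are illustrative assumptions.
import numpy as np
import pywt

fs = 1000                          # assumed sampling rate (samples/s) for illustration
t = np.arange(0, 0.1, 1.0 / fs)    # 0.1 s window of a 50 Hz waveform
signal = np.sin(2 * np.pi * 50 * t) + 0.05 * np.random.randn(t.size)  # noisy voltage-like signal

scales = np.arange(1, 33)          # 32 scales -> 32 rows in the resulting scalogram
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1.0 / fs)  # Eq. (7), Morlet mother wavelet
scalogram = np.abs(coeffs)         # |Z(alpha, beta)| image of shape (32, len(signal))
print(scalogram.shape)
```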
4. FLITC application
4.1. Architecture
The FLITC application fulfills the following functionalities:
•Faulty feeder detection through the employment of a feeder
detection CNN model (FFNN),
•Faulty branch detection through the employment of a branch
detection CNN model (FBNN),
•Faulty type classification in eleven fault classes, as de-
fined in section 2.3, through the employment of a faulty
class detection CNN model (FCNN),
•Faulty distance calculation by estimating the distance from
the root node through the employment of a distance cal-
culation CNN model (FDNN).
The FLITC application’s architecture is illustrated in Fig. 3.
The developed components for the application’s materialization
are input, preprocessing, fault diagnosis, and output.
4.1.1. Input stage
It handles the measurements stemming from the AMI de-
vices in the LVDG. Data are received in 20 s intervals in order not to obstruct the AMI communication system and create bandwidth issues. The accumulated 20 s data, i.e., the feeder 3Φ-I measurements and each node's 3Φ-V measurements, are segmented into four 5 s batches, which are stored until all the
timeseries have been processed. The data are successively for-
warded to the next stage, responsible for the data preprocessing.
4.1.2. Preprocessing stage
The data preprocessing stage is an intermediate function be-
tween the incoming data and the diagnostic tools. It prepro-
cesses the data as follows:
•Cleaning step: Checks the data for any missing measure-
ments and replaces them based on their position in the
input timeseries,
•Grouping step: Separates the Imeasurements from the
Vmeasurements and isolates the latter depending on the
branch that each node belongs to,
•Interpolation step: As firstly introduced in [28], this step
interpolates the measurements of each node to create a
branch with a generalized number of virtual nodes, thus
facilitating the distance estimation functionality by re-
ducing the input vector. These virtual nodes are located at 5 different locations of each branch, i.e., 0%, 25%, 50%, 75%, and 100% of the total faulty branch distance (a minimal sketch of this step is given at the end of this subsection),
•Normalization step: Normalizes the data to insert them
in the different models,
Figure 3: Architecture diagram of the FLITC application.
Figure 4: Dimensionality (in parentheses) and CWT transformation of the 3Φ-I dataset during the preprocessing stage.
•CWT step: Transforms the data from the time domain
to the frequency domain to enhance the performance of
the models. Even though the result of this process is a
32x32 image, the intermediate steps can vary. There-
fore, two different CWT approaches are explored: (i) the
CWT of the timeseries is a 32x250 image, retaining the
original time dimension while extracting the data for the
convolution via the mother wavelet to a new dimension
with equal size to the final one, i.e., 32, (ii) CWT cre-
ates a 250x250 image, expanding the first dimension to
get an orthogonal matrix including further information of
the original timeseries. Each approach is used for differ-
ent datasets according to the volume of information each
CNN model requires. In this work, all the obtained best
models (see section 6) yield better results by employing
the first approach due to the fact that the second one pro-
vides excessive information overwhelming the CNN lay-
ers,
•DMD step: Reduces the dimensionality of the input data
to alleviate problems provoked by memory overloading
during the hyperparameter tuning and training of the mod-
els without significantly deteriorating the models’ effi-
cacy. This work uses DMD for reduction of the 3Φ-V
timeseries dimensionality.
Figs. 4 and 5 illustrate the preprocessing stage for the 3Φ-I and 3Φ-V input datasets, respectively, for the specific use case pre-
sented below in section 5.
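As referenced in the interpolation step above, the following is a minimal NumPy sketch of how node measurements could be interpolated onto the five generalized virtual nodes of a branch; the node positions and voltage values are illustrative assumptions.

```python
# Minimal NumPy sketch of the virtual-node interpolation step; positions and
# voltages below are illustrative, not real grid data.
import numpy as np

node_positions = np.array([0.0, 0.18, 0.55, 0.92, 1.0])        # normalized distance of each metered node from the root node
node_voltages = np.array([230.0, 228.4, 226.9, 224.1, 223.8])   # measured RMS voltage at each node

virtual_positions = np.array([0.0, 0.25, 0.5, 0.75, 1.0])       # the 5 generalized virtual nodes
virtual_voltages = np.interp(virtual_positions, node_positions, node_voltages)
print(virtual_voltages)
```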
4.1.3. Fault diagnosis stage
The DNN blocks are the main computational blocks of the
system since this is where the fault diagnosis of the LVDG is
conducted. Data are inserted into each module as follows:
•FFNN module is fed with the preprocessed 3Φ-I measurements,
•FBNN module is fed with the preprocessed 3Φ-V measurements of the feeder classified as faulty in the previous stage,
•FCNN module is fed with the preprocessed 3Φ-V measurements of the branch classified as faulty in the previous stage,
•FDNN module is fed with the preprocessed 3Φ-V measurements of the branch classified as faulty in the previous stage.
The FFNN module receives the preprocessed input data and
sends its classification results to the output. If the model pre-
dicts that the state of the LVDG during these 5 s is healthy, the
fault diagnosis is concluded. Otherwise, if the model finds
a faulty feeder, the process continues. Since the number of
branches for each feeder can vary, the input shape of the FBNN
is not standard. Thus, within the DNN training stage, models
with feeders of different numbers of branches are trained. Therefore, the number of DNNs selected from the training stage equals the number of different branch-size feeders. Depending on which feeder is the faulty one, the corresponding FBNN model is fed with the V data. The last stage involves both the FCNN and FDNN
since they are independent and their input data have already
been preprocessed.
The categorical cross-entropy loss function is used in the case of multi-class classification problems. It is a softmax activation function (Eq. (8)), followed by a cross-entropy loss (Eq. (9)):

f(s)_i = \frac{e^{s_i}}{\sum_{j}^{C} e^{s_j}}    (8)

CE = -\sum_{i}^{C} t_i \log(f(s)_i)    (9)

Since the labels are one-hot encoded, only the positive class C_p keeps its term in the loss, so there is only one element of the target vector t which is not zero, i.e., t_i = t_p. Hence, by discarding the zero elements in the summation, the formula of the categorical cross-entropy is derived as:

CCE = -\log\left(\frac{e^{s_p}}{\sum_{j}^{C} e^{s_j}}\right)    (10)
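As a quick numerical check of Eqs. (8)-(10), the following NumPy snippet evaluates the softmax, the full cross-entropy, and the reduced categorical cross-entropy on one sample; the scores and the one-hot label are arbitrary illustrative values.

```python
# Tiny NumPy check of Eqs. (8)-(10) on one sample with arbitrary values.
import numpy as np

scores = np.array([2.0, 0.5, -1.0])   # raw network outputs s_i for C = 3 classes
target = np.array([1.0, 0.0, 0.0])    # one-hot label t, positive class p = 0

softmax = np.exp(scores) / np.sum(np.exp(scores))            # Eq. (8)
ce = -np.sum(target * np.log(softmax))                       # Eq. (9)
cce = -np.log(np.exp(scores[0]) / np.sum(np.exp(scores)))    # Eq. (10), only the positive term survives
assert np.isclose(ce, cce)
print(ce)
```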
4.1.4. Output stage
The DNNs' output is provided in a comprehensive format and informs the user about the LVDG state. Specifically, it concatenates the results of all four DNNs as follows: (i) "No Fault Detected" when the model classifies the LV grid as healthy, and (ii) "C Fault Detected in (F, B, D)" where C, F, B, and D denote the faulty class, faulty feeder, faulty branch, and
the estimated distance from the root node.
4.2. Modes of operation
Three different modes of operation can be defined for the
FLITC application. Specifically:
Figure 5: Dimensionality (in parentheses) and transformations of the 3Φ-V dataset during the preprocessing stage for the FFNN (upper) and the FBNN (lower).
4.2.1. Configuration and hyperparameter tuning
The configuration mode concerns the configuration of parameters such as the input vector dimensions of each DNN model and the number of CNN models that will be applied for the branch detection. This mode is executed whenever the FLITC application is used on a new LVDG. The required data for the configuration are the number of LVDG feeders, the branches per feeder, the number and location of the devices that measure the 3Φ-V
values, and the length of each branch.
After the models’ configuration, the hyperparameter tuning
process is carried out. Hyperparameter tuning is the problem
of optimizing a loss function over a graph-structured configura-
tion space and thus calculating the optimal value of the model
hyperparameters, i.e., the parameters that define the model’s ar-
chitecture and control the learning process. Computing the op-
timal hyperparameter value enhances the overall model’s per-
formance. As described thoroughly in section 3, this work uses
the CNN architecture as a foundation to perform the different
tasks. The hyperparameters that need to be selected in this case
are the batch size, number of hidden layers and units in each
of them, the dropout rate for each layer, the window size in the
incoming data, and the filter size.
However, this multi-dimensional hyperparameter space ren-
ders the methods used to identify the optimal values, such as
the random or grid search, inefficient and time-consuming. To
tackle this issue, the concept of BO has been introduced, where
the number of samples drawn from the hyperparameter search
space is probabilistically guided and reduced, thus allowing for
proper evaluations of the most promising candidates for hyper-
parameters selection [45]. In this work, an automated approach
for hyperparameter tuning of each model is performed based
on the BO, namely, the Tree-structured Parzen Estimator (TPE)
method [46]. The TPE is a sequential model-based optimiza-
tion (SMBO) approach, which sequentially constructs models
to approximate the performance of hyperparameters based on
historical measurements. It then selects new hyperparameters
to test based on this model. This method solves the problem of
dealing with categorical and conditional parameters and there-
fore increases the efficiency of the hyperparameters selection
process [46]. In literature, the TPE algorithm is broadly used in
different domains and applications, such as image processing
[47, 48], load forecasting [49–51], and solar irradiance fore-
casting [52].
The TPE algorithm models p(\theta|y) by transforming that generative process, replacing the distributions of the configuration prior with non-parametric densities [46]. By using in each iteration t different observations (\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(M)}) in the non-parametric densities, a learning algorithm is generated that can produce a variety of densities over the configuration space \Theta. The TPE defines p(\theta|y) as:

p(\theta|y) = \begin{cases} k(\theta), & \text{if } y < y^{*} \\ l(\theta), & \text{if } y \geq y^{*} \end{cases}    (11)

where k(\theta) is the density formed by using the observations \theta^{(i)} for which the corresponding loss y = f(\theta^{(i)}) is less than y^{*}, and l(\theta) is the density formed by the remaining observations. The TPE algorithm chooses y^{*} to be some quantile \gamma of the observed y values, so that p(y < y^{*}) = \gamma, but no specific model for p(y) is necessary.
An Expected Improvement (EI) is defined as the ratio of the density k(\theta) to l(\theta), and it is maximized in order to select the set \theta^{*}_{(t+1)} for the next iteration. The algorithm ter-
minates when the maximum number of iterations is completed.
The runtime of each iteration of the TPE algorithm can scale
linearly in |H|, i.e., the sorted history lists of observed variables,
and linearly in the number of hyperparameters (dimensions) be-
ing optimized [46].
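For illustration, the following is a minimal sketch of TPE-driven tuning with the HyperOpt library (also used in section 5.4); the search space and the dummy objective are assumptions standing in for the actual CNN training-and-validation loop.

```python
# Minimal HyperOpt sketch of TPE-driven hyperparameter tuning; the search space,
# the toy objective and max_evals are illustrative assumptions, not the FLITC setup.
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK

space = {
    "hidden_layers": hp.choice("hidden_layers", [1, 2, 3, 4]),
    "units": hp.quniform("units", 32, 256, 16),
    "dropout": hp.uniform("dropout", 0.0, 0.5),
    "batch_size": hp.choice("batch_size", [50, 100, 150, 200]),
}

def objective(params):
    # In practice this would build, train and validate a CNN with `params`
    # and return its validation loss; here a dummy quadratic stands in.
    loss = (params["dropout"] - 0.1) ** 2 + abs(params["units"] - 128) / 1000.0
    return {"loss": loss, "status": STATUS_OK}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```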
4.2.2. Training and testing modes
After the optimal configuration of the parameters mentioned
above, training is conducted using the stochastic gradient de-
Figure 6: Aggregated local consumption and generation.
scent method, specifically the ADAM optimizer, to calculate
the models' weights and calibrate them on the specific network's topology and number of metering devices. After the completion of the training process, the models' performance is evaluated on sample data. If the FLITC application's
performance is acceptable to the operator, the application can
be used in a real operational environment.
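A minimal Keras sketch of this training step is given below; the toy model, the dummy data, and the hyperparameters (learning rate, batch size, epochs) are illustrative assumptions rather than the tuned FLITC configuration.

```python
# Minimal sketch of the training step: Adam optimizer and categorical cross-entropy
# on a toy CNN; model, data and hyperparameters are illustrative assumptions.
import numpy as np
import tensorflow as tf

# dummy data standing in for preprocessed CWT images and one-hot labels
x_train = np.random.rand(256, 32, 32, 3).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 11, size=256), num_classes=11)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(11, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(x_train, y_train, batch_size=32, epochs=5, validation_split=0.2)
```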
4.2.3. Operational mode
During the real operational mode, the FLITC application
can be hosted either in a server of the operator’s control center
or in an edge server located near the LVDG, e.g., the LV trans-
former, which has adequate computational resources to execute
the application. The first implementation has the advantage
of the computing capabilities of the cloud servers. However,
high data bandwidth is needed for the data transmission to the server, which raises data-loss and security issues. On
the contrary, the edge implementation secures the LVDG data
from breaches. It requires less transmission bandwidth because
only the output is sent to the control center without transmitting
the LVDG raw data. Regardless of the hosting environment,
the FLITC application can be integrated into a self-healing ap-
plication, offering real-time fault diagnosis and thus allowing
for initiating other grid control functionalities in the LVDG for
fault clearance and restoration of the consumer’s power supply.
5. Use Case
5.1. Simulated Dataset
The dataset used for the CNN models’ hyperparameter tun-
ing, training, and evaluation process is generated in a simulation
environment. For the dataset generation process, 1-min-resolution real data of local consumption and distributed production are retrieved from the publicly available dataset of Pecan Street Dataport [53]. Fig. 6 illustrates the aggregated profile of the users'
data. The penetration rate of the renewables for the case study
is 25%. Through the simulations, data interpolation is con-
ducted to replicate the operation of a real LVDG and investigate
a diverse and large number of fault scenarios, which are absent from the already-available open-source datasets that comprise real on-field measurements. Therefore, the simulated dataset can
provide adequate information containing all the possible short-
circuit-related faults that can occur in an LVDG. Furthermore,
the generated data also include noise introduced by the measur-
ing devices and the load and generation variability, thus replicating
the operation of a common unbalanced, radial high-RES pene-
trated active LVDG.
The original LVDG, upon which this simulation model is
based, is a Portuguese one given in [54] and shown in Fig. 7.
A radial LVDG encompasses multiple feeders and secondary
branches starting from an MV grid equivalent. Each node of
the described network is integrated with a device that measures
3Φ-V, while the main feeders of the network monitor both the
3Φ-Vand -Imeasurements. A variable length distribution line
is added between each node, emulating distribution losses over
LVDGs. Furthermore, the consumer loads consist of variable
parallel-connected resistive and inductive loads. Lastly, local
distributed generation is simulated by AC Vsources with vari-
able nominal power rates.
The flowchart for data generation is illustrated in Fig. 8. It uses as input the fault characteristics of each simulation scenario, as well as the consumers' loads and the local production of the distributed generation. The consumers' reactive power loads Q are calculated using a power factor cos ϕ = 0.95. In this model, during the fault event of each scenario, the resistance R between the 3Φ and the ground is drawn from a Log-Normal distribution, with an average value ranging from 50 Ω to 2 kΩ. Fault duration is also calculated using a Weibull distribution, with a duration ranging from 20 ms to approximately 2 s. Fig. 9 illustrates the duration and resistance his-
tograms of the generated faults. It is noteworthy that the fault
initialization time is a uniformly distributed variable over the length of the simulation.
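For illustration, the following is a minimal NumPy sketch of how the fault parameters of one scenario could be drawn from the distributions named above; the log-normal and Weibull shape parameters, the clipping bounds, and the simulation length are assumptions, not the exact values used by the generator in [35].

```python
# Minimal sketch of drawing one fault scenario; distribution parameters and
# bounds are illustrative assumptions, not the exact generator settings.
import numpy as np

rng = np.random.default_rng(0)
SIM_LENGTH = 20.0                      # assumed seconds of simulated operation per scenario

def draw_fault_scenario():
    # fault resistance between the faulted phases and ground, log-normally distributed
    resistance = np.clip(rng.lognormal(mean=np.log(300.0), sigma=1.0), 50.0, 2000.0)   # Ohm
    # fault duration, Weibull distributed, between one cycle (20 ms) and about 2 s
    duration = np.clip(0.2 * rng.weibull(a=1.5), 0.02, 2.0)                            # s
    # fault initialization time, uniform over the length of the simulation
    start = rng.uniform(0.0, SIM_LENGTH - duration)                                    # s
    fault_type = rng.choice(["A-G", "B-G", "C-G", "A-B", "B-C", "A-C",
                             "A-B-G", "B-C-G", "A-C-G", "A-B-C", "A-B-C-G"])
    return {"R": resistance, "duration": duration, "start": start, "type": fault_type}

print(draw_fault_scenario())
```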
5.2. Baseline models
For the validation of our method, we compare it with state-of-the-art models from the fault diagnosis and anomaly detection domains, which are:
Figure 7: Topology of the LVDG used as a testbed.
Figure 8: Flowchart of the dataset generation process.
•LSTM Autoencoder for unsupervised learning for detec-
tion of the outliers, i.e., the short-circuit faults in our case.
This technique is utilized in several domains for anomaly
detection, such as supply chain management [55] or net-
work [56], where the model learns to forecast the sig-
nal of healthy states, then reconstructs the training data
and calculates the Mean Absolute Error (MAE) for each
training sample. The maximum MAE value is considered a threshold; if the reconstruction loss for a test sample is greater than this threshold value, then we can infer that the model is seeing an unfamiliar pattern and thus characterize it as a fault (a minimal sketch of this thresholding scheme is given after this list),
•Convolutional LSTM (ConvLSTM) for all the stages of
the FLITC application, i.e., FFNN, FBNN, FCNN, and
FDNN. We leverage ConvLSTM, which can consider the
spatiotemporal characteristics of the data [57] and outper-
form MLP [58]. In this work, the data stemming from the
measuring devices inherently contain the grid’s topology
characteristics.
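As referenced in the first bullet above, the following is a minimal Keras sketch of the LSTM Autoencoder thresholding scheme; the window length, layer sizes, training settings, and the random stand-in data are illustrative assumptions.

```python
# Minimal Keras sketch of the LSTM-autoencoder baseline thresholding scheme;
# window length, layer sizes and data are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

WINDOW, FEATURES = 250, 3              # e.g. a 5 s window of 3-phase measurements

model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, FEATURES)),
    layers.LSTM(32, return_sequences=False),       # encoder compresses the window
    layers.RepeatVector(WINDOW),                   # repeat latent vector for decoding
    layers.LSTM(32, return_sequences=True),        # decoder reconstructs the window
    layers.TimeDistributed(layers.Dense(FEATURES)),
])
model.compile(optimizer="adam", loss="mae")

healthy = np.random.rand(128, WINDOW, FEATURES).astype("float32")   # healthy-state training windows
model.fit(healthy, healthy, epochs=3, batch_size=16, verbose=0)

# threshold = maximum reconstruction MAE over the training data
train_mae = np.mean(np.abs(model.predict(healthy) - healthy), axis=(1, 2))
threshold = train_mae.max()

test = np.random.rand(4, WINDOW, FEATURES).astype("float32")
test_mae = np.mean(np.abs(model.predict(test) - test), axis=(1, 2))
is_fault = test_mae > threshold        # unfamiliar patterns are flagged as faults
print(is_fault)
```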
5.3. Performance evaluation
For the evaluation of the model, a set of metrics is used to
elucidate each layer’s performance from different perspectives.
Firstly, the confusion matrix is utilized for the case of a multi-
class classification DNN model. Specifically, it depicts a matrix
of n × n dimensions, where n denotes the number of classes the
DNN is trained for. Across the matrix rows lie the true labels of
the test data, while across the columns are the predicted ones.
The diagonal of the matrix showcases the correct classifications
of the DNN models. For binary confusion matrices, where there
are only two classes to select from, each value of the matrix
belongs to one of the following categories:
•True positives (TP): an outcome where the model cor-
rectly predicts the positive class,
•True negatives (TN): an outcome where the model cor-
rectly predicts the negative class,
•False positives (FP): an outcome where the model incor-
rectly predicts the positive class,
•False negatives (FN): an outcome where the model incor-
rectly predicts the negative class.
Beyond the confusion matrix of the model, the accuracy,
precision, recall, and F1-score of the DNN are calculated. Ac-
curacy is the fraction of all correctly classified values over the
entire dataset. On the other hand, precision is defined as the
fraction for a particular class of the correctly classified values
by the total number of predicted ones. Recall is the fraction of
correctly predicted values by all the true labels. Lastly, F1-score
is the harmonic mean of the precision and recall. The formulas
of the above-described metrics are:
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (12)

Precision = \frac{TP}{TP + FP}    (13)

Recall = \frac{TP}{TP + FN}    (14)

F1-score = \frac{2 \times Precision \times Recall}{Precision + Recall}    (15)
For the regression problem, the Root Mean Square Error
(RMSE) metric is used, i.e., a quadratic scoring rule that cal-
culates the average magnitude of the error. It is the square root
of the average of squared differences between prediction and
actual observation given by the formula:
RMSE = \sqrt{\frac{\sum_{i=1}^{N} (x_i - x'_i)^2}{N}}    (16)

where i indexes the different training samples, N is the number of training data points, x_i is the actual fault distance, and x'_i is the distance estimated by the model. Since the errors are squared be-
fore they are averaged, the RMSE gives a relatively high weight
to large errors. This means the RMSE should be more use-
ful when large errors are particularly undesirable. This applies
directly to the FLITC case, due to the fact that its primary ob-
jective is to reduce the time that the operators’ inspection teams
spend to find the exact fault location.
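For completeness, a minimal scikit-learn sketch of computing Eqs. (12)-(16) is given below; the label vectors and distance values are arbitrary illustrative data.

```python
# Minimal scikit-learn sketch of the evaluation metrics in Eqs. (12)-(16);
# the label vectors and distances below are arbitrary illustrative data.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, mean_squared_error)

y_true = [0, 1, 2, 2, 1, 0, 2, 1]      # true fault classes
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]      # predicted fault classes

print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred, average="macro"))
print(recall_score(y_true, y_pred, average="macro"))
print(f1_score(y_true, y_pred, average="macro"))

# RMSE for the distance-regression output, Eq. (16)
d_true = np.array([0.10, 0.45, 0.80, 0.30])
d_pred = np.array([0.12, 0.40, 0.83, 0.35])
rmse = np.sqrt(mean_squared_error(d_true, d_pred))
print(rmse)
```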
5.4. Computational environment
For the dataset generation process, the simulation model
is developed in the Matlab-based Simulink graphical program-
ming environment. The FLITC application models are devel-
oped in Python, using the TensorFlow library with Keras as
Figure 9: Histograms of resistance and duration values of the generated faults.
Table 2: Derivation of the best models for the FLITC application from the hyperparameter tuning process.

                           FFNN              FBNN-1          FBNN-2          FBNN-3            FCNN                       FDNN
Hidden Layers              2                 1               1               2                 4                          3
Hidden Layer Units         [128, 160]        224             192             [80, 240]         [144, 80, 144, 96]         [192, 48, 208]
Regularization             Dropout           Dropout, L2     Dropout, L2     Dropout           Dropout, L2                Dropout
Dropout Rate               [0.05, 0.025,     [0.05, 0.025,   [0.05, 0.025,   [0.05, 0.025,     [0.05, 0.025, 0.0167,      [0.05, 0.025, 0.0167,
                           0.0167, 0.0125]   0.0167]         0.0167]         0.0167, 0.0125]   0.0125, 0.01, 0.0083]      0.0125, 0.01]
L2 Regularization Factor   -                 0.01            0.01            -                 0.01                       -
Accuracy (%)               90.49             85.4            93.77           95.02             94.93                      -
Loss (%)                   29.62             52.7            26.92           15.13             21.37                      0.12
F1-score (%)               62.36 - 93.90     16.22 - 88.78   83.75 - 95.56   94.91 - 95.25     89.87 - 99.10              -
RMSE (%)                   -                 -               -               -                 -                          2.45
Mean epoch time (s)        1.82              1.24            3.7             0.73              9.6                        2.52
an API. For hyperparameter tuning, HyperOpt, an open-source library for large-scale automated machine learning, is used [59]. HyperOpt provides the capability of conducting hyperparameter tuning through the TPE algorithm.
All the computations were conducted on an AlmaLinux server
with 8 CPU cores with a total of 16 GB RAM and an NVIDIA
RTX A6000 GPU card with a total RAM capacity equal to 48
GB.
6. Results
6.1. FLITC performance
Table 2 includes the best models for each stage of the FLITC
application, as derived from the hyperparameter tuning process.
The validation accuracy for the best models is 90.76% for the
FFNN, 85.4%, 93.77%, 95.02% for each feeder model, namely
FBNN-1, FBNN-2, and FBNN-3, respectively, and 94.93% for
the FCNN. The RMSE value is equal to 2.45% for the FDNN
model. The most computationally intensive derived model is
the FCNN, which consists of 4 hidden layers and has a mean
training time for each epoch equal to 9.6s.
Fig. 10 illustrates the confusion matrices for the best three
FFNN models derived from the hyperparameter tuning process. As can be observed, even though the model with the
highest accuracy is the one in the upper left, it exhibits worse
performance in detecting the healthy states compared to the
lower depicted model (precision value for healthy state 84.97%
and 90.62%, respectively). For the fault diagnosis process, the distinction between the healthy and faulty states is of crucial importance. Therefore, the sensitivity of the FLITC applica-
tion, i.e., the recall metric for binary classification, to identify
the faulty occurrences is equal to 91.4% and the specificity is
equal to 90.62%. In addition, as illustrated in Fig. 11, the per-
formance of the FFNN model is identical regardless of the fault
duration. On the other hand, Fig. 12 depicts that the fault diag-
nosis performance significantly decreases as the fault resistance increases. This is mainly due to the inability
of the models to capture the high resistance faults, which pro-
duce smaller variations to the magnitude of the grid variables
and have less cascading effects on neighboring branches and
feeders.
Fig. 13 illustrates the confusion matrix for the best FBNN
model concerning the feeder with the four branches. The low
accuracy of that model compared to the rest of the FBNN mod-
els is mainly attributed to its inability to classify the faults of the
second branch correctly due to its small length (the existence of
only one node). This can be considered as a limitation of the
model. Other models derived from the hyperparameter tuning
process have better accuracy in detecting the faults for that par-
ticular branch, however, their overall accuracy was lower.
Figure 10: Confusion matrices for the best FFNN models. The upper left is the best model with a total accuracy of 90.5%. The upper right is the second best model
with accuracy equal to 89.32%, and the lower one is the third best model with accuracy equal to 88.74%.
Figure 11: Average test performance for the different fault duration ranges of
the best FFNN models.
Fig. 14 presents the confusion matrix for the FCNN model.
The 2Φ and 3Φ to ground faults are grouped in the same category as the ones without ground. As can be seen, the model exhibits almost flawless performance in classifying the 2Φ and 3Φ faults, except for the A-B fault, where all the misclassified faults are categorized as 1Φ B to ground faults, probably due to the network asymmetry and the fact that phase B has the most
Figure 12: Average test performance for the different classes of fault resistance
of the best FFNN models.
connected loads. For the 1Φ faults, the accuracy of the FCNN model drops due to the misclassification of the faults into the rest of the 1Φ fault categories.
Fig. 15 illustrates the density function of the absolute er-
ror, normalized by the maximum distance, as calculated from
the FDNN model. It can be seen that the computed normal-
ized error values loosely follow the beta distribution. The blue
Figure 13: Confusion matrix of the FBNN model for the first feeder.
Figure 14: Confusion matrix for the best FCNN model.
line represents the beta distribution with parameters a=0.52 and
b=329.77, which have the best goodness-of-fit ratio, i.e., the
lowest Residual Sum of Squares (RSS) value (95.52), among 89 other univariate distributions, using the library from [60]. The calculated fault distances from the FDNN model are below 7.21% of the normalized branch distance from the root
node with a 95% confidence. Therefore, we can conclude that
the FDNN model exhibits high performance providing valuable
assistance to the maintenance crew of the operators to locate the
fault correctly.
6.2. Comparison to benchmark models
We conduct an evaluation of the proposed FLITC application against other benchmark models, summarized below:
•Fault detection: A comparison takes place between the FLITC application's ability to detect the faulty states and an unsupervised method that is used
Figure 15: Density function of the observed error values calculated from the
FDNN model.
broadly across academia for fault diagnosis and anomaly detection, namely the LSTM Autoencoder. Our proposed model exhibits slightly worse sensitivity than the LSTM Autoencoder in detecting the faulty states (91.4% compared to the Autoencoder's 91.6%),
•Feeder and branch identification: For the faulty feeder detection (if a fault exists), the best hyperparameter-tuned ConvLSTM model (batch size = 200, units = 180, dropout rate = 0.2, hidden layers = 4, window size = 50, and filter size = 256) achieves an accuracy of 87.12%, which is lower than the 90.49% achieved by the FFNN model of the FLITC application. For the branch identification, our model outperforms all the ConvLSTM models in all three different feeders, i.e., 85.4% > 67.43%, 93.76% > 74.65%, and 95.02% > 76.42%.
•Faulty class: Our model outperforms the best hyperparameter-tuned ConvLSTM model, which is used as a multi-class classifier. Particularly, the best accuracy achieved by the ConvLSTM (batch size = 150, units = 160, dropout rate = 0.2, hidden layers = 3, window size = 25, and filters = 64) is equal to 83.1%, which is significantly
lower than the 94.93% achieved by the FLITC applica-
tion.
•Distance calculation: Our model outperforms the best hyperparameter-tuned ConvLSTM model, which is used as a regressor. Particularly, the lowest RMSE value achieved by the ConvLSTM (batch size = 200, units = 80, dropout rate = 0.2, hidden layers = 2, window size = 25, and filters = 64) is equal to 21.54%, i.e., significantly higher than the 2.52% attained by the FLITC application.
6.3. Sensitivity analysis
To explore the robustness of the FLITC application, we con-
duct a thorough analysis to quantify the performance of the best
Figure 16: Testing accuracy of the FBNN and FCNN models for different levels
of measuring devices availability rate.
models in LVDGs against different availability rates of measur-
ing units. For a particular network topology, we define the unit
availability rate as the fraction of nodes in which 3Φ-V measuring units exist, divided by the total number of nodes (we consider that 3Φ-I measuring devices exist at the root of each feeder in all the cases). This sensitivity study is essential for dis-
tribution system operators’ planning and innovation strategies
since it provides the minimum requirements for infrastructure
upgrades so as to integrate automatic fault diagnosis and self-
healing practices.
In the results presented in section 6.1, we quantified the per-
formance of the FLITC application for an availability rate of
100%. Fig. 16 showcases the performance of the models (except for the FFNN, which uses 3Φ-I values) for measuring unit availability rates ranging from 30% to 100%. As can be seen,
there is no significant drop in the models’ accuracy if the avail-
ability rate is higher than 60%. There is a trend in most of the
models to have better accuracy with higher availability rates.
However, the slight variations in the different models’ perfor-
mance can be subject to the aleatoric uncertainty [61], which is
introduced due to the variation in the input data (sampling data
to create the different versions of the measuring units availabil-
ity rate) and thus it cannot be concluded that their performance
significantly deteriorates. For the FDNN model, the normalized
error limit for the 95% confidence interval reaches up to 23%
for a measuring availability rate of 30%, compared to the 7.21%
for 100%.
6.4. Application limitations
The DNN-based approach followed in this work has three limitations that are discussed in this section. First, it requires a large amount of data for training, which is often not
available, especially when it comes to real data. To the authors’
knowledge, there is no publicly available dataset for fault diag-
nosis studies in LVDGs, due to the fact that structural and in-
frastructure upgrades have taken place in LVDGs over the last
years. Thus, obtaining curated high-resolution datasets from a
real operational environment is not yet feasible. Since real data
are unavailable, our method is based on synthetic data, which
in turn requires a deep knowledge of the grid parameters to ap-
proximate faulty conditions, as accurately as possible. In sec-
tion 5, we have thoroughly presented the analysis of the data
generation process while making publicly available the source
code. Second, due to the supervised nature of the FLITC appli-
cation, the models’ accuracy is guaranteed neither in fault cases
that are not included in the training phase nor when changes in
the LVDGs topology/measuring infrastructure occur. The pre-
processing stage employed in this work, particularly the inter-
polation and dimensionality reduction techniques, increases the
generalizability of the model and its performance robustness,
with the cost of increasing also the complexity and the com-
putational time of the models. Last, the authors recognize the
limitations of the model to detect faults in mesh topologies that
might exist even at the edge of the LVDGs.
7. Conclusions
The primary focus of this paper is the selection of efficient
data-driven DNN-based methods to create an application that
is able to be used as a fault diagnostic tool for smart LVDGs.
Cornerstones of the proposed FLITC application are the CWT and CNN models, which are able to handle the large amount of data stemming from the measuring units and detect the fault patterns with high accuracy. Furthermore, the TPE algorithm is used to explore the most suitable hyperparameters for each specific aspect of the fault diagnosis tool, i.e., the faulty feeder, branch, class, and distance models. The re-
sults showcased its efficacy in providing fine-grained and ac-
curate fault diagnosis analytics to the system operators, i.e., an
accuracy of 91.4% for fault detection, of 93.77% for correct
branch identification, of 94.93% for fault type classification,
and RMSE value of 2.45% for location calculation. Further
work could be conducted to make the FLITC application ap-
plicable in mesh network topologies and explore both privacy-
preserving and resource-constraint methods for fault diagnosis
in LVDG leveraging the computation resources at the edge.
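To make the hyperparameter exploration concrete, the following minimal sketch shows a TPE search in the spirit of [46, 59]; the use of the hyperopt package, the chosen search space, and the train_and_validate placeholder are illustrative assumptions and do not reproduce the exact configuration of the four FLITC models.

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

search_space = {
    "filters": hp.choice("filters", [16, 32, 64]),
    "kernel_size": hp.choice("kernel_size", [3, 5, 7]),
    "learning_rate": hp.loguniform("learning_rate", -9.2, -4.6),   # roughly 1e-4 to 1e-2
    "dropout": hp.uniform("dropout", 0.0, 0.5),
}

def train_and_validate(params):
    # Placeholder: in the actual application this would build, train, and
    # validate one of the CNN models and return its validation accuracy.
    return 0.5

def objective(params):
    # TPE minimizes the loss, so a higher validation accuracy gives a lower loss.
    return {"loss": 1.0 - train_and_validate(params), "status": STATUS_OK}

trials = Trials()
best = fmin(objective, search_space, algo=tpe.suggest, max_evals=100, trials=trials)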
References
[1] A. E. Saldaña-González, A. Sumper, M. Aragüés-Peñalba, M. Smolnikar, Advanced distribution measurement technologies and data applications for smart grids: A review, Energies 13 (14) (2020). doi:10.3390/en13143730.
URL https://www.mdpi.com/1996-1073/13/14/3730
[2] S. Barja-Martinez, M. Aragüés-Peñalba, Í. Munné-Collado, P. Lloret-Gallego, E. Bullich-Massagué, R. Villafafila-Robles, Artificial intelligence techniques for enabling big data services in distribution networks: A review, Renewable and Sustainable Energy Reviews 150 (2021) 111459.
[3] L. Cipcigan, P. Taylor, Investigation of the reverse power flow require-
ments of high penetrations of small-scale embedded generation, IET Re-
newable Power Generation 1 (3) (2007) 160–166.
[4] M. Liserre, T. Sauter, J. Y. Hung, Future energy systems: Integrating re-
newable energy sources into the smart power grid through industrial elec-
tronics, IEEE industrial electronics magazine 4 (1) (2010) 18–37.
[5] E. Shittu, A. Tibrewala, S. Kalla, X. Wang, Meta-analysis of the strategies
for self-healing and resilience in power systems, Advances in Applied
Energy (2021) 100036.
[6] S. A. Arefifar, Y. A.-R. I. Mohamed, T. H. El-Fouly, Comprehensive op-
erational planning framework for self-healing control actions in smart
distribution grids, IEEE Transactions on Power Systems 28 (4) (2013)
4192–4200.
[7] E. Shirazi, S. Jadid, Autonomous self-healing in smart distribution grids
using agent systems, IEEE Transactions on Industrial Informatics 15 (12)
(2018) 6291–6301.
[8] M. Ramos, M. Resener, P. Oliveira, D. P. Bernardon, Self-healing in
power distribution systems, in: Smart Operation for Power Distribution
Systems, Springer, 2018, pp. 37–70.
[9] J.-H. Teng, A direct approach for distribution system load flow solutions,
IEEE Transactions on power delivery 18 (3) (2003) 882–887.
[10] J. Mora-Florez, J. Meléndez, G. Carrillo-Caicedo, Comparison of impedance based fault location methods for power distribution systems, Electric Power Systems Research 78 (4) (2008) 657–666.
[11] R. Salim, K. Salim, A. Bretas, Further improvements on impedance-based
fault location for power distribution systems, IET Generation, Transmis-
sion & Distribution 5 (4) (2011) 467–478.
[12] F. Aboshady, D. Thomas, M. Sumner, A new single end wideband
impedance based fault location scheme for distribution systems, Electric
Power Systems Research 173 (2019) 263–270.
[13] P. N. Ayambire, Q. Huang, D. Cai, O. Bamisile, P. O. K. Anane, Real-time
and contactless initial current traveling wave measurement for overhead
transmission line fault detection based on tunnel magnetoresistive sen-
sors, Electric Power Systems Research 187 (2020) 106508.
[14] M. A. Aftab, S. S. Hussain, I. Ali, T. S. Ustun, Dynamic protection of
power systems with high penetration of renewables: A review of the trav-
eling wave based fault location techniques, International Journal of Elec-
trical Power & Energy Systems 114 (2020) 105410.
[15] A. Zidan, M. Khairalla, A. M. Abdrabou, T. Khalifa, K. Shaban, A. Ab-
drabou, R. El Shatshat, A. M. Gaouda, Fault detection, isolation, and ser-
vice restoration in distribution systems: State-of-the-art and future trends,
IEEE Transactions on Smart Grid 8 (5) (2016) 2170–2185.
[16] A. Bahmanyar, S. Jamali, A. Estebsari, E. Bompard, A comparison frame-
work for distribution system outage and fault location methods, Electric
Power Systems Research 145 (2017) 19–34.
[17] R. Agrawal, D. Thukaram, Identification of fault location in power dis-
tribution system with distributed generation using support vector ma-
chines, in: 2013 IEEE PES Innovative Smart Grid Technologies Con-
ference (ISGT), IEEE, 2013, pp. 1–6.
[18] W. Fei, P. Moses, Fault current tracing and identification via machine
learning considering distributed energy resources in distribution net-
works, Energies 12 (22) (2019) 4333.
[19] N. Sapountzoglou, J. Lago, B. Raison, Fault diagnosis in low voltage
smart distribution grids using gradient boosting trees, Electric Power Sys-
tems Research 182 (2020) 106254.
[20] A. Yadav, A. Swetapadma, A novel transmission line relaying scheme
for fault detection and classification using wavelet transform and linear
discriminant analysis, Ain Shams Engineering Journal 6 (1) (2015) 199–
209.
[21] F. Perez, E. Orduna, G. Guidi, Adaptive wavelets applied to fault classifi-
cation on transmission lines, IET generation, transmission & distribution
5 (7) (2011) 694–702.
[22] M. Shafiullah, M. A. Abido, Z. Al-Hamouz, Wavelet-based extreme
learning machine for distribution grid fault location, IET Generation,
Transmission & Distribution 11 (17) (2017) 4256–4263.
[23] H. Jiang, J. J. Zhang, W. Gao, Z. Wu, Fault detection, identification, and
location in smart grid based on data-driven computational methods, IEEE
Transactions on Smart Grid 5 (6) (2014) 2947–2956.
[24] R. Tervo, J. Karjalainen, A. Jung, Predicting electricity outages caused by
convective storms, in: 2018 IEEE Data Science Workshop (DSW), IEEE,
2018, pp. 145–149.
[25] L. Souto, J. Meléndez, S. Herraiz, Fault location in low voltage smart
grids based on similarity criteria in the principal component subspace, in:
2020 IEEE Power & Energy Society Innovative Smart Grid Technologies
Conference (ISGT), IEEE, 2020, pp. 1–5.
[26] D. Chakraborty, U. Sur, P. K. Banerjee, Random forest based fault clas-
sification technique for active power system networks, in: 2019 IEEE
International WIE Conference on Electrical and Computer Engineering
(WIECON-ECE), IEEE, 2019, pp. 1–4.
[27] Y. D. Mamuya, Y.-D. Lee, J.-W. Shen, M. Shafiullah, C.-C. Kuo, Appli-
cation of machine learning for fault classification and location in a radial
distribution grid, Applied Sciences 10 (14) (2020) 4965.
[28] N. Sapountzoglou, J. Lago, B. De Schutter, B. Raison, A generalizable
and sensor-independent deep learning method for fault detection and loca-
tion in low-voltage distribution grids, Applied Energy 276 (2020) 115299.
[29] P. Rai, N. D. Londhe, R. Raj, Fault classification in power system distri-
bution network integrated with distributed generators using cnn, Electric
Power Systems Research 192 (2021) 106914.
[30] W. Li, D. Deka, M. Chertkov, M. Wang, Real-time faulted line local-
ization and pmu placement in power systems through convolutional neu-
ral networks, IEEE Transactions on Power Systems 34 (6) (2019) 4640–
4651.
[31] R. H. Salim, K. R. C. de Oliveira, A. D. Filomena, M. Resener, A. S. Bre-
tas, Hybrid fault diagnosis scheme implementation for power distribution
systems automation, IEEE Transactions on Power Delivery 23 (4) (2008)
1846–1856.
[32] Z. Galijasevic, A. Abur, Fault location using voltage measurements, IEEE
Transactions on Power Delivery 17 (2) (2002) 441–445.
[33] P. Stefanidou-Voziki, N. Sapountzoglou, B. Raison, J. Dominguez-
Garcia, A review of fault location and classification methods in distri-
bution grids, Electric Power Systems Research 209 (2022) 108031.
[34] H. Falaghi, M.-R. Haghifam, M. O. Tabrizi, Fault indicators effects on
distribution reliability indices, in: CIRED 2005-18th International Con-
ference and Exhibition on Electricity Distribution, IET, 2005, pp. 1–4.
[35] V. Rizeakos, A. Bachoumis, FLITC-application.
URL https://github.com/tombax7/FLITC-application
[36] A. Mar, P. Pereira, J. F. Martins, A survey on power grid faults and
their origins: A contribution to improving power grid resilience, Ener-
gies 12 (24) (2019). doi:10.3390/en12244667.
URL https://www.mdpi.com/1996-1073/12/24/4667
[37] J. Hare, X. Shi, S. Gupta, A. Bazzi, Fault diagnostics in smart micro-
grids: A survey, Renewable and Sustainable Energy Reviews 60 (2016)
1114–1124. doi:10.1016/j.rser.2016.01.122.
[38] H. Markiewicz, Voltage characteristics of electricity supplied by public
distribution systems (Jun 1999).
[39] IEEE Recommended Practice for Monitoring Electric Power Quality, IEEE Std 1159-1995 (1995) 1–80. doi:10.1109/IEEESTD.1995.79050.
[40] IEEE Guide for Service to Equipment Sensitive to Momentary Voltage Disturbances, IEEE Std 1250-1995 (1995). doi:10.1109/IEEESTD.1995.122634.
[41] J. J. Grainger, Power system analysis, McGraw-Hill, 1999.
[42] P. J. Schmid, Dynamic mode decomposition of numerical and experimen-
tal data, Journal of fluid mechanics 656 (2010) 5–28.
[43] A. Meintjes, A. Lowe, M. Legget, Fundamental heart sound classifica-
tion using the continuous wavelet transform and convolutional neural net-
works, in: 2018 40th annual international conference of the IEEE engi-
neering in medicine and biology society (EMBC), IEEE, 2018, pp. 409–
412.
[44] R. Miao, Y. Gao, L. Ge, Z. Jiang, J. Zhang, Online defect recognition
of narrow overlap weld based on two-stage recognition model combining
continuous wavelet transform and convolutional neural network, Comput-
ers in Industry 112 (2019) 103115.
[45] H.-P. Nguyen, J. Liu, E. Zio, A long-term prediction approach based on
long short-term memory neural networks with automatic parameter op-
timization by tree-structured parzen estimator and applied to time-series
data of npp steam generators, Applied Soft Computing 89 (2020) 106116.
[46] J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyper-
parameter optimization, Advances in neural information processing sys-
tems 24 (2011).
[47] L. F. Rodrigues, M. C. Naldi, J. F. Mari, Comparing convolutional neu-
ral networks and preprocessing techniques for hep-2 cell classification
in immunofluorescence images, Computers in biology and medicine 116
(2020) 103542.
[48] S. F. Chevtchenko, R. F. Vale, V. Macario, F. R. Cordeiro, A convolutional
neural network with feature fusion for real-time hand posture recognition,
Applied Soft Computing 73 (2018) 748–766.
[49] J. Lago, F. De Ridder, P. Vrancx, B. De Schutter, Forecasting day-ahead
electricity prices in Europe: the importance of considering market inte-
gration, Applied energy 211 (2018) 890–903.
[50] F. He, J. Zhou, L. Mo, K. Feng, G. Liu, Z. He, Day-ahead short-term
load probability density forecasting method with a decomposition-based
quantile regression forest, Applied Energy 262 (2020) 114396.
[51] M. N. Fekri, H. Patel, K. Grolinger, V. Sharma, Deep learning for load
forecasting with smart meter data: Online adaptive recurrent neural net-
work, Applied Energy 282 (2021) 116177.
[52] J. Lago, K. De Brabandere, F. De Ridder, B. De Schutter, Short-term fore-
casting of solar irradiance without local telemetry: A generalized model
using satellite data, Solar Energy 173 (2018) 566–577.
[53] Dataport, Pecan Street Inc. (Nov 2020).
URL https://www.pecanstreet.org/dataport/
[54] N. Sapountzoglou, Détection et localisation des défauts dans les réseaux de distribution basse tension en présence de production décentralisée, Ph.D. thesis, Université Grenoble Alpes (ComUE) (2019).
[55] H. Nguyen, K. P. Tran, S. Thomassey, M. Hamad, Forecasting and
anomaly detection approaches using lstm and lstm autoencoder tech-
niques with the applications in supply chain management, International
Journal of Information Management 57 (2021) 102282.
[56] M. Said Elsayed, N.-A. Le-Khac, S. Dev, A. D. Jurcut, Network anomaly
detection using lstm based autoencoder, in: Proceedings of the 16th ACM
Symposium on QoS and Security for Wireless and Mobile Networks,
2020, pp. 37–45.
[57] S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, W.-c. Woo,
Convolutional lstm network: A machine learning approach for precipita-
tion nowcasting, in: Advances in neural information processing systems,
2015, pp. 802–810.
[58] W. Luo, W. Liu, S. Gao, Remembering history with convolutional lstm
for anomaly detection, in: 2017 IEEE International Conference on Multi-
media and Expo (ICME), IEEE, 2017, pp. 439–444.
[59] J. Bergstra, D. Yamins, D. Cox, Making a science of model search: Hy-
perparameter optimization in hundreds of dimensions for vision architec-
tures, in: International conference on machine learning, PMLR, 2013, pp.
115–123.
[60] E. Taskesen, distfit - Probability density fitting (Jan 2020).
URL https://erdogant.github.io/distfit
[61] J. Gawlikowski, C. R. N. Tassi, M. Ali, J. Lee, M. Humt, J. Feng,
A. Kruspe, R. Triebel, P. Jung, R. Roscher, et al., A survey of uncertainty
in deep neural networks, arXiv preprint arXiv:2107.03342 (2021).