All content in this area was uploaded by Fanghao Tian on Jul 02, 2024
Optimizing DC Inductor Design with Air Gap for
Triangular Excitation: A Reinforcement Learning
Approach
1st Fanghao Tian
Dept. of Electrical Engineering (ESAT)
KU Leuven - EnergyVille
Diepenbeek - Genk, Belgium
fanghao.tian@kuleuven.be
2nd Hans Wouters
Dept. of Electrical Engineering (ESAT)
KU Leuven - EnergyVille
Diepenbeek - Genk, Belgium
hans.wouters@kuleuven.be
3rd Xiaobing Shen
Dept. of Electrical Engineering (ESAT)
KU Leuven - EnergyVille
Diepenbeek - Genk, Belgium
xiaobing.shen@kuleuven.be
4th Wilmar Martinez
Dept. of Electrical Engineering (ESAT)
KU Leuven - EnergyVille
Diepenbeek - Genk, Belgium
wilmar.martinez@kuleuven.be
Abstract—Magnetic components play a crucial role in power
electronics systems and exhibit a significant influence on their
overall efficiency and performance. This paper presents an
intelligent and fast method for optimizing the design of DC
inductors. The proposed system is capable of generating a design
for a DC inductor with an airgap within a time frame of less
than one second, given the operating conditions of a triangular
excitation current. The proposed approach is based on the
Deep Deterministic Policy Gradient (DDPG) algorithm, a
reinforcement learning (RL) technique in which an agent
interacts with its environment and acquires knowledge through
iterative interactions. By using a single universal DDPG model,
the magnetic design process for varying operating points can
be substantially accelerated. The results
indicate that the DDPG algorithm can effectively achieve optimal
magnetic component design, demonstrating the potential of RL
as a valuable tool for automating power electronics design.
Index Terms—inductor, reinforcement learning, design opti-
mization, DDPG
I. INTRODUCTION
The rapid evolution of power electronics has revolutionized
energy generation, storage, and consumption, playing a pivotal
role in various critical applications spanning renewable energy
systems, electric vehicles, and advanced power supplies [1].
The performance of these systems depends heavily on the
design of magnetic components, including transformers and
inductors, which facilitate the storage and transfer of energy
across various parts of the system. Efficiently designing mag-
netic components is, therefore, essential for power electronic
systems [2]. In doing so, the overall power converter perfor-
mance can be drastically improved.
Reinforcement learning (RL) is a branch of artificial
intelligence (AI) in which an agent learns autonomously by
interacting with an environment, adjusting its choices in
response to rewards until it converges towards the strategy
that yields the highest reward. One of RL's most notable
advantages is that it does not require labelled data for
training. This has attracted the interest of researchers and
led to its integration into various applications in
contemporary power and energy systems [3], demonstrating its
capability in complicated operating and control problems.
Meanwhile, interest in applying RL to power electronics has
grown. Specific applications include optimizing the design
parameters of power converters [4], [5], allowing for rapid
adjustments to changing operational conditions. Additionally,
RL is used to optimize the efficiency of a three-level active
neutral point clamped inverter in [6] and of dual active bridge
converters in [7], [8]. RL can also be applied to electromagnetic
interference (EMI) and power integrity improvements [9].
This paper focuses on the Deep Deterministic Policy Gra-
dient (DDPG), an advanced RL algorithm introduced in [10].
The literature indicates the proficiency of RL-based algorithms
in swiftly obtaining the optimal design parameters according
to the design objectives when the operating conditions change.
The design process for DC inductors is a complex proce-
dure, necessitating a careful balance among various factors,
including the shape, material, and dimensions of the magnetic
core and winding wire, number of turns, air gap, and desired
inductance value, among others. One of the primary challenges
in the design of such inductors is the reduction of power loss
while simultaneously preventing flux saturation. Furthermore,
the need to consider different operational conditions, such
as frequency, duty cycle, and temperature, may necessitate a
reassessment of the optimal design.
Given the complexity of inductor design and the superior
performance of RL, this paper proposes a DDPG-based opti-
mization methodology specifically for the design of air-gapped
DC inductors under triangular excitation waveforms. This
approach seeks to automate the identification of ideal design
configurations for EE core-based inductors using litz wire
windings. The results suggest that the DDPG algorithm can
offer a more efficient and automated approach to improving
the design of magnetic components.
II. DC-INDUCTOR LOSS MODEL
A. General Design Procedure
Designing an inductor is a complicated process involving
numerous variables, such as the shape, material, and dimensions
of the magnetic core and the winding wire choices. Initially,
given the operating conditions of the inductor, including
the frequency, duty cycle and amplitude of the square voltage
waveform and triangular excitation current, the required
inductance value $L$ is determined. Secondly, the effective
magnetic cross-section area $A_e$ and effective magnetic path
length $l_e$ are determined once the dimensions of the EE core
are defined. Thirdly, the required reluctance $\Re_m$ can be
calculated by (1) given the number of winding turns $N$.
$$\Re_m = \frac{N^2}{L} \tag{1}$$
More specifically, the flux paths through the core and the air
gap both contribute to $\Re_m$ as in (2),
$$\Re_m = \frac{l_g}{\mu_0 A_e} + \frac{l_e}{\mu_0 \mu_r A_e} \tag{2}$$
where $l_g$ is the air-gap length, $l_e$ is the flux path length
through the core, and $\mu_0$ and $\mu_r$ indicate the
permeability of air and the relative permeability of the
magnetic material, respectively.
In addition, the fringing effect should be considered. This
effect describes the bulging of the flux in the air gap. Because
the flux traverses a broader area, the reluctance decreases.
Consequently, the inductance increases by a factor $F_f$, as
described in [11] and given by (3),
$$F_f = 1 + \frac{l_g}{\sqrt{A_e}} \ln\frac{2G}{l_g} \tag{3}$$
where $F_f$ indicates the fringing factor and $G$ indicates the
winding window length.
Moreover, several constraints need to be considered. One is
the window area constraint, in which the total cross-sectional
area of the winding cables should be smaller than the window area.
$$n A_{e,\mathrm{winding}} < k A_w \tag{4}$$
where $n$ is the number of turns, $A_{e,\mathrm{winding}}$ is the
cross-sectional area of a single winding cable and $A_w$ is the
window area. Since winding cables cannot be packed perfectly in
the window, an effective fill coefficient $k$ is used here.
Another constraint is the saturation flux density. Given a
triangular current waveform, the maximal flux density in the
core can be modelled as (5),
$$B_{max} = B_{dc} + \frac{1}{2} B_{ac} = \frac{n I_{avg}}{\Re_m A_e} + \frac{1}{2}\,\frac{L I_{pp}}{n A_e} \tag{5}$$
where $I_{avg}$ and $I_{pp}$ are the average and peak-to-peak
values of the inductor current.
The maximal flux density has to be smaller than the
saturation flux density of the material. As is shown above, the
maximal flux density is highly relevant to the current. Adding
the air-gap can reduce the total flux density but increase the
power loss.
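The design steps in (1)-(3) and the saturation check in (5) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the gap length is obtained by solving (2) for $l_g$ with fringing neglected, and the numerical inputs (core geometry, permeability, currents, saturation limit) are placeholder values that are not taken from this paper.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space [H/m]


def required_reluctance(n_turns, inductance):
    """Eq. (1): total reluctance needed to reach the target inductance."""
    return n_turns ** 2 / inductance


def air_gap_length(reluctance, a_e, l_e, mu_r):
    """Eq. (2) solved for the gap length l_g (fringing neglected)."""
    return MU_0 * a_e * reluctance - l_e / mu_r


def fringing_factor(l_g, a_e, window_len):
    """Eq. (3): fringing factor F_f for gap l_g and window length G."""
    return 1.0 + l_g / math.sqrt(a_e) * math.log(2.0 * window_len / l_g)


def peak_flux_density(n_turns, inductance, a_e, i_avg, i_pp):
    """Eq. (5): B_max = B_dc + B_ac / 2 for a triangular current."""
    reluctance = required_reluctance(n_turns, inductance)
    b_dc = n_turns * i_avg / (reluctance * a_e)
    b_ac = inductance * i_pp / (n_turns * a_e)
    return b_dc + 0.5 * b_ac


# Placeholder inputs (assumptions for illustration only).
N_TURNS, L_TARGET = 34, 200e-6          # turns, inductance [H]
A_E, L_E, G = 178e-6, 97e-3, 30e-3      # [m^2], [m], [m]
MU_R, B_SAT = 2200.0, 0.35              # relative permeability, limit [T]
I_AVG, I_PP = 5.0, 2.0                  # average and peak-to-peak current [A]

rel = required_reluctance(N_TURNS, L_TARGET)
gap = air_gap_length(rel, A_E, L_E, MU_R)
f_f = fringing_factor(gap, A_E, G)
b_max = peak_flux_density(N_TURNS, L_TARGET, A_E, I_AVG, I_PP)
print(f"gap = {gap * 1e3:.3f} mm, F_f = {f_f:.3f}, B_max = {b_max:.3f} T")
print("saturation constraint satisfied:", b_max < B_SAT)
```

Because $\Re_m = N^2/L$, the DC term of (5) equals $L I_{avg} / (n A_e)$, so for a fixed inductance the peak flux density depends on the number of turns and the core area, while the gap length follows from (2).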
B. Power Loss Model
The inductor losses are mainly generated in the magnetic
core, due to hysteresis [12], and in the winding wires, due to
the proximity and skin effects [13].
The calculation of core losses is challenging because of the
inherently nonlinear characteristics of magnetic materials.
Traditionally, losses are estimated with the Steinmetz
equation, a well-established but simplified model [12]. Later
on, an enhanced version of the generalised Steinmetz equation,
referred to as the improved generalised Steinmetz equation
(iGSE) [14], was proposed, as shown in (6). It improves the
loss estimation by integrating over the flux density waveform
while considering the material properties, resulting in a more
accurate prediction of core losses.
$$P_{core} = \frac{1}{T_{sw}} \int_0^{T_{sw}} k_i \left|\frac{dB}{dt}\right|^{\alpha} (\Delta B)^{\beta-\alpha}\, dt \tag{6}$$
where $T_{sw}$ represents the fundamental period of the excitation
signal. The term $dB/dt$ stands for the rate of change of flux
density with respect to time, a critical factor in understanding
the dynamic behavior of magnetic fields during the switching
process. Additionally, $\Delta B$ denotes the total variation in
flux density occurring within one complete switching cycle.
Furthermore, $k_i$ is calculated by (7),
$$k_i = \frac{k}{(2\pi)^{\alpha-1} \int_0^{2\pi} |\cos\theta|^{\alpha}\, 2^{\beta-\alpha}\, d\theta} \tag{7}$$
where $k$, $\alpha$, and $\beta$ are Steinmetz coefficients
extracted from the core material loss map.
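Given a specific triangular waveform, the iGSE equation
can be simplified further as (8),
$$P_{Lcore} = \frac{k_i}{T_{sw}} \left[ T_{sw} D \left|\frac{\Delta B}{T_{sw} D}\right|^{\alpha} (\Delta B)^{\beta-\alpha} + (1-D) T_{sw} \left|\frac{\Delta B}{(1-D) T_{sw}}\right|^{\alpha} (\Delta B)^{\beta-\alpha} \right] \tag{8}$$
where $D$ stands for the duty cycle of the excitation current.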
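As a minimal sketch of how (7) and (8) could be evaluated numerically, the Python snippet below computes $k_i$ with a midpoint-rule integration and then the triangular-waveform loss density. The function names are hypothetical and the Steinmetz coefficients used in the example are placeholders, not values for any real material.

```python
import math


def igse_ki(k, alpha, beta, steps=20000):
    """Eq. (7): k_i from Steinmetz coefficients (midpoint-rule integral)."""
    d_theta = 2.0 * math.pi / steps
    integral = sum(
        abs(math.cos((i + 0.5) * d_theta)) ** alpha for i in range(steps)
    ) * d_theta * 2.0 ** (beta - alpha)
    return k / ((2.0 * math.pi) ** (alpha - 1.0) * integral)


def igse_triangular(k, alpha, beta, delta_b, f_sw, duty):
    """Eq. (8): core loss density for a triangular flux waveform."""
    t_sw = 1.0 / f_sw
    k_i = igse_ki(k, alpha, beta)
    # Rising interval D*T_sw and falling interval (1-D)*T_sw of the waveform.
    rise = duty * t_sw * (delta_b / (duty * t_sw)) ** alpha
    fall = (1.0 - duty) * t_sw * (delta_b / ((1.0 - duty) * t_sw)) ** alpha
    return k_i / t_sw * (rise + fall) * delta_b ** (beta - alpha)


# Placeholder Steinmetz coefficients (assumptions, not material data).
K, ALPHA, BETA = 1.0, 1.5, 2.5
for d in (0.3, 0.5, 0.7):
    p = igse_triangular(K, ALPHA, BETA, delta_b=0.1, f_sw=100e3, duty=d)
    print(f"D = {d:.1f}: loss density = {p:.4g}")
```

Algebraically, (8) reduces to $k_i f^{\alpha} (\Delta B)^{\beta} [D^{1-\alpha} + (1-D)^{1-\alpha}]$ with $f = 1/T_{sw}$, so for $\alpha > 1$ the loss is symmetric in $D$ and smallest at $D = 0.5$.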
However, the open-source database MagNet, built from
experimental core loss data, can provide a more accurate loss
estimation [15]-[17]. By testing various core materials under
various operating conditions, MagNet further improves the
accuracy of core loss estimation. In addition, an Artificial
Neural Network (ANN) is constructed in that work to realize a
fast core loss estimation based on the experimental data. In
this paper, the same method is utilized for an accurate core
loss prediction. Fig. 1 illustrates the structure of the ANN,
where frequency, flux density, the duty cycle of a triangular
waveform, and temperature are inputs, whereas the power loss
density is the output.
[Figure: input layer (frequency, binary coding of materials,
flux density, duty cycle, period), hidden layers, and output
layer (power loss).]
Fig. 1. ANN for Core Loss.
C. Winding Loss Model
The winding losses consist of DC and AC parts, as shown
in (9).
$$P_{Lcopper} = R_{DC} I_{LDC}^2 + R_{AC} \Delta I_L^2 \tag{9}$$
The DC resistance is defined by the temperature-dependent
resistivity, length and cross-sectional area of the winding
wire. Meanwhile, the Dowell equation is utilized in this paper
for modelling the winding losses of both round wire and litz
wire caused by skin and proximity effects [18]. The ratio of AC
resistance to DC resistance of a litz wire is given in (10).
$$F_R = \frac{R_{w\,str}}{R_{w\,dc}} = A_{str} \left[ \frac{\sinh(2A_{str}) + \sin(2A_{str})}{\cosh(2A_{str}) - \cos(2A_{str})} + \frac{2(N_l^2 k - 1)}{3}\, \frac{\sinh(A_{str}) - \sin(A_{str})}{\cosh(A_{str}) + \cos(A_{str})} \right] \tag{10}$$
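The winding loss model can be sketched as follows. The snippet assumes the standard Dowell-type expression for litz wire from [18], with $A_{str}$, the number of layers $N_l$ and the number of strands $k$ supplied directly as inputs, and it assumes $R_{AC} = F_R R_{DC}$ when evaluating (9). Function names and example values are hypothetical.

```python
import math


def litz_resistance_ratio(a_str, n_layers, n_strands):
    """F_R = R_ac / R_dc for litz wire (Dowell-type form, assumed from [18])."""
    skin = (math.sinh(2 * a_str) + math.sin(2 * a_str)) / (
        math.cosh(2 * a_str) - math.cos(2 * a_str)
    )
    proximity = (math.sinh(a_str) - math.sin(a_str)) / (
        math.cosh(a_str) + math.cos(a_str)
    )
    weight = 2.0 * (n_layers ** 2 * n_strands - 1.0) / 3.0
    return a_str * (skin + weight * proximity)


def copper_loss(r_dc, f_r, i_dc, delta_i):
    """Eq. (9), assuming R_AC = F_R * R_DC."""
    return r_dc * i_dc ** 2 + f_r * r_dc * delta_i ** 2


# Placeholder inputs (assumptions for illustration only).
f_r = litz_resistance_ratio(a_str=0.3, n_layers=3, n_strands=90)
p_cu = copper_loss(r_dc=0.02, f_r=f_r, i_dc=5.0, delta_i=1.0)
print(f"F_R = {f_r:.3f}, copper loss = {p_cu:.3f} W")
```

A quick sanity check on this form is that $F_R$ tends to 1 as $A_{str}$ tends to 0, i.e. the AC resistance approaches the DC resistance when the strand is much thinner than the skin depth.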
III. RL-BASED OPTIMAL DESIGN
As mentioned above, the literature shows that RL is capable
of obtaining the optimal design parameters while considering
various operating conditions simultaneously. As a result,
the trained agent can generate the optimal design in various
conditions.
[Figure: the agent sends action a to the environment; the
environment returns reward r(s, a) and updates the state from
s_t to s_{t+1}.]
Fig. 2. Framework of RL.
The RL operates on an iterative loop, illustrated in Fig. 2.
Within this loop, the agent initiates actions that trigger state
updates according to the rules defined by the environment.
Meanwhile, the environment also evaluates the previous state
and its action, providing a reward as feedback accordingly.
The training objective is to maximize cumulative rewards,
ultimately enabling the agent to discover an optimal strategy
for achieving the highest possible reward.
Various RL algorithms have been developed to address
different environmental and action scenarios, including deep
Q-learning, policy gradient, and actor-critic. Deep Q-learning
is suitable for discrete action spaces, while policy gradient
methods are designed to handle continuous action spaces
by making decisions based on action probabilities. Actor-critic
algorithms combine both approaches, with the “actor”
determining probabilistic actions and the “critic” providing
value-based assessments of state transitions. In this paper,
we employ the deep deterministic policy gradient algorithm
(DDPG), an actor-critic method that leverages the strengths of
deep Q-learning algorithms.
A. Deep Deterministic Policy Gradient
DDPG was originally proposed in 2016 to enhance the
functionality of basic actor-critic algorithms [10]. It
comprises four neural networks (NNs): the online actor NN µ,
the target actor NN µ′, the online critic NN Q, and the
target critic NN Q′.
In the proposed framework shown in Fig. 3, the actor NNs
are in charge of the generation of actions, while the valuation
of state values is performed by the critic NNs. The training
for both the actor and critic NNs is conducted simultaneously.
This concurrent methodology not only facilitates the actor in
refining its policy update strategies but also empowers the
critic NN to generate consistently rational and quantitatively
reliable evaluations.
The training of DDPG follows the replay memory scheme.
Each transition, formed by state, action, reward and updated
state, is first stored in a small database called the replay
memory. The data utilized for training the NNs is a mini-batch
sampled from the replay memory. It is worth noting that the two
target NNs are updated much more slowly to retain older
information and, as a result, keep the training from getting
stuck in local optima.
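A replay memory of this kind can be sketched with a fixed-size buffer, as below. The class name, method names and capacity are hypothetical; the paper does not state its buffer size or sampling details, so uniform random sampling is assumed here.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement (an assumption of this sketch).
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=1000)
for step in range(20):
    memory.store(state=step, action=0.0, reward=1.0, next_state=step + 1)
mini_batch = memory.sample(batch_size=8)
print(len(memory), len(mini_batch))
```

Sampling mini-batches from stored transitions, rather than training on consecutive steps, reduces the correlation between the samples used in one update.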
In a more detailed perspective, the training strategies for
both the actor and critic neural networks diverge significantly.
The actor neural network’s online training adheres to the
policy gradient technique. This process is dedicated to maxi-
mizing the performance objective, denoted as J, by leveraging
the gradient’s estimation, which is acquired from N samples
of mini-batch, as expressed in Equation (11).
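$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s_i} \tag{11}$$
where $\nabla_a Q(s, a \mid \theta^{Q})$ indicates the gradient on
action $a$ of the online critic NN output, in which $a$ is the
action that is obtained from the online actor NN, and
$\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})|_{s_i}$ indicates
the gradient of the actor NN parameters.
[Figure: the environment, with exploration noise added to the
action $a = \mu(s_t \mid \theta^{\mu})$, feeds transitions
$(s_t, a_t, r, s_{t+1})$ into the replay memory, from which a
mini-batch of size N is sampled. Actor side: online actor µ and
target actor µ′, updated by an optimizer using the policy
gradient $\nabla_{\theta^{\mu}} J$. Critic side: online critic Q
and target critic Q′, updated by an optimizer using the MSE loss
between $r + \gamma Q'(s_{t+1}, a' \mid \theta^{Q'})$ and
$Q(s_t, a \mid \theta^{Q})$. Both target NNs receive soft updates.]
Fig. 3. Schematic of the DDPG.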
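The online critic NN, on the other hand, is updated by
minimizing the loss function shown in (12),
$$L(\theta^{Q}) = \frac{1}{N} \sum_i \left[ r_i + \gamma Q'\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) - Q(s_i, a_i \mid \theta^{Q}) \right]^2 \tag{12}$$
where $r_i$ is the reward, $\gamma$ is the discount factor, and
$\mu'(s_{i+1} \mid \theta^{\mu'})$ and
$Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$
are the outputs of the target actor NN and the target critic NN,
respectively. Notably, the target actor NN's output serves as
an input of the target critic NN to generate the value $Q'$.
Lastly, $Q(s_i, a_i \mid \theta^{Q})$ is the output of the online
critic NN.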
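The critic loss in (12) is a mean squared temporal-difference error. A minimal sketch in plain Python is given below, with the networks replaced by ordinary callables so that the arithmetic is easy to follow; the function name, the toy stand-in functions and the value of $\gamma$ are assumptions for illustration.

```python
def critic_loss(batch, q_online, q_target, mu_target, gamma):
    """Eq. (12): mean squared TD error over a mini-batch.

    batch holds (state, action, reward, next_state) tuples; q_online,
    q_target and mu_target are callables standing in for the NNs.
    """
    total = 0.0
    for state, action, reward, next_state in batch:
        # The target actor proposes a', which the target critic evaluates.
        td_target = reward + gamma * q_target(next_state, mu_target(next_state))
        total += (td_target - q_online(state, action)) ** 2
    return total / len(batch)


# Toy stand-ins for the networks (not trained models).
def toy_q(state, action):
    return state + action


def toy_mu(state):
    return 2.0 * state


batch = [(1.0, 0.5, 1.0, 2.0), (0.0, 0.0, 0.0, 0.0)]
print(critic_loss(batch, toy_q, toy_q, toy_mu, gamma=0.9))
```

In an actual DDPG implementation the same quantity would be computed on tensors and minimized with respect to the online critic parameters only, while the target networks are held fixed during the gradient step.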
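As for the target NNs, they are updated partially from the
online NNs, following the soft updating rule in (13),
$$\begin{cases} \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1-\tau)\, \theta^{\mu'} \\ \theta^{Q'} \leftarrow \tau \theta^{Q} + (1-\tau)\, \theta^{Q'} \end{cases} \tag{13}$$
where the coefficient $\tau = 0.01$. In this way, the target NNs
update at a much slower rate.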
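The soft update in (13) can be illustrated on plain lists of parameters, as in the sketch below. The function name and the flat-list representation of the weights are assumptions made for brevity; only the value $\tau = 0.01$ comes from the text.

```python
def soft_update(target_params, online_params, tau=0.01):
    """Eq. (13): blend online parameters into the target parameters."""
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_target, w_online in zip(target_params, online_params)
    ]


target = [0.0, 0.0]
online = [1.0, -2.0]
for _ in range(100):
    target = soft_update(target, online)
print(target)
```

With $\tau = 0.01$ the gap between a target parameter and a fixed online parameter shrinks by 1% per update, so the target NNs trail the online NNs over a horizon of roughly a hundred updates.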
B. Inductor Design Optimization
In magnetic component design, critical parameters include
the shape, dimensions, and the number of turns. Given the
desired inductance value and the excitation waveform, the
winding wire is chosen based on the current. An EE core
is demonstrated here, the dimensions of which are defined by
four lengths, as is shown in Fig. 4.
In this paper, some constraints among the dimension pa-
rameters are defined as (14).
$$\begin{cases} B \le A \\ C \le A/2 \\ D \le 0.3A \end{cases} \tag{14}$$
Fig. 4. Schematic of EE Core with air-gap.
The parameters to be trained by DDPG are in Table I.
TABLE I
PARAMETERS TO BE TRAINED

Predefined Parameters:
  Voltage (square)
  Current (triangular)
  Frequency
  Inductance
  Core Material
  Winding wire
  Volume Constraint

Design Parameters:
  Dimensions
  Number of turns
  Airgap length
During the training process, operating conditions are set
as constant, whereas the design parameters are explored by
DDPG in the search space. To maximize the efficiency, the
reward is defined as (15) when the constraints are not
exceeded,
$$r = 100 / P_{loss} \tag{15}$$
where $P_{loss}$ indicates the power loss. In this case, the
reward reaches its maximum value when the power loss is minimal.
A penalty reward of -10 is applied when the design exceeds the
constraint on the maximal flux density of the core material.
Lastly, when the constraints are not exceeded but the core
volume is larger than the limit, a penalty is applied as (16).
$$r = 0.1\,(V_{core} - V_{core\,limit}) \tag{16}$$
Fig. 5. Core Loss Experiment Setup.
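The reward logic in (15) and (16) can be summarized as a small function. This is a sketch only: the function and argument names are hypothetical, the order of the checks follows the description in the text, and (16) is implemented with the sign exactly as printed, which the source does not elaborate on.

```python
def design_reward(p_loss, b_max, b_sat, v_core, v_limit):
    """Reward for one candidate design, following (15) and (16)."""
    if b_max >= b_sat:
        # Flux density constraint violated: fixed penalty from the text.
        return -10.0
    if v_core > v_limit:
        # Volume limit exceeded: Eq. (16), sign taken as printed.
        return 0.1 * (v_core - v_limit)
    # Feasible design: Eq. (15), larger reward for lower loss.
    return 100.0 / p_loss


# Placeholder values (assumptions for illustration only).
print(design_reward(p_loss=2.0, b_max=0.20, b_sat=0.35, v_core=17.0, v_limit=18.0))
print(design_reward(p_loss=2.0, b_max=0.40, b_sat=0.35, v_core=17.0, v_limit=18.0))
print(design_reward(p_loss=2.0, b_max=0.20, b_sat=0.35, v_core=20.0, v_limit=18.0))
```

Since the feasible reward is $100/P_{loss}$, a design with a few watts of loss scores well above either penalty branch for small volume excesses, which is what steers the agent towards feasible, low-loss designs.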
IV. RESULTS
A case study of a 200 µH design operating at 100 kHz and
a duty cycle of 0.5 is conducted, using N87 ferrite and
90 × 0.100 mm litz wire. Training runs for 3000 episodes
with 20 steps in each episode. The volume limit is randomly
chosen in a range of 15∼50 as one of the states at the
beginning of each episode. The training process takes around
7.5 hours on a laptop with a 9th-generation Intel i7 CPU. The
average reward in each episode is provided in Fig. 6.
Fig. 6. Average Reward in Each Episode of DDPG Training.
The results indicate that the average accumulated reward
increases gradually during the training process of the DDPG
algorithm. In the beginning, the average reward is negative
because the agent is exploring the environment and the major
part of the search space returns a negative reward due to the
penalty. After 50 episodes of exploring, the replay memory
reaches its full capacity, which triggers the training of the
parameters of the four NNs. Gradually, the agent is trained
to avoid the penalty and update the state towards the maximal
reward. Since the volume limit is randomly determined in each
episode, the maximal reward varies with the volume limit.
V. EXPERIMENTAL VALIDATION
The inductor design is validated for a volume limit of
18 cm³. Based on the design recommended by the DDPG output,
an N87 core with the most similar dimensions is chosen for the
prototype, which is a 42/21/15 core. A litz wire of 90 strands
with a strand diameter of 0.100 mm is chosen for the winding.
Firstly, the inductance value and DC resistance are measured
with an impedance analyzer. Secondly, the core losses are
measured by the methodology described in [15], which adopts a
secondary winding on the centre leg of the core. A square
waveform excitation is applied to the primary side while the
current of the primary side and the voltage of the secondary
side are measured, as shown in Fig. 5.
Fig. 7. Prototype for Testing
Finally, the prototype with 34 turns, shown in Fig. 7, is
tested in a buck converter under the given operating condition
to measure the total loss. To validate the optimal design, four
other prototypes with 32, 33, 35 and 36 turns, respectively,
featuring the same inductance value are tested in a comparative
study. The preliminary results are shown in Fig. 8.
[Figure: power losses [W] versus number of turns (n = 32 to 36).
Series: Total Loss Sim., Core Loss Sim., Winding Loss Sim.,
Total Loss Exp., Core Loss Exp., Winding Loss Pred.]
Fig. 8. Comparison of Inductor Loss when Adopting Different Turns (0.2
mH, 100 kHz).
The preliminary results show that the optimal design from
DDPG has a lower total loss. It can be concluded that more
turns increase the winding losses, whereas the core losses
are reduced because of the reduced flux fluctuation. However,
it is worth noting that core loss is simulated and tested
with zero DC bias, whereas the inductor is operated with a
DC bias. Since DC bias has an impact on core losses [15],
this could affect the optimal design selection. In addition,
the eddy current loss near the airgap is not yet considered
in the model, which also causes some errors.
VI. CONCLUSION
A DDPG-based magnetic optimization approach for design-
ing air-gapped EE-core inductors for various operating points
of triangular excitation waveforms is proposed in this paper.
The primary results demonstrate that the DDPG algorithm
can automate the search for optimal design configurations.
Given the design requirements with the trained DDPG neural
network, an optimal design can be generated within one
second. As a result, it accelerates the magnetic component
design of air-gapped EE-core under triangular excitation from
hours to seconds, thereby enhancing the overall performance
and efficiency of power electronic systems. In the future, the
DDPG model can be further expanded with more operating
conditions involved as training variables, such as voltage and
current waveform, frequency, and the choices of windings;
more computation resources will be required accordingly.
REFERENCES
[1] A. Nabih, R. Gadelrab, P. R. Prakash, Q. Li and F. C. Lee, ”High Power
Density 1 MHz 3 kW 400 V-48 V LLC Converter for Datacenters with
improved Core Loss and Termination Loss,” 2021 IEEE Applied Power
Electronics Conference and Exposition (APEC), Phoenix, AZ, USA,
2021, pp. 304-309.
[2] M. Kacki, M. S. Ryłko, J. G. Hayes and C. R. Sullivan, ”A Practical
Method to Define High Frequency Electrical Properties of MnZn Fer-
rites,” 2020 IEEE Applied Power Electronics Conference and Exposition
(APEC), New Orleans, LA, USA, 2020, pp. 216-222.
[3] D. Cao et al., “Reinforcement learning and its applications in modern
power and energy systems: A review,” Journal of Modern Power Systems
and Clean Energy, vol. 8, no. 6, pp. 1029–1042, 2020.
[4] G. Kruse, D. Happel, S. Ditze, S. Ehrlich, and A. Rosskopf, “Param-
eter optimization of llc-converter with multiple operation points using
reinforcement learning,” 2023.
[5] F. Tian, D. B. Cobaleda, H. Wouters and W. Martinez, ”Parameter
Design Optimization for DC-DC Power Converters with Deep Re-
inforcement Learning,” 2022 IEEE Energy Conversion Congress and
Exposition (ECCE), Detroit, MI, USA, 2022, pp. 1-7.
[6] J. Wang, R. Yang, and Z. Yao, “Efficiency optimization design of three-
level active neutral point clamped inverter based on deep reinforcement
learning,” in 2022 IEEE 6th Conference on Energy Internet and Energy
System Integration (EI2), 2022, pp. 605–610.
[7] Y. Tang et al., “Deep reinforcement learning-aided efficiency optimized
dual active bridge converter for the distributed generation system,”
IEEE Transactions on Energy Conversion, vol. 37, no. 2, pp. 1251–1262,
2022.
[8] Y. Tang, W. Hu, J. Xiao, Z. Chen, Q. Huang, Z. Chen, and F. Blaabjerg,
“Reinforcement learning based efficiency optimization scheme for the
dab dc–dc converter with triple-phase-shift modulation,” IEEE
Transactions on Industrial Electronics, vol. 68, no. 8, pp. 7350–7361, 2021.
[9] J. Kim, S. Jeong, J.-B. Kim, and J. D. Ihm, “Automatic spice-integrated
reinforcement learning for decap optimization for emi and power
integrity,” in 2022 IEEE International Symposium on Electromagnetic
Compatibility Signal/Power Integrity (EMCSI), 2022, pp. 565–569.
[10] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D.
Silver, and D. Wierstra, “Continuous control with deep reinforcement
learning.” in ICLR, 2016.
[11] C. W. T. McLyman, ”Transformer and Inductor Design Handbook (4th
ed.),” CRC Press, 2011.
[12] C. Steinmetz, “On the law of hysteresis,” Proceedings of the IEEE, vol.
72, no. 2, pp. 197–221, 1984.
[13] D. Elizondo, E. L. Barrios, A. Ursua, and P. Sanchis, “Analytical
modeling of high-frequency winding loss in round-wire toroidal induc-
tors,” IEEE Transactions on Industrial Electronics, vol. 70, no. 6, pp.
5581–5591, 2023.
[14] K. Venkatachalam, C. Sullivan, T. Abdallah, and H. Tacca, “Accurate
prediction of ferrite core loss with nonsinusoidal waveforms using only
steinmetz parameters,” in 2002 IEEE Workshop on Computers in Power
Electronics, 2002. Proceedings., 2002, pp. 36–41.
[15] H. Li, D. Serrano, T. Guillod, E. Dogariu, A. Nadler, S. Wang, M. Luo,
V. Bansal, Y. Chen, C. R. Sullivan, and M. Chen, “Magnet: An open-
source database for data-driven magnetic core loss modeling,” in 2022
IEEE Applied Power Electronics Conference and Exposition (APEC),
2022, pp. 588–595.
[16] E. Dogariu, H. Li, D. Serrano, S. Wang, M. Luo and M. Chen, ”Transfer
Learning Methods for Magnetic Core Loss Modeling,” IEEE Workshop
on Control and Modeling of Power Electronics (COMPEL), Cartagena
de Indias, Colombia, 2021.
[17] H. Li, S. R. Lee, M. Luo, C. R. Sullivan, Y. Chen and M. Chen,
”MagNet: A Machine Learning Framework for Magnetic Core Loss
Modeling,” IEEE Workshop on Control and Modeling of Power Elec-
tronics (COMPEL), Aalborg, Denmark, 2020.
[18] R. P. Wojda and M. K. Kazimierczuk, “Winding resistance and power
loss of inductors with litz and solid-round wires,” IEEE Transactions on
Industry Applications, vol. 54, no. 4, pp.3548–3557, 201.