Control of RTM processes through Deep
Reinforcement Learning
Simon Stieber∗, Leonard Heber∗, Christof Obertscheider†, Wolfgang Reif∗
∗Institute for Software & Systems Engineering, University of Augsburg, Augsburg, Germany
Email: {simon.stieber, leonard.heber, wolfgang.reif}@uni-a.de
†University of Applied Sciences Wiener Neustadt, Wiener Neustadt, Austria
Email: christof.obertscheider@fhwn.ac.at
Abstract—Resin transfer molding (RTM) is a composite man-
ufacturing process that uses a liquid polymer matrix to create
complex-shaped parts. There are several challenges associated
with RTM. One of the main challenges is ensuring that the
liquid polymer matrix is properly distributed throughout the
composite material during the molding process. If the matrix
is not evenly distributed, the resulting part may have weak or
inconsistent properties. This is the challenge we tackle with the
approach presented in this work. We implement an online control
using deep reinforcement learning (RL) to ensure a complete
impregnation of the reinforcing fibers during the injection phase,
by controlling the input pressure on different inlets. This work
uses this self-learning paradigm to actively control the injection
of an RTM process, which has the advantage of relying on a reward function instead of a mathematical model, as would be the case for model predictive control. A reward function is
more straightforward to model and can be applied and adapted
to more complex problems. RL algorithms have to be trained
through many iterations, for which we developed a simulation
environment with a distributed and parallel architecture. We
show that the presented approach decreases the failure rate from 54 % to 27 %, i.e., by 50 %, compared to the same setup with static parameters.
Index Terms—Reinforcement Learning, Resin Transfer Mold-
ing, Control
I. INTRODUCTION TO RTM AND CONTROL
Fiber-reinforced plastics (FRP) are composite materials in which conventional plastics are raised to a higher mechanical level by the introduction of reinforcing fibers. Components,
especially when reinforced with carbon fiber, are very light
and can withstand large forces in the direction of the fibers.
These properties can be exploited to save weight and increase
stiffness compared to conventional components made of steel
or aluminum [1]. Further, they render this material especially
interesting for the aviation and automotive industry since
weight savings lead to lower energy consumption, and thus
FRP are a key to more environmentally friendly mobility.
The RTM process (resin transfer molding) [2] is a widely
used industrial process for manufacturing such components.
A characteristic feature is the use of a closed mold whose shape resembles the final part. The fibers are used in the form of a textile, which can, for example, be woven. Multiple layers of textile can
be stacked to build a textile preform, which is placed in the
mold. Next, a liquid polymer matrix, i.e. the resin, is injected
into the mold via one or several inlets. The border between
already impregnated areas and dry areas is called the flow
front. An irregular spreading of the flow front can lead to
an air entrapment in the worst case, which results in a dry
spot. These disturbances stem from the irregular nature of the
textile and lead to an incomplete impregnation of the textile,
which reduces the stability of the manufactured component.
This leads to a high reject rate, which can make the produc-
tion of FRP components uneconomical and unecological. By
controlling the injection pressure at the individual inlets, the
flow front can be influenced in order to ensure a uniform,
complete and fast impregnation of the textile. One difficulty
in designing such a controller is the nonlinear and complex
relationship between the injection pressure and the spreading
of the resin [3]. This complicates the classical expert-based
controller design, which is based on the formation of a
mathematical model of the controlled system. Therefore, data-
driven approaches using various machine learning models have
been developed in related work [3]–[5]. Initial successes have
been achieved, resulting in a more regular spreading of the
resin.

Fig. 1. Setup overview: The agent receives an observation and a reward and chooses an action. The simulation acts as the environment, which takes the action and returns the next observation, containing the flow front image, the fiber volume content (FVC) map, and the pressure image.

In this work, we use Reinforcement Learning (RL) to
optimize the injection pressure in an RTM process. A system
trained by an RL algorithm is called an agent, which learns
to interact with its environment through trial and error. In
doing so, it is driven by rewards it receives in response to
its actions, which it tries to maximize [6]. The experiments
for this work were performed using a simulation of the RTM
process. Running a real machine would be too expensive
and burdensome to perform the number of runs necessary
to train the RL agents. In addition, a distributed and parallel
architecture was implemented to take advantage of available
computational resources during training. The overall approach
is depicted in Fig. 1.
II. RELATED WORK
Previous work on the optimization of the RTM process
can be divided into passive and active methods. Passive
methods are used to optimize various process parameters in
advance. Szarski et al. [7] used RL to optimize fluid flow in a
process similar to RTM by determining the placement of flow
enhancers a priori.
The active, or online, methods are applied during the process
to control certain parameters based on real-time measurements.
This allows reacting to unforeseen disturbances. The approach
of this work can be classified as an active method, where the
measurement is an image of the flow front, inter alia, and the
variable to be controlled, the actuator, is the pressure profile
applied to the resin inlets. Demirci and Coulter [4] trained an
artificial neural network (NN) using a numerical simulation to
predict the flow front position at the next discrete time step
from the flow front and the injection pressure. They defined
an optimal flow front as the control target. The output of the
controller is the pressure profile that minimizes the difference
between the flow front position predicted by the model and
the predefined target flow front. Nielsen and Pitchumani also
trained an NN to predict the future fluid advancement by
using a numerical simulation [8] and optimized the pressure
profile with simulated annealing. In real experimental setups,
the resulting flow front was found to sufficiently approximate
the desired one. Wang et al. [3] designed a model predictive
controller (MPC) with a flow front specification. They use an
autoregressive model with exogenous input (ARX) to account
for the nonlinear characteristics of the RTM process. The
parameters of the model can be identified online, i.e. at run-
time, by a recursive least squares method. The controller was
tested on a rectangular plate component, with an additional
textile layer added to create an obstacle to the flow front. A
response to the disturbance was evident in the applied pressure
profiles. Those works have in common that they assess the performance of their approach by measuring the difference of the actual flow front from an optimal flow front, which, for a linear injection from one side, would be a straight perpendicular line. They determine the output of their controller using the
learned model of the flow front behaviour for optimization at
each step. Our work differs from those approaches because
model-free RL approximates functions that map measured
values directly to the next action. Therefore, no model of the
process is needed and no elaborate optimization needs to be
performed at each step. Further, we optimize to reduce dry
spots and thus reduce the failure rate of the process. The form
of the flow front is an auxiliary factor in our proposed reward
function (cf. Section IV-B).
III. METHODS
A. Reinforcement Learning
The RL algorithms employed here, A2C and PPO, belong to the model-free methods: they learn policies solely from experienced state transitions and received rewards, with no prior knowledge of the environment’s dynamics and without trying to model them. Advantage actor-critic (A2C) [9] implements
the actor-critic pattern by using the advantage function as the
critic. The advantage function estimates the advantage that
is gained from taking a certain action in a certain state. By
optimizing the policy with respect to the advantage estimate,
the probability of taking actions that lead to high rewards increases.
Proximal policy optimization (PPO) [10] works similarly to
A2C, but introduces a surrogate objective to replace the simple
advantage estimate. It has been found that too large policy
updates can cause instabilities. That led to the development
of trust-region methods that limit the policy update per step,
which results in a more monotonic improvement [11]. PPO
uses a clipping mechanism, which is more efficient, easier to
implement, and more broadly applicable than former trust-region
methods.
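For reference, the clipped surrogate objective introduced in [10], which replaces the plain advantage-weighted objective used by A2C, can be written as

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)},

where \hat{A}_t is the advantage estimate and \epsilon the clipping range; taking the minimum with the clipped ratio removes the incentive to push r_t(\theta) outside [1-\epsilon, 1+\epsilon] in a single update.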
B. Simulative Environment
For our application, as it is common for Deep Reinforce-
ment Learning (DRL) models, training in a real-world setup
is not feasible, since a large number of iterations are required,
which would result in many costly experiments. Therefore
we used a numerical simulation of the resin flow to provide
a simulative environment resembling the RTM process. We
implemented a simulation based on RTMSim [12] to meet our
requirements. The resin flow through a porous medium, i.e., a textile, can be described by Darcy’s law, shown in Eq. (1):
\vec{v} = -\frac{K}{\eta} \cdot \nabla p \qquad (1)
The flow speed v is proportional to the permeability K of the preform and the pressure gradient ∇p, and inversely proportional to the dynamic viscosity η. Flow speed and pressure gradient are functions of time and position inside the preform. In the considered application, the preform permeability is only a function of position and the resin viscosity is constant. Before the filling starts, there is an initial pressure of 0.1 bar inside the preform. To be able to execute comparable experiments, some assumptions about process parameters had to be made. Their values were chosen to be constant within realistic magnitudes but would vary if, for example, other types of resin or textile were used. We assume η to be 0.1 Pa·s and K to be isotropic, meaning its value is the same in every flow direction. The exact value of K is varied between experiments and will be explained later on.
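As a minimal illustration of Eq. (1) with the constants assumed above, the following sketch evaluates the flow velocity for a rough, purely illustrative pressure gradient (5 bar at the inlet against the 0.1 bar cavity pressure over the 0.5 m part length); the gradient values are not taken from the simulation.

```python
import numpy as np

# Constants assumed in the text: isotropic permeability and constant viscosity.
K = 1.464e-9    # permeability of the main preform in m^2
ETA = 0.1       # dynamic resin viscosity in Pa*s

def darcy_velocity(grad_p):
    """Darcy's law (Eq. 1): superficial flow velocity from a local pressure gradient in Pa/m."""
    return -(K / ETA) * np.asarray(grad_p, dtype=float)

# Illustrative gradient: 5 bar inlet vs. 0.1 bar cavity pressure over the 0.5 m part length.
grad_p = np.array([-(5.0e5 - 0.1e5) / 0.5, 0.0])   # Pa/m, pressure dropping along the flow direction
print(darcy_velocity(grad_p))                      # ~[1.4e-2, 0.0] m/s
```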
RTMSim applies a finite area method (FAM) to solve the
resin flow on the whole part, which introduces the need
for temporal and spatial discretization. While the simulation
requires comparably small time steps to yield numerically stable solutions, we chose a much larger step size of 0.5 s, or a frequency of 2 Hz, for the RL cycle.

Fig. 2. Left: Three different inlet cell groups that can be actuated independently are depicted in different colors. Right: FVC values in different areas of the preform, with a patch of higher FVC.

This reduces the computational burden, which is necessary to efficiently
train agents. Therefore, per RL step, a multitude of simulation
steps is executed. In the spatial domain, we use a mesh that consists of 1878 triangular elements and models a planar square part, as depicted in Fig. 2. The part has a side length of 50 cm and a thickness of 0.5 cm. Three equally wide and independently controllable resin inlets are placed on the left side. In order to simulate irregularities in the textile preform and provoke perturbations of the flow front, inserts are placed on the part. An insert is an area whose fiber volume content (FVC), and thus the permeability, deviates from the standard value. An experimentally determined relation was used for calculating permeability from porosity. The standard values in the main preform are an FVC of 35 % and a permeability of 1.464 × 10⁻⁹ m². In related work, it
has become common to use rectangular inserts to provoke and
analyze perturbations of the flow front [3], [13]. The position,
the dimension, and the degree of FVC deviation determine
how hard the control task is. During training, the placement
is drawn from a random distribution, which is subject to
certain constraints in each experiment that will be explained
in section IV-D.
Special requirements that differ from commonly used simulation tools such as PAM-RTM [14] are the ability to change the injection pressure and to obtain the state of the simulation at any point in simulated time. This ability is crucial for online control and needs to be available without terminating the executing process to achieve high efficiency and stability.
We created a lightweight program by stripping down the
implementation of RTMSim to the specific case we need to
simulate. This enabled us to run multiple parallel instances of
the simulation on a multi-node compute cluster. Distributed over 9 servers, our architecture can provide 279 virtual instances of the RTM process for an agent to train with, while using one additional server for mid-training validation. A training run consists of 2,000,000 steps, which took on average 5 hours and 29 minutes, including validation. This equals 60,000 to
100,000 filling cycles for agents to gain experience from,
depending on the average episode length of each experiment.
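The distributed setup could be wired up along the following lines; this is only a sketch, assuming a hypothetical Gymnasium wrapper RtmEnv that connects to one remote simulation instance (the class, the module name, and the addresses are illustrative and not part of the published code).

```python
import gymnasium as gym
from stable_baselines3.common.vec_env import SubprocVecEnv

# Hypothetical worker addresses: 9 servers with 31 simulation instances each (279 in total).
WORKER_ADDRESSES = [f"10.0.0.{host}:{5000 + port}" for host in range(1, 10) for port in range(31)]

def make_env(address: str):
    """Return a thunk that builds one environment bound to one remote simulation instance."""
    def _init() -> gym.Env:
        from rtm_env import RtmEnv  # hypothetical wrapper around the stripped-down RTMSim
        return RtmEnv(address)      # resets itself with randomized inserts after each episode
    return _init

if __name__ == "__main__":
    # One subprocess per instance; the agent steps all environments synchronously in batches.
    vec_env = SubprocVecEnv([make_env(addr) for addr in WORKER_ADDRESSES])
```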
IV. EXPERIMENTS
In the following, the experimental setup, including the RL controller and the RL hyperparameters, as well as the actual experiment plan, are described.
A. RL Controller
The agent interacts with all parallel environments in a
synchronous manner and the experiences are batched and
accumulated similarly to stochastic gradient descent. The interaction with every single environment resembles how an agent would be integrated into the RTM process as a controller, as shown in Fig. 1. At each discrete step, the agent receives an observation and chooses an action. The action consists of three integer values that control the injection pressure at the three resin inlets, which can be set to five discrete equidistant levels between 0.1 and 5 bar.
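In code, such an action space and its mapping to inlet pressures could look as follows; the assumption that the five levels are spaced linearly between the two endpoints is ours.

```python
import numpy as np
from gymnasium import spaces

# Three inlets, each with five discrete pressure levels.
action_space = spaces.MultiDiscrete([5, 5, 5])

# Assumed equidistant levels between 0.1 and 5 bar: [0.1, 1.325, 2.55, 3.775, 5.0].
PRESSURE_LEVELS_BAR = np.linspace(0.1, 5.0, 5)

def action_to_pressures(action):
    """Map the agent's integer action to injection pressures per inlet in bar."""
    return PRESSURE_LEVELS_BAR[np.asarray(action)]

print(action_to_pressures([4, 2, 4]))  # [5.0, 2.55, 5.0] bar
```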
We experimented with three different observation spaces to
evaluate which physical quantities yield the most value for
the agent. The considered quantities are the filling state, the
preform FVC and the pressure inside the tool. The filling
state represents the spreading of the resin, which is to be
optimized and therefore included. The flow speed and thus
the flow front is, as stated by Darcy’s law, influenced by
the permeability and the pressure gradient. We substitute the
permeability information with the FVC of the textile, because
of our assumption of the textile’s isotropy. We provide the agents with a simple map (a 50 × 50 pixel grayscale image) of the pressure inside the mold. While the FVC remains constant throughout one filling cycle, the filling state and pressure evolve over time. The simplest observation space, containing only the flow front image, is from now on called Ff; adding the FVC map gives the observation space FfFvc, and adding the pressure image leads to FfFvcP.
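One possible encoding of the three observation variants as stacked 50 × 50 grayscale channels is sketched below; the concrete encoding is an assumption for illustration, not a copy of the implementation.

```python
import numpy as np
from gymnasium import spaces

H = W = 50  # all observation maps are 50 x 50 pixel grayscale images
N_CHANNELS = {"Ff": 1, "FfFvc": 2, "FfFvcP": 3}

def make_observation_space(variant: str) -> spaces.Box:
    """Channel-first image observation space for Ff, FfFvc, or FfFvcP."""
    return spaces.Box(low=0, high=255, shape=(N_CHANNELS[variant], H, W), dtype=np.uint8)

def build_observation(flow_front, fvc_map=None, pressure_map=None):
    """Stack the available quantities (arrays scaled to 0..255) into one observation."""
    channels = [c for c in (flow_front, fvc_map, pressure_map) if c is not None]
    return np.stack(channels).astype(np.uint8)
```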
A filling cycle consists of a series of interactions and ends
when one terminal condition is fulfilled. This can be either the
occurrence of a dry spot or the complete filling of the part,
each of which will trigger a specific reward signal contributing
to the reward function described in Section IV-B. During
training, finished simulation instances are automatically reset
with randomized initial conditions.
B. Reward Function
The main purpose of our reward function (Eq. (2)), which is based on the flow front image only, is to reward the complete filling of the part and to punish the occurrence of dry spots.
r(o) = a \cdot \frac{filled}{t_{sim}} - b \cdot dryspot - c \cdot \frac{1}{h \cdot R} \sum_{i=1}^{R-2} \sum_{j=1}^{h} (R-i) \cdot (1 - o_{i,j})^2 \qquad (2)
filled and dryspot take values of 0 or 1 and indicate whether the respective event has occurred. Both can only be triggered in terminal states because the environments automatically reset in either case. The input to the reward function is the flow front image o, where o_{i,j} refers to the pixel in the i-th row and j-th column, h is the height of o, which is 50 in our experiments, and R is the column index of the rightmost pixel that has been reached by the resin. t_sim counts the elapsed time since the beginning of the episode, and the weighting factors have been chosen as a = 3000, b = 100, and c = 10 in prior
experiments. Apart from the sparse reward mechanism, which
only yields a signal - either positive or negative - in terminal
states, we added an auxiliary goal to guide the agent to the
desired behavior: the flow front uniformity, introduced in the
third term. In our setup, the optimal flow front would be an
orthogonal line moving from left to right. When evaluating
the reward function, we place this target flow front at R.
Then the MSE between oand this target, weighted with each
pixel’s distance to R, is calculated, with the exception that the
two columns nearest to the target line are excluded. Thereby,
small irregularities have a comparably small impact on the
reward signal, whereas deeper bulges are weighted quadrati-
cally higher. This motivates the agent to keep the flow front
as even as possible, which is a good step toward preventing
dry spots. The reward function also implicitly adds an incentive to finish episodes quickly: by weighting filled inversely with the elapsed time t_sim, the agent receives a higher reward at the end of short
episodes. In order to maximize the rewards accumulated over
an episode, the agent should finish the episode in as few steps
as possible, while still seeking to fill the part completely by
avoiding dry spots.
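A direct transcription of Eq. (2) might look like the following sketch; it assumes that o is a 2-D array in [0, 1] with 1 marking filled pixels and that the summation index i runs over the image columns up to R − 2, which is our reading of the notation rather than a verbatim copy of the implementation.

```python
import numpy as np

A, B, C = 3000.0, 100.0, 10.0   # weighting factors a, b, c

def reward(o, filled, dryspot, t_sim):
    """Sketch of Eq. (2). o: 50 x 50 filling image in [0, 1]; filled/dryspot: terminal flags (0 or 1)."""
    h = o.shape[0]                                       # image height
    reached = np.where(o.max(axis=0) > 0)[0]             # columns already reached by the resin
    R = int(reached.max()) + 1 if reached.size else 1    # 1-based index of the rightmost reached column

    uniformity = 0.0
    for i in range(1, R - 1):                            # columns 1 .. R-2; the two columns nearest R are excluded
        uniformity += (R - i) * np.sum((1.0 - o[:, i - 1]) ** 2)   # deeper lags weighted more strongly
    uniformity /= (h * R)

    return A * filled / t_sim - B * dryspot - C * uniformity
```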
C. RL Parameters
We used the package stable-baselines3 [15], which, inter alia, provides implementations of A2C and PPO with pre-tuned hyperparameters. We adjusted the parameter n_steps
to 20, which sets the number of steps to include per policy
update. This improved the agents’ performance, while for the
other parameters, no changes were found to be beneficial.
Regarding the NN trained by the algorithms, we used the
same architecture in all experiments. Parts of the network are
shared between the policy and the value function. The shared
part consists of three convolutional layers followed by a feed
forward layer of width 128. The first convolutional layer uses
32 kernels of size 8×8with stride 4, the second 64 kernels of
size 4×4with stride 2, and the third 64 kernels of size 3×3
with stride 1. This convolutional network architecture was used
by Mnih et al. [9] in an influential work on the application of
DRL and an implementation is provided by stable-baselines3.
Next, the network splits into two heads, each containing a feed
forward layer of width 32. The output of the policy contains
three values and represents the action of the agent, while the
value network predicts one value. We used the Adam optimizer
and ReLU as the activation function in all layers.
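Under the assumption that this corresponds to stable-baselines3’s default NatureCNN feature extractor with a reduced feature dimension, the configuration could be sketched as follows; the keyword layout follows the stable-baselines3 API, while any value not stated above is an assumption.

```python
import torch
from stable_baselines3 import A2C, PPO

# vec_env: the vectorized environment from the sketch in Section III-B.
policy_kwargs = dict(
    features_extractor_kwargs=dict(features_dim=128),  # shared feed-forward layer of width 128 after the CNN
    net_arch=dict(pi=[32], vf=[32]),                    # separate 32-unit heads for policy and value function
    activation_fn=torch.nn.ReLU,                        # ReLU activations, as described above
    optimizer_class=torch.optim.Adam,                   # Adam (PPO's default; A2C would otherwise use RMSprop)
)

# n_steps=20 is the only hyperparameter changed from the pre-tuned defaults.
model = PPO("CnnPolicy", vec_env, n_steps=20, policy_kwargs=policy_kwargs, verbose=1)
# model = A2C("CnnPolicy", vec_env, n_steps=20, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=2_000_000)
```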
D. Experiments
The experiment series contains six experiments resulting
from combining the three possible observation spaces with the
two considered algorithms. During training, the insert parame-
ters are drawn from experiment-specific random distributions,
which will be explained in this section. We evaluate the agents
TABLE I
SETUP OF EXPERIMENTS
# and shape of insert: 1 Rect.
Height / Width of insert [cm]: 15 ± 1
FVC preform / patch [%]: 35 / 45
Perm. preform / patch [m²]: 1.464 × 10⁻⁹ / 2.268 × 10⁻¹⁰
on a test set of 100 parts that was created according to the same distribution as used in training. We provoked strong perturbations, up to the formation of dry spots. The agent tends to have greater control of the flow front in the first third of the part than in areas far from the resin inlets, which can be explained by Darcy’s law, according to which the flow velocity is proportional to the pressure gradient. The pressure outside the flow front is equal to the initial cavity pressure. Consequently, the pressure gradient decreases as the flow front propagates. Therefore, the inserts are placed randomly either in the left third of the part or 15 cm from its right edge in the different setups, to test whether an RL agent is in principle able to prevent the formation of dry spots. The inserts are of square shape and have a side length of 15 ± 1 cm. They have an FVC of 45 % and a permeability of 2.268 × 10⁻¹⁰ m². They were placed keeping a 5 cm spacing from all edges, except the right edge, where it was necessary to keep a broader space.
A static injection process leads to strong irregularities of the
flow front as soon as it reaches an insert. In 54 % of the cases
within the test set, this leads to a dry spot, causing the cycle
to be prematurely terminated. The amount of change in FVC
necessary to provoke dry spots was determined experimentally.
The results of these experiments are described in section V.
V. RESULTS
In the following section, the results of these experiments
are presented to assess whether RL can provide an advantage
in controlling the RTM process. A common metric in RL
applications is the average accumulated reward per episode [6].
Another measure is the average number of steps per episode.
From an economic point of view, it is desirable to achieve the
shortest possible cycle time of the RTM process. However,
very short episodes can also mean that filling cycles were
aborted early because a dry spot was detected. Therefore, when
considering the episode length, the specific data set must be
analyzed to determine whether and how often dry spots occur.
In such cases, the adjusted average episode length can be used,
which considers only successful filling cycles. In addition, the
rate of failed episodes can be used. While this information is
implicitly included in the cumulative rewards, since the occur-
rence of a dry spot is penalized by the reward function, the
reward signals are also influenced by other factors. Therefore,
it can be advantageous to explicitly calculate the dry spot rate,
which shall be minimized. By comparing the strategies, it
is possible to evaluate which algorithm gives the best result
in which configuration, but there is no indication of whether
an overall advantage can be obtained for the control of the
RTM process. Therefore, the learned strategies are additionally
compared with a constant baseline strategy that applies the
maximum possible injection pressure of 5 bar to each gate in
each step. This corresponds to the static version of the RTM
process commonly employed industrially [1].
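Given per-episode logs from the test set, these metrics can be computed as in the following sketch; the record format (per-episode return, length, and dry-spot flag) is an assumption.

```python
import numpy as np

def evaluation_metrics(returns, lengths, dry_spots):
    """Aggregate the evaluation metrics from per-episode logs (returns, lengths, dry-spot flags)."""
    returns, lengths, dry_spots = map(np.asarray, (returns, lengths, dry_spots))
    successful = ~dry_spots.astype(bool)                 # filling cycles that were not aborted
    return {
        "mean_reward_per_episode": float(returns.mean()),
        "mean_length": float(lengths.mean()),
        "dry_spot_rate": float(dry_spots.mean()),        # fraction of episodes ending in a dry spot
        "adjusted_mean_length": float(lengths[successful].mean()),  # successful episodes only
    }
```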
In our experiments, the insert placement and the level of
permeability perturbation cause either strong irregularities of
the flow front or even result in the formation of a dry spot
(leading to up to 54 % of the cycles containing dry spots). Table II
lists the results obtained when training the six pairings of
algorithms and observation spaces. The metric of average
cumulative rewards per episode is of limited use. First, the
rewards obtained are generally lower than in the first experi-
ment, which can be explained by the stronger perturbations of
the flow front. Second, within this experiment, the PPO agents
can generate higher rewards than the A2C agents, but in doing
so there appears to be no relationship to the rate of dry spots.
Third, the A2C agents receive significantly lower rewards
on average, in particular even less than the static injection.
Nonetheless, agent A2C/FfFvcP is the most successful in terms
of preventing dry spots. Also, the mean length cannot be
used unambiguously to compare strategies. The static process
requires the fewest steps per episode, because over half of the
test cases are terminated prematurely, which usually occurs
between step 10 and step 15. The conclusion that longer
episodes indicate better agent behavior is also incorrect. Agent
A2C/FfFvc stands out, producing comparatively long episodes
but still having a similar dry spot rate as, e.g., PPO/FfFvc,
whose episodes are on average about 9 steps shorter. Thus,
the episodes of A2C/FfFvc are not longer because it prevents
more dry spots and thus leads more episodes to successful
completion, but because the strategy of A2C/FfFvc chooses
comparatively low-pressure values, causing the flow front to
progress slower and the episodes to last longer. Therefore,
Table II reports the mean length cleared of episodes that have
been stopped early as adjusted mean length. The dry spot rate
is used as a metric in this experiment. All agents achieve
a better dry spot rate than the static process. The trend is
that, as in the first series of experiments, agents with more
information can achieve better results. The best agents do not
show major disadvantages in terms of filling speed, yet the
static process is still the fastest. This becomes apparent when
considering the dry spot rate and the adjusted mean length
together, which shows that agents that can prevent the most dry
spots still have a comparatively high filling speed. There is no
clear trend between the two algorithms; notably, the best-performing agent was trained by A2C. Fig. 3 shows a comparison of
the best agent A2C/FfFvcP and the static injection, in which
the latter leads to the formation of a dry spot. The actions of
the static agent are plotted in the action graph for comparison,
with the maximum value of 5 bar applied to all inlets at all times. The learned strategy also behaves almost constantly. At inlet 2, which is at the same “height” as the insert, 5 bar is applied, while the value at the two outer inlets is reduced to half: 2.5 bar. Due to the overall lower injection pressure, the
regulated flow front moves slightly slower than the static one.
In the snapshots, although both flow fronts are approximately
TABLE II
RESULTS FROM EXPERIMENTS WITH STRONG PERTURBATIONS.
Algorithm / Observation Space | Mean Reward per Episode | Mean Length | Dry Spot Rate | Adjusted Mean Length
Static      | −157.4 | 21.15 | 54 % | 24.91
PPO/Ff      | −163.9 | 23.72 | 40 % | 29.27
A2C/Ff      | −322.8 | 29.27 | 41 % | 35.76
PPO/FfFvc   | −132.8 | 20.82 | 37 % | 24.95
A2C/FfFvc   | −275.1 | 29.37 | 39 % | 35.61
PPO/FfFvcP  | −138.5 | 22.67 | 34 % | 26.64
A2C/FfFvcP  | −221.5 | 23.82 | 27 % | 26.82
equally advanced, there are 1.5 s between them. The static
flow front advances rapidly at the edges of the component, so that the region delayed by the lower permeability of the insert lags behind. As soon as the advancing arms of the
flow front have passed the insert, they close up again because
the textile has a higher permeability there and is penetrated
more quickly. The delayed region of the flow front is so far
behind that the insert is not completely impregnated before
the flow front closes behind it. Thus, air entrapment occurs
and the episode is aborted. This can also be seen in the
reward graph in Fig. 3, as a strong negative reward signal is
triggered after 6.5 s. The controlled flow front moves slower,
especially at the edges of the component. Although a slight
advance on both sides of the insert cannot be prevented, the
resin has almost completely penetrated the insert when the
two arms merge, as can be seen in the snapshot after 6.5 s.
Thus, a dry spot does not form and the filling process can be
completed. In this experiment, we show that RL algorithms
can learn control strategies that reduce the dry spot rate of the
simulated RTM process. In doing so, they do not necessarily
have a disadvantage in filling speed. When provided with
more information, the agents can preemptively steer against
perturbations and thus achieve better results.
VI. CONCLUSION AND OUTLOOK
In this work, we showed that RL for the RTM process is
possible and yields better results than a statically parameter-
ized process. Another advantage of the presented approach is that a mathematical model of the process is not needed, as would be the case for MPC. Through massive parallel computation, we could train our RL models in an appropriate time. We adapted and used an RTM filling simulation tool that offered properties specifically needed for RL. RL yields further advantages, as it is more easily adaptable to other geometries and the (re-)training is automatable with enough compute. Further, it can handle non-linear dynamics, whereas MPC often uses only linear models. Our approach is currently constrained to a subset of RTM processes but can be adjusted to other setups: to do so, the simulation of the process and possibly the reward function need to be adapted, depending on the shape and other properties of the product. For an application in the real world, with an RTM machine, several steps would
need to be taken.

Fig. 3. Snapshots from a filling cycle with strong perturbations controlled by A2C/FfFvcP, compared to a static injection with 5 bar: injection pressures per inlet over time (actions), flow front snapshots (static at t = 3.5 s, 5 s, and 6.5 s; A2C/FfFvcP at t = 5 s, 6.5 s, and 8 s), and accumulated rewards over time.

If a component of the same shape and size as the one
we discussed in the paper were the goal, a machine with inlet gates that can be adjusted during the process, as well as a monitoring system that shows the flow front (e.g., as shown by Stieber et al. [13]) and the pressure field of the process, would be
necessary. Another way to apply this method to a real-world
process would be to use a Vacuum Assisted Resin Infusion
(VARI) process that usually works with a transparent vacuum
bag as the top half of the mold, offering a visible flow front.
For the pressure field, pressure sensors would be necessary
in both cases. After determining the process to use, a model
trained with a matching simulation could be used with our
method and then be re-trained to the real process, making this
a Sim-to-Real Transfer Learning [16] approach. Additionally,
effects such as race-tracking [17], [18], which occur in real-world scenarios, need to be considered to adjust the flow-front
part of the reward function for real processes.
REFERENCES
[1] Handbuch Faserverbundkunststoffe/Composites: Grundlagen, Verar-
beitung, Anwendungen, 4th ed., ser. Springer eBook Collection Com-
puter Science and Engineering. Wiesbaden: Springer Vieweg, 2013.
[2] David A. Babb, W. Frank Richey, Katherine Clement, Edward R.
Peterson, Alvin P. Kennedy, Zdravko Jezic, Larry D. Bratton, Eckel Lan,
Donald J. Perettie, “Resin transfer molding process for composites,” U.S.
Patent US5 730 922A, 1996.
[3] K.-H. Wang, Y.-C. Chuang, T.-H. Chiu, and Y. Yao, “Flow pattern
control in resin transfer molding using a model predictive control
strategy,” Polymer Engineering & Science, vol. 58, no. 9, pp. 1659–
1665, 2018.
[4] H. H. Demirci and J. P. Coulter, “Neural network based control of
molding processes,” Journal of Materials Processing and Manufacturing
Science, vol. 2, no. 3, pp. 335–354, 1994.
[5] D. Nielsen and R. Pitchumani, “Real time model-predictive control
of preform permeation in liquid composite molding processes,” in
Proceedings of NHTC’00, 2000. [Online]. Available: http://seb199.me.
vt.edu/amtl/index htm files/c0003.pdf
[6] R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduc-
tion,” IEEE Transactions on Neural Networks, vol. 9, no. 5, p. 1054,
1998.
[7] M. Szarski and S. Chauhan, “Instant flow distribution network optimiza-
tion in liquid composite molding using deep reinforcement learning,”
Journal of Intelligent Manufacturing, vol. 34, no. 1, pp. 197–218, 2023.
[8] D. R. Nielsen and R. Pitchumani, “Control of flow in resin transfer
molding with real-time preform permeability estimation,” Polymer Com-
posites, vol. 23, no. 6, pp. 1087–1110, 2002.
[9] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap,
T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous Methods
for Deep Reinforcement Learning,” 2016. [Online]. Available: http://arxiv.org/pdf/1602.01783v2
[10] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” 2017. [Online]. Available: http://arxiv.org/abs/1707.06347
[11] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust
Region Policy Optimization.” [Online]. Available: http://arxiv.org/pdf/1502.05477v5
[12] C. Obertscheider and E. Fauster, “RTMsim – A Julia module for filling simulations in resin transfer moulding,” https://github.com/obertscheiderfhwn/RTMsim, 2022.
[13] S. Stieber, N. Schröter, A. Schiendorfer, A. Hoffmann, and W. Reif, “FlowFrontNet: Improving Carbon Composite Manufacturing with CNNs,” in Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track, ser. Lecture Notes in Computer Science, Y. Dong, D. Mladenić, and C. Saunders, Eds. Cham: Springer International Publishing, 2021, vol. 12460, pp. 411–426.
[14] ESI Group, “Composites Simulation Software,” 01.08.2022. [Online].
Available: https://www.esi-group.com/products/composites
[15] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto,
Maximilian Ernestus, and Noah Dormann, “Stable-Baselines3:
Reliable Reinforcement Learning Implementations,” Journal of Machine
Learning Research, vol. 22, no. 268, pp. 1–8, 2021. [Online]. Available:
http://jmlr.org/papers/v22/20-1364.html
[16] S. Stieber, “Transfer Learning for Optimization of Carbon
Fiber Reinforced Polymer Production,” Organic Computing:
Doctoral Dissertation Colloquium 2018, pp. 1–12, 2018.
[Online]. Available: https://books.google.de/books?hl=de&lr=
&id=B4uRDwAAQBAJ&oi=fnd&pg=PA61&ots=aV5KA5d-wo&sig=
QPtHaPsUDKnmVMqOK9OVu0- 9jas#v=onepage&q&f=false
[17] S. Bickerton and S. G. Advani, “Characterization and modeling of race-tracking in liquid composite molding processes,” Composites Science and Technology, vol. 59, no. 15, pp. 2215–2229, 1999.
[18] S. Stieber, N. Schröter, E. Fauster, M. Bender, A. Schiendorfer, and W. Reif, “Inferring material properties from CFRP processes via Sim-to-Real learning,” International Journal of Advanced Manufacturing Technology, 2023.