Journal of Manufacturing Systems 60 (2021) 22–34
0278-6125/© 2021 The Author(s). Published by Elsevier Ltd on behalf of The Society of Manufacturing Engineers. This is an open access article under the CC BY
license (http://creativecommons.org/licenses/by/4.0/).
Towards online reinforced learning of assembly sequence planning with
interactive guidance systems for industry 4.0 adaptive manufacturing
Andrea de Giorgio*, Antonio Maffei, Mauro Onori, Lihui Wang
KTH Royal Institute of Technology, Department of Production Engineering, SE-11428, Stockholm, Sweden
* Corresponding author. E-mail address: andrea@degiorgio.info (A. de Giorgio). URL: https://andreadegiorgio.com
https://doi.org/10.1016/j.jmsy.2021.05.001
Received 8 December 2020; Received in revised form 7 April 2021; Accepted 1 May 2021
ARTICLE INFO
Keywords:
Reinforcement learning
Adaptive assembly
Assembly sequence planning
Assembly guidance system
Manufacturing
Industry 4.0
Optimization
Knowledge retrieval
ABSTRACT
Literature shows that reinforcement learning (RL) and the well-known optimization algorithms derived from it
have been applied to assembly sequence planning (ASP); however, the way this is done, as an offline process,
ends up generating optimization methods that are not exploiting the full potential of RL. Today's assembly lines
need to be adaptive to changes, resilient to errors and attentive to the operators' skills and needs. If all of these
aspects need to evolve towards a new paradigm, called Industry 4.0, the way RL is applied to ASP needs to
change as well: the RL phase has to be part of the assembly execution phase and be optimized with time and
several repetitions of the process. This article presents an agile exploratory experiment in ASP to prove the
effectiveness of RL techniques to execute ASP as an adaptive, online and experience-driven optimization process,
directly at assembly time. The human-assembly interaction is modelled through the input-outputs of an assembly
guidance system built as an assembly digital twin. Experimental assemblies are executed without pre-established
assembly sequence plans and adapted to the operators' needs. The experiments show that precedence and
transition matrices for an assembly can be generated from the statistical knowledge of several different assembly
executions. When the frequency of a given subassembly reinforces its importance, statistical results obtained
from the experiments prove that online RL applications are not only possible but also effective for learning,
teaching, executing and improving assembly tasks at the same time. This article paves the way towards the
application of online RL algorithms to ASP.
1. Introduction
Sometimes good things have to be broken in order to be rebuilt even
better, a process referred to as disruption. Assembly sequence planning
(ASP) seems to require it because of the recent introduction of the
paradigms of Industry 4.0 and smart manufacturing [1] that ask for
manufacturing systems and, specifically, assembly lines to be adaptive
to changes [2–5], flexible [6], evolvable [7,8], resilient to errors [9] and
attentive to the more knowledgeable operators' skills and needs [10,11].
All these characteristics cannot be expressed by a static execution of a
predetermined ASP that is produced offline, before the assembly
takes place in the real-life industrial environment. Relying on fixed
instructions can certainly enforce the strength of an almost-optimal
solution that neglects small but known deviations from the most common
procedure; however, a self-adaptable execution based on a strong plan is
in line with what is required by Industry 4.0 [12]. As shown in this article,
the latter can address those exceptions too, leading towards higher
success rates in assembly.
Reinforcement learning (RL) is a machine learning method [13] that
deviates from the idea of training learning algorithms only on all the
available data, by accepting the possibility that the training could
continue over the execution phase, when the algorithm is applied. The
application of such an algorithm to an unforeseen case generates new
learning data. The interaction with the changing environment makes
these algorithms more resilient; however, it is often the case that for
environments that are well-modelled, such as experiments in physics,
the RL algorithm, be it an ant colony, a particle swarm, a genetic
algorithm, etc., is applied to the simulated environment to generate an
optimal solution [14,15]. This is indeed a good method because both the
simulation and the RL algorithm can be run hundreds or thousands of
times per second. Even though RL adaptable systems have been suc-
cessfully developed for other manufacturing purposes [16], the
approach is not yet feasible in assembly, as simulating the overall
complexity of an assembly line and the interaction with humans is still
an open challenge, i.e. having entire factories that are digital and
simulated. Today, ASP simulations can only rely on fixed parameters
such as data extracted from the assembly CAD models or relevant in-
sights in the topic. Examples are all the criteria-based optimizations
present in literature [17–20]. Luckily, not all the good aspects intro-
duced by RL are lost in manufacturing. This article makes use of the
basic step-by-step learning paradigm of RL and a statistical approach to
prove that RL algorithms can be successfully applied in online ASP, at
assembly time, when the number of assembly executions is limited to the
possibilities of the real environment. The experiment is planned,
developed and executed as an agile process, with incremental hypoth-
eses. The experimental setup is built around a manufacturing course
Tillverkningsteknik (MG1026) at KTH Royal Institute of Technology in
which circa 150 students assemble a metal locomotive toy.
The approach of this article that introduces ASP variations at as-
sembly time is in accordance with Marton's phenomenographic theory
[21] that affirms that there is no learning without discernment, and
there is no discernment without variation. Both the assembly operators
and the assembly environment introduce a great deal of variation that
needs to be captured and exploited at the right time, i.e. during as-
sembly, to generate better assembly sequences. Furthermore, the as-
sembly system presented enforces the human-centered view of the
operator 4.0 [22] for improvements in their physical ergonomics and the
creation of useful mental models of the assembly for the operator, when
these are not provided by the assembly design [23].
Among the preliminary results of the agile experiment, the need emerges
to digitalize the assembly-operator interaction at assembly time,
which is addressed by introducing digital twins of the operator's cognitive
process and of the assembly [24]. The gamification of the user experience
has also proven a successful technique in manufacturing [25]. Thus, it
can be integrated into an assembly guidance system (AGS). It follows
that the article presents an agile-developed version of such AGS and the
data collected over trial and error attempts to generate ASP online with
it.
The aim of this article is to pave the way towards the introduction of
online RL statistical techniques to model the ASP at runtime, instead of
the common offline use of RL techniques for creating pre-determined
and fixed ASP to run at assembly time.
This article is structured in the following way. Section 2 contains
additional literature review on the topics touched by this introduction.
Section 3 explains the research methodology applied to find innovative
solutions to an old problem, i.e. ASP. Section 4 is the core of the paper
and it is divided into several subsections corresponding to the various
phases of the agile experiment and their partial results. Section 5 pre-
sents and discusses the overall results in applying RL for ASP. Section 6
presents the conclusions and outlines some future work.
2. Related works
Claeys et al. [26] introduce a generic model for managing context
aware assembly instructions, i.e. the assembly instructions are
pre-generated and stored to be retrieved when most useful, depending
on the assembly environment. Their system, however, does not present
learning capabilities at assembly execution time. In general, there are
algorithms for solving and optimizing ASP problems based on known
variations [17–20,27,28], computer aided geometric feasibility and
optimization ASP algorithms [29–32]. None of these considers changing
the assembly sequence at assembly time.
A literature review from Rashid et al. [33] reports several articles
applying soft computing methods for ASP, including many that belong to
the RL class, such as twenty-two using genetic algorithms, three using ant
colony optimization and five using particle swarm optimization. There have
been a few attempts to determine an optimal assembly sequence using
reinforcement learning [34] and at least an attempt using deep rein-
forcement learning from Zhao et al. [35]; however, these methods focus
on reinforcing policies, i.e. converging towards an optimal solution, on
assembly states and conditions that are not directly acquired in an in-
dustrial environment but generated from a knowledge base. This
approach has been flagged as partially wrong by Kaelbling et al. [36] for
two main reasons: Firstly, because mapping an environment in advance,
e.g. with a knowledge base, requires a huge effort compared to
acquiring data while operating in the environment. Secondly, because
the environment is often subject to changes that can be better handled by
an adaptable system that learns during its execution. Thus, the ability of
an assembly planning system to learn during the assembly execution
based on human decisions and real-time issues is fundamental. A work
from Watanabe & Inada [37] seems to go in this direction, though it
focuses on acquiring historical performance data from a robotic assembly
and uses reinforcement learning to improve the assembly task. As a
confirmation that reinforcement learning is much more applied in ro-
botics than in manufacturing problems, further joint robotic-assembly
works are hereby reported: Yu et al. [38] proposed a case study using
reinforcement learning to solve the scheduling problem in a human-robot
collaborative assembly task. Martinez et al. [39] focused on reinforce-
ment learning of robotic manipulations as part of an assembly task. All
the reviewed scientific literature seems to neglect the possibility of
applying reinforcement learning directly to human behavior during
assembly tasks, an operation that holds the potential to elicit dynamic
environmental knowledge and personal knowledge from the operators.
It is fair to say that a standard aspect left untouched by this article is
the use of liaison matrices to represent assemblies [4042]. The RL
approach hereby presented is based on the aforementioned mathemat-
ical tool that has become common practice for computer-based assembly
representation; however, an innovation comes from using the liaison
matrix as a mathematical base for the RL statistical algorithms.
3. Research goal and methodology
The aim of this research is to introduce the use of reinforcement
learning (RL) to produce an optimal assembly sequence plan during
assembly execution. RL is a class of powerful machine learning (ML)
algorithms able to learn during the execution of a process. They repre-
sent an alternative to the traditional learning methods where, firstly, a
process is executed and its data collected and, secondly, the process data is
learned by an ML algorithm. The traditional ML methods are in line with
and used by the common assembly sequence planning (ASP) strategies
adopted by industry. Usually, an engineer plans the optimal ASP by
studying the product features with some support software, often based
on ML algorithms, and then the optimal ASP is implemented in the form of
assembly instruction manuals or any kind of non-adaptive guidance
systems for the assembly execution. This approach (see Fig. 1) requires
the assembly operators to give feedback only when it is too late to apply
changes to the established ASP and a major change requires the original
engineer to run the whole optimization process again with the new
parameters acquired from the feedback, often after the review of the
assembly design. These are operations that require time and effort, as
well as a workload distributed over ASP engineers, operators and other
parties involved in the product design.
An ideal RL strategy could introduce an additional and faster feed-
back cycle (see Fig. 2) between the ASP generation and execution. The
ability of RL algorithms to find optimal solutions while maintaining a
degree of adaptability to new scenarios is exactly what can enable a new
framework for ASP and become the ultimate goal of this research;
however, there is no straightforward way to achieve this, as such a goal
requires assessing the effects of changing different aspects of traditional
ASP and RL methods.
Fig. 1. Traditional feedback cycle in ASP.
RL algorithms are based on the possibility of
simulating a large number of executions in modelled environments, a
characteristic that is missing in the ASP process. The number of executions is
limited to those happening in the real factory and the RL method needs
to converge faster than its computational counterparts do. Note that the
RL feedback cycle is faster because it can be executed at each assembly
step. The traditional feedback flow is longer because it requires the
entire assembly to be completed before any data can be fed back to the
assembly designer or to the assembly planner.
There are five aspects of traditional RL that have to be mapped to
ASP: environment, agent, states, rewards and actions (see Fig. 3). The
environment is clearly the assembly process. One would rather think of
the assembly station as a production environment, but be aware that this
computer science terminology refers to the process that is modelled as
an ML algorithm rather than a real environment where the process is
executed. The agent is somehow a bit more complex to define. An agent
can be both the operator executing the assembly and the assembly
guidance system taking a certain decision for the operator. States and
actions are connected to how an assembly plan can be represented in
form of computational knowledge. Finally, rewards are connected to the
optimization strategy and are of two kinds: the feelings perceived by
operators in satisfactory assembly steps and the numerical score
attributed to the execution of certain actions in answer to certain states.
The former reward is perhaps the most complex aspect to model, espe-
cially when the optimality of a step becomes subjective because of the
choice of the particular operator executing the assembly. A more
objective way to consider the rewards is by looking at aggregated
choices from different operators.
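As an illustration of this mapping, the sketch below shows one plausible way to encode the five RL elements in code; it is an assumption derived from the description above, not the implementation used in the experiments. States are sets of already assembled components, actions are candidate subassembly selections, and the reward reflects the aggregated choices recorded from previous operators.

```python
# Hypothetical sketch of the ASP-to-RL mapping (not the authors' actual code).
from collections import defaultdict

class AssemblyEnvironment:
    """Environment = the assembly process; the agent is the operator plus the AGS."""

    def __init__(self, all_components):
        self.all_components = frozenset(all_components)
        # Aggregated operator choices: (state, action) -> how often it was taken.
        self.choice_counts = defaultdict(int)

    def record_choice(self, state, action):
        """Called whenever an operator confirms a subassembly in the AGS."""
        self.choice_counts[(state, action)] += 1

    def reward(self, state, action):
        """Reward = frequency with which previous operators made this choice."""
        return self.choice_counts[(state, action)]

    def step(self, state, action):
        """Apply a chosen subassembly (a frozenset of component IDs) to the state."""
        next_state = frozenset(state | action)
        done = next_state == self.all_components
        return next_state, self.reward(state, action), done
```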
As mentioned earlier, there is a duality of the agent in the real world
as human operator and as its digital twin embedded in the AGS. A
human is digitally represented by the inputs and outputs of a set of
sensors and interfaces enabling for a direct translation of the human
perception, intention and actions to the simulated environment where
RL is enforced. This constitutes a central point for this research. Namely,
exploring how assembly knowledge is transferred from an AGS to an
operator and vice versa.
The overall system is presented in Fig. 4. An operator Oi produces an
assembly plan Pi during the assembly Ai. The current plan and the past
ones P1, …, Pi are used for online RL of the AGS instructions to the next
operators.
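A hedged sketch of this loop, reusing the hypothetical environment sketched earlier: after each operator completes an assembly, their plan (an ordered list of subassemblies) is folded into the aggregated statistics that guide the AGS suggestions for the next operator. Function names are illustrative assumptions.

```python
# Hypothetical online-RL update loop; plan P_i is an ordered list of frozensets.
def update_ags_from_plan(env, plan):
    """Fold one completed assembly plan P_i into the aggregated statistics."""
    state = frozenset()
    for subassembly in plan:                      # e.g. frozenset({8, 9, 17})
        env.record_choice(state, subassembly)
        state = frozenset(state | subassembly)

def suggest_next(env, state, candidates):
    """AGS hint for the next operator: the statistically most reinforced choice."""
    return max(candidates, key=lambda a: env.reward(state, a))
```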
All the aspects of this system, previously described, become sub-
goals for this research. Therefore, an agile approach is required to
explore and meet as many sub-goals as possible and pave the way to-
wards the ultimate goal that is using RL in ASP. A series of experiments
are planned and executed one at a time before knowing what the next
one will be. Each experiment acts as an observation environment, useful
to produce new research hypotheses and test them within the next
experiment. This adaptive methodology has an advantage when the
intermediate goals are clear, but there is no clear understanding about
the overall process to investigate and there does not exist an established
experiment to directly achieve and test the ultimate goal.
Each experiment is carried out with the following cycle of operations
(see Fig. 5):
analysis of results from prior experiments and next research hy-
potheses generation;
setup and test of new experimental equipment;
assembly under experimental conditions;
collection of results.
Each experiment cycle is performed several times. The agile principle
allows testing an experimental setup for errors and possible issues, as well
as collecting significant statistical data before proceeding to the next
experiment; however, for clarity, results presented in this paper do not
mention how many times a cycle is executed prior to the obtainment of
definitive results. Issues encountered in some of the test runs might be
presented as general results of one particular experiment and omitted in
the others. This does not indicate a discontinuity of issues, but rather a
shift in the focus of the experiments.
4. Experimental setups and partial results
The experimental setup consists of an assembly station for a metal
toy locomotive, as shown in Fig. 6. The assembly station is minimal, as it
is part of a university course and not an industrial line. There is a wide
table, with assembly tools and components. A group of students, further
referred to as novice operators, performs the assembly task. The peculiarity
of this assembly station is that the product is stably invariant, while the
operators change at every experiment. The academic course Till-
verkningsteknik (MG1026) at KTH Royal Institute of Technology is
structured so that circa 150 students take part every semester in several
applied tasks. One of these involves circa 50 locomotive assemblies,
done in groups of one to four students. The same students who perform
the final assembly produce the locomotive components during the
course; however, the course objective is teaching how to manufacture
the parts, rather than assembling them. Thus, altering the assembly does
not alter the learning outcomes of the course and this allows the
experimental setups to be independent from any educational needs.
Each experiment is, as much as possible for this article, reported as a
standalone execution. This is explained in the scientific methodology
section; however, the overall scientific progress is part of a unique
experimental setup that evolves in an agile way towards the interesting
findings. Thus, sometimes the lines are a bit blurred and some common
details are only explained in the section corresponding to the experiment
where they are mainly relevant.
4.1. Setup of the first experiment
The first experiment consists of placing a camera over the assembly
station where the metal toy locomotive is assembled. The novice oper-
ators follow the assembly instructions provided by three paper sheets
present on the table. The instructions consist of an exploded view of the
assembly, a list of screws and their manufacturing data and labels
associated with a certain CAD component in the drawings and an
operation list that goes as follows:
Assemble the boiler and front cover.
Mount the boiler on the frame. Use dome nut and chimney as nuts.
Do not overtighten.
Insert the roof into the cabin. The roof is a black plastic plug.
Mount the cabin to the boiler.
Mount a wheel on each axle. Push the shoulders into the frame.
Fit the remaining wheels.
Fig. 2. Traditional and fast feedback cycles, when the ASP is generated at as-
sembly time with RL.
Fig. 3. Traditional RL architecture.
In this experiment, the observing researcher does not interact at all
with the operators. Nine videos are thus recorded and analyzed.
4.2. Results of the first experiment
This setup consists of the original course assembly task and it is al-
ways successfully completed by the students/operators, with the
optional assistance of the course instructors.
Nine assembly videos, recorded with this setup, are watched several
times to qualitatively and quantitatively grasp several aspects of the
assembly scenario, as seen from different operators. This is in line with
Marton's phenomenographic approach [21]. Among the observations, a
few issues that are relevant to the development of a human-machine
interaction system are identied.
Issue 1. Parallel assembly. Having 2–5 operators per assembly
station means that at some point a bit of parallel assembly is inevitable.
Further experiments need to handle this issue in order to capture a
proper assembly sequence.
Issue 2. Product quality. A few times during an assembly task, a
subassembly operation is suspended because a component has not been
properly machined and needs some further adjustments. The total as-
sembly time should exclude any delays due to quality issues. If
quality issues arise, there are no other manufactured components that
can be used to replace the originals. Thus, a quality check needs to be
run prior to assembly.
Issue 3. Camera perspective. About half the videos are recorded
from a side perspective and another half are recorded from a top
perspective. Both camera positions present advantages. A side perspec-
tive works best for showing the action from a human perspective. A top
view is better to avoid occlusions due to objects or humans in the scene.
Ideally, a camera can be positioned 45 degrees towards the table,
halfway between top and side view.
Issue 4. Operators' intentions. Verbal communication is used to
exchange intentions among operators while performing the assembly;
however, it is hard to define when a certain intention arises, before it is
communicated to the other operators and/or it is concretized into an
assembly action.
Fig. 4. System overview.
Fig. 5. Agile approach to experiments. Each experiment consists of a set of steps, from analysis of previous results and research hypotheses generation (HG), to the
preparation of a next experimental setup, assembly and collection of results.
Fig. 6. A rendering of the locomotive from its CAD parts.
4.3. Hypotheses generation and setup of the second experiment
In order to create more variability, according to Marton's proposal
[21], in the second experiment the operators are asked to perform the
assembly with the sole exploded view, without reading the written in-
structions. Course assistants are asked to refrain from helping the stu-
dents and assistance is only provided in case of major problems, such as
quality issues.
The research hypotheses for this experiment are the following:
Hypothesis 1. When lacking the assembly instructions, the operators
have to apply their own understanding of the assembly task, consisting
of limited knowledge due to prior personal experience or education.
Hypothesis 2. The assembly process can be successfully completed
without the assembly instructions.
The experiment should be useful to learn what kind of knowledge an
AGS needs to provide in order to obtain a successful assembly of the
product.
4.4. Results of the second experiment
The results of this experiment are of a qualitative nature and based
on a number of experiments that can give intuitive answers to the two
hypotheses made. About ten assemblies are executed without providing
the written instructions to the operators, who receive only the exploded view
and screws/nuts tables. All the operators indeed tried to perform the as-
sembly, even with missing instructions. In a few cases, the intervention
of the researcher has prevented the subassembly of parts that would
have jeopardized the completion of the assembly. Thus, hypothesis 1
(H1) is confirmed but hypothesis 2 (H2) is false because at least one
exception was found, i.e. an assembly was not properly completed
without instructions. The operators would use their knowledge in spite
of the missing written instructions, as in H1, but the completion of the
assembly relies on the ability of the observing researcher to steer the
operators away from the wrong assembly sequences once the intention
of the operators is manifested. This last point is a key issue for the
generation of new hypotheses and their testing.
It follows, from a positive H1 and a negative H2, that an AGS is
needed. The guidance system has to prevent the operators from doing
something wrong while allowing them to exploit their personal knowl-
edge of the assembly. Ideally, for optimal communication, such a system
is only required to provide instructions based on the manifested in-
tentions of the operators. Two engineering techniques are introduced to
the experimental setup and used to develop a guidance system for
this purpose: liaison matrices and soft constraints. Liaison matrices are a
way to mathematically encode if two assembly components are to be
assembled together. In particular, an adjacency matrix lists all the
neighbor components and the liaison matrix does that too, but it also
excludes components that do not have a stable assembly operation
between them. The limitation of this matrix is that it is not possible to
define if more than two liaison components are part of the same
subassembly or separate ones.
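A minimal sketch of a liaison matrix as a symmetric 0/1 array follows; the component count and the two example liaisons (components 8–17 and 9–17, taken from the Fig. 9 caption) are the only data borrowed from the article, and the function names are assumptions.

```python
# Illustrative liaison matrix: liaison[i, j] == 1 means components i and j share
# a stable assembly operation (a liaison); the matrix is kept symmetric.
import numpy as np

N_COMPONENTS = 21                       # highest component ID used in the article
liaison = np.zeros((N_COMPONENTS + 1, N_COMPONENTS + 1), dtype=int)

def add_liaison(i, j):
    liaison[i, j] = liaison[j, i] = 1

add_liaison(8, 17)                      # front cover - threaded screw
add_liaison(9, 17)                      # boiler - threaded screw

def can_be_joined(i, j):
    """True only if a stable assembly operation exists between i and j."""
    return bool(liaison[i, j])
```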
In order to encode the subassembly order, there are two main ap-
proaches. One considers the assembly steps and defines which compo-
nents belong to each step. The other is based on precedence constraints.
For each component to be assembled, a precedence matrix indicates
which other components must have been assembled before. An operator
starts assembling the components that have no requirements and pro-
ceeds with those that are allowed by the assembled components, until
the assembly is finished. These two approaches are supported by several
ASP methods that rely more on one or the other. For example, AND/OR
graphs or any winning assembly sequences from ASP based on genetic
algorithms show all the alternatives available for each step, while the
assembly precedence graphs constrain the sequences to those that can
satisfy the precedence constraints. In any case, the aim of ASP is to
generate as many feasible sequences as possible and select the best
sequence for the final assembly and the related instructions. For a guidance
system to allow an operator to freely select the next assembly step, while
checking that such a step does not prevent the execution of the whole
assembly, the method that provides the most adaptable solution is an
AND/OR graph which lists all the possible solutions; however, an AND/
OR graph is not easy to produce for every assembly and embed into
matrices for automatic execution in an AGS. Methods that come up with
few assembly plans are not generic enough to evaluate assemblies that
are freely defined by an operator. Instead, precedence constraints are
widely used, because precedence matrices are easy to define and deploy
for many assemblies. The only issue with precedence constraints is that
they do not leave much choice to the operators unless they can be
violated. Thus, soft constraints are preferable for the next experimental
setups, i.e. some precedence constraints that are not mandatory. The
starting point is testing the ability of the operators to complete the as-
sembly without introducing any soft or hard constraints.
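The sketch below illustrates, under assumed data structures, how hard precedence constraints could gate the selectable components while soft constraints only rank them, leaving the final choice to the operator; it is not the AGS code, only a reading of the approach described here.

```python
# Hard constraints filter the candidates; soft constraints only sort them, so the
# operator remains free to violate them. Both are dicts: component -> set of
# components that must (hard) or should (soft) be assembled first (assumed format).
def allowed_next(assembled, hard_precedence, soft_precedence):
    candidates = [
        c for c in hard_precedence
        if c not in assembled and hard_precedence[c] <= assembled
    ]
    # Fewer unsatisfied soft precedences -> suggested earlier in the list.
    candidates.sort(key=lambda c: len(soft_precedence.get(c, set()) - assembled))
    return candidates
```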
An AGS, in its version 1 (v1), is developed as a touch screen
interface that shows the exploded locomotive assembly and allows an
operator to pick, two by two, components that have a liaison, i.e. a value
of one between them in the liaison matrix (see Figs. 7 and 8). The
resulting interface is presented in Fig. 9. The soft constraints are
imposed visually, in the form of a green/gray map (see Fig. 10) that shows
components that can be picked up. The exploded view blinks with the
green/gray map to show that components are selectable. Once a
component is selected, it becomes blue. Two selected components are
automatically confirmed as assembled in real time. An "undo" button
appears to cancel the latest operation. See Fig. 9 for more details.
The AGS records all the assembly operations in a .mat file containing
the assembly transitions or actions (from previous component to next
component) specified over the liaison matrix as increasing values from 1
to N. This file is easily importable in Matlab®, where all the results are
collected and analyzed. At any time, the .mat file contains all the in-
formation from the previous successful assemblies and it is reloaded and
updated by the AGS for each new successful assembly. Thus, each as-
sembly is guided by the partial results obtained by all the previous as-
semblies plus the values for the current one. This enables a
reinforcement learning approach to detect and enforce soft constraints.
It becomes the objective of the following experiments to record through
the AGS and statistically analyze in Matlab® the assembly behavior of
the operators.
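An analogous sketch in Python (the authors work in Matlab®, so this is only an assumed equivalent): each liaison used during an assembly is stamped with its step index, and the accumulated statistics are persisted between assemblies so that every new execution is guided by all the previous ones. The file name and variable names are hypothetical.

```python
# Assumed Python analogue of the .mat-based recording described above.
import os
import numpy as np
from scipy.io import loadmat, savemat

RECORD_FILE = "assembly_record.mat"          # hypothetical file name

def load_record(n_components):
    """Load the accumulated record, or start an empty one for n_components parts."""
    if os.path.exists(RECORD_FILE):
        data = loadmat(RECORD_FILE)
        return {"step_sum": data["step_sum"], "count": data["count"]}
    zeros = np.zeros((n_components + 1, n_components + 1))
    return {"step_sum": zeros.copy(), "count": zeros.copy()}

def record_assembly(record, ordered_liaisons):
    """Stamp each liaison (i, j) used in this assembly with its step index 1..N."""
    for step, (i, j) in enumerate(ordered_liaisons, start=1):
        record["step_sum"][i, j] += step
        record["step_sum"][j, i] += step
        record["count"][i, j] += 1
        record["count"][j, i] += 1
    savemat(RECORD_FILE, record)             # reloaded by the AGS at the next assembly
    return record
```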
Fig. 7. Component IDs for the locomotive assembly.
4.5. Hypotheses generation and setup of the third experiment
Given the results of the second experiment and the development of
the AGS v1, hypotheses 3, 4 and 5 (H3, H4 and H5) are made:
Hypothesis 3. The AGS provides the needed assembly guidance to
complete the assembly when the assembly instructions are missing.
Hypothesis 4. The AGS solves the parallel assembly issue 1 by forcing
the assembler to plan and execute operations serially.
Hypothesis 5. The AGS solves issue 4 by framing an operator's
intention before they can execute it.
The AGS v1 is deployed (see Fig. 9) for tests over another round of
assemblies. At this stage, the hypotheses to be tested are H3, H4 and H5,
all of them about the AGS. Thus, the setup is the same as in the second
experiment, but the researcher does not interact with the operators for
reasons other than explaining how to use the AGS itself or when fatal
assembly operations occur.
4.6. Results of the third experiment
As said before, several assemblies are executed without providing the
written instructions to the operators, who receive only the exploded view and
screws/nuts tables. The results of this experiment verify H3, because all
the operators in 20 trials understood how to successfully complete the
assembly, though in a few executions they still required a little external
help to spot the existence of some constraints. The use of an AGS also
conrms the validity of H4, because no parallel assembly attempts arose
when the operators had to stick to selection and execution steps with the
AGS. It becomes rather difcult to understand if H5 can be conrmed or
not, as the researchers observations captured another related issue
preventing the demonstration of H5. Even if the AGS can elicit the
intention of the operators, not all the operators are comfortable with the
idea of selecting components to be assembled in two by two selections.
In almost all cases, the explanation of the researcher that the system only
takes two components at a time was not accepted or accepted with
unfavorable comments. The problem found is formulated as:
Issue 5. Subassemblies. Every operator expresses their assembly
intention in the form of new subassemblies made of several components,
rather than adding one new component to the current assembly.
4.7. Setup of the fourth experiment
The results from the third experiment lead to issue 5, which can be
addressed by adjusting the AGS structure in a way that reflects multi-
component subassembly selections. Thus, a new AGS version 2 (v2) is
developed.
Fig. 8. Liaison matrix for the metal locomotive assembly.
In this second version of the AGS, some soft constraints are
introduced. Initially based on the ASP presented in the paper assembly
instructions, the constraints are imposed visually in the form of a colored
map (see Fig. 10 for the colormap and Fig. 11 for the AGS v2 interface)
and updated after the execution of each assembly. This map shows
components that are to be picked up later in time as more yellow than
components that are to be picked up sooner in time, which are greener.
The component colors are updated by recording the most frequent order
of the selected components. Thus, they encode an ASP sequence in a less
explicit or "fuzzy" language, using a simple action or transition fre-
quency matrix.
The transition frequency matrix is based on the liaison matrix and it
is expressed component by component with a range of increasing values
from one to N over the liaisons used. Liaisons are initially set to a value
of −1. The matrix is, of course, symmetrical.
While the colored map does not solve a particular issue, it has the aim
of reinforcing the idea that a standard ASP exists before an operator
analyzes the assembly state and makes their decision. It is a method to
elicit knowledge from the human operators while they are using the
AGS.
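A sketch of how such a colormap could be derived from the recorded statistics (using the hypothetical record structure sketched earlier): components with a low average step index are rendered greener, components assembled later are rendered more yellow. The blending rule is an assumption, not the exact one used by the AGS.

```python
# Assumed green-to-yellow blending from the recorded step statistics.
import numpy as np

def component_colors(record):
    """Return one RGB tuple per component: green = assembled early, yellow = late."""
    counts = np.maximum(record["count"].sum(axis=1), 1)      # avoid division by zero
    avg_step = record["step_sum"].sum(axis=1) / counts       # mean step per component
    lo, hi = avg_step.min(), avg_step.max()
    t = (avg_step - lo) / (hi - lo + 1e-9)                   # 0 = earliest, 1 = latest
    return [(float(ti), 1.0, 0.0) for ti in t]               # blend green -> yellow
```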
On the AGS interface, at first, the entire colored map blinks. After the
selection of a first component, which becomes blue, all the liaison-related
components blink with the color map, while the rest of the non-liaison-
related components are gray and unselectable. On the bottom-left
corner, an assembled locomotive rotates to give a preview of the final
objective of the assembly process. On the top-left corner, a button allows
undoing the last assembly operation. On the top-right corner, the selected
subassembly is shown.
Fig. 9. Assembly guidance system v1 with digital twin of the
locomotive assembly. (a) The green/gray map allows selecting
all the components that have available liaisons. (b) Component
17 "threaded screw" is selected. (c) Component 8 "front cover"
is selected and the subassembly {8,17} is automatically
considered done. Component 8 has become gray in the green/
gray map because there are no other liaisons left for it, i.e. it
cannot be further assembled, while component 17 still has one
liaison left with component 9 "boiler". On the top-left corner, a
button allows undoing the last assembly step. On the bottom-
right corner, the assembled locomotive is shown. (For inter-
pretation of the references to colour in this figure legend, the
reader is referred to the web version of this article).
The button "assemble components" confirms the
subassembly operation done at the real station. When this is pressed, the
selected assembly is moved to the bottom-right corner, where all the
assembled components are shown as a preview of the current assembly
state. A white ribbon on the bottom of the AGS interface describes the
current state and the possible operations. See Fig. 11 for more details.
The aim of the subsequent experiment is to test that the AGS v2 does
not generate issues that are similar to issue 5. The experimental setup is
kept as before, with the sole change of the guidance system deployed in
its version 2 and the issue 5 test is formulated as a new hypothesis 6
(H6):
Hypothesis 6. If every operator can express their assembly intention
in the form of new subassemblies made of several components, rather than
adding one new component to the current assembly, issue 5 disappears.
4.8. Results of the fourth experiment
This round of assemblies verifies H6 because it quantitatively shows
that operators can think in a quite wide range of subassemblies (see
Table 1) and subsequent assembly sequences. It also shows that when
the optimization criteria are not made explicit, people tend to follow any
of their own ideas, which can be formulated as:
Issue 6. Optimization criteria. Operators with soft constraints and
no instructions tend to favor their own personal optimization criteria,
which are not explicit.
This experiment confirms that, although H6 stands, H5 is false. H5 was
hard to prove or falsify with the previous experimental data, but this
time it is falsified by the fact that there are no limitations to the in-
tentions that an operator could have. Thus, the AGS will always be
limited by the programmer's understanding of the operators' way of
thinking.
4.9. Hypotheses generation and setup of the fifth experiment
All the previous experiments have created a basis of hypotheses and
issues that leads to a final experiment. This experimental setup makes
again use of the successful AGS v2. The researcher illustrates its use by
explicitly asking the operators to pick subassemblies that would be
stable after the assembly operation (see Table 1), i.e. when the com-
ponents will hold together by gravity, friction or anything other than the
operators' ability to keep them together. This setup aims at verifying the
following:
Hypothesis 7. If optimization criteria are given, the choice of sub-
assemblies converges towards specic choices.
In particular, this experiment should quantitatively verify hypothesis
7 (H7) in terms of stability of the chosen subassemblies. It should also
validate all the previously proposed solutions to tackle the issues found.
For this purpose, a whole class of circa 150 students is dedicated to one
final large experiment, able to generate a statistically relevant number of
assembly executions with the proposed AGS in its final version v2 and
stability criteria.
4.10. Results of the fifth experiment
An entire class of students corresponds to 47 assembly groups and
relative locomotive assemblies. For each assembly, the selected sub-
assemblies and subassembly sequences are shown in Fig. 12. If these
results are compared with those of the previous experiment, see Table 1,
the choice of any subassemblies this time is limited to stable ones, thus
numerically conrming H7.
The statistical results offer insights into the most common
subassemblies in both the fourth (limited to the stable ones) and fifth
experiments, namely: {8,9,17}, {7,12}, {7,9,18}, {1,3,13}, {2,5,15},
{1,4,11,14}, {2,6,11,16}, {9,11,20,21} and {9,10,11,19}. The obser-
vation of such common subassemblies in different order from Fig. 12
suggests that statistics of which subassembly is chosen at which step can also be
extracted from the data and highlighted. The operation leads to Fig. 13,
a statistical assembly step graph, which is also a novel contribution to
the literature introduced by this article. This graph encodes the most sig-
nificant subassembly/step choices to complete the assembly and it
represents a hybrid form between a general AND/OR graph and a fully
defined assembly sequence.
The order of selection of all the subassemblies is analyzed in Matlab®
and it generates two relevant results shown in Figs. 14 and 15. The first
one is a statistically reinforced subassembly transition matrix (Fig. 14)
that is composed of all the values corresponding to a transition from one
subassembly (column) to another (row). The diagonal shows the total
count of each subassembly and it is in accordance with the gray boxes
displaying the same information in Fig. 13. In orange and red are the matrix
values that are respectively above two-thirds and one-third of the diag-
onal numbers for the row, used as conventional thresholds to highlight
the information contained. A matrix similar to this, but listing each
component instead of the subassemblies is the one used to generate the
colormap of Fig. 10. This suggests how the same operation could be done
by visually letting the operator choose an entire sequence of sub-
assemblies. The second result is a statistically reinforced subassembly
precedence matrix (Fig. 15) that is generated by collecting all the values
corresponding to a transition from any subassembly (column) to another
(row), at any step. In other words, the past use of a subassembly is the
value displayed by the column for each selected subassembly (row). If
the column value is zero it means that the corresponding subassembly
has never been assembled before the one indicated by the row. If the
same conventional threshold as before is set for this matrix, considering
two thirds of the diagonal number as a qualifying value, the cells above
it are colored in red. They constitute precedence constraints that can be
enforced to obtain an optimal ASP from the collective operators'
knowledge elicited by the AGS with a statistical reinforcement learning
process.
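The following sketch shows one way the two matrices and the red/orange thresholds could be computed from the recorded subassembly sequences; the data layout is an assumption, while the threshold convention (a fraction of the row's diagonal value) is the one stated above.

```python
# Assumed reconstruction of the statistically reinforced matrices of Figs. 14-15.
import numpy as np

def build_matrices(sequences, n_subassemblies):
    """sequences: list of assemblies, each an ordered list of subassembly indices."""
    transition = np.zeros((n_subassemblies, n_subassemblies), dtype=int)
    precedence = np.zeros((n_subassemblies, n_subassemblies), dtype=int)
    for seq in sequences:
        for k, row in enumerate(seq):
            transition[row, row] += 1             # diagonal = total usage count
            precedence[row, row] += 1
            if k > 0:
                transition[row, seq[k - 1]] += 1  # column picked right before row
            for col in seq[:k]:
                precedence[row, col] += 1         # column picked at any earlier step
    return transition, precedence

def constraint_mask(matrix, fraction=2 / 3):
    """Cells above `fraction` of the row's diagonal value (e.g. the red cells)."""
    diag = np.diag(matrix).astype(float)
    mask = matrix > fraction * diag[:, None]
    np.fill_diagonal(mask, False)                 # the diagonal itself is not a constraint
    return mask
```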
It is important to outline that the precedence/transition matrices in
Figs. 14 and 15 did not numerically drive the assembly process in this
research; however, they can be used to generate further colormaps that
implicitly leave the choice to the operators, i.e. by showing colors
instead of numbers. A similar operation was done on the statistical
transitions applied over the liaison matrix to generate the AGS colormap
shown in this article. Alternatively, in the application of online RL al-
gorithms, these thresholds can be tuned as hyperparameters for the
RL algorithms to make an informed choice instead of leaving it up to the
operators.
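A hedged sketch of that alternative: the threshold and an exploration rate become hyperparameters, and the algorithm either exploits the statistically reinforced transition or defers the choice to the operator (exploration). Parameter names and the decision rule are assumptions, not a tested design.

```python
# Assumed exploit-vs-explore step using the statistically reinforced transition matrix.
import numpy as np
import random

def next_subassembly(current, transition, threshold=2 / 3, explore_rate=0.2, rng=random):
    """Suggest the next subassembly index, or None to leave the choice to the operator."""
    diag = np.diag(transition)
    candidates = [nxt for nxt in range(transition.shape[0])
                  if nxt != current and diag[nxt] > 0
                  and transition[nxt, current] >= threshold * diag[nxt]]
    if not candidates or rng.random() < explore_rate:
        return None                                   # exploration: operator decides
    return max(candidates, key=lambda nxt: transition[nxt, current])  # exploitation
```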
Fig. 10. Colored map for the locomotive assembly. Green components should
be assembled earlier than yellow components. (For interpretation of the ref-
erences to colour in this figure legend, the reader is referred to the web version
of this article).
Fig. 11. Assembly guidance system v2 with digital twin of the locomotive
assembly. On the bottom-left corner, an assembled locomotive rotates to
give a preview of the final objective. On the top-left corner, a button al-
lows undoing the last assembly operation. On the top-right corner, the
selected subassembly is shown. On the bottom-right corner, all the
assembled components are shown as a preview of the current assembly
state. The button "assemble components" confirms the subassembly
operation done at the real station. (a) The colored map allows selecting all
the components. (b) Component 17 "threaded screw" is selected. (c)
Subassembly {8,9,17} is selected. (d) Subassembly {8,9,17} is assembled
and components 8 and 17 have become gray in the colored map because
they cannot be further assembled.
5. General discussion of results
Advancing from the first to the fifth agile experiments, the given
paper instructions, illustrating a predefined fixed ASP, have been
replaced by an adaptive AGS that learns an optimal ASP from the op-
erators. By comparing the ASP before and after, it can be seen from
Table 2 that the instruction order is slightly changed. This is due to the
understanding level of the operators and the proper codification of their
true intentions provided by the AGS interface. For instance, separate
instructions such as "Push the shoulders into the frame", relative to
subassembly {1,2,11}, and "Fit the remaining wheels", relative to sub-
assemblies {1,4,14} and {2,6,16}, are replaced by a unique operation
described by subassemblies {1,4,11,14} and {2,6,11,16}. This is because
fitting the axle into the frame comes more naturally for the operator when
the operation is directly completed with the addition of the remaining
wheel to it. In both cases, namely with paper instructions or AGS, the
assembly is successful. Thus, the software approach objectively allows
structuring and documenting the ASP process, together with the mental
process of the operators, without interfering with them.
Among the many issues encountered, all were solved either imme-
diately or by a following experiment. In particular, issues 1–4 are
tackled at the beginning, providing the fundamental choices leading to
the AGS, and issues 5 and 6 allow improving the AGS from its version 1
to version 2. The working hypotheses generated by this process are all
qualitatively or quantitatively verified or falsified and overall show that
controlling the experimental design and its variables in such a highly
complex assembly task is not only possible, but also fruitful. The sta-
tistical results collected at the end of the fifth experiment provide a
statistical basis to apply RL algorithms at assembly time, basing the
optimization function not on preexisting criteria but on the informed
decisions of knowledgeable operators in the era of Industry 4.0. The
statistical step frequency graph (see Fig. 13) and the statistical hard and
soft precedence constraints (see Figs. 14 and 15) generated by this work
are in line with the outcomes of previous techniques, with the sole dif-
ference that they are dynamically generated at assembly execution time,
as online RL methods would require.
Table 1
Frequency and stability of subassemblies selected with the assembly guidance
system v2 in the fourth experiment (without the stability criteria) and in the fifth
experiment (with the stability criteria).
Subassembly   Freq. 4th exp.   Freq. 5th exp.   Stability
{8,9,17} 15 47 Stable
{7,12} 15 47 Stable
{7,9,18} 14 47 Stable
{1,3,13} 13 35 Stable
{2,5,15} 12 35 Stable
{1,4,11,14} 9 33 Stable
{2,6,11,16} 8 33 Stable
{10,19} 8 0 Unstable
{9,11,20,21} 8 47 Stable
{9,10,11,19} 7 47 Stable
{11,19,20} 6 0 Unstable
{2,6,16} 6 3 Stable
{2,11} 5 0 Unstable
{9,19} 5 0 Unstable
{20,21} 5 0 Unstable
{1,11} 4 0 Unstable
{1,4,14} 4 4 Stable
{9,19,20} 3 0 Unstable
{5,15} 2 0 Unstable
{9,20} 2 0 Unstable
{9,20,21} 2 0 Unstable
{11,19} 2 0 Unstable
{1,2,11} 1 4 Stable
{1,3,11,13} 0 2 Stable
{1,3,4,11,13,14} 1 8 Stable
{1,3,4,13,14} 0 2 Stable
{2,5,11,15} 1 1 Stable
{2,5,6,11,15,16} 0 9 Stable
{2,5,6,15,16} 0 2 Stable
Fig. 12. Subassembly steps for each group.
6. Conclusions and future work
This research shows that ASP can be done with statistical RL tech-
niques with a step optimization approach at assembly time. The ASP
optimization policies are determined and driven by the competence of
the Industry 4.0 skilled operators. Computers, in particular AGS, have to
be the interface between the digital world that has computational power
to support informed decisions, and the humans who operate in the real
world. This is a new approach for an expert operator that can fully
interact with the ASP algorithm, or rather be part of it, and drive the ASP
optimization function, based on their personal experience on the
assembly lines and of an inevitably complex world. The advantage is to
be able to tackle any kind of unknown problems before they can arise
and add a great deal of adaptiveness and resilience to the industrial
operations.
This research sheds light upon how an important peculiarity of RL
algorithms is not exploited in industrial processes. This is the exploita-
tion vs exploration pattern, ironically human-inspired, corresponding
to asking operators to either stick to their instructions or take initiative.
This research has partly proven that, especially in the context of Industry
4.0, giving options to knowledgeable operators is the way forward to
truly take advantage of the online capabilities of RL algorithms in in-
dustrial applications, in particular with ASP. Moreover, the detailed
approach allows coexistence with more elaborate schemes in which the
assembly processes are defined through ontological work [43] or for
semi-automated operations requiring high adaptability and
self-configuration [44]. This is an added advantage as the production
designers can create assembly systems that are gradually automated
from manual to fully automated.
The introduction of Marton's variational approach to the ASP,
together with the developed AGS, allows adding another couple of
components of Bloom's Taxonomy [45] to the learning experience.
Fig. 13. Subassembly step frequencies. The total can be read both as the sum of
the step frequencies and the total frequency of each subassembly. On the side
column, in gray, the subassemblies with the greatest frequencies, and in green,
the relative step counts, with darker green for higher frequencies. (For inter-
pretation of the references to colour in this gure legend, the reader is referred
to the web version of this article).
Fig. 14. Statistically reinforced subassembly transition matrix. Total times that
each column subassembly is picked right before the row subassembly. The di-
agonal shows the overall subassembly usage. In orange (middle value) and red
(high value), the hard constraints. (For interpretation of the references to colour
in this figure legend, the reader is referred to the web version of this article).
Fig. 15. Statistically reinforced subassembly precedence matrix. Total times
that each column subassembly is picked at any step before the row subassem-
bly. The diagonal shows the overall subassembly usage. In red, the hard con-
straints. (For interpretation of the references to colour in this figure legend, the
reader is referred to the web version of this article).
Table 2
Comparison of assembly instructions between the initial paper instruction ASP
from the course instructor and the AGS-reinforced ASP recorded in the fifth
experiment.
Instructions   Relative subassembly   Order (Instr.)   Order (AGS)
Assemble the boiler and front cover   {8,9,17}   1   1
Mount the boiler on the frame   {9,10,11,19} and {9,11,20,21}   2   4 and 5
Insert the roof into the cabin   {7,12}   3   2 or 3
Mount the cabin to the boiler   {7,9,18}   4   2 or 3
Mount a wheel on each axle   {1,3,13} and {2,5,15}   5   6
Push the shoulders into the frame   {1,2,11}   6   Almost never
Fit the remaining wheels   {1,4,14} and {2,6,16}   7   Almost never
Push the shoulders into the frame and fit the remaining wheels   {1,4,11,14} and {2,6,11,16}   No (already done in 6 and 7)   7
Namely, a trial-and-error or mechanical operation (depending on the
operator's knowledge) is translated into quality time for learning when the
initial task of simply applying the operations described by the assembly
instructions is replaced by the need for the operators to analyze the as-
sembly state, report it to the AGS and use it to evaluate the best strategy
to operate. This is a change that highlights both the pedagogical success for
such an application in the context of the manufacturing course that
provides the use case described in this article, and the improved
manufacturing outcome that is foreseen in real industrial environments,
where operators can learn the assembly operations quickly and efficiently
while working on a fully functional production line.
The use of statistical RL methods provides insights on the type of
knowledge that artificial intelligence systems can accumulate when
interacting with human operators. One is the step frequency diagram,
which shows the best statistical assembly sequence that emerges from
the wisdom of a crowd of operators. Another is the precedence matrix
that can be constructed by applying a conventional threshold to the
statistical values obtained by counting the transitions among the ele-
ments of the liaison matrix. All these mathematical tools can be used as
building blocks for RL algorithms that are meant to drive an online ASP.
Future work should primarily focus on testing the AGS developed
with this research on real assembly lines in industry. Firstly, because the
results obtained in the experimental setup of this article need to be
validated on several different assemblies and, secondly, because of the
possible limitations coming from the didactical assembly that has been
chosen for this research, which does not allow optimizing other industrial
criteria, e.g. assembly execution time. A second line of research could be
directed to understanding the reasons behind an operator's choice of
assembly sequence. While RL is generally proven to converge towards
optimal solutions, it does not explain why a solution in ASP might be
optimal. This is against the current line of thought in ASP, but it is in line
with the direction taken by deep reinforcement learning, where unsu-
pervised algorithms solve problems without the necessary human
understandability.
Another focus of future work is to integrate the AGS into the oper-
ators' equipment, with technology such as head mounted devices for
augmented reality or any other devices that are operated by multimodal
input such as sensors and cameras, as well as the operator's speech,
gaze, hand gestures or movements in the assembly station.
While online RL algorithms are good optimization tools, they are
meant to be used for learning from small data, instead of big data. The
latter approach is possible when the source of variation provides plenty
of data. Such is the case for the offline approach to RL presented in the
literature review of this article. An assembly of customized products
might produce only a limited variation of data, therefore the statistical
approach with online RL algorithms seems coherent with the industrial
requirements but it needs to be verified for its consistency on small as-
sembly data. Can a few attempts at a correct assembly elicit the majority
of issues? This is indeed a potential limitation and an important
perspective to be addressed in future work.
As this article is meant to pave the way to the application of online
statistical RL algorithms to ASP, a major limitation consists in not
applying and testing any specific and well-known RL algorithms to on-
line ASP for benchmarking purposes. This has to be done once the
validity of such an approach is assessed, which remains the main aim of
the research presented in this article and its future work. Since the
statistical convergence aspect of online RL has been assessed as prom-
ising for ASP, it is indeed a needed future work to test standard RL al-
gorithms and list their pros and cons towards a full use of online RL ASP
methods. In particular, strategies about how to give rewards to correct
ASP or how and when to apply the exploitation vs exploration strategy,
for example if the operators knowledge can be evaluated before as-
sembly and a threshold can be based on the result of this assessment.
Declaration of competing interest
The authors report no declarations of interest.
Acknowledgements
The authors wish to thank Mats Bejhem, lecturer for the manufacturing course Tillverkningsteknik (MG1026) at KTH Royal Institute of Technology, presented as a case study in this paper, for his availability and excellent contributions to the success of this research. A special mention goes to Jan Stamer and Mikael Johansson for their help in preparing the experimental setups. This research is funded by KTH Royal Institute of Technology in Stockholm, Sweden.