Procedia CIRP 93 (2020) 437–442
2212-8271 © 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 53rd CIRP Conference on Manufacturing Systems
10.1016/j.procir.2020.03.056
53rd CIRP Conference on Manufacturing Systems
Distributed Cooperative Deep Transfer Learning for Industrial Image Recognition
Benjamin Maschler*, Simon Kamm, Nasser Jazdi, Michael Weyrich
Institute of Industrial Automation and Software Engineering, University of Stuttgart, Pfaffenwaldring 47, 70569 Stuttgart, Germany
* Corresponding author. Tel.: +49 711 685 67295; Fax: +49 711 685 67302. E-mail address: benjamin.maschler@ias.uni-stuttgart.de
Abstract
In this paper, a novel light-weight incremental class learning algorithm for live image recognition is presented. It features a dual memory
architecture and is capable of learning formerly unknown classes as well as conducting its learning across multiple instances at multiple locations
without storing any images. In addition to tests on the ImageNet dataset, a prototype based upon a Raspberry Pi and a webcam is used for further
evaluation: The proposed algorithm successfully allows for the performant execution of image classification tasks while learning new classes at
several sites simultaneously, thereby enabling its application to various industry use cases, e.g. predictive maintenance or self-optimization.
Keywords: Artificial Intelligence; Artificial Neural Networks; Continual Learning; Deep Learning; Distributed Learning; Dual Memory Method; Incremental
Class Learning; Live Image Recognition; Transfer Learning
1. Introduction
The current trends in the manufacturing industries towards
smaller, more customer-specific lots produced by
interconnected, highly reconfigurable manufacturing systems
just in time lead to a significant increase in these systems’
complexity [1, 2, 3]. However, with a widespread availability of high-quality, high-resolution data [4], handling this complexity by artificial intelligence (AI) methods becomes feasible [3]. Yet, many of the approaches suggested in the last decade still require a great deal of often manual adaptation to the specific system's environment [5, 6, 7].
Deep learning based approaches offer a solution to this problem of automatic generalization [7, 8, 9], but require large amounts of training data [10, 11], which might not be available due to privacy or industrial espionage concerns or for technical or legal reasons, as well as great computing power to conduct this training [3, 12]. These obstacles might be overcome by light-weight transfer learning approaches optimized for edge devices, which enable the performant execution of deep learning algorithms on dispersed datasets during runtime without so-called catastrophic forgetting [13, 14]. Such approaches would then facilitate decentralized AI solutions on the industrial shop floor for tasks which currently require machine learning on a central data lake, e.g. predictive quality control [15], automatic control [5] or predictive maintenance [6]. However, as research in this field is still at the very beginning, the use case considered here is of a more abstract, basic form, i.e. an image recognition task based on the ImageNet dataset or a live camera feed.
Objectives: In this paper, the challenge of cooperative distributed machine learning for industrial automation is derived (see Sec. 2.1) and literature regarding its solution in the field of image recognition is surveyed (see Sec. 2.2). Based thereon, a suitable approach based upon complementary learning systems (CLS) is selected and a specific implementation with a focus on low computing and storage requirements is created (see Sec. 3). This implementation is then evaluated on a demanding standardized dataset and compared with published results (see Sec. 4). Finally, a prototype system is presented (see Sec. 5) before a conclusion regarding the applicability of the approach in industrial use cases is drawn (see Sec. 6).
2. Related work
2.1 Transfer learning’s challenges
‘Natural’ intelligence in humans and animals allows them to
transfer knowledge from known problems and their solution
towards unknown ones or to adapt previously learned lessons
based upon new information. This is commonly called
‘continual’ learning [13].
In deep learning based AI, such transfer or adaptation is not easily achieved, as new information tends to simply overwrite old information without gaining a considerable advantage compared to learning with the new information from scratch. This displacement process is called catastrophic forgetting [16] and poses a significant obstacle for the construction of machine learning algorithms that are:
• multi-purpose, meaning the capability to successfully solve several different tasks learned sequentially,
• multi-location, meaning the capability to successfully train and infer at different locations in parallel, or even just
• highly adaptive, meaning the capability to successfully adapt to changing inputs without major re-training.
Therefore, as in natural intelligence, the so-called ‘stability-
plasticity dilemma’ needs to be overcome. It refers to the
opposing aims of having an algorithm stable enough to keep
connections once learned and flexible enough to learn new
connections once encountered. Accordingly, this form of
machine learning is also termed ‘continual’ or sometimes
‘transfer’ learning.
There exists a large variety of approaches to continual learning, surveyed e.g. in [13] or [14]. However, most of those still dwell in the area of basic research, many allowing for only one or two of the three dimensions of transfer mentioned above.
2.2 Dual-memory method for image recognition
A promising approach to mitigate the stability-plasticity dilemma while allowing for all three dimensions of transfer is the dual-memory method [13, 17, 18]. It consists of two separate AI modules: a slow-learning module inspired by the neocortex of the mammalian brain and a fast-learning module inspired by its hippocampus. The slow-learning module (module A) is used to extract general information from the input data and to generalize this information while training. The fast-learning module (module B) remembers specific information of the input data and saves new memories. Fig. 1 depicts a possible setup for image recognition.
Extraction of general information from input data, i.e.
feature extraction, is nowadays commonly carried out by deep
neural networks (DNN). Depending on the application,
different DNNs can be used for this purpose. Regarding the
scenario presented in this paper, the following requirements
need to be met: On the one hand, the algorithm for module A
should extract good and relevant features from the input data.
On the other hand, the DNNs should also run in real-time
applications on (mobile) edge devices, e.g. on a Raspberry Pi.
Therefore, the memory consumption and computational
complexity of the DNN is a relevant criterion. Different feature
extraction algorithms based upon a DNN architecture are
compared based on their memory consumption, the number of
Giga-FLOPs and their classification performance on the
ImageNet-dataset in Table 1. Although those algorithms are
usually full classification algorithms, only their feature
extraction components are used further on. Therefore, they are
referred to as ‘feature extraction algorithms’.
Because of the focus on memory and computing power
restricted devices, MobileNet-V2 is chosen for module A: It was
designed with a focus on low memory consumption and
operations per input data and still delivers acceptable
classification – i.e. feature extraction – results [23].
However, while module A is just one of many possible DNN-based feature extractors, the requirements for module B are more demanding. In order to allow transfer learning as well as to solve the use case described above, an appropriate algorithm must
• be trainable on a data stream, where samples of the different classes appear randomly in time and order,
• be able to classify already seen and trained classes at all times,
• show a limited growth of computational complexity and memory consumption as the number of known classes grows and
• not need to store any training data.
While iCaRL [24] and FuzzyARTMAP [25, 26] fulfil the
first three criteria, only the latter fulfils all four. Therefore,
FuzzyARTMAP was chosen as a foundation for module B. Its
Table 1: Comparison of feature extraction algorithms

DNN Architecture | No. of Parameters (×10⁶) | Memory Consumption | No. of FLOPs (×10⁹) | Top-1 Classification Error | Top-5 Classification Error
AlexNet [19] | 60 | – | 0.7 | 36.7 % | 15.3 %
VGG-16 [20] | 138 | – | 16 | 25.6 % | 8.1 %
VGG-19 [20] | 144 | – | 20 | 25.5 % | 8.0 %
ResNet-50 [21] | 25.6 | – | 4 | 20.7 % | 5.3 %
ResNet-101 [22] | 44.5 | – | 8 | 19.9 % | 4.6 %
Inception-V3 [22] | 24 | 96 MB | 4.8 | 21.6 % | 5.6 %
MobileNet-V2 [23] | 3.5 | 14 MB | 0.3 | 28 % | 9 %
Figure 1: Dual Memory Method for Image Recognition (sensory input → module A: feature extraction → features → module B: classification → classification results)
architecture allows it to solve the stability-plasticity dilemma
[25] by adding new knowledge without changing already
trained information. The classification is then based on a
comparison between the input data and the representations of
known classes.
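The match and update at the heart of fuzzy ART can be sketched as follows. This is a simplified, single-step illustration using the conventional fuzzy ART formulation (element-wise minimum as fuzzy AND, L1 norms, vigilance rho, learning rate beta); the symbol and function names are illustrative assumptions, not code from the paper:

```python
import numpy as np

def fuzzy_match(x, w):
    """Fuzzy ART match value: |x AND w| / |x|, where AND is the
    element-wise minimum and |.| the L1 norm. 1.0 means a perfect match."""
    return np.minimum(x, w).sum() / (x.sum() + 1e-12)

def resonates(x, w, rho):
    """Vigilance test: the input resonates with a stored representation
    if the match value reaches the vigilance threshold rho."""
    return fuzzy_match(x, w) >= rho

def update(w, x, beta):
    """Learning rule: move the representation towards (x AND w) with
    learning rate beta; beta = 1 corresponds to fast learning."""
    return beta * np.minimum(x, w) + (1.0 - beta) * w

x = np.array([0.8, 0.2, 0.5])   # input feature vector
w = np.array([0.9, 0.1, 0.5])   # stored representation
print(fuzzy_match(x, w))        # (0.8 + 0.1 + 0.5) / 1.5 ≈ 0.933
```

Because a new representation is created whenever no stored one resonates, already trained representations are never overwritten, which is what sidesteps catastrophic forgetting.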
3. Methodology
As outlined in Sec. 2.2, we propose an architecture combining two different algorithms:
Module A uses the pre-trained MobileNet-V2 algorithm from Tensorflow 2.0 with a fixed learning rate of 0, i.e. it is not trained any further. However, the last fully connected layers of MobileNet-V2 are removed in order to just extract features, which are then relayed to module B.
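A minimal sketch of such a truncated feature extractor, using the Keras application API that ships with TensorFlow 2. Note that weights=None is used here only to keep the sketch self-contained; the paper's setup would load the pre-trained ImageNet weights:

```python
import numpy as np
import tensorflow as tf

# MobileNet-V2 without its fully connected classification head:
# include_top=False drops the final layers, pooling='avg' collapses the
# last feature map into one feature vector per image.
feature_extractor = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),
    include_top=False,
    weights=None,        # the paper's setup would use weights='imagenet'
    pooling="avg",
)
feature_extractor.trainable = False  # learning rate 0: module A stays frozen

images = np.random.rand(2, 96, 96, 3).astype("float32")
features = feature_extractor.predict(images, verbose=0)
print(features.shape)  # one 1280-dimensional feature vector per image
```

Freezing the extractor is what keeps the on-device computation cheap: only the comparatively tiny module B is ever updated at runtime.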
Module B is based on the FuzzyARTMAP algorithm, which
is complemented by an updater that allows it to recognize
completely new classes, too. Thereby, module B serves as an
incremental, fast learning classifier.
The resulting classification process by this distributed
incremental class learning algorithm (DICLA) based upon [27]
is depicted in Fig. 2:
1. DICLA accepts input in form of images. The pre-trained
feature extraction algorithm from module A reduces the
image to a feature vector.
2. This feature vector is then relayed to module B where it is
compared against a set of stored feature vectors called
representations, which are mapped to the set of known
classes: Each stored representation represents exactly one
class while each class can be represented by one to many
representations.
3. The updater now evaluates the result of the comparison
process:
A. If the similarity between the input feature vector and
one of the representations passes a threshold value ρ
then the input picture is classified by module B as
belonging to the class associated with that
representation.
B. However, if this threshold is not passed, then the input image is considered to either belong to a previously unknown class or to add new information to a previously known class.
i. In the former case, a new class is created and the input feature vector is used as a representation of this class.
ii. In the latter case, the input feature vector is used as a new representation of an already existing class.
Thereby, both cases lead to an expansion of the algorithm's knowledge base during runtime. In both cases, module B outputs the class associated with the newly stored representation.
The number of representations associated with a class can be
reduced by consolidating those into a single one. This is done
by a separate process not depicted in Fig. 2.
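The classification steps and the consolidation above can be sketched as follows. This is a deliberately simplified stand-in: it uses cosine similarity as the comparison, whereas the paper's module B builds on FuzzyARTMAP, and all names are illustrative assumptions:

```python
import numpy as np

class ModuleB:
    """Simplified sketch of DICLA's fast-learning classifier (module B).

    Stores feature vectors ("representations"), each mapped to exactly one
    class; a class may own several representations."""

    def __init__(self, rho=0.5):
        self.rho = rho      # similarity threshold
        self.reps = []      # stored representations
        self.labels = []    # class label per representation

    def _similarity(self, x):
        if not self.reps:
            return np.array([])
        R = np.stack(self.reps)
        return (R @ x) / (np.linalg.norm(R, axis=1) * np.linalg.norm(x) + 1e-12)

    def classify_and_update(self, x, true_label=None):
        sims = self._similarity(x)
        if sims.size and sims.max() >= self.rho:
            # step 3A: similar enough to a known representation
            return self.labels[int(sims.argmax())]
        # step 3B: store x as a new representation, either of a new class
        # (3B-i) or of a known class (3B-ii); the label would come from the
        # updater or, in the prototype, from user feedback.
        label = true_label if true_label is not None else f"class_{len(set(self.labels))}"
        self.reps.append(x)
        self.labels.append(label)
        return label

    def consolidate(self):
        """Merge all representations of each class into their mean."""
        merged = {}
        for r, l in zip(self.reps, self.labels):
            merged.setdefault(l, []).append(r)
        self.reps = [np.mean(v, axis=0) for v in merged.values()]
        self.labels = list(merged.keys())

# Learn two classes from single examples, then classify a noisy sample:
b = ModuleB(rho=0.5)
b.classify_and_update(np.array([1.0, 0.0, 0.0]), "pencil")
b.classify_and_update(np.array([0.0, 1.0, 0.0]), "screwdriver")
print(b.classify_and_update(np.array([0.9, 0.1, 0.0])))  # pencil
```

Since classification only touches stored vectors, the knowledge base can grow at runtime without retraining, and consolidation bounds its size.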
4. Experiments
4.1 Experimental setup
The following experiments were executed on the ImageNet-
dataset [28]. It features RGB-images of 224x224 pixels
belonging to 1,000 different classes.
Due to time restrictions, for some experiments a subset of
this dataset, the ImageNet-10 dataset was used. It features only
10 randomly drawn classes from the complete ImageNet-
dataset. Additionally, its RGB-images consist of only 96x96
pixels. To allow for a comparison of different tests, the same 10
classes with the class indices of 145, 153, 289, 404, 405, 510,
805, 817, 867 and 950 were used throughout our experiments.
The training of DICLA was executed with one epoch per incremental step, so every training image is seen just once. Based on a hyperparameter optimization, the following parameters were chosen:
• Training images per class: 100 (ImageNet-10) / 10 (ImageNet), randomly drawn
• Test images: 50 (all available images)
• ρ = 0.5 (fixed threshold)
• Learning rate of module B: 0.2
For an evaluation of the proposed algorithm different tests
on ImageNet were performed. As a reference, iCaRL [24] and
Learning without Forgetting (LwF) [29] are used. iCaRL and
Figure 2: Structure of the Distributed Incremental Class Learning Algorithm (DICLA) according to [27]
LwF use a ResNet-Architecture for feature extraction, which
achieves a better classification accuracy than MobileNet-V2
(see Table 1). They perform 100 epochs per incremental step
and use all training images (about 1,300 per class) of which
iCaRL saves about 20,000. Results are obtained from [30].
4.2 Results: Incremental learning performance
The results for the three algorithms on the complete ImageNet dataset with 10 incremental training steps of 100 classes each are shown in Fig. 3: In the beginning, there is a big performance gap between the better performing LwF and iCaRL on the one hand and DICLA on the other hand. However, as the number of classes increases, this difference decreases, with DICLA finally performing about as well as LwF and only slightly worse than iCaRL. Moreover, one has to keep in mind the considerable differences in computational complexity and required storage between iCaRL and DICLA. Furthermore, DICLA's accuracy is much more stable, decreasing by about 29 points as compared to 46 points (iCaRL) and 51 points (LwF).
As the other two algorithms are sensitive to the number of
incremental steps [30], DICLA was tested with different
numbers of incremental steps as well: In addition to the
experiment described above, experiments with 1 and 20 steps
were carried out. The results for 20 incremental steps are shown
in Fig. 3, too. The curve is very similar to the curve with 10
incremental steps. Furthermore, the accuracy achieved with just
1 incremental step is about the same, too. Therefore, the
algorithm does not seem to be sensitive to the number of
incremental steps. The final classification results of LwF,
iCaRL and the different set-ups of the DICLA are given in
Table 2.
With the help of consolidation, the memory consumption and computational complexity of DICLA can be further reduced. Two consolidation strategies and their impact on classification accuracy and memory consumption were evaluated. For these investigations, ImageNet-10 was used. One consolidation strategy consolidates all representations after every incremental step. The other strategy consolidates the
Table 2: Final Classification Accuracy on ImageNet

Algorithm | Final Classification Accuracy ImageNet | Final Memory Consumption ImageNet
DICLA (1 incremental step) | 39.1 % | 527.2 MB
DICLA (10 incremental steps) | 39.3 % | 530.5 MB
DICLA (20 incremental steps) | 39.4 % | 528.8 MB
iCaRL | 44 % | 2123 MB
LwF | 39 % | –
Table 3: Final Classification Accuracy on ImageNet-10 by Consolidation Method

Consolidation Method | Final Classification Accuracy ImageNet-10 | Final Memory Consumption ImageNet-10
No consolidation | 76.40 (± 1.2) % | 0.96 MB
Consolidation after every step | 74.62 (± 2.19) % | 0.10 MB
Consolidation after final step only | 69.32 (± 4.81) % | 0.10 MB
representations only once after all incremental training steps have been carried out, just before calculating the final test accuracy. The mean accuracies with standard deviations over 10 runs can be seen in Table 3, while Fig. 4 shows just the mean accuracies.
Consolidation reduces the memory consumption drastically, by about 90%. With consolidation after every training step, the final classification accuracy (74.62%) stays close to the accuracy without consolidation (76.40%). With consolidation after the final step only, the accuracy is significantly lower (69.32%) without saving any additional memory. Therefore, with a suitable consolidation strategy, the memory consumption can be reduced significantly while only reducing the performance (here: classification accuracy) slightly. This makes the method highly useful for applications with little available memory.
Figure 3: Results for ImageNet, 10 and 20 incremental steps
Figure 4: Impact of Consolidation on ImageNet-10, 10 incremental steps
4.3 Results: Distributed learning performance
To evaluate the performance of the algorithm in distributed scenarios, the ImageNet-10 dataset is used. The number of distributed edge devices is set to 2, 5 and 10, the integer factors of the number of classes. The 10 classes are uniformly and randomly distributed to the edge devices, so that every device has the same number of classes to learn. After learning the local, independent classes, the knowledge in every edge device's module B is exchanged and the final accuracy on all classes is tested. For reference, the incremental class learning on ImageNet-10 with a single edge device (see Table 3) is used. The mean accuracies with standard deviations over 10 runs are given in Table 4.
Table 4: Distributed Learning Performance on ImageNet-10

Number of Edge Devices | Final Classification Accuracy ImageNet-10
1 | 76.40 (± 1.2) %
2 | 76.26 (± 1.5) %
5 | 73.36 (± 2.5) %
10 | 74.86 (± 2.1) %
As Table 4 shows, the performance is almost stable: The standard deviation of the final accuracy increases with a higher number of devices, but the final mean accuracy does not decrease significantly when the training is distributed to more (5 or 10) devices and the knowledge is combined only afterwards. Thus, DICLA is well capable of training on local data and exchanging its knowledge with other devices without sharing the training data.
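Since module B's knowledge is just a set of labeled representation vectors, the exchange step can be sketched as a plain union of those sets. The data layout and function name below are assumptions for illustration, not an interface published in the paper:

```python
import numpy as np

def merge_knowledge(devices):
    """Combine the (representation, label) pairs of several module-B
    instances into one knowledge base; no raw training images are
    exchanged, only the compact feature-vector representations."""
    reps, labels = [], []
    for device in devices:
        reps.extend(device["reps"])
        labels.extend(device["labels"])
    return {"reps": reps, "labels": labels}

# Two edge devices, each having learned one disjoint class locally:
dev1 = {"reps": [np.array([1.0, 0.0])], "labels": ["pencil"]}
dev2 = {"reps": [np.array([0.0, 1.0])], "labels": ["screwdriver"]}

merged = merge_knowledge([dev1, dev2])
print(merged["labels"])  # ['pencil', 'screwdriver']
```

Because representations never overwrite each other, merging after local training is order-independent, which is consistent with the stable accuracies in Table 4.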
The authors therefore believe that DICLA can be
successfully applied to industrial use cases such as predictive
quality control, self-optimization or predictive maintenance,
allowing for a cooperation in learning without sharing one’s
input data.
5. Prototype
In order to demonstrate DICLA’s features in a less abstract
use case and on an actual edge device, a prototype setup was
created (see Fig. 5):
A Raspberry Pi Camera Module v2 periodically captures images of a white placement area and the objects thereon. The camera is connected to a standard-issue, non-overclocked Raspberry Pi 3 Model B+, on which an instance of DICLA analyzes the captured images. The Raspberry Pi is further connected to a screen, on which the current image, the class of this image according to DICLA as well as DICLA's certainty regarding this assessment are displayed. A user can correct DICLA's classification via keyboard and mouse by entering a different, already existing or new class name, causing the algorithm to create new representations or classes, respectively. Furthermore, the user can trigger a consolidation.
The prototype setup proved capable of handling an image stream of more than one 96x96 image per second while correctly learning and recognizing new classes of everyday office and lab equipment like pencils, screwdrivers or pens, without overheating or lagging behind. A distribution of the learning task was possible without a deterioration of performance, as was to be expected based on Section 4.3.
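A minimal sketch of the prototype's interaction loop; DummyDicla, capture_image and ask_user are stand-ins for the DICLA instance, the camera driver and the keyboard interface, none of which are spelled out in the paper:

```python
import numpy as np

class DummyDicla:
    """Stand-in for the DICLA instance running on the Raspberry Pi."""
    def classify(self, image):
        return "unknown", 0.0
    def learn(self, image, label):
        self.last_label = label  # would add a representation or class

def capture_image():
    """Stand-in for the Raspberry Pi camera: one 96x96 RGB frame."""
    return np.random.rand(96, 96, 3)

def ask_user(predicted_label):
    """Stand-in for keyboard input; an empty string accepts the prediction."""
    return "pencil"

def run_loop(dicla, steps=1):
    for _ in range(steps):
        image = capture_image()
        label, certainty = dicla.classify(image)  # shown on the screen
        correction = ask_user(label)
        if correction and correction != label:
            # a corrected or new class name creates a new representation
            # or class inside module B during runtime
            dicla.learn(image, correction)

dicla = DummyDicla()
run_loop(dicla)
```

The point of the loop is that labeling happens opportunistically: the device classifies continuously and only the user's occasional corrections trigger learning.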
6. Conclusion and transfer
In this paper, an algorithm for image recognition based on the dual-memory method and capable of transfer learning, i.e. of cooperative, performant learning on distributed, evolving datasets possibly describing different tasks without forgetting previously acquired knowledge, is presented.
A special focus is laid on this algorithm’s industrial
applicability, calling for low requirements regarding
computational power and storage volume in order to allow its
use in edge devices on the industrial shop floor.
The proposed distributed incremental class learning algorithm (DICLA) was implemented and compared on the ImageNet benchmark dataset with two state-of-the-art transfer learning algorithms, LwF and iCaRL. It could be demonstrated that, despite its slightly lower accuracy compared to iCaRL, it is more performant when its considerably lower hardware requirements are taken into account. Furthermore, it is, unlike LwF and iCaRL, not sensitive to the number of training steps and does not suffer losses from distributing the learning to different devices and combining the acquired knowledge only in the end.
To underline these findings, a prototype setup based upon a
standard Raspberry Pi 3 and a camera was created. The
recognition of everyday objects could be learned from an image
stream online and without deterioration even if different
devices were learning different objects separately.
The capabilities demonstrated thereby support the
assumption that DICLA is a good foundation for solving
industrial use cases: By expanding their respective deep
learning algorithms to incorporate the presented approach, they,
Figure 5: Prototype Setup with Placement Area (blue), Raspberry Pi (yellow), Camera (location indicated by red arrow), User Input Devices (violet), User Output Device (green)
too, could learn on light-weight edge devices. Naturally, this is easier for image recognition based use cases, but the authors are currently working on applying their approach to time series data as well. Thereby, a broad range of use cases, from predictive quality control and self-optimization to predictive maintenance, could be realized on distributed edge devices without sharing the input data. The authors happily invite other researchers to examine this potential on a broader scale and in a real-life scenario.
References
[1] Efthymiou K, Pagoropoulos A, Papakostas N, Mourtzis D, Chryssolouris
G. Manufacturing Systems Complexity Review: Challenges and Outlook.
Procedia CIRP 2012;3:644–9.
[2] Bruzzone AAG, D’Addona DM. New Perspectives in Manufacturing: An
Assessment for an Advanced Reconfigurable Machining System. Procedia
CIRP 2018;67:552–7.
[3] Meister M, Beßle J, Cviko A, Böing T, Metternich J. Manufacturing
Analytics for problem-solving processes in production. Procedia CIRP
2019;81:1–6.
[4] Kagermann H. Change Through Digitization—Value Creation in the Age
of Industry 4.0. In: Albach H, Meffert H, Pinkwart A, Reichwald R,
editors. Management of Permanent Change. Wiesbaden: Springer
Fachmedien Wiesbaden; 2015, p. 23–45.
[5] Lindemann B, Karadogan C, Jazdi N, Liewald M, Weyrich M. Cloud-
based Control Approach in Discrete Manufacturing Using a Self-Learning
Architecture. IFAC-PapersOnLine 2018;51(10):163–8.
[6] Lughofer E, Sayed-Mouchaweh M. Prologue: Predictive Maintenance in
Dynamic Systems. In: Lughofer E, Sayed-Mouchaweh M, editors.
Predictive Maintenance in Dynamic Systems. Cham: Springer
International Publishing; 2019, p. 1–23.
[7] Yao X, Zhou J, Zhang J, Boer CR. From Intelligent Manufacturing to
Smart Manufacturing for Industry 4.0 Driven by Next Generation
Artificial Intelligence and Further On. In: 2017 5th International
Conference on Enterprise Systems (ES). IEEE; 2017, p. 311–318.
[8] Kozjek D, Vrabič R, Kralj D, Butala P. A Data-Driven Holistic Approach
to Fault Prognostics in a Cyclic Manufacturing Process. Procedia CIRP
2017;63:664–9.
[9] Yang R, Huang M, Lu Q, Zhong M. Rotating Machinery Fault Diagnosis
Using Long-short-term Memory Recurrent Neural Network. IFAC-
PapersOnLine 2018;51(24):228–32.
[10] Wang J, Ma Y, Zhang L, Gao RX, Wu D. Deep learning for smart
manufacturing: Methods and applications. J Manu Sys 2018;48:144–56.
[11] Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and
prospects. Science 2015;349(6245):255–60.
[12] Maschler B, Jazdi N, Weyrich M. Maschinelles Lernen für intelligente
Automatisierungssysteme mit dezentraler Datenhaltung am
Anwendungsfall Predictive Maintenance. In: VDI-Kongress Automation
2019, Baden-Baden, Germany. VDI 2019;2351:739–751.
[13] Parisi GI, Kemker R, Part JL, Kanan C, Wermter S. Continual lifelong
learning with neural networks: A review. Neural Netw 2019;113:54–71.
[14] Maltoni D, Lomonaco V. Continuous learning in single-incremental-task
scenarios. Neural Netw 2019;116:56–73.
[15] Lindemann B, Fesenmayr F, Jazdi N, Weyrich M. Anomaly detection in
discrete manufacturing using self-learning approaches. Procedia CIRP
2019;79:313–8.
[16] French R. Catastrophic forgetting in connectionist networks. Trends in
Cognitive Sciences 1999;3(4):128–35.
[17] Hinton GE, Plaut DC. Using fast weights to deblur old memories. In: 1987
CSS 9th Annual Conference of the Cognitive Science Society (CogSci).
CSS; 1987, p. 177-186.
[18] Kemker R, McClure M, Abitino A, Hayes T, Kanan C. Measuring
Catastrophic Forgetting in Neural Networks. In: 2018 AAAI 32nd
Conference on Artificial Intelligence. AAAI; 2018, p. 3390-3398.
[19] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep
convolutional neural networks. Commun. ACM 2017;60(6):84–90.
[20] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-
Scale Image Recognition; 2014.
[21] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image
Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). IEEE; 2016, p. 770–778.
[22] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the
Inception Architecture for Computer Vision. In: 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016, p.
2818–2826.
[23] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C. MobileNetV2:
Inverted Residuals and Linear Bottlenecks. In: 2018 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). IEEE; 2018, p. 4510–
4520.
[24] Rebuffi S-A, Kolesnikov A, Sperl G, Lampert CH. iCaRL: Incremental
Classifier and Representation Learning. In: 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). IEEE; 2017, p. 5533–
5542.
[25] Carpenter GA, Grossberg S. Adaptive Resonance Theory. Springer US; 2010, p. 22–35.
[26] Merten AM. Adaptive Resonance Theory [ART] – Ein neuer Ansatz
lernender Computer. University of Ulm, 2003 (Online:
http://www.informatik.uni-ulm.de/ni/Lehre/WS03/ProSemNN/ART.pdf,
last accessed on 2020-01-09).
[27] Maschler B, Weyrich M. Deep Transfer Learning at Runtime for Image
Recognition in Industrial Automation Systems. 16th Technical Conference
EKA – Design of Complex Automation Systems, Magdeburg, 2020.
[28] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S et al. ImageNet
Large Scale Visual Recognition Challenge. Int J Comput Vis
2015;115(3):211–52.
[29] Li Z, Hoiem D. Learning without Forgetting. IEEE Trans Pattern Anal
Mach Intell 2018;40(12):2935–47.
[30] Wu Y, Chen Y, Wang L, Ye Y, Liu Z, Guo Y et al. Large Scale
Incremental Learning. In: 2019 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR). IEEE; 2019.
... However, deep learning models usually assume that the training and testing data are drawn from the same distribution, i.e., supervised learning (SL), making the models rely on a large number of annotated training samples . Collecting and annotating training data for assembly processes is a time-consuming and labor-intensive task that requires substantial manual effort (Maschler et al. 2020). ...
... To address this issue, researchers have developed various techniques to overcome data scarcity. For instance, (Li, Zhang, Ding, and Sun, 2020) employed data augmentation to increase the training data for intelligent rotating machinery fault inspection, while (Krüger, Lehr, Schlueter, and Bischoff, 2019) focused on inherent features and (Maschler, Kamm, Jazdi, and Weyrich, 2020) used incremental learning for industry part recognition. Synthetic data generated from CAD models have also been used to expand training datasets for deep learning in various industrial applications, as described in Cohen et al. (2020); Dekhtiar et al. (2018); Wong et al. (2019); Horváth et al. (2022). ...
Article
Full-text available
In the manufacturing industry, automatic quality inspections can lead to improved product quality and productivity. Deep learning-based computer vision technologies, with their superior performance in many applications, can be a possible solution for automatic quality inspections. However, collecting a large amount of annotated training data for deep learning is expensive and time-consuming, especially for processes involving various products and human activities such as assembly. To address this challenge, we propose a method for automated assembly quality inspection using synthetic data generated from computer-aided design (CAD) models. The method involves two steps: automatic data generation and model implementation. In the first step, we generate synthetic data in two formats: two-dimensional (2D) images and three-dimensional (3D) point clouds. In the second step, we apply different state-of-the-art deep learning approaches to the data for quality inspection, including unsupervised domain adaptation, i.e., a method of adapting models across different data distributions, and transfer learning, which transfers knowledge between related tasks. We evaluate the methods in a case study of pedal car front-wheel assembly quality inspection to identify the possible optimal approach for assembly quality inspection. Our results show that the method using Transfer Learning on 2D synthetic images achieves superior performance compared with others. Specifically, it attained 95% accuracy through fine-tuning with only five annotated real images per class. With promising results, our method may be suggested for other similar quality inspection use cases. By utilizing synthetic CAD data, our method reduces the need for manual data collection and annotation. Furthermore, our method performs well on test data with different backgrounds, making it suitable for different manufacturing environments.
... The reason is, deep learning-based object detection methods usually assume that the training data and testing data are drawn from the same distribution, i.e., supervised learning, making the model rely on a large amount of annotated training samples [2]. Collecting annotated training samples requires plenty of time and manual labor in assembly due to the diversity and complexity of assembly approaches and environments [3]. ...
... As deep learning has been progressively implemented in industrial quality inspection, various studies have aimed to overcome the challenge of limited annotated training data, focusing on data augmentation [6], inherent features [7], and deep transfer learning [3]. To the best of the authors' knowledge, no method has yet applied domain adaptation to this problem. ...
Article
A challenge in applying deep learning-based computer vision technologies to assembly quality inspection lies in the diverse assembly approaches and the restricted annotated training data. This paper describes a method for overcoming this challenge by training an unsupervised domain adaptive object detection model on annotated synthetic images generated from CAD models and unannotated images captured by cameras. In a case study of pedal car front-wheel assembly, the model achieves promising results compared with other state-of-the-art object detection methods. In addition, the method is efficient to implement in production as it does not require manually annotated data.
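The model described above is an unsupervised domain adaptive object detector; as a much simpler illustration of the underlying idea (aligning annotated synthetic source data with unannotated real target data), the following hedged sketch uses CORAL-style second-order statistics alignment instead. The data, dimensions, and variable names are invented for the example.

```python
import numpy as np

def coral(source, target, eps=1e-5):
    """Align source features to the target covariance (CORAL-style)."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    whiten = np.linalg.cholesky(np.linalg.inv(cs))   # removes source covariance
    recolor = np.linalg.cholesky(ct).T               # imposes target covariance
    return (source - source.mean(0)) @ whiten @ recolor + target.mean(0)

rng = np.random.default_rng(0)
synthetic = rng.normal(size=(300, 3)) * np.array([1.0, 3.0, 0.5])   # "CAD renders"
real = rng.normal(size=(300, 3)) @ np.array([[1.0, 0.0, 0.0],
                                             [0.5, 1.0, 0.0],
                                             [0.0, 0.0, 2.0]])      # "camera images"
adapted = coral(synthetic, real)
```

After adaptation, a classifier trained on the annotated (adapted) synthetic features would see a distribution that matches the unannotated real features up to second-order statistics.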
... Examples of time-series-based machine learning models include the failure analysis of electronic devices using CNNs [9], anomaly detection in discrete manufacturing with LSTM networks [10], and indoor localization based on 5G signals [11]. Machine learning models are also often applied to image-based applications, such as object recognition [12] or solar cell defect detection [13]. With more data available from different sources, multimodal machine learning is gaining increasing interest. ...
... The heterogeneity of the data poses particular challenges for machine learning algorithms, which are nowadays mostly used to analyze data in industrial applications and gain new insights. The algorithms are typically trained for an application on one kind of data, such as object detection in images [6] or failure classification in time series [7]. However, these models are trained for one specific data-source setup. ...
Conference Paper
Machine learning implementations in an industrial setting pose various challenges due to the heterogeneous nature of the data sources. A classical machine learning algorithm cannot adapt to dynamic changes in the environment, such as the addition, removal, or failure of a data source. However, handling heterogeneous data and the challenges that come with it is a mandatory capability for building robust and adaptive machine learning models for industrial applications. In this work, a novel architecture for robust and adaptive machine learning is proposed to address these challenges. For this, an architecture consisting of different modular layers is developed, into which different models can easily be plugged. The architecture can handle heterogeneous data with different fusion techniques, which are discussed and evaluated in this paper. The proposed architecture is then evaluated on two public datasets for condition monitoring of automation systems to prove its robustness and adaptiveness. The architecture is compared with baseline models and shows more robust performance in the case of failing or removed data sources. In addition, new data sources can easily be added without retraining the whole model. Furthermore, the architecture can detect and locate faulty data sources.
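As a toy illustration of the fusion idea described in the abstract (not the paper's actual architecture; the source names and scores below are invented), late fusion can average per-source model outputs over whichever sources are currently available, so a failed or removed source simply drops out:

```python
import numpy as np

def late_fusion(source_scores, available):
    """source_scores: dict source -> class-score vector; available: set of live sources."""
    active = [s for s in source_scores if s in available]
    if not active:
        raise ValueError("no data source available")
    # Average only the scores of the sources that are currently available.
    return np.mean([source_scores[s] for s in active], axis=0)

scores = {
    "vibration": np.array([0.7, 0.2, 0.1]),
    "current":   np.array([0.6, 0.3, 0.1]),
    "acoustic":  np.array([0.1, 0.8, 0.1]),
}
fused_all = late_fusion(scores, {"vibration", "current", "acoustic"})
fused_degraded = late_fusion(scores, {"vibration", "current"})  # acoustic source failed
```

Adding a new source in this scheme only requires training its own model; the fusion step itself needs no retraining, which mirrors the adaptiveness claim above.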
... This needs to be considered when developing and evaluating analysis algorithms for heterogeneous data. There are specialized and very successful machine learning models for specific data modalities and tasks, such as object detection in images (Maschler et al., 2020), anomaly detection (Lindemann et al., 2020), or failure classification based on time-series data (Kamm et al., 2022a). However, these classical machine learning algorithms do not exploit the variety of data that is often available (Wilcke et al., 2017; Damoulas and Girolami, 2009). ...
Article
In many application domains, data from different sources are increasingly available to thoroughly monitor and describe a system or device. Especially within the industrial automation domain, heterogeneous data and its analysis gain a lot of attention from research and industry, since it has the potential to improve or enable tasks like diagnostics, predictive maintenance, and condition monitoring. For data analysis, machine learning based approaches are mostly used in recent literature, as these algorithms can learn complex correlations within the data. To analyze heterogeneous data and benefit from it in an application, data from different sources need to be integrated, stored, and managed before machine learning algorithms can be applied. In a setting with heterogeneous data sources, the analysis algorithms should also be able to handle data source failures or newly added data sources. In addition, existing knowledge should be used to improve the machine learning based analysis or its training process. To find existing approaches for the machine learning based analysis of heterogeneous data in the industrial automation domain, this paper presents the results of a systematic literature review. The publications were reviewed, evaluated, and discussed with respect to five requirements that are derived in this paper. We identified promising solutions and approaches and outlined open research challenges, which are not yet covered sufficiently in the literature.
... However, practical experience shows that there are differences in the echo-spectrum characteristics between multi-aircraft formations and single targets, and experienced operators can effectively distinguish them. Since deep learning performs well in visual applications [19], applying it to multi-aircraft recognition promises good results. A deep learning-based recognition method needs no hand-computed statistics or other parameters; a trained classifier network automatically recognizes the subtle differences in the spectral features of the range-Doppler image. ...
Article
Over-the-horizon radar (OTHR) is important equipment for ultralong-range early warning in the military, but the traditional detection method, constant false-alarm rate (CFAR) detection, makes multi-aircraft formation recognition difficult. To solve this problem, a multi-aircraft formation recognition method based on deep transfer learning in OTHR is proposed. First, the range-Doppler images of aircraft formations in OTHR are simulated, comprising four categories of samples. Second, a recognition model based on a Convolutional Neural Network (CNN) and CFAR detection technology is constructed, whose training is designed as a two-step transfer. Finally, the trained model distinguishes the spectral characteristics of aircraft formations well and thus recognizes the number of aircraft in a formation. Experiments show that the proposed method outperforms traditional CFAR detection and detects the number of aircraft in a formation more accurately at the same false-alarm rate.
... Machine vision is an important branch of artificial intelligence (AI). A convolutional neural network (CNN) is a typical deep-learning technology applied in machine vision [2]. CNNs have been widely used for image recognition and object detection [3], [4]. ...
Article
Interdisciplinary integration of theory and practice is imperative as a course requirement in emerging engineering education, including the public elective course "Machine Vision Algorithm Training". Covering the entire teaching process (pre-training, in-training, and post-training), this paper discusses the course construction and content in detail in terms of project-based learning (PBL). The PBL teaching approach and evaluation methods are described in detail through a comprehensive face recognition training case based on a convolutional neural network (CNN) and Raspberry Pi. Through project-design training of increasing depth, the interdisciplinary integration of theory and practice is cultivated and interest in the course is stimulated. The results demonstrate that PBL teaching improves the engineering application and innovation abilities of students.
Article
Object detection refers to investigating the relationships between images or videos and detected objects to improve system utilization and make better decisions. Productivity measurement plays a key role in assessing operational efficiency across different industries. However, capturing the workers' working status can be resource-intensive and constrained by a limited sample size if the sampling is conducted manually. While the use of object detection approaches has provided a shortcut for collecting image samples, classifying human poses involves training pose estimation models that may often require a substantial effort for annotating the images. In this study, a systematic approach that integrates pose estimation techniques, fuzzy-set theory, and machine learning algorithms is proposed that operates at an affordable level of computational resources. The Random Forests algorithm has been explored for handling classification tasks, while fuzzy approximation has been applied to capture the imprecision associated with human poses, enhancing robustness to variability and accounting for inherent uncertainty. Decision-makers can utilize the proposed approach without the need for high computational resources or extensive data collection efforts, making it suitable for deployment in various workplace environments.
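To make the fuzzy-approximation idea concrete, here is a minimal sketch (the membership functions, angle ranges, and pose concepts are invented for illustration and are not taken from the study): a triangular membership function turns a measured joint angle into graded pose features instead of a hard threshold.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership with support [a, c] and peak 1.0 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def pose_features(torso_angle_deg):
    """Map a torso angle to graded degrees of two illustrative pose concepts."""
    return {
        "bent": triangular(torso_angle_deg, 20.0, 60.0, 100.0),
        "upright": triangular(torso_angle_deg, -40.0, 0.0, 40.0),
    }
```

These graded features could then be fed to a Random Forests classifier in place of crisp pose labels, which is what gives the approach its robustness to pose variability.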
Article
The arrival of the intelligent manufacturing and industrial internet era brings ever more opportunities and challenges to modern industry. Specifically, the production mode of traditional manufacturing is undergoing a revolution thanks to techniques from the digitization, networking, intelligence, and industrial automation fields. As the core link between intelligent manufacturing and the industrial internet platform, industrial Big Data analytics has received growing attention from academia and industry. Efficiently mining the high-value information hidden in industrial Big Data and utilizing it in real-life industrial processes are among the hottest topics at present. Meanwhile, with the development of industrial automation toward knowledge automation, the learning paradigm of industrial Big Data analytics is evolving accordingly. Therefore, starting from the perspective of industrial Big Data analytics and aiming at the corresponding industrial scenarios, this article explores the revolution of the learning paradigm against the background of industrial Big Data: 1) the evolution of the industrial Big Data analytics paradigm, from isolated learning to lifelong learning, is analyzed and their relationships are summarized; 2) mainstream directions of lifelong learning are listed, and their applications in industrial scenarios are discussed in detail; 3) prospects and future directions are given.
Conference Paper
Machine learning algorithms rely on a broad database for high quality results. However, studies show that many companies are not willing to share their data with other companies, for example in the form of a shared data cloud. Therefore, the goal should be to make efficient machine learning possible with decentralized data storage that allows confidential data to remain in the respective company of origin. This article presents a new concept in this respect and analyses its potential for intelligent automation systems, taking predictive maintenance as an example. The feasibility of the concept using various existing approaches is discussed, before potential benefits for plant operators and manufacturers are outlined, with particular consideration of the perspective of small and medium-sized companies.
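The article's concrete concept is not reproduced here, but the general idea of decentralized learning without sharing raw data can be sketched in the spirit of federated averaging. Everything below (the linear model, three sites, the learning rate) is an invented minimal example:

```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One gradient step of linear regression on a company's private data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def federated_round(w_global, company_data):
    """Each site updates the global model locally; the server averages the results."""
    local_models = [local_step(w_global.copy(), X, y) for X, y in company_data]
    return np.mean(local_models, axis=0)

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
sites = []
for _ in range(3):                        # three companies; raw data stays on site
    X = rng.normal(size=(50, 2))
    sites.append((X, X @ w_true))

w = np.zeros(2)
for _ in range(100):                      # only model parameters are exchanged
    w = federated_round(w, sites)
```

The server only ever sees parameter vectors, never the sites' training data, which is the property that would let confidential data remain in the company of origin.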
Article
Due to constantly growing complexity in production, complex problems occur more often and problem-solving is of increasing importance. The question arises whether the established methods and tools for problem-solving meet the new requirements. Opportunities emerge through increased data availability, and data analysis can help to solve complex problems. In order to evaluate the resulting possibilities, this article examines the importance of Manufacturing Analytics in today's problem-solving processes. Based on an empirical study using interviews to gather experience from industrial projects, it deals with the fundamentals of modern Manufacturing Analytics and its influence on effective and efficient problem-solving, considering statistics, machine learning, data mining, and engineering processes.
Article
It was recently shown that architectural, regularization, and rehearsal strategies can be used to train deep models sequentially on a number of disjoint tasks without forgetting previously acquired knowledge. However, these strategies are still unsatisfactory if the tasks are not disjoint but constitute a single incremental task (e.g., class-incremental learning). In this paper, we point out the differences between multi-task and single-incremental-task scenarios and show that well-known approaches such as LWF, EWC, and SI are not ideal for incremental-task scenarios. A new approach, denoted AR1, combining architectural and regularization strategies, is then specifically proposed. The overhead of AR1 (in terms of memory and computation) is very small, making it suitable for online learning. When tested on CORe50 and iCIFAR-100, AR1 outperformed existing regularization strategies by a good margin.
Article
Process anomalies and unexpected failures of manufacturing systems are problems that cause a decreased quality of process and product. Current data analytics approaches show decent results for the optimization of single processes but lack extensibility to plants with high-dimensional data spaces. This paper presents and compares two data-driven self-learning approaches that are used to detect anomalies within large amounts of machine and process data. Models of the machine behavior are generated to capture complex interdependencies and to extract features that represent anomalies. The approaches are tested and evaluated on the basis of real industrial data from metal forming processes.
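A toy version of the general idea (learning a model of normal machine behavior and flagging large deviations as anomalies) might look as follows. The linear behavior model, sensor dimensions, and threshold rule are assumptions for illustration, not the approaches evaluated in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic "normal operation" data: three sensor channels predict a fourth.
normal = rng.normal(size=(500, 3))
target = normal @ np.array([1.0, -0.5, 2.0]) + 0.01 * rng.normal(size=500)

# Fit a model of normal machine behavior by least squares.
w, *_ = np.linalg.lstsq(normal, target, rcond=None)

# Flag samples whose prediction residual exceeds a threshold learned from
# the residual distribution of normal operation.
residual = np.abs(normal @ w - target)
threshold = residual.mean() + 4 * residual.std()

def is_anomaly(x, y_measured):
    return abs(x @ w - y_measured) > threshold
```

The same residual-threshold scheme generalizes to nonlinear behavior models; only the model-fitting step changes.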
Chapter
Preface: For the 20th time, the community gathers in July 2019 at AUTOMATION, the leading congress for measurement and automation technology. Under the motto "Autonomous Systems and 5G in Connected Industries", a demanding program awaits you! How will artificial intelligence and autonomous systems affect the manufacturing and process automation of the future? Which digital business models can be realized through them, as well as through a new level of connectivity enabled by the upcoming fifth-generation mobile communication standard (5G)? Will new communication channels and AI bring the breakthrough for concepts such as the digital twin, modular automation, and extended automation architectures? The triumph of AI continues in many business areas. Following applications on the large social media and consumer platforms, more and more examples and successes are becoming visible in industrial production. On the other hand, even large providers are not immune to setbacks...
Conference Paper
The utilization of deep learning in the field of industrial automation is hindered by two factors: The amount and diversity of training data needed as well as the need to continuously retrain as the use case changes over time. Both problems can be addressed by deep transfer learning allowing for the performant, continuous training on small, dispersed datasets. As a specific example for transfer learning, a dual memory algorithm for computer vision problems is developed and evaluated. It shows the potential for state-of-the-art performance while being trained only on fractions of the complete ImageNet dataset at multiple locations at once.
Chapter
This introductory chapter intends to provide a general overview of the motivation and significance of predictive maintenance (PdM) in the current literature, its nature and characteristics, as well as the most essential requirements and challenges in PdM systems (Sect. 1). It outlines the main lines of research investigated during the last 20 years to cope with the requirements of industrial environments, identifying and classifying appropriate research directions that have resulted in methodologies and components already established for and in predictive maintenance systems, with a possible smooth transition to preventive maintenance ("what has been done so far") (Sect. 2). Then, it emphasizes recently emerging challenges that go beyond the state of the art, with a specific focus on dealing with dynamic changes in the system and on establishing fully automated processes and operations (Sect. 3). This serves as a clear motivation for our book, in which most of the chapters deal with data-driven modeling, optimization, and control strategies that can be trained and adapted on the fly based on changing system behavior and nonstationary environmental influences. The last part of this chapter (in Sect. 3) outlines a compact summary of the content of the book by providing a paragraph about each of the single contributions.
Article
With the fast development of science and industrial technologies, the fault diagnosis and identification has become a crucial technique for most industrial applications. To ensure the system safety and reliability, many conventional model based fault diagnosis methods have been proposed. However, with the increase in the complexity and uncertainty of engineering system, it is not feasible to establish accurate mathematical models most of the time. Rotating machinery, due to the complexity in its mechanical structure and transmission mechanics, is within this category. Thus, data-driven method is required for fault diagnosis in rotating machinery. In this paper, an intelligent fault diagnosis scheme based on long-short-term memory (LSTM) recurrent neural network (RNN) is proposed. With the available data measurement signals from multiple sensors in the system, both spatial and temporal dependencies can be utilized to detect the fault and classify the corresponding fault types. A hardware experimental study on wind turbine drivetrain diagnostics simulator (WTDDS) is conducted to illustrate the effectiveness of the proposed scheme.