This study uses a design science research methodology to develop and evaluate the Pi-Mind agent, an information technology artefact that acts as a responsible, resilient, ubiquitous cognitive clone (a digital copy) and an autonomous representative of a human decision-maker. Pi-Mind agents learn the decision-making capabilities of their "donors" in a specific training environment based on generative adversarial networks. A trained clone can be used by a decision-maker as an additional resource for their own cognitive enhancement, as an autonomous representative, or even as a replacement when appropriate. The assumption behind this approach is that if someone were forced to leave a critical process (because of sickness, for example) or wanted to oversee several simultaneously running processes, they would be more confident knowing that their autonomous digital representatives were as capable and predictable as an exact personal "copy". The Pi-Mind agent was evaluated in a Ukrainian higher education environment and in a military logistics laboratory. In this paper, in addition to describing the artefact, its expected utility, and its design process within different contexts, we include the corresponding proof of concept, proof of value, and proof of use.
Data-driven fault diagnosis has prevailed in machine condition monitoring over the past decades. However, traditional machine- and deep-learning-based fault diagnosis methods assume that the source and target data share the same distribution and ignore knowledge transfer in dynamic working environments. In recent years, knowledge transfer approaches have been developed and have shown promising results in intelligent fault diagnosis and health management of rotary machines. This paper presents a comprehensive review of knowledge transfer approaches and their applications in fault diagnosis of rotary machines. A problem-oriented taxonomy of knowledge transfer in fault diagnosis is proposed. The knowledge transfer paradigms, approaches, and applications are categorised and analysed. Future research challenges and directions are explored from data, modelling, and application perspectives.
Knowledge distillation has been widely used to produce portable and efficient neural networks that can be well applied on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods need access to the original training data, which usually has a huge size and is often unavailable. To tackle this problem, we propose a novel data-free approach in this paper, named Dual Discriminator Adversarial Distillation (DDAD), to distill a neural network without the need for any training data or metadata. To be specific, we use a generator to create samples, through dual discriminator adversarial distillation, that mimic the original training data. The generator not only uses the pre-trained teacher's intrinsic statistics in existing batch normalization layers but also obtains the maximum discrepancy from the student model. Then the generated samples are used to train the compact student network under the supervision of the teacher. The proposed method obtains an efficient student network that closely approximates its teacher network, without using the original training data. Extensive experiments are conducted to demonstrate the effectiveness of the proposed approach on the CIFAR, Caltech101 and ImageNet datasets for classification tasks. Moreover, we extend our method to semantic segmentation tasks on several public datasets such as CamVid, NYUv2, Cityscapes and VOC 2012. To the best of our knowledge, this is the first work on generative-model-based data-free knowledge distillation on large-scale datasets such as ImageNet, Cityscapes and VOC 2012. Experiments show that our method outperforms all baselines for data-free knowledge distillation.
In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and put forward.
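To make the teacher-student scheme surveyed above concrete, the following is a minimal sketch of classic response-based distillation: the student is trained against the teacher's temperature-softened output distribution plus the hard label. The function name, temperature, and weighting defaults are our own illustrative assumptions, not taken from the survey.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.9):
    """Sketch of a response-based distillation loss: an alpha-weighted KL term
    between temperature-softened teacher and student outputs, plus cross-entropy
    on the hard label. T=4 and alpha=0.9 are illustrative defaults."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 to keep gradient magnitudes comparable.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s)) * T * T
    ce = -math.log(softmax(student_logits)[true_label])
    return alpha * kl + (1 - alpha) * ce
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the hard-label cross-entropy remains.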
Remarkable achievements have been obtained by deep neural networks in the last several years. However, the breakthrough in neural network accuracy is always accompanied by an explosive growth in computation and parameters, which severely limits model deployment. In this paper, we propose a novel knowledge distillation technique named self-distillation to address this problem. Self-distillation attaches several attention modules and shallow classifiers at different depths of neural networks and distills knowledge from the deepest classifier to the shallower classifiers. Different from conventional knowledge distillation methods, where the knowledge of the teacher model is transferred to another student model, self-distillation can be considered as knowledge transfer within the same model, from the deeper layers to the shallow layers. Moreover, the additional classifiers in self-distillation allow the neural network to work in a dynamic manner, which leads to much higher acceleration. Experiments demonstrate that self-distillation has consistent and significant effectiveness on various neural networks and datasets. On average, accuracy boosts of 3.49% and 2.32% are observed on CIFAR100 and ImageNet, respectively. Besides, experiments show that self-distillation can be combined with other model compression methods, including knowledge distillation, pruning and lightweight model design.
Industry 4.0 and highly automated critical infrastructure can be seen as cyber-physical-social systems controlled by the Collective Intelligence. Such systems are essential for the functioning of society and the economy. On one hand, they have a flexible infrastructure of heterogeneous systems and assets. On the other hand, they are social systems, which include collaborating humans and artificial decision-makers. Such (human plus machine) resources must be pre-trained to perform their mission with high efficiency. Both human and machine learning approaches must be bridged to enable such training. The importance of these systems requires the anticipation of potential and previously unknown worst-case scenarios during training. In this paper, we provide an adversarial training framework for the collective intelligence. We show how cognitive capabilities can be copied ("cloned") from humans and trained as a (responsible) collective intelligence. We made some modifications to Generative Adversarial Network architectures and adapted them for the cloning and training tasks. We modified the Discriminator component into a so-called "Turing Discriminator", which includes one or several human and artificial discriminators working together. We also discuss the concept of cellular intelligence, where a person can act and collaborate in a group together with their own cognitive clones.
Link to the paper: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cim2.12008
There can be many reasons for anyone to make a digital copy (clone) of their own decision-making behavior. This enables the virtual presence of a professional decision-maker simultaneously in many places and processes of Industry 4.0. Such a clone can be used as one's responsible representative when the human is not available. Pi-Mind ("Patented Intelligence") is a technology that enables "cloning" the cognitive skills of humans using adversarial machine learning. In this paper, we present a cyber-physical environment as an adversarial learning ecosystem for cloning image classification skills. The physical component of the environment is provided by a logistics laboratory with camera surveillance over the conveyors. The digital component of the environment contains special modifications of Generative Adversarial Networks, which include a human operator as a trainer, an autonomous Pi-Mind clone as a trainee (a discriminator), and a smart digital adversary as a challenger (a generator of sophisticated decision situations, emergencies, and attacks, which supposedly catalyzes the cloning process).
In recent times, image segmentation has been applied everywhere, from disease diagnosis to autonomous vehicle driving. In computer vision, image segmentation is one of the vital tasks, and it is relatively more complicated than other vision undertakings as it needs low-level spatial data. Deep learning in particular has impacted the field of segmentation incredibly and has given us many of today's successful models. Deep-learning-based Generative Adversarial Networks (GANs) have presented remarkable outcomes on image segmentation. In this study, the authors present a systematic review of recent publications on GAN models and their applications. Three libraries, Embase (Scopus), WoS, and PubMed, were considered for searching the relevant papers available in this area. The search identified 2084 documents; after two-phase screening, 52 potential records were included for the final review. The following applications of GANs emerged: 3D object generation, medicine, pandemics, image processing, face detection, texture transfer, and traffic control. Before 2016, research in this field was limited; thereafter, its practical usage came into existence worldwide. The present study also envisions the challenges associated with GANs and paves the path for future research in this realm.
Distillation is an effective knowledge-transfer technique that uses predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained high capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets give a good recipe for teacher-free distillation, group members are homogenized quickly with simple aggregation functions, leading to early saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights generated with an attention-based mechanism to derive its own targets from predictions of other auxiliary peers. Learning from distinct target distributions helps to boost peer diversity for effectiveness of group-based distillation. The second-level distillation is performed to transfer the knowledge in the ensemble of auxiliary peers further to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently gives better performance than state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.
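The first-level aggregation described above can be sketched as follows. In real OKDDip the per-peer aggregation weights are learned with an attention mechanism over feature embeddings; in this simplified stand-in (an assumption for illustration), dot-product similarity between the peers' predicted distributions plays that role, and a peer never distils from its own prediction.

```python
import math

def _softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def peer_targets(peer_probs, i, temp=1.0):
    """Simplified OKDDip-style first-level aggregation (sketch): peer i derives
    its distillation target as an attention-weighted mixture of the OTHER
    peers' predicted distributions. Dot-product similarity stands in for the
    learned attention of the actual method."""
    scores = []
    for j, p in enumerate(peer_probs):
        if j == i:
            scores.append(float('-inf'))  # exclude self: weight becomes exactly 0
        else:
            scores.append(sum(a * b for a, b in zip(peer_probs[i], p)) / temp)
    w = _softmax(scores)
    num_classes = len(peer_probs[0])
    return [sum(w[j] * peer_probs[j][c] for j in range(len(peer_probs)))
            for c in range(num_classes)]
```

Because each peer mixes the others with its own individual weights, different peers see different targets, which is what keeps the group diverse.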
Emerging technologies such as cloud computing, augmented and virtual reality, artificial intelligence, and robotics, among others, are transforming the field of manufacturing and industry as a whole in unprecedented ways. This fourth industrial revolution is consequently changing how operators, who have been crucial to industry success, go about their practices in industrial environments. This paper briefly introduces a novel way of conceptualizing the human operator that necessarily implicates human values in the technologies that constitute it. Similarly, the design methodology known as value sensitive design (VSD) is drawn upon to discuss how these Operator 4.0 technologies can be designed for human values.
Digital twin is gaining popularity due to its significant impact in bridging the gap between the physical and cyber worlds. As reported by Grand View Research, Inc., the global market for digital twin is expected to reach $26.07 billion by 2025, with a Compound Annual Growth Rate of 38.2%. The growing adoption of Cyber-Physical Systems (CPS), the Internet of Things (IoT), big data analytics, and cloud computing in the manufacturing sector has paved the way for low-cost and systematic implementation of digital twin, with promising impacts on a) product design and development, b) machine and equipment health monitoring, and c) product support and services. Successful implementation of digital twin would increase transparency, cooperation, flexibility, resilience, production speed, scalability, and manufacturing efficiency. Realization of smart manufacturing requires collaborative and autonomous interactions between sensing, networking, and computational resources across manufacturing assets, where data gathered from physical systems is utilized for the extraction of actionable insights and the provision of predictive services. In this paper, a reference architecture based on deep learning, digital twin, and 5C-CPS is proposed to facilitate the transformation towards smart manufacturing and Industry 4.0.
With the emergence of the edge computing paradigm, many applications such as image recognition and augmented reality require performing machine learning (ML) and artificial intelligence (AI) tasks on edge devices. Most AI and ML models are large and computationally heavy, whereas edge devices are usually equipped with limited computational and storage resources. Such models can be compressed and reduced for deployment on edge devices, but they may lose their capability and not perform well. Recent works used knowledge transfer (KT) techniques to transfer information from a large network (termed the teacher) to a small one (termed the student) in order to improve the performance of the latter. This approach seems to be promising for learning on edge devices, but a thorough investigation of its effectiveness is lacking. This paper provides an extensive study of the performance (in both accuracy and convergence speed) of knowledge transfer, considering different student architectures and different techniques for transferring knowledge from teacher to student. The results show that the performance of KT does vary by architecture and transfer technique. A good performance improvement is obtained by transferring knowledge from both the intermediate layers and the last layer of the teacher to a shallower student. But other architectures and transfer techniques do not fare so well, and some of them even lead to negative performance impact.
The paper summarizes research findings of the SmartResource project (2004-2007), funded by Tekes and industrial companies. The main objective was the research and development of a large-scale distributed environment for the integration of smart devices, web services, and humans, based on a combination of the Semantic Web, agent technologies, and service-oriented architecture. A prototype platform for self-maintained smart resources in smart spaces has been designed and implemented for particular tasks of industrial partners. In this paper we present a summary of the research results obtained during the project period and a related industrial case study. Several lessons have been learned during the project in addition to the published results, which we are going to share with the scientific community. We also present a vision of how to utilize the project results to design various complex smart spaces, taking into account such issues as interoperability, coordination, self-manageability, reputation, and trust in future-generation smart space environments.
Industry pushes a new type of Internet characterized as the Internet of Things, which represents a fusion of the physical and digital worlds. The technology of the Internet of Things opens new horizons for industrial automation, that is, automated monitoring, control, maintenance planning, and so forth, of industrial resources and processes. The Internet of Things definitely needs explicit semantics, even more than the traditional Web, for automatic discovery and interoperability among heterogeneous devices and also to facilitate the behavioral coordination of the components of complex physical-digital systems. In this chapter, the authors describe their work towards the Global Understanding Environment (GUN), a general middleware framework aimed at providing means for building complex industrial systems consisting of components of different natures, based on semantic and agent technologies. The authors present the general idea and some emergent issues of GUN and describe the current state of the GUN realization in the UBIWARE platform. As a specific concrete case, they use the domain of distributed power network maintenance. In collaboration with the ABB Company, they have developed a simple prototype and a vision of the potential added value this domain could receive from introducing semantic and agent technologies, and the GUN framework in particular.
As ubiquitous systems become increasingly complex, traditional solutions to manage and control them reach their limits and pose a need for self-manageability. Also, the heterogeneity of ubiquitous components, standards, data formats, etc., creates significant obstacles for interoperability in such complex systems. The promising technologies to tackle these problems are Semantic technologies, for interoperability, and Agent technologies, for the management of complex systems. This paper describes our vision of a middleware for the Internet of Things, which will allow the creation of self-managed complex systems, in particular industrial ones, consisting of distributed and heterogeneous components of different natures. We also present an analysis of the issues to be resolved to realize such a middleware.
We introduce a metric for probability distributions, which is bounded, information-theoretically motivated, and has a natural Bayesian interpretation. The square root of the well-known χ2 distance is an asymptotic approximation to it. Moreover, it is a close relative of the capacitory discrimination and Jensen-Shannon divergence.
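Since the abstract above positions its metric as a close relative of the Jensen-Shannon divergence, a short sketch of the JSD itself (not the paper's own metric) may be useful; it exhibits the same headline properties of symmetry and boundedness.

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions
    (natural logarithm). It is symmetric in its arguments and bounded
    above by ln 2, which it attains for distributions with disjoint
    support."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        # KL divergence with the usual 0*log(0) = 0 convention.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```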
We study different ways of determining the mean distance ⟨r_n⟩ between a reference point and its n-th neighbour among random points distributed with uniform density in a D-dimensional Euclidean space. First we present a heuristic method; though this method provides only a crude mathematical result, it shows a simple way of estimating ⟨r_n⟩. Next we describe two alternative means of deriving the exact expression of ⟨r_n⟩: we review the method using absolute probability and develop an alternative method using conditional probability. Finally we obtain an approximation to ⟨r_n⟩ from the mean volume between the reference point and its n-th neighbour and compare it with the heuristic and exact results.
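For reference, the exact mean n-th-neighbour distance for a uniform point process admits a standard closed form, sketched below; the function name and argument conventions are ours, and the formula is stated here from the general setting rather than copied from the paper.

```python
import math

def mean_nth_neighbour_distance(n, D, rho):
    """Mean distance from a reference point to its n-th nearest neighbour
    for points of uniform density rho in D-dimensional Euclidean space:

        <r_n> = [Gamma(n + 1/D) / Gamma(n)] * (rho * V_D)**(-1/D),

    where V_D = pi**(D/2) / Gamma(D/2 + 1) is the volume of the unit
    D-ball."""
    v_d = math.pi ** (D / 2) / math.gamma(D / 2 + 1)
    return math.gamma(n + 1 / D) / math.gamma(n) * (rho * v_d) ** (-1 / D)
```

As sanity checks, the formula reduces to the familiar 1/(2·rho) for the nearest neighbour on a line (D = 1) and 1/(2·sqrt(rho)) in the plane (D = 2).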
Knowledge distillation can generally be divided into offline and online categories according to whether the teacher model is pre-trained and persistent during the distillation process. Offline distillation can employ existing models yet always demonstrates inferior performance to online ones. In this paper, we first empirically show that the essential factor behind their performance gap lies in the reversed distillation from student to teacher, rather than the training fashion. Offline distillation can achieve a competitive performance gain by fine-tuning the pre-trained teacher to adapt to the student with such reversed distillation. However, this fine-tuning process still costs a lot of training budget. To alleviate this dilemma, we propose SHAKE, a simple yet effective SHAdow KnowlEdge transfer framework that bridges offline and online distillation, trading accuracy against efficiency. Specifically, we build an extra shadow head on the backbone to mimic the predictions of the pre-trained teacher as its shadow. Then, this shadow head is leveraged as a proxy teacher to perform bidirectional distillation with the student on the fly. In this way, SHAKE not only updates this student-aware proxy teacher with the knowledge of the pre-trained model, but also greatly reduces the cost of the augmented reversed distillation. Extensive experiments on classification and object detection tasks demonstrate that our technique achieves state-of-the-art results with different CNNs and Vision Transformer models. Additionally, our method shows strong compatibility with multi-teacher and augmentation strategies, gaining additional performance improvement. Code is made publicly available at https://lilujunai.github.io/SHAKE/.
Smart manufacturing needs digital clones of physical objects (digital twins) and human decision-makers (cognitive clones). The latter requires use of machine learning to capture hidden personalised decision models from humans. Machine learning nowadays is a subject of various adversarial attacks (poisoning, evasion, etc.). Responsible use of machine learning requires digital immunity (the capability of smart systems to operate robustly in adversarial conditions). Both problems (clones and immunity training) have the same backbone solution, which is adversarial training (learning on automatically generated adversarial samples). In this study, we design and experimentally test special algorithms for adversarial samples generation to fit simultaneously both purposes: to better personalise decision models for digital clones and to train digital immunity, thus, ensuring robustness of autonomous decision models. We demonstrate that our algorithms facilitate the desired robustness and accuracy of the training process.
Smart manufacturing often requires digital clones of physical objects (twins) and human decision-makers ("cognitive clones"). The latter requires the use of machine learning to capture hidden personalized decision models from humans. Machine learning nowadays is subject to various adversarial attacks (poisoning, evasion, etc.) on the training and testing data. Responsible use of machine learning requires some kind of "digital immunity" (the capability of smart systems to operate robustly in adversarial conditions). Both problems (clones and immunity training) require the same backbone solution, which is adversarial training (learning on the basis of automatically generated adversarial samples). In this study we designed and experimentally tested special algorithms for adversarial sample generation to fit both purposes simultaneously: to better personalize the decision models for digital clones and to train digital immunity to ensure the robustness of the autonomous decision models. We demonstrated that our algorithms essentially facilitate the training process towards the desired robustness for both problems.
Advances in human-centric smart manufacturing (HSM) reflect a trend towards the integration of the human-in-the-loop with technologies, to address challenges of human-machine relationships. In this context, human-cyber-physical systems (HCPS), as an emerging human-centric system paradigm, can bring insights to the development and implementation of HSM. This study presents a systematic review of HCPS theories and technologies for HSM, with a focus on the human aspect. First, the concepts, key components, and taxonomy of HCPS are discussed. The HCPS system framework and subsystems are analyzed. Enabling technologies (e.g., domain technologies, unit-level technologies, and system-level technologies) and core features (e.g., connectivity, integration, intelligence, adaptation, and socialization) of HCPS are presented. Applications of HCPS in smart manufacturing are illustrated with the human in the design, production, and service perspectives. This research offers key knowledge and a reference model for the human-centric design, evaluation, and implementation of HCPS-based HSM.
Federated Learning (FL) is a decentralized machine-learning paradigm in which a global server iteratively aggregates the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges on FL, as it can produce drifted global models that are slow to converge. Knowledge distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, rather than directly aggregating their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. In this work, we propose a data-free knowledge distillation approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcast to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state of the art.
The digital twin is now within reach as manufacturing and manufacturing processes become increasingly digital and the Internet of Things (IoT) becomes more and more dominant. Digital twins are intended to model complex structures and processes that communicate with their environments in various ways, for which it is challenging to predict effects over the entire lifecycle of the product. A digital twin is a virtual model that, during its life cycle, simulates a physical entity or operation, providing a near real-time connection between the physical and virtual worlds. A digital twin allows industry to detect physical issues sooner, predict outcomes more accurately, and build better products. The IoT drives the digital twin as a trend in a wide range of industries by offering them the potential to take advantage of mass customization along with mass personalization while maintaining mass-production efficiency. Developing a digital twin needs different components, including sensors, communications networks, and a digital platform. This chapter aims to map the major architectures and applications of digital twins for Industry 4.0, along the lines of manufacturing systems, manufacturing processes and robots, automation, and virtual reality in manufacturing.
The recent growth in the industry sector has given technology a prominent role in automation, in terms of computing systems and software products. The objective of Industry 4.0 is the automation of industrial processes through the integration of service-oriented architecture, intelligence mechanisms, and proactive maintenance systems. Key areas such as robotics and 3D modeling have attracted much of the research and innovation happening around Industry 4.0 concepts. This chapter presents the various modeling and simulation techniques for Industry 4.0. In the first part, the modeling techniques existing in the literature are discussed. It is observed that there are 16 modeling techniques available to address Industry 4.0. The commonly used ones, such as SysML, UML, metamodels, the Production Flow Scheme, and ontology-based context modeling, are presented in detail. The second part of the chapter is focused on simulation mechanisms for Industry 4.0, in which simulation optimization methods and their applications are discussed. Further, it is presented how efficiently Industry 4.0 concepts can be optimized by various parameters and performance metrics.
Many recent works on knowledge distillation have provided ways to transfer the knowledge of a trained network for improving the learning process of a new one, but finding a good technique for knowledge distillation is still an open problem. In this paper, we provide a new perspective based on the decision boundary, which is one of the most important components of a classifier. The generalization performance of a classifier is closely related to the adequacy of its decision boundary, so a good classifier bears a good decision boundary. Therefore, transferring information closely related to the decision boundary can be a good attempt at knowledge distillation. To realize this goal, we utilize an adversarial attack to discover samples supporting a decision boundary. Based on this idea, to transfer more accurate information about the decision boundary, the proposed algorithm trains a student classifier on the adversarial samples supporting the decision boundary. Experiments show that the proposed method indeed improves knowledge distillation and achieves state-of-the-art performance.
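The core move above, nudging a sample toward the decision boundary so it carries boundary information, can be illustrated on a toy linear classifier. This is a sketch of the general idea only, not the paper's attack; the helper name and step fraction are our own assumptions.

```python
def boundary_supporting_sample(x, w, b, eps=0.9):
    """Move sample x toward the decision boundary of the linear classifier
    f(x) = sign(w.x + b) by stepping a fraction eps of the distance to the
    boundary along -w (the steepest direction of score decrease). The
    perturbed sample stays on the same side but with a much smaller margin,
    so it 'supports' the boundary."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    step = eps * score / norm_sq
    return [xi - step * wi for wi, xi in zip(w, x)]
```

Distilling on such near-boundary samples, rather than on easy high-margin ones, is what lets the student copy the shape of the teacher's boundary more faithfully.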
Industry 4.0 is a trend related to smart factories, which are cyber-physical spaces populated and controlled by the collective intelligence for autonomous and highly flexible manufacturing purposes. Artificial Intelligence (AI) embedded into various planning, production, and management processes in Industry 4.0 must, in many cases, take the initiative and responsibility for making necessary real-time decisions. In this paper, we suggest the Pi-Mind technology as a compromise between completely human-expert-driven decision-making and AI-driven decision-making. Pi-Mind enables capturing, cloning, and patenting essential parameters of the decision models of a particular human expert, making these models transparent, proactive, and capable of autonomic and fast decision-making simultaneously in many places. The technology facilitates the human impact (due to ubiquitous presence) in smart manufacturing processes and enables human-AI shared responsibility for the consequences of the decisions made. It also benefits from the capturing and utilization of traditionally human creative cognitive capabilities (sometimes intuitive and emotional), which in many cases outperform rational decision-making. Pi-Mind technology is a set of models, techniques, and tools built on principles of value-based biased decision-making and creative cognitive computing to augment the axioms of decision rationality in industry.
Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this paper we empirically demonstrate that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow nets can learn these deep functions using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional models.
Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this extended abstract, we show that shallow feed-forward networks can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, the shallow neural nets can learn these deep functions using a total number of parameters similar to the original deep model. We evaluate our method on the TIMIT phoneme recognition task and are able to train shallow fully-connected nets that perform similarly to complex, well-engineered, deep convolutional architectures. Our success in training shallow neural nets to mimic deeper models suggests that there probably exist better algorithms for training shallow feed-forward nets than those currently available.
Wang, X., Zhang, R., Sun, Y., and Qi, J. (2018). "KDGAN: Knowledge distillation with generative adversarial networks". Advances in neural
information processing systems, 31. https://proceedings.neurips.cc/paper_files/paper/2018/file/019d385eb67632a7e958e23f24bd07d7-Paper.pdf
Micaelli, P., and Storkey, A. J. (2019). "Zero-shot knowledge transfer via adversarial belief matching". Advances in Neural Information
Processing Systems, 32. https://proceedings.neurips.cc/paper_files/paper/2019/file/fe663a72b27bdc613873fbbb512f6f67-Paper.pdf