Conference Paper

Reinforcement Learning Approach to Vibration Compensation for Dynamic Feed Drive Systems

... This can be achieved in industrial systems with state-of-the-art RL algorithms (Gulde et al., 2019) or with safety-constrained optimization (Rattunde et al., 2021). Here, the decisions are communicated automatically on demand between the two devices: the control unit provides the real-time capable system to fulfill the industrial requirements, and a computer provides the ML framework to enable iterative learning. ...
Article
Full-text available
Intelligent manufacturing applications and agent-based implementations are scientifically investigated due to the enormous potential of industrial process optimization. The most widespread data-driven approach is to train on experimental history collected under test conditions and then execute the trained model. Since factors such as tool wear affect the process, the experimental history has to be compiled extensively. In addition, individual machine noise means that the models are not easily transferable to other (theoretically identical) machines. In contrast, a continual learning system should have the capacity to adapt (slightly) to a changing environment, e.g., another machine under different working conditions. Since this adaptation can potentially have a negative impact on process quality, especially in industry, safe optimization methods are required. In this article, we present a significant step towards self-optimizing machines in industry by introducing a novel method for efficient safe contextual optimization that continuously trades off exploration and exploitation. Furthermore, an appropriate data-discard strategy and local approximation techniques enable continual optimization. The approach is implemented as a generic software module for an industrial edge control device. We apply this module to a steel straightening machine as an example, enabling it to adapt safely to changing environments.
... In terms of improving machine performance via DRL, Schoop et al. [177] used PPO to design control policies that maximize cutting-tool life while ensuring machining quality. Gulde et al. [178] applied DRL to vibration compensation for the machine tool axis to obtain higher machining precision and a longer component lifetime. Jiang et al. [179] modeled internal CNC data, consisting of feed-axis tracking error, with an LSTM network; then, ...
Article
Full-text available
To facilitate the personalized smart manufacturing paradigm with cognitive automation capabilities, Deep Reinforcement Learning (DRL) has attracted ever-increasing attention by offering an adaptive and flexible solution. DRL combines the advantages of Deep Neural Networks (DNN) and Reinforcement Learning (RL), embracing the power of representation learning to make precise and fast decisions when facing dynamic and complex situations. Ever since the first DRL paper was published in 2013, its applications have sprung up across the manufacturing field with exponential publication growth year by year. However, a comprehensive review of DRL in the field of smart manufacturing is still lacking. To fill this gap, a systematic review process was conducted, with 208 relevant publications selected to date (20-Oct.-2022), to gain a holistic understanding of the development, application, and challenges of DRL in smart manufacturing along the whole engineering lifecycle. First, the concept and development of DRL are summarized. Then, typical DRL applications are analyzed in the four engineering lifecycle stages: design, manufacturing, logistics, and maintenance. Finally, the challenges and future directions are illustrated, especially emerging DRL-related technologies and solutions that can improve a manufacturing system's deployment feasibility, cognitive capability, and learning efficiency. It is expected that this work can provide an insightful guide to research on DRL in the smart manufacturing field and shed light on its future perspectives.
... The vibration control of a rotating machine was also performed through RLC using pad actuators [142]. Gulde et al. [143] implemented a method to compensate vibrations in an industrial machine tool using RLC. Eshkevari et al. [144] and Gao et al. [145] also achieved good controllability of flexible building structures through RLC. ...
Preprint
Full-text available
The use of Machine Learning (ML) has rapidly spread across several fields and has found many applications in Structural Dynamics and Vibroacoustic (SD&V). The increasing capability of ML to unveil insights from data, driven by unprecedented data availability, algorithmic advances and computational power, enhances decision making, uncertainty handling, pattern recognition and real-time assessment. Three main applications in SD&V have taken advantage of these benefits. In Structural Health Monitoring (SHM), ML detection and prognosis lead to safe operation and optimized maintenance schedules. System identification and control design are leveraged by ML techniques in Active Noise Control (ANC) and Active Vibration Control (AVC). Finally, so-called ML-based surrogate models provide fast alternatives to costly simulations, enabling robust and optimized product design. Despite the many works in the area, they have not been reviewed and analyzed. Therefore, to keep track of and comprehend this ongoing integration of fields, we present the first survey of ML applications in SD&V analyses, shedding light on the current state of implementation and emerging opportunities. For each of the three applications mentioned, we identify the main methodologies, advantages, limitations, and recommendations based on scientific knowledge. Moreover, we discuss the role of Digital Twins and Physics-Guided ML to overcome current challenges and power future research progress. As a result, the survey provides a broad overview of the present landscape of ML applied in SD&V and guides the reader to an advanced understanding of progress and prospects in the field.
Article
Full-text available
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task involving finding rewards in random 3D mazes using a visual input.
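As a rough illustration of the asynchronous update scheme described above, the sketch below lets several worker threads apply lock-free gradient updates to a shared parameter vector. It uses a toy least-squares objective rather than an actor-critic, and Python threads are only schematically parallel, so it should be read as a sketch of the asynchronous pattern, not the authors' algorithm.

```python
import threading
import numpy as np

# Hogwild-style sketch: worker threads update a shared parameter vector
# in place, without locks, each using its own stream of sampled data.
w_true = np.array([2.0, -1.0])
theta = np.zeros(2)                               # shared parameters


def worker(shared_theta, seed, steps=2000, lr=0.01):
    r = np.random.default_rng(seed)
    for _ in range(steps):
        x = r.normal(size=2)
        y = x @ w_true + 0.01 * r.normal()
        grad = 2.0 * (x @ shared_theta - y) * x   # local gradient estimate
        shared_theta -= lr * grad                 # in-place, lock-free update


threads = [threading.Thread(target=worker, args=(theta, s)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(theta)                                      # close to w_true = [2.0, -1.0]
```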
Article
Full-text available
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
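A minimal NumPy sketch of the gating mechanism described above is given below: one LSTM cell step with input, forget, and output gates and an additive cell update (the "constant error carousel"). The weight shapes and gate ordering are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step. W: (4H, D), U: (4H, H), b: (4H,).
    Assumed gate order: input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * H:1 * H])      # input gate
    f = sigmoid(z[1 * H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:4 * H])      # candidate cell update
    c = f * c_prev + i * g           # additive cell update (constant error carousel)
    h = o * np.tanh(c)               # gated output
    return h, c

# toy usage: D = 3 inputs, H = 5 hidden units
rng = np.random.default_rng(0)
D, H = 3, 5
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(10):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```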
Article
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy - that is, succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.
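The entropy-regularized objective can be illustrated by the soft bootstrap target used to train the critics. The snippet below is a simplified NumPy sketch assuming twin Q-networks and a fixed temperature alpha; it is not the full soft actor-critic algorithm.

```python
import numpy as np

def soft_q_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Entropy-regularized bootstrap target for the soft Q-functions.
    q1_next, q2_next: twin critic values at the next state for an action
    sampled from the current policy; logp_next: its log-probability."""
    v_next = np.minimum(q1_next, q2_next) - alpha * logp_next  # soft state value
    return reward + gamma * (1.0 - done) * v_next

# toy batch of two transitions
r = np.array([1.0, 0.5]); d = np.array([0.0, 1.0])
q1n = np.array([10.0, 3.0]); q2n = np.array([9.5, 3.2]); lp = np.array([-1.2, -0.8])
print(soft_q_target(r, d, q1n, q2n, lp))
```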
Article
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
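The clipped surrogate objective at the heart of PPO can be sketched in a few lines of NumPy. The function below assumes log-probabilities and advantage estimates are already available; it shows only the policy objective, not the value loss, entropy bonus, or training loop.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective (to be maximized), averaged over a batch.
    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates."""
    ratio = np.exp(logp_new - logp_old)                      # importance ratio
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# toy check: once the ratio leaves the clip range, the objective stops growing
adv = np.array([1.0, -1.0])
print(ppo_clip_objective(np.log([2.0, 0.3]), np.log([1.0, 1.0]), adv))
```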
Article
In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
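A toy sketch of the deterministic policy gradient is shown below: with a linear policy and a known quadratic critic, the parameter is updated along grad_theta mu(s) * grad_a Q(s, a) evaluated at a = mu(s). The scalar problem and the closed-form critic are assumptions chosen so the chain rule stays visible.

```python
import numpy as np

# Toy deterministic policy gradient: policy a = theta * s, critic
# Q(s, a) = -(a - k*s)^2 with known optimum a* = k*s. The update follows
# grad_theta J = E_s[ grad_theta mu(s) * grad_a Q(s, a)|a=mu(s) ].
rng = np.random.default_rng(0)
k, theta, lr = 2.0, 0.0, 0.05
for _ in range(200):
    s = rng.normal(size=64)              # batch of states
    a = theta * s                        # deterministic action
    dq_da = -2.0 * (a - k * s)           # critic gradient w.r.t. the action
    dmu_dtheta = s                       # policy gradient w.r.t. the parameter
    theta += lr * np.mean(dmu_dtheta * dq_da)
print(theta)                             # converges toward k = 2.0
```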
Article
Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we explore LSTM RNN architectures for large scale acoustic modeling in speech recognition. We recently showed that LSTM RNNs are more effective than DNNs and conventional RNNs for acoustic modeling, considering moderately-sized models trained on a single machine. Here, we introduce the first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines. We show that a two-layer deep LSTM RNN where each LSTM layer has a linear recurrent projection layer can exceed state-of-the-art speech recognition performance. This architecture makes more effective use of model parameters than the others considered, converges quickly, and outperforms a deep feed forward neural network having an order of magnitude more parameters.
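The linear recurrent projection layer mentioned above is exposed in some frameworks; for example, PyTorch's nn.LSTM accepts a proj_size argument. The sketch below uses hypothetical layer sizes and a random input batch, not the paper's distributed training setup.

```python
import torch
import torch.nn as nn

# Two stacked LSTM layers, each followed by a linear recurrent projection
# (proj_size), so the recurrent/output dimension is smaller than the cell
# size. Layer sizes are hypothetical, chosen only for illustration.
lstm = nn.LSTM(input_size=40, hidden_size=512, num_layers=2,
               proj_size=128, batch_first=True)
frames = torch.randn(8, 100, 40)        # (batch, time, feature) acoustic frames
outputs, (h_n, c_n) = lstm(frames)
print(outputs.shape)                    # torch.Size([8, 100, 128])
```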
Article
A novel method is presented for producing dead-beat response in a lightly-damped oscillatory feedback system. Complete transient response times of the order of a fraction of the natural oscillatory period can be obtained. Excellent waveshape reproduction is achieved through a linear phase lag with frequency. The method consists of exciting several transient oscillations, at closely spaced times, with magnitudes and phases so adjusted that the resultant sum of the transient oscillation phasors is zero. The steady-state output is the arithmetic sum of the excitation magnitudes. When a step input transient is divided into two spaced excitations, one-half cycle response is obtainable. When the input transient is divided into three excitations, one-fourth period or faster transient times are realizable, depending upon the available dynamic range or signal-to-noise ratio. The principle of design is to adjust a system to the maximum possible resonant frequency, independent of the damping factor, but stable, and then to apply the Posicast control to completely remove the oscillatory component in the output. In an electrical feedback control system, the additional hardware consists of one or two artificial transmission lines.
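A minimal sketch of the half-cycle (two-part) Posicast idea for a lightly damped second-order plant: the step is split into two parts whose magnitudes are set by the step overshoot and whose spacing is half the damped period. The natural frequency and damping values below are illustrative.

```python
import numpy as np

def posicast_split(wn, zeta):
    """Half-cycle Posicast for a lightly damped second-order plant: split a
    unit step into two parts so the oscillatory components cancel.
    Returns (first fraction, second fraction, delay of the second part)."""
    wd = wn * np.sqrt(1.0 - zeta**2)                      # damped natural frequency
    M = np.exp(-zeta * np.pi / np.sqrt(1.0 - zeta**2))    # step-response overshoot
    return 1.0 / (1.0 + M), M / (1.0 + M), np.pi / wd

# example: a 10 rad/s mode with 5 % damping
a1, a2, t2 = posicast_split(10.0, 0.05)
print(a1, a2, t2)   # apply a1 at t = 0 and a2 at t = t2 (half the damped period)
```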
Article
A linear quadratic regulator (LQR) for mechanical vibration systems is studied based on second-order matrix equations. The performance index is a functional depending on second derivatives. The Euler–Lagrange equations lead to linear second-order system matrix-augmented differential equations whose stable eigenvalues are the poles of the closed-loop optimal controlled systems. The optimal feedback constant matrices are determined by the stable eigenpairs, the control input matrix and the control weight matrix. The traditional matrix Riccati equation is not used, and the control input matrix involved is a general rectangular matrix.
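For comparison, the snippet below shows the conventional first-order, Riccati-based LQR design for a single mass-spring-damper using SciPy; it is not the second-order matrix formulation proposed in the article, and all parameter values are illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Standard first-order LQR for a mass-spring-damper (m*xdd + c*xd + k*x = u).
# This is the usual Riccati-based design, not the article's second-order
# matrix formulation; the weights and plant values are illustrative only.
m, c, k = 1.0, 0.2, 100.0
A = np.array([[0.0, 1.0], [-k / m, -c / m]])
B = np.array([[0.0], [1.0 / m]])
Q = np.diag([100.0, 1.0])                # state weight
R = np.array([[0.01]])                   # control weight

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)          # optimal state feedback u = -K x
print(K)
print(np.linalg.eigvals(A - B @ K))      # stable closed-loop poles
```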
Article
This paper demonstrates an approach to frequency domain identification for the explicit purpose of designing robust H∞ controllers. The approach transforms raw experimental data into a plant set estimate directly usable by modern robust control design software (e.g., Matlab Robust Control Toolboxes). A key issue in control design from raw data is the question of whether the controller will work when applied to the true system. The main feature of this approach is that the resulting controller is guaranteed to work as designed (when applied to the true system) to a prescribed statistical confidence. While the overall methodology addresses key theoretical issues, it has at the same time been specifically designed to support practical implementations. A simulation example is included to demonstrate the overall approach. © 1998 John Wiley & Sons, Ltd.
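A common first step towards such a design is an empirical frequency-response (H1) estimate from input/output records. The SciPy sketch below uses a synthetic second-order plant and Welch/CSD estimates; it does not reproduce the paper's statistical-confidence machinery.

```python
import numpy as np
from scipy import signal

# H1 frequency-response estimate G(f) ~= Pyu(f) / Puu(f) from input/output
# data. The plant and noise level below are synthetic, for illustration only.
rng = np.random.default_rng(0)
fs, n = 1000.0, 20000
u = rng.normal(size=n)                                        # broadband excitation
bz, az = signal.bilinear([100.0], [1.0, 2.0, 100.0], fs=fs)   # discretized 2nd-order plant
y = signal.lfilter(bz, az, u) + 0.01 * rng.normal(size=n)     # noisy measured response

f, Puu = signal.welch(u, fs=fs, nperseg=4096)
_, Pyu = signal.csd(u, y, fs=fs, nperseg=4096)
G = Pyu / Puu                                                 # empirical FRF (H1 estimate)
print(f[np.argmax(np.abs(G))])                                # frequency of the FRF peak (Hz)
```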
Article
This paper presents a model reduction method and uncertainty modeling for the design of a low-order H∞ robust controller for suppression of smart panel vibration. A smart panel with collocated piezoceramic actuators and sensors is modeled using solid, transition, and shell finite elements, and then the size of the model is reduced in the state space domain. A robust controller is designed not only to minimize the panel vibration excited by applied uniform acoustic pressure, but also to be reliable in real world applications. This paper introduces the idea of Modal Hankel Singular Values (MHSV) to reduce the finite element model to a low-order state space model with minimum model reduction error: MHSV measures the balanced controllability and observability of each resonance mode to deselect insignificant resonance modes. State space modeling of realistic control conditions is formulated in terms of uncertainty variables. These uncertainty variables include uncertainty in actuator and sensor performance, uncertainty in the knowledge of the resonance frequencies of the structure, damping ratio, static stiffness, unmodeled high resonance vibration modes, etc. The simplified model and the uncertainty model are combined as an integrated state space model and then implemented in the H∞ control theory for controller parameterization. The low-order robust controller is easy to implement in an analog circuit to provide a low cost solution in a variety of applications where cost may be a limiting factor.
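The standard (non-modal) Hankel singular values behind this idea can be computed from the controllability and observability Gramians; the SciPy sketch below does this for an arbitrary stable two-mode model. It is a generic illustration, not the article's modal HSV measure.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hankel singular values from the controllability/observability Gramians of a
# stable state-space model; states with small HSVs contribute little to the
# input-output behaviour and are candidates for truncation. The A, B, C below
# form an arbitrary two-mode model used only for illustration.
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [-100.0, -0.2, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, -2500.0, -1.0]])
B = np.array([[0.0], [1.0], [0.0], [0.2]])
C = np.array([[1.0, 0.0, 0.5, 0.0]])

Wc = solve_continuous_lyapunov(A, -B @ B.T)      # controllability Gramian
Wo = solve_continuous_lyapunov(A.T, -C.T @ C)    # observability Gramian
hsv = np.sqrt(np.abs(np.linalg.eigvals(Wc @ Wo)))
print(np.sort(hsv)[::-1])                        # keep the dominant states/modes
```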
Article
This article presents the investigation of performance of a nonlinear quarter-car active suspension system with a stochastic real-valued reinforcement learning control strategy. As an example, a model of a quarter car with a nonlinear suspension spring subjected to excitation from a road profile is considered. The excitation is realised by the roughness of the road. The quarter-car model to be considered here can be approximately described as a nonlinear two degrees of freedom system. The experimental results indicate that the proposed active suspension system suppresses the vibrations greatly. A simulation of a nonlinear quarter-car active suspension system is presented to demonstrate the effectiveness and examine the performance of the learning control algorithm.
Article
Optimal control for active vibration suppression of linear time-delay systems is investigated in this paper. For the two cases in which the time delay is an integer or a non-integer multiple of the sampling period, the equation of motion with time delay is transformed into standard discrete-time forms containing no time delay by using a zero-order hold. A discrete quadratic function is used as the objective function in the controller design to guarantee good control efficiency at the sampling points. At every computation step, the derived controller contains not only the current state feedback but also a linear combination of several previous control inputs. Because the controller is deduced directly from the time-delay differential equation, system stability can be guaranteed easily, so the method is generally applicable to ordinary control systems. The performance of the proposed control method and the system stability when using it are demonstrated by numerical simulation results. The simulations show that the presented method is a viable and attractive control strategy for active vibration control applications. Instability in the responses can occur if systems with time delay are controlled using a controller designed for the delay-free case.
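For the integer-delay case, a standard way to obtain a delay-free discrete model is to discretize with a zero-order hold and append the delayed input to the state; a discrete LQR can then be computed for the augmented model. The sketch below follows this textbook route with illustrative values; the article itself derives its controller directly from the time-delay equation rather than via a Riccati equation.

```python
import numpy as np
from scipy.linalg import expm, solve_discrete_are

# Mass-spring-damper with one full sample of input delay: discretize with a
# zero-order hold, append the delayed input to the state, and solve a
# discrete LQR for the augmented, delay-free model. Values are illustrative.
m, c, k, Ts = 1.0, 0.4, 400.0, 0.005
A = np.array([[0.0, 1.0], [-k / m, -c / m]])
B = np.array([[0.0], [1.0 / m]])

# zero-order-hold discretization via the augmented matrix exponential
M = expm(np.block([[A, B], [np.zeros((1, 3))]]) * Ts)
Ad, Bd = M[:2, :2], M[:2, 2:]

# augmented state z_k = [x_k; u_{k-1}], so z_{k+1} = Az z_k + Bz u_k
Az = np.block([[Ad, Bd], [np.zeros((1, 3))]])
Bz = np.array([[0.0], [0.0], [1.0]])
Qz = np.diag([100.0, 1.0, 0.0])
Rz = np.array([[0.01]])

P = solve_discrete_are(Az, Bz, Qz, Rz)
K = np.linalg.solve(Rz + Bz.T @ P @ Bz, Bz.T @ P @ Az)   # u_k = -K z_k
print(K)
```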
Article
In this article, we propose an active/passive vibration controller for a cantilever beam using a sliding mass-spring-dashpot mechanism. The controller is placed at the free end of the beam, introducing Coriolis, inertia, and centripetal nonlinearities into the system, resulting in nonlinear coupling that may be used to quench the transient vibration of the beam. When the natural frequency of the slider is twice the fundamental beam frequency (2:1 internal resonance), the two systems will be coupled through nonlinearities that cause the oscillatory energy to be transferred back and forth between the beam and the slider. Control is achieved once the vibration of the beam is absorbed by the slider and dissipated through the slider damping. Numerical results show that this technique can improve the effective damping ratio of the structure by a factor of 15. This technique is particularly useful for reducing large-amplitude oscillations to levels that may be managed using conventional methods. Due to the nonlinearities in the system, for small or zero controller damping, chaotic transient oscillations can occur depending on the amplitude of the initial disturbance of the beam.
Article
Control of structures can be carried out conveniently by modal control, whereby the structure is controlled by controlling its modes. Modal control requires the estimation of the modal states for feedback, which can present a problem. One approach that does not require modal state estimation is direct feedback control, which implies collocated sensors and actuators. This paper examines some problems encountered in direct feedback control of distributed structures in conjunction with pole placement. A perturbation technique permits the computation of control gains for multi-input systems. The paper demonstrates that the difficulties experienced in using direct feedback in conjunction with pole placement are endemic to the approach.
Article
A nonlinear adaptive vibration absorber to control the vibrations of flexible structures is investigated. The absorber is based on the saturation phenomenon associated with dynamical systems possessing quadratic nonlinearities and a two-to-one internal resonance. The technique is implemented by coupling a second-order controller with the structure through a sensor and an actuator. Energy is exchanged between the structure and the controller and, near resonance, the structure's response saturates to a small value. Experimental results are presented for the control of a rectangular plate and a cantilever beam using piezoelectric ceramics and magnetostrictive alloys as actuators. The control technique is implemented using a digital signal processing board and a modeling software. The control strategy is made adaptive by incorporating an efficient frequency-measurement technique. This is validated by successfully testing the control strategy for a nonconventional problem, where nonlinear effects hinder the application of the nonadaptive controller.
Conference Paper
This paper presents the results of experiments using a new technique for shaping inputs to systems that vibrate. The shaped inputs move the system to the same location that was originally commanded, however, the oscillations of the machine are considerably reduced. First, an overview of the new shaping method is presented. Next, a description of the experimental apparatus is given. Lastly, the results and sample data are presented. The results demonstrate that the new shaping method performs well on machines which exhibit significant structural vibration.
Passive energy dissipation systems in structural engineering
  • T T Soong
  • G F Dargush
Feedback control systems
  • O J Smith
Intelligent active vibration control
  • M M Daniali
  • G Vossoughi
Optuna: A hyperparameter optimization framework
  • D Okanohara