
Taisuke Kobayashi
- PhD
- Assistant Professor at National Institute of Informatics
About
- 136 Publications
- 5,121 Reads
- 663 Citations
Publications (136)
Continual learning is one of the most essential abilities for autonomous agents, which must incrementally learn daily-life skills. Toward this ultimate goal, a simple but powerful method, dark experience replay (DER), has been proposed recently. DER mitigates catastrophic forgetting, in which the skills acquired in the past are unintentionally forg...
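As a rough illustration of the DER idea (the model, buffer contents, and alpha below are illustrative, not this paper's implementation), the replay term distills the network's current logits toward the logits stored when each sample was first seen:

```python
# Minimal sketch of the core DER loss, assuming a PyTorch classifier.
# `buf_x` / `buf_logits` are inputs and logits stored in a replay buffer
# when those samples were first seen; `alpha` is an illustrative weight.
import torch.nn.functional as F

def der_loss(model, x, y, buf_x, buf_logits, alpha=0.5):
    task_loss = F.cross_entropy(model(x), y)            # loss on the current task
    replay_loss = F.mse_loss(model(buf_x), buf_logits)  # match past logits
    return task_loss + alpha * replay_loss              # mitigates forgetting
```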
This paper investigates a novel nonlinear update rule based on temporal difference (TD) errors in reinforcement learning (RL). The update rule in standard RL states that the TD error is linearly proportional to the degree of updates, treating all rewards equally and without bias. On the other hand, recent biological studies have revealed that the...
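A minimal tabular sketch of the contrast; the tanh mapping below is only a stand-in nonlinearity, since the excerpt does not specify the paper's actual update rule:

```python
import numpy as np

def td_update(v, s, r, s_next, gamma=0.99, lr=0.1, scale=1.0):
    """Tabular TD(0) step. The standard rule applies the TD error linearly;
    a nonlinear variant passes it through some mapping instead. tanh is an
    illustrative placeholder for whatever nonlinearity the paper derives."""
    td_error = r + gamma * v[s_next] - v[s]
    v[s] += lr * td_error                              # standard linear update
    # v[s] += lr * scale * np.tanh(td_error / scale)   # nonlinear variant
    return td_error
```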
There has been a growing demand for microinjection using an optical microscope and micromanipulators in various fields. However, microinjection requires skilled operators, and the shortage of experts has recently become a challenge. This study proposes an individualized micromanipulation assistance system for cell rotation in microinjection. The pr...
This paper proposes self-collision avoidance for whole-body model predictive control (WB-MPC). Since WB-MPC requires solving a large-scale optimization problem, the gradients of the dynamics and cost functions must be computed quickly. To compute the gradient of the self-collision detection quickly, we create a collision detector using a deep neura...
In imitation learning, ensemble learning, in which multiple models with the same inputs and outputs reach a consensus as the average of their outputs, is effective for stabilizing behaviors. However, the average is only one possible representative value, and the others are expected to have different characteristics. This paper, therefore, investigates a genera...
This paper proposes a new design method for a stochastic control policy using a normalizing flow (NF). In reinforcement learning (RL), the policy is usually modeled as a distribution model with trainable parameters. When this parameterization lacks expressiveness, it would fail to acquire the optimal policy. A mixture model has the capability of a...
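For intuition, a single affine flow layer is sketched below (assuming PyTorch); stacking such invertible layers is what lets an NF policy exceed the expressiveness of a fixed distribution family:

```python
import torch

def affine_flow(z, log_scale, shift):
    """One affine flow layer: x = z * exp(log_scale) + shift.
    The log-determinant of the Jacobian, needed for the change of
    variables in the policy density, is log_scale summed over dims."""
    x = z * torch.exp(log_scale) + shift
    log_det = log_scale.sum(dim=-1)
    return x, log_det
```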
In reinforcement learning (RL), temporal difference (TD) error is known to be related to the firing rate of dopamine neurons. It has been observed that each dopamine neuron does not behave uniformly, but responds to the TD error in an optimistic or pessimistic manner, interpreted as a kind of distributional RL. To explain such biological dat...
The problem of uncertainty is a feature of real-world robotics problems, and any control framework must contend with it in order to succeed in real application tasks. Reinforcement learning is no different, and epistemic uncertainty arising from model uncertainty or misspecification is a challenge well captured by the sim-to-real gap. A simple solu...
Model-based reinforcement learning has attracted much attention due to its high sample efficiency and is expected to be applied to real-world robotic applications. In the real world, as unobservable disturbances can lead to unexpected situations, robot policies should be designed to improve not only control performance but also robustness. Adversarial...
Experience replay (ER) used in (deep) reinforcement learning is considered to be applicable only to off-policy algorithms. However, there have been some cases in which ER has been applied to on-policy algorithms, suggesting that off-policyness might be a sufficient condition for applying ER. This paper reconsiders a stricter “experience replayabl...
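For reference, the standard ER mechanism under discussion amounts to a buffer like the following minimal sketch (capacity and batch size are arbitrary):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay: store transitions, sample uniformly."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        return random.sample(self.buf, batch_size)
```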
There has been an increasing demand for microscopic work using optical microscopes and micromanipulators for applications in various fields. However, microinjection requires skilled operators, and the considerable shortage of experts has become a recent challenge. We overcome this challenge by proposing an assistance system based on force and visua...
Robot control using reinforcement learning has become popular, but its learning process generally terminates partway through an episode for safety and time-saving reasons. This study addresses the problem with the most popular exception handling that temporal-difference (TD) learning performs at such a termination. That is, by forcibly assuming zero va...
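The exception handling in question can be sketched as follows; distinguishing true termination from a time-limit truncation is one common remedy, shown here as an assumption rather than this paper's method:

```python
def td_target(reward, value_next, done, timeout, gamma=0.99):
    """Common practice zeroes the bootstrap at any episode end (`done`),
    which is wrong for time-limit truncations: the state is not terminal,
    so its value should still be bootstrapped. `timeout` flags that case."""
    if done and not timeout:
        return reward                    # true terminal state: no future value
    return reward + gamma * value_next   # otherwise keep bootstrapping
```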
Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful for real-world robot applications. However, the priority of maximizing the policy entropy is automatically tune...
Multi-Agent Reinforcement Learning (MARL) is a framework that utilizes reinforcement learning to simultaneously learn policies for multiple agents, such as robots, within the same environment. One concern with reinforcement learning is that stochastic behavior during learning can lead to risk for the agent. In the context of MARL, appropriately avo...
In this study, Passive Dynamics Autonomous Control (PDAC) is handled within the framework of model-based reinforcement learning to achieve footstep planning. By introducing a double support phase into PDAC, the amount of conserved quantities is adjusted to stabilize gait control. Afterwards, neural networks are used to learn the transition of the...
Recently, soft actor-critic (SAC) has been employed for robot control. Although its ability to maximize policy entropy is expected to achieve robustness to noise and perturbation in robot control, the priority of maximizing the policy entropy is automatically tuned based on an equality constraint to a lower bound. Therefore, sufficient robustness is no longer expected....
Originally, Imitating Latent Policies from Observation (ILPO) connects apparent action choices with the corresponding latent action choices, which are optimized through imitating the expert’s state trajectories, in a bijective manner. This holds only when a finite number of choices is prepared; therefore, ILPO cannot be applied to the ta...
This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to efficiently facilitate exploration in reinforcement learning. While various bonuses have been designed to date, they are analogous to the depth-first and breadth-first search algorithms in graph theory. This paper, therefore, first designs two bonu...
Sampling-based model predictive control (MPC) can be applied to versatile robotic systems. However, real-time control with it is a major challenge due to its unstable updates and poor convergence. This paper tackles this challenge with a novel derivation from reverse Kullback-Leibler divergence, which has a mode-seeking behavior and is likely to...
Deep reinforcement learning (DRL) is one of the promising approaches for introducing robots into complicated environments. The recent remarkable progress of DRL stands on regularization of policy, which allows the policy to improve stably and efficiently. A popular method, so-called proximal policy optimization (PPO), and its variants constrain den...
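The density-ratio constraint that PPO imposes is the standard clipped surrogate, sketched here in PyTorch for reference:

```python
import torch

def ppo_clip_loss(log_prob, old_log_prob, advantage, eps=0.2):
    """PPO's clipped surrogate: the density ratio pi/pi_old is kept within
    [1 - eps, 1 + eps], constraining how far the policy moves per update."""
    ratio = torch.exp(log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()
```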
Platooning of vehicles is expected to be a solution to driver shortage and traffic congestion, and thereby to reduce gas emissions. In this technique, it is critically important to ensure a good response to velocity changes of the leader vehicle, as well as robustness against uncertainty and disturbance. To this end, we propose an adaptive two deg...
This article presents a new data-driven framework for analyzing periodic physical human–robot interaction (pHRI) in latent state space. The model representing pHRI is critical for elaborating human understanding and/or robot control during pHRI. Recent advancements in deep learning technology would allow us to train such a model on a dataset collec...
Extraction of low-dimensional latent space from high-dimensional observation data is essential to construct a real-time robot controller with a world model on the extracted latent space. However, there is no established method for tuning the dimension size of the latent space automatically, making it difficult to find the necessary and sufficient dimens...
Background and problem statement
Model-free or learning-based control, in particular reinforcement learning (RL), is expected to be applied to complex robotic tasks. Traditional RL requires that the policy to be optimized be state-dependent; that is, the policy is a kind of feedback (FB) controller. Due to the necessity of correct state observa...
Personal robots contain a lot of private information about users, posing a high security risk. However, installing new biometric sensors tends to be costly, and even when they are installed, many of them authenticate a user only once, before use of the personal robot begins. For sensor-less and continuous authentication, beha...
Vibration and noise during the spin-dry process in a washing machine are the “pain points” of greatest concern. These are mainly caused by an unbalanced drum, which results from the uneven distribution of clothes in the drum of a washing machine. Until now, unbalance in a washing machine has been considered to be a mechanical-design issue focusing on...
In machine learning, transforming features into a low-dimensional latent space has advantages such as speeding up learning and suppressing overfitting. In addition, when each feature in the latent space is independent and the latent space is sparse, the overlap of information between features can be reduced. Such a sparse latent space can be useful for...
This paper addresses a new interpretation of the traditional optimization method in reinforcement learning (RL) as optimization problems using reverse Kullback–Leibler (KL) divergence, and derives a new optimization method using forward KL divergence, instead of reverse KL divergence in the optimization problems. Although RL originally aims to maxi...
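The direction of the KL divergence matters because the two orders penalize mismatches differently; a small numeric sketch (distributions chosen arbitrarily):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])  # target distribution (illustrative)
q = np.array([0.4, 0.4, 0.2])  # model distribution (illustrative)

reverse_kl = kl(q, p)  # mode-seeking: q is penalized for mass where p is small
forward_kl = kl(p, q)  # mean-seeking: q must cover all of p's support
```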
Model Predictive Control (MPC) is one of the effective control methods for complex systems such as automatic driving and robotics. As one of the MPC solvers, the cross-entropy method (CEM) is well known as the most flexible and general method. Although CEM can be applied to most systems, it requires a sufficient (theoretically infinite) number of s...
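A minimal sketch of the standard CEM planner (the hyperparameters and the `cost_fn` interface are illustrative):

```python
import numpy as np

def cem_plan(cost_fn, horizon=10, act_dim=2, iters=5, n_samples=500, n_elite=50):
    """Cross-entropy method: sample action sequences from a Gaussian,
    keep the lowest-cost elites, refit the Gaussian, and repeat.
    `cost_fn` maps a (horizon, act_dim) sequence to a scalar cost."""
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        samples = mean + std * np.random.randn(n_samples, horizon, act_dim)
        costs = np.array([cost_fn(seq) for seq in samples])
        elites = samples[np.argsort(costs)[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean[0]  # execute only the first action (receding horizon)
```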
The biomechanical energy harvester is expected to harvest electric energy from human motion. A tradeoff between harvesting energy and keeping the user’s natural movements should be balanced via optimization techniques. In previous studies, the hardware itself has been specialized in advance for a single task, like walking at constant speed...
Demand for deep reinforcement learning (DRL) is gradually increasing to enable robots to perform complex tasks, while DRL is known to be unstable. As a technique to stabilize its learning, a target network that slowly and asymptotically matches a main network is widely employed to generate stable pseudo-supervised signals. Recently, T-soft update ha...
This paper proposes a new regularization technique for reinforcement learning (RL) towards making policy and value functions smooth and stable. RL is known for the instability of the learning process and the sensitivity of the acquired policy to noise. Several methods have been proposed to resolve these problems, and in summary, the smoothness of p...
Deep reinforcement learning (DRL) is one promising approach to teaching robots to perform complex tasks. Because methods that directly reuse the stored experience data cannot follow the change of the environment in robotic problems with a time-varying environment, online DRL is required. The eligibility traces method is well known as an online lear...
As the problems to be optimized with deep learning become more practical, their datasets inevitably contain a variety of noise, such as mislabeling and substitution by estimated inputs/outputs, which would have negative impacts on the optimization results. As a safety net, it is a natural idea to improve a stochastic gradient descent (SGD) optimize...
Deep reinforcement learning (DRL) is one of the promising approaches to make robots accomplish complicated tasks. Eligibility traces are well known as an online learning technique for improving sample efficiency in traditional reinforcement learning with linear regressors, not DRL. This is because dependencies between parameters of deep neural netwo...
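For reference, the classical tabular TD(λ) update with accumulating traces looks like this:

```python
import numpy as np

def td_lambda_step(v, e, s, r, s_next, gamma=0.99, lam=0.9, lr=0.1):
    """Tabular TD(lambda) with accumulating traces: every recently visited
    state shares in the current TD error, weighted by its decayed trace."""
    e *= gamma * lam        # decay all traces
    e[s] += 1.0             # accumulate trace for the visited state
    td_error = r + gamma * v[s_next] - v[s]
    v += lr * td_error * e  # credit propagates to all traced states at once
```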
Autonomous driving has made great progress and been introduced into practical use step by step. Meanwhile, the concept of personal mobility is also gaining popularity, and its autonomous driving specialized for individual drivers is expected as a next step. However, it is difficult to collect a large driving dataset, which is basically required...
Model-based reinforcement learning is expected to be a method that can safely acquire the optimal policy under real-world conditions by using a stochastic dynamics model for planning. Since the stochastic dynamics model of the real world is generally unknown, a method for learning from state transition data is necessary. However, model learning suf...
Behavioral cloning (BC) bears a high potential for safe and direct transfer of human skills to robots. However, demonstrations performed by human operators often contain noise or imperfect behaviors that can affect the efficiency of the imitator if left unchecked. In order to allow the imitators to effectively learn from imperfect demonstrations, w...
Behavioral cloning from observation (BCO) allows the robot to learn the policy without the expert's action information. However, it requires a few interactions with the environment to infer the expert's actions, with the risk of robot failures. In addition, BCO assumes that the inferred action is accurate, causing wrong and inefficient updates of the poli...
A multi-agent system (MAS) is expected to be applied to various real-world problems where a single agent cannot accomplish given tasks. Due to the inherent complexity in the real-world MAS, however, manual design of group behaviors of agents is intractable. Multi-agent reinforcement learning (MARL), which is a framework for multiple agents in the s...
This paper presents a new data-driven framework for analyzing periodic physical human-robot interaction (pHRI) in latent state space. To elaborate human understanding and/or robot control during pHRI, the model representing pHRI is critical. Recent developments of deep learning technologies would enable us to learn such a model from a dataset colle...
This paper proposes a new reinforcement learning method with hyperbolic discounting. By combining a new temporal difference error with hyperbolic discounting in a recursive manner and a reward-punishment framework, a new scheme to learn the optimal policy is derived. In simulations, it is found that the proposal outperforms standard reinforcement learnin...
For physical human-robot interaction (pHRI) where multi-contacts play a key role, both robustness to achieve robot-intended motion and adaptability to follow human-intended motion are fundamental. However, there are tradeoffs during pHRI when their intentions do not match. This paper focuses on bipedal walking control during pHRI, which handles suc...
This paper addresses a new interpretation of reinforcement learning (RL) as reverse Kullback-Leibler (KL) divergence optimization, and derives a new optimization method using forward KL divergence. Although RL originally aims to maximize return indirectly through optimization of policy, the recent work by Levine has proposed a different derivation...
Model-free or learning-based control, in particular reinforcement learning (RL), is expected to be applied to complex robotic tasks. Traditional RL requires that the policy to be optimized be state-dependent; that is, the policy is a kind of feedback (FB) controller. Due to the necessity of correct state observation in such an FB controller, it is se...
This paper proposes a new Jacobian-based inverse kinematics (IK) explicitly considering box-constrained joint space. To control humanoid robots, the reference pose of end effector(s) is planned in task space, then mapped into the reference joints by IK. Due to the limited analytical solutions for IK, iterative numerical IK solvers based on Jacobian...
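A generic Jacobian-based IK iteration with box constraints might look like the sketch below (damped least squares with naive clipping; the paper's handling of the constraints is presumably more principled):

```python
import numpy as np

def ik_step(q, jacobian, pose_error, q_min, q_max, damping=1e-3):
    """One damped-least-squares IK iteration; the joint update is then
    clipped to the box constraints. Clipping is only the naive way to
    respect joint limits, shown here for illustration."""
    J = jacobian(q)  # task-space Jacobian at the current joint angles
    dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(J.shape[0]), pose_error)
    return np.clip(q + dq, q_min, q_max)
```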
This paper proposes a new robust update rule of target network for deep reinforcement learning (DRL), to replace the conventional update rule, given as an exponential moving average. The target network is for smoothly generating the reference signals for a main network in DRL, thereby reducing learning variance. The problem with its conventional up...
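The conventional rule being replaced is the uniform exponential moving average, sketched here assuming PyTorch parameter tensors; T-soft update's robust modification is not shown:

```python
def soft_update(target_params, main_params, tau=0.005):
    """Conventional target-network update: an exponential moving average
    that drags every parameter toward the main network at the same rate tau."""
    for tp, mp in zip(target_params, main_params):
        tp.data.mul_(1.0 - tau).add_(tau * mp.data)
```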
With the recent development of deep learning technology, model learning of low-dimensional potential dynamics embedded in high-dimensional observation data has attracted much attention for complicated robotic control in unknown environments. In previous research, this model learning is formulated as a variational lower bound maximization problem of...
Remarkable achievements by deep neural networks stand on the development of excellent stochastic gradient descent methods. Deep-learning-based machine learning algorithms, however, have to find patterns between observations and supervised signals, even though they may include some noise that hides the true relationship between them, more or less es...
In recent years, dynamical systems handled by machine learning have become remarkably larger and more complex, incurring intractable costs due to the high-dimensional raw observation space. The variational autoencoder (VAE) is one of the promising approaches to these problems since it has the capability to learn dynamical systems in a latent state space hidden...
This paper proposes a novel variational autoencoder derived from Tsallis statistics, named q-VAE. Starting from the viewpoint of Tsallis statistics, a new lower bound of the q-VAE is derived to maximize the likelihood of the sampled data, which has potential for disentangled representation learning. As another advantage of the q-VAE, it does not requ...
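The Tsallis deformation underlying the q-VAE replaces the logarithm in the likelihood bound with the q-logarithm, which recovers ln(x), and hence the standard ELBO, as q approaches 1:

```python
import numpy as np

def q_log(x, q=0.8):
    """Tsallis q-logarithm: ln_q(x) = (x^(1-q) - 1) / (1 - q) for x > 0.
    It converges to the natural logarithm in the limit q -> 1."""
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)
```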
This paper addresses how to control a robot arm with variable stiffness actuators (VSAs) using deep reinforcement learning. Each VSA has two manipulated variables, target position and stiffness, thereby slowing learning due to the enlarged action space. To avoid enlarging the action space, including deviations of the actuators’ stiffne...
Recently, autonomous driving has been developed in response to the increase in traffic accidents, traffic congestion, and tailgating. In this study, we mainly focus on shared autonomy, where an autonomous system shares control of a vehicle with a driver. The system in shared autonomy has the role of helping the driver avoid dangerous situations without interference i...
Behavioral cloning, which is one of the imitation learning methods, enables a robot to imitate an expert’s policy from the expert’s state and action demonstrations. In that case, the robot does not need to interact with the environment, thereby preventing robot failure. However, in general, it is difficult to obtain expert action information. Although...
In this paper, we propose a deep unfolding-based framework for the output feedback control of systems with input saturation. Although saturation commonly arises in several practical control systems, there is still a scarcity of effective design methodologies that can directly deal with the severe nonlinearity of the saturation operator. In this pape...
In this paper, we address sequential mobility assistance for daily elderly care through physical human–robot interaction. The goal of this work is to develop a robotic assistive system to provide physical support in daily life such as movement transition, e.g. sit-to-stand and walking. Using a mobile human support robotic platform, we propose an un...
Deep reinforcement learning (DRL) is one of the promising approaches for introducing robots into complicated environments. The recent remarkable progress of DRL stands on regularization of policy. By constraining the update of policy, DRL allows the policy to improve stably and efficiently. Among them, a popular method, named proximal policy optimi...
End-to-end reinforcement learning is a promising approach to enable robots to acquire complicated skills. However, this requires numerous samples to be implemented successfully. The issue is that it is often difficult to collect a sufficient number of samples. To accelerate learning in the field of robotics, knowledge gathered from robotics engin...
This paper proposes a new robust update rule of the target network for deep reinforcement learning, to replace the conventional update rule, given as an exponential moving average. The problem with the conventional rule is the fact that all the parameters are smoothly updated with the same speed, even when some of them are trying to update toward t...
Deep reinforcement learning (DRL) is one of the promising approaches to make robots accomplish complicated tasks. In the robotic problems with time-varying environment, online DRL is required since the methods that directly reuse the stored experience data cannot follow the change of the environment. Eligibility traces method is well known as an on...
This paper proposes a new optimizer for deep learning, named d-AmsGrad. In real-world data, noise and outliers cannot be excluded from the dataset used for learning robot skills. This problem is especially striking for robots that learn by collecting data in real time, which cannot be sorted manually. Several noise-robust optimizers have ther...
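For context, the base AMSGrad update that d-AmsGrad builds on is sketched below (the "d-" modification itself is not shown; Adam-style bias correction is assumed):

```python
import numpy as np

def amsgrad_step(theta, grad, m, v, v_hat, t, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad step: like Adam, but with a non-decreasing second-moment
    estimate v_hat, which prevents the effective step size from growing."""
    m[:] = beta1 * m + (1 - beta1) * grad
    v[:] = beta2 * v + (1 - beta2) * grad ** 2
    v_hat[:] = np.maximum(v_hat, v)       # keep the running max of v
    m_hat = m / (1 - beta1 ** t)          # bias correction (as in Adam)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
```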
A variational autoencoder (VAE) derived from Tsallis statistics called q-VAE is proposed. In the proposed method, a standard VAE is employed to statistically extract latent space hidden in sampled data, and this latent space helps make robots controllable in feasible computational time and cost. To improve the usefulness of the latent space, this l...
This paper proposes a novel variational autoencoder (VAE) derived from Tsallis statistics, named q-VAE. A vanilla VAE is utilized to statistically extract the latent space hidden in sampled data. Such a latent space is useful for making robots controllable in feasible computational time and cost. To improve the usefulness of the latent space, this paper focuses...
Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in the robotics domain. To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed. We therefore propose a new stochastic gradient optimization method, whose robustness is directly built in th...
Catastrophic forgetting is one of the most challenging problems of (deep) neural networks, but autonomous robots, which would acquire many tasks in real life sequentially, are required to resolve or mitigate it. Modular networks are expected to mitigate this problem since they can exploit different modules for respective tasks. However, this approach woul...
This paper proposes reinforcement learning with hyperbolic discounting. In general, return and its expectation, i.e., value function, are defined as cumulative rewards with exponential discounting due to mathematical simplicity. Animals, however, show behaviors that cannot be explained by the exponential discounting, but can be explained by the hyp...
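The contrast between the two discounting schemes in a few lines (gamma and k chosen arbitrarily):

```python
import numpy as np

t = np.arange(0, 50)
gamma, k = 0.95, 0.05

exp_discount = gamma ** t           # exponential: constant per-step decay rate
hyp_discount = 1.0 / (1.0 + k * t)  # hyperbolic: decay rate falls with delay
# Hyperbolic weights decay faster at first but retain more weight at long
# delays, matching animal behavior that exponential discounting cannot explain.
```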
This paper proposes an actor-critic algorithm with a policy parameterized by a student-t distribution, named student-t policy, to enhance learning performance, mainly in terms of reachability of the global optimum for the tasks to be learned. The actor-critic algorithm is one of the policy-gradient methods in reinforcement learning, and is proved to learn th...
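The core design choice, sampling actions from a heavy-tailed student-t instead of a Gaussian, can be sketched with torch.distributions (the parameters below are illustrative):

```python
import torch
from torch.distributions import StudentT, Normal

# Heavy tails of the student-t give occasional large exploratory actions,
# helping escape local optima; a Gaussian of the same scale rarely does.
t_policy = StudentT(df=3.0, loc=0.0, scale=1.0)
g_policy = Normal(loc=0.0, scale=1.0)

action = t_policy.sample()             # heavy-tailed exploration
log_prob = t_policy.log_prob(action)   # usable in policy-gradient losses
```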
This paper proposes a new motion classifier using variational deep embedding with a regularized student-t mixture model as prior, named VaDE-RT, to improve robustness to outliers while maintaining continuity in latent space. Normal VaDE uses a Gaussian mixture model, which is sensitive to outliers; furthermore, all the components of the mixture model c...
Neural networks have a critical problem, called catastrophic forgetting, where memories for tasks already learned are easily overwritten with memories for a newly learned task. This problem interferes with the continual learning required for autonomous robots, which learn many tasks incrementally from daily activities. To mitigate the catastrophic...
This paper focuses on a transition motion between bipedal walking and running, whose characteristics have been revealed through numerous biological experiments. Although hysteresis in walk-to-run and run-to-walk transitions, the amount of which is proportional to the magnitude of acceleration/deceleration, is observed, it has not been elucidated ye...