
Takayoshi Yamashita
- PhD
- Lecturer at Chubu University
About
188 Publications · 23,553 Reads · 2,543 Citations
Publications (188)
Deep learning techniques are increasingly utilized to analyze large-scale single-cell RNA sequencing (scRNA-seq) data, offering valuable insights from complex transcriptome datasets. Geneformer, a pre-trained model using a Transformer Encoder architecture and human scRNA-seq datasets, has demonstrated remarkable success in human transcriptome analy...
Person detection methods are used widely in applications including visual surveillance, pedestrian detection, and robotics. However, accurate detection of persons from overhead fisheye images remains an open challenge because of factors including person rotation and small-sized persons. To address the person rotation problem, we convert the fisheye...
Deep Reinforcement Learning (DRL) agents are expected to be applied to robotics and other fields due to their high control capabilities. However, the DRL agent model remains a black box, and the agent's decision-making is unclear. One research field that aims to solve these problems is eXplainable Reinforcement Learning...
Domestic service robots (DSRs) that support people in everyday environments have been widely investigated. However, their ability to predict and describe future risks resulting from their own actions remains insufficient. In this study, we focus on the linguistic explainability of DSRs. Most existing methods do not explicitly model the region of po...
The transparent formulation of explanation methods is essential for elucidating the predictions of neural networks, which are typically black-box models. Layer-wise Relevance Propagation (LRP) is a well-established method that transparently traces the flow of a model's prediction backward through its architecture by backpropagating relevance scores...
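As a rough illustration of how LRP's backward relevance flow works, the sketch below implements the generic epsilon-stabilized rule for linear layers in NumPy on a toy two-layer network. This is a textbook version of LRP, not the formulation studied in the paper; the network, weights, and starting relevance are made up for illustration. The key property shown is that the total relevance is (approximately) conserved from layer to layer.

```python
import numpy as np

def lrp_linear(a, W, R_out, eps=1e-9):
    """Generic LRP-epsilon rule for one linear layer: redistribute the
    output relevance R_out to the inputs a in proportion to each input's
    contribution a_j * W_jk to the pre-activation z_k."""
    z = a @ W                                        # pre-activations
    s = R_out / (z + eps * np.where(z >= 0, 1.0, -1.0))  # stabilized ratio
    return a * (W @ s)                               # input relevance

# Toy 2-layer ReLU network (weights are illustrative, not trained).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))

x = rng.normal(size=4)
h = np.maximum(x @ W1, 0)            # hidden activations
y = h @ W2                           # output logits

R_y = np.zeros(2)
R_y[y.argmax()] = y.max()            # start relevance at the top logit
R_h = lrp_linear(h, W2, R_y)         # propagate to the hidden layer
R_x = lrp_linear(x, W1, R_h)         # propagate to the input layer
print(R_x.sum(), R_y.sum())          # approximately conserved across layers
```

The conservation check at the end is the transparency property the abstract refers to: every unit of the prediction's relevance is accounted for somewhere in the input.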
Action spotting is a key component in high-level video understanding. The large number of similar frames poses a challenge for recognizing actions in videos. In this paper, we use frame saliency to represent the importance of frames, guiding the model to focus on keyframes. We propose the frame saliency weighting module to improve frame saliency...
Most video anomaly detection approaches are based on non-semantic features, which are not interpretable, and prevent the identification of anomaly causes. Therefore, we propose a caption-guided interpretable video anomaly detection framework that explains the prediction results based on video captions (semantic). It utilizes non-semantic features t...
Deep reinforcement learning (DRL) can learn an agent’s optimal behavior from the experience it gains through interacting with its environment. However, since the decision-making process of DRL agents is a black-box, it is difficult for users to understand the reasons for the agents’ actions. To date, conventional visual explanation methods for DRL...
Many studies applying neural networks to the field of education have focused on student performance prediction and the explainability of their decisions. While those studies introduced neural networks into educational settings, such networks cannot directly support student learning in place of teachers. Therefore, we present a method that uses a gener...
Data drift is a change in the distribution of data between machine learning model training and operation. It occurs regardless of the type of data and adversely affects model performance. However, conventional detection methods can only analyze drift from differences in the distribution of the output of a two-sample test, and do not...
Visual explanation is an approach for visualizing the grounds of judgment by deep learning, and it is possible to visually interpret the grounds of a judgment for a certain input by visualizing an attention map. As for deep-learning models that output erroneous decision-making grounds, a method that incorporates expert human knowledge in the model...
The excellent performance of Transformer in supervised learning has led to growing interest in its potential application to deep reinforcement learning (DRL) to achieve high performance on a wide variety of problems. However, the decision making of a DRL agent is a black box, which greatly hinders the application of the agent to real-world problems...
In orthogonal world coordinates, a Manhattan world lying along cuboid buildings is widely useful for various computer vision tasks. However, the Manhattan world has much room for improvement because the origin of pan angles from an image is arbitrary, that is, four-fold rotational symmetric ambiguity of pan angles. To address this problem, we propo...
In object detection, data amount and cost are a trade-off, and collecting a large amount of data in a specific domain is labor-intensive. Therefore, existing large-scale datasets are used for pre-training. However, conventional transfer learning and domain adaptation cannot bridge the domain gap when the target domain differs significantly from the...
Explanation generation for transformers enhances accountability for their predictions. However, there have been few studies on generating visual explanations for the transformers that use multidimensional context, such as LambdaNetworks. In this paper, we propose the Lambda Attention Branch Networks, which attend to important regions in detail and...
While convolutional neural networks (CNNs) have achieved excellent performances in various computer vision tasks, they often misclassify with malicious samples, a.k.a. adversarial examples. Adversarial training is a popular and straightforward technique to defend against the threat of adversarial examples. Unfortunately, CNNs must sacrifice the acc...
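The adversarial examples the abstract defends against are typically crafted with the Fast Gradient Sign Method (FGSM); the sketch below shows that canonical attack on a toy logistic classifier, not the paper's setup. The weights and input are hypothetical, and the gradient is computed analytically for the linear model rather than by backpropagation through a CNN.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, grad, eps=0.1):
    """Fast Gradient Sign Method: perturb the input by eps in the sign of
    the loss gradient, keeping pixel values inside [0, 1]."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy logistic classifier; weights are illustrative, not learned.
w = np.array([2.0, -1.0, 0.5])
x = np.array([0.6, 0.2, 0.9])
p = sigmoid(w @ x)                 # confidence on the true label before attack
grad = -(1.0 - p) * w              # d(-log p)/dx for true label 1, derived by hand
x_adv = fgsm(x, grad, eps=0.1)
print(sigmoid(w @ x_adv) < p)      # confidence drops after the attack
```

Adversarial training, as the abstract notes, augments training with such perturbed samples; the accuracy trade-off arises because the model must fit both clean and perturbed inputs.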
The Prototypical Part Network (ProtoPNet) is an interpretable deep learning model that combines the strong power of deep learning with the interpretability of case-based reasoning, thereby, achieving high accuracy while keeping its reasoning process interpretable without any additional supervision. Thanks to these advantages, ProtoPNet has attracte...
Deep convolutional networks have dominated advances in object detection and grasp-position estimation using computer vision. The data-collection process for these networks is, however, time-consuming and expensive. We propose an automatic data-collection method for object detection and grasp-position estimation using mobile robots and invisible mar...
Ensembles of networks trained with bidirectional knowledge distillation do not significantly outperform ensembles of networks trained without it. We think that this is because there is a relationship between the knowledge in knowledge distillation and the individuality of the networks in the ensemble. In this paper,...
Although recent learning-based calibration methods can predict extrinsic and intrinsic camera parameters from a single image, the accuracy of these methods is degraded in fisheye images. This degradation is caused by mismatching between the actual projection and expected projection. To address this problem, we propose a generic camera model that ha...
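The projection mismatch mentioned above can be made concrete with two standard camera models: the pinhole (perspective) model maps an incident angle θ to an image radius f·tan θ, while a common fisheye model (equidistant) maps it to f·θ. This sketch only contrasts those two textbook models and is not the generic camera model proposed in the paper; the focal length is an arbitrary illustrative value.

```python
import math

def perspective_radius(theta, f):
    """Pinhole projection: image radius grows as f * tan(theta),
    diverging as theta approaches 90 degrees."""
    return f * math.tan(theta)

def equidistant_radius(theta, f):
    """Equidistant fisheye projection: radius grows linearly as f * theta,
    so very wide angles still land on the sensor."""
    return f * theta

f = 100.0  # illustrative focal length in pixels
for deg in (10, 45, 80):
    th = math.radians(deg)
    print(deg, round(perspective_radius(th, f), 1),
               round(equidistant_radius(th, f), 1))
```

The growing gap between the two radii at wide angles is one way to see why a calibrator that assumes the perspective projection degrades on fisheye images.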
Data augmentation is an essential technique for improving recognition accuracy in object recognition using deep learning. Methods that generate mixed data from multiple data sets, such as mixup, can acquire new diversity that is not included in the training data, and thus contribute significantly to accuracy improvement. However, since the data sel...
Predicting the grasping point accurately and quickly is crucial for successful robotic manipulation. However, to commercially deploy a robot, such as a dishwasher robot in a commercial kitchen, we also need to consider the constraints of limited usable resources. We present a deep learning method to predict the grasp position when using a single su...
Robot navigation with deep reinforcement learning (RL) achieves higher performance and performs well in complex environments. Meanwhile, interpreting the decision-making of deep RL models has become a critical problem for the safety and reliability of autonomous robots. In this paper, we propose a visual explanation method based on an atte...
Color images are easy to understand visually and convey a great deal of information, such as color and texture. They are widely used in tasks such as segmentation. In indoor person segmentation, however, person data must be collected with privacy in mind. We propose a new task for human segmentation from invisible...
Mixup is a data augmentation method for image recognition tasks that generates data by mixing two images. Mixup randomly samples two images from the training data without considering the similarity of the data and their classes. This random sampling generates mixed samples with low similarity, which makes network training difficult and complicate...
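For reference, the vanilla mixup baseline the abstract builds on can be sketched in a few lines: blend two samples and their one-hot labels with a Beta-distributed ratio. This is the standard formulation, not the similarity-aware sampling the paper proposes; the toy "images" and the alpha value are illustrative.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Vanilla mixup: draw lambda ~ Beta(alpha, alpha) and return the
    convex combination of both inputs and both one-hot labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Two toy "images" with one-hot labels (shapes chosen for illustration).
x_a, y_a = np.ones((4, 4)), np.array([1.0, 0.0])
x_b, y_b = np.zeros((4, 4)), np.array([0.0, 1.0])
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, rng=np.random.default_rng(0))
print(y_mix)  # soft label: mass split between the two classes
```

Because the two source images are sampled uniformly at random here, nothing stops the pair from being very dissimilar, which is exactly the weakness the abstract targets.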
Understanding the environment around the vehicle is essential for automated driving technology. For this purpose, an omnidirectional LiDAR is used to obtain surrounding information, and point-cloud-based semantic segmentation methods have been proposed. However, these methods require time to acquire point cloud data and to process the point cloud...
For indoor action understanding, we have to recognize human pose and action while considering privacy. Although camera images can be used for highly accurate human action recognition, they do not preserve privacy. Therefore, we propose a new task for human instance segmentation from invisible information, especially airborne ultrasound, for a...
A robot that picks and places the wide variety of items in a logistics warehouse must detect and recognize items from images and then decide which points to grasp. Our Multi-task Deconvolutional Single Shot Detector (MT-DSSD) simultaneously performs the three tasks necessary for this manipulation: object detection, semantic segmentation, and graspi...
Trajectory forecasting to generate plausible pedestrian trajectories in crowded scenes requires an understanding of human-human social interactions. Groups of pedestrians with the social norm move along similar trajectories, while groups of pedestrians with different norms make changes to their trajectories to avoid a collision. This paper introduc...
It is difficult for people to interpret the decision-making in the inference process of deep neural networks. Visual explanation is one method for interpreting the decision-making of deep learning. It analyzes the decision-making of 2D CNNs by visualizing an attention map that highlights discriminative regions. Visual explanation for interpreting t...
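A minimal way to see how an attention map highlights discriminative regions is the class-activation-map (CAM) construction: weight the final convolutional feature maps by the classifier weights of the target class, rectify, and normalize. This is a generic 2D sketch of that idea, not the visual explanation method the paper proposes; the feature maps and weights below are random placeholders.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM-style attention map: a weighted sum of the last conv layer's
    feature maps, keeping only positive evidence and scaling to [0, 1]."""
    # features: (C, H, W) activations; fc_weights: (num_classes, C)
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)                  # discard negative evidence
    return cam / cam.max() if cam.max() > 0 else cam

# Toy example: 3 feature maps of size 4x4, 2 classes (random placeholders).
rng = np.random.default_rng(1)
feats = rng.random((3, 4, 4))
w_fc = rng.random((2, 3))
attn = class_activation_map(feats, w_fc, class_idx=0)
print(attn.shape)  # (4, 4): one saliency value per spatial location
```

Overlaying such a map on the input frame is what lets a person visually inspect which regions drove the prediction.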
Traffic light recognition is an important task for automatic driving support systems. Conventional traffic light recognition techniques are categorized into model-based methods, which frequently suffer from environmental changes such as sunlight, and machine-learning-based methods, which have difficulty detecting distant and occluded traffic lights...
Placing objects is a fundamental task for domestic service robots (DSRs). Thus, inferring the collision-risk before a placing motion is crucial for achieving the requested task. This problem is particularly challenging because it is necessary to predict what happens if an object is placed in a cluttered designated area. We show that a rule-based ap...
Mutual learning, in which multiple networks learn by sharing their knowledge, improves the performance of each network. However, the performance of ensembles of networks that have undergone mutual learning does not improve significantly from that of normal ensembles without mutual learning, even though the performance of each network has improved s...
Deep reinforcement learning (DRL) has great potential for acquiring the optimal action in complex environments such as games and robot control. However, it is difficult to analyze the decision-making of the agent, i.e., the reasons it selects the action acquired by learning. In this work, we propose Mask-Attention A3C (Mask A3C), which introduces a...
Driver Action Recognition is a key component in driver monitoring systems, which is helpful for the safety management of commercial vehicles. Compared with traditional human action recognition tasks, driver action recognition is required to be fast and accurate on embedded systems. We propose a fast and accurate driver action recognition method tha...
The static relationship between joints and the dynamic importance of joints leads to high accuracy in skeletal action recognition. Nevertheless, existing methods define the graph structure beforehand by skeletal patterns, so they cannot capture features considering the relationship between joints specific to actions. Moreover, the importance of joi...
Knowledge transfer among multiple networks using their outputs or intermediate activations have evolved through manual design from a simple teacher-student approach to a bidirectional cohort one. The major components of such knowledge transfer framework involve the network size, the number of networks, the transfer direction, and the design of the...
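One common building block of the knowledge-transfer frameworks discussed above is the soft-target distillation loss: the KL divergence between temperature-softened teacher and student outputs. The sketch below shows that standard loss only; the paper explores the broader design space (direction, cohort size, loss design), and the logits here are illustrative values.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target loss: KL(teacher || student) on temperature-softened
    distributions, scaled by T^2 so gradient magnitudes stay comparable
    as T grows."""
    p = softmax(teacher_logits, T)   # soft teacher targets
    q = softmax(student_logits, T)   # soft student predictions
    return T ** 2 * np.sum(p * (np.log(p) - np.log(q)))

teacher = np.array([3.0, 1.0, 0.2])
student = np.array([2.5, 1.2, 0.3])
print(distillation_loss(student, teacher))   # positive: outputs differ
print(distillation_loss(teacher, teacher))   # zero: identical outputs
```

In a bidirectional (cohort) setup, each network simultaneously plays teacher and student, adding this term in both directions.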
In this paper, we propose a novel point-cloud-based 3D object detection method for achieving higher accuracy in autonomous driving. Different types of objects on the road have different shapes. A LiDAR sensor can provide a point cloud including more than ten thousand points reflected from object surfaces in one frame. Recent studies show that hand...
We present a system for creating short summarized videos from longer cooking videos. These videos can be easily shared on social media to convey the required steps of a particular recipe. Typically creating such videos is time-consuming and requires video editing skills. Therefore, we propose a semi-automatic system using the information of an onli...
Path prediction methods with deep learning architectures consider the interactions of pedestrians with the feature of the surrounding physical environment. However, these methods process all pedestrian targets as a unified category, making it difficult to predict a suitable path for each category. In real scenes, both pedestrians and vehicles must...
Driver pose estimation is a key component in driver monitoring systems, which is helpful for driver anomaly detection. Compared with traditional human pose estimation, driver pose estimation is required to be fast and compact for embedded systems. We propose fast and compact driver pose estimation that is composed of ShuffleNet V2 and integral regr...
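The integral regression mentioned above replaces the hard argmax over a joint heatmap with a differentiable expectation (soft-argmax). The sketch below shows that general idea on a single toy heatmap; it is not the paper's network, and the heatmap values are fabricated for illustration.

```python
import numpy as np

def soft_argmax(heatmap):
    """Integral regression: softmax the heatmap into a probability map,
    then take the expected (x, y) coordinate. Unlike argmax, this is
    differentiable and yields sub-pixel estimates."""
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * xs).sum()), float((p * ys).sum())

# Toy heatmap strongly peaked near (x=3, y=1) on a 5x5 grid.
hm = np.full((5, 5), -5.0)
hm[1, 3] = 5.0
x, y = soft_argmax(hm)
print(round(x, 1), round(y, 1))  # close to the peak location
```

Because the output is a weighted average rather than a grid index, the joint location can be supervised directly with a coordinate loss, which suits compact embedded models.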
Domestic service robots (DSRs) are a promising solution to the shortage of home care workers. However, one of the main limitations of DSRs is their inability to interact naturally through language. Recently, data-driven approaches have been shown to be effective for tackling this limitation; however, they often require large-scale datasets, which i...
Driver drowsiness estimation is one of the important tasks for preventing car accidents. Most approaches use binary classification, determining only whether a driver is significantly drowsy. Multi-level drowsiness estimation, which detects not only significant drowsiness but also moderate drowsiness, is helpful for a safer and more comfortable car...
Automation of picking and placing the wide variety of objects stored on shelves is a challenging problem for robotic picking systems in distribution warehouses. Here, object recognition using image processing is especially effective for picking and placing a variety of objects. In this study, we propose an efficient method for object recognitio...
Convolutional neural networks (CNNs) have become a mainstream method for keypoint matching in addition to image recognition, object detection, and semantic segmentation. Learned Invariant Feature Transform (LIFT) is a pioneering CNN-based method. It performs keypoint detection, orientation estimation, and feature description in a single network. A...
We proposed a picking robot system applicable to various mixed items on shelves. The robot has a two-finger gripper that can change the opening width of its fingers. To determine the position, pose, and opening width when the gripper picks items, we proposed an efficient determination algorithm based on RGB-D sensor data. In our exp...
Prior to 2010, various image recognition tasks were handled by combining manually designed image local features (called handcrafted features) with machine learning methods. Since 2010, however, many image recognition methods that use deep learning have been proposed. The image recognition meth...