Shin'ichi Satoh

  • Doctor of Engineering
  • Professor (Full) at National Institute of Informatics

About

Publications: 500
Reads: 111,054
Citations: 9,761
Current institution
National Institute of Informatics
Current position
  • Professor (Full)

Publications (500)
Chapter
Understanding student engagement in online education is crucial for optimizing learning outcomes. This paper introduces the ECLIPSE dataset (Extended Classroom Learning Insights via Prolonged Student Engagement), comprising 10,110 annotated images from 55-minute, 30-minute, and 20-minute online lectures. Annotations include four affective states: e...
Article
Full-text available
Close-up facial images captured at short distances often suffer from perspective distortion, resulting in exaggerated facial features and unnatural/unattractive appearances. We propose a simple yet effective method for correcting perspective distortions in a single close-up face image. We first perform 3D GAN inversion using a perspective-distorted...
Chapter
Over the past decade, there has been noteworthy progress in the field of Automatic Grammatical Error Correction (GEC). Despite this growth, current GEC models possess limitations as they primarily concentrate on single sentences, neglecting the significance of contextual understanding in error correction. While a few models have begun to factor in...
Article
Full-text available
Analyzing the appearances of political figures in large-scale news archives is increasingly important given the growing availability of such archives and developments in computer vision. We present a deep learning-based method combining face detection, tracking, and classification, which is distinctive because it does not require...
Article
Not all semantics become confusing when deploying a semantic segmentation model for real-world scene understanding of adverse weather. The true semantics of most pixels have a high likelihood of appearing in the few top classes according to confidence ranking. In this paper, we replace the one-hot pseudo label with a candidate label set (CLS) that...
Article
Adversarial attacks on thermal infrared imaging expose the risk of related applications. Estimating the security of these systems is essential for safely deploying them in the real world. In many cases, realizing the attacks in the physical space requires elaborate special perturbations. These solutions are often impractical and attention-grabbing....
Preprint
We report competitive results on RobustBench for CIFAR and SVHN using a simple yet effective baseline approach. Our approach involves a training protocol that integrates rescaled square loss, cyclic learning rates, and erasing-based data augmentation. The outcomes we have achieved are comparable to those of the model trained with state-of-the-art t...
Article
Recent person Re-IDentification (ReID) systems have been challenged by changes in personnel clothing, leading to the study of Cloth-Changing person ReID (CC-ReID). Commonly used techniques involve incorporating auxiliary information (e.g., body masks, gait, skeleton, and keypoints) to accurately identify the target pedestrian. However, the effec...
Preprint
Full-text available
Certified defense methods against adversarial perturbations have been recently investigated in the black-box setting with a zeroth-order (ZO) perspective. However, these methods suffer from high model variance with low performance on high-dimensional datasets due to the ineffective design of the denoiser and are limited in their utilization of ZO t...
Preprint
Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images. However, our survey of 37 recent papers reveals that many works rely solely on automatic measures (e.g., FID) or perform poorly described human evaluations that are not reli...
Preprint
Full-text available
Close-up facial images captured at close distances often suffer from perspective distortion, resulting in exaggerated facial features and unnatural/unattractive appearances. We propose a simple yet effective method for correcting perspective distortions in a single close-up face. We first perform GAN inversion using a perspective-distorted input fa...
Preprint
Full-text available
Adversarial attacks on thermal infrared imaging expose the risk of related applications. Estimating the security of these systems is essential for safely deploying them in the real world. In many cases, realizing the attacks in the physical space requires elaborate special perturbations. These solutions are often impractical and attent...
Preprint
Recent deep metric learning (DML) methods typically leverage solely class labels to keep positive samples far away from negative ones. However, this type of method normally ignores the crucial knowledge hidden in the data (e.g., intra-class information variation), which is harmful to the generalization of the trained model. To alleviate this proble...
Preprint
Full-text available
Current approaches in Multiple Object Tracking (MOT) rely on the spatio-temporal coherence between detections combined with object appearance to match objects from consecutive frames. In this work, we explore MOT using object appearances as the main source of association between objects in a video, using spatial and temporal priors as weighting fac...
Preprint
Full-text available
Although Deep Neural Networks (DNNs) have achieved impressive results in computer vision, their exposed vulnerability to adversarial attacks remains a serious concern. A series of works has shown that by adding elaborate perturbations to images, DNNs could have catastrophic degradation in performance metrics. And this phenomenon does not only exist...
Article
Recent studies show that deep person re-identification (re-ID) models are vulnerable to adversarial examples, so it is critical to improve the robustness of re-ID models against attacks. To achieve this goal, we explore the strengths and weaknesses of existing re-ID models, i.e., designing learning-based attacks and training robust models by defe...
Preprint
Existing learning-based image inpainting methods still struggle when facing complex semantic environments and diverse hole patterns. The prior information learned from large-scale training data is still insufficient for these situations. Reference images covering the same scenes share similar texture and structure priors with t...
Preprint
Full-text available
Most deep metric learning (DML) methods employ a strategy that forces all positive samples to be close in the embedding space while keeping them away from negative ones. However, such a strategy ignores the internal relationships of positive (negative) samples and often leads to overfitting, especially in the presence of hard samples and mislabeled...
Preprint
Full-text available
Understanding foggy image sequences in driving scenes is critical for autonomous driving, but it remains a challenging task due to the difficulty in collecting and annotating real-world images of adverse weather. Recently, the self-training strategy has been considered a powerful solution for unsupervised domain adaptation, which iteratively ada...
Preprint
The large variation of viewpoint and irrelevant content around the target always hinder accurate image retrieval and its subsequent tasks. In this paper, we investigate an extremely challenging task: given a ground-view image of a landmark, we aim to achieve cross-view geo-localization by searching out its corresponding satellite-view images. Speci...
Article
Understanding foggy image sequences in driving scenes is critical for autonomous driving, but it remains a challenging task due to the difficulty in collecting and annotating real-world images of adverse weather. Recently, the self-training strategy has been considered a powerful solution for unsupervised domain adaptation, which iteratively adapts th...
Article
Full-text available
Vehicle re-identification (Re-ID) research has intensified as numerous advancements have been made along with the rapid development of person Re-ID. In this paper, we tackle the vehicle Re-ID problem in open scenarios. This research differs from the early-stage studies that focused on a certain view, and it faces more challenges due to view variati...
Preprint
Full-text available
Most computer vision systems assume distortion-free images as inputs. The widely used rolling-shutter (RS) image sensors, however, suffer from geometric distortion when the camera and object undergo motion during capture. Extensive research has been conducted on correcting RS distortions. However, most of the existing work relies heavily on the...
Article
Despite impressive progress in crowd counting over the last years, it is still an open challenge to reliably count crowds across visual domains. This paper addresses this setting, presenting an unsupervised cross-domain crowd counting framework able to perform unsupervised adaptation across domains with available unlabeled target data. We achieve t...
Preprint
Mean Average Precision (mAP) is the primary evaluation measure for object detection. Although object detection has a broad range of applications, mAP evaluates detectors in terms of the performance of ranked instance retrieval. Such an assumption about the evaluation task does not suit some downstream tasks. To alleviate the gap between downstream t...
Chapter
The pareidolia phenomenon is a discriminating characteristic of psychiatric disorders, expressed through visual illusions seen by patients. Typically, it can be diagnosed through the noise pareidolia test, which is time-consuming to both patients and experts. In this research, we propose a novel computer-assisted method to identify pareidolia pheno...
Article
There are many types of retinal disease, and accurately detecting these diseases is crucial for proper diagnosis. Convolutional neural networks (CNNs) typically perform well on detection tasks, and the attention module of CNNs can generate heatmaps as visual explanations of the model. However, the generated heatmap can only detect the most discrimi...
Article
Full-text available
The large variation of viewpoint and irrelevant content around the target always hinder accurate image retrieval and its subsequent tasks. In this paper, we investigate an extremely challenging task: given a ground-view image of a landmark, we aim to achieve cross-view geo-localization by searching out its corresponding satellite-view images. Spe...
Chapter
The e-commerce fashion industry is booming and comes with the need for proper search and recommendation. However, sufficient user personalization is still a challenging task. In this paper, we introduce a personalized fashion recommendation system based on high-dimensional input of user- and environment information. The proposed framework is used t...
Article
Vehicle counting is important for smart city applications such as logistics management, traffic estimation, and financial analysis. To perform vehicle counting using aerial images, researchers have proposed many algorithms, including detection-, regression-, and density-based methods. However, most of these algorithms are only applicable to high-re...
Article
Video frame interpolation has made great progress in estimating advanced optical flow and synthesizing in-between frames sequentially. However, frame interpolation involving various resolutions and motions remains challenging due to limited or fixed pre-trained networks. Inspired by the success of the coarse-to-fine scheme for video frame interpola...
Article
Full-text available
Image captioning can show great performance for generating captions for general purposes, but it remains difficult to adjust the generated captions for different applications. In this paper, we propose an image captioning method which can generate both imageability- and length-controllable captions. The imageability parameter adjusts the level of v...
Article
Full-text available
Unsupervised domain adaptation for person re-identification (Re-ID) suffers severe domain discrepancies between source and target domains. To reduce the domain shift caused by the changes of context, camera style, or viewpoint, existing methods in this field fine-tune and adapt the Re-ID model with augmented samples, either through translating sourc...
Preprint
This paper focuses on camouflaged object detection (COD), which is a task to detect objects hidden in the background. Most of the current COD models aim to highlight the target object directly while outputting ambiguous camouflaged boundaries. On the other hand, the performance of the models considering edge information is not yet satisfactory. To...
Article
Interpolating video frames involving large motions remains an elusive challenge. When frames involve small and fast-moving objects, conventional feed-forward neural network-based approaches that estimate optical flow and synthesize in-between frames sequentially often result in loss of motion features and thus blurred boundaries. To address...
Article
Purpose: The purpose of this study was to develop a deep learning-based computer-aided diagnosis system for skin disease classification using photographic images of patients. The targets are 59 skin diseases, including localized and diffuse diseases captured by photographic cameras, resulting in highly diverse images in terms of the appearance of the...
Article
Full-text available
Querying and retrieving relevant information still remains a difficult task, one with a relatively high cognitive cost for users, who usually focus only on the first few pages of results. This issue drives effort to support the exploration of search results through clustering and visualization. This paper contributes to this challenge by providing...
Article
Recent advances in person re-identification (re-ID) have led to impressive retrieval accuracy. However, existing re-ID models are challenged by the adversarial examples crafted by adding quasi-imperceptible perturbations. Moreover, re-ID systems face the domain shift issue that training and testing domains are not consistent. In this study, we argu...
Preprint
Full-text available
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem: we can observe only positive examples. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. However, such methods have two main drawbacks, particularly in large-scale applications: (1...
Article
Full-text available
Background: Unsupervised learning can discover various unseen abnormalities, relying on large-scale unannotated medical images of healthy subjects. Towards this, unsupervised methods reconstruct a 2D/3D single medical image to detect outliers either in the learned feature space or from high reconstruction loss. However, without considering continuit...
Preprint
Learning from implicit user feedback is challenging as we can only observe positive samples but never access negative ones. Most conventional methods cope with this issue by adopting a pairwise ranking approach with negative sampling. However, the pairwise ranking approach has a severe disadvantage in the convergence time owing to the quadratically...
Article
Full-text available
Completing a corrupted image by filling in correct structures and reasonable textures for a complex scene remains an elusive challenge. When a missing hole involves diverse semantic information, conventional two-stage approaches based on structural information often lead to unreliable structural prediction and ambiguous visual texture gener...
Preprint
Full-text available
Existing inpainting methods have achieved promising performance in recovering defected images of specific scenes. However, filling holes involving multiple semantic categories remains challenging due to the obscure semantic boundaries and the mixture of different semantic textures. In this paper, we introduce coherence priors between the semantics...
Chapter
Unsupervised learning can discover various diseases, relying on large-scale unannotated medical images of healthy subjects. Towards this, unsupervised methods reconstruct a single medical image to detect outliers either in the learned feature space or from high reconstruction loss. However, without considering continuity between multiple adjacent s...
Preprint
Solving cold-start problems is indispensable to provide meaningful recommendation results for new users and items. Under sparsely observed data, unobserved user-item pairs are also a vital source for distilling latent users' information needs. Most present works leverage unobserved samples for extracting negative signals. However, such an optimisat...
Article
Visual question answering (VQA) is a task of answering a visual question that is a pair of question and image. Some visual questions are ambiguous and some are clear, and it may be appropriate to change the ambiguity of questions from situation to situation. However, this issue has not been addressed by any prior work. We propose a novel task, reph...
Chapter
Completing a corrupted image with correct structures and reasonable textures for a mixed scene remains an elusive challenge. Since the missing hole in a mixed scene of a corrupted image often contains various semantic information, conventional two-stage approaches utilizing structural information often lead to the problem of unreliable structural p...
Article
Full-text available
Group re-identification (G-ReID) is an important yet less-studied task. Its challenges not only lie in appearance changes of individuals, but also involve group layout and membership changes. To address these issues, the key task of G-ReID is to learn group representations robust to such changes. Nevertheless, unlike ReID tasks, there still lacks c...
Conference Paper
Full-text available
Person re-identification has received much attention in the last few years, as it enhances the retrieval effectiveness in the video surveillance networks and video archive management. In this paper, we demonstrate a guiding robot with person followers system, which recognizes the follower using a person re-identification technology. It first adopts...
Chapter
Developmental dyslexia is a specific learning disability that is characterized by severe difficulties in learning to read. Amongst various supporting technologies, there are typefaces specially designed for readers with dyslexia. Although recent research shows the effectiveness of these typefaces, the visual characteristics of these typefaces that...
Preprint
Unsupervised crowd counting is a challenging yet largely unexplored task. In this paper, we explore it in a transfer learning setting where we learn to detect and count persons in an unlabeled target set by transferring bi-knowledge learnt from regression- and detection-based models in a labeled source set. The dual source knowledge of the two mo...
Article
Image classification using convolutional neural networks (CNNs) outperforms other state-of-the-art methods. Moreover, attention can be visualized as a heatmap to improve the explainability of results of a CNN. We designed a framework that can generate heatmaps reflecting lesion regions precisely. We generated initial heatmaps by using a gradient-ba...
Conference Paper
An efficient and effective person re-identification (ReID) system relieves the users from painful and boring video watching and accelerates the process of video analysis. Recently, with the explosive demands of practical applications, a lot of research efforts have been dedicated to heterogeneous person re-identification (Hetero-ReID). In this pape...
Conference Paper
Full-text available
Pedestrian detection at nighttime is a crucial and frontier problem in surveillance, but has not been well explored by the computer vision and artificial intelligence communities. Most existing methods detect pedestrians under favorable lighting conditions (e.g., daytime) and achieve promising performance. In contrast, they often fail under unst...
Conference Paper
How to find a person doing an action in a video database is a challenging problem because the result must be correct at an instance level with the specific person doing the appropriate action. Even though there have been many works about face recognition and action recognition, they often focus on only one separate task. In this paper, the problem...
Article
Most recent approaches for the zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance by using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably...
Preprint
Full-text available
Completing a corrupted image with correct structures and reasonable textures for a mixed scene remains an elusive challenge. Since the missing hole in a mixed scene of a corrupted image often contains various semantic information, conventional two-stage approaches utilizing structural information often lead to the problem of unreliable structural p...
Chapter
Image classification using deep convolutional neural networks (DCNN) performs competitively with other state-of-the-art methods. Fundus image classification into disease types is also a promising application domain of DCNN. Typically, a fundus image classifier is trained using fundus images with labels showing disease types. Such training data...
Article
Full-text available
Most person re-identification (ReID) approaches assume that person images are captured under relatively similar illumination conditions. In reality, long-term person retrieval is common, and person images are often captured under different illumination conditions at different times across a day. In this situation, the performances of existing ReID...
Preprint
Full-text available
This work aims to identify/bridge the gap between Artificial Intelligence (AI) and Healthcare sides in Japan towards developing medical AI fitting into a clinical environment in five years. Moreover, we attempt to confirm the clinical relevance for diagnosis of our research-proven pathology-aware Generative Adversarial Network (GAN)-based medical i...
Article
Full-text available
Scene text localization is a crucial step in scene text recognition. The major challenges (various sizes and shapes, unpredictable orientations, a wide range of colors and styles, occlusion, and local and global illumination variations) make the problem different from generic object detection. Unlike existing scen...
Article
Visible-infrared person re-identification (RGB-IR ReID) is extremely important for surveillance applications under poor illumination conditions. Since the difference in the feature representations not only lies in the person's pose, viewpoint, or illumination variations, but also comes from huge spectrum discrepancy, the task becomes practically...
Article
Full-text available
We propose a novel approach to identify the difficulty of visual questions for Visual Question Answering (VQA) without direct supervision or annotations to the difficulty. Prior works have considered the diversity of ground-truth answers of human annotators. In contrast, we analyze the difficulty of visual questions based on the behavior of multipl...
Book
This book constitutes the refereed proceedings of the 13th International Conference on Similarity Search and Applications, SISAP 2020, held in Copenhagen, Denmark, in September/October 2020. The conference was held virtually due to the COVID-19 pandemic. The 19 full papers presented together with 12 short and 2 doctoral symposium papers were carefu...
