
Junji Yamato
Kogakuin University · Department of Information/Communication
Dr. Eng.
About
100 Publications
13,037 Reads
3,423 Citations
Introduction
computer vision, pattern recognition, machine learning, artificial intelligence
Skills and Expertise
Additional affiliations
April 2016 - present
July 2014 - March 2016
January 2002 - June 2014
Education
September 1996 - June 1998
April 1983 - March 1990
Publications (100)
We present an application of machine-learning (ML) techniques to source selection in the optical transient survey data with Hyper Suprime-Cam (HSC) on the Subaru telescope. Our goal is to select real transient events accurately and in a timely manner out of a large number of false candidates, obtained with the standard difference-imaging method. We...
This paper targets small- to medium-sized-group face-to-face conversations where each person wears a dual-view camera, consisting of inward- and outward-looking cameras, and presents an almost fully automatic but accurate off-line gaze analysis framework that does not require users to perform any calibration steps. Our collective first-person vision...
Techniques that use nonverbal behaviors to predict turn-changing situations (such as, in multiparty meetings, who the next speaker will be and when the next utterance will occur) have been receiving a lot of attention in recent research. To build a model for predicting these behaviors, we conducted a research study to determine whether respiration cou...
In multiparty meetings, participants need to predict the end of the speaker's utterance and who will start speaking next, as well as consider a strategy for good timing to speak next. Gaze behavior plays an important role in smooth turn-changing. This article proposes a prediction model that features three processing steps to predict (I) whether tu...
This paper presents a research framework for understanding the empathy that arises between people while they are conversing. By focusing on the process by which empathy is perceived by other people, this paper aims to develop a computational model that automatically infers perceived empathy from participant behavior. To describe such perceived empa...
We propose a method for relighting fluorescent objects using a multiband image system. Fluorescence is often present in everyday articles and its optical properties make it harder to reproduce than pure reflectance. Decomposing colors into separate fluorescent and reflective components is required for accurate color reproduction. In our method, bi-...
We report the discovery of 10 supernova candidates from a transient survey with Subaru/Hyper Suprime-Cam (HSC). Our Subaru/HSC open-use observations were performed on 19 Aug 2015 UT, under poor weather conditions with 1.1-1.5 arcsec seeing. The candidates were detected in real time using a quick image subtraction system (ATel #6291). Candidate scree...
An understanding of the mechanisms involved in face-to-face communication will contribute to designing advanced video conferencing and dialogue systems. Turn-taking, the situation where the speaker changes, is especially important in multi-party meetings. For smooth turn-taking, the participants need to predict who will start speaking next and to c...
This paper extends the affective computing research field by introducing first-person vision to automatic conversation analysis. We target medium-sized-party face-to-face conversations where each person wears inward-looking and outward-looking cameras. We demonstrate that the fundamental techniques required for group gaze analysis, i.e. speaker det...
To build a model for predicting the next speaker and the start time of the next utterance in multi-party meetings, we performed a fundamental study of how respiration could be effective for the prediction model. The results of the analysis reveal that a speaker inhales more rapidly and quickly right after the end of a unit of utterance in turn-keep...
We propose a new POC (phase-only correlation)-based high-accuracy correspondence detection method for multi-channel images. Because conventional POC-based methods do not use color information, there is room to improve their detection accuracy. In the proposed method, a normalized cross spectrum (or cross-phase spectrum) and weight are calculat...
This paper experimentally shows how the center wavelength and spectral power distribution (SPD) of displayed color are related to chromostereopsis. Chromostereopsis, a visual illusion whereby the impression of depth is conveyed in two-dimensional color images, can be applied to glasses-free binocular stereopsis by controlling color saturation even wh...
To realize a conversational interface where an agent system can smoothly communicate with multiple persons, it is imperative to know how the start timing of speaking is decided. In this research, we demonstrate a relationship between gaze transition patterns and the start timing of the next utterance relative to the end of the previous utterance in multi-party m...
A novel system, called MM+Space, is presented for recreating multiparty face-to-face conversation scenes in the real world. It aims to display and play back pre-recorded conversations as if the people were talking in front of the viewer(s). This system consists of multiple projectors and transparent screens, which display the life-size faces of peop...
In multi-party meetings, participants need to predict the end of the speaker's utterance and who will start speaking next, and to consider a strategy for good timing to speak next. Gaze behavior plays an important role in smooth turn-taking. This paper proposes a mathematical prediction model that features three processing steps to predict (I) whe...
Conversational social video is becoming a worldwide trend. Video communication allows a more natural interaction, when aiming to share personal news, ideas, and opinions, by transmitting both verbal content and nonverbal behavior. However, the automatic analysis of natural mood is challenging, since it is displayed in parallel via voice, face, and...
We report on a project to archive the Kyoto Gion Festival using a high-resolution multiband imaging camera. We have been developing a two-shot six-band image capturing system for recording the color and physical properties of early modern tapestries. In an experiment, an image of a tapestry with a size of 2,700 megapixels was synthesized. The resolution...
Targeting multiparty conversations, the present study aims to elucidate how an observer tends to perceive others' emotional states and develops a computational model that automatically infers the observer's perception tendency. This paper proposes a probabilistic model that automatically discovers the correlation between perceptio...
In digital archiving for cultural heritage preservation, in the medical field, and in some industrial fields, the high-fidelity reproduction of color, gloss, texture, three-dimensional (3-D) shape, and movement is very important. Multi-spectrum imaging can provide accurate color reproduction. Although several types of multi-spectral camera systems...
This study analyzes emotions established between people while interacting in face-to-face conversation. By focusing on empathy and antipathy, especially the process by which they are perceived by external observers, this paper aims to elucidate the tendency of their perception and from it develop a computational model that realizes the automatic es...
For accurate color and spectral reflectance reproduction, we propose a novel eleven-band acquisition system using a nine-view stereo camera. The proposed system consists of eight monochrome cameras with eight different narrow band-pass filters and an RGB camera. To generate an eleven-band image, the shapes of the nine captured stereo images are tra...
We propose a high-resolution, multispectral capturing method for the digital archiving of large 3D woven cultural artifacts. In the field of digital archiving, it is important to measure, model, and represent the shape, color, and texture of a cultural artifact in high definition, capturing not only its physical appearance but also its haptic impression. Many of the decorat...
A novel enhancement for the memory-based particle filter is proposed for visual pose tracking under severe occlusions. The enhancement is the addition of a detection-based memory acquisition mechanism. The memory-based particle filter, called M-PF, is a particle filter that predicts prior distributions from past history of target state stored in me...
This paper addresses the task of mining typical behavioral patterns from small group face-to-face interactions and linking them to social-psychological group variables. Towards this goal, we define group speaking and looking cues by aggregating automatically extracted cues at the individual and dyadic levels. Then, we define a bag of nonverbal patt...
In the digital archiving for cultural heritage preservation, in the medical field, and in some industrial fields, high-fidelity color reproduction is very important. Multiband imaging technology is a solution for accurate color reproduction. Although several types of multiband camera systems have been developed, all of them are multi-shot systems a...
This paper describes a 3D measurement system with a wheel-rail, a capturing system with a multiband camera, and the 3D modeling of large woven cultural artifacts, and shows a high-resolution 3D model with multiband images.
A novel system is presented for reconstructing multiparty face-to-face conversation scenes in the real world through the use of dynamic displays that augment human head motion. This system aims to display and play back recorded conversations as if the remote people were talking in front of the viewer. It consists of multiple projectors and transpare...
This paper presents a research framework for understanding communicative emotions aroused between people while interacting in conversation. Our advance is to consider how these emotions are perceived by other people, rather than what the target's internal state really is. Because such perception is subjective, we introduce the concept of using a co...
This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an omni-direc...
We have measured large 3D woven cultural artifacts with rail-wheel 3D scanning system, and captured high-resolution images with a two-shot type 6-band image capturing system, and modeled woven cultural artifacts in 3D. This paper describes a digital archiving and large-scale visuo-haptic display of large 3D woven cultural artifacts.
For accurate color reproduction, a novel six-band stereoscopic video acquisition and visualization system using multispectral and stereo imaging has been proposed. The proposed system consists of two consumer-model digital cameras and an interference filter whose spectral transmittance is comb-shaped. In the process of generating a six-ban...
A novel system is presented for reconstructing, in the real world, multiparty face-to-face conversation scenes; it uses dynamic projection to augment human head motion. This system aims to display and play back pre-recorded conversations to the viewers as if the remote people were talking in front of them. This system consists of multiple projectors...
This work investigates a new, challenging problem: how to accurately recognize facial expressions as early as possible, whereas most work focuses on improving the recognition rate of facial expression recognition. The features of facial expressions in their early stage are unfortunately very sensitive to noise due to their low intensity. So, we p...
This paper presents a novel research framework for the estimation of emotional interactions produced between meeting participants. The types of emotional interaction targeted in this paper are empathy, antipathy, and unconcern. We define here emotional interaction as a brief contiguous event wherein a pair exchange emotional messages via verbal and...
A stereo one-shot six-band image-capturing system that combines multiband and stereo imaging techniques has been developed. This system can acquire both spectral color information and depth information at the same time. It worked well for two-dimensional objects that have a wavy structure like a tapestry. In this paper, we discuss the accuracy of c...
Multiband imaging technology for highly accurate color reproduction has been developed by NTT Communication Science Laboratories. In this article, we describe the principle of our image-capturing and color reproduction techniques using multiband images. In a recent project, our systems generated images containing over 100 megapixels (maximum: 2 gig...
A novel memory-based particle filter is proposed to achieve robust visual tracking of a target’s pose even with large variations in the target’s position and rotation, i.e. large appearance changes. The memory-based particle filter (M-PF) is a recent extension of the particle filter, and incorporates a memory-based mechanism to predict prior distributi...
We have proposed a feature extraction method that is based on category-dependent processing for the recognition of characters exhibiting both deformation and degradation. Our
In the digital archiving for cultural heritage preservation, in the medical field, and in some industrial fields, high-fidelity reproduction of color, gloss, texture, and shape is very important. Multiband or full-spectrum imaging technology is a solution for accurate color reproduction. Although several types of multiband camera systems have bee...
Recent studies in the field of human vision science suggest that the human responses to the stimuli on a visual display are non-deterministic. People may attend to different locations on the same visual input at the same time. Based on this knowledge, we propose a new stochastic model of visual attention by introducing a dynamic Bayesian network to...
This paper describes an adaptive feature extraction method that exploits category-specific information to overcome both image degradation and deformation in character recognition. When recognizing multiple fonts, geometric features such as directional information of strokes are often used but they are weak against the deformation and degradation th...
This paper proposes a novel facial expression recognizer and describes its application to group meeting analysis. Our goal is to automatically discover the interpersonal emotions that evolve over time in meetings, e.g. how each person feels about the others, or who affectively influences the others the most. As the emotion cue, we focus on facial e...
This demo presents a real-time system for analyzing group meetings. Targeting round-table meetings, this system employs an omnidirectional camera-microphone system. The goal of this system is to automatically discover "who is talking to whom and when". To that end, the face poses and positions of meeting participants are tracked on panorama images acq...
We propose a 2D display and camera arrangement for video communication systems that supports both spatial information between distant sites and user mobility. The implementation of this arrangement is called the "surrounding back screen method." The method enables users to freely come from and go into other users' spaces and provides ev...
Recent studies in signal detection theory suggest that the human responses to the stimuli on a visual display are nondeterministic. People may attend to different locations on the same visual input at the same time. Constructing a stochastic model of human visual attention would be promising to tackle the above problem. This paper proposes a new me...
This paper proposes a new method for achieving precise video segmentation without any supervision or interaction. The main contributions of this report include 1) the introduction of fully automatic segmentation based on the maximum a posteriori (MAP) estimation of the Markov random field (MRF) with graph cuts and saliency-driven priors and 2) the...
A novel particle filter, the Memory-based Particle Filter (M-PF), is proposed that can visually track moving objects that have complex dynamics. We aim to realize robustness against abrupt object movements, and quick recovery from tracking failure caused by factors such as occlusions. To that end, we eliminate the Markov assumption fr...
This report proposes a new stochastic model of visual attention to predict the likelihood of where humans typically focus on a video scene. The proposed model is composed of a dynamic Bayesian network that simulates and combines a person's visual saliency response and eye movement patterns to estimate the most probable regions of attention. Dynami...
This paper describes an adaptive feature extraction method that exploits category specific information to overcome both image degradation and deformation. When recognizing multiple fonts, geometric features such as directional information of strokes are often used but they are weak against the deformation and degradation that appear in videos and n...
This paper presents a real-time system for analyzing group meetings that uses a novel omnidirectional camera-microphone system. The goal is to automatically discover the visual focus of attention (VFOA), i.e. "who is looking at whom", in addition to speaker diarization, i.e. "who is speaking and when". First, a novel tabletop sensing device for ro...
This paper presents a novel face tracker and verifies its effectiveness for analyzing group meetings. In meeting scene analysis, face direction is an important clue for assessing the visual attention of meeting participants. The face tracker, called STCTracker (Sparse Template Condensation Tracker), estimates face position and pose by matching face...
Recent studies in signal detection theory suggest that the human responses to the stimuli on a visual display are nondeterministic. People may attend to different locations on the same visual input at the same time. To predict the likelihood of where humans typically focus on a video scene, we propose a new stochastic model of visual attention by i...
We propose a novel method for pose-invariant facial expression recognition from monocular video sequences that combines stochastic and deterministic search processes. We use a simple face model called the variable-intensity template, which can be prepared with very little time and effort. We tackle the two issues found in previous work on the varia...
In this paper, we present t-Room, the next generation video communication system we are developing. Our approach is to build rooms with identical layouts, including walls of display panels on which users and physical or virtual objects are all shown at life-size. In this way, the user space enclosed by t-Room's surrounding displays can be shared as...
We propose a novel video mediation method that immerses remote users in a virtual shared space. In the implemented system using this method, video cameras and screens surround users, and on the screens placed behind them remote users and physical or virtual objects are all shown in life-size. Unlike conventional video conferencing systems, the meth...
In this paper, we propose a method for pose-invariant facial expression recognition from monocular video sequences. The advantage of our method is that, unlike existing methods, our method uses a very simple model, called the variable-intensity template, for describing different facial expressions, making it possible to prepare a model for each per...
A novel probabilistic framework is proposed for analyzing cross-modal nonverbal interactions in multiparty face-to-face conversations. The goal is to determine "who responds to whom, when, and how" from multimodal cues including gaze, head gestures, and utterances. We formulate this problem as the probabilistic inference of the causal relationshi...
Figure 5 shows that the dimensionality of the ESA causes differences in the recommendation's effect on user decision-making. The 3D body was not always superior to the 2D body for recommendation, and on-screen agents seem to have weak points, too. Those differences cannot be explained only by the advantages or disadvantages of pointing. In the 2D wo...
The t-Room is a remote computer-supported cooperative work (CSCW) system that we are developing. Our approach is to build rooms with an identical layout, including walls of display screens on which users and physical or virtual objects are all shown at life size, and to provide symmetry of awareness and immersion in each other's physical space. T...
This paper provides an update to the January 2006 letter on "ambient intelligence", describing this year's achievements and the latest demonstration systems. The Ambient Intelligence Project (also known by the code name project Mushroom) aims to bridge the boundaries between technological fields and thus cover the entire field of communication scie...
In this paper, we demonstrate a novel poster image matching system for wireless multimedia applications. We propose a method that incorporates both color and layout information of the poster image to achieve robust performance in poster image matching. We apply both color compensation and background separation to extract a poster from an imag...
A novel method based on a probabilistic model for conversation scene analysis is proposed that can infer conversation structure from video sequences of face-to-face communication. Conversation structure represents the type of conversation, such as monologue or dialogue, and can indicate who is talking / listening to whom. This study assumes...
This paper presents experiments conducted to evaluate an automatic video editing system, founded on vision-based head tracking, that clearly conveys face-to-face multiparty conversations, such as meetings, to viewers. Systems that archive meetings and teleconferences to effectively facilitate human communication are attracting considerable interest...
A novel measure for automatically quantifying the amount of interpersonal influence present in face-to-face conversations is proposed based on the visual-attention patterns of the participants as inferred from video sequences. First, we focus on the gaze of the participants as an indicator of addressing / listening behavior and build a probabili...
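未来をさがそう (Let's Search for the Future): Contents
1. Things I wish would disappear
(keys, getting lost, forgotten and lost items, words I don't understand, crime, traffic accidents, being unable to hear or see, illness, surgery, aging, garbage)
How it has evolved (1) [Telephone]
2. Things I hope won't disappear
(friends, kanji, living things, New Year's cards, zoos, stationery, fresh food, sandy beaches, a real sense of death, a sense of the seasons)
How it has evolved (2) [Television]
3. Which is better?
(city vs. countryside, school, movie theaters, driving a car, tickets, hospitals, caring for the elderly, pain, weather forecasts, referees)
How it has evolved (3) [Computers]
4. How does it work?
(money, power plants, TV broadcasting, CD/MD/DVD, newspapers, department stores, health checkups, meetings, gas stations, farm fields, fishing, private cars, netiquette...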
A novel probabilistic framework is proposed for inferring the structure of conversation in face-to-face multiparty communication, based on gaze patterns, head directions and the presence/absence of utterances. As the structure of conversation, this study focuses on the combination of participants and their participation roles. First, we assess the...
Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (pp. 67-69). By Junji Yamato.
This paper presents an automatic video editing system based on head tracking for archiving meetings. Systems that archive meetings are attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on participants' utterances. However, conventional systems fail to adequately convey who is talki...
This paper presents an automatic video editing system based on head tracking for multiparty conversations. Archiving meetings is attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on participants' utterances. However, conventional systems fail to adequately convey to the viewers who...
A novel probabilistic framework is proposed for inferring gaze patterns and the structure of conversation in face-to-face multiparty communication, based on head directions and the presence/absence of utterances of participants. First, we define three classes of conversational regimes, which are characterized by the topology of the gaze pattern;...
This paper presents an automatic video editing system based on head tracking for multiparty conversations. Systems that record meetings and those that support teleconferences are attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on participants' utterances. However, conventional sy...
This paper compares the effect of a robot's and on-screen agent's recommendations on human decision-making using a quantitative evaluation method. We are interested in whether a robot's physical body produces some differences in the effect or not. Previous research investigated the advantage of a physical body; however, the advantage was not clarif...
We have developed a small cylindrical display for anthropomorphic agents that communicate with multiple users in a 3D environment. A previously reported cylindrical display was dark with poor contrast in the lower part of the screen because the pixel density is much lower than in the upper part. We improved the uniformity of the pixel density by us...
In our pursuit of ways to quantitatively evaluate communication between humans and robots, we recently focused on the effect of shared attention on human decision-making. We used a head robot that can make facial expressions and has human face tracking capability, and designed the interaction so that the robot often looked at the same objects the s...