Hideo Saito

Keio University | Keidai · Department of Information and Computer Science

Professor

About

610 Publications
55,152 Reads
4,641 Citations
Citations since 2016
130 Research Items
1,685 Citations
[Chart: citations since 2016, by year (2016–2022); vertical axis 0–300]

Publications (610)
Article
Full-text available
Detecting surgical tools is an essential task for analyzing and evaluating surgical videos. However, most studies focus on minimally invasive surgery (MIS) and cataract surgery. Mainly because of a lack of a large, diverse, and well-annotated dataset, research in the area of open surgery has been limited so far. Open surgery video analysis is chall...
Chapter
Recording surgery in operating rooms is one of the essential tasks for education and evaluation of medical treatment. However, recording the fields which depict the surgery is difficult because the targets are heavily occluded during surgery by the heads or hands of doctors or nurses. We use a recording system in which multiple cameras are embedded in the...
Article
Full-text available
Gesture and multimodal communication researchers typically annotate video data manually, even though this can be a very time-consuming task. In the present work, a method to detect gestures is proposed as a fundamental step towards a semi-automatic gesture annotation tool. The proposed method can be applied to RGB videos and requires annotations of...
Article
Full-text available
Multi-camera multi-person (MCMP) tracking and re-identification (ReID) are essential tasks in safety, pedestrian analysis, and so on; however, most research focuses on outdoor scenarios because it is much more complicated to deal with occlusions and misidentification in a crowded room with obstacles. Moreover, it is challenging to complete the t...
Preprint
BACKGROUND: As cervical myelopathy (CM) is a progressive disease, early detection and intervention are essential for its mitigation. While several screening methods exist, they are difficult to understand for community-dwelling people, and the equipment required to set up the test environments is expensive. Thus, a simple screening system is necessa...
Preprint
BACKGROUND: Cervical myelopathy (CM) causes several symptoms such as clumsiness of the hands, and often requires surgery. Screening and early diagnosis of CM are important because some patients are unaware of their early symptoms and consult a surgeon only after their condition has become severe. The 10-second hand grip and release (10-s) test is co...
Article
Background: Cervical myelopathy (CM) causes several symptoms such as clumsiness of the hands and often requires surgery. Screening and early diagnosis of CM are important because some patients are unaware of their early symptoms and consult a surgeon only after their condition has become severe. The 10-second hand grip and release test is commonly u...
Preprint
Full-text available
Trajectory prediction has gained great attention and significant progress has been made in recent years. However, most works rely on a key assumption that each video is successfully preprocessed by detection and tracking algorithms and the complete observed trajectory is always available. In complex real-world environments, we often encoun...
Article
Full-text available
In this paper, we propose a framework for 3D human pose estimation using a single 360° camera mounted on the user’s wrist. Perceiving a 3D human pose with such a simple setup has remarkable potential for various applications (e.g., daily-living activity monitoring, motion analysis for sports training). However, no existing method has tackled...
Preprint
We present a novel framework for motion tracking from event data using an implicit expression. Our framework uses a pre-trained event-generation MLP, named the implicit event generator (IEG), and performs motion tracking by updating its state (position and velocity) based on the difference between the observed events and the events generated from the current state estima...
Article
This work presents a method for identifying surgical field states using time-of-flight (ToF) sensors equipped with a surgical light. It is important to understand the surgical field state in a smart surgical room. In this study, we aimed to identify surgical field states by using 28 ToF sensors with a surgical light installed on each. In the experi...
Preprint
Full-text available
Diminished reality is a technology that aims to remove objects from video images and fill in the missing region with plausible pixels. Most conventional methods utilize different cameras that capture the same scene from different viewpoints to allow regions to be removed and restored. In this paper, we propose an RGB-D image inpainting method...
Article
Full-text available
Recording surgery is an important technique for education and the evaluation of medical treatments. However, capturing targets such as the surgical field, surgical tools, and the surgeon’s hands is almost impossible, since these targets are heavily occluded by the surgeon’s head and body during surgery. We used a recording system in which multipl...
Article
Full-text available
Study design: Cross-sectional study. Objective: To develop a binary classification model for cervical myelopathy (CM) screening based on a machine learning algorithm using Leap Motion (Leap Motion, San Francisco, CA), a novel noncontact sensor device. Summary of background data: Progression of CM symptoms is gradual and cannot be easily identifi...
Article
Full-text available
Due to the recent technological advances in inertial measurement units (IMUs), many applications for the measurement of human motion using multiple body-worn IMUs have been developed. In these applications, each IMU has to be attached to a predefined body segment. A technique to identify the body segment on which each IMU is mounted allows users to...
Article
There has been a substantial amount of research involving computer methods and technology for the detection and recognition of diabetic foot ulcers (DFUs), but there is a lack of systematic comparisons of state-of-the-art deep learning object detection frameworks applied to this problem. DFUC2020 provided participants with a comprehensive dataset c...
Article
Full-text available
Trajectory prediction has gained great attention and significant progress has been made in recent years. However, most works rely on a key assumption that each video is successfully preprocessed by detection and tracking algorithms and the complete observed trajectory is always available. In complex real-world environments, we often encoun...
Article
In consideration of interdependency between image sensing/recognition (computer vision, CV) and 3D image synthesis (computer graphics, CG) in visual computing, Keio University, Department of Information and Computer Science, reorganized its first course on CV and CG in the undergraduate program into a series of three courses on visual computing in...
Chapter
Object handover is a fundamental and essential capability for robots interacting with humans in many applications such as household chores. In this challenge, we estimate the physical properties of a variety of containers with different fillings such as container capacity and the type and percentage of the content to achieve collaborative physical...
Article
Full-text available
The ability to recognize and identify terrain characteristics is an essential function required for many autonomous ground robots such as social robots, assistive robots, autonomous vehicles, and ground exploration robots. Recognizing and identifying terrain characteristics is challenging because similar terrains may have very different appearances...
Article
Full-text available
Detecting surgical tools is an essential task for the analysis and evaluation of surgical videos. However, in open surgery such as plastic surgery, it is difficult to detect them because there are surgical tools with similar shapes, such as scissors and needle holders. Unlike endoscopic surgery, the tips of the tools are often hidden in the operati...
Article
Full-text available
The key to an accurate understanding of terrain is to extract the informative features from the multi-modal data obtained from different devices. Sensors, such as RGB cameras, depth sensors, vibration sensors, and microphones, are used as the multi-modal data. Many studies have explored ways to use them, especially in the robotics field. Some paper...
Chapter
Cervical myelopathy (CM) is a pathology of the spinal cord that causes upper limb disorders. CM is often screened by conducting the 10-s grip and release (G&R) test, which mainly focuses on hand dysfunction caused by CM. This test has patients repeat gripping and releasing their hands as quickly as possible. Spine surgeons observe the quickness of...
Chapter
Assistive technology is increasingly important as the senior population grows. The purpose of this study is to develop a means of preventing fatal injury by monitoring the movements of the elderly and sounding an alarm if an accident occurs. We present a method of detecting an anomaly in a first-person’s gait from an egocentric video. Followed by t...
Article
Full-text available
Purpose: Measuring range of motion (ROM) in the wrist joint is an essential part of hand and wrist functional evaluations, especially before and after surgery. However, accurate measurements require experience and time. To reduce patient and surgeon burdens related to ROM measurement, a smartphone-based system, which enables participants to measure...
Book
This book constitutes the refereed proceedings of the 17th International Conference on Virtual Reality and Augmented Reality, EuroVR 2020, held in Valencia, Spain, in November 2020. The 12 full papers were carefully reviewed and selected from 35 submissions. The papers are organized in topical sections named: Perception, Cognition and Behaviour; Tr...
Preprint
The ability to both recognize and discover terrain characteristics is an important function required for many autonomous ground robots such as social robots, assistive robots, autonomous vehicles, and ground exploration robots. Recognizing and discovering terrain characteristics is challenging because similar terrains may have very different appear...
Preprint
There has been a substantial amount of research on computer methods and technology for the detection and recognition of diabetic foot ulcers (DFUs), but there is a lack of systematic comparisons of state-of-the-art deep learning object detection frameworks applied to this problem. With recent development and data sharing performed as part of the DF...
Article
State-of-the-art methods for diminished reality propagate pixel information from a keyframe to subsequent frames for real-time inpainting. However, these approaches produce artifacts, if the scene geometry is not sufficiently planar. In this article, we present InpaintFusion, a new real-time method that extends inpainting to non-planar scenes by co...
Article
Full-text available
Human motion capture (MoCap) plays a key role in healthcare and human–robot collaboration. Some researchers have combined orientation measurements from inertial measurement units (IMUs) and positional inference from cameras to reconstruct the 3D human motion. Their works utilize multiple cameras or depth sensors to localize the human in three dimen...
Chapter
Recording surgery in operating rooms is an essential task for education and evaluation of medical treatment. However, recording the desired targets, such as the surgery field, surgical tools, or doctor’s hands, is difficult because the targets are heavily occluded during surgery. We use a recording system in which multiple cameras are embedded in t...
Chapter
Full-text available
Real-time camera pose estimation is one of the indispensable technologies for Augmented Reality (AR). While a large body of work in Visual Odometry (VO) has been proposed for AR, practical challenges such as scale ambiguities and accumulative errors still remain especially when we apply VO to large-scale scenes due to limited hardware and resources...
Chapter
Diminished reality is a technology that aims to remove objects from video images and fill in the missing region with plausible pixels. Most conventional methods utilize different cameras that capture the same scene from different viewpoints to allow regions to be removed and restored. In this paper, we propose an RGB-D image inpainting method...
Article
Full-text available
This paper presents a method for estimating the six Degrees of Freedom (6DoF) pose of texture-less primitive-shaped objects from depth images. As the conventional methods for object pose estimation require rich texture or geometric features on the target objects, these methods are not suitable for texture-less and geometrically simple shaped object...
Chapter
Estimating accurate depth from an RGB image in any environment is a challenging task in computer vision. Recent learning-based methods using deep Convolutional Neural Networks (CNNs) have produced plausible results, but these conventional methods are not good at estimating scenes that involve a pure rotation of the camera, such as in-plane rolling. This mov...
Article
Full-text available
Supplemental Digital Content is available in the text.
Article
Point cloud registration is a key problem for the robotics and computer vision communities. It consists of estimating a rigid transform that aligns one point cloud to another. Iterative closest point (ICP) is a well-known classical method for this problem, yet it generally achieves high alignment only when the source and template point clouds are most...
Article
Full-text available
We present a novel 3D reconstruction system that can generate a stable triangle mesh using data from multiple RGB-D sensors in real time for dynamic scenes. The first part of the system uses moving least squares (MLS) point set surfaces to smooth and filter point clouds acquired from RGB-D sensors. The second part of the system generates triangle m...
Article
Background: An individual's gait is a key factor for consideration in evaluating their overall health. Several medical studies have demonstrated the correlation between gait and incidence rate of diseases, mortality, and risk of fall. However, gait is only occasionally evaluated during medical visits, which may delay the detection of health proble...
Conference Paper
Full-text available
In this paper, we propose a method to predict the future shot direction in a tennis match using pose information and player position. As far as we know, there is no work that deals with such a predictive task, so there is no shot direction dataset as yet. Therefore, using a YouTube tennis match video, we construct a time of impact and shot directi...
Conference Paper
The second ACM International Workshop on Multimedia Content Analysis in Sports (ACM MMSports'19) is held in Nice, France on October 25th, 2019 co-located with the ACM International Conference on Multimedia 2019 (ACM Multimedia 2019). The goal of this workshop is to bring together researchers and practitioners from academia and industry to address c...
Chapter
Full-text available
Superquadrics are one of the ideal shape representations for adapting various kinds of primitive shapes with a single equation. This paper revisits the task of representing a 3D human body with multiple superquadrics. As a single superquadric surface can only represent symmetric primitive shapes, we present a method that segments the human body int...
Chapter
Recent semantic segmentation systems have achieved significant improvement by performing pixel-wise training with hierarchical features using deep convolutional neural network models. While the learning process usually requires pixel-level annotated images, it is difficult to get desirable amounts of fine-labeled data and thus the training set size...
Article
This paper presents a method to estimate a time-sequential trajectory of the center of mass (CoM) of an athlete from a multi-view set of cameras. Collecting the CoM typically requires large-scale measuring systems or attaching sensors to the athletes. To mitigate such hardware limitations, the present study takes a multi-view video-based approach....
Preprint
This work addresses the task of open world semantic segmentation using RGBD sensing to discover new semantic classes over time. Although there are many types of objects in the real world, current semantic segmentation methods make a closed world assumption and are trained only to segment a limited number of object classes. Towards a more open world...
Preprint
Full-text available
We present DetectFusion, an RGB-D SLAM system that runs in real-time and can robustly handle semantically known and unknown objects that can move dynamically in the scene. Our system detects, segments and assigns semantic class labels to known objects in the scene, while tracking and reconstructing them even when they move independently in front of...
Conference Paper
Today, cameras have become smaller and cheaper and can be used in a variety of settings. We took advantage of this to develop a thumb-tip wearable device that estimates the joint angles of the thumb, since measuring human finger posture is important for human-computer interfaces and for analyzing human behavior. The device we developed consists of three small...