Keiji Yanai

University of Electro-Communications | UEC

About

229
Publications
46,998
Reads
4,925
Citations

Publications (229)
Chapter
Semantic segmentation models require a large number of images with pixel-level annotations for training, which is costly. In this study, we propose a method called StableSeg that infers region masks for any classes, without the need for additional training, by using an image synthesis foundation model, Stable Diffusion, pre-trained with five bil...
Preprint
Full-text available
Recent one-stage transformer-based methods achieve notable gains in the Human-Object Interaction (HOI) detection task by leveraging the detection capability of DETR. However, current methods redirect the detection target of the object decoder, and the box target is not explicitly separated from the query embeddings, which leads to long and difficult training....
Chapter
Virtual fitting, in which the clothing in a person’s image is changed to arbitrary clothing, is expected to be applied to shopping sites and videoconferencing. In real-time virtual fitting, image-based methods using a knowledge distillation technique can generate high-quality fitting images by inputting only the image of arbitrary clothing and a person wi...
Chapter
Cross-modal recipe retrieval aims to exploit the relationships between recipe images and texts and to accomplish mutual retrieval, which is straightforward for humans but arduous to formulate. Although many previous works have endeavored to solve this problem, most did not efficiently exploit the cross-modal information among recipe data. In this paper, we pres...
Article
Full-text available
The field of Neural Style Transfer (NST) has led to interesting applications that enable us to transform reality as human beings perceive it. Particularly, NST for material translation aims to transform the material of an object into that of a target material from a reference image. Since the target material (style) usually comes from a different o...
Preprint
Full-text available
In this paper, we present a cross-modal recipe retrieval framework, Transformer-based Network for Large Batch Training (TNLBT), which is inspired by ACME~(Adversarial Cross-Modal Embedding) and H-T~(Hierarchical Transformer). TNLBT aims to accomplish retrieval tasks while generating images from recipe embeddings. We apply the Hierarchical Transform...
Chapter
Existing methods for video synthesis have succeeded in generating higher-quality videos by using guide information such as human pose skeletons, segmentation masks, and optical flows as auxiliary information. Some existing video generation methods for human motion adopt a two-step video generation consisting of generation of pose sequences and vi...
Preprint
Full-text available
Human-object interaction (HOI) detection as a downstream of object detection tasks requires localizing pairs of humans and objects and extracting the semantic relationships between humans and objects from an image. Recently, one-stage approaches have become a new trend for this task due to their high efficiency. However, these approaches focus on d...
Article
Recent works on real-time semantic segmentation remove decoders from, or attach lightweight decoders to, dense deep neural networks to achieve fast inference speed. This strategy helps achieve real-time performance; however, accuracy is significantly compromised in comparison to non-real-time methods. In this paper, we introduce two key modules aimed to d...
Article
Full-text available
Vision-induced gustatory manipulation interfaces can help people with dietary restrictions feel as if they are eating what they want by modulating the appearance of the alternative foods they are eating in reality. However, it is still unclear whether vision-induced gustatory change persists beyond a single bite, how the sensation changes over time...
Chapter
In recent years, multi-task learning (MTL) for image translation tasks has been actively explored. For MTL image translation, a network consisting of a shared encoder and multiple task-specific decoders is commonly used. In this case, half of the network is task-specific, which brings a significant increase in the number of parameters when t...
Chapter
A dog that assists rescue activities at the scenes of disasters such as earthquakes and landslides is called a “disaster rescue dog” or simply a “rescue dog”. In Japan, where earthquakes happen frequently, a research project on “Cyber-Rescue” is being organized for more efficient rescue activities. In the project, to analyze the activities of rescue dog...
Chapter
Currently, many segmentation image datasets are open to the public. However, only a few open segmentation datasets of food images exist. Among them, UEC-FoodPix is a large-scale food image segmentation dataset which consists of 10,000 food images with segmentation masks. However, it contains some incomplete mask images, because most of the se...
Chapter
The field of Neural Style Transfer (NST) has led to interesting applications that enable us to transform reality as human beings perceive it. Particularly, NST for material translation aims to change the material (texture) of an object to a different material taken from a desired image. In order to generate more realistic results, in this paper, we propos...
Preprint
Full-text available
In the research community of continuous hand gesture recognition (HGR), the currently available public datasets lack the real-world elements needed to build responsive and efficient HGR systems. In this paper, we introduce a new benchmark dataset named IPN Hand, with sufficient size, variation, and real-world elements to train and evaluate deep neu...
Preprint
Full-text available
In this paper, we tackle a challenging domain conversion task between photo and icon images. Although icons often originate from real object images (i.e., photographs), severe abstractions and simplifications are applied to generate icon images by professional graphic designers. Moreover, there is no one-to-one correspondence between the two domain...
Chapter
Unsupervised image-to-image translation such as CycleGAN has received considerable attention in recent research. However, when handling large images, the generated images are not of good quality. Progressive Growing GAN has proved that progressively growing a GAN can generate high-resolution images. However, if we simply combine PG-metho...
Chapter
Continual learning trains a single identical network on multiple tasks sequentially. In general, naive continual learning brings severe catastrophic forgetting. To prevent it, several continual learning methods for Deep Convolutional Neural Networks (CNN) have been proposed so far, most of which aim at image classification tasks. In th...
Chapter
We usually predict how objects will move in the near future in our daily lives. However, how do we predict? In this paper, to address this problem, we propose a GAN-based network to predict the near future for fluid object domains such as cloud and beach scenes. Our model takes one frame and predicts future frames. Inspired by the self-attention mec...
Chapter
Images generated by the Cycle-Consistent Adversarial Network (CycleGAN) become blurry, especially in areas with complex edges, because edge information is lost in the downsampling of the encoders. To solve this problem, we design a new model called ED-CycleGAN based on the original CycleGAN. The key idea is using a pre-trained encoder: training an Encoder-Decod...
Article
Full-text available
The objective of this paper was the development of a content-based image retrieval system using siamese and triplet convolutional neural networks. These networks were used to generate visual descriptors, extracting semantic information from two images (siamese) or three images (triplet) at the same time. Then, similarity learning was performed, encod...
Preprint
To minimize the annotation costs associated with the training of semantic segmentation models, researchers have extensively investigated weakly-supervised segmentation approaches. In the current weakly-supervised segmentation methods, the most widely adopted approach is based on visualization. However, the visualization results are not generally eq...
Conference Paper
Full-text available
We have been studying augmented reality (AR)-based gustatory manipulation interfaces and previously proposed a gustatory manipulation interface using generative adversarial network (GAN)-based real time image-to-image translation. Unlike three-dimensional (3D) food model-based systems that only change the color or texture pattern of a particular ty...
Conference Paper
To estimate food calories accurately from food images, accurate food image segmentation is needed. So far, no large-scale food image segmentation dataset with pixel-wise labels exists. In this paper, we added segmentation masks to the food images in the existing dataset, UEC-Food100, semi-automatically. To estimate segmentation masks, we revis...
Conference Paper
Some recent smartphones, such as the iPhone XS, have a pair of cameras on their backside which can be used as a stereo camera. On iPhones running iOS 11 or later, the official API provides a function to estimate depth information from the two backside cameras in real time. By taking advantage of this function, we have developed an iOS app, "DepthCalo...
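As a hedged illustration of why per-pixel depth matters for size (and hence calorie) estimation: under a pinhole camera model, a pixel at depth Z metres with focal length f (in pixels) covers roughly (Z / f) metres per side. The function name and example values below are hypothetical, not taken from the app described above.

```python
# Minimal sketch, assuming a pinhole camera model: convert a
# segmented food region's pixel count into real-world area using
# a depth estimate. Names and numbers are illustrative only.

def pixel_area_to_real_area(pixel_count: int, depth_m: float, focal_px: float) -> float:
    """Approximate real-world area (in square metres) of a region."""
    metres_per_pixel = depth_m / focal_px   # side length one pixel covers at depth Z
    return pixel_count * metres_per_pixel ** 2

# Example: a 10,000-pixel region at 0.5 m depth with f = 1000 px.
area_m2 = pixel_area_to_real_area(10_000, depth_m=0.5, focal_px=1000.0)
print(round(area_m2 * 1e4, 2))  # area in cm^2 -> 25.0
```

Real-world area can then be mapped to serving size per food category, which is the kind of quantity a calorie estimator needs.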
Conference Paper
In recent years, thanks to the development of generative adversarial networks (GAN), it has become possible to generate food images. However, the quality is still low and it is difficult to generate appetizing and delicious-looking images. In the latest GAN study, StyleGAN enabled high-level feature separation and stochastic variation of generated...
Conference Paper
Provides a summary and overview of the 5th International Workshop on Multimedia Assisted Dietary Management.
Conference Paper
In recent years, a large number of images are being posted on SNS. The users often synthesize or modify their photos before uploading them. However, the task of synthesizing and modifying photos requires a lot of time and skill. In this demo, we demonstrate easy and fast image synthesis and modification through "sketch-based food image generation''...
Chapter
Full-text available
The surveillance of the Aedes aegypti and Aedes albopictus mosquitoes to prevent the spread of the arboviruses that cause Dengue, Zika and Chikungunya is becoming more important, because these diseases have major repercussions on public health across a significant extent of the world. Mosquito larvae identification methods require special equipment, skillfu...
Article
To minimize the annotation costs associated with training semantic segmentation models and object detection models, weakly supervised detection and weakly supervised segmentation approaches have been extensively studied. However, most of these approaches assume that the training and testing domains are the same, which at times results in consi...
Article
In recent years, a rise in healthy eating has led to various food management applications which have an image recognition function to record everyday meals automatically. However, most of the image recognition functions in the existing applications are not directly applicable to multiple-dish food photos and cannot automatically estimate food calories. M...
Chapter
In recent years, deep learning has attracted attention not only as a method for image recognition but also as a technique for image generation and transformation. Above all, a method called Style Transfer, which can integrate two photos into one with respect to their content and style, is drawing much attention. Although many extended works...
Chapter
In this paper, we study font generation and conversion. The previous methods dealt with characters as objects made of strokes. In contrast, we extract features, which are equivalent to the strokes, from font images and texture or pattern images using deep learning, and transform the design pattern of font images. We expect that generation of...
Conference Paper
Full-text available
We propose a novel gustatory manipulation interface which utilizes the cross-modal effect of vision on taste elicited with augmented reality (AR)-based real-time food appearance modulation using a generative adversarial network (GAN). Unlike existing systems which only change color or texture pattern of a particular type of food in an inflexi...
Conference Paper
In most cases, the estimated calories are simply associated with the estimated food categories, or with the relative size compared to the standard size of each food category, which is usually provided manually by a user. In addition, in the case of calorie estimation based on the amount of a meal, a user conventionally needs to register a size-known...
Conference Paper
In recent years, a large number of food photos are being posted globally on SNS. To obtain many views or "likes", attractive photos should be posted. However, some casual foods are served with utensils on a plate or in a bowl at restaurants, which spoils the attractiveness of meal photos. Especially in Japan where ramen noodle is the most popular casual f...
Conference Paper
In this demo, we demonstrate "Real-time Food Category Change" based on a Conditional CycleGAN (cCycleGAN) with large-scale food image data collected from the Twitter stream. The Conditional CycleGAN is an extension of CycleGAN which enables "Food Category Change" among ten kinds of typical foods served in bowl-type dishes, such as beef rice bowl...
Article
Weakly supervised segmentation has drawn considerable attention, because of the high costs associated with the creation of pixel-wise annotated image datasets that are used for training fully supervised segmentation models. We propose a weakly supervised semantic segmentation method using CNN-based class-specific saliency maps and fully connected C...
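To make the weak-supervision idea above concrete, here is a minimal sketch (not the paper's actual pipeline) of converting class-specific saliency maps into a pseudo pixel-label map, which a segmentation model could then be trained on; the function name and the background threshold are illustrative assumptions:

```python
import numpy as np

# Hedged sketch: class-specific saliency maps -> pseudo labels.
# A pixel gets the class whose saliency is strongest; pixels whose
# strongest response falls below a threshold become background.

def saliency_to_pseudo_mask(saliency: np.ndarray, bg_thresh: float = 0.3) -> np.ndarray:
    """saliency: (num_classes, H, W) maps in [0, 1].
    Returns (H, W) labels: 0 = background, k + 1 = class k."""
    best = saliency.max(axis=0)           # strongest class response per pixel
    labels = saliency.argmax(axis=0) + 1  # 1-based class labels
    labels[best < bg_thresh] = 0          # weak responses become background
    return labels

maps = np.array([[[0.9, 0.1],
                  [0.2, 0.8]],
                 [[0.1, 0.2],
                  [0.6, 0.1]]])
print(saliency_to_pseudo_mask(maps).tolist())  # [[1, 0], [2, 1]]
```

In the paper's setting, a fully connected CRF would then refine such coarse pseudo-masks using image appearance.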
Conference Paper
In recent years, a rise in healthy eating has led to various food management applications, which have image recognition to automatically record meals. However, most image recognition functions in existing applications are not directly useful for multiple-dish food photos and cannot automatically estimate food calories. Meanwhile, methodologies on i...
Conference Paper
This paper describes "food image transformation" based on a conditional CycleGAN (cCycleGAN) with large-scale food image data collected from the Twitter stream. A cCycleGAN is an extension of CycleGAN which enables "food category transfer" among 10 types of foods while retaining the shape of a given food. We experimentally show that 200 and 30,00...
Conference Paper
Recently, image generation by Deep Convolutional Neural Network has been studied widely by many researchers. In this paper, we describe CNN-based image generation on food images. Especially, we focus on image generation using conditional Generative Adversarial Network (cGAN) with a large-scale dataset. In the experiments, we trained cGAN with a "ra...
Article
Recently, mobile applications for recording everyday meals have drawn much attention for self-dietary management. However, most of the applications return food calorie values simply associated with the estimated food categories, or require users to indicate the rough amount of food manually. In fact, it has not yet been achieved to estimate food calories from a food p...
Article
Full-text available
Bottom-up and top-down visual cues are two types of information that help visual saliency models. These salient cues can come from spatial distributions of the features (space-based saliency) or from contextual/task-dependent features (object-based saliency). Saliency models generally incorporate salient cues in either a bottom-up or a top-down norm sepa...
Conference Paper
Image-based food calorie estimation is crucial to diverse mobile applications for recording everyday meals. However, some of them need human help for calorie estimation, and even when it is automatic, food categories are often limited or images from multiple viewpoints are required. Thus, it has not yet been achieved to estimate food calories with practical...
Conference Paper
In this paper, we propose a conditional fast neural style transfer network. We extend the network proposed as a fast neural style transfer network by Johnson et al. [1] so that the network can learn multiple styles at the same time. To do that, we add a conditional input which selects a style to be transferred out of the trained styles. In addition...
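The conditional input described above can be sketched as a one-hot style selector broadcast over the feature map and concatenated as extra channels, so a single transfer network can switch among its trained styles. This is an illustrative assumption about the mechanism, not the paper's actual code; all names and shapes below are hypothetical.

```python
import numpy as np

# Minimal sketch of a conditional input for multi-style transfer:
# a one-hot vector selecting one trained style is tiled spatially
# and concatenated to an (H, W, C) feature map.

def add_style_condition(features: np.ndarray, style_id: int, num_styles: int) -> np.ndarray:
    """Concatenate one-hot style channels to an (H, W, C) feature map."""
    h, w, _ = features.shape
    onehot = np.zeros(num_styles, dtype=features.dtype)
    onehot[style_id] = 1.0
    cond = np.broadcast_to(onehot, (h, w, num_styles))  # tile over all positions
    return np.concatenate([features, cond], axis=-1)

feat = np.ones((4, 4, 8), dtype=np.float32)
out = add_style_condition(feat, style_id=2, num_styles=5)
print(out.shape)  # (4, 4, 13)
```

The downstream convolutional layers then see which style was requested at every spatial location, which is one common way to condition a fully convolutional network.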
