Article

Data Set Production and Evaluation for Semantic Segmentation of 3D CG Images by H.265/HEVC


Abstract

One goal of research on image segmentation is to judge whether an object can be distinguished from the background region. Thus far, we have studied multi-view 3D CG image quality assessment under coded degradation applied to the object or background region, using an eight-viewpoint parallax barrier method. This yielded sufficient knowledge for subjective quality assessment; objective quality assessment, however, relied on estimation by data mining. Meanwhile, as deep learning has spread, deep convolutional neural networks have become comparatively easy to use. In this paper, we produce, propose, evaluate, and discuss an appropriate data set and platform for efficient semantic segmentation of 3D CG images encoded and decoded by H.265/HEVC.
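A core step in evaluating such a data set is scoring a segmentation of a decoded frame against its ground-truth label map. The following is a minimal sketch (not the paper's code) of per-class pixel accuracy; the toy 4×4 label maps and the two-class object/background labeling are illustrative assumptions.

```python
import numpy as np

def pixel_accuracy(pred, label, num_classes):
    """Per-class pixel accuracy between a predicted and a ground-truth
    label map (2-D integer arrays with values in [0, num_classes))."""
    accs = []
    for c in range(num_classes):
        mask = (label == c)
        if mask.sum() == 0:
            continue  # class absent from this frame
        accs.append((pred[mask] == c).mean())
    return float(np.mean(accs))

# Toy 4x4 label maps standing in for decoded-frame annotations:
# class 0 = background, class 1 = object.
gt   = np.array([[0, 0, 1, 1]] * 4)
pred = np.array([[0, 0, 1, 0]] * 4)
print(pixel_accuracy(pred, gt, num_classes=2))  # 0.75
```

Averaging per class rather than over all pixels keeps a large background region from masking errors on a small object, which matters when coded degradation is applied to only one of the two regions.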

Article
Weeds are among the major factors that could harm crop yield. Site-specific weed management has become an effective tool to control weeds, and machine vision combined with image processing is an effective approach for weed detection. In this work, an encoder-decoder deep learning network was investigated for pixel-wise semantic segmentation of crop and weed. Different input representations, including different color space transformations and color indices, were compared to optimize the input of the network. Three image enhancement methods were investigated to improve model robustness against different lighting conditions. The results show that for images without enhancement, color space transformations and vegetation indices without NIR (Near Infrared) information did not improve the segmentation results, while inclusion of NIR information significantly improved the segmentation accuracy, indicating the effectiveness of NIR information for precise segmentation under weak lighting conditions. Image enhancement improved the image quality and consequently the robustness of segmentation models against different lighting conditions. The best MIoU value for pixel-wise segmentation was 88.91% and the best mean accuracy of object-wise segmentation was 96.12%. The deep network and image enhancement methods applied in this work provided promising segmentation results for weed detection and did not need a large amount of data for model training, which is suitable for site-specific weed management.
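The MIoU figure reported above is mean Intersection-over-Union across classes. A minimal numpy sketch of the metric (the toy crop/weed label maps are illustrative assumptions, not the paper's data):

```python
import numpy as np

def mean_iou(pred, label, num_classes):
    """Mean Intersection-over-Union over classes present in either map."""
    ious = []
    for c in range(num_classes):
        p, l = (pred == c), (label == c)
        union = np.logical_or(p, l).sum()
        if union == 0:
            continue  # class absent entirely; skip rather than count as 0
        ious.append(np.logical_and(p, l).sum() / union)
    return float(np.mean(ious))

gt   = np.array([[1, 1, 0, 0]] * 2)   # toy labels: 1 = weed, 0 = crop
pred = np.array([[1, 0, 0, 0]] * 2)
print(mean_iou(pred, gt, num_classes=2))  # (4/6 + 2/4) / 2 = 7/12
```

Unlike raw pixel accuracy, IoU penalizes both missed weed pixels and false weed detections, which is why it is the standard score for this task.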
Article
Scene classification and semantic segmentation are two important research directions in computer vision. They are widely used in research on automatic driving and human-computer interaction. The purpose of scene classification is to use image classification to determine the category of the scene in an image by analyzing the background and the target object, while semantic segmentation aims to classify the image at the pixel level and mark the position and semantic information of each scene unit. In this paper, we aimed to train the semantic segmentation neural network in different scenarios to obtain models matching the scene categories, which were then used to process the images. In the actual test, the semantic segmentation dataset was first divided into three categories based on the scene classification algorithm. Then the semantic segmentation neural network was trained under the three scenarios, and three semantic segmentation network models were obtained accordingly. To test our method, the trained segmentation models were applied to other images, and the results obtained from scene-aware semantic segmentation were much better than those from semantic segmentation without considering categories. Our study provides an essential improvement to semantic segmentation by taking category information into consideration, which will be helpful for obtaining more precise models for further image analysis.
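The scene-aware pipeline above amounts to a dispatch: classify the scene first, then run the segmentation model trained for that category. A minimal sketch under stated assumptions; the brightness-based "classifier" and threshold "models" are toy stand-ins for the trained networks, not the paper's method.

```python
def segment_scene_aware(image, classify_scene, models):
    """Run the per-scene segmentation model chosen by the scene classifier."""
    category = classify_scene(image)
    return models[category](image)

# Toy stand-ins: the "classifier" keys on mean brightness, and each "model"
# just thresholds the image at a scene-specific level.
models = {
    "indoor":  lambda img: [[1 if px > 50 else 0 for px in row] for row in img],
    "outdoor": lambda img: [[1 if px > 150 else 0 for px in row] for row in img],
}

def classify(img):
    mean = sum(map(sum, img)) / (len(img) * len(img[0]))
    return "outdoor" if mean > 100 else "indoor"

image = [[200, 40], [180, 90]]
print(segment_scene_aware(image, classify, models))  # [[1, 0], [1, 0]]
```

The design point is that each per-category model only has to fit one scene distribution, which is why the paper reports better results than a single category-agnostic model.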
Article
The liver is a common site for the development of primary (i.e., originating from the liver, e.g., hepatocellular carcinoma) or secondary (i.e., spread to the liver, e.g., colorectal cancer) tumors. Due to its complex background and heterogeneous, diffusive shape, automatic segmentation of liver tumors remains a challenging task. So far, only interactive methods have been able to obtain acceptable segmentation results for liver tumors. In this paper, we design an Attention Hybrid Connection Network (AHCNet) architecture which combines soft and hard attention mechanisms with long and short skip connections. We also propose a cascade network based on a liver localization network, a liver segmentation network, and a tumor segmentation network to cope with this challenge. A joint dice loss function is proposed to train the liver localization network to obtain an accurate 3D liver bounding box, and focal binary cross-entropy is used as a loss function to fine-tune the tumor segmentation network to detect more potentially malignant tumors and reduce false positives. Our framework is trained on 110 cases from the LiTS dataset and extensively evaluated on 20 cases from the 3DIRCADb dataset and 117 cases from a clinical dataset, which indicates that the proposed method achieves faster network convergence and accurate semantic segmentation, further demonstrating its good clinical value.
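Both loss functions named above have standard forms that can be sketched directly. A minimal numpy version (the soft Dice and focal-BCE formulas are the commonly used definitions, not AHCNet's exact implementation, and the four-voxel toy probabilities are illustrative assumptions):

```python
import numpy as np

def dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P.T| / (|P| + |T|), with eps for stability."""
    inter = (prob * target).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps))

def focal_bce(prob, target, gamma=2.0, eps=1e-6):
    """Focal binary cross-entropy: BCE down-weighted by (1 - p_t)^gamma,
    so easy (confident, correct) voxels contribute little to the loss."""
    p = np.clip(prob, eps, 1.0 - eps)
    pt = np.where(target == 1, p, 1.0 - p)   # probability of the true class
    return float((-((1.0 - pt) ** gamma) * np.log(pt)).mean())

prob   = np.array([0.9, 0.8, 0.2, 0.1])  # toy predicted tumor probabilities
target = np.array([1.0, 1.0, 0.0, 0.0])  # toy voxel-level ground truth
print(dice_loss(prob, target), focal_bce(prob, target))
```

Dice handles the class imbalance between tumor and background volume, while the focal term concentrates gradient on hard, ambiguous voxels; the paper uses the former for localization and the latter for fine-tuning tumor segmentation.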
Conference Paper
The automatic digitizing of paper maps is a significant and challenging task for both academia and industry. As an important procedure in map digitizing, the semantic segmentation step mainly relies on manual visual interpretation with low efficiency. In this study, we select urban planning maps as a representative sample and investigate the feasibility of using a U-shaped fully convolutional architecture to perform end-to-end map semantic segmentation. The experimental results obtained from the test area in the Shibuya district, Tokyo, demonstrate that our proposed method can achieve a very high Jaccard similarity coefficient of 93.63% and an overall accuracy of 99.36%. With an implementation on GPGPU and cuDNN, the required processing time for the whole Shibuya district is less than three minutes. The results indicate the proposed method can serve as a viable tool for urban planning map semantic segmentation with high accuracy and efficiency.
Article
Degraded image semantic segmentation is of great importance in autonomous driving, highway navigation systems, and many other safety-related applications, yet it has not been systematically studied before. In general, image degradations increase the difficulty of semantic segmentation, usually leading to decreased accuracy. Therefore, performance on the underlying clean images can be treated as an upper bound for degraded image semantic segmentation. While the use of supervised deep learning has substantially improved the state of the art of semantic image segmentation, the gap between the feature distribution learned from clean images and that learned from degraded images poses a major obstacle to improving degraded image segmentation performance. The conventional strategies for reducing the gap include: 1) adding image-restoration-based pre-processing modules; 2) using both clean and degraded images for training; 3) fine-tuning a network pre-trained on clean images. In this paper, we propose a novel Dense-Gram Network that reduces the gap more effectively than the conventional strategies and segments degraded images. Extensive experiments demonstrate that the proposed Dense-Gram Network yields state-of-the-art semantic segmentation performance on degraded images synthesized using the PASCAL VOC 2012, SUNRGBD, CamVid, and CityScapes datasets.
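The "Gram" in Dense-Gram refers to the Gram matrix of a feature map: the matrix of channel-wise inner products, which captures feature co-activation statistics independent of spatial layout. A minimal sketch of that building block (the constant two-channel toy feature map is an illustrative assumption, and this is the generic Gram computation, not the network's full matching loss):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-wise inner
    products, normalized by the number of spatial positions."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

feat = np.ones((2, 4, 4))   # toy feature map with two constant channels
feat[1] *= 2.0
print(gram_matrix(feat))    # [[1, 2], [2, 4]]
```

Matching Gram statistics between clean-image and degraded-image features is one way to shrink the feature-distribution gap the abstract describes, since the Gram matrix summarizes a distribution's second-order structure.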
Article
We present a random forest framework that learns the weights, shapes, and sparsities of feature representations for real-time semantic segmentation. Typical filters (kernels) have predetermined shapes and sparsities and learn only weights. A few feature extraction methods fix weights and learn only shapes and sparsities. These predetermined constraints restrict learning and extracting optimal features. To overcome this limitation, we propose an unconstrained representation that is able to extract optimal features by learning weights, shapes, and sparsities. We then present the random forest framework that learns the flexible filters using an iterative optimization algorithm and segments input images using the learned representations. We demonstrate the effectiveness of the proposed method using a hand segmentation dataset for hand-object interaction and two semantic segmentation datasets. The results show that the proposed method achieves real-time semantic segmentation using limited computational and memory resources.
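An "unconstrained" filter of this kind can be modeled as a weighted sum over an arbitrary set of pixel offsets, so that the offset set (shape and sparsity) is learnable alongside the weights. A minimal numpy sketch of applying such a filter (the 3-tap offsets, weights, and circular boundary handling via `np.roll` are illustrative assumptions, not the paper's learned filters):

```python
import numpy as np

def sparse_filter_response(image, offsets, weights):
    """Response of a 'flexible' filter: a weighted sum over an arbitrary
    set of (dy, dx) taps instead of a dense fixed-shape kernel.
    Boundaries wrap around (np.roll), which is fine for a toy sketch."""
    out = np.zeros_like(image, dtype=float)
    for (dy, dx), wgt in zip(offsets, weights):
        out += wgt * np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    return out

img = np.arange(9.0).reshape(3, 3)
# A 3-tap filter whose taps (shape/sparsity) and weights are both learnable.
resp = sparse_filter_response(img, offsets=[(0, 0), (0, 1), (1, 0)],
                              weights=[1.0, -0.5, -0.5])
print(resp[1, 1])  # 4 - 0.5*3 - 0.5*1 = 2.0
```

Because only the listed taps are evaluated, the cost scales with the number of taps rather than the kernel's bounding box, which is what makes such representations attractive for real-time segmentation on limited hardware.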
Article
The appearance of deep learning tools has made super-resolution processing based on deep learning readily approachable. Reported performance, however, is typically obtained by applying deep learning theory alone to super-resolution processing. We consider that the optimal conditions and design for super-resolution processing are better achieved by improving these algorithms and setting parameters appropriately. In this paper, we first carried out experiments on the optimal conditions and design of super-resolution processing for multi-view 3D images encoded and decoded by H.265/HEVC, focusing on the structure of the convolutional neural network and using Chainer. We then objectively assessed the quality of the generated images and compared them with one another. Finally, we discussed the experimental results.
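The standard objective quality score for comparing a super-resolved or decoded frame against its reference is PSNR. A minimal numpy sketch (the abstract does not name its objective metric, so PSNR is shown here as a common assumption; the 4×4 toy frames are illustrative):

```python
import numpy as np

def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a reference frame and a
    processed (e.g., H.265/HEVC-decoded or super-resolved) frame."""
    mse = np.mean((reference.astype(float) - decoded.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

ref = np.full((4, 4), 128, dtype=np.uint8)
dec = ref.copy()
dec[0, 0] = 138          # one pixel off by 10 levels
print(round(psnr(ref, dec), 2))  # 40.17
```

Casting to float before subtracting matters: uint8 arithmetic would wrap around and silently corrupt the MSE.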
Article
Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy of consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded or some frames are missing in a sequence, the results of these methods can deteriorate significantly, or they may not produce any result at all. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS-S), based on a fully convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance-level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent video segmentation databases, which show that OSVOS-S is both the fastest and most accurate method in the state of the art.
Article
Previously, it was not obvious to what extent assessors accept a 3D image (including multi-view 3D) in which luminance changes may affect the stereoscopic effect and the assessment in general. We consider that a general evaluation of the luminance component, along with a subjective evaluation, can be conducted using both the S-CIELAB color space and CIEDE2000. In this study, we first performed three types of subjective evaluation experiments for contrast enhancement in an image using the eight-viewpoint parallax barrier method. Next, we analyzed the results statistically using a support vector machine (SVM). Further, we objectively evaluated the measured luminance values using CIEDE2000 in the S-CIELAB color space. We then checked whether the objective evaluation values were related to the subjective evaluation values. From the results, we observed a characteristic relationship between the subjective and objective assessments.
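Color-difference metrics of this family measure distance between two colors in CIELAB coordinates. The full CIEDE2000 formula used in the study adds perceptual weighting and rotation terms; as a simpler, plainly labeled stand-in, the sketch below implements the older CIE76 difference (plain Euclidean distance in Lab), with toy Lab triples chosen as illustrative assumptions:

```python
import numpy as np

def delta_e_76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in CIELAB.
    NOTE: a simplified stand-in; CIEDE2000 (used in the study) adds
    lightness/chroma/hue weighting functions on top of this idea."""
    lab1 = np.asarray(lab1, dtype=float)
    lab2 = np.asarray(lab2, dtype=float)
    return float(np.linalg.norm(lab1 - lab2))

# Two toy Lab triples differing in L* and b*, as after contrast enhancement.
print(delta_e_76([50.0, 10.0, 10.0], [53.0, 10.0, 14.0]))  # 5.0
```

Evaluating such differences on an S-CIELAB representation (which spatially filters the image to mimic human contrast sensitivity before the Lab conversion) is what lets a pixel-wise metric approximate perceived luminance change.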
Article
Many previous studies on image quality assessment of 3D still images or video clips have been conducted. In particular, it is important to know the region in which assessors are interested or on which they focus in images or video clips, as represented by the ROI (Region of Interest). For multi-view 3D images, it is obvious that there are a number of viewpoints; however, it is not clear whether assessors focus on objects or background regions. It is also not clear what assessors focus on depending on whether the background region is colored or gray scale. Furthermore, while case studies on coded degradation in 2D or binocular stereoscopic videos have been conducted, no such case studies on multi-view 3D videos exist, and therefore, no results are available for coded degradation according to the object or background region in multi-view 3D images. In addition, it has not been revealed whether a gray-scale background region affects assessors' gaze points and subjective image quality. In this study, we conducted experiments on the subjective evaluation by assessors of coded degradation applied by JPEG coding to the background, the object, or both in 3D CG images using an eight-viewpoint parallax barrier method. We then analyzed the results statistically and classified the evaluation scores using an SVM.