Chenbo Zhao’s research while affiliated with The University of Tokyo and other places


Publications (13)


Mobility Patterns of Trailers Around International Container Terminals: A Case Study in Sendai Port, Japan
  • Conference Paper

December 2024 · 7 Reads

Huixuan Zheng · Chenbo Zhao · [...] · Naoya Fujiwara

Enhanced large-scale building extraction evaluation: developing a two-level framework using proxy data and building matching
  • Article
  • Full-text available

July 2024 · 72 Reads · 3 Citations

Deep learning-based building extraction methods have widespread applications in diverse fields. However, the evaluation of large-scale extraction results remains challenging because traditional evaluation metrics rely on manually created ground-truth samples and comprehensive reference building data are lacking for developing countries. To address these problems, we proposed a two-level framework for evaluating large-scale footprint extraction. First, we utilised global open-source population and land use data as proxy data to assess grid-level completeness for areas with insufficient reference data. Second, we introduced an improved two-way area-overlapping method to match the extracted footprints with the reference buildings, thereby enabling a comprehensive evaluation of the study region. Tested in Hyogo Prefecture and Numazu City, Japan, the results demonstrated a 2.6% improvement in grid classification accuracy and an increase of 0.53 in the completeness correlation, compared with the results obtained using a single proxy indicator. Moreover, the optimised matching method achieved a semantic matching accuracy of 99%, with high efficiency and robustness in multi-scale matching. Therefore, the proposed approach can effectively evaluate large-scale footprint extraction results and interpret their semantic relationship with actual buildings, and it is applicable globally regardless of the availability of reference building datasets.
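The two-way area-overlap idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Box` type, the use of axis-aligned rectangles instead of real footprint polygons, the `is_match` rule, and the 0.5 threshold are all assumptions made for illustration. The point is that the overlap is measured relative to both the extracted footprint and the reference building, so a small footprint lying inside a large building is not mistaken for a one-to-one match.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a footprint polygon (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two rectangles; 0.0 when they do not overlap."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return w * h if w > 0 and h > 0 else 0.0


def two_way_overlap(extracted: Box, reference: Box) -> Tuple[float, float]:
    """Return (overlap / extracted area, overlap / reference area)."""
    inter = intersection_area(extracted, reference)
    ea, ra = extracted.area(), reference.area()
    return (inter / ea if ea > 0 else 0.0, inter / ra if ra > 0 else 0.0)


def is_match(extracted: Box, reference: Box, threshold: float = 0.5) -> bool:
    """Hypothetical rule: both overlap ratios must reach the threshold."""
    ratio_extracted, ratio_reference = two_way_overlap(extracted, reference)
    return ratio_extracted >= threshold and ratio_reference >= threshold
```

With this rule, `Box(0, 0, 1, 1)` lies entirely inside `Box(0, 0, 4, 4)` but covers only 1/16 of it, so the pair is rejected even though the first ratio is 1.0.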


Fig. 2. (a) Web questionnaire survey screen (smartphone) and targeted subjective impression evaluation survey items in this study, (b) frequency distribution of respondents by age.
Fig. 3. Distribution of images of the streets in Setagaya ward, Tokyo, by land-use area considered in the questionnaire survey.
Fig. 4. Architecture of the subjective perceptions model.
Fig. 5. SegFormer architecture.


Evaluating the subjective perceptions of streetscapes using street-view images

July 2024 · 79 Reads · 36 Citations

Landscape and Urban Planning

Developing a model to evaluate urban streetscapes based on subjective perceptions is important for quantitative understanding. However, previous studies have only considered limited types of subjective perceptions, neglecting the relationships between them. Further, accurately measuring subjective perception with low computational costs for large-scale urban regions at high spatial resolutions has been difficult. We present a deep-learning-based multilabel classification model that can measure 22 subjective perception scores from street-view images. This model uses the results of a web questionnaire survey encompassing 22 subjective perceptions, with 8.8 million responses. Our model demonstrates high accuracy (0.80–0.91) in measuring subjective perception scores from street-view images and achieves low computational cost by training on 22 subjective perception relationships. The 22 subjective perceptions were analyzed using PCA and k-means analysis. By projecting the 22 subjective perceptions into a two-dimensional space and clustering them into four distinct groups (positive, negative, calm, and lively), we gained insight into how human perceptions relate to one another. In addition, the study used semantic segmentation to extract landscape elements from street-view images and applied ℓ1-regularized sparse modeling to identify the landscape elements structurally correlating with each subjective perception class. The analysis revealed that only seven out of nineteen landscape elements significantly correlated with subjective impressions, and these effects varied by class. Notably, sky coverage positively influences positive subjective perceptions, such as attractiveness and calmness, but negatively affects lively impressions. The proposed model can be used to map the overall image of a city and identify landscape design issues in community development design.



Fig. 2. Overall workflow and network architecture of this study. (a) overall workflow (b) stable diffusion architecture (c) details of stable diffusion denoising module, LoRA and ControlNet Architecture.
Fig. 3. Detailed module collaboration of LoRA/ControlNet with stable diffusion.
Fig. 6. Segmentation visualized results of the 3 groups of experiments in ConvNeXt algorithm.
Label Freedom: Stable Diffusion for Remote Sensing Image Semantic Segmentation Data Generation

November 2023 · 1,663 Reads · 13 Citations

Semantic segmentation of land use in remote sensing images has benefited from the development of deep learning and has consequently made considerable progress in inference accuracy and speed. However, the effective training of semantic segmentation models for remote sensing imagery necessitates extensively detailed pixel-level annotations, and gathering such data is both time-intensive and laborious. Thus, this study implemented low-rank adaptation on a stable diffusion algorithm to learn the distribution of the pixel-level annotations in the LoveDA dataset. Subsequently, the annotation-image pairs were used to train the remote sensing image generator based on stable diffusion guided by ControlNet. We proposed a stable-diffusion-based approach that can generate image-annotation pairs from scratch. The generated annotation and image pairs achieved a high accuracy of 0.520 mean intersection-over-union (mIoU) on the LoveDA dataset, which is close to the original data training result of 0.539 mIoU. Furthermore, the mixed training using generated and original data achieved 0.542 mIoU, thereby demonstrating the data augmentation function of our approach. This study provided a solution for the high-cost pixel-level annotation issue and thus exhibited the potential of artificial intelligence generated content.
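For readers unfamiliar with the metric quoted above, mean intersection-over-union averages the per-class ratio of correctly labelled pixels to all pixels carrying that class in either the prediction or the ground truth. The sketch below is a minimal illustration on flattened label lists; the function name `mean_iou` and the choice to skip classes absent from both inputs are assumptions, and the paper's own evaluation code may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that appear in the prediction or the target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both inputs.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one input.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both are ignored
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and four pixels.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12.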



Quantitative land price analysis via computer vision from street view images

April 2023 · 288 Reads · 17 Citations

Engineering Applications of Artificial Intelligence

Land price is an important economic factor that provides meaningful references for regional planners, assisting them in urban planning, economic decision-making, and land resource allocation. However, related studies in land price analysis have mainly focused on site area and plot ratio; analysis of the potential impact of streetscape factors and human subjective perception on land price has been lacking, regardless of their impact on the supply and demand relationship. Therefore, this study developed a new approach for estimating and analyzing land prices through deep learning that considered the streetscape and human subjective perception factors. In the estimation part, we developed a fine-grained end-to-end deep learning model whose input is street view images and whose output is land prices. In the analysis part, we extracted the semantic segmentation results and human subjective perception scores from the images and combined them with the results of land price estimation. We then introduced a combination of quantitative analysis using gradient-weighted class activation mapping and L1-based sparse linear regression to model the relationship between the streetscape and human subjective perception quantitatively. The gradient-weighted class activation mapping was used to determine quantitatively which categories of pixels the deep learning model relied on to output the land price estimates. Combined with segmentation results, we implemented L1-based sparse linear regression and quantitatively determined the importance of the streetscape factors for land prices. Overall, our deep learning model achieved 77.99% accuracy in estimating the land price using only street views, and we illustrated the impacts of the streetscape and perception on the land price by showing that perception scores are more important than streetscape factors such as road and mountain. Among the perception factors, a comfortable, lived-in feel has the most important impact on land price estimation.


People Flow Trend Estimation Approach and Quantitative Explanation Based on the Scene Level Deep Learning of Street View Images

February 2023 · 196 Reads · 1 Citation

People flow trend estimation is crucial to traffic and urban safety planning and management. However, owing to privacy concerns, the collection of individual location data for people flow statistical analysis is difficult; thus, an alternative approach is urgently needed. Furthermore, the trend in people flow is reflected in streetscape factors, yet the relationship between them remains unclear in the existing literature. To address this, we propose an end-to-end deep-learning approach that combines street view images and the human subjective scores of each street view. For a more detailed people flow study, estimation and analysis were implemented using different time and movement patterns. Consequently, we achieved a 78% accuracy on the test set. We also implemented gradient-weighted class activation mapping for deep learning visualization together with L1-based statistical methods, and proposed a quantitative analysis approach to understand the landscape elements and subjective feeling of street views and to identify the effective elements for people flow estimation based on a gradient impact method. In summary, this study provides a novel end-to-end people flow trend estimation approach and sheds light on the relationship between streetscape, human subjective feeling, and people flow trend, thereby making an important contribution to the evaluation of existing urban development.


Deep Learning Approach for Classifying the Built Year and Structure of Individual Buildings by Automatically Linking Street View Images and GIS Building Data

January 2023 · 274 Reads · 18 Citations

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

The built year and structure of individual buildings are crucial factors for estimating and assessing potential earthquake and tsunami damage. Recent advances in sensing and analysis technologies allow the acquisition of high-resolution street view images (SVIs) that present new possibilities for research and development. In this study, we developed a model to estimate the built year and structure of a building using omnidirectional SVIs captured using an onboard camera. We used geographic information system (GIS) building data and SVIs to generate an annotated built year and structure dataset by developing a method to automatically combine the GIS data with images of individual buildings cropped through object detection. Furthermore, we trained a deep learning model to classify the built year and structure of buildings using the annotated image dataset based on a deep convolutional neural network (DCNN) and a vision transformer (ViT). The results showed that SVI accurately predicts the built year and structure of individual buildings using ViT (overall accuracies for structure = 0.94 [3 classes] and 0.96 [2 classes] and for age = 0.68 [6 classes] and 0.90 [3 classes]). Compared with DCNN-based networks, the proposed Swin transformer based on ViT architectures effectively improves prediction accuracy. The results indicate that multiple high-resolution images can be obtained for individual buildings using SVI, and the proposed method is an effective approach for classifying structures and determining building age. The automatic, accurate, and large-scale mapping of the built year and structure of individual buildings can help develop specific disaster prevention measures.


Development of a Large-Scale Roadside Facility Detection Model Based on the Mapillary Dataset

December 2022 · 1,754 Reads · 9 Citations

The detection of road facilities or roadside structures is essential for high-definition (HD) maps and intelligent transportation systems (ITSs). With the rapid development of deep-learning algorithms in recent years, deep-learning-based object detection techniques have provided more accurate and efficient performance, and have become an essential tool for HD map reconstruction and advanced driver-assistance systems (ADASs). Therefore, the performance evaluation and comparison of the latest deep-learning algorithms in this field is indispensable. However, most existing works in this area limit their focus to the detection of individual targets, such as vehicles or pedestrians and traffic signs, from driving view images. In this study, we present a systematic comparison of three recent algorithms for large-scale multi-class road facility detection, namely Mask R-CNN, YOLOx, and YOLOv7, on the Mapillary dataset. The experimental results are evaluated according to the recall, precision, mean F1-score and computational consumption. YOLOv7 outperforms the other two networks in road facility detection, with a precision and recall of 87.57% and 72.60%, respectively. Furthermore, we test the model performance on our custom dataset obtained from the Japanese road environment. The results demonstrate that models trained on the Mapillary dataset exhibit sufficient generalization ability. The comparison presented in this study aids in understanding the strengths and limitations of the latest networks in multiclass object detection on large-scale street-level datasets.
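As a rough worked example of how the reported figures relate, the F1-score is the harmonic mean of precision and recall. The sketch below plugs in the aggregate YOLOv7 precision and recall quoted in the abstract; the helper name `f1_score` is an assumption, and the result is not necessarily the paper's "mean F1-score", which is presumably averaged over classes and can differ from the harmonic mean of aggregate values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Aggregate YOLOv7 values quoted in the abstract, as fractions.
yolov7_f1 = f1_score(0.8757, 0.7260)  # roughly 0.794
```

Because the harmonic mean is pulled toward the smaller input, the result sits closer to the recall than to the precision.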


Citations (8)


... Therefore, instance segmentation methods that rely on natural images will not yield optimal results in remote sensing applications. In order to achieve optimal segmentation results for remote sensing datasets, researchers have developed instance segmentation models tailored to remote sensing image characteristics [9][10][11][12][13]. Xu et al. [14] proposed a remote sensing image instance segmentation method based on BoxInst from the perspective of weak supervision, which fully utilizes the existing rich OBB annotations and reduces the annotation burden. ...

Reference:

A New Instance Segmentation Model for High-Resolution Remote Sensing Images Based on Edge Processing
Enhanced large-scale building extraction evaluation: developing a two-level framework using proxy data and building matching

... Furthermore, combining street view data with deep learning technologies has become a mainstream method for evaluating residents' subjective perceptions (Ji et al. 2021;Ogawa et al. 2024;. Kruse et al. (2021) used a deep residual network (ResNet) model trained on tagged street view data to evaluate human perceptions of urban playability. ...

Evaluating the subjective perceptions of streetscapes using street-view images
  • Citing Article
  • July 2024

Landscape and Urban Planning

... Therefore, evaluating the subjective perceptions of each location with high spatial resolution for the entire city is challenging. To overcome this limitation, subjective impression evaluation methods, using street-view images and crowdsourcing, have been proposed (Dubey et al., 2016; Imadegawa et al., 2023; Naik et al., 2014; Quercia et al., 2014; Rossetti et al., 2019; Xu et al., 2022; Yao et al., 2021; Zhang et al., 2018). These approaches have collected a considerable amount of data on subjective perceptions via web surveys that present street-view images; subsequently, models have been developed to evaluate these (Dubey et al., 2016; Zhang et al., 2018). ...

Predicting Impression Evaluation of Building Exterior Appearance Using Street Image Big Data and Deep Learning
  • Citing Conference Paper
  • December 2023

... However, these things alone are not enough, so it is important to generate not only high-quality images, but also image-annotation pairs that are more practical. For example, Zhao et al. [89] implemented a two-stage fine-tuning of the SD model to enable the generation from noise to annotations and then from annotations to images. Toker et al. [90] extended the standard diffusion model to a joint probabilistic model for images and their corresponding labels, which enabled the simultaneous generation of labels and images. ...

Label Freedom: Stable Diffusion for Remote Sensing Image Semantic Segmentation Data Generation

... [27], weather and day-night differences [28], and other variables on public visual perception, or their moderating effects. In addition, crowdsourced visual perception methods have also been used to infer and analyze urban land prices [29], crime rates [30], urban development potential [...] Figure captions: A visual perception reasoning model based on DCNN [22]; Hierarchical clustering analysis of the high-density wooden residential areas in Tokyo based on crowdsourced visual perception method [22]; Optimization and evaluation of street scenes based on crowdsourced visual perception data and GAN model [34]; Optimization and evaluation of street scenes based on crowdsourced visual perception data and SD [35]. [Methods] The research employs a case study approach, analyzing multiple urban research papers focused on Tokyo, Japan, that utilize crowdsourced visual perception methods. It systematically investigates how these methods, ...

Quantitative land price analysis via computer vision from street view images

Engineering Applications of Artificial Intelligence

... (2023) employ a multi-city material categories (brick, stucco, etc.) using geotagged SVI perspective views, aligning visual patterns with ground-truth material information for scalable building classification. Combining the aspects of building age and material, Ogawa et al. (2023) introduced a method to detect and geolocate buildings from panoramic images, automatically annotating them with objective building data in Kobe, Japan. Furthermore, building type or usage -a critical attribute in urban remote sensing and land use classification frameworks -is another important aspect in street-level research (Kang et al., 2018;Zhao et al., 2021;Lindenthal and Johnson, 2021;Ramalingam and Kumar, 2023;Li et al., 2025b). ...

Deep Learning Approach for Classifying the Built Year and Structure of Individual Buildings by Automatically Linking Street View Images and GIS Building Data

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

... While numerous open-source databases exist that cater to specific domains of building information, they are often limited to a single aspect of building data. The foundational building attribute knowledge within these databases is typically derived from field surveys or footprint data extracted from remote sensing imagery [10]. To date, no open-source database has been developed that integrates street view imagery to provide a comprehensive, visually-driven repository encompassing the spatial. ...

Large-Scale Building Footprint Extraction from Open-Sourced Satellite Imagery via Instance Segmentation Approach