Weihao Xuan’s research while affiliated with Waseda University and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (14)


Figure 1. An example of the wildfire in Maui, Hawaii, USA, August 2023. (a) Pre-event optical imagery (© Maxar). (b) Post-event optical imagery (© Maxar) with land-cover features obscured by wildfire smoke. (c) Post-event SAR imagery (© Capella Space), unaffected by smoke, clearly showing the disaster area.
Figure 2. Geographic distribution of the disaster events in the BRIGHT dataset. Note that the locations of the test events for the IEEE GRSS DFC 2025 are hidden in this figure.
Summary of the basic information of the BRIGHT dataset, with disaster events listed in chronological order. Information on the test events for IEEE GRSS DFC 2025 is excluded from this table.
Mean co-registration errors obtained from different multimodal image registration methods.
The mIoU on different events for different DL models. The highest values are highlighted in purple, and the second-highest results are highlighted in teal.
BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response
  • Preprint
  • File available

January 2025

·

31 Reads

·

Olivier Dietrich

·

[...]

Disaster events occur around the world and cause significant damage to human life and property. Earth observation (EO) data enables rapid and comprehensive building damage assessment (BDA), an essential capability in the aftermath of a disaster to reduce human casualties and to inform disaster relief efforts. Recent research focuses on the development of AI models to achieve accurate mapping of unseen disaster events, mostly using optical EO data. However, solutions based on optical data are limited to clear skies and daylight hours, preventing a prompt response to disasters. Integrating multimodal (MM) EO data, particularly the combination of optical and SAR imagery, makes it possible to provide all-weather, day-and-night disaster responses. Despite this potential, the development of robust multimodal AI models has been constrained by the lack of suitable benchmark datasets. In this paper, we present a BDA dataset using veRy-hIGH-resoluTion optical and SAR imagery (BRIGHT) to support AI-based all-weather disaster response. To the best of our knowledge, BRIGHT is the first open-access, globally distributed, event-diverse MM dataset specifically curated to support AI-based disaster response. It covers five types of natural disasters and two types of man-made disasters across 12 regions worldwide, with a particular focus on developing countries where external assistance is most needed. The optical and SAR imagery in BRIGHT, with a spatial resolution between 0.3 and 1 meter, provides detailed representations of individual buildings, making it ideal for precise BDA. In our experiments, we tested seven advanced AI models trained on BRIGHT to validate their transferability and robustness. The dataset and code are available at https://github.com/ChenHongruixuan/BRIGHT. BRIGHT also serves as the official dataset for the 2025 IEEE GRSS Data Fusion Contest.
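
The abstract does not spell out an input pipeline, so the following is only a minimal sketch of how a paired pre-event optical tile, post-event SAR tile, and per-pixel damage mask could be fed to a toy two-branch segmentation network. The tensor shapes, the four damage classes, and the concatenation-based fusion are illustrative assumptions, not the BRIGHT reference implementation or any of the seven benchmarked models.

# Hypothetical sketch (not the official BRIGHT code): pre-event optical + post-event SAR
# + per-pixel damage mask, consumed by a toy two-branch network with concatenation fusion.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class PairedDamageSamples(Dataset):
    """Yields (optical, sar, mask) triplets; random tensors stand in for real tiles."""
    def __init__(self, n=8, size=256, num_classes=4):
        self.n, self.size, self.num_classes = n, size, num_classes

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        optical = torch.rand(3, self.size, self.size)                      # pre-event RGB tile
        sar = torch.rand(1, self.size, self.size)                          # post-event SAR intensity tile
        mask = torch.randint(0, self.num_classes, (self.size, self.size))  # damage level per pixel
        return optical, sar, mask

class TwoBranchSegmenter(nn.Module):
    """Encodes each modality separately, fuses by concatenation, predicts per-pixel damage."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.opt_enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.sar_enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, optical, sar):
        fused = torch.cat([self.opt_enc(optical), self.sar_enc(sar)], dim=1)
        return self.head(fused)

loader = DataLoader(PairedDamageSamples(), batch_size=2)
model = TwoBranchSegmenter()
optical, sar, mask = next(iter(loader))
logits = model(optical, sar)                       # (B, num_classes, H, W)
loss = nn.functional.cross_entropy(logits, mask)   # standard per-pixel loss
print(logits.shape, loss.item())

Random tensors replace the actual imagery here so the snippet runs standalone; a real loader would read co-registered optical/SAR pairs and label rasters instead.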



Foundation Models for Remote Sensing and Earth Observation: A Survey

October 2024

·

205 Reads

Remote Sensing (RS) is a crucial technology for observing, monitoring, and interpreting our planet, with broad applications across geoscience, economics, humanitarian fields, etc. While artificial intelligence (AI), particularly deep learning, has achieved significant advances in RS, unique challenges persist in developing more intelligent RS systems, including the complexity of Earth's environments, diverse sensor modalities, distinctive feature patterns, varying spatial and spectral resolutions, and temporal dynamics. Meanwhile, recent breakthroughs in large Foundation Models (FMs) have expanded AI's potential across many domains due to their exceptional generalizability and zero-shot transfer capabilities. However, their success has largely been confined to natural data like images and video, with degraded performance and even failures for RS data of various non-optical modalities. This has inspired growing interest in developing Remote Sensing Foundation Models (RSFMs) to address the complex demands of Earth Observation (EO) tasks, spanning the surface, atmosphere, and oceans. This survey systematically reviews the emerging field of RSFMs. It begins with an outline of their motivation and background, followed by an introduction of their foundational concepts. It then categorizes and reviews existing RSFM studies including their datasets and technical contributions across Visual Foundation Models (VFMs), Visual-Language Models (VLMs), Large Language Models (LLMs), and beyond. In addition, we benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions in this rapidly evolving field.
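
The survey mentions benchmarking foundation models on publicly available datasets; as a purely illustrative companion, the sketch below shows one common evaluation protocol, freezing a pretrained encoder and fitting a linear probe on a downstream remote-sensing classification task. The stand-in encoder, the 256-dimensional features, and the 10 scene classes are assumptions, not the survey's actual benchmark setup.

# Hypothetical linear-probe evaluation sketch; the "backbone" is a stand-in for a frozen
# pretrained foundation-model encoder, and all dimensions and classes are assumptions.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False          # keep the "foundation model" frozen

probe = nn.Linear(256, 10)           # linear probe over 10 hypothetical scene classes
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

images = torch.rand(32, 3, 64, 64)   # dummy remote-sensing image batch
labels = torch.randint(0, 10, (32,)) # dummy scene labels

for _ in range(5):                   # a few probe-training steps
    feats = backbone(images)         # frozen features
    loss = nn.functional.cross_entropy(probe(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

acc = (probe(backbone(images)).argmax(1) == labels).float().mean()
print(f"probe accuracy on the dummy batch: {acc.item():.2f}")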



Segment Anything with Multiple Modalities

August 2024

·

70 Reads

Robust and accurate segmentation of scenes has become one core functionality in various visual recognition and navigation tasks. This has inspired the recent development of Segment Anything Model (SAM), a foundation model for general mask segmentation. However, SAM is largely tailored for single-modal RGB images, limiting its applicability to multi-modal data captured with widely-adopted sensor suites, such as LiDAR plus RGB, depth plus RGB, thermal plus RGB, etc. We develop MM-SAM, an extension and expansion of SAM that supports cross-modal and multi-modal processing for robust and enhanced segmentation with different sensor suites. MM-SAM features two key designs, namely, unsupervised cross-modal transfer and weakly-supervised multi-modal fusion, enabling label-efficient and parameter-efficient adaptation toward various sensor modalities. It addresses three main challenges: 1) adaptation toward diverse non-RGB sensors for single-modal processing, 2) synergistic processing of multi-modal data via sensor fusion, and 3) mask-free training for different downstream tasks. Extensive experiments show that MM-SAM consistently outperforms SAM by large margins, demonstrating its effectiveness and robustness across various sensors and data modalities.
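
The paragraph above describes multi-modal fusion only at a high level. As a conceptual stand-in (not the MM-SAM architecture), the sketch below fuses token embeddings from an RGB branch and an auxiliary-sensor branch (e.g., depth or thermal) with a learned per-token gate, producing a single token grid that a downstream mask decoder could consume; all shapes are assumptions.

# Hypothetical gated token fusion between an RGB branch and an auxiliary-sensor branch.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Per-token gate deciding how much auxiliary-modality information to mix in."""
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, rgb_tokens, aux_tokens):
        g = self.gate(torch.cat([rgb_tokens, aux_tokens], dim=-1))
        return g * aux_tokens + (1 - g) * rgb_tokens

rgb_tokens = torch.rand(1, 64 * 64, 256)   # e.g., image-encoder tokens for the RGB view
aux_tokens = torch.rand(1, 64 * 64, 256)   # tokens from an auxiliary-sensor encoder
fused = GatedFusion()(rgb_tokens, aux_tokens)
print(fused.shape)                         # same token grid, ready for a mask decoder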


SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery

June 2024

·

124 Reads

Global semantic 3D understanding from single-view high-resolution remote sensing (RS) imagery is crucial for Earth Observation (EO). However, this task faces significant challenges due to the high costs of annotations and data collection, as well as geographically restricted data availability. To address these challenges, synthetic data offer a promising solution by being easily accessible and thus enabling the provision of large and diverse datasets. We develop a specialized synthetic data generation pipeline for EO and introduce SynRS3D, the largest synthetic RS 3D dataset. SynRS3D comprises 69,667 high-resolution optical images that cover six different city styles worldwide and feature eight land cover types, precise height information, and building change masks. To further enhance its utility, we develop a novel multi-task unsupervised domain adaptation (UDA) method, RS3DAda, coupled with our synthetic dataset, which facilitates the RS-specific transition from synthetic to real scenarios for land cover mapping and height estimation tasks, ultimately enabling global monocular 3D semantic understanding based on synthetic data. Extensive experiments on various real-world datasets demonstrate the adaptability and effectiveness of our synthetic dataset and proposed RS3DAda method. SynRS3D and related codes will be available.
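
To make the multi-task setup concrete, here is a hedged sketch of a shared encoder with one head for land-cover segmentation and one for per-pixel height regression, trained with a joint loss. The eight land-cover classes follow the abstract; the channel widths, equal loss weighting, and dummy data are assumptions, and this is not the RS3DAda method.

# Hypothetical shared-encoder, two-head model for land cover + height estimation.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, in_ch=3, feat=32, num_classes=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(feat, num_classes, 1)   # land-cover logits
        self.height_head = nn.Conv2d(feat, 1, 1)          # per-pixel height map

    def forward(self, x):
        f = self.encoder(x)
        return self.seg_head(f), self.height_head(f)

x = torch.rand(2, 3, 128, 128)                      # dummy high-resolution optical tiles
seg_target = torch.randint(0, 8, (2, 128, 128))     # dummy land-cover labels
height_target = torch.rand(2, 1, 128, 128)          # dummy normalized heights

seg_logits, height_pred = MultiTaskModel()(x)
loss = (nn.functional.cross_entropy(seg_logits, seg_target)
        + nn.functional.l1_loss(height_pred, height_target))   # joint objective
print(loss.item())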



Figure 5. The interface of the point cloud labeling program used to annotate SemanticSTF.
Comparison of state-of-the-art domain adaptation methods on SemanticKITTI→SemanticSTF adaptation. SemanticKITTI serves as the source domain and the entire SemanticSTF including all four weather conditions serves as the target domain.
Class-wise IoU for domain generalization with SemanticKITTI or SynLiDAR as the source and the light-fog validation set of SemanticSTF as the target.
Comparison of state-of-the-art domain adaptation methods on SemanticKITTI→SemanticSTF adaptation for light fog.
Comparison of state-of-the-art domain adaptation methods on SemanticKITTI→SemanticSTF adaptation for rain. '-' indicates that the SemanticSTF validation set contains no samples captured in rain.
3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds

April 2023

·

98 Reads

Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model is largely neglected, as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and enables the study of 3DSS under various adverse weather conditions. We study all-weather 3DSS modeling under two setups: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalizable 3DSS that learns all-weather 3DSS models from normal-weather data. Our studies reveal the challenges that existing 3DSS methods encounter with adverse-weather data, showing the great value of SemanticSTF in steering future work along this research direction. In addition, we design a domain randomization technique that alternately randomizes the geometry styles of point clouds and aggregates their embeddings, ultimately leading to a generalizable model that effectively improves 3DSS under various adverse weather conditions. SemanticSTF and related code are available at https://github.com/xiaoaoran/SemanticSTF.
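
The domain randomization idea is described above only in words. Purely as an illustration, the sketch below produces randomly re-styled copies of a point cloud (random scaling, per-point jitter, and point dropout) whose embeddings could then be aggregated; the augmentation ranges are assumptions and this is not the paper's exact technique.

# Hypothetical geometry-style randomization for a LiDAR point cloud.
import numpy as np

def randomize_geometry(points, rng):
    """points: (N, 3) xyz array; returns a randomly re-styled copy."""
    scale = rng.uniform(0.95, 1.05)                      # global scale change
    jitter = rng.normal(0.0, 0.02, size=points.shape)    # per-point positional noise
    keep = rng.random(len(points)) > 0.1                 # drop roughly 10% of points
    return points[keep] * scale + jitter[keep]

rng = np.random.default_rng(0)
cloud = rng.uniform(-10.0, 10.0, size=(2048, 3))         # dummy LiDAR sweep
styled_a = randomize_geometry(cloud, rng)
styled_b = randomize_geometry(cloud, rng)
print(styled_a.shape, styled_b.shape)                    # two "styles" whose embeddings could be aggregated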




Citations (6)


... While these methods have shown promising results, they also require full mask annotations, which are difficult and time-consuming to obtain. As a result, some approaches [16,17,18] have begun exploring label-efficient strategies for SAM. WeSAM [16] and SlotSAM [18] leverage self-training techniques [19] with weak labels, such as points, boxes, and polygons, to generate pseudo-labels, allowing the network to predict complete masks. ...

Reference:

PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images
CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
  • Citing Chapter
  • November 2024

... Adverse weather conditions constitute prevalent and indispensable scenarios in the context of autonomous driving. Several datasets tailored for adverse weather environments have been proposed, primarily oriented towards advanced tasks such as 3D segmentation [34], [35], [36] and detection [37]. These works can be divided into two aspects, the first of which is the collection of real data on adverse weather conditions. ...

3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds
  • Citing Conference Paper
  • June 2023

... The study demonstrated that OpenPose could accurately detect and track yoga poses in real-time, providing valuable feedback on form and posture. Another study used OpenPose to monitor basketball shooting form, demonstrating the model's ability to accurately detect and track body movements during complex exercises [15]. ...

MaskVO: Self-Supervised Visual Odometry with a Learnable Dynamic Mask
  • Citing Conference Paper
  • January 2022

... The DeGroot model is the most fundamental one, where each individual updates his/her opinion by taking a weighted average of all neighbors' opinions [32]. The works [33] and [34] have studied how networked SIS and SIR models, respectively, are coupled with the DeGroot model. However, the DeGroot model is over-simplified, as it always leads to a consensus on a strongly connected graph, while persistent disagreements often happen in the real world. ...

On a Discrete-Time Network SIS Model with Opinion Dynamics
  • Citing Conference Paper
  • December 2021
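
The DeGroot rule quoted in the excerpt above has a compact numeric form: opinions evolve as x(t+1) = W x(t) with a row-stochastic weight matrix W. The minimal sketch below, with an arbitrary three-agent example, illustrates the consensus behavior the excerpt mentions.

# DeGroot averaging with an arbitrary row-stochastic influence matrix.
import numpy as np

W = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])      # each row sums to 1 (weights on neighbors' opinions)
x = np.array([1.0, 0.0, 0.5])        # initial opinions

for _ in range(50):
    x = W @ x                        # each agent averages its neighbors' opinions

print(x)                             # all entries converge to a common consensus value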

... The model exploits complex spatial relationships and multi-view information to improve the prediction accuracy for different types of vehicles in urban traffic. In [48], a multi-agent interactive prediction method was proposed for complex driving scenarios, aiming to effectively predict the behaviour of different vehicles in challenging driving situations and improve traffic safety. In [49], a multi-task learning framework based on graph neural networks was presented for predicting the interaction behaviour of heterogeneous traffic participants on urban roads. ...

Multi-agent Interactive Prediction under Challenging Driving Scenarios
  • Citing Conference Paper
  • April 2021

... As an example of a context involving feedback loops that complicate the identification of the determinants of health-related behaviors at a population level, we consider the decision to wear a mask. For simplicity, existing population-level behavior-disease models largely consider Susceptible-Infected-Susceptible (SIS) or Susceptible-Infected-Removed (SIR) models for the disease dynamics [14,23,33,34,35]. However, in a disease context with a significant proportion of asymptomatic infectives, the perceived risk of a person depends on incomplete information, including the observed incidence (rather than the true incidence) and estimates of the disease prevalence publicized in the media favored in the interpretation domain. ...

On a Network SIS Model with Opinion Dynamics
  • Citing Article
  • January 2020

IFAC-PapersOnLine