MuSRFM: Multiple scale resolution fusion based precise and robust satellite
derived bathymetry model for island nearshore shallow water regions using
Sentinel-2 multi-spectral imagery
Xiaoming Qin a,b, Ziyin Wu a,b,c,*, Xiaowen Luo b, Jihong Shang b, Dineng Zhao b, Jieqiong Zhou b, Jiaxin Cui b,d, Hongyang Wan b, Guochang Xu e
a Ocean College, Zhejiang University, Zhoushan, Zhejiang 316021, China
b Key Laboratory of Submarine Geosciences, Second Institute of Oceanography, MNR, Hangzhou, Zhejiang 310012, China
c School of Oceanography, Shanghai Jiao Tong University, Shanghai 200030, China
d School of Ocean Sciences, China University of Geosciences (Beijing), Beijing, 100083, China
e Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
ARTICLE INFO
Keywords:
Satellite Derived Bathymetry
Multi-Spectral Imagery
Deep Learning
Multiple Resolution Fusion
ABSTRACT
The multi-spectral imagery based Satellite Derived Bathymetry (SDB) provides an efficient and cost-effective
approach for acquiring bathymetry data of nearshore shallow water regions. Compared with conventional
pixelwise inversion models, Deep Learning (DL) models have the theoretical capability to encompass a broader
receptive field, automatically extracting comprehensive spatial features. However, enhancing spatial features by
increasing the input size escalates computational complexity and model scale, challenging the hardware. To
address this issue, we propose the Multiple Scale Resolution Fusion Model (MuSRFM), a novel DL-based SDB
model, to integrate information of varying scales by utilizing temporally fused Sentinel-2 L2A multi-spectral
imagery. The MuSRFM uses a Multi-scale Center-aligned Hierarchical Resampler (MCHR) to composite
large-scale multi-spectral imagery into hierarchical scale resolution representations since the receptive field
gradually narrows its focus as the spatial resolution decreases. Through this strategy, the MuSRFM gains access to rich
spatial information while maintaining efficiency by progressively aggregating features of different scales through
the Cropped Aligned Fusion Module (CAFM). We select St. Croix (Virgin Islands) as the training/testing dataset
source, and the Root Mean Square Error (RMSE) obtained by the MuSRFM on the testing dataset is 0.8131 m
(with a bathymetric range of 0–25 m), surpassing the machine learning based models and traditional
semi-empirical models used as the baselines by over 35 % and 60 %, respectively. Additionally, multiple island
areas worldwide, including Vieques, Oahu, Kauai, Saipan and Tinian, which exhibit distinct characteristics, are
utilized to construct a real-world dataset for assessing the generalizability and transferability of the proposed
MuSRFM. While the MuSRFM experiences a degradation in accuracy when applied to the diverse real-world
dataset, it outperforms other baseline models considerably. Across various study areas in the real-world
dataset, its RMSE lead over the second-ranked model ranges from 6.8 % to 38.1 %, indicating its accuracy and
generalizability; in the Kauai area, where the performance is not ideal, a significant improvement in accuracy is
achieved through fine-tuning on limited in-situ data. The code of MuSRFM is available at
https://github.com/qxm1995716/musrfm.
* Corresponding author at: Ocean College, Zhejiang University, Zhoushan, Zhejiang 316021, China.
E-mail address: zywu@sio.org.cn (Z. Wu).
ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 150–169
https://doi.org/10.1016/j.isprsjprs.2024.09.007
Received 24 April 2024; Received in revised form 3 September 2024; Accepted 6 September 2024
Available online 14 September 2024
0924-2716/© 2024 The Authors. Published by Elsevier B.V. on behalf of International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). This is an
open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Accurate bathymetric data of nearshore coastal regions surrounding
islands are of utmost importance in many fields, such as coastal zone
development and management, marine aquaculture and marine
engineering. The approach humans use to obtain bathymetry data has
undergone a long evolution, transitioning from traditional weighted
cables and probes to acoustic-sounding instruments, such as
Single-Beam Echo-Sounders (SBES) and Multi-Beam Echo-Sounders (MBES),
and the currently emerging laser-sounding technology. Undoubtedly, the
development of technology has led to improvements in the efficiency of
bathymetry surveying. For example, MBES can offer greater efficiency
and more comprehensive coverage than traditional mapping equipment
such as SBES. However, the operation of MBES is constrained by various
factors, including sea conditions and water depth in the survey region, as
well as the high cost associated with survey operation. In recent years,
the development of Airborne Lidar Bathymetry (ALB) has introduced a
more expedient approach to measuring water depth, facilitating the
highly efficient acquisition of bathymetry across large areas.
Nonetheless, the utilization of ALB for bathymetric surveys is also constrained by
its exorbitant cost. Therefore, finding a feasible and cost-effective
approach to obtain bathymetry data from nearshore shallow water
regions in islands and coastal zones holds significant scientific and
practical value. The number of earth observation satellite launches has
increased significantly in recent years owing to advances in technology and
satellite payload capabilities, providing various high-quality observation
data, especially the publicly accessible 10 m resolution Sentinel-2
multi-spectral imagery data that have been widely used in various fields and
applications. Benefiting from the advancements in remote sensing,
Satellite Derived Bathymetry (SDB), which uses high-resolution
multi-spectral imagery, provides an efficient and low-cost approach for
obtaining wide-range bathymetry data in shallow water regions.
The principle of SDB based on multi-spectral imagery data is to
construct an inversion model that utilizes the correlation between the
reflectances captured by sensors on earth observation satellites and the
bathymetry. Studies on SDB based on multi-spectral imagery date back
at least to the 1970s and gradually evolved into multiple branches,
encompassing analytical models, semi-empirical models and empirical
models (Ashphaq et al., 2021). Among these branches, the analytical
model achieves bathymetry derivation by constructing a radiative
transfer physical model expression on the basis of various in-situ optical
parameters related to the seabed and water column (Albert and Mobley,
2003). By modifying the radiative transfer function, Lee et al. (1998;
1999) proposed the Hyperspectral Optimization Process Exemplar
(HOPE), which achieves bathymetry derivation without any in-situ data
by using the Levenberg-Marquardt optimization algorithm to search for
the optimal parameters that best fit the observed hyper-spectral
reflectances, and which has been successfully adapted to work with multi-spectral imagery
data (Lee et al., 2012; Xia et al., 2019). In comparison, the principle of
the semi-empirical SDB model is much more straightforward and
intuitive. It expresses the radiative transfer of light in the water body based
on explicit assumptions and employs statistical methods to construct a
regression model to fit the in-situ bathymetry data. One of the most
widely recognized semi-empirical SDB models is the log-transformation
linear band model proposed by Lyzenga (1978; 2006), which combines
reflectances from multiple spectral bands to mitigate the influences
caused by scattering and the bottom type to achieve improved accuracy.
Another classic semi-empirical model is the Logarithmic Band Ratio
(LBR) model proposed by Stumpf et al. (2003). The LBR model exhibits
enhanced robustness and introduces implicit compensation for variable
bottom types, and it has been adopted and further extended to include
more band combinations by numerous studies (Li et al., 2019;
Viaña-Borja et al., 2023). The empirical SDB model is a further simplification
of the semi-empirical model: it directly constructs the
mapping function between the remote sensing reflectances and
bathymetry. Among the empirical models, researchers employ various
regression tools and feature extraction algorithms, such as Multiple
Linear Regression, Principal Component Analysis (PCA), etc., to improve
the accuracy of SDB (Mishra et al., 2004). In recent years, research on
empirical SDB models has paid increasing attention to Machine
Learning (ML). The ML-based SDB model builds a feature extraction and
mapping function, such as Support Vector Regression, Multi-Layer
Perceptron and Random Forest (Ceyhun and Yalçın, 2010; Misra et al.,
2018; Kaloop et al., 2021; Mudiyanselage et al., 2022; Wu et al., 2022),
whose inputs are the reflectances of different bands of the
multi-spectral imagery and which is fitted using in-situ data. The
increased complexity in feature mapping of the ML models facilitates the
establishment of fitting functions that are far more complex than simple
linear regression, so the inversion accuracy of these ML-based
SDB models exceeds that of traditional algorithms.
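As a rough illustration of the semi-empirical idea described above, the following sketch fits a Stumpf-style log band ratio model with ordinary least squares. The band choice (blue and green), the constant n = 1000, and the function names are assumptions made for this example only; the intercept is written with a plus sign, so it corresponds to the negative of the offset in the usual formulation. It is not the baseline implementation used in this study.

```python
import math


def log_band_ratio(r_blue, r_green, n=1000.0):
    """Predictor of the LBR model: ln(n * R_blue) / ln(n * R_green).

    n is a fixed constant that keeps both logarithms positive (assumed value).
    """
    return math.log(n * r_blue) / math.log(n * r_green)


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_lbr(blue, green, depth, n=1000.0):
    """Calibrate (m1, m0) so that depth ~ m1 * ratio + m0 on in-situ samples."""
    ratios = [log_band_ratio(b, g, n) for b, g in zip(blue, green)]
    return fit_line(ratios, depth)


def predict_lbr(model, r_blue, r_green, n=1000.0):
    """Apply a calibrated (m1, m0) pair to one pixel's reflectances."""
    m1, m0 = model
    return m1 * log_band_ratio(r_blue, r_green, n) + m0
```

Because the model has only two tunable coefficients and sees one pixel at a time, it has no access to spatial context, which is the limitation the remainder of this section discusses.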
With the progression of research, the significance of other kinds of
information contained in multi-spectral imagery, such as spatial
features, has become evident. Most extant SDB models perform bathymetry
inversion by employing the reflectances from the target pixel, which limits
the receptive field of these models to the scale of a single
pixel and ignores the spatial distribution characteristics. Moreover,
traditional SDB modeling presupposes global homogeneity of the
physical properties of the water column and bottom type, which
significantly deviates from reality, thereby limiting the generalization
and cross-region deployment capabilities of these SDB models. To address
these issues, SDB models with geographical adaptivity have been
proposed, which involve subdividing the whole scene into multiple
geographical regions or local zones, and then constructing a specialized
model for each delineated area (Su et al., 2013; Liu et al., 2018; Wang
et al., 2019a). Monteys et al. (2015) analyzed four spatial prediction
models, including geographically weighted regression (GWR), a hybrid
of GWR and Kriging (GRWK) and Kriging with an External Drift model
calibrated by Local Neighbourhood (KED-LN) and Global Neighbourhood
(KED-GN), and confirmed the improvement brought by spatial
information. Cahalane et al. (2019) compared the performance of Linear
Regression (LR, non-spatial) and Regression Kriging (RK, spatial) on
three types of remote sensing data, including Landsat-8, RapidEye and
Pleiades, and found that the spatial RK model yielded the most accurate
results. Niroumand-Jadidi et al. (2020) utilized K-Nearest Neighbors
(K-NN) to partition the feature space and allocate an optimal linear
regression model for each subspace with suitable band combinations,
yielding final bathymetry by averaging the predictions of the regression
models corresponding to clusters that are proximate to the target in the
feature space. Chen et al. (2021) segmented the whole region into
distinct zones based on the residual errors and then constructed
multiple independent inversion models tailored for each zone. To
incorporate spatial location information, some studies have included
geographical coordinates such as latitude, longitude and UTM
coordinates in the input (Wang et al., 2019b; Zhu et al., 2022; Lowell and
Rzhanov, 2024), but these models, theoretically, cannot be deployed
across different regions. Another intuitive approach to integrating
spatial information is to widen the receptive field of the inversion model,
as evidenced by studies that have expanded the input of the model from
one pixel to the aggregate of several adjoining pixels (Zhu et al., 2021;
Knudby and Richardson, 2023). Notably, the emergence of Deep
Learning (DL) in computer vision provides a new perspective,
particularly because of its extensive receptive field and superior capacity for
high-dimensional feature abstraction, which are innately suited for SDB
tasks. Ai et al. (2020) applied Convolutional Neural Networks (CNNs)
to multi-spectral imagery from ZY-3, GF-1 and WorldView-2 data and
constructed a simple model with one convolutional layer that was successfully
deployed for SDB. Subsequently, more CNN-based models have been
introduced into SDB research (Wilson et al., 2020;
Lumban-Gaol et al., 2021; Zhong et al., 2022; Lumban-Gaol et al., 2022;
Knudby and Richardson, 2023), and, as research has deepened,
more up-to-date models such as U-Net and RefineNet have been
applied (Mandlburger et al., 2021; Sun et al., 2023).
A concise review of the development of SDB reveals that accuracy has
gradually increased as research has advanced.
However, several nontrivial limitations persist. The foremost issue is the
conflict between the computational complexity and the receptive field of
the model on high-resolution multi-spectral imagery data.
High-resolution multi-spectral imagery is vital for various types of remote
sensing applications, such as segmentation (Hang et al., 2022),
classification (Pan et al., 2021) and object detection (Han et al., 2021), and its
granular information, such as texture and contextual information, is
also crucial for SDB. Merely expanding the input range inevitably
increases the computational load; hence, the primary issue is enabling
the model to perceive a wide range of spatial information efficiently. The
second issue is that the training dataset and testing dataset in most
current SDB studies often overlap geographically, leading to an
inadequate assessment of the SDB model’s generalizability. One of the
intended applications of SDB is to achieve rapid bathymetry inversion in
inaccessible regions that lack in-situ water depth measurements, which
implies that the generalization ability of the SDB model is likely the most
pivotal factor after accuracy. For semi-empirical and empirical SDB
models that rely on in-situ data, only a small number of studies have
considered the geographical independence of the training and testing
datasets or evaluated the transferability (Viaña-Borja et al., 2023;
Lumban-Gaol et al., 2022; Mandlburger et al., 2021; Mudiyanselage
et al., 2022), yet these factors are crucial for evaluating the
generalizability of the SDB models. Changes in geographical location alter
various physical parameters related to the water column
and bottom type, necessitating the assessment of the SDB model’s
generalizability over a wider range. In addition, certain minor issues
should not be neglected, such as how to ensure the quality of
multi-spectral imagery and how to process elevation data of the study areas
in conjunction with multi-spectral imagery to obtain bathymetry data.
In response to these challenges, we propose a novel SDB model with
Sentinel-2 L2A Surface Reflectance (SR) data as input, termed the
Multiple Scale Resolution Fusion Model (MuSRFM), which uses DL
models to improve the accuracy and generalizability of the SDB model.
To efficiently fuse spatial features of multiple scales, we propose a
Multi-scale Center-aligned Hierarchical Resampler (MCHR) to compress
textural and contextual information of different scales into hierarchical
representations, and we encode these resampled SR data through
multiple branches of the MuSRFM and utilize the embedded Cropped
Aligned Fusion Module (CAFM) to fuse hierarchical features of different
ranges. For data preprocessing, we conduct median filtering on the
multi-temporal Sentinel-2 L2A SR data to reduce interferences and
integrate these SR data with in-situ elevations to extract bathymetry in
the study areas. We select five study areas from around the world, as
shown in Fig. 1, to construct a training dataset, a testing dataset, and a
real-world dataset that is used for evaluating generalizability. The
training and testing datasets are sourced from St. Croix (Virgin Islands),
whereas the real-world dataset is much larger, and consists of Vieques
(Puerto Rico), Saipan and Tinian (Commonwealth of the Northern
Mariana Islands, CNMI), as well as parts of Oahu and Kauai (Hawaii).
The analysis of the results obtained on the testing dataset and real-world
dataset shows that the MuSRFM significantly outperforms other baseline
models in terms of accuracy and robustness, and further fine-tuning
experiments demonstrate that it can achieve ideal performance with
only a very small amount of in-situ data.
2. Data
This study utilizes Sentinel-2 multi-spectral imagery data for five
study areas, depicted in Fig. 1, to conduct SDB research, with in-situ
bathymetry data derived from ALB measurements collected in the
corresponding areas. These study areas are located in the Atlantic
Ocean, Western Pacific Ocean and Eastern Pacific Ocean. St.
Croix serves as the source of the training and testing datasets, whereas the
remaining four areas are utilized to construct the real-world dataset for
inference.
Fig. 1. Geographic distribution of the study areas used in our research, including (a) St. Croix, (b) Vieques, (c) Kauai, (d) Oahu and (e) Saipan and Tinian. Among
them, St. Croix is selected as the source of the training and testing datasets, while all remaining areas are used for inference experiments.
2.1. Sentinel-2 multi-spectral imagery data
Sentinel-2 is a constellation of two polar-orbiting satellites operated
by the European Space Agency (ESA). The single payload they carry is
a Multi-Spectral Instrument (MSI) with thirteen spectral bands, of which
four bands (B2, B3, B4 and B8) are at 10 m, six bands (B5, B6, B7, B8a,
B11 and B12) are at 20 m, and three bands (B1, B9 and B10) are at 60 m
spatial resolution. Here, we use atmospherically corrected SR L2A data
processed by Sen2Cor for SDB research. However, the Sentinel-2 SR
imagery data of a single scene can be compromised by clouds, sun glints
and other factors, which may introduce deviations in SR imagery and
result in unsatisfactory SDB performance. To minimize these
interferences, median filtering is applied to multi-temporal SR imageries,
thereby generating fused SR data. This method has been adopted in
several recent studies (Han et al., 2023; Xu et al., 2024), and is similarly
employed herein using Google Earth Engine (GEE) to achieve efficient
multi-temporal data fusion. For St. Croix, which provides both the
training and testing datasets, we integrate L2A data obtained from Jan.
2018 to Jan. 2022 into multiple fused SR imageries, and after testing, we
find that when the period is longer than 24 months the quality of the
fusion is highly satisfactory. Finally, eleven fused SR imageries are
obtained after visual inspection, as shown in Table 1. This operation can
reduce the interference caused by factors such as varied Inherent Optical
Properties (IOPs) and clouds, enhancing the content and stability of SR
imagery. Since the fused SR values are determined by the observations
during each period, we compare some pixels of the fused imageries and
find that there are slight differences, which can be caused by various
effects, including changes in IOPs. Hence, the use of multiple fused SR
imageries to build the training/testing dataset can implicitly introduce
variations in IOPs, and it can also be regarded as a means of data
enrichment. Since the other four areas are designated for inference, a
single fused SR imagery is constructed for each area, and the
corresponding period is also determined by manual inspection. The details
related to the fusion of multi-temporal Sentinel-2 SR imageries for each
study area in the real-world dataset are delineated in Table 1. Finally, for
the vacant pixels in the fused SR imagery, bilinear interpolation is
deployed to fill these pixels to facilitate the subsequent resampling in the
MCHR.
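The temporal fusion described above amounts to a per-pixel median over all valid observations in a period. The sketch below is a plain-Python stand-in for that idea, not the GEE workflow used in this study; the function name and the convention of marking masked observations (clouds, glint) with None are assumptions made for illustration.

```python
from statistics import median


def median_composite(stack):
    """Fuse a list of same-sized single-band scenes by per-pixel temporal median.

    Each scene is a list of rows; None marks a masked observation.
    Pixels with no valid observation in any scene remain None.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    fused = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            valid = [scene[r][c] for scene in stack if scene[r][c] is not None]
            if valid:
                fused[r][c] = median(valid)
    return fused
```

A bright outlier such as a cloud-contaminated reflectance is discarded as long as most observations of that pixel are clear, which is consistent with the observation that longer fusion periods give better results. Pixels that stay None correspond to the vacant pixels that are subsequently filled by interpolation.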
2.2. ALB data
The in-situ data used in this study are ALB data collected by the
National Oceanic and Atmospheric Administration (NOAA), which can
be downloaded from the Internet (https://coast.noaa.gov/dataviewer/).
The ALB data, which are Digital Elevation Model (DEM) raster data, are
properly processed and ready for use, encompassing not only under-
water terrain but also a portion of terrestrial terrain elevation data. The
details of the ALB data are presented in Table 1. The DEM data presented
herein are elevation values relative to specific vertical datums that
vary by geographical location: the St. Croix area uses VIVD09,
the Vieques area uses PRVD02, the Oahu and Kauai areas are
referenced to Mean Sea Level (MSL), and the Saipan and Tinian areas
use NMVD03. The original spatial resolution of these DEMs is 1
m; to facilitate spatial alignment with the Sentinel-2 L2A SR imagery,
they have been downsampled to a resolution of 10 m. In addition, we
manually inspected and trimmed the DEM along the
shoreline to remove several inland water bodies and abnormal targets.
Table 1
SR data and ALB data used in this study; d denotes water depth.
| Area (Number of Fused SR Imagery) | Sentinel-2 SR Data Period | ALB Survey Date / LiDAR Type | DEM Vertical Datum / Bathymetric Accuracy (95 % confidence level) | DEM Horizontal Datum / Bathymetric Accuracy (95 % confidence level) |
|---|---|---|---|---|
| St. Croix (11) | 20180101–20200630 | 2019 / VQ880-GII | VIVD09 / 0.121 m | NAD83-UTM 20N / 0.696 m |
| | 20180101–20210101 | | | |
| | 20180101–20220101 | | | |
| | 20180630–20200630 | | | |
| | 20180630–20210101 | | | |
| | 20180630–20210630 | | | |
| | 20190101–20210101 | | | |
| | 20190101–20210630 | | | |
| | 20190101–20220101 | | | |
| | 20190630–20210630 | | | |
| | 20200101–20220101 | | | |
| Vieques (1) | 20180630–20210630 | 2019 / VQ880-GII | PRVD02 / 0.121 m | NAD83-UTM 20N / 0.696 m |
| Oahu (1) | 20180101–20220101 | 2013 / CZMIL | MSL / sqrt(0.2^2 + (0.013d)^2) m (shallow water); sqrt(0.3^2 + (0.013d)^2) m (deep water) | NAD83(PA11) / 3.5 + 0.05d m |
| Kauai (1) | 20180101–20220101 | 2013 / CZMIL | MSL / 0.125 m (shallow water); 0.2 m (deep water) | NAD83(PA11) / 1 m |
| Saipan & Tinian (1) | 20180530–20210630 | 2019 / Hawkeye 4X | NMVD03 / 0.1 m | NAD83(MA11)-UTM 55N / Not Provided |
2.3. Data preprocessing for the ALB DEM
The in-situ data for SDB research are usually bathymetry data
obtained via acoustic mapping such as MBES, which can be directly used as
ground truth (GT) data for SDB model calibration. However, the in-situ
data used in this research are the DEM of the coastal region obtained via
LiDAR, so the foremost issue here is the data preprocessing
needed to convert the elevation into bathymetry, namely, the Digital
Bathymetry Model (DBM), as shown in Fig. 2. Here, the instantaneous
sea level elevation essential for bathymetry conversion cannot be
determined from a tidal model because the acquisition dates of the pixels in
the fused Sentinel-2 SR imagery are not the same. Since the DEM we
used covers both shoreland and water, we assume that the sea surface is
flat and that its elevation is roughly equal to that of the
shallowest point in the water area, that is, the maximum elevation of the
water area. This elevation can be regarded as the datum for the
calculation of bathymetry, and we highlight that this is only suitable when
adopting a DEM containing coastal pixels (elevations at the junction of
water and land) as in-situ data. Initially, B8 of the SR imagery is
segmented via binary threshold segmentation to produce a preliminary
Water-Land Map (WLM), and the distances from each water pixel to its
nearest land pixel in the WLM, namely the Distance to Coast Map (DtCM),
are calculated. Subsequently, pixels and their elevations along the
coastline are extracted, which may contain a limited number of abnormal pixels due
to shadow blocking and enclosed inland water bodies. Here, we adopt
the Median Absolute Deviation (MAD) to detect these abnormal
elevations and eliminate them, as follows.
$$x_M - n \cdot \mathrm{MAD} < x_i < x_M + n \cdot \mathrm{MAD}$$
where $x_M$ is the median elevation of the selected pixels, $\mathrm{MAD} = 1.4826 \times \mathrm{median}(|x - x_M|)$, and $n$ is 3. The WLM and DtCM are updated after
removing outliers. With outliers eliminated, the morphological erosion
algorithm removes inland water bodies to derive the final WLM, which is
then applied to obtain the valid DEM of water ($\mathrm{DEM}_W$). To ensure safety,
the MAD algorithm is reapplied to remove outliers from the uppermost
elevations (for example, the top 500 elevations). Ultimately, the
maximum elevation in the $\mathrm{DEM}_W$ is used as the datum for converting
elevations and yielding the DBM, namely
$$\mathrm{DBM} = H_L - \mathrm{DEM}_W$$
where $H_L$ is the maximum elevation value in the $\mathrm{DEM}_W$ and the range of the DBM is
0–25 m. Notably, the number of samples used for outlier removal in
the $\mathrm{DEM}_W$ should be carefully considered (default is 500), and a greater
number of samples is recommended when interference, such as shadows
that introduce abnormal elevations, is severe.
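The MAD filter and the elevation-to-depth conversion above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the water elevations are passed as a flat list, the fallback for MAD = 0 is a choice made here rather than part of the described procedure, and depths outside 0–25 m are simply dropped.

```python
from statistics import median


def mad_inliers(values, n=3.0):
    """Keep values with x_M - n*MAD < x < x_M + n*MAD.

    MAD = 1.4826 * median(|x - x_M|), with x_M the median of the values.
    """
    x_m = median(values)
    mad = 1.4826 * median([abs(v - x_m) for v in values])
    if mad == 0.0:
        # Degenerate case (assumption of this sketch): nothing can be ranked
        # as an outlier, so all values are kept.
        return list(values)
    return [v for v in values if x_m - n * mad < v < x_m + n * mad]


def dem_to_dbm(dem_w, top_k=500, n=3.0, max_depth=25.0):
    """Convert water-pixel elevations to depths with DBM = H_L - DEM_W.

    H_L is the maximum of the top_k highest elevations after MAD filtering.
    Returns H_L and the depths that fall inside [0, max_depth].
    """
    top = sorted(dem_w, reverse=True)[:top_k]
    h_l = max(mad_inliers(top, n))
    depths = [h_l - z for z in dem_w]
    return h_l, [d for d in depths if 0.0 <= d <= max_depth]
```

Raising top_k gives the MAD statistics more normal samples to work with, which matches the recommendation to use more samples when abnormal elevations are frequent.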
2.4. Study areas
Fig. 1 illustrates the global distribution of the study areas, with St.
Croix exclusively providing the training and testing datasets, while the
remaining areas constitute the real-world dataset used for inference.
Included in the real-world dataset, Vieques is nearest to St. Croix—the
area of the training/testing dataset—whereas the other study areas are
situated thousands of kilometers apart. This setup maintains distinct
differences between the training, testing and real-world datasets, which
is essential for assessing generalizability and aligns closely with the
objective of rapid cross-regional deployment of the SDB model. Table 2
presents detailed information about the number of bathymetry samples
in each study area. The volume of the real-world dataset significantly
exceeds that of the other two datasets.
2.4.1. Training and testing dataset: St. Croix
St. Croix, as shown in Fig. 3, is located in the Caribbean Sea of the
Western Atlantic and has a steep underwater slope on the northwest side,
in contrast to its flatter southern seabed. Owing to its diverse underwater
topography, this study area has been chosen for the establishment of
geographically distinct training and testing datasets, instead of the traditional mixed split, as
illustrated in Fig. 3(b). The selection of training and testing datasets
from the same study area aims to minimize data distribution
discrepancies, thus ensuring a more accurate assessment of SDB model
performance. As mentioned before, eleven fused SR imageries are utilized
to construct the training and testing datasets, emulating variability in
imagery due to changes in factors like IOPs. Following the above data
preprocessing, approximately 1,921,400 bathymetry samples are
obtained for each imagery, covering an area of 192.14 km², as shown in
Table 2. Each fused SR imagery contains approximately 1,155,100
training samples and 766,300 testing samples, with a ratio of around
6.5:3.5 and no geographical overlap between the two.
2.4.2. Real-world dataset: Vieques, Oahu, Kauai, Saipan and Tinian
The real-world dataset is a large dataset that is composed of four
study areas worldwide, as shown in Fig. 4, including Vieques, Oahu,
Kauai, and Saipan and Tinian. Each study area has its own characteristics
and is far apart from the others, but all are located in the tropical region,
the same as St. Croix. This dataset aims to evaluate the performance,
especially the generalizability, of SDB models on various scenes, such as
different underwater terrain distributions and unseen multi-spectral SR
imagery.
Fig. 2. The workflow of DBM construction.
Table 2
Volumes of valid bathymetry samples in different datasets.
| Area | Number of Pixels (10 m) | Purpose (Total Size) |
|---|---|---|
| St. Croix (Single Imagery) | 1,921,400 | Training Dataset (1,155,100); Testing Dataset (766,300) |
| Vieques | 1,456,042 | Real-World Dataset for Inference (4,937,405) |
| Oahu | 2,092,767 | Real-World Dataset for Inference (4,937,405) |
| Kauai | 718,363 | Real-World Dataset for Inference (4,937,405) |
| Saipan and Tinian | 670,233 | Real-World Dataset for Inference (4,937,405) |
First, Vieques, Puerto Rico, located in the Caribbean Sea, lies close to
St. Croix. Factors such as geographical proximity and similar coral reef
assemblages within and between reefs in comparable settings (Riegl,
et al., 2008) make these two areas similar. Therefore, the introduction of
this study area to the real-world dataset aims to evaluate the perfor-
mance of the SDB model under the presumption of a limited data dis-
tribution shift. As illustrated in Fig. 4(a), the underwater topography of
this study area is uniformly smooth with clear water and negligible
terrestrial suspended matter, making it ideal for SDB based on multi-
spectral imagery.
Second, Oahu, positioned in the Eastern Pacific Ocean as part of the
Hawaiian archipelago, is considerably larger than other study areas in
the real-world dataset, as depicted in Fig. 4(b), with 2,092,767
bathymetry samples included. Since shadow occlusion degrades the fused
SR imagery quality on Oahu’s western and northern sides, the study area
is restricted to its eastern and southern sides. The eastern section of the
study area hosts extensive coral reef cover, characterized predominantly
by flat underwater topography and transparent water. The presence of
expansive urban centers on Oahu, accompanied by numerous
man-made structures such as ports, increases the complexity of the nearshore
underwater terrain.
Kauai, another island of the Hawaiian archipelago chosen for this
research, is approximately 120 km away from Oahu. As with the Oahu
area, only the eastern and southern sides are selected for inversion in
this study due to the prolific presence of shadow-occluded pixels on the
north and west sides. In contrast to Oahu, as depicted in Fig. 6, the DBM
of Kauai indicates that the submarine terrain in this region possesses a
steep gradient accompanied by a rapid increase in water depth, thus
severely restricting the extent of the area available for inversion.
Notably, the SR values in this area are lower than those in Oahu, as is
evident on visual inspection.
Finally, Saipan and Tinian are located in the CNMI in the Western
Pacific Ocean and are notably isolated from other study areas in the
real-world dataset. Of these two islands, Saipan is characterized by
distinctive topographic features: the underwater terrains on the eastern
and western sides differ markedly, with the eastern side characterized
by a steep gradient, whereas the western area boasts a large lagoon
formed by coral debris extending more than 4 km offshore. In
comparison, the underwater terrain surrounding Tinian is predominantly steep,
and areas with bathymetry values shallower than 25 m are typically
within 500 m of the coastline, apart from its southwest part.
In short, the data preprocessing in our research can be roughly
divided into Sentinel-2 data processing and ALB DEM processing. The
processed Sentinel-2 SR imagery determines the quality of the model
input, especially the invalid pixel filling, which is critical for the stability
of the MCHR output, and thus affects the accuracy of the inversion
results. ALB DEM processing is the other core of data preprocessing: the main results are sensitive
to the manual trimming of the DEM and the conversion from the DEM to
the DBM, since these processing steps determine the ground-truth bathymetry
used as the supervisory signal for training and evaluation. For different target
regions or different satellite multi-spectral imagery data, the parameters
need to be reexamined to determine the optimal settings.
3. Method
This study proposes a novel DL-based multiple scale resolution fusion
model, MuSRFM, to integrate multi-scale features to achieve precise
bathymetry inversion. As depicted in Fig. 5, the MCHR crops and
resamples large-scale SR imagery patches into different scale resolutions.
As the process goes deeper, the scale decreases and the
receptive field narrows, and each patch is resampled to obtain
patches with a fixed shape, akin to the operation of zooming in. The
input for the MuSRFM comprises an array of multiple resolution SR
imagery patches, with each patch aligned with a unique scale that serves
to encapsulate spatial features of varying ranges. The backbone of the
MuSRFM processes the input hierarchical representations through
multistage encoders across different branches, with each encoder
composed of multiple Residual Modules (RMds) as proposed in ResNet (He
et al., 2016), as illustrated in Fig. 6. As a core component of the
MuSRFM, the CAFM is utilized for fusing features across different
resolution branches and integrating them into the finest 10 m spatial
resolution. To enhance the stability of the MuSRFM, a simplified decoder,
termed the Hierarchical Feature Concatenate Module (HFCM), is
employed to amalgamate features from multiple stages for the generation
of the bathymetry map.
3.1. The workflow of MCHR
The underwater terrain in high-resolution Sentinel-2 multi-spectral
SR imagery exhibits textural and contextual correlations, yet the scale
of these spatial features varies. Indeed, one of the principal factors
enabling DL models to realize advancements in fields such as computer
vision is their large receptive field, which enables them to capture
larger-scale spatial features. However, as mentioned before, merely
expanding the input range can lead to prohibitive
computational complexity, particularly when the model is required to
perceive information in a broader range, such as dimensions spanning
Fig. 3. (a) Fused SR imagery of St. Croix, and (b) its corresponding DBM and the partition of the training dataset and testing dataset.
X. Qin et al. ISPRS Journal of Photogrammetry and Remote Sensing 218 (2024) 150–169
Fig. 4. The fused SR imagery and DBM in the real-world dataset, including (a) Vieques, (b) Oahu, (c) Kauai and (d) Saipan and Tinian.
thousands of pixels. Notably, for the larger-scale spatial features, the
intricacies are less critical, similar to how people view a map: they
focus only on significant content at a large scale before gradually
zooming in on the details. This constitutes the fundamental idea behind
the MCHR, as illustrated in Fig. 5, wherein large-scale SR multi-spectral
imagery, measuring 810 × N m (where N is set to 15, yielding 12150 m),
is decomposed and resampled into multiple spatial resolution branches,
namely, 10 m, 30 m, 90 m, 270 m and 810 m, with the scales
rising to cover spatial features of different sizes.
Fig. 5 shows the MCHR workflow, where the input is a patch obtained
by cropping from the original fused SR imagery. This patch is extremely
large, measuring 12150 m in height and width, which makes it challenging
for current DL models that receive a single-resolution input. Building upon
the previously discussed concept of zooming in, the MCHR gradually crops this
enormous SR imagery around its center, reducing the resultant
patch to a third of its original height and width after each crop. As
depicted in Fig. 5, the richness of the content within the patch
diminishes in line with the decrease in spatial extent. Since the
Fig. 5. The workflow of MCHR, processing the SR imagery of the target area into multiple resolution scale hierarchical representations.
Fig. 6. The structure of MuSRFM.
parameter N is set to 15, the height and width of the patch are
reduced from 12150 m to 4050 m and further decreased to 1350 m, 450
m and ultimately 150 m. However, it is impractical for the model to
manage such sizeable inputs; hence, we downsample all levels except
the top branch. The bottommost level, spanning 12150 m, is
resampled from the fine 10 m resolution to a coarser 810 m resolution,
whereas the resolutions of the other levels, from bottom to top, are 270
m, 90 m, 30 m and 10 m, and all resampled patches have the same size.
For branches with coarse spatial resolution, such as branch RES-810, the
content of the resampled patch is blurrier, and it sharpens as the
resolution is refined to 30 m and 10 m. This is in line with the previously
stated concept of detail enhancement through zooming in. To avoid the
problem caused by the MCHR exceeding the edge of the input SR imagery
during sampling, we take two measures: (1) the extent of the input SR
imagery is significantly larger than the area used to obtain the SDB; (2)
the input SR imagery is padded with specific constants (such as the
minimum reflectance of each band), and the padding width depends on
the maximum sampling range of the MCHR. In short, the MCHR segments
the original high-resolution SR imagery into five smaller patches,
preserving the spatial features and detailed information associated with
each output scale and addressing the challenges associated with the
increased receptive field. In terms of the number of channels, all 12
channels of the Sentinel-2 L2A data are used; that is, the shape of each
patch is 15 × 15 × 12.
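The crop-and-resample hierarchy described above can be illustrated with a minimal pure-Python sketch for a single band. It is not the authors' implementation: the function names (`center_crop`, `block_mean`, `mchr`) are hypothetical, and block averaging is assumed as the resampler because the text does not name one. A real patch would carry all 12 bands.

```python
def center_crop(img, size):
    """Return the central size x size window of a square 2-D list."""
    start = (len(img) - size) // 2
    return [row[start:start + size] for row in img[start:start + size]]


def block_mean(img, factor):
    """Downsample a square 2-D list by averaging factor x factor blocks.

    Block averaging is an assumption; the paper only says 'resampled'.
    """
    out_size = len(img) // factor
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            block = [img[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def mchr(img, n=15, levels=5):
    """Split one band into `levels` patches of n x n pixels, coarsest first.

    `img` must be a square 2-D list with side n * 3**(levels - 1) pixels.
    """
    assert len(img) == n * 3 ** (levels - 1)
    patches = []
    current = img
    for level in range(levels):
        factor = len(current) // n  # 81, 27, 9, 3, 1 for n=15, levels=5
        patches.append(block_mean(current, factor))
        if level < levels - 1:
            # Zoom in: keep the central third in height and width.
            current = center_crop(current, len(current) // 3)
    return patches


# Extents and resolutions implied by N = 15 and 10 m Sentinel-2 pixels.
extents_m = [15 * 10 * 3 ** k for k in range(4, -1, -1)]
resolutions_m = [extent // 15 for extent in extents_m]
```

With N = 15, `extents_m` reproduces the 12150, 4050, 1350, 450 and 150 m extents and `resolutions_m` the 810, 270, 90, 30 and 10 m resolutions quoted in the text; every returned patch has the same N × N shape and shares the same center pixel.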
3.2. MuSRFM
Generally, as shown in Fig. 6, the input of the MuSRFM consists of
five hierarchical patches, RES-10, RES-30, RES-90, RES-270 and RES-810,
which are obtained from MCHR sampling, and the output is a bathymetry
map with 10 m resolution matching the extent of branch RES-10.
The MuSRFM fundamentally processes SR imagery patches of varying
resolutions via multiple parallel encoder branches, where
each branch consists of a series of consecutively connected RMds
segmented into four distinct stages. A branch with a coarse spatial
resolution, such as RES-810, can perceive the spatial characteristics
corresponding to kilometer-scale underwater terrain changes, whereas
a branch with a fine spatial resolution input, such as RES-10, can
perceive the detailed features corresponding to small-scale
underwater terrain information. The multiple-stage encoder is used to
progressively transform the features from concrete to abstract.
Nevertheless, given that these encoders operate in parallel, an
algorithm is required to amalgamate the features derived from
disparate branches and establish correlations between different scale
resolution levels, a function performed by the CAFM. The CAFM
is one of the pivotal components of the MuSRFM and works together
with the MCHR to facilitate feature integration across the
hierarchical resolution-specific branches. As illustrated in Fig. 5 and
Fig. 6, there is a progressive refinement of spatial resolution, implying
that the lower branch encompasses the upper branch's content. This
offers a practical framework for the CAFM,
which involves the alignment and integration of large-scale features at
lower levels with small-scale features at higher levels. Fig. 7
illustrates the framework of the CAFM. To clarify the process, we
demonstrate the feature fusion using branches RES-810 and RES-270 as
an example, outlining the operational steps in order. Initially, a feature
subset is derived by performing central cropping on the feature from
branch RES-810, resulting in a spatial shape (N/3, N/3); the
channel dimension is omitted in the expression since this process is
performed on each channel independently. Subsequently, bilinear
interpolation is applied to this subset to reconstruct it to an interpolated
feature with shape (N, N). Next, the interpolated feature is concatenated
with the feature of branch RES-270 along the channel dimension, and
the updated feature of branch RES-270 is obtained by passing the
concatenated feature through an RMd that fuses it along the channel
dimension. This procedure is carried out repeatedly, realizing feature
updates from the lowest level (branch RES-810) to the highest level
(branch RES-10), and in this process, the features of different
resolution-scale branches are
integrated into the finest resolution, namely, 10 m, which is used for
bathymetry inversion. Given that the encoder is partitioned into four
distinct stages, a CAFM is inserted at the end of each stage to facilitate
the propagation and amalgamation of features with varying levels of
abstraction.
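The crop, interpolate and concatenate steps of the CAFM can be sketched as follows. This is an illustration under stated assumptions, not the published code: features are plain nested lists of shape [channels][N][N], the helper names are hypothetical, corner-aligned sampling is assumed for the bilinear interpolation, and the RMd that fuses the concatenated channels is left out.

```python
def center_crop(ch, size):
    """Return the central size x size window of a square 2-D list."""
    start = (len(ch) - size) // 2
    return [row[start:start + size] for row in ch[start:start + size]]


def bilinear_resize(ch, out_size):
    """Bilinearly resample a square 2-D list to out_size x out_size.

    Corner-aligned sampling is an assumption; both sizes must be >= 2.
    """
    in_size = len(ch)
    scale = (in_size - 1) / (out_size - 1)
    out = []
    for r in range(out_size):
        y = r * scale
        y0 = min(int(y), in_size - 2)
        dy = y - y0
        row = []
        for c in range(out_size):
            x = c * scale
            x0 = min(int(x), in_size - 2)
            dx = x - x0
            top = ch[y0][x0] * (1 - dx) + ch[y0][x0 + 1] * dx
            bottom = ch[y0 + 1][x0] * (1 - dx) + ch[y0 + 1][x0 + 1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def cafm_align_and_concat(coarse, fine):
    """Align a coarse-branch feature to the next finer branch and stack them.

    coarse, fine: lists of channels, each channel an N x N list.
    Returns the channel-wise concatenation; the fusing RMd is omitted here.
    """
    n = len(fine[0])
    aligned = [bilinear_resize(center_crop(ch, n // 3), n) for ch in coarse]
    return aligned + fine
```

For N = 15, the central 5 × 5 window of the coarse feature is stretched back to 15 × 15, so it covers the same ground extent as the finer branch before the two are stacked along the channel axis.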
To produce the output, a straightforward HFCM is employed to
directly concatenate the integrated features from the CAFM of multiple
stages within the RES-10 branch, subsequently acquiring the final
bathymetry map with the spatial resolution of 10 m through an output
head. The difference between the output head and the aforementioned
Fig. 7. The detailed workow of the CAFM of MuSRFM.
RMd is that it is followed by two consecutive convolution layers without
normalization that reduce the channel dimension to 1, and a ReLU function
is added at the end to ensure that the output is not less than 0, namely
$$f(x) = \max(x, 0)$$
The purpose of the HFCM is to amalgamate the integrated features
originating from different stages, encompassing both concrete features
and abstract features, with the aim of securing a relatively stable output.
Indeed, the results of the ablation experiment in Section 4.2
substantiate the stability brought by the HFCM.
3.3. Settings
Fig. 6 illustrates that the RMd serves as the foundational unit
of the entire model, with the multistage encoder of each branch
being constituted by four such RMds. As previously mentioned, the
dimensions of each patch that is input into each branch are only 15 × 15 ×
12, so the encoders should not be very deep. Here, for the
two highest-level branches (RES-10 and RES-30), the numbers of blocks
in the four RMds are 2, 3, 3, 3, and the numbers of channels gradually
increase to 64, 128, 192, 384, respectively, whereas the numbers of blocks and
channels for the two lowest branches (RES-810 and RES-270) are 2, 2, 2,
2 and 48, 96, 128, 192, and the configurations for the middle branch
(RES-90) are 2, 3, 2, 2 and 64, 128, 128, 192, respectively. For the RMd
in the CAFM, to make the model as lightweight as possible, it is set to
contain only one block. Generally, the MuSRFM is designed to be
compact, so it can be trained on limited computational resources and
deployed on a local device with a consumer-grade GPU.
The training dataset consists of eleven median-fused SR images of
St. Croix, as shown in Table 3, with the explicit intent of incorporating
changes in SR imagery induced by varying factors such as IOPs.
Although performing MCHR in real time theoretically allows diverse
data augmentation methods, its time-intensive nature necessitates the
use of a static, preprocessed dataset sampled by the MCHR during the
training process. The limitation lies in the geographic immobility of the
samples, which restricts options for data augmentation methodologies
such as shifting and random cropping. Furthermore, the reflectance
value is affected by an array of physical parameters, including
bathymetry, which precludes the application of color-related data
augmentations. In light of these constraints, our study employs only random
vertical flips, random horizontal flips, and random rotations for data
augmentation.
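A minimal sketch of such geometric augmentation is given below. It assumes that rotations are restricted to multiples of 90 degrees (the text does not state the angles) and that one randomly drawn transform is applied identically to every array of a sample, so the five resolution patches, the mask and the depth label stay co-registered. All function names are hypothetical, and each array is treated as a single 2-D band for brevity.

```python
import random


def hflip(patch):
    """Mirror a 2-D list left to right."""
    return [row[::-1] for row in patch]


def vflip(patch):
    """Mirror a 2-D list top to bottom."""
    return [row[:] for row in patch[::-1]]


def rot90(patch):
    """Rotate a square 2-D list by 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*patch)][::-1]


def augment(sample, rng):
    """Apply one randomly drawn transform to every 2-D array in `sample`."""
    do_h = rng.random() < 0.5
    do_v = rng.random() < 0.5
    turns = rng.randrange(4)  # assumption: rotations are multiples of 90 degrees
    out = []
    for patch in sample:
        patch = [row[:] for row in patch]  # work on a copy
        if do_h:
            patch = hflip(patch)
        if do_v:
            patch = vflip(patch)
        for _ in range(turns):
            patch = rot90(patch)
        out.append(patch)
    return out
```

Because only pixel positions change, the reflectance values themselves are untouched, which is consistent with the stated reason for avoiding color-related augmentations.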
Regarding additional detailed configurations during training, the
mini-batch size is set to 256, the learning rate is modulated by
a cosine learning rate scheduler, the initial learning rate is set
to 2.5e-4, the minimal learning rate is set to 1e-7, and the number of
epochs is set to 15. The optimizer used during training is AdamW, and
the loss function implemented is the Mean Square Error (MSE) with an
additional mask, $\mathrm{MSE}_M$, which is
$$\mathrm{MSE}_M = \frac{1}{K}\sum_{i=1}^{M}\left(x_i \odot m_i - y_i \odot m_i\right)^2$$
In this context, $x_i$ represents the predicted bathymetry map, $y_i$ represents
the GT map, $m_i$ represents a Boolean mask (element 1 corresponds to a
valid pixel and 0 represents a null value), $K$ signifies the number of all
valid pixels in this mini-batch, and $M$ is the number of samples in each
mini-batch. More details about the MuSRFM and other information can
be found in the code we provide.
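The masked loss and the learning rate schedule can be sketched in a few lines of plain Python. The function names are hypothetical, each map is flattened to a list of pixels, and the schedule assumes the standard cosine annealing form without warm-up, which the text does not confirm.

```python
import math


def masked_mse(preds, targets, masks):
    """Masked MSE over a mini-batch.

    preds, targets, masks: one flat pixel list per sample; mask entries are
    1 for valid pixels and 0 for null values. The squared differences are
    summed over all samples and divided by K, the number of valid pixels.
    """
    total, valid = 0.0, 0
    for x, y, m in zip(preds, targets, masks):
        for xi, yi, mi in zip(x, y, m):
            total += (xi * mi - yi * mi) ** 2
            valid += mi
    return total / max(valid, 1)  # guard against a batch with no valid pixel


def cosine_lr(step, total_steps, lr_max=2.5e-4, lr_min=1e-7):
    """Cosine decay from lr_max to lr_min (no warm-up assumed)."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos
```

For example, `masked_mse([[1.0, 5.0]], [[3.0, 100.0]], [[1, 0]])` ignores the second pixel and returns 4.0, and `cosine_lr` falls from 2.5e-4 at step 0 to 1e-7 at the final step.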
3.4. Validation metrics
In this study, we adopt five metrics to evaluate the accuracy of our
MuSRFM and other baseline models quantitatively, including the RMSE,
Mean Absolute Error (MAE), Median Absolute Error (MedAE), $R^2$ and $R_R^2$
(the $R^2$ based on the regression line). The RMSE is the most commonly
used metric in SDB research; it is more sensitive to abnormal deviation
values and is calculated as follows.
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}(y_i - x_i)^2}{N}}$$
Where $y$ is the in-situ bathymetry, namely, GT, $x$ is the predicted
bathymetric value, and $N$ signifies the total number of bathymetry
samples used. Compared with the RMSE, the MAE is more tolerant of
abnormal deviation values, and its calculation is as follows.
$$\mathrm{MAE} = \frac{\sum_{i=1}^{N}|y_i - x_i|}{N}$$
Additionally, the MedAE is the median value of all absolute errors.
$$\mathrm{MedAE} = \mathrm{median}(|y_i - x_i|), \quad i = 1, 2, 3, \cdots, N$$
The coefficient of determination, denoted as $R^2$, is used to evaluate the
goodness-of-fit between the predicted bathymetry and the GT based on
the 1:1 line.
$$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - x_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}, \quad \text{and} \quad \bar{y} = \frac{\sum_{i=1}^{N} y_i}{N}$$
In addition, an alternative version of $R^2$, designated $R_R^2$, is introduced to
assess the goodness-of-fit on the basis of the regression line obtained
between the GT and predictions instead of the 1:1 line.
$$R_R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - f_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$
Where $f_i$ is the fitted value of the obtained regression line. Here, these
two versions of the coefficient of goodness-of-fit, namely, $R^2$ and $R_R^2$, are
used to evaluate the goodness-of-fit around the 1:1 line and the
regression line, and the comparison of these two metrics highlights the
fit between the predictions and the GT even if it deviates from the 1:1
line.
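The five metrics can be computed with the standard library alone, as in the sketch below. The function name `sdb_metrics` is hypothetical, and the regression line is assumed to be an ordinary least-squares fit of the GT on the predictions, since the text does not state the direction of the fit.

```python
import math
from statistics import mean, median


def sdb_metrics(y, x):
    """y: ground-truth depths, x: predicted depths (equal-length lists)."""
    n = len(y)
    errors = [yi - xi for yi, xi in zip(y, x)]
    y_bar = mean(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum(e ** 2 for e in errors)

    # Least-squares line y ~ a * x + b (assumed direction of the fit).
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b = y_bar - a * x_bar
    ss_fit = sum((yi - (a * xi + b)) ** 2 for yi, xi in zip(y, x))

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "medae": median(abs(e) for e in errors),
        "r2": 1.0 - ss_res / ss_tot,      # goodness-of-fit around the 1:1 line
        "r2_r": 1.0 - ss_fit / ss_tot,    # goodness-of-fit around the regression line
    }
```

A prediction that is systematically 20 % too shallow, such as `x = [0.8 * v for v in y]`, gives `r2_r` equal to 1 while `r2` drops below 1, which is the kind of gap between the two coefficients that signals a regression line deviating from the 1:1 line.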
4. Experiment and results
In our experiment, we reproduce five existing SDB models as baselines
to evaluate the performance of the MuSRFM. Among them, the LBR
model, as a classic semi-empirical SDB model, is widely utilized in
numerous studies (Stumpf et al., 2003; Zhang et al., 2022), namely
$$B = m_1 \frac{\ln(q \times R_i)}{\ln(q \times R_j)} + m_0$$
Moreover, the Polynomial LBR (PLBR) is deemed to be more accurate
than the original LBR (Han et al., 2023), as shown in the following
equation.
Table 3
The results on the testing dataset. Here, ↓ means that the smaller the metric is,
the better, and ↑ means the opposite.
Model       RMSE ↓ (m)   MAE ↓ (m)   MedAE ↓ (m)   $R^2$ ↑   $R_R^2$ ↑
LBR         2.2581       1.6601      1.2845        0.7812    0.7867
PLBR        2.2700       1.6764      1.2913        0.7788    0.7865
SMART-SDB   1.3366       0.9702      0.7274        0.9233    0.9240
RFR         1.3124       0.9531      0.6977        0.9261    0.9340
CatBoost    1.2855       0.9293      0.6769        0.9291    0.9367
MuSRFM      0.8131       0.5841      0.4261        0.9716    0.9738
$$B = m_2 \left(\frac{\ln(q \times R_i)}{\ln(q \times R_j)}\right)^2 + m_1 \frac{\ln(q \times R_i)}{\ln(q \times R_j)} + m_0$$
Where $B$ is the predicted bathymetry, $q$ is a constant assigned a value of
20,000 here, and $R_i$ and $R_j$ are the SR values of the blue band and green
band from multi-spectral imagery, respectively. Furthermore, the
SMART-SDB employs the K-NN approach to amalgamate multiple PLBR
models corresponding to different band combinations for inverting river
channel bathymetry and, according to its authors, is also suitable for
nearshore shallow water bathymetry inversion (Niroumand-Jadidi et al.,
2020). With respect to ML models, which have garnered notable
attention in recent SDB research, this study uses ML-based SDB models
including Random Forest Regressor (RFR) (Poursanidis et al., 2019; Wu
et al., 2022; Mudiyanselage et al., 2022) and CatBoost (Lowell and
Rzhanov, 2024).
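As an illustration of the two semi-empirical baselines, the sketch below computes the log-ratio feature with q = 20,000, calibrates the LBR coefficients by ordinary least squares, and evaluates the PLBR polynomial. The function names are hypothetical, the closed-form least-squares fit is an assumption about how the coefficients are calibrated, and the PLBR coefficients are taken as given rather than fitted.

```python
import math


def log_ratio(r_blue, r_green, q=20000.0):
    """Log-band-ratio feature ln(q * R_i) / ln(q * R_j)."""
    return math.log(q * r_blue) / math.log(q * r_green)


def fit_lbr(ratios, depths):
    """Ordinary least squares for B = m1 * ratio + m0; returns (m1, m0)."""
    n = len(ratios)
    r_bar = sum(ratios) / n
    d_bar = sum(depths) / n
    m1 = (sum((r - r_bar) * (d - d_bar) for r, d in zip(ratios, depths))
          / sum((r - r_bar) ** 2 for r in ratios))
    m0 = d_bar - m1 * r_bar
    return m1, m0


def predict_lbr(ratio, m1, m0):
    """LBR: linear in the log ratio."""
    return m1 * ratio + m0


def predict_plbr(ratio, m2, m1, m0):
    """PLBR: second-order polynomial in the log ratio."""
    return m2 * ratio ** 2 + m1 * ratio + m0
```

Both models map a single pixel's band ratio to a depth, so neither uses any spatial context from neighboring pixels.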
Because the LBR and PLBR are based on linear regressions, their training
is extremely fast, taking only approximately 2 s. SMART-SDB
relies on the K-NN to cluster samples in the feature space and
assign an optimal submodel for each cluster, so more time is needed for
fitting and fine-tuning, at about 180 s. As the complexity of the SDB
model increases, the time it takes for training also increases
significantly. CatBoost requires approximately 552 s for training with 5000
iterations, and RFR needs about 1184 s with 75 estimators. Finally, as
expected, the training time of the MuSRFM is much greater since it is a
DL-based model, with the whole training process taking
approximately 9285 s (about 155 min) on an RTX A5000 GPU. Although
time consumption is not an evaluation metric that we consider here, it
can reflect the complexity of the SDB model to some extent.
4.1. Results on the testing dataset
The SDB models complete training and parameter calibration using
the constructed training dataset, with subsequent performance evaluation
conducted on the testing dataset. As illustrated in Table 3, there are
significant differences in accuracy between the baselines and the
MuSRFM. Notably, the inherent randomness of the MuSRFM, a DL-based
model, can lead to variability in its training outcomes, so the results of
the MuSRFM in Table 3 are the means of the metrics obtained
across five consecutive repeated experiments. The band setting for the
input of RFR and CatBoost is derived from Lowell and Rzhanov (2024),
including B1, B2, B3, B4, B5 and B11, and excludes the coordinates used in
the original paper because the training and testing datasets are
geographically isolated. Moreover, for SMART-SDB, the necessary parameters NP
and K are set to 15 and 20, respectively. Since the training dataset and
the testing dataset include eleven pairs of SR imagery and DBM, the
resulting numbers of samples are approximately eleven times those in
Table 2, specifically about 12,706,100 and 8,429,300, respectively. To
compare the predicted bathymetry map with the GT map intuitively, the
predictions in the experiment are masked by the corresponding
GT map; however, if the GT map is lacking, the deep-water
regions can be masked by limiting the inversion range (e.g., within 3
km of the shoreline) and setting a bathymetric threshold of 25 m.
First, the accuracies of the traditional LBR and PLBR models, with
RMSEs of 2.2581 m and 2.2700 m, are far from ideal. This shortfall is attributed
to the fact that the consistency of IOPs cannot be guaranteed, since the
training and testing areas are geographically isolated and the whole area is
extensive (ranging over 30 km), challenging the capacity of the simple
linear regression model to adequately fit the complex relationship
between the SR and bathymetry. In comparison, the accuracy of SMART-SDB
shows a marked enhancement, with its RMSE being reduced to
1.3366 m compared with over 2 m obtained by LBR and PLBR; this
improvement stems from its strategy of partitioning the feature space
through K-NN clustering and allocating to each subset the optimal PLBR
with a specific band combination. This finding supports the
prior assertion that simple linear regression models like LBR and PLBR
are inefficient under these conditions. Furthermore, the SDB models
employing ML algorithms outperform the three aforementioned
models, with CatBoost achieving a marginally lower RMSE of 1.2855 m
than the 1.3124 m of RFR. Significantly, RFR and CatBoost are based on
ensemble learning, which integrates multiple decision trees to
construct a more complex and stronger decision-maker, somewhat akin
to SMART-SDB, which integrates multiple PLBR models, albeit with
increased complexity. Finally, the metrics of the MuSRFM
outperform those of the semi-empirical and ML-based SDB models, with
the RMSE decreasing greatly to 0.8131 m from 2.2581 m (LBR) and
1.2855 m (CatBoost), reductions exceeding 60 % and 35 %, respectively, and
the other metrics, such as the MedAE, MAE, $R^2$ and $R_R^2$, also
significantly improved. These metrics quantitatively demonstrate the
enhanced accuracy achieved by the MuSRFM.
For a more intuitive assessment of the model performance, the outcomes
of each SDB model on the testing dataset are visualized.
For direct comparison, the predictions of the MuSRFM depicted in
Fig. 8, Fig. 9 and Fig. 10 are derived from the model that achieves the
median RMSE out of five experiments, yielding slightly degraded RMSE,
MAE, MedAE, $R^2$ and $R_R^2$ values of 0.8335 m, 0.6045 m, 0.4502 m,
0.9702 and 0.9731, respectively. The results presented in Fig. 8(a) and
Fig. 8(b) are consistent with the metrics of the LBR and PLBR reported in
Table 3; that is, their predictions and GT are scattered and have weak
goodness-of-fit, both around the 1:1 line and the regression line. The metrics
of SMART-SDB, RFR and CatBoost are markedly similar, as reflected by
their corresponding density scatter plots in Fig. 8(c), Fig. 8(d) and
Fig. 8(e). However, upon examining the details, the ML-based SDB models
outperform the others, with CatBoost exhibiting the best performance
among the baseline models used for comparison. In contrast, Fig. 8(f)
shows that the predictions of the MuSRFM have a greater degree of fit
with the GT, with $R^2$ and $R_R^2$ both exceeding 0.97 and closely aligned,
coupled with a greater density near the 1:1 line and low dispersion.
Fig. 9 presents the bathymetry maps inverted by the various SDB models
using the fused SR imagery from Jan. 2018 to Jan. 2022 in the testing
dataset. The inversion results from each SDB model generally align with
the expected trends, yet all bathymetry maps other than the GT
and the MuSRFM result markedly suffer from impulse noise, manifested as a
large number of point-like anomalies in the obtained inversion and
resulting in poor spatial continuity, as shown in the enlarged patch of
all subplots in Fig. 9. This kind of noise results from the limited receptive
field of these SDB models. That is, except for the MuSRFM, the receptive
field of these SDB models is restricted to a single pixel of the SR imagery;
thus, theoretically, these models can establish a mapping between only
an individual pixel and its corresponding bathymetry. In contrast, the
multiple scale resolution patches sampled by the MCHR encompass a
vast spatial extent with rich spatial features, greatly surpassing the
information contained in a single pixel, thereby providing the MuSRFM
with broader and richer feature perception. However, we notice that the
accuracy of the MuSRFM decreases with increasing water depth,
prompting an analysis of the distribution of residual errors and the
number of training samples over different bathymetry ranges, as
illustrated in Fig. 10. As anticipated, beyond 13 m there is an obvious increase
in the variance of the residual error, and about 40 % of the samples are
located in this range, whereas the changes in the median residual error
and the number of training samples are closely aligned. Notably, the residual
error at depths shallower than 1 m is relatively high, potentially because
these shallow samples usually have taller waves and their SR values are
dominated by substrate reflectance instead of water depth. In addition,
considering that the residual errors of each bathymetric interval in
Fig. 10 are highly correlated with the number of training samples, this
may also be due to the extremely small number of valid samples in this
interval. Moreover, according to the variation in the median residual
error in Fig. 10, the MuSRFM tends to underestimate the bathymetry
with increasing water depth, and the median residual error value in the
box plot shifts dramatically from zero after the bathymetry exceeds 18
m, so the optimal application range of the MuSRFM in the testing dataset
should be in the range of 0 ~ 18 m.
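A per-interval residual analysis of this kind can be sketched as follows. The function name is hypothetical, depths are assumed to be positive in metres, and the residual is assumed to be the prediction minus the GT, so a negative median indicates underestimation; the paper does not state its sign convention.

```python
from statistics import median, pvariance


def residuals_by_depth(gt, pred, bin_width=1.0, max_depth=25.0):
    """Group residuals (pred - gt) into GT depth bins of `bin_width` metres.

    Returns {bin_index: {"count", "median", "variance"}} sorted by depth,
    where bin k covers [k * bin_width, (k + 1) * bin_width).
    """
    bins = {}
    for y, x in zip(gt, pred):
        if 0.0 <= y < max_depth:
            bins.setdefault(int(y // bin_width), []).append(x - y)
    return {
        k: {"count": len(v), "median": median(v), "variance": pvariance(v)}
        for k, v in sorted(bins.items())
    }
```

Reading the per-bin median next to the per-bin count gives a rough, tabular counterpart to the box plot, for example to see where the median departs from zero and how many samples support each interval.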
4.2. Results of ablation experiments
Ablation experiments here aim to evaluate the contribution of individual
components of the MuSRFM to its overall accuracy. Fig. 6 shows
that the MuSRFM can be broadly divided into three components: the
backbone, the CAFM utilized to fuse different scale resolution features,
and the HFCM used to amalgamate features at various stages and
generate the predicted bathymetry map. Consequently, we sequentially
remove the CAFM and HFCM from the MuSRFM and, as
above, derive the mean RMSE from five consecutive repeated experiments
to objectively evaluate each submodel in the ablation experiment;
the structure of each ablation model is shown in Fig. S1. As shown in
Fig. S1(b), the output head module of the ablation model without the
HFCM is located after the finest branch of the CAFM of the last stage.
Fig. S1(c) shows that the ablation model without the CAFM contains only a
single branch with an input resolution of 10 m, with a receptive
field limited to the smallest patch shown in Fig. 5. Finally, the ablation
model without both the CAFM and the HFCM, depicted in Fig. S1(d), is
fully degraded into a pure CNN model with four sequentially connected
encoder stages.
The outcomes of the ablation experiment in Fig. 11 corroborate the
efficacy of our idea of multiple scale resolution fusion. Eliminating the
CAFM from the MuSRFM simplifies it into a single-branch
model that accepts only input at 10 m resolution (RES-10), causing its
mean RMSE to increase considerably from 0.8131 m to 0.9245 m; likewise,
the mean RMSE for the submodel without the HFCM stands at
0.8367 m, substantially superior to that of the submodel devoid of both
the HFCM and CAFM, which is 0.9104 m. The observed changes in
the RMSE signify that the CAFM is efficacious and underscore the
effectiveness of multiple scale resolution feature fusion within the
MuSRFM. Additionally, the error bars in Fig. 11 reveal that the HFCM
contributes to the stability of the MuSRFM rather than its accuracy, as the
significant increase in standard deviation after the removal of the HFCM
indicates that the stability of the trained model is affected. Remarkably, even
without both the CAFM and the HFCM, the degraded model still
outperforms the other models in Table 3, which means that a lightweight
CNN model with only 10 m resolution input (a 15 × 15 patch) also
outmatches the ML-based or traditional SDB models, further
demonstrating the benefits that a large spatial receptive field can bring
compared to those SDB models that rely on a single pixel as input. This
comparison strongly illustrates the vital importance and potential value
of spatial features and large receptive fields in SDB research, thus
substantiating the feasibility of the core principle of the MuSRFM.
Fig. 8. Density scatter plots of the models used: (a) LBR, (b) PLBR, (c) SMART-SDB, (d) RFR, (e) CatBoost and (f) MuSRFM. The displayed MuSRFM results are from
the model with the median RMSE in five repeated experiments.
4.3. Results of inference on the real-world dataset
The real-world dataset consists of data from four distinct study areas,
with a detailed description of its sources and various features delineated
in Section 2.4.2. The objective of employing a wide real-world dataset is
to assess the performance of the SDB models on an expansive scale,
evaluating their generalization and transfer capabilities across diverse
regions, with the rich data characteristics that the limited testing dataset
cannot offer. The MuSRFM employed in this section is also the model
that achieves the median RMSE across five trials. Table 4 shows the
inference results of various SDB models achieved on the real-world
dataset.
In general, the MuSRFM outperforms the other five SDB models
across all four study areas, with RFR and CatBoost achieving better
results than LBR, PLBR and SMART-SDB, except in the Kauai area. First,
the slight increase in error among various SDB models implies that the
Fig. 9. The bathymetry maps of the testing dataset area obtained from the fused SR data from Jan. 2018 to Jan. 2022: (a) GT, (b) LBR, (c) PLBR, (d)
SMART-SDB, (e) RFR, (f) CatBoost and (g) MuSRFM.
Fig. 10. Distribution of residual errors of MuSRFM and the number of training samples in bathymetric intervals with a range of 1 m.
Fig. 11. Results of the ablation experiment.
data distribution in Vieques closely resembles that of the training
dataset, with the accuracy of LBR and PLBR even surpassing that
obtained in the testing dataset. Second, the results obtained in the Oahu
area and the Saipan and Tinian area indicate that the MuSRFM significantly
outperforms the other SDB models, achieving RMSEs of 1.6901 m and
1.7973 m, which are substantially lower than those of the other models;
moreover, discounting the MuSRFM, the ML-based models still surpass the
remainder, with SMART-SDB exhibiting the poorest performance among
all the SDB models assessed in Oahu. Finally, in the Kauai area, all the included
SDB models, even the MuSRFM, suffer degradations across all the
metrics, and it is particularly striking that the SMART-SDB and ML-based
SDB models show a pronounced increase in error, with the $R^2$ and $R_R^2$
demonstrating a complete lack of linear correlation between the
predictions and the GT. The results obtained in certain study areas reveal
noticeably low $R^2$ values, yet $R_R^2$ remains above 0.9 (for example, the
MuSRFM in Kauai), suggesting a persistent substantial linear
correlation between the predictions and GT, despite the regression line
deviating from the 1:1 line. A thorough assessment of the metrics in
Table 4 shows that the MuSRFM notably outperforms its counterparts in
all but the MedAE for Vieques, where RFR and CatBoost slightly
excel, with an RMSE that is approximately 6 % to 38 % lower than that of the
second-ranked SDB models. In particular, the $R_R^2$ of the MuSRFM across
all the study areas in the real-world dataset consistently surpasses 0.9,
reflecting a robust linear correlation between the predictions and the
GT.
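The quoted margin over the second-ranked model can be checked directly from the RMSE column of Table 4. The short sketch below copies those values into a dictionary (the variable and function names are hypothetical) and computes, for each area, the relative RMSE reduction of the MuSRFM with respect to the best baseline.

```python
# RMSE values (m) copied from Table 4.
rmse_table4 = {
    "Vieques": {"LBR": 2.0807, "PLBR": 1.9639, "SMART-SDB": 1.8829,
                "RFR": 1.4875, "CatBoost": 1.4985, "MuSRFM": 1.3852},
    "Oahu": {"LBR": 3.5289, "PLBR": 4.0137, "SMART-SDB": 6.2633,
             "RFR": 2.7340, "CatBoost": 3.0776, "MuSRFM": 1.6901},
    "Kauai": {"LBR": 4.3710, "PLBR": 5.2326, "SMART-SDB": 10.0472,
              "RFR": 8.6693, "CatBoost": 6.4159, "MuSRFM": 3.1862},
    "Saipan and Tinian": {"LBR": 5.5301, "PLBR": 6.8183, "SMART-SDB": 3.3639,
                          "RFR": 2.1240, "CatBoost": 2.7283, "MuSRFM": 1.7973},
}


def margin_over_runner_up(scores, model="MuSRFM"):
    """Relative RMSE reduction (%) of `model` versus the best other model."""
    runner_up = min(v for k, v in scores.items() if k != model)
    return 100.0 * (runner_up - scores[model]) / runner_up


margins = {area: margin_over_runner_up(s) for area, s in rmse_table4.items()}
```

The smallest margin occurs in Vieques (against RFR) and the largest in Oahu (against RFR); in Kauai the second-ranked model is LBR rather than an ML-based model.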
The performance of the MuSRFM in Vieques is quite satisfactory, as
shown in Table 4 and Fig. 12, which is likely due to the similar data
distributions between Vieques and St. Croix. The density scatter plot
presented in Fig. 12(d) reveals that the regression line for the predictions
and GT aligns closely with the 1:1 line, indicating a small number of
outliers. However, a detailed examination of the GT and SDB of Vieques
in Fig. 12 reveals that the MuSRFM has a propensity for underestimating
bathymetry in the northern nearshore region, particularly in relatively
deep regions, a trend that is aligned with the phenomenon depicted in
Fig. 10. The partial amplification of the Punta Este region of Vieques in
Fig. 12(c) corroborates the quality of the SDB obtained by the MuSRFM,
Table 4
The inference results on the real-world dataset.
Area               Model      RMSE ↓ (m)  MAE ↓ (m)  MedAE ↓ (m)  R² ↑      R ↑
Vieques            LBR        2.0807      1.5121     1.0921       0.8685    0.8857
Vieques            PLBR       1.9639      1.4582     1.1012       0.8829    0.8855
Vieques            SMART-SDB  1.8829      1.3616     1.0030       0.8991    0.9228
Vieques            RFR        1.4875      1.0813     0.7649       0.9328    0.9401
Vieques            CatBoost   1.4985      1.0887     0.7661       0.8685    0.8857
Vieques            MuSRFM     1.3852      1.0496     0.8050       0.9417    0.9500
Oahu               LBR        3.5289      2.4703     1.6493       0.6398    0.8479
Oahu               PLBR       4.0137      2.8236     1.8914       0.5340    0.8494
Oahu               SMART-SDB  6.2633      3.6890     1.8974       −0.1348   0.2484
Oahu               RFR        2.7340      1.7160     0.9259       0.7838    0.8384
Oahu               CatBoost   3.0776      1.9043     0.9861       0.7260    0.7829
Oahu               MuSRFM     1.6901      1.0185     0.5926       0.9174    0.9298
Kauai              LBR        4.3710      3.5860     3.2307       0.4945    0.8994
Kauai              PLBR       5.2326      4.3484     4.0231       0.2757    0.9001
Kauai              SMART-SDB  10.0472     7.6225     5.7171       −1.6706   0.0257
Kauai              RFR        8.6693      7.1163     6.7992       −0.9883   0.3085
Kauai              CatBoost   6.4159      5.1228     4.2812       −0.0890   0.5338
Kauai              MuSRFM     3.1862      2.6469     2.3846       0.7314    0.9123
Saipan and Tinian  LBR        5.5301      3.9239     3.1227       0.4533    0.8091
Saipan and Tinian  PLBR       6.8183      4.3370     3.3610       0.1689    0.7514
Saipan and Tinian  SMART-SDB  3.3639      2.2503     1.5992       0.7977    0.8164
Saipan and Tinian  RFR        2.1240      1.4395     0.7969       0.9193    0.9238
Saipan and Tinian  CatBoost   2.7283      1.7458     0.8529       0.8669    0.8682
Saipan and Tinian  MuSRFM     1.7973      1.1464     0.5835       0.9422    0.9530
Fig. 12. (a) GT of Vieques and (b) SDB predicted via MuSRFM with (c) local amplification of the Punta Este area and (d) density scatter plot of the entire area.
yet it concurrently reveals some deficiencies. In particular, the
resolution of the high-frequency content is suboptimal, leading to the
loss of detail.
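For reference, the five metrics reported in Table 4 can be computed from paired GT and predicted depths as in the sketch below. It uses the standard textbook definitions and is not the evaluation code of this study; the function name `sdb_metrics` and the toy depth values are hypothetical.

```python
import math
import statistics


def sdb_metrics(gt, pred):
    """Return RMSE, MAE, MedAE, R^2 and Pearson R for paired depths in metres."""
    if len(gt) != len(pred) or len(gt) < 2:
        raise ValueError("gt and pred must have the same length (at least 2)")
    n = len(gt)
    errors = [p - g for g, p in zip(gt, pred)]
    abs_errors = [abs(e) for e in errors]
    gt_mean = sum(gt) / n
    pred_mean = sum(pred) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((g - gt_mean) ** 2 for g in gt)       # spread of the GT
    ss_pred = sum((p - pred_mean) ** 2 for p in pred)  # spread of the predictions
    cov = sum((g - gt_mean) * (p - pred_mean) for g, p in zip(gt, pred))
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs_errors) / n,
        "medae": statistics.median(abs_errors),
        "r2": 1.0 - ss_res / ss_tot,                   # coefficient of determination
        "r": cov / math.sqrt(ss_tot * ss_pred),        # Pearson correlation
    }


# Hypothetical depths: predictions are perfectly correlated with the GT
# but only half as deep, i.e. a systematic underestimation.
gt_depths = [2.0, 4.0, 6.0, 8.0, 10.0]
pred_depths = [1.0, 2.0, 3.0, 4.0, 5.0]
metrics = sdb_metrics(gt_depths, pred_depths)
print(metrics)
```

In this toy case R stays at 1 while R² becomes negative, which mirrors the rows of Table 4 in which a model keeps a high R but a low R² because its predictions are systematically biased.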
Within the study areas included in the real-world dataset, Oahu is the
largest, with more than 2,000,000 valid samples. The results presented
in Table 4 clearly show that the MuSRFM outperforms the competing SDB
models, with an RMSE of only 1.6901 m, which is notably lower than the
RMSE of 2.7340 m obtained by the second-ranked model (RFR). Fig. 13(e)
shows that although the samples are mainly concentrated near the 1:1
line, the model tends to underestimate bathymetry at greater depths. In
addition to the high-density distribution near the regression line,
there is an obvious high-density cluster in the lower right part of
Fig. 13(e), which mainly comes from Kaneohe Bay on Oahu’s eastern coast,
shown in Fig. 13(c). This area features abundant coral reefs, and the
comparison in Fig. 13(c) between GT and SDB demonstrates that the SDB
predictions are clearly lower than the GT in these reef-rich zones.
Based on these characteristics, we believe that the seafloor sediment in
these areas makes the reflectance there very different from that in the
training dataset, which leads to the deviation. Fig. 13(d) depicts the
results obtained at the entrance of Pearl Harbor in Mamala Bay, where
the MuSRFM also underestimates the bathymetry, likely because the
excessively high flow velocity at the entrance increases the turbidity
of the water body and consequently degrades imaging quality.
Additionally, the MuSRFM is unable to accurately resolve the
high-frequency details of the underwater terrain.
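To express the margins in Table 4 in relative terms, the short calculation below computes the RMSE reduction of the MuSRFM with respect to the best-performing baseline in each area. The RMSE values are copied from Table 4; the helper name `rmse_reduction` is hypothetical and used only for illustration.

```python
# RMSE values (m) copied from Table 4: (best baseline name, its RMSE, MuSRFM RMSE).
TABLE4_RMSE = {
    "Vieques": ("RFR", 1.4875, 1.3852),
    "Oahu": ("RFR", 2.7340, 1.6901),
    "Kauai": ("LBR", 4.3710, 3.1862),
    "Saipan and Tinian": ("RFR", 2.1240, 1.7973),
}


def rmse_reduction(baseline_rmse, model_rmse):
    """Relative RMSE reduction of the model versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


reductions = {
    area: rmse_reduction(baseline, musrfm)
    for area, (_, baseline, musrfm) in TABLE4_RMSE.items()
}

for area, (name, _, _) in TABLE4_RMSE.items():
    print(f"{area}: {reductions[area]:.1f} % lower RMSE than {name}")
```

The reduction is largest for Oahu (about 38 %) and smallest for Vieques (about 7 %), where the baselines already perform well.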
The real-world dataset reveals the worst SDB accuracy in the Kauai
area, where the MuSRFM yields an RMSE of 3.1862 m. Fig. 14 shows
that the predictions of the MuSRFM are markedly shallower than the
measured bathymetry, with the enlarged details in Fig. 14(c)
highlighting this global deviation and echoing the bias between the
regression line and the 1:1 line observed in Fig. 14(d). The global
deviation observed in the predictions persists across the entire
bathymetry range and might be attributed to distinct factors such as the
IOPs of the water body or the bottom type (e.g., dark sand composed of
basaltic lava rock features a
Fig. 13. (a) GT of Oahu and (b) SDB predicted by the MuSRFM in this area. Here, (c) Kaneohe Bay and (d) Mamala Bay are enlarged to show local details, with (e) the
density scatter plot of GT versus SDB for the entire area.
higher absorption) in this region, which potentially lowers the SR
values and makes the imagery appear darker than that of the other study
areas in Fig. 4. In Fig. S2, we analyze the mean SR values of each band
for the pixels with bathymetry between 0 m and 25 m in the training
dataset (which uses fused imagery obtained from Jan. 2018 to Jan. 2022)
and the real-world dataset. The mean SR value in Kauai is significantly
lower than that in the other areas, especially for the
bathymetry-sensitive shortwave bands, including B1, B2, B3, B4 and B5.
Such spectral characteristics make Kauai too different from the training
dataset, which in turn leads to significant deviations in all SDB
models, including the MuSRFM. Additionally, the deviation may also be
attributed to an improper selection of the maximum elevation of the
water area during the conversion from the DEM to the DBM. However, a
significant issue of the MuSRFM is noted in this study area: the
predicted bathymetry scale differs obviously between adjacent patches in
the deep-water area far from the coastline, and this scale shift causes
a noticeable inconsistency between output patches, as shown on the right
side of Fig. 14(c). On the one hand, this stems from each MuSRFM
prediction being a single, discrete patch, and the low SR values,
especially the extremely low SR values in the deep-water area far from
the coastline, reduce the contrast of the imagery, making it difficult
to capture effective global and local spatial features and causing
deviations between patches; on the other hand, the input data
distribution caused by the unusual SR values in the Kauai area is too
different from that of the training dataset, which causes a scale shift
in the MuSRFM predictions. Although this phenomenon is prominent only in
the Kauai area of the real-world dataset, we assert that it is
unavoidable in broader extrapolative applications, because the area
where the model is deployed may be very different from the source of the
training dataset (for example, there are obvious differences between
Kauai and Oahu, even though they are geographically close). This issue
requires further in-depth research.
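The band-wise comparison described for Fig. S2 amounts to averaging the SR of each band over the pixels whose depth lies between 0 m and 25 m. A minimal sketch of that calculation follows. The data layout (one dictionary per pixel), the function name `mean_sr_by_band`, and all SR values are invented for illustration and do not come from the study.

```python
BANDS = ("B1", "B2", "B3", "B4", "B5")


def mean_sr_by_band(pixels, bands=BANDS, min_depth=0.0, max_depth=25.0):
    """Average SR per band over pixels with min_depth <= depth <= max_depth."""
    selected = [p for p in pixels if min_depth <= p["depth"] <= max_depth]
    if not selected:
        raise ValueError("no pixels inside the requested depth range")
    return {b: sum(p["sr"][b] for p in selected) / len(selected) for b in bands}


def make_pixel(depth, values):
    """Build one hypothetical pixel record from a depth and five SR values."""
    return {"depth": depth, "sr": dict(zip(BANDS, values))}


# Hypothetical "bright" training-like site and "dark" deployment site.
bright_site = [
    make_pixel(3.0, (0.08, 0.09, 0.10, 0.04, 0.03)),
    make_pixel(18.0, (0.06, 0.05, 0.04, 0.02, 0.01)),
    make_pixel(40.0, (0.05, 0.03, 0.02, 0.01, 0.01)),  # deeper than 25 m, ignored
]
dark_site = [
    make_pixel(3.0, (0.04, 0.04, 0.05, 0.02, 0.01)),
    make_pixel(18.0, (0.02, 0.02, 0.01, 0.01, 0.00)),
]

bright_mean = mean_sr_by_band(bright_site)
dark_mean = mean_sr_by_band(dark_site)
for band in BANDS:
    print(band, round(bright_mean[band], 4), round(dark_mean[band], 4))
```

A consistently lower mean across the short-wavelength bands, as in the hypothetical dark site here, is the kind of shift in input distribution that the text associates with Kauai.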
Within the study area encompassing Saipan and Tinian, the RMSE of
the MuSRFM reaches a value of 1.7973 m, thereby exceeding the
performance of the baseline models. The most striking characteristic of
this study area is the presence of a large-scale lagoon on the west
flank of Saipan, which significantly diverges from the training dataset,
thus presenting a considerable challenge. The comparison shown in Fig. 15
Fig. 14. (a) GT of Kauai and (b) SDB obtained via the MuSRFM, and (c) Opala Bay is enlarged to show local details, with (d) the density scatter plot of the entire area.
indicates that the MuSRFM predictions display no significant systematic
deviation, and the regression line closely aligns with the 1:1 line in
Fig. 15(e). In the characteristic lagoon area shown in Fig. 15(c), the
MuSRFM predictions closely match the measured bathymetry, preserving
geomorphological details around Managaha Island and accurately
reflecting the trend of this whole area. The local amplification of
Tinian Harbor shown in Fig. 15(d) confirms that the inversion is
generally accurate, albeit with blurred fine details, which is
consistent with previous findings. Although the regression line of
predictions and measurements in this study area closely approximates the
1:1 line, there are a significant number of anomalous deviation points
in the lower half of the scatter plot, as outlined by the red dashed
rectangle, albeit with a low
Fig. 15. (a) GT of Saipan and Tinian area and (b) SDB predicted by the MuSRFM, (c) lagoon on the west side of Saipan and (d) Tinian Harbor are enlarged to show
local details, with (e) the density scatter plot of this entire area.
density. Fig. S3(b) and Fig. S3(d) show the geographical distributions
of the predictions that underestimate the bathymetry by more than 5 m in
Saipan and Tinian (the red dashed rectangle in Fig. 15(e)), which
explicitly shows that these underestimations are mostly located east of
these islands. Combined with the bathymetry shown in Fig. S3(a) and
Fig. S3(c), most of the regions with underestimations of more than 5 m
have steep slopes and thus a narrow invertible zone, which is very
different from the training dataset. This phenomenon further illustrates
the importance of local characteristics for SDB, which reflects the
potential improvement from retraining the model more locally under
similar situations and confirms the significance of enriching the
training datasets to introduce more diverse patterns for the robustness
of the MuSRFM.
To analyze the significant differences in performance among the four
areas contained in the real-world dataset, we further compare the
relationships between the logarithmic band ratio value and the
bathymetry of the training dataset and the real-world dataset, as shown
in Fig. S4. Here, the reflectances and bathymetry of the training
dataset come from the median fused SR imagery of St. Croix during the
period from Jan. 2018 to Jan. 2022. Among the four areas in the
real-world dataset, as expected, the similarity between Vieques and the
training area in St. Croix is the highest, which is the reason why the
MuSRFM and other SDB models have satisfactory performance in Vieques.
For the Oahu and Saipan & Tinian areas, as shown in Fig. S4(c) and
Fig. S4(e), the scatter is much more dispersed, but some high-density
areas are near the regression line. The distribution of Kauai is the
most different from that of the training dataset: the density scatter
plot is severely dispersed, as shown in Fig. S4(d), and no obvious
high-density scatter area exists. This dramatic difference between Kauai
and the other areas provides supplemental evidence to explain why the
MuSRFM performs well on other sites but fails in this area. By analyzing
the characteristics of Kauai and performing a comparison with other
areas, we believe that without fine-tuning, the MuSRFM is suitable for
direct deployment only in areas whose spectral characteristics or data
distribution are similar to those of St. Croix, the training area.
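The logarithmic band ratio compared in Fig. S4 is assumed here to follow the widely used form of Stumpf et al. (2003), ln(n·R_blue)/ln(n·R_green), with depth modelled as a linear function of the ratio. The sketch below computes the ratio and fits that line by ordinary least squares. The constant n = 1000, the choice of bands, the function names, and the sample reflectances are assumptions made for illustration and are not values taken from this study.

```python
import math


def log_band_ratio(r_blue, r_green, n=1000.0):
    """Log band ratio ln(n * R_blue) / ln(n * R_green); n keeps both logs positive."""
    return math.log(n * r_blue) / math.log(n * r_green)


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    count = len(x)
    x_mean = sum(x) / count
    y_mean = sum(y) / count
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical samples: (blue reflectance, green reflectance, measured depth in m).
samples = [
    (0.080, 0.075, 2.0),
    (0.060, 0.050, 6.0),
    (0.045, 0.032, 11.0),
    (0.035, 0.020, 17.0),
]
ratios = [log_band_ratio(blue, green) for blue, green, _ in samples]
depths = [depth for _, _, depth in samples]
slope, intercept = fit_line(ratios, depths)
predicted = [slope * r + intercept for r in ratios]
print(f"depth = {slope:.2f} * ratio + {intercept:.2f}")
```

When a deployment area follows a different ratio-to-depth relationship than the training area, a line (or any model) fitted on the training area will be biased there, which is the effect the scatter plots in Fig. S4 are used to diagnose.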
Upon evaluation of the SDB models on the real-world dataset, it is
evident that the MuSRFM outperforms the other baseline models in
terms of accuracy and generalizability; however, it concurrently has
several inherent flaws, including underestimating the bathymetry and
blurring the details. Despite the presence of these flaws, these
experimental results substantiate the superiority and effectiveness of
the MuSRFM, as well as the concept of multiple scale resolution fusion
in SDB research.
5. Discussion
5.1. Improvement brought by performing fine-tuning on limited data
Although the MuSRFM outperforms other SDB models in both the
testing dataset and the real-world dataset, the training of our model
from scratch relies on massive in-situ data. However, in some cases,
there are insufficient in-situ data that can be used for training the
MuSRFM. At this point, it is common to fine-tune the already trained
model via a small amount of labeled data. Therefore, here, we choose the
Kauai area, the most challenging site in the real-world dataset, as the
experimental area and randomly select 100, 1000 and 10,000 bathymetry
samples to form the in-situ data to fine-tune the MuSRFM. These
selected points are excluded from the data used to test this approach to
avoid any data leakage that would weaken the credibility of the results.
To fine-tune the MuSRFM, the batch size is set to 64, the number of
epochs is 10, the initial learning rate is 5e-5, and a step learning
rate scheduler with a step length of 2 epochs and a gamma of 0.5 is used.
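The fine-tuning setup described above can be summarised in a few lines. The sketch below draws a random fine-tuning subset, keeps it disjoint from the evaluation set, and evaluates the step learning-rate schedule (initial rate 5e-5, halved every 2 epochs, 10 epochs, batch size 64). It is framework-agnostic plain Python rather than the training code of this study; the function names, the random seed, and the pool of 50,000 sample ids are hypothetical.

```python
import math
import random

INITIAL_LR = 5e-5   # initial learning rate stated in the text
STEP_EPOCHS = 2     # step length of the scheduler, in epochs
GAMMA = 0.5         # multiplicative decay applied at every step
NUM_EPOCHS = 10
BATCH_SIZE = 64


def step_lr(epoch, initial_lr=INITIAL_LR, step=STEP_EPOCHS, gamma=GAMMA):
    """Learning rate in effect during the given 0-based epoch (step decay)."""
    return initial_lr * gamma ** (epoch // step)


def split_finetune_eval(sample_ids, n_finetune, seed=0):
    """Randomly pick n_finetune ids; everything else is kept for evaluation."""
    rng = random.Random(seed)
    finetune = set(rng.sample(sample_ids, n_finetune))
    evaluation = [i for i in sample_ids if i not in finetune]
    return sorted(finetune), evaluation


schedule = [step_lr(epoch) for epoch in range(NUM_EPOCHS)]
print("learning rate per epoch:", schedule)

all_ids = list(range(50_000))  # hypothetical pool of labelled bathymetry samples
batches_per_epoch = {}
for n_samples in (100, 1000, 10_000):
    finetune_ids, eval_ids = split_finetune_eval(all_ids, n_samples)
    # No sample may appear in both sets, otherwise the evaluation would leak.
    assert not set(finetune_ids) & set(eval_ids)
    batches_per_epoch[n_samples] = math.ceil(n_samples / BATCH_SIZE)
print("batches per epoch:", batches_per_epoch)
```

With these settings the learning rate falls from 5e-5 in the first two epochs to 3.125e-6 in the last two, so most of the adaptation happens early in fine-tuning.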
Table 5 shows the results of fine-tuning the MuSRFM on a small
amount of in-situ data in Kauai, from which it can be seen that the
metrics, especially the RMSE, improve greatly as the number of samples
increases from 100 to 10,000. However, unlike the geographical
non-overlap between the training dataset and the testing dataset, these
sampling points are randomly selected from the entire Kauai area, which
reduces the difficulty of the SDB task and can yield higher accuracy
metrics than those obtained on the testing dataset. Fine-tuning the
model here essentially introduces the prior knowledge learned from the
training dataset, which is why the MuSRFM can achieve significant
performance improvements with a very limited amount of data. Combined
with the changes in the density scatter plots in Fig. S5(a), Fig. S5(d),
Fig. S5(g) and Fig. S5(j), the significant improvement in the goodness
of fit between the predictions and the GT can be seen more intuitively.
The performance of the MuSRFM is enhanced even with 100 random samples,
which means that it can be used in scenarios with only a small amount of
in-situ data, even if the target area is quite different from the
training dataset. Another noteworthy point is that the details in the
inverted bathymetry map improve as the amount of data used for
fine-tuning increases, and the underwater terrain edges gradually become
clearer; comparing Fig. S5(b), Fig. S5(e), Fig. S5(h) and Fig. S5(k)
shows a very significant trend of strengthening high-frequency
information. However, from the perspective of local magnification
(Fig. S5(c), Fig. S5(f), Fig. S5(i) and Fig. S5(l)), although the patch
inconsistency caused by the scale shift has been alleviated, fine-tuning
has not completely solved this problem, which may be attributed to the
deviation of the data distribution and spectral characteristics of the
Kauai area discussed above, and this issue needs further in-depth
research. Based on the results obtained in this section, we believe that
if there is a significant difference between the target area and the
training dataset, fine-tuning the MuSRFM is currently the optimal
solution.
5.2. Accurate water-land segmentation
Because the fused SR imagery here is the median of multi-temporal SR
imagery, an exact imaging date for tidal correction cannot be
determined; the WLM is therefore required to delineate the water area
and permit the conversion from DEM data to DBM data. Consequently, an
accurate water-land segmentation algorithm is a fundamental requirement
for this study. However, the 10 m spatial resolution of Sentinel-2 SR
data complicates precise segmentation, and the complex distribution of
coastal ground features often leads to misjudgments. For example, inland
water bodies in coastal zones with indistinct borders can lead to
segmentation inaccuracies if they are misclassified as sea surface.
Moreover, shadow occlusion from satellite viewing angles may cause land
pixels to be misclassified as water. This misclassification introduces
erroneous elevations into the water area, misleading the selection of
the maximum elevation reference used for the conversion from the DEM to
the DBM and resulting in systematic bathymetric bias. Such systematic
bias may cause model training to fail or lead to an incorrect evaluation
of model performance, underscoring the importance of an accurate
water-land segmentation algorithm as an essential component for
improvement.
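To make the bias mechanism concrete, the sketch below assumes a simplified conversion in which the maximum DEM elevation found inside the water mask serves as the water-level reference, and depth is that reference minus the DEM elevation. This is an assumed reading of the procedure described in this section rather than the exact implementation used in the study; the function name `dem_to_dbm` and the elevations are invented. A single land pixel misclassified as water raises the reference and shifts every depth by the same amount.

```python
def dem_to_dbm(dem, water_mask):
    """Convert DEM elevations (m) to depths (m) for pixels flagged as water.

    Assumption: the reference water level is the maximum elevation among
    water pixels. Land pixels are returned as None.
    """
    water_elevations = [z for z, is_water in zip(dem, water_mask) if is_water]
    if not water_elevations:
        raise ValueError("water mask selects no pixels")
    reference = max(water_elevations)
    return [reference - z if is_water else None
            for z, is_water in zip(dem, water_mask)]


# Hypothetical transect from deep water to land (elevations in metres).
dem = [-12.0, -6.5, -1.0, -0.2, 3.5]
correct_mask = [True, True, True, True, False]
faulty_mask = [True, True, True, True, True]  # last land pixel flagged as water

correct_depths = dem_to_dbm(dem, correct_mask)
faulty_depths = dem_to_dbm(dem, faulty_mask)
bias = [f - c for f, c in zip(faulty_depths[:4], correct_depths[:4])]
print("correct:", correct_depths)
print("faulty: ", faulty_depths)
print("bias:   ", bias)
```

In this toy transect every water pixel becomes 3.7 m deeper once the land pixel enters the mask, which illustrates why a few misclassified pixels can produce a constant offset across an entire study area.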
5.3. Further improvement of the data and the SDB model
The training dataset utilized in this research is characteristically
constrained in both size and feature diversity, particularly when juxta-
posed with the broader real-world dataset comprising data aggregated
from multiple study areas worldwide. This setting aims to provide
Table 5
Results of the MuSRFM fine-tuned on the Kauai site.
Fine-Tune State         RMSE ↓ (m)  MAE ↓ (m)  MedAE ↓ (m)  R² ↑     R ↑
100 random samples      1.4077      1.0050     0.7364       0.9476   0.9483
1000 random samples     1.0955      0.7567     0.5376       0.9683   0.9703
10,000 random samples   0.7654      0.5370     0.3919       0.9845   0.9846
sufficient content to evaluate the transferability of the SDB model for
rapid bathymetric inversion in inaccessible regions, which is one of its
intended applications. Although large cross-regional deployment
scenarios are infrequent in practice, variations in IOPs or geological
conditions can result in significant discrepancies in observations
between geographically adjacent regions, as exemplified by the notable
differences between Oahu and Kauai within the Hawaiian archipelago.
While introducing multiple fused SR images enriches the features of the
training dataset, it is merely a data enhancement rather than an
addition of new content, so a larger, more complex dataset is needed to
introduce rich and novel information. Throughout the inference
experiment on the real-world dataset, we observed diverse underwater
terrain features across the study areas, including large lagoons and
steep underwater slopes, information that cannot be acquired by data
enhancement. It is worth noting that the sites chosen to construct the
real-world dataset are distributed worldwide, but they are all located
in tropical regions. These areas all have clear water, and the
underwater topography is visible by visual inspection; however, they may
differ from islands situated in other parts of the Earth. Consequently,
assembling a more extensive dataset with richer features is the primary
strategy for improving the accuracy and generalizability of SDB models
in subsequent research.
The application scenarios of bathymetry have many distinct
characteristics, which should be considered in subsequent research.
First, for some safety-critical scenarios, the bathymetry output alone
is insufficient, and the model should be able to evaluate the
uncertainty of the predicted bathymetry. Here, uncertainty is used to
evaluate the reliability of each output bathymetry value and is
essentially divided into epistemic uncertainty attributed to the model
weights and aleatoric uncertainty arising from noise in the input
(Kendall and Gal, 2017). Due to various objective factors and inversion
mechanisms, SDB inevitably has a bias, and the significance of
uncertainty estimation lies in its use as an explicit indicator of
erroneously predicted bathymetry, which can flag unreliable predictions
and theoretically broaden the available scenarios of SDB. Second,
although the evaluation metrics of the MuSRFM, especially the RMSE, are
significantly improved, the predicted bathymetry map shows that its
high-frequency information is still reduced compared with that of the
GT. On the one hand, this may be due to the insensitivity of the MSE,
which is used as the loss function, to high-frequency components (Wang
and Bovik, 2009); on the other hand, the resampled patch of a large
spatial scale, such as RES-810, is blurred, which might affect the final
output through the cross-resolution fusion of the CAFM. The
high-frequency details in the bathymetry map correspond to areas with
significant changes in underwater terrain or features such as steep
slopes or underwater rocks, and the importance of these terrain features
requires SDB to be sensitive to high-frequency information. In theory,
this can be addressed by introducing spatial structural information or
improving the model’s ability to perceive high-frequency information,
which deserves further research.
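One common way to obtain the two uncertainty components, following the formulation in Kendall and Gal (2017), is to run several stochastic forward passes (for example, with dropout kept active) in which the model outputs both a depth and a variance for each pixel. The epistemic part is then estimated as the variance of the predicted depths across passes, and the aleatoric part as the mean of the predicted variances. The sketch below applies this to a single pixel. The paper does not describe such outputs for the MuSRFM, so the function name and all numbers are purely hypothetical.

```python
import statistics


def decompose_uncertainty(depth_samples, variance_samples):
    """Split predictive variance into epistemic and aleatoric parts for one pixel.

    depth_samples:    depths (m) predicted in T stochastic forward passes.
    variance_samples: variances (m^2) predicted by the model in the same passes.
    """
    if not depth_samples or len(depth_samples) != len(variance_samples):
        raise ValueError("need the same non-zero number of depths and variances")
    mean_depth = statistics.fmean(depth_samples)
    epistemic = statistics.pvariance(depth_samples)  # spread between passes
    aleatoric = statistics.fmean(variance_samples)   # average predicted noise
    return {
        "depth": mean_depth,
        "epistemic": epistemic,
        "aleatoric": aleatoric,
        "total": epistemic + aleatoric,
    }


# Hypothetical outputs of five stochastic passes for one pixel.
depths = [10.0, 10.4, 9.6, 10.2, 9.8]
variances = [0.25, 0.30, 0.20, 0.25, 0.25]
result = decompose_uncertainty(depths, variances)
print(result)
```

A pixel with a large total variance could then be flagged as unreliable, which is the use of uncertainty as an explicit indicator that the paragraph above has in mind.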
6. Conclusion
Effectively integrating spatial information is a trending topic in SDB
model research, because combining spatial and spectral information can
theoretically further improve performance. In this study, we propose the
MuSRFM, a DL-based SDB model that employs multiple scale resolution
fusion, to accurately and robustly invert bathymetry in the offshore
shallow regions of islands with reduced computational complexity. The
MuSRFM employs an MCHR to resample the large-scale median fused
Sentinel-2 L2A SR imagery into hierarchical multiple scale resolution
patches, which contain spatial features across various ranges. The
MuSRFM’s multi-branch encoder processes inputs separately, merging
features from diverse resolutions through the CAFM, and the HFCM
integrates features from different stages to produce a bathymetry map
with a resolution of 10 m.
Given the aim of SDB research to enable broad bathymetry inversion
with limited data, we selected five study areas worldwide, using only
St. Croix to construct the training and testing datasets and assembling
a real-world dataset from the remaining areas. The mean RMSE of the
MuSRFM on the testing dataset is only 0.8131 m, with error reduction
rates exceeding 35 % and 60 % relative to the ML-based and classic
semi-empirical SDB models, respectively. Ablation experiments are also
conducted to validate the individual contributions of the CAFM and HFCM
to the final accuracy, confirming the efficacy of the multiple scale
resolution fusion concept. The generalizability of the MuSRFM is
evaluated on a diverse real-world dataset, where its performance
degrades but still surpasses that of the baseline models, showing its
robustness when deployed across different environments and underwater
terrains.
In summary, the MuSRFM outperforms conventional SDB models
such as ML-based and semi-empirical models in terms of both accuracy
and generalizability, demonstrating robust deployment potential across
various areas. We posit that the concept of multiple scale resolution
fusion, as exemplified by the MuSRFM, represents a promising avenue
for enhancing the performance of SDB. Indeed, the inversion objective
of the MuSRFM extends beyond bathymetry and, in theory, can be
adapted for obtaining other biophysical parameters in shallow coastal
areas. Nevertheless, the MuSRFM has shortcomings such as insufficient
high-frequency details and patch embedding anomalies, necessitating
further advancements through enhancements such as broadening
training dataset diversity and integrating high-frequency sensitive
perceivers into the model.
CRediT authorship contribution statement
Xiaoming Qin: Writing – review & editing, Writing – original draft,
Methodology, Data curation, Conceptualization. Ziyin Wu: Writing –
review & editing, Supervision, Funding acquisition, Conceptualization.
Xiaowen Luo: Writing – review & editing, Supervision,
Conceptualization. Jihong Shang: Writing – review & editing,
Supervision. Dineng Zhao: Writing – review & editing, Software.
Jieqiong Zhou: Writing – review & editing. Jiaxin Cui: Writing –
review & editing, Visualization. Hongyang Wan: Visualization.
Guochang Xu: Writing – review & editing, Funding acquisition.
Declaration of competing interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Acknowledgements
This research was partially supported by the National Key Research
and Development Program of China under Grant 2022YFC2806600 and
2022YFC2806605, the Oceanic Interdisciplinary Program of Shanghai
Jiao Tong University under Grant SL2020ZD204 and SL2020ZD205, and
the National Key Research and Development Program of China under
Grant 2022YFC3003800.
Appendix A. Supplementary material
Supplementary data to this article can be found online at
https://doi.org/10.1016/j.isprsjprs.2024.09.007.
References
Ai, B., Wen, Z., Wang, Z., Wang, R., Su, D., Li, C., Yang, F., 2020. Convolutional neural
network to retrieve water depth in marine shallow water area from remote sensing
images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing 13, 2888–2898.
Albert, A., Mobley, C.D., 2003. An analytical model for subsurface irradiance and remote
sensing reflectance in deep and shallow case-2 waters. Opt. Express 11 (22),
2873–2890.
Ashphaq, M., Srivastava, P.K., Mitra, D., 2021. Review of near-shore satellite derived
bathymetry: Classification and account of five decades of coastal bathymetry
research. J. Ocean. Eng. Sci. 6 (4), 340–359.
Cahalane, C., Magee, A., Monteys, X., Casal, G., Hanan, J., Harris, P., 2019.
A comparison of LandSat 8, RapidEye and Pleiades products for improving empirical
predictions of satellite derived bathymetry. Remote Sens. Environ. 233, 111414.
Ceyhun, Ö., Yalçın, A., 2010. Remote sensing of water depths in shallow waters via
artificial neural networks. Estuar. Coast. Shelf Sci. 89 (1), 89–96.
Chen, A., Ma, Y., Zhang, J., 2021. Partition satellite derived bathymetry for coral reefs
based on spatial residual information. Int. J. Remote Sens. 42 (8), 2807–2826.
Han, W., Chen, J., Wang, L., Feng, R., Li, F., Wu, L., Tian, T., Yan, J., 2021. Methods for
small, weak object detection in optical high-resolution remote sensing images: A
survey of advances and challenges. IEEE Geosci. Remote Sens. Mag. 9 (4), 8–34.
Han, T., Zhang, H., Cao, W., Le, C., Wang, C., Yang, X., Ma, Y., Li, D., Wang, J., Lou, X.,
2023. Cost-efficient bathymetric mapping method based on massive active–passive
remote sensing data. ISPRS J. Photogramm. Remote Sens. 203, 285–300.
Hang, R., Yang, P., Zhou, F., Liu, Q., 2022. Multiscale progressive segmentation network
for high-resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 60,
1–12.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 770–778.
Kaloop, M.R., El-Diasty, M., Hu, J.W., Zarzoura, F., 2021. Hybrid artificial neural
networks for modeling shallow-water bathymetry via satellite imagery. IEEE Trans.
Geosci. Remote Sens. 60, 1–11.
Kendall, A., Gal, Y., 2017. What uncertainties do we need in bayesian deep learning for
computer vision? Adv. Neural Inf. Proces. Syst. 30.
Knudby, A., Richardson, G., 2023. Incorporation of neighborhood information improves
performance of SDB models. Remote Sens. Appl.: Soc. Environ. 32, 101033.
Lee, Z., Carder, K.L., Mobley, C.D., Steward, R.G., Patch, J.S., 1998. Hyperspectral
remote sensing for shallow waters. I. A semianalytical model. Appl. Opt. 37
(27), 6329–6338.
Lee, Z., Carder, K.L., Mobley, C.D., Steward, R.G., Patch, J.S., 1999. Hyperspectral
remote sensing for shallow waters: 2. Deriving bottom depths and water properties
by optimization. Appl. Opt. 38 (18), 3831–3843.
Lee, Z., Weidemann, A., Arnone, R., 2012. Combined effect of reduced band number and
increased bandwidth on shallow water remote sensing: The case of WorldView 2.
IEEE Trans. Geosci. Remote Sens. 51 (5), 2577–2586.
Li, J., Knapp, D.E., Schill, S.R., Roelfsema, C., Phinn, S., Silman, M., Mascaro, J.,
Asner, G.P., 2019. Adaptive bathymetry estimation for shallow coastal waters using
Planet Dove satellites. Remote Sens. Environ. 232, 111302.
Liu, S., Wang, L., Liu, H., Su, H., Li, X., Zheng, W., 2018. Deriving bathymetry from
optical images with a localized neural network algorithm. IEEE Trans. Geosci.
Remote Sens. 56 (9), 5334–5342.
Lowell, K., Rzhanov, Y., 2024. Global and local magnitude and spatial pattern of
uncertainty from geographically adaptive empirical and machine learning satellite-
derived bathymetry models. GIScience & Remote Sensing 61 (1), 2297549.
Lumban-Gaol, Y.A., Ohori, K.A., Peters, R.Y., 2021. Satellite-derived bathymetry using
convolutional neural networks and multispectral sentinel-2 images. Int. Arch.
Photogramm. Remote. Sens. Spat. Inf. Sci. 43, 201–207.
Lumban-Gaol, Y., Ohori, K.A., Peters, R., 2022. Extracting Coastal Water Depths from
Multi-Temporal Sentinel-2 Images Using Convolutional Neural Networks. Mar.
Geod. 45 (6), 615–644.
Lyzenga, D.R., 1978. Passive remote sensing techniques for mapping water depth and
bottom features. Appl. Opt. 17, 379–383.
Mandlburger, G., Kölle, M., Nübel, H., Soergel, U., 2021. BathyNet: A deep neural
network for water depth mapping from multispectral aerial images. PFG – Journal of
Photogrammetry, Remote Sensing and Geoinformation Science 89 (2), 71–89.
Mishra, D., Narumalani, S., Lawson, M., Rundquist, D., 2004. Bathymetric mapping using
IKONOS multispectral data. GIScience & Remote Sensing 41 (4), 301–321.
Misra, A., Vojinovic, Z., Ramakrishnan, B., Luijendijk, A., Ranasinghe, R., 2018. Shallow
water bathymetry mapping using Support Vector Machine (SVM) technique and
multispectral imagery. Int. J. Remote Sens. 39 (13), 4431–4450.
Monteys, X., Harris, P., Caloca, S., Cahalane, C., 2015. Spatial Predictions of Coastal
Bathymetry based on Multispectral Satellite Imagery and Multibeam data. Remote
Sens. (Basel) 7, 13782–13806.
Mudiyanselage, S.S.J.D., Abd-Elrahman, A., Wilkinson, B., Lecours, V., 2022. Satellite-
derived bathymetry using machine learning and optimal Sentinel-2 imagery in
South-West Florida coastal waters. GIScience & Remote Sensing 59 (1), 1143–1158.
Niroumand-Jadidi, M., Bovolo, F., Bruzzone, L., 2020. SMART-SDB: Sample-specific
multiple band ratio technique for satellite-derived bathymetry. Remote Sens.
Environ. 251, 112091.
Pan, X., Zhang, C., Xu, J., Zhao, J., 2021. Simplified object-based deep neural network
for very high resolution remote sensing image classification. ISPRS J. Photogramm.
Remote Sens. 181, 218–237.
Poursanidis, D., Traganos, D., Reinartz, P., Chrysoulakis, N., 2019. On the use of
Sentinel-2 for coastal habitat mapping and satellite-derived bathymetry estimation
using downscaled coastal aerosol band. Int. J. Appl. Earth Obs. Geoinf. 80, 58–70.
Riegl, B., Moyer, R.P., Walker, B.K., Kohler, K., Gilliam, D., Dodge, R.E., 2008. A tale of
germs, storms, and bombs: geomorphology and coral assemblage structure at
Vieques (Puerto Rico) compared to St. Croix (US Virgin Islands). J. Coast. Res. 24
(4), 1008–1021.
Stumpf, R.P., Holderied, K., Sinclair, M., 2003. Determination of water depth with high-
resolution satellite imagery over variable bottom types. Limnol. Oceanogr.
48 (1part2), 547–556.
Su, H., Liu, H., Wang, L., Filippi, A.M., Heyman, W.D., Beck, R.A., 2013. Geographically
adaptive inversion model for improving bathymetric retrieval from satellite
multispectral imagery. IEEE Trans. Geosci. Remote Sens. 52 (1), 465–476.
Sun, S., Chen, Y., Mu, L., Le, Y., Zhao, H., 2023. Improving Shallow Water Bathymetry
Inversion through Nonlinear Transformation and Deep Convolutional Neural
Networks. Remote Sens. (Basel) 15 (17), 4247.
Viaña-Borja, S.P., Fernández-Mora, A., Stumpf, R.P., Navarro, G., Caballero, I., 2023.
Semi-automated bathymetry using Sentinel-2 for coastal monitoring in the Western
Mediterranean. Int. J. Appl. Earth Obs. Geoinf. 120, 103328.
Wang, Z., Bovik, A.C., 2009. Mean squared error: Love it or leave it? A new look at signal
fidelity measures. IEEE Signal Process Mag. 26 (1), 98–117.
Wang, L., Liu, H., Su, H., Wang, J., 2019a. Bathymetry retrieval from optical images with
spatially distributed support vector machines. GIScience & Remote Sensing 56 (3),
323–337.
Wang, Y., Zhou, X., Li, C., Chen, Y., Yang, L., 2019b. Bathymetry model based on spectral
and spatial multifeatures of remote sensing image. IEEE Geosci. Remote Sens. Lett.
17 (1), 37–41.
Wilson, B., Kurian, N.C., Singh, A., Sethi, A., 2020. Satellite-derived
bathymetry using deep convolutional neural network. In: IGARSS 2020–2020 IEEE
International Geoscience and Remote Sensing Symposium. IEEE, pp. 2280–2283.
Wu, Z., Mao, Z., Shen, W., Yuan, D., Zhang, X., Huang, H., 2022. Satellite-derived
bathymetry based on machine learning models and an updated quasi-analytical
algorithm approach. Opt. Express 30 (10), 16773–16793.
Xia, H., Li, X., Zhang, H., Wang, J., Lou, X., Fan, K., Shi, A., Li, D., 2019. A bathymetry
mapping approach combining log-ratio and semianalytical models using four-band
multispectral imagery without ground data. IEEE Trans. Geosci. Remote Sens. 58 (4),
2695–2709.
Xu, N., Wang, L., Zhang, H.S., Tang, S., Mo, F., Ma, X., 2024. Machine learning based
estimation of coastal bathymetry from ICESat-2 and Sentinel-2 data. IEEE J. Sel. Top.
Appl. Earth Obs. Remote Sens. 17, 1748–1755.
Zhang, X., Ma, Y., Li, Z., Zhang, J., 2022. Satellite derived bathymetry based on ICESat-2
diffuse attenuation signal without prior information. Int. J. Appl. Earth Obs. Geoinf.
113, 102993.
Zhong, J., Sun, J., Lai, Z., Song, Y., 2022. Nearshore bathymetry from icesat-2 lidar and
sentinel-2 imagery datasets using deep learning approach. Remote Sens. (Basel) 14
(17), 4229.
Zhu, J., Qin, J., Yin, F., Ren, Z., Qi, J., Zhang, J., Wang, R., 2021. An APMLP deep
learning model for bathymetry retrieval using adjacent pixels. IEEE J. Sel. Top. Appl.
Earth Obs. Remote Sens. 15, 235–246.
Zhu, J., Yin, F., Qin, J., Qi, J., Ren, Z., Hu, P., Zhang, J., Zhang, X., Wang, R., 2022.
Shallow water bathymetry retrieval by optical remote sensing based on depth-
invariant index and location features. Can. J. Remote. Sens. 48 (4), 534–550.