Large-scale Detection of Marine Debris in Coastal Areas
Marc Rußwurm, Sushen Jilla Venkatesa, Devis Tuia
EPFL ECEO Laboratory, Rue de l’Industrie 17, Sion, 1950, Valais, Switzerland
Detecting and quantifying marine pollution and macro-plastics is an in-
creasingly pressing ecological issue that directly impacts ecology and human
health. Eﬀorts to quantify marine pollution are often conducted with sparse
and expensive beach surveys, which are diﬃcult to conduct on a large scale.
Here, remote sensing can provide reliable estimates of plastic pollution by
regularly monitoring and detecting marine debris in coastal areas. Medium-
resolution satellite data of coastal areas is readily available and can be lever-
aged to detect aggregations of marine debris containing plastic litter. In this
work, we present a detector for marine debris built on a deep segmentation
model that outputs a probability for marine debris at the pixel level. We train
this detector with a combination of annotated datasets of marine debris and
evaluate it on speciﬁcally selected test sites where it is highly probable that
plastic pollution is present in the detected marine debris. We demonstrate
quantitatively and qualitatively that a deep learning model trained on this
dataset issued from multiple sources outperforms existing detection models
trained on previous datasets by a large margin. Our experiments show, con-
sistent with the principles of data-centric AI, that this performance is due to
our particular dataset design with extensive sampling of negative examples
and label reﬁnements rather than depending on the particular deep learning
model. We hope to accelerate advances in the large-scale automated detec-
tion of marine debris, which is a step towards quantifying and monitoring
marine litter with remote sensing at global scales, and release the model
weights and training source code.
Keywords: Marine Debris Detection, Plastic Pollution, Sentinel-2
1. Introduction
Marine litter is accumulating at alarming rates, with 19 to 23 million
metric tonnes dispersed in 2016 alone (Borrelle et al., 2020). Plastic artifacts
constitute 75% of marine litter, exceeding 5 trillion objects in numbers (Erik-
sen et al., 2014), and are causing a serious threat to marine ecosystems and
human health. Approximately 80% of marine litter originates from terres-
trial sources (Andrady, 2011). It accumulates in rivers (Van Emmerik et al.,
2019; van Emmerik and Schwarz, 2020) and lakes (Faure et al., 2012) and
eventually enters open oceans. Primary micro-plastics are purposefully man-
ufactured to carry out a speciﬁc function, like abrasive particles or powders
for injection molding. Secondary micro-plastics result from fragmentation
of larger objects (Kershaw et al., 2019). In particular, transport in rivers
causes macro-plastics (>2.5 cm diameter) to decompose into meso-plastics
(5 mm to 2.5 cm) and micro-plastics (<5 mm diameter) (Kershaw et al., 2019; Hanke
et al., 2013), which then enter the food chain. Micro-plastics have been
found across the entire planet and have been detected in Antarctic penguins
(Bessa et al., 2019), deep-sea sediments (Van Cauwenberghe et al., 2013),
and human stool (Schwabl et al., 2019) and have been shown to aﬀect the
growth of corals (Chapron et al., 2018). A range of economic costs can
also be associated with marine pollution, from clean-up expenses to loss of
tourism revenue (Beaumont et al., 2019). It is clear that monitoring and
mitigating water pollution is a major environmental, social, and economic
challenge, and systematic mapping is needed to both identify pollutants and
measure the success of awareness and clean-up programs. Continuous mon-
itoring and litter quantiﬁcation are often limited to individual surveys that
are labor-intensive and expensive to conduct regularly (Van Dyck et al.,
2016). These approaches can only cover a comparatively small area, even
when surveyors are supported by aerial UAV imagery, as explored by Wolf
et al. (2020); Goddijn-Murphy et al. (2022); Escobar-Sánchez et al. (2022);
Topouzelis et al. (2019). Eﬀectively, only a few developed countries, such as
the United Kingdom, can aﬀord a systematic monitoring program (Rees and
Pond, 1995). These programs still require support from the local population
in citizen science projects to collect ground data (Hidalgo-Ruz and Thiel,
2015). This level of engagement requires a public sensitivity to the problem,
awareness, and, eventually, the technological means to report pollutants.
Satellite imagery that provides data at reasonable spatial and high tempo-
ral resolution can support this monitoring in large marine areas (Hanke et al.,
2013). Even though it is a pressing issue, remote sensing-enabled monitoring
of marine debris has only relatively recently emerged as a major research
topic, as summarized by the broad reviews of Salgado-Hernanz et al. (2021)
and Topouzelis et al. (2021). Both reviews compared drone, aircraft, and
optical and radar satellite-based acquisition methods. In particular, machine
learning models have been increasingly used for this problem, as summarized
by Politikos et al. (2023), who aggregated a comprehensive list of approaches
and locations where machine learning algorithms have been deployed in the
last years across the globe. For optical sensors, high spatial (<3 m) and
spectral resolutions beyond RGB (400 nm to 2500 nm) were found optimal
for the detection of aggregations of marine debris. Synthetic Aperture Radar
(SAR) can be potentially suitable for detecting sea-slicks (Davaasuren et al.,
2018) that are associated with surfactants and change the surface tension of
the water, which in turn reduces the radar back-scatter. These slicks consist
of microbial bio-ﬁlms that can be connected with micro-plastics suspended in
the sea-surface microlayer (Salgado-Hernanz et al., 2021). However, a recent
study (Sun et al., 2023) demonstrated that only very high concentrations of
microplastics lead to a suﬃciently strong dampening of waves to be detectable
with radar satellites. Similarly to sea slicks, macro-plastics can aggregate in
lines driven by environmental forces, such as wind speed, waves, or coastal
fronts. For instance, windrows are accumulations of surface debris. Their ge-
ometry allows for eﬃcient ship-based collection eﬀorts, which can be highly
eﬀective, as demonstrated by Ruiz et al. (2020). Their collection campaign
lasted 68 working days during the spring and summer of 2018 and gathered
16.2 tons of floating marine litter in the Bay of Biscay. This work demonstrated
that aggregated debris detected and collected on the sea surface in geographic
areas with a high pollution level can be directly attributed to macro-plastic
litter. Marine debris aggregations in windrows are sufficiently
large to be detectable at medium resolutions of 10 m by 10 m achievable by
Sentinel-2 and can eﬀectively serve as a proxy for macro plastic litter in the
oceans (Cózar et al., 2021; Arias et al., 2021). However, further distinguishing
ﬂoating objects of natural origins, such as driftwood, or patches of algae and
sargassum, from objects of human origins in large-scale medium-resolution
imagery remains challenging and is an ongoing topic of current research (Hu,
2021, 2022; Ciappa, 2021, 2022). This further ﬁne-grained distinction may re-
quire currently unavailable sensor technology (Salgado-Hernanz et al., 2021)
and is beyond the scope of this work. Instead, we study the eﬀectiveness of
detecting heterogeneous marine objects of both natural and anthropogenic ori-
gins at a large scale with globally available Sentinel-2 imagery. In this work,
we aim to monitor ﬂoating marine litter by detecting marine debris as a proxy
at a large scale. To do so, we evaluate our detector in selected areas where it
is likely that marine litter is present in marine debris due to local studies and
reports in the news and social media. This evaluation strategy ensures that
our detector is sensitive to plastic pollution if marine debris is detected. This
work follows the principles of data-centric AI (Whang et al., 2023), where
the methodological innovation is concentrated on the careful design of the
dataset rather than the speciﬁcities of the particular deep learning model.
Throughout this work, we will use the term marine litter according to the
United Nations Environment Programme (2009) deﬁnition as any persistent,
manufactured, or processed solid material discarded, disposed of, or aban-
doned in the marine and coastal environment. We use marine debris more
broadly as any aggregation of ﬂoating materials on the sea surface that may
or may not contain marine litter of anthropogenic origins. The terms “lit-
ter”, “debris” and “plastic” have particular meanings to diﬀerent groups of
people depending on the scientiﬁc or technical context or cultural preference
(Kershaw et al., 2019) and “marine debris” is often, especially in US-English,
used synonymously with “marine litter”. However, we believe a distinction
is necessary for technical reasons in this application: visual inspection of the
current satellite imagery (without on-site knowledge) cannot reliably distin-
guish marine litter of human origins from marine debris that may also be
of natural origins. Hence, any work relying on hand annotations of satellite
images cannot resolve this conflict objectively, as on-site knowledge of the
composition of the visible marine debris is only available from dedicated cam-
paigns (Topouzelis et al., 2019, 2020a) that yield few thoroughly analyzed
pixels. In prior work (Mifdal et al., 2021), we used the generic term “ﬂoat-
ing object”, while others like Booth et al. (2022) chose the term “suspected
plastics”. Both terms have their limitations: “floating objects” is too broad,
as it may include ships, while “suspected plastics” is too narrowly focused on
plastics over other forms of litter. Our definitions of anthropogenic marine litter and generic
marine debris follow the practices of Kikaki et al. (2022), who annotated sim-
ilar objects termed marine debris in the Marine Debris Archive (MARIDA)
and are used consistently throughout this work.
The rest of the paper is organized as follows: The next section summarizes
related work on detecting marine pollution with remote sensing technology.
Section 3 describes training, validation, and evaluation data used in this
study and details the implementation of the segmentation models in the Ma-
rine Debris Detector. Section 4 presents results compared to related work
and methodologies qualitatively and quantitatively. Further experiments test
the robustness of the Marine Debris Detector concerning atmospheric cor-
rection and test the transferability to higher-resolution PlanetScope imagery
that can supplement the Sentinel-2 imagery used primarily in this work. The
ﬁnal Section 5 discusses the results and provides conclusions for future work.
2. Related Work
Detecting marine debris with satellite imagery at high (typically 3 m to
7 m with PlanetScope imagery) and medium resolution (mainly at 10 m with
Sentinel-2) is a rising scientiﬁc question in remote sensing research. Initial
advances were made by pixel-wise classifiers using multi-spectral reflectance
in combination with dedicated spectral indices, such as the Normalized
Difference Vegetation Index (NDVI). Themistocleous et al. (2020)
investigated the detection of floating plastic litter from space using Sentinel-2
imagery in Cyprus and proposed a plastic index as the ratio of near-infrared
reflectance to the sum of red and near-infrared reflectance, similar to the NDVI. Similarly, Bier-
mann et al. (2020) proposed a Floating Debris Index (FDI), which is a modiﬁ-
cation of the Floating Algae Index (FAI) (Hu, 2009). They demonstrated the
effectiveness of FDI with a naïve Bayes classifier in a two-dimensional NDVI-
FDI feature space. However, this classiﬁer, originally ﬁtted on hand-selected
training and evaluation data under optimal conditions, was not accurate
enough on unﬁltered satellite imagery in practice, as demonstrated by Mif-
dal et al. (2021). Kikaki et al. (2022) achieved the best accuracies with
a pixel-wise random forest classiﬁer that utilized the Sentinel-2 reﬂectance
bands, a range of spectral indices, and textural features. In Mifdal et al.
(2021), we investigated the suitability of learned spatial features with a con-
volutional neural network for binary marine debris detection. While the
results showed general applicability towards detecting marine debris with
deep segmentation models, we identified several limitations, including a
sensitivity to a range of false-positive detections, that made the model not
employable in an automated way. Simultaneously, Shah et al. (2021) anno-
tated RGB PlanetScope imagery with bounding boxes and trained a deep
object detector on the localization of marine debris. Most recently, Gómez
et al. (2022) focused on detecting debris in rivers with Sentinel-2 and tested
several deep segmentation models to understand and predict floating debris
accumulations. Similar to this work, Booth et al. (2022) present a supervised
U-Net classifier named MAP-Mapper, trained on the MARIDA dataset, that
aims to predict the density of marine debris.

Figure 1: Overview of the datasets used for training, validation, and evaluation in this
work. We focus on quantity and diversity in the training datasets while prioritizing accurate
annotations in validation and evaluation data. The scenes in Accra and Durban likely
contain plastic litter in the visible marine debris and are explicitly used for evaluation.
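For reference, the spectral indices used by these pixel-wise baselines can be computed directly from Sentinel-2 reflectance bands. The sketch below is our own illustration: the NDVI follows its standard definition, and the FDI follows the formulation of Biermann et al. (2020) as we recall it, with a baseline NIR reflectance interpolated between the red-edge (B6) and SWIR (B11) bands; consult the original paper for the authoritative definition.

```python
import numpy as np

# Sentinel-2 central wavelengths in nanometers (red, NIR, SWIR1)
L_RED, L_NIR, L_SWIR1 = 665.0, 833.0, 1610.0

def ndvi(b04: np.ndarray, b08: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index from red (B4) and NIR (B8)."""
    return (b08 - b04) / (b08 + b04 + 1e-8)

def fdi(b04: np.ndarray, b06: np.ndarray, b08: np.ndarray, b11: np.ndarray) -> np.ndarray:
    """Floating Debris Index: NIR reflectance (B8) minus a baseline NIR
    reflectance interpolated between red-edge (B6) and SWIR (B11),
    after Biermann et al. (2020)."""
    nir_prime = b06 + (b11 - b06) * (L_NIR - L_RED) / (L_SWIR1 - L_RED) * 10.0
    return b08 - nir_prime
```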
Several public datasets were made available alongside the respective pub-
lications. Both the FloatingObjects dataset (Mifdal et al., 2021) and the
Marine Debris Archive (MARIDA) (Kikaki et al., 2022) contain Sentinel-2
imagery with a substantial number of hand-annotations of visually detected
marine debris. They differ mostly in the binary (debris vs.
other, i.e., non-debris) versus multiclass (types of debris) nature of the anno-
tations. The NASA Marine Debris dataset (Shah et al., 2021) focused on
3-channel RGB PlanetScope imagery with coarse bounding box annotations.
In this paper, we extend the initial work of Mifdal et al. (2021) and train
a deep segmentation model on the combined datasets of FloatingObjects
(Mifdal et al., 2021) and MARIDA (Kikaki et al., 2022). We further use
additional datasets to train our detector, which we detail in the next section.
3. Materials and Methods
Deﬁning and aggregating training data for marine debris detection is chal-
lenging due to the heterogeneous nature of objects, the novelty of the disci-
pline, and the scarcity of available datasets. This section ﬁrst outlines the
sources, aggregation choices, and design decisions to generate the training,
validation, and evaluation datasets used in this work. Speciﬁcally, Section 3.1
focuses on the datasets used for training, while Section 3.2 outlines the vali-
dation and evaluation sets. An overview of the datasets is provided in Fig. 1.
For training datasets, we focused on quantity and aggregated a large dataset
of heterogeneous marine debris and other ﬂoating materials alongside nega-
tive examples centered on ships (S2Ships). The quality of this large training
pool is variable, but this also reﬂects the inherent diﬃculty of the task. In the
validation and evaluation data, we focus more on the quality and accuracy
of annotations of marine debris. The evaluation scenes were chosen explic-
itly in areas where we were certain, due to manual veriﬁcation, that plastic
pollution is present among marine debris. After describing the dataset, the
models used are detailed in Sections 3.3 and 3.4, which describe our detector
and the comparison methods, respectively. Accuracy metrics are described
in Section 3.5.
3.1. Training Data
The available annotated data on the detection of marine debris is scarce.
To our knowledge, only two publicly available datasets focusing on Sentinel-2
imagery are available today. The Marine Debris Archive (MARIDA) (Kikaki
et al., 2022) provides multi-class labels on polygon-wise hand-annotated Sentinel-
2 images, and the FloatingObjects dataset (Mifdal et al., 2021) provides binary la-
bels (ﬂoating objects versus water annotations) in coarse hand-drawn lines
on Sentinel-2 scenes. We further improve the quality of these annotations
by an automated label reﬁnement heuristic deﬁned for this problem. Our
goal is to train a model that can predict marine debris from openly accessi-
ble satellite imagery under different conditions, therefore making it possible
to process both top-of-atmosphere and atmospherically corrected bottom-of-
atmosphere data. For atmospheric correction, we further chose to use prod-
ucts corrected with Sen2COR (Main-Knorn et al., 2017) that are readily
available to download in Google Earth Engine rather than products corrected
with ACOLITE (Vanhellemont and Ruddick, 2016), where the atmospheric
correction would have to be done individually at each raw image scene. To
study the eﬀect of atmospheric correction, we test our models on imagery at
diﬀerent atmospheric processing levels (see Section 4.2). To avoid confusion
of marine debris with ships, one of the major problems highlighted in Mif-
dal et al. (2021), we also include the S2Ships dataset (Ciocarlan and Stoian,
2021) that provides negative non-debris examples of class other. All three
datasets are detailed in the next subsections.
3.1.1. The FloatingObjects Dataset
The FloatingObjects dataset originates from our prior work in Mifdal
et al. (2021) and contains 26 diﬀerent globally distributed Sentinel-2 scenes.
Overall, 3297 ﬂoating objects were annotated by lines when visually identiﬁed
as marine debris. In this work, we use this dataset exclusively for training,
as a certain level of label noise is present in the annotations. We decided to
exclude four regions (accra_20181031, lagos_20190101, neworleans_20200202,
and venice_20180630) to be re-annotated in the Refined FloatingObjects validation
dataset described later in Section 3.2.1. The remaining 22 regions were used for training.
We follow the data sampling strategy of Mifdal et al. (2021) and crop
a small image patch of 128 px by 128 px centered on each line segment of
the available marine debris annotations. To obtain negative examples with-
out any marine debris, we select random points within the Sentinel-2 scenes
and extract equally sized image patches. We also use both processing levels
L1C (top-of-atmosphere) and L2A (bottom-of-atmosphere), where we always
select the L2A image available in the Google Earth Engine Archive (Gore-
lick et al., 2017) and resort to L1C if no atmospherically corrected image
is available. The eﬀect of atmospheric correction on the performance of the
detector is evaluated later in Section 4.2. In all cases, 12 Sentinel-2 bands are
used. These are all the available bands, excluding the haze band B10, which
the Sen2COR atmospheric correction algorithm (Main-Knorn et al., 2017) removes from its L2A products.
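The patch sampling described above can be sketched as follows; `sample_patch` and `sample_negative` are hypothetical helper names, not code from the released repository.

```python
import numpy as np

def sample_patch(scene: np.ndarray, cy: int, cx: int, size: int = 128) -> np.ndarray:
    """Crop a size-by-size patch from a (bands, H, W) scene, centered on an
    annotated line segment at (cy, cx) and clipped to the scene borders."""
    half = size // 2
    _, h, w = scene.shape
    cy = int(np.clip(cy, half, h - half))
    cx = int(np.clip(cx, half, w - half))
    return scene[:, cy - half:cy + half, cx - half:cx + half]

def sample_negative(scene: np.ndarray, rng: np.random.Generator, size: int = 128) -> np.ndarray:
    """Equally sized patch at a random scene location as a negative example."""
    _, h, w = scene.shape
    cy = rng.integers(size // 2, h - size // 2)
    cx = rng.integers(size // 2, w - size // 2)
    return sample_patch(scene, cy, cx, size)
```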
Label Reﬁnement Module. While the FloatingObjects dataset pro-
vides a large number of labels, the annotated lines do not always accurately
capture the width and geometry of the underlying marine debris. We im-
prove the hand annotations by an automated label reﬁnement module that
generates a mask that reﬂects more closely the geometry of the debris in the
proximity of the line annotations (Fig. 2). The module inputs a Sentinel-2
scene and the original line annotations mask. In the ﬁrst stage (left side of
Fig. 2), we buﬀer the hand-annotated line to obtain a region of potential
marine debris. Then, we calculate the Floating Debris Index (FDI) using the
Sentinel-2 scene and perform a segmentation of the FDI image with an Otsu
threshold (Otsu, 1979). The buﬀer and segmentation are then combined to
obtain a preliminary area of marine debris in the vicinity of the original an-
notations. In the second stage, we randomly sample potential marine debris
pixels, as well as markers for non-debris pixels (class other) in the remaining
parts of the image. These markers are the starting points of a random walk
segmentation algorithm (Grady, 2006), which is a fast algorithm that requires
a few labeled pixels as markers. The markers are assumed to be accurately
annotated, while the pixels between the markers are uncertain and are then
annotated by an underlying anisotropic diﬀusion process that ensures that
homogeneous areas are assigned to the same class. Crucially, one set of pa-
rameters (homogeneity criterion, buﬀer size, marker sampling frequency) of
the random walker algorithm leads to one potential debris map. Therefore,
we vary those parameters and average all resulting maps to capture the inherent
fuzziness of the borders of marine debris, as shown in the bottom row
of Fig. 2.
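A minimal sketch of one refinement pass, assuming the scikit-image and SciPy implementations of Otsu thresholding, binary dilation, and the random walker (the released source code may differ in its details):

```python
import numpy as np
from scipy.ndimage import binary_dilation
from skimage.filters import threshold_otsu
from skimage.segmentation import random_walker

def refine_once(fdi, line_mask, buffer_px=1, beta=10, debris_density=0.25, seed=0):
    """One refinement pass: buffer the line annotations, intersect with an
    Otsu segmentation of the FDI image, sample markers, and run the random
    walker. Marker labels: 1 = marine debris, 2 = other, 0 = unlabeled."""
    rng = np.random.default_rng(seed)
    buffered = binary_dilation(line_mask, iterations=buffer_px) if buffer_px else line_mask
    candidate = (fdi > threshold_otsu(fdi)) & buffered

    markers = np.zeros(fdi.shape, dtype=np.int32)
    markers[candidate & (rng.random(fdi.shape) < debris_density)] = 1
    markers[~buffered & (rng.random(fdi.shape) < 0.05)] = 2  # other markers at 5%

    refined = random_walker(fdi, markers, beta=beta)
    return refined == 1

# vary (buffer_px, beta, debris_density) and average the resulting masks:
# soft_target = np.mean([refine_once(fdi, lines, b, bt, d, s)
#                        for s, (b, bt, d) in enumerate(parameter_grid)], axis=0)
```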
3.1.2. The Marine Debris Archive (MARIDA)
The Marine Debris Archive (MARIDA) was collected by Kikaki et al.
(2022) for developing and evaluating machine learning algorithms for marine
debris detection. MARIDA contains 63 temporally overlapping Sentinel-2
scenes from 12 distinct regions. In total, 6672 polygons were annotated, of
which 1882 are marine debris and 2447 marine water. The remaining 2343
polygons are annotated in one of 13 further classes with between 24 and
356 annotations each that we do not use in this study. We use MARIDA
as an additional training, validation, and evaluation data source, but consider
only patches annotated as marine debris (positive class) and treat instances
of marine water as negatives. The MARIDA dataset contains Sentinel-2
imagery with 11 bands that have been atmospherically corrected with the
ACOLITE (Vanhellemont and Ruddick, 2018) algorithm. In this work, we
want to apply our detector to 12-band Sentinel-2 imagery that has been
atmospherically corrected with Sen2COR (Main-Knorn et al., 2017), as is
readily available, for instance, in Google Earth Engine (Gorelick et al., 2017).
This avoids reprocessing additional imagery after download and simpliﬁes the
application on new scenes. To harmonize this dataset, we re-downloaded all
Sentinel-2 scenes from Google Earth Engine to retrieve 12-band imagery for
MARIDA compatible with the other datasets.

Figure 2: Label Refinement Module for the FloatingObjects dataset. It inputs a Sentinel-2
image and the original hand annotations of the FloatingObjects dataset (left). An Otsu-
threshold segmentation (Otsu, 1979) buffered around the hand labels (center) is used to
sample marker points for a random walk segmentation algorithm (Grady, 2006) that
results in a refined annotation mask (right). By varying parameters, we generate different
variants of the mask, whose average expresses the uncertainty and fuzziness of the debris
borders (second row).

Like FloatingObjects, we use the atmospherically corrected L2A Sentinel-2
imagery whenever available.
We also excluded one scene near Durban from MARIDA (named
S2_24-4-19_36JUN) to avoid spatial overlap and potential positive biases with our
evaluation scene described later in Section 3.2.
3.1.3. S2Ships
Ships and their wakes can cause false positive predictions of marine debris,
as reported by Mifdal et al. (2021). We decided to explicitly add images of
ships without any annotated marine debris as negative examples. We use the
S2Ships dataset of Ciocarlan and Stoian (2021), which provides segmented ships
in Sentinel-2 imagery. In our training pipeline, we retrieve these ship positions,
load an image centered on each ship and show it to our detector during
training with a negative prediction mask indicating the class other.
3.2. Validation and Evaluation Sites
For ﬁnding the best neural network design and hyperparameters (i.e.,
validation), as well as for the final independent evaluation, we used datasets
with high-quality annotations. For both sets, we combine the MARIDA
dataset, according to its validation and evaluation partitioning scheme,
with a reﬁned version of the FloatingObjects dataset that we describe in
the next Section 3.2.1. For further qualitative evaluation, we additionally
use imagery from the Plastic Litter Projects 2021 and 2022, detailed further
in Section 3.2.2. For both validation and evaluation datasets, we focus on
using accurate annotations and we select only sites with a high probability
of plastic pollution specifically for final evaluation, as detailed in the following subsections.
3.2.1. Refined FloatingObjects
We create a refined version of the FloatingObjects dataset (Section 3.1.1)
with less label noise by re-annotating a subset of FloatingObjects
regions by individual point locations of which we are certain that they are
localized accurately on visible marine debris in the imagery. We conduct
this annotation in Google Earth Engine (GEE) (Gorelick et al., 2017) and
select the subset of regions named lagos_20190101, neworleans_20200202,
venice_20180630, and accra_20181031. We also included two new areas,
marmara_20210519 and durban_20190424. By carefully annotating these
areas, we are conﬁdent that we captured the precise location of the class
marine debris in these Sentinel-2 scenes. Besides positive examples, we also
need examples of the negative other class to calculate accuracy scores; these
should capture a diverse set of negatives, like open water, land, coastline, and
ships, that likely confuse the model. To obtain these negative examples, we iteratively
added negative examples by monitoring the result of a smileCART (Breiman
et al., 1984) classiﬁer implemented online in Google Earth Engine. This
classiﬁer serves as a proxy antagonist to us as labelers, i.e., it will highlight
areas that appear like marine debris and will be checked by annotators. We
explicitly added new negative examples in locations where this proxy classiﬁer
incorrectly predicted marine debris. Hence, we captured meaningful negative
point locations of the other class that was diﬃcult to distinguish from the
annotated marine debris by the smileCART classifier.
At validation and evaluation time, we extract 128 px × 128 px patches
centered on each of these annotated points that are labeled as either marine
debris (positive) or other (negative). We can only be certain about the class
at the precise annotations of the point in the center of each image patch.
Hence, we ﬁrst segment the entire patch using the semantic segmentation
model but then extract the prediction only at the center pixel corresponding
to the annotated point for accuracy estimation. This selection eﬀectively
simpliﬁes the segmentation problem to a classiﬁcation problem at the center
of the image patch. It allows us to use standard classiﬁcation metrics to
measure the accuracy (described in Section 3.5).
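This center-pixel evaluation can be sketched as follows, assuming a PyTorch model that outputs one logit per pixel (the helper name is illustrative):

```python
import torch

def center_pixel_probability(model: torch.nn.Module, patch: torch.Tensor) -> float:
    """Segment a full (12, 128, 128) patch, then keep only the probability at
    the center pixel, where the point annotation is known to be accurate."""
    model.eval()
    with torch.no_grad():
        logits = model(patch.unsqueeze(0))   # (1, 1, 128, 128)
        probs = torch.sigmoid(logits)[0, 0]  # (128, 128)
    h, w = probs.shape
    return probs[h // 2, w // 2].item()
```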
Among the six regions in this dataset, we use the Sentinel-2 scenes
lagos_20190101, neworleans_20200202, venice_20180630, and marmara_20210519 for
validation, as we are not certain about the composition of the visible marine
debris in these images. For instance, marmara_20210519 likely contains
ﬂoating algae (sea snot), as it coincides with reported algae blooms (Kue-
bler, 2021) which are often present in this area (Hu et al., 2022). We use the
accurate annotations of this generic marine debris in these areas to calibrate
the model hyperparameters, such as the classification threshold, before final evaluation.
For evaluation, we use the scenes accra_20181031 and durban_20190424,
as these areas very likely contain plastics in the marine debris:
•Evaluation Scene Accra, Ghana, 2018-10-31. Beach surveys in
2013 showed that plastic materials made up the majority (63.72%) of
marine debris washed onto the evaluated beaches (Van Dyck et al., 2016).
A recent study (Pinto et al., 2023) estimated the plastic mass
transport in the Odaw river, running through Accra into the
sea, at between 140 and 380 kilograms per day. Qualitatively, one particular
area in this Sentinel-2 scene, shown in Fig. 3 (top), shows an outwash
of debris from the coast. In this image, the marine debris is visible
in yellow (high Floating Debris Index, FDI). We show a high-resolution
background map from Google Satellites for land and shoreline to pro-
vide a reference. Two zoomed-in areas (named 1 and 2 in Fig. 3) show
that coastal erosion is visible alongside aggregations of waste and sewage
outflows. Finally, a Google Street View image (bottom row of Fig. 3)
further conﬁrms this area’s general pollution level. Only a Sentinel-2
image at the top-of-atmosphere processing level (L1C) is available in
Google Earth Engine for Accra.
•Evaluation Scene Durban, South Africa, 2019-04-24. This eval-
uation scene was ﬁrst identiﬁed by Biermann et al. (2020), who used
social media and news reports to select areas of plastic pollution. It
covers marine debris that likely contains plastic litter from a ﬂood event
in Durban following heavy rainfall starting on April 18th 2019. This
ﬂood discharged large quantities of debris into the harbor of the Dur-
ban Metropole, as shown in Fig. 4. We acquired one Sentinel-2 image
from April 24th, shown in Fig. 4c, where visible debris originates from
the harbor area (highlighted in gray). The debris in this image likely
contains plastic litter. This image is particularly diﬃcult to predict,
as clouds and haze from the preceding rainfall are still visible in this
scene. The patches of marine debris visible in the FDI representation
are less pronounced than in the Accra scene, which has more clearly
identifiable objects. In this area, both top-of-atmosphere (L1C) and
bottom-of-atmosphere (L2A) Sentinel-2 images are available. We com-
pare the model performance on both versions later in Section 4.2.
3.2.2. Plastic Litter Projects
The third evaluation area covers Sentinel-2 data showing explicitly de-
ployed debris targets in the Plastic Litter Projects of 2021 and 2022 (Topouzelis
et al., 2019, 2020b; Papageorgiou et al., 2022) on the island of Lesbos, Greece.
In 2021, one 28 m diameter high-density polyethylene (HDPE) mesh was de-
ployed on June 8th 2021, followed by a 28 m wooden target on June 17th
2021. Both were visible during 22 Sentinel-2 satellite overpasses until 7th of
October 2021. In the Plastic Litter Project 2022, one 5 m × 5 m inflatable
PVC target and two 7 m diameter HDPE meshes were deployed on June 16th 2022.

Figure 3: Evaluation scene in Accra, Ghana. The top row shows an FDI visualization of
the Sentinel-2 image of October 31st 2018, where marine debris is washed into the open
waters. Closer investigation with high-resolution satellite images (center row) shows that
coastal erosion is present and that this area is generally polluted with human litter. This is
also confirmed by a Google Street View image shown in the bottom row.

Figure 4: Evaluation scene from Durban, South Africa. Additional imagery shared by
local news and social media (a, b; photos: Ash Erasmus) shows the level of plastic pollution
on 24th of April 2019. The Sentinel-2 scene with debris annotations (c) is the scene we
use for evaluation.

One HDPE mesh was cleaned regularly, while the other was
subject to natural fouling and algae. The objects were deployed until the 11th
of October 2022 and were visible in 23 Sentinel-2 acquisitions. Additional
smaller 1 m² and 3 m² targets were also deployed throughout the project
phase to study visibility and the material’s decomposition in water but were
too small to be visible in the Sentinel-2 scenes. We use the Sentinel-2 data
of the 2021 campaign to qualitatively test the ability of our detector and
comparison models to detect the deployed targets in the Sentinel-2 imagery.
3.3. Marine Debris Detector Implementation
This section describes the implementation of the Marine Debris Detector
as a deep segmentation model that inputs a 12-channel Sentinel-2 image and
estimates the probability of the presence of marine debris for each pixel.
3.3.1. Segmentation Model Architectures
We implemented the Unet (Ronneberger et al., 2015) and Unet++
(Zhou et al., 2018) architectures, as shown in Fig. 5. The Unet segmen-
tation model of Ronneberger et al. (2015) was developed for medical image
segmentation and is heavily used in remote sensing due to the ﬁne-grained
segmentation masks it can produce. The success of the Unet is strongly
related to its early skip connections, which help maintain the details of the
image in the ﬁnal map. As such, skip connections enable the propagation of
a high-resolution representation of the input image through the entire net-
work. This network was the one used previously by Mifdal et al. (2021) for
marine debris detection.
The Unet++ (Zhou et al., 2018) variant extends the original Unet
by replacing the original encoder with a ResNet (He et al., 2016) with
four blocks (indicated as l_i). ResNets are the de-facto standard feature
extractor in computer vision, as they can learn complex representations while
requiring fewer weights than many earlier networks. The decoder consists of
three double-convolutional blocks (indicated with b_i). Each block consists of
two convolution-batchnorm-relu transformations. While the original Unet
directly connects the output of each encoder layer with the corresponding
decoder layer of same resolution, the Unet++ adds additional double-conv
blocks in these skip pathways that are connected densely in the spirit of
DenseNet neural networks (Zhu and Newsam, 2017).
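Comparable architectures are available off the shelf; below is a minimal sketch using the segmentation_models_pytorch package. This package and the encoder choice are assumptions for illustration — the paper's released code may construct the networks differently.

```python
import torch
import segmentation_models_pytorch as smp

# Unet++ with a ResNet encoder, adapted to 12 Sentinel-2 input bands;
# the encoder choice (resnet34) is an assumption for illustration.
model = smp.UnetPlusPlus(
    encoder_name="resnet34",
    encoder_weights=None,   # trained from scratch on Sentinel-2 patches
    in_channels=12,         # all Sentinel-2 bands except B10
    classes=1,              # one marine debris logit per pixel
)

x = torch.randn(2, 12, 128, 128)   # a batch of training patches
prob = torch.sigmoid(model(x))     # per-pixel debris probability
```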
Figure 5: Schematic of the Marine Debris Detector implementation with an underlying
Unet (Ronneberger et al., 2015) or Unet++ (Zhou et al., 2018) segmentation model.
A 12-channel input image (top-left) is encoded to hidden feature representations in sev-
eral levels of resolution (vertical pathways) and decoded to a probability of marine debris
(top-right). Higher-resolution pathways ensure that the resulting segmentation map is ﬁne-
grained, while lower-resolution pathways encode global information on the entire scene. Unet++
(Zhou et al., 2018) extends the original Unet (Ronneberger et al., 2015) by adding addi-
tional dense connections in the skip pathways indicated in blue.
3.3.2. Implementation and Training Details
We train Unet and Unet++ models with a learning rate of 0.01 and
a weight decay of 1 × 10⁻⁶ for 100 epochs. The Unet implementation in this
work has 31 million trainable parameters, while the Unet++ has 26 mil-
lion parameters. Regarding the label reﬁnement module (Section 3.1.1), we
compute multiple reﬁned segmentation masks with diﬀerent parameters and
choose a buﬀer size of 0, 1, or 2 pixels, the β-parameter of the random
walker (a penalization coeﬃcient for the walker motion) of 1 or 10, and the
marker density for marine debris of 5%, 25%, 50% or 75% (the density of
other markers is ﬁxed at 5%). Combined with the original mask, this yields
25 diﬀerent target masks consistent with the hand annotations and the FDI
image but of varying shapes and sizes, as shown in the bottom row of Fig. 2.
During training, we choose one of these target masks randomly, which, in
our opinion, best reflects the undefined borders of the marine debris that we
aim to detect and acts as a form of natural label-data augmentation. During
training, we monitor the area under the ROC curve (AUROC) on the reﬁned
FloatingObjects dataset (Section 3.2.1) and MARIDA validation set. We
store the model weights each time the highest (best) validation AUROC has
been reached. We observe that the model systematically underestimates the
probability of marine debris due to a heavy class imbalance in the training
data. This results in a low precision but high recall when we assign the class
marine debris for probability scores above 0.5. We counteract this imbalance
by calibrating the classiﬁcation threshold to balance precision and recall on
the validation set.
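This calibration can be done with a precision-recall sweep over the validation predictions; a sketch under the assumption that the threshold is chosen where precision and recall intersect:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def calibrate_threshold(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Return the classification threshold where validation precision and
    recall are (approximately) balanced."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall contain one more entry than thresholds; drop it
    gap = np.abs(precision[:-1] - recall[:-1])
    return float(thresholds[np.argmin(gap)])
```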
For the Unet++ model, we trained models from diﬀerent random seeds
with validation-optimal thresholds of 0.132, 0.0639, and 0.0254 during the
experiments shown in this paper. For the Unet, the thresholds were 0.0895,
0.0712, and 0.0643.
Training a Unet++ and Unet took eight and nine hours on an NVIDIA
RTX 3090 graphics card with multi-threaded data loading with 32 workers.
The estimated carbon footprint for one model training run was 2.8 kg CO₂e.
3.4. Comparison Methods
We compare models trained within our training framework to approaches
from recent literature: in particular, the Unet trained by Mifdal et al. (2021)
on the original FloatingObjects dataset and a Random Forest classifier, de-
noted by rf, trained on the original MARIDA dataset (Kikaki et al., 2022).
For the Unet, we use the provided pre-trained weights for their model. Sim-
ilarly to our segmentation models, we also determine the best classiﬁcation
threshold based on the validation set to achieve results with balanced pre-
cision and recall, which is 0.039. For the random forest classiﬁer (rf), we
train the random forest on 11 Sentinel-2 bands, as in the original paper with
12 output classes, and combine the predictions into a binary scheme by con-
sidering marine debris as the positive class and treating all 11 non-debris
classes as other. In the results section, we denote these two models as Unet
and rf and indicate that they have been trained on the “original data” of
their respective papers.
We also train the random forest on the combined training dataset de-
scribed in Section 3.1, which we denote as “trained on our dataset”. For the
random forest, we use an identical feature extraction pipeline as described
in Kikaki et al. (2022), which results in 26 features containing the original
spectral bands, spectral indices, and textural features. As the random forest
is a pixel-wise classiﬁer, we treat each pixel separately and create a roughly
balanced training pixel dataset from our image training dataset. We se-
lect five positive pixels (annotated as marine debris) and five negative other
pixels from each image. This results in 70 000 training pixels. As for the
other comparison approaches, we tune the classification threshold based on
the validation dataset, which yields 0.663.
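A sketch of this balanced pixel sampling, assuming per-image arrays of the 26 features and a binary debris mask (helper names are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sample_pixels(features: np.ndarray, mask: np.ndarray, n: int = 5, seed: int = 0):
    """Draw n debris and n other pixels from one image; `features` has shape
    (H, W, 26) with spectral bands, spectral indices, and textural features."""
    rng = np.random.default_rng(seed)
    pos = np.argwhere(mask)
    neg = np.argwhere(~mask)
    idx = np.concatenate([pos[rng.choice(len(pos), n)],
                          neg[rng.choice(len(neg), n)]])
    X = features[idx[:, 0], idx[:, 1]]
    y = np.repeat([1, 0], n)
    return X, y

# stack (X, y) over all training images, then:
# rf = RandomForestClassifier().fit(X_all, y_all)
```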
3.5. Evaluation Metrics
We compare all models trained on “original data” and “our dataset” on
several metrics on the evaluation sets of Durban, Accra, and the MARIDA test set:
• We include the overall accuracy, the ratio of correct classifications to
total samples. It is straightforward to interpret, but susceptible to
class imbalance. Our selected validation and evaluation sets, however,
have a general balance between positive and negative samples.
•f-score is the harmonic mean between precision and recall that, in
contrast to individual precision and recall scores, is more robust to the
choice of the classiﬁcation threshold.
• The area under the receiver operating characteristic curve (auroc) is a metric that
is independent of the classiﬁcation thresholds but easily saturates for
relatively accurate classiﬁers with values close to 1.
•The jaccard index, also known as intersection over union, is com-
monly used for object detection and measures the intersection of two
sets (predictions and ground truth) divided by their union.
•The kappa statistic compares two classiﬁers: the model and a ran-
domly guessing baseline. Values of zero indicate that the tested model
is not better than a random baseline, while positive values indicate
that the tested model outperforms the trivial baseline.
Higher values are better for all metrics, and values of 1 indicate a perfect classifier.
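All five metrics are available in scikit-learn; a sketch of the evaluation at the calibrated threshold:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             jaccard_score, roc_auc_score)

def evaluate(y_true: np.ndarray, y_score: np.ndarray, threshold: float) -> dict:
    """Compute the five evaluation metrics from center-pixel probabilities."""
    y_pred = (y_score > threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f-score": f1_score(y_true, y_pred),
        "auroc": roc_auc_score(y_true, y_score),  # threshold-independent
        "jaccard": jaccard_score(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }
```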
4. Results
We first compare the models quantitatively and qualitatively in Section 4.1.
We then predict one entire Sentinel-2 scene (Durban) in Section 4.2
and quantify the false positive predictions on both bottom-of-atmosphere and
top-of-atmosphere Sentinel-2 imagery. In the final experiment (Section 4.3), we
test how a re-trained 4-channel detector can predict marine debris on higher-
resolution PlanetScope imagery, which can complement Sentinel-2 imagery.
4.1. Numerical Comparisons
Table 1 shows the quantitative results of rf and Unet models trained
on the respective original data in comparison to rf, Unet, and Unet++
trained with our training setting on the combined training dataset and re-
ﬁnement strategies described in Section 3.1. We see that models trained in
our combined training framework achieve the best accuracy metrics in all ex-
periments, including those where the label refinement is not used (column “no-
ref”). As expected, the deep learning-based Unet and Unet++ models
outperform the pixel-wise random forest classiﬁer. This is likely due to the
advantage of convolutional neural networks to learn spatial patterns within
their convolutional receptive field. Both Unet and Unet++ achieve equal
accuracies within one standard deviation on the MARIDA test set, while the
Unet++ achieves a better accuracy on the Durban and Accra scenes. The
The label refinement module also improves the Unet++ performance on the MARIDA
test set and Durban. However, on Accra, the best scores are achieved with a
Unet++ model without the refinement module (indicated by “no-ref”).
Accra evaluation scene:
            trained on original data     trained on our dataset
            RF      UNET     RF      UNET            UNET++          UNET++ (no-ref)
accuracy    0.653   0.882    0.680   0.924 ± 0.016   0.930 ± 0.016   0.948 ± 0.008
f-score     0.464   0.871    0.545   0.920 ± 0.018   0.926 ± 0.018   0.948 ± 0.008
auroc       0.246   0.965    0.899   0.978 ± 0.008   0.981 ± 0.006   0.989 ± 0.005
jaccard     0.302   0.772    0.374   0.852 ± 0.030   0.862 ± 0.031   0.900 ± 0.014
kappa       0.301   0.764    0.357   0.848 ± 0.031   0.859 ± 0.031   0.897 ± 0.017

Durban evaluation scene:
            trained on original data     trained on our dataset
            RF      UNET     RF      UNET            UNET++          UNET++ (no-ref)
accuracy    0.781   0.587    0.811   0.908 ± 0.010   0.934 ± 0.018   0.905 ± 0.011
f-score     0.105   0.497    0.708   0.756 ± 0.032   0.837 ± 0.053   0.776 ± 0.026
auroc       0.376   0.765    0.862   0.850 ± 0.030   0.914 ± 0.018   0.886 ± 0.053
jaccard     0.055   0.330    0.548   0.609 ± 0.042   0.722 ± 0.048   0.635 ± 0.034
kappa       0.082   0.245    0.569   0.704 ± 0.037   0.797 ± 0.063   0.717 ± 0.031

MARIDA test set:
            trained on original data     trained on our dataset
            RF      UNET     RF      UNET            UNET++          UNET++ (no-ref)
accuracy    0.697   0.838    0.811   0.865 ± 0.006   0.867 ± 0.005   0.851 ± 0.006
f-score     0.288   0.701    0.708   0.741 ± 0.012   0.749 ± 0.009   0.710 ± 0.015
auroc       0.488   0.764    0.862   0.738 ± 0.012   0.746 ± 0.021   0.733 ± 0.006
jaccard     0.168   0.539    0.548   0.589 ± 0.015   0.598 ± 0.012   0.551 ± 0.018
kappa       0.197   0.593    0.569   0.654 ± 0.016   0.661 ± 0.012   0.615 ± 0.017
Table 1: Quantitative comparison of models trained on original data (rf (Kikaki et al.,
2022), Unet (Mifdal et al., 2021)), versus models trained on the training data compiled in
this work. We also test a Unet++ model without label reﬁnement module, indicated by
the “no-ref” suffix in the last column. The bottom plot shows the validation loss during
training of three Unet++ and three Unet models. The Unet++ finds an optimum
earlier and has less variance (shown as 1σ standard deviation) between the models in the
early stages of training.
Figure 6: Qualitative predictions of the three models on images from the Accra and
Durban sets, each covering 2.56 km by 2.56 km. Our Unet++ produces marine debris
predictions similar to the hand annotations (target/label) with the fewest false positives.
An interactive qualitative comparison is available under https://marcrusswurm.users.
For the remainder of the paper, we use the Unet++ model in the Marine Debris
Detector, as it has fewer parameters and finds an optimum earlier and more
consistently across random seeds (1σ standard deviation shown) than the
Unet, as shown in the bottom plot of Table 1.
Figure 6 compares the models qualitatively on selected 256 px × 256 px
patches, each covering 2.56 km by 2.56 km. The tiles are from the Accra and Dur-
ban evaluation scenes, where it is highly plausible that plastic pollution is
present in marine debris. We compare the Unet++ model with and without
label reﬁnement, the random forest rf with features of (Kikaki et al., 2022),
trained on our dataset, and the Unet from Mifdal et al. (2021) trained on
the original FloatingObjects (FlObs) dataset only. The ﬁrst two columns
show RGB and FDI representations of the multi-spectral Sentinel-2 scenes.
The third column shows hand-annotated masks (shown in red). We generally
see the quantitative results mirrored in these qualitative examples, where the
deep learning model trained on our combined training set produces the most
faithful masks of floating marine debris. While none of the models captured
the hand annotations perfectly, the Unet++ produced the visually most
accurate predictions with the fewest false positives across most evaluation
scenes. The Unet++ without label reﬁnement (indicated by “no-ref”) pro-
vides generally thinner predictions than the Unet++ with reﬁnement mod-
ule, which we connect to the reﬁnement module always enlarging the target
mask of marine debris to some degree during training. In Accra-1, Unet++
and Unet (Mifdal et al., 2021) capture the general location of the objects,
while the random forest rf (Kikaki et al., 2022) detected natural waves along
the entire coastline as marine debris. The Unet++ without reﬁnement mod-
ule appears to merge multiple patches of debris here and does not accurately
capture the individual objects. Accra-2 shows several sargassum patches in
between ships. Generally, all models predict these patches well, while some
ships are still confused with marine debris. The Durban scenes are more
challenging and show more atmospheric perturbations through clouds and
haze. The Unet++ predicts the general locations of the annotated marine
debris well until the cloud coverage is too dense, as seen in Durban-3. The
original Unet (Mifdal et al., 2021) predicts a large number of false positives,
which was also stated as a limitation in their original work. The random
forest rf of Kikaki et al. (2022) tends to under-predict the marine debris in
all three Durban scenes and only identiﬁes a few individual ﬂoating object
patches in Durban-1.
Finally, we compare diﬀerent Unet++ models trained on diﬀerent ini-
tialization seeds, with and without label reﬁnement on images of the Plastic
Litter Projects 2021 (Fig. 7). Most models capture the general location of
the deployed targets on all scenes. However, some models (seed 3 without
label refinement and seed 2 with label refinement) confuse the coastline and
some water areas for marine debris. Seed 1 with label reﬁnement appears to
miss the deployed targets on June 21st and July 1st, similarly to the model
trained on seed 2 with label reﬁnement on July 1st. Similarly to the previous
result, models trained with reﬁned labels predict larger but also less deﬁned
patches compared to models trained without. This experiment demonstrates
the challenges associated with detecting individual objects that span only
a few pixels.

Figure 7: Classification probabilities for Sentinel-2 scenes of deployed targets during the
Plastic Litter Projects 2021 (Topouzelis et al., 2019). All models assign higher probabilities
to the deployed targets. Still, only a few models detect both targets. Other pixels, such
as coastlines, are sometimes assigned a higher marine debris probability. Models trained
with the label refinement module tend to predict larger patches with less spatial detail.

However, we would like to highlight that these deployed targets are not
representative of the marine debris seen in open waters on which the models
have been trained. These objects typically form long lines rather
than round shapes, and we believe that the difference in geometrical shape,
rather than spectral appearance, is a major feature that the deep learning
models use for their predictions.
4.2. Role of Atmospheric Correction
In this experiment, we follow a realistic deployment scenario and pre-
dict the entire Durban scene of 3122 px × 3843 px with the Unet++ model
in overlapping 480 px × 480 px patches. We then consider pixels predicted
with a probability higher than the prediction threshold and treat each local
maximum as a marine debris detection. We set a minimum distance of 3 px
between local maxima to avoid marine debris detections being too close to
each other. Furthermore, we compare predictions of the same model using ei-
ther a top-of-atmosphere (TOA) Sentinel-2 scene or a bottom-of-atmosphere
(BOA) atmospherically corrected Sentinel-2 scene, to assess the eﬀect of at-
mospheric correction on the model predictions.
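The detection step can be sketched with scikit-image's local-maximum search; the stitching of overlapping patch predictions into a scene-wide `prob_map` is omitted here:

```python
import numpy as np
from skimage.feature import peak_local_max

def detect_debris(prob_map: np.ndarray, threshold: float) -> np.ndarray:
    """Turn a scene-wide probability map into point detections: every local
    maximum above the calibrated threshold counts as one detection, with a
    minimum distance of 3 px between detections. Returns (N, 2) coordinates."""
    return peak_local_max(prob_map, min_distance=3, threshold_abs=threshold)
```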
We show both images alongside the locations of detections (scatter points)
in Figs. 8a and 8b, respectively.

Figure 8: Analysis of confusions of detections in atmospherically corrected bottom-of-
atmosphere (BOA) and uncorrected top-of-atmosphere (TOA) Sentinel-2 imagery of the
Durban scene. Panels: (a) the TOA scene, (b) the BOA scene, and (c) the number and
confusions of detections, with the classes marine debris (the correct class), transparent
haze (t.hz.), dense haze (d.hz.), cumulus clouds (clouds), ships, land, coastline (coast),
and water. In panels (a) and (b), detections are colored according to the classes of panel (c).

The red scatter points indicate correctly detected marine debris. Points of
other colors indicate false positives with the
classes transparent haze (t.hz.), dense haze (d.hz.), cumulus clouds (clouds),
ships, land, coastline (coast), and water, alongside marine debris (debris).
Figure 8c further shows a quantitative summary of the confusion between
classes. We generally see a comparable number of marine debris detected
at both BOA (136 detections) and TOA (164 detections) processing levels.
This shows that the classiﬁer is sensitive to marine debris in both top-of-
atmosphere and bottom-of-atmosphere satellite imagery. However, predic-
tions based on top-of-atmosphere data had more false positive predictions
leading to a lower precision. This is especially visible in the transparent
(t.hz.) and dense haze (d.hz.) categories, as well as in water, as shown in
the bar plot of Fig. 8c. Overall (not shown in the figure), 609 objects
were detected as marine debris in the bottom-of-atmosphere (BOA) scene
and 1484 in the top-of-atmosphere scene. For comparison, the Unet
trained only on the FloatingObjects dataset of Mifdal et al. (2021) detected
20 830 objects in the BOA scene and 33 665 at TOA processing level, which is
more than one order of magnitude more false positive predictions compared
to the Unet++ shown in Fig. 8. This further demonstrates the impor-
tance of compiling larger and more precise training datasets with a rich pool
of negative examples that account for objects easily confused with marine
debris. It demonstrates the current limitations and general diﬃculty of de-
tecting marine debris automatically on Sentinel-2 imagery with the current
technology. The extreme imbalance between a very low number of marine de-
bris pixels (if any) and everything else visible in the Sentinel-2 scene poses a
severe challenge to the automated detection of marine debris. Overall in this
experiment, only 6448 of 11 997 846 pixels were annotated as marine debris,
which represents a coverage of only 0.05%. In this circumstance, identifying
fewer than 1000 potential objects in a 31 km by 38 km scene is an achievement
and allows these detections to be validated visually with limited manual effort in
practice. This workload can be further reduced by additional targeted post-processing
by masking clouds, land, and shoreline explicitly, which we consider outside
of the scope of this work.
4.3. Transferability to PlanetScope Resolution
In this ﬁnal experiment, we test how well the Unet++ model trained on
Sentinel-2 imagery can predict on PlanetScope without being ﬁne-tuned on
PlanetScope imagery speciﬁcally. For this experiment, we had to downsam-
ple the PlanetScope imagery from 3 m to 5 m, as the resolution gap between
the trained 10 m resolution and the full 3 m PlanetScope imagery was too large.

Figure 9: A four-channel RGB+NIR model trained on Sentinel-2 imagery can classify
marine debris in 5 m × 5 m downsampled PlanetScope images. We showcase two use
cases: (a) a double acquisition of Sentinel-2 and PlanetScope over sargassum patches in
Accra (2018-10-30), with a 4 minutes 32 seconds time difference, shows the drift direction
of the patches; (b) daily PlanetScope imagery fills the observation gaps of Sentinel-2
(every 5 days) for the Plastic Litter Project 2022 (Island of Lesbos, Greece; Topouzelis
et al., 2019), where marine plastic and wooden targets were deployed.

On
the original resolution, the model created artifacts in the predictions, which
disappeared at downsampled 5 m PlanetScope imagery. For the Sentinel-
2 image, we use the same model with 12 input channels as in the previ-
ous experiments. For the 4-channel PlanetScope imagery, we re-trained the
Unet++ model on the identical Sentinel-2 training data but removed all
spectral bands except B2, B3, B4, and B8 for RGB+NIR. This 4-channel
model achieves a slightly lower validation accuracy (0.01 to 0.03 in f-score)
than the 12-channel model. This slight decrease in accuracy also indicates
that the four high-resolution 10 m bands are the most informative for ma-
rine debris detection, which is reasonable given the small size of debris and
previous literature (Biermann et al., 2020).
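Selecting the four shared bands from the 12-band stack is a one-line operation; the band ordering below (all Sentinel-2 bands except B10) is an assumption for illustration:

```python
# assumed 12-band order: B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12
RGBNIR = [1, 2, 3, 7]   # indices of B2 (blue), B3 (green), B4 (red), B8 (NIR)

def to_rgbnir(batch):
    """Reduce a (N, 12, H, W) Sentinel-2 batch to the four 10 m bands shared
    with PlanetScope (RGB+NIR) for re-training the 4-channel model."""
    return batch[:, RGBNIR]
```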
We consider two use cases in Fig. 9, where PlanetScope imagery complements Sentinel-2:
•First, double acquisitions of Sentinel-2 and PlanetScope during the
same day can be used to determine the short-term surface drift direction
of the debris. Fig. 9a shows one PlanetScope image with a corresponding Sentinel-2
image over Accra, Ghana, on 30th of October 2018, with four minutes
and 32 seconds time diﬀerence. Both models detected marine debris,
as visible in the probability map.
•Second, daily PlanetScope imagery can be used to gap-ﬁll the periods
in which the less frequent Sentinel-2 imagery is unavailable. This is demon-
strated in Fig. 9b, where the deployed targets from the Plastic Litter
Project 2022 are predicted from Sentinel-2 and PlanetScope imagery
with the Unet++ model. The Sentinel-2 images are available only
on July 16th and 21st. Daily PlanetScope imagery can ﬁll this tempo-
ral gap and enable continuous monitoring of the deployed targets at
a higher spatial-, but lower spectral resolution. We can see that the
4-channel model successfully predicts marine debris for the rectangu-
lar 5 m × 5 m inflatable PVC target deployed during the Plastic Litter
Project. The two circular (7 m diameter) HDPE-mesh targets are not detected.
With these two examples, we emphasize that the Unet++ model
in our Marine Debris Detector, trained on Sentinel-2 imagery, worked with
PlanetScope images without explicitly having seen annotated PlanetScope
imagery. This highlights the broader applicability of the Unet++ model
across satellite modalities and the synergy between the PlanetScope and
Sentinel-2 satellite constellations for marine debris detection.
5. Discussion and Conclusion
This work presented and evaluated a training strategy including a dataset,
targeted negative sampling and a segmentation model to automatically iden-
tify marine debris of human or natural origins with readily available Sentinel-
2 imagery. Our main contribution is the aggregation and harmonization of all
annotated Sentinel-2 data for marine debris detection available today. We de-
signed a sampling rule to gather a large number of diverse negative examples
and a reﬁnement module to automatically improve hand-annotations present
in current datasets, which yields a combined training dataset in which deep
learning models achieve the best results across diﬀerent model architectures.
The model performances were compared quantitatively and qualitatively on
evaluation scenes where the visible marine debris in these scenes is highly
likely to contain plastic pollutants. The performance improvements observed
are consistent across datasets and model settings. They highlight the impor-
tance of designing good datasets for the task at hand and prove the necessity
of collecting, aggregating, and further refining globally distributed datasets of marine
debris in future research.
Role of atmospheric correction. Atmospheric correction with Sen2Cor
has proven beneficial in reducing the number of false positives and
improving precision. Still, the detector remained sensitive to marine debris
also with top-of-atmosphere data. In this work, we trained the detector with
Sentinel-2 images at both top-of-atmosphere (L1C) and bottom-of-atmosphere
(L2A, produced with the Sen2Cor algorithm) processing levels to ensure that
the final model is capable of detecting marine debris from Sentinel-2
imagery at either level. We believe that reliably detecting marine debris
from available satellite data is within reach with more annotation and
targeted post-processing, such as automatic masking of clouds, land, and
shoreline (sketched below), which we considered beyond the scope of this
work. Moreover, atmospheric correction specific to coastal and aquatic
environments, as with the ACOLITE algorithm (Vanhellemont and Ruddick,
2016), is likely to improve the detection accuracy further.
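A minimal sketch of such masking, assuming an L2A product and using the Sen2Cor scene classification layer (SCL), in which class 6 denotes water; the array names are hypothetical.

import numpy as np

SCL_WATER = 6  # Sen2Cor scene classification code for water pixels

scl = np.random.randint(0, 12, size=(512, 512))  # stand-in SCL raster
debris_prob = np.random.rand(512, 512)           # stand-in model output

# Suppress detections on land, clouds, and other non-water classes.
masked_prob = np.where(scl == SCL_WATER, debris_prob, 0.0)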
Marine debris as proxy for marine litter. The detection of marine
debris remains a proxy objective targeted toward the long-term goal of
enabling continuous monitoring of marine litter, including plastics and
other anthropogenic pollutants, from medium-resolution satellite data. Here,
automatically establishing the link between detected marine debris and
marine pollution is a key question to be addressed in the future. Similar to
related work (Biermann et al., 2020), we analyzed social media (Durban
scene) and in-situ studies (Accra scene) on a case-by-case basis to deduce
that marine plastics are present in the marine debris visible in the
satellite scenes. Automating this connection remains a challenge that may
require integrating in-situ knowledge (citizen science or river monitoring)
or a targeted acquisition and analysis of high-resolution imagery. Studies
(Cózar et al., 2021; Ruiz et al., 2020) have demonstrated through on-site
ship-based collection that plastics are present in marine debris. This
establishes that marine debris detection is a suitable, yet rough, proxy for
plastic pollution mapping. Ongoing research (Hu, 2021; Hu et al., 2022;
Ciappa, 2021) in this field demonstrates that distinguishing anthropogenic
marine litter from natural types of debris using spectral features alone is
possible, but remains challenging and is largely unsolved today. Our work
concentrated on the prior step of automating the detection of generic marine
debris at a large scale, largely based on geometric shape, which can be seen
as a first step preceding the aforementioned distinction of litter types.
Relevance for Algae and Sargassum Detection. While the evaluation
datasets in our work aimed to measure the detector's sensitivity to marine
litter, we observe that the model is also sensitive to floating algae
patches and sargassum. This sensitivity is inherently connected to the
annotations in the training dataset, which were made by visually inspecting
the Floating Debris Index (Biermann et al., 2020) that is derived from the
Floating Algae Index (Hu, 2009). Hence, adapting the training framework
presented in this work, initializing from our model weights, and fine-tuning
towards detecting patches of algae and sargassum would be an interesting
follow-up in an active research field (Wang and Hu, 2021; Cuevas et al.,
2018).
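For reference, the Floating Debris Index, as we read its definition in Biermann et al. (2020), subtracts from the NIR band a baseline interpolated between the red-edge and SWIR bands; a sketch with nominal Sentinel-2 central wavelengths:

def fdi(b4, b6, b8, b11):
    # Floating Debris Index: NIR reflectance (B8) minus a baseline NIR
    # interpolated between red-edge 2 (B6) and SWIR1 (B11), following
    # our reading of Biermann et al. (2020); wavelengths in nanometres.
    l_red, l_nir, l_swir = 665.0, 833.0, 1610.4
    nir_baseline = b6 + (b11 - b6) * (l_nir - l_red) / (l_swir - l_red) * 10.0
    return b8 - nir_baseline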
Transfer to other satellite products. The synergy of Sentinel-2 with
daily available PlanetScope (or other high-resolution imagery) is particularly
suitable for further analysis of detected debris and establishing a connection
to marine litter. Large-scale monitoring with commercial high-resolution im-
agery may be infeasible due to the high image acquisition costs. However,
selecting a few images with PlanetScope in locations where a Sentinel-2 detec-
tor has identiﬁed potential marine debris appears feasible. We explored this
transferability in Section 4.3 where a model trained on 4-channel Sentinel-2
imagery was still sensitive to marine debris in (downsampled) PlanetScope
data. Targeted model training on annotated PlanetScope data will likely
improve this performance further, which we leave for future work.
Spatial and spectral features. A further direction to be explored is the
heterogeneous composition of objects in marine debris, which varies
depending on circumstances (e.g., the flood event in Durban) or the general
pollution of the area (Accra scene). This heterogeneity in spectral response
further emphasizes the importance and descriptiveness of the shape and
geometry of marine debris, which often forms elongated lines due to oceanic
processes, such as windrows and waterfronts. Further, the geometry of
objects is also a suitable descriptor to exclude a variety of negatives,
such as ships, clouds, coastlines, and wakes, which can have spectral
responses similar to marine debris (e.g., a high FDI) but are
distinguishable from it by spatial context, as the sketch below illustrates.
In particular, convolutional neural networks are well suited to learn these
patterns in their filter banks if they are trained on large annotated
datasets with a diverse set of negative examples.
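As an illustration of shape as a negative filter (a hedged sketch, not part of our detector: the threshold and eccentricity cut-off are arbitrary choices):

import numpy as np
from skimage.measure import label, regionprops

prob = np.random.rand(256, 256)   # stand-in for a model probability map
components = label(prob > 0.5)    # connected components of detections

# Keep only elongated components: eccentricity is ~0 for circular blobs
# (e.g., round clouds) and approaches 1 for line-shaped windrows.
elongated = np.zeros(components.shape, dtype=bool)
for region in regionprops(components):
    if region.eccentricity > 0.95:
        elongated[components == region.label] = True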
Remote sensing combined with current machine learning frameworks has
the potential to become an eﬃcient and reliable tool to monitor large marine
areas (Hanke et al., 2013). Still, the data quality used to learn detection
models is paramount. We are conﬁdent that automated detection of marine
debris with satellite remote sensing imagery will provide a repeatable low-
cost technology to detect and quantify the level of marine pollution on our
planet. Automated detection and quantiﬁcation will be necessary to inform
clean-up operations and to measure the effect of local policy decisions.
Identifying and quantifying pollution hotspots and addressing their drivers
and sources are crucial to creating a cleaner environment for plant, animal,
and human life in a
sustainable future. Still, further eﬀorts are needed in data collection and on-
site validation to build models that can reliably estimate the level of marine
pollution from readily available satellite data in a completely automated
way. In this research, we made a step toward automated satellite-based
monitoring of marine pollution by detecting marine debris in coastal waters
and providing model weights and training scripts in a dedicated package2.
2The source code and data: https://github.com/MarcCoru/marinedebrisdetector
We hope this work helps accelerate the progress toward large-scale marine
litter monitoring within the canon of trans-disciplinary machine learning,
remote sensing, and marine science research.
Andrady, A.L., 2011. Microplastics in the marine environment. Marine
pollution bulletin 62, 1596–1605.
Arias, M., Sumerot, R., Delaney, J., Coulibaly, F., Cozar, A., Aliani, S.,
Suaria, G., Papadopoulou, T., Corradi, P., 2021. Advances on remote
sensing of windrows as proxies for marine litter based on Sentinel-2/MSI
datasets, in: 2021 IEEE International Geoscience and Remote Sensing
Symposium IGARSS, IEEE. pp. 1126–1129.
Beaumont, N.J., Aanesen, M., Austen, M.C., Börger, T., Clark, J.R., Cole,
M., Hooper, T., Lindeque, P.K., Pascoe, C., Wyles, K.J., 2019. Global
ecological, social and economic impacts of marine plastic. Marine pollution
bulletin 142, 189–195.
Bessa, F., Ratcliﬀe, N., Otero, V., Sobral, P., Marques, J.C., Waluda, C.M.,
Trathan, P.N., Xavier, J.C., 2019. Microplastics in gentoo penguins from
the antarctic region. Scientiﬁc reports 9, 1–7.
Biermann, L., Clewley, D., Martinez-Vicente, V., Topouzelis, K., 2020. Find-
ing plastic patches in coastal waters using optical satellite data. Scientiﬁc
reports 10, 1–10.
Booth, H., Ma, W., Karakus, O., 2022. High-precision density mapping of
marine debris and floating plastics via satellite imagery. arXiv preprint.
Borrelle, S.B., Ringma, J., Law, K.L., Monnahan, C.C., Lebreton, L., Mc-
Givern, A., Murphy, E., Jambeck, J., Leonard, G.H., Hilleary, M.A., Erik-
sen, M., Possingham, H.P., De Frond, H., Gerber, L.R., Polidoro, B., Tahir,
A., Bernard, M., Mallos, N., Barnes, M., Rochman, C.M., 2020. Predicted
growth in plastic waste exceeds efforts to mitigate plastic pollution. Science.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classiﬁcation
and regression trees. Routledge.
Chapron, L., Peru, E., Engler, A., Ghiglione, J., Meistertzheim, A., Pruski,
A., Purser, A., Vétion, G., Galand, P., Lartaud, F., 2018. Macro- and
microplastics aﬀect cold-water corals growth, feeding and behaviour. Sci-
entiﬁc reports 8, 1–8.
Ciappa, A.C., 2021. Marine plastic litter detection offshore hawai’i by
sentinel-2. Marine Pollution Bulletin 168, 112457.
Ciappa, A.C., 2022. Marine litter detection by sentinel-2: A case study in
north adriatic (summer 2020). Remote Sensing 14.
Ciocarlan, A., Stoian, A., 2021. Ship detection in sentinel 2 multi-spectral
images with self-supervised learning. Remote Sensing 13, 4255.
Cuevas, E., Uribe-Martínez, A., Liceaga-Correa, M.d.l.Á., 2018. A satellite
remote-sensing multi-index approach to discriminate pelagic sargassum in
the waters of the yucatan peninsula, mexico. International Journal of
Remote Sensing 39, 3608–3627.
Cózar, A., Aliani, S., Basurko, O.C., Arias, M., Isobe, A., Topouzelis, K.,
Rubio, A., Morales-Caselles, C., 2021. Marine litter windrows: A strategic
target to understand and manage the ocean plastic pollution. Frontiers in
Marine Science 8. doi:10.3389/fmars.2021.571796.
Davaasuren, N., Marino, A., Boardman, C., Alparone, M., Nunziata, F., Ack-
ermann, N., Hajnsek, I., 2018. Detecting microplastics pollution in world
oceans using sar remote sensing, in: 2018 IEEE International Geoscience
and Remote Sensing Symposium IGARSS, pp. 938–941.
van Emmerik, T., Schwarz, A., 2020. Plastic debris in rivers. Wiley Inter-
disciplinary Reviews: Water 7, e1398.
Eriksen, M., Lebreton, L.C., Carson, H.S., Thiel, M., Moore, C.J., Borerro,
J.C., Galgani, F., Ryan, P.G., Reisser, J., 2014. Plastic pollution in the
world’s oceans: more than 5 trillion plastic pieces weighing over 250,000
tons aﬂoat at sea. PloS one 9, e111913.
Escobar-Sánchez, G., Markfort, G., Berghald, M., Ritzenhofen, L., Sch-
ernewski, G., 2022. Aerial and underwater drones for marine litter mon-
itoring in shallow coastal waters: factors inﬂuencing item detection and
cost-eﬃciency. Environmental monitoring and assessment 194, 1–28.
Faure, F., Corbaz, M., Baecher, H., de Alencastro, L.F., 2012. Pollution due
to plastics and microplastics in lake geneva and in the mediterranean sea.
Archives des Sciences 65, 157–164.
Goddijn-Murphy, L., Williamson, B.J., McIlvenny, J., Corradi, P., 2022.
Using a uav thermal infrared camera for monitoring ﬂoating marine plastic
litter. Remote Sensing 14, 3179.
A.S., Scandolo, L., Eisemann, E., 2022. A learning approach for river
debris detection. International Journal of Applied Earth Observation and
Geoinformation 107, 102682.
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., Moore,
R., 2017. Google earth engine: Planetary-scale geospatial analysis for
everyone. Remote Sensing of Environment 202, 18–27.
Grady, L., 2006. Random walks for image segmentation. IEEE transactions
on pattern analysis and machine intelligence 28, 1768–1783.
Hanke, G., Galgani, F., Werner, S., Oosterbaan, L., Nilsson, P., Fleet, D.,
Kinsey, S., Thompson, R., Palatinus, A., Van Franeker, J., et al., 2013.
Guidance on monitoring of marine litter in european seas: a guidance doc-
ument within the common implementation strategy for the marine strategy
framework directive.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image
recognition, in: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 770–778.
Hidalgo-Ruz, V., Thiel, M., 2015. The contribution of citizen scientists to
the monitoring of marine litter. Marine Anthropogenic Litter 16, 429–447.
Hu, C., 2009. A novel ocean color index to detect ﬂoating algae in the global
oceans. Remote Sensing of Environment 113, 2118–2129.
Hu, C., 2021. Remote detection of marine debris using satel-
lite observations in the visible and near infrared spectral range:
Challenges and potentials. Remote Sensing of Environment 259,
112414.
Hu, C., 2022. Remote detection of marine debris using sentinel-2 imagery: A
cautious note on spectral interpretations. Marine Pollution Bulletin 183,
114082.
Hu, C., Qi, L., Xie, Y., Zhang, S., Barnes, B.B., 2022. Spectral characteristics
of sea snot reﬂectance observed from satellites: Implications for remote
sensing of marine debris. Remote Sensing of Environment 269, 112842.
Kershaw, P., Turra, A., Galgani, F., et al., 2019. Guidelines for
the monitoring and assessment of plastic litter and microplas-
tics in the ocean.
Kikaki, K., Kakogeorgiou, I., Mikeli, P., Raitsos, D.E., Karantzalos, K., 2022.
Marida: A benchmark for marine debris detection from sentinel-2 remote
sensing data. PloS one 17, e0262247.
Kuebler, M., 2021. Turkey’s ’sea snot’ is part of a growing environmental
threat. URL: https://p.dw.com/p/3uZSb.
Main-Knorn, M., Pflug, B., Louis, J., Debaecker, V., Müller-Wilm, U., Gascon,
F., 2017. Sen2cor for sentinel-2, in: Image and Signal Processing for
Remote Sensing XXIII, SPIE. pp. 37–48.
Mifdal, J., Longépé, N., Rußwurm, M., 2021. Towards detecting floating
objects on a global scale with learned spatial features using sentinel 2.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information
Sciences V-3-2021, 285–293.
Otsu, N., 1979. A threshold selection method from gray-level histograms.
IEEE transactions on systems, man, and cybernetics 9, 62–66.
Papageorgiou, D., Topouzelis, K., Suaria, G., Aliani, S., Corradi, P., 2022.
Sentinel-2 detection of ﬂoating marine litter targets with partial spectral
unmixing and spectral comparison with other floating materials (plastic
litter project 2021). Under review.
Pinto, R., Barendse, T., van Emmerik, T., van der Ploeg, M., Annor, F.,
Duah, K., Udo, J., Uijlenhoet, R., 2023. Exploring plastic transport dy-
namics in the odaw river, ghana. Frontiers in Environmental Science 11.
Politikos, D.V., Adamopoulou, A., Petasis, G., Galgani, F., 2023. Using
artiﬁcial intelligence to support marine macrolitter research: A content
analysis and an online database. Ocean & Coastal Management 233,
106466.
Rees, G., Pond, K., 1995. Marine litter monitoring programmes—a review
of methods with special reference to national surveys. Marine Pollution
Bulletin 30, 103–108.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks
for biomedical image segmentation, in: International Conference on Med-
ical Image Computing and Computer-Assisted Intervention, pp. 234–241.
Ruiz, I., Basurko, O.C., Rubio, A., Delpey, M., Granado, I., Declerck, A.,
Mader, J., Cózar, A., 2020. Litter windrows in the south-east coast of the
Bay of Biscay: an ocean process enabling eﬀective active ﬁshing for litter.
Frontiers in marine science 7, 308.
Salgado-Hernanz, P.M., Bauzà, J., Alomar, C., Compa, M., Romero, L.,
Deudero, S., 2021. Assessment of marine litter through remote sensing:
recent approaches and future goals. Marine Pollution Bulletin 168, 112347.
Schwabl, P., Köppel, S., Königshofer, P., Bucsics, T., Trauner, M., Reiberger,
T., Liebmann, B., 2019. Detection of various microplastics in human stool:
a prospective case series. Annals of Internal Medicine 171, 453–457.
Shah, A., Lillianne, T., Manil, M., 2021. Marine debris dataset for object de-
tection in planetscope imagery. URL: https://doi.org/10.34911/rdnt.
Sun, Y., Bakker, T., Ruf, C., Pan, Y., 2023. Eﬀects of microplastics and
surfactants on surface roughness of water waves. Scientific Reports 13.
Themistocleous, K., Papoutsa, C., Michaelides, S., Hadjimitsis, D., 2020.
Investigating detection of ﬂoating plastic litter from space using sentinel-2
imagery. Remote Sensing 12, 2648.
Topouzelis, K., Papageorgiou, D., Karagaitanakis, A., Papakonstantinou, A.,
Arias Ballesteros, M., 2020a. Remote sensing of sea surface artiﬁcial ﬂoat-
ing plastic targets with sentinel-2 and unmanned aerial systems (Plastic
Litter Project 2019). Remote Sensing 12, 2013.
Topouzelis, K., Papageorgiou, D., Karagaitanakis, A., Papakonstantinou, A.,
Arias Ballesteros, M., 2020b. Remote sensing of sea surface artiﬁcial ﬂoat-
ing plastic targets with sentinel-2 and unmanned aerial systems (plastic
litter project 2019). Remote Sensing 12, 2013.
Topouzelis, K., Papageorgiou, D., Suaria, G., Aliani, S., 2021. Float-
ing marine litter detection algorithms and techniques using optical
remote sensing data: A review. Marine Pollution Bulletin 170,
112675.
Topouzelis, K., Papakonstantinou, A., Garaba, S.P., 2019. Detection of
ﬂoating plastics from satellite and unmanned aerial systems (plastic lit-
ter project 2018). International Journal of Applied Earth Observation and
Geoinformation 79, 175–183.
United Nations Environment Programme, 2009. United nations environment
programme - annual report 2009: Seizing the green opportunity.
Van Cauwenberghe, L., Vanreusel, A., Mees, J., Janssen, C.R., 2013. Mi-
croplastic pollution in deep-sea sediments. Environmental Pollution 182.
Van Dyck, I.P., Nunoo, F.K., Lawson, E.T., 2016. An empirical assessment
of marine debris, seawater quality and littering in Ghana. Journal of Geo-
science and Environment Protection 4, 21–36.
Van Emmerik, T., Tramoy, R., Van Calcar, C., Alligant, S., Treilles, R.,
Tassin, B., Gasperi, J., 2019. Seine plastic debris transport tenfolded
during increased river discharge. Frontiers in Marine Science 6, 642.
Vanhellemont, Q., Ruddick, K., 2016. Acolite for sentinel-2: Aquatic appli-
cations of msi imagery, in: Proceedings of the 2016 ESA Living Planet
Symposium, Prague, Czech Republic, pp. 9–13.
Vanhellemont, Q., Ruddick, K., 2018. Atmospheric correction of metre-scale
optical satellite data for inland and coastal water applications. Remote
Sensing of Environment 216, 586–597.
Wang, M., Hu, C., 2021. Satellite remote sensing of pelagic sargassum
macroalgae: The power of high resolution and deep learning. Remote
Sensing of Environment 264, 112631.
Whang, S.E., Roh, Y., Song, H., Lee, J.G., 2023. Data collection and quality
challenges in deep learning: A data-centric ai perspective. The VLDB
Journal, 1–23.
Wolf, M., van den Berg, K., Garaba, S.P., Gnann, N., Sattler, K., Stahl, F.,
Zielinski, O., 2020. Machine learning for aquatic plastic litter detection,
classiﬁcation and quantiﬁcation (aplastic-q). Environmental Research Let-
ters 15, 114042.
Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J., 2018. UNet++:
A nested U-Net architecture for medical image segmentation, in: Deep
learning in medical image analysis and multimodal learning for clinical
decision support. Springer, pp. 3–11.
Zhu, Y., Newsam, S., 2017. Densenet for dense ﬂow, in: 2017 IEEE Interna-
tional Conference on Image Processing (ICIP), IEEE. pp. 790–794.