
Multi-Scale Approach for Predicting Fish Species Distributions across Coral Reef Seascapes


Simon J. Pittman*, Kerry A. Brown
1 Biogeography Branch, Center for Coastal Monitoring and Assessment, National Oceanic and Atmospheric Administration (NOAA), Silver Spring, Maryland, United States of America
2 Marine Science Center, University of the Virgin Islands, St. Thomas, United States Virgin Islands
3 School of Geography, Geology, and the Environment, Kingston University London, Kingston-Upon-Thames, United Kingdom
Abstract
Two of the major limitations to effective management of coral reef ecosystems are a lack of information on the spatial distribution of marine species and a paucity of data on the interacting environmental variables that drive distributional patterns. Advances in marine remote sensing, together with the novel integration of landscape ecology and advanced niche modelling techniques, provide an unprecedented opportunity to reliably model and map marine species distributions across many kilometres of coral reef ecosystems. We developed a multi-scale approach using three-dimensional seafloor morphology and across-shelf location to predict spatial distributions for five common Caribbean fish species. Seascape topography was quantified from high-resolution bathymetry at five spatial scales (5–300 m radii) surrounding fish survey sites. Model performance and map accuracy were assessed for two high-performing machine-learning algorithms: Boosted Regression Trees (BRT) and Maximum Entropy Species Distribution Modelling (MaxEnt). The three most important predictors were geographical location across the shelf (distance to shore and distance to the shelf edge), followed by a measure of topographic complexity (slope of slope). Predictor contribution differed among species, yet rarely changed across spatial scales. BRT provided ‘outstanding’ model predictions (AUC > 0.9) for three of five fish species. MaxEnt provided ‘outstanding’ model predictions for two of five species, with the remaining three models considered ‘excellent’ (AUC = 0.8–0.9). In contrast, MaxEnt spatial predictions were markedly more accurate (92% map accuracy) than BRT (68% map accuracy). We demonstrate that reliable spatial predictions for a range of key fish species can be achieved by modelling the interaction between the geographical location across the shelf and the topographic heterogeneity of seafloor structure. This multi-scale, analytic approach is an important new cost-effective tool to accurately delineate essential fish habitat and support conservation prioritization in marine protected area design, zoning in marine spatial planning, and ecosystem-based fisheries management.
Citation: Pittman SJ, Brown KA (2011) Multi-Scale Approach for Predicting Fish Species Distributions across Coral Reef Seascapes. PLoS ONE 6(5): e20583.
Editor: Brian Gratwicke, Smithsonian’s National Zoological Park, United States of America
Received September 11, 2010; Accepted May 6, 2011; Published May 26, 2011
Copyright: © 2011 Pittman, Brown. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The field monitoring program was funded by National Oceanic and Atmospheric Administration Coral Reef Conservation Program (http://coralreef. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail:
Introduction
Rapid progress is being made in the development and implementation of marine management strategies, including marine spatial planning, to balance multiple conservation and resource use objectives [1,2,3]. Although a shift towards managing ecosystem patterns and processes is occurring, for example in ecosystem-based management, most strategies still have a focal species component, with directives to manage and monitor specific endangered, threatened, invasive, economically valuable, rare, keystone or indicator species [4,5,6]. To be effective, these strategies require spatially accurate ecological information on the geographical distribution of species, as well as an understanding of the key environmental drivers that determine species distributions. Ecologically meaningful decision-making also requires a better understanding of statistical interactions between environmental drivers and of threshold effects, both of which are rarely modelled explicitly in marine ecology. An ecological threshold is the point at which there is an abrupt change in an ecosystem quality, property or phenomenon, or where small changes in an environmental driver produce large responses [7].
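As an illustration of this definition, a threshold can be located on a fitted response curve (for example, a partial dependence curve from a species distribution model) as the interval where the response changes fastest per unit change in the driver. The sketch below is a minimal version of that idea, assuming the curve is already sampled at increasing predictor values; the function name `steepest_change` and the example curve are hypothetical and are not taken from this study.

```python
import numpy as np

def steepest_change(x, y):
    """Midpoint of the interval where the response y changes fastest per unit of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = np.diff(y) / np.diff(x)      # finite-difference slope of each interval
    i = int(np.argmax(np.abs(slopes)))    # steepest interval (largest absolute slope)
    return 0.5 * (x[i] + x[i + 1]), float(slopes[i])

# Hypothetical response curve: low occurrence, an abrupt jump near 2000 m, then a plateau.
dist = np.arange(0, 5001, 250)
resp = np.where(dist < 2000, 0.1, 0.6)
x_thr, s = steepest_change(dist, resp)
```

A largest-slope rule is only one simple choice; segmented regression or bootstrapped change-point methods would give uncertainty estimates that this sketch does not.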
Tropical coral reef ecosystems typically exist as spatial mosaics of interconnected patches of coral reefs, seagrasses, unvegetated sand and mangroves, and represent one of the most biologically diverse ecosystems on earth, but also one of the most vulnerable to environmental change [8]. The highly heterogeneous spatial patterning of patch types, each of which exhibits different structural attributes, results in seascapes with complex seafloor topography at a range of spatial scales. Fish species distributions and diversity patterns are closely associated with structural characteristics, particularly topographic complexity, both within a patch type [9] and across the seascape [10,11]. However, human activity in the coastal zone combined with hurricanes, disease and thermal stress has resulted in broad-scale loss and degradation of biogenic structure created by reef-forming scleractinian corals, seagrasses and mangroves [12,13,14].
Over the past 20 years, coral reefs of the Caribbean region have experienced a significant decline in coral cover [13], resulting in a ‘flattening’ of topographic complexity [15]. A concurrent decline in the abundance of a wide range of fish species has also occurred, with the greatest declines recorded for herbivorous, invertivorous and carnivorous fish [16]. The decline is likely to have
PLoS ONE | 1 May 2011 | Volume 6 | Issue 5 | e20583
triggered cascading impacts throughout the ecosystem [17], adding fresh impetus to the urgent need to understand broad-scale environmental correlates, such as topographic complexity, that influence species distributions across tropical seascapes [17].
Recent research has demonstrated that individual fish species distributions and fish diversity across coral reef ecosystems can be reliably predicted using maps of seafloor structure or bathymetry [18]. It is likely, however, that over broad spatial scales the relationship is more complex, with other variables interacting with bathymetry to influence the suitability of habitat for an organism. Few studies of habitat suitability have considered the potential statistical interaction between physical structure and relative geographical location across broad spatial scales. A greater understanding of the spatial patterning of species across coral reef ecosystems will provide information on species-environment relationships and spatial proxies for key ecological processes, such as relative grazing or predation intensity or interspecific competition, that can be inferred from maps of fish
Predicting species distributions in the marine environment is problematic due to the limited availability of biological survey data, yet large amounts of marine data are available as species occurrence or presence-only data. Presence-only modelling of species distributions is used extensively for terrestrial species and is increasingly being used for global modelling of fish, seabirds and marine mammals [19]. In addition, comparative studies using multiple algorithms have demonstrated that the choice of algorithm can influence both the predictive accuracy and the relative importance of individual predictor variables [20,21]. Elith et al. [22] compared 16 predictive modelling techniques, including both conventional and machine-learning algorithms, using presence-only data for 226 species from six regions. The authors showed that two state-of-the-art machine-learning algorithms, Boosted Regression Trees (BRT), also referred to as Stochastic Gradient Boosted Regression Trees [23,24], and MaxEnt [25], consistently outperformed other algorithms. Although machine-learning algorithms are becoming more widely applied in terrestrial ecology, few marine applications exist [21,26,27]. For marine fish, Knudby et al. [21] showed that machine-learning algorithms, particularly tree-based ensembles, provided significant increases in performance over more conventional modelling techniques, such as generalised additive models and linear regression.
Both BRT and MaxEnt algorithms can fit complex functions, including interactions between predictor variables, and employ strong regularisation techniques, together with cross-validation, to avoid overfitting. These algorithms have characteristics that make them appropriate for modelling complex fish-seascape relationships, but they have never before been comparatively evaluated for marine species and environments.
We compared and evaluated the performance of two machine-learning algorithms, boosted regression trees (BRT) and maximum entropy modelling (MaxEnt), to model non-linear species-environment relationships for five common fish species associated with topographically complex Caribbean coral reef ecosystems. Environmental data on seafloor structure were acquired from a single remote sensing device, airborne laser altimetry (Light Detection & Ranging or LiDAR), from which derivative spatial predictors were generated to quantify seafloor geomorphology and across-shelf location. Because little was known about the movement patterns of the fish species, a single appropriate spatial scale for measuring functionally meaningful seafloor heterogeneity could not be selected a priori; instead, we used a multi-scale exploratory approach to quantify seascape structure at five spatial scales (5, 25, 50, 100 and 300 metre radii). Although scale-dependency is well demonstrated in marine ecosystems, few species distribution models have incorporated quantitative data on seascape structure across a range of spatial scales. The primary objectives of this study were to: (1) determine whether the influence of environmental predictors on species’ distribution was scale-dependent; (2) evaluate the utility of environmental data from a single remote sensing device, combined with metrics for surface morphology, to predict and map fish species distributions across a complex coral reef ecosystem; (3) determine which components of remotely sensed seafloor structure contribute most to the species distribution models; (4) identify threshold effects where changes in environmental variables abruptly influence species occurrence; and (5) evaluate the performance of two different machine-learning modelling algorithms for spatial predictions of marine fish distributions.
Materials and Methods
Study Area
The coral reef ecosystems of the insular shelf of southwestern Puerto Rico (Fig. 1) exist as a spatial mosaic of habitat types dominated by coral reefs, seagrasses, mangroves and patches of sand. The seafloor is highly heterogeneous in assemblage composition and topographic structure, resulting in a diverse and productive fish community with important ecological, economic and cultural value. In 1979, the La Parguera region (327 km²) was
designated as a Natural Reserve (NR), Reserva Natural La
Parguera, becoming the second marine protected area in Puerto
Rico. The La Parguera NR is managed by the Puerto Rican
Department of Natural and Environmental Resources (DNER)
Bureau of Coastal, Reserves and Refuges (BCRR) as a multiple
use zone. Fishing is allowed throughout the Reserve. Like many
Caribbean coral reef ecosystems, the study area has experienced
environmental changes on land and sea that have resulted in loss
of structural and functional integrity.
Fish surveys
Underwater visual surveys of fish and benthic habitat were conducted semi-annually (Jan/Feb and Sept/Oct) across the insular shelf at La Parguera (322 km²) between 2001 and 2008.
Survey sites (n = 1,018) were selected using a stratified-random
sampling design whereby sites were randomly located within two
mapped strata (i.e., hardbottom and softbottom) derived from
National Oceanic and Atmospheric Administration’s nearshore
benthic habitat map [28]. The sampling strategy provides a
spatially comprehensive and unbiased set of presence records
across a wide range of habitat types. Fish surveys were conducted
within a 25 m long and 4 m wide (100 m²) belt transect deployed along a randomly selected bearing (0–360°). A constant swimming speed was maintained for a fixed duration of fifteen minutes to standardise the search time. All individuals were identified to species level where possible and body lengths (fork length) were visually estimated. To evaluate presence-only modelling algorithms, abundance data for five common species were converted to presence-only data: (1) coney (Cephalopholis fulva) and (2)
red hind (Epinephelus guttatus), both piscivorous groupers; (3) Princess parrotfish (Scarus taeniopterus), an abundant herbivore; (4) Queen triggerfish (Balistes vetula), an invertebrate feeder; and (5) threespot
damselfish (Stegastes planifrons), a specialist damselfish, which is
known to exhibit a strong positive relationship with several coral
species and a preference for topographically complex substrata
[29,30]. The fish species selected for this study minimized the
potentially confounding effect of spatial segregation of life stages.
For example, juvenile and adult S. planifrons and S. taeniopterus co-occurred across the study area, and C. fulva, E. guttatus and B. vetula
Predicting Fish Distributions across Seascapes
were only represented by co-occurring sub-adults and adults. This narrowed the potential niche width and facilitated identification of meaningful environmental predictors. Prevalence varied among species as follows: C. fulva (3% of samples), E. guttatus (4%), S. taeniopterus (23%), B. vetula (5%) and S. planifrons (6%). Fish data
are available online at
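The conversion from abundance to presence records and the prevalence percentages above amount to simple bookkeeping. A minimal sketch, assuming counts are stored as one value per survey site; the counts below are hypothetical, not study data.

```python
import numpy as np

# Hypothetical fish counts for one species at ten survey sites (not study data).
counts = np.array([0, 3, 0, 0, 1, 0, 0, 12, 0, 0])

presence = (counts > 0).astype(int)                   # 1 = seen on the transect, 0 = not seen
presence_sites = np.flatnonzero(presence)             # indices usable as presence-only records
prevalence = 100.0 * presence.sum() / presence.size   # % of surveys in which the species occurred
```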
Spatial predictors
Technological advances in sea-, air- and space-borne remote sensing devices now provide an unprecedented ability to map the seafloor as a continuously varying three-dimensional surface, or bathymetry [31]. New techniques, such as airborne hydrographic LiDAR [32], fire rapid pulses of laser light from an aircraft towards the sea surface and seafloor and then measure the difference in the return times of the two reflections to estimate water depth and hence the vertical height of the seafloor. This technique maps broad areas of shallow-water seascapes (<1 m to approx. 50 m depth) at high spatial resolution (1–16 m²).
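The depth estimate described above follows from the two-way travel time of the pulse through the water column: depth = (c / n) × Δt / 2, where c is the speed of light in vacuum, n the refractive index of seawater and Δt the delay between the surface and seafloor returns. The sketch below is a simplified version, assuming a vertical (nadir) pulse and n ≈ 1.34; operational LiDAR processing also corrects for off-nadir scan angles, refraction at the surface, waves and tides, which are ignored here.

```python
C_VACUUM = 299_792_458.0   # speed of light in vacuum, m/s
N_SEAWATER = 1.34          # assumed refractive index of seawater

def depth_from_returns(t_surface, t_bottom, n_water=N_SEAWATER):
    """Water depth (m) from surface and seafloor return times (s); nadir pulse assumed."""
    two_way_time = t_bottom - t_surface      # time spent travelling down and back up
    speed_in_water = C_VACUUM / n_water      # light travels more slowly in water
    return speed_in_water * two_way_time / 2.0

# A seafloor return arriving about 179 ns after the surface echo corresponds to roughly 20 m.
print(depth_from_returns(0.0, 179e-9))
```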
Bathymetry data were collected for southwestern Puerto Rico between 7th and 15th May 2006 using a LADS (Laser Airborne Depth Sounder) Mk II Airborne System operated by Tenix LADS Incorporated. The laser system was mounted on a DeHavilland Dash 8-200 aircraft flying at survey speeds of 72–90 metres per second and at an altitude of 366–671 metres above the sea surface. A 900 Hertz (1064 nm) Nd:YAG laser acquired spot data at a rate of 900 pulses per second, with swath widths of 192 metres. This provided post-processed spot data with 4×4 m spacing from <1 m depth to approximately 50 m depth. Erroneous outlying LiDAR returns were removed, along with negative values (i.e., land) and mangroves, and a seamless bathymetric surface was exported as a GeoTIFF in ArcGIS 9.2 (Environmental Systems Research Institute, Inc.). LiDAR data are available online
Figure 1. Study area map showing underwater fish survey locations across the La Parguera region of SW Puerto Rico. The underlying data show the 4 m resolution LiDAR-derived bathymetry depicting variability in water depth across the coral reef ecosystems from land to the insular shelf edge in the south.
Quantifying surface morphology
Following Pittman et al. [26], six morphometrics were calculated from the bathymetric surface (mean water depth, aspect, rugosity, slope, slope of the slope and planar curvature, i.e., convexities and concavities of the surface) in order to quantify a range of structural attributes from the benthic terrain of southwestern Puerto Rico. To explore the influence of spatial scale on
predictive performance, the mean morphometric value of the
surrounding seascape was calculated at five spatial scales (5 m,
25 m, 50 m, 100 m and 300 m radius) using a circular moving
window within the focal statistics geoprocessing function of
ArcGIS’s Spatial Analyst (Environmental Systems Research
Institute, Inc.). In addition, spatial predictors representing the
relative geographical location across the shelf were quantified
using a distance to shoreline surface and a distance to shelf edge
surface based on Euclidean ‘straight line’ distance. The environmental predictors span a wide range, from shallow nearshore (<1 m depth) to deeper (max. 49 m) shelf-edge habitat and from very low-relief sandy areas to high-relief coral reefs.
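The predictors above were produced with ArcGIS tools; the NumPy sketch below only approximates them to show the idea. Assumptions: slope uses central differences (`numpy.gradient`) rather than the 3×3 neighbourhood used by ArcGIS, slope of slope is the slope operator applied to the slope grid, the circular focal mean averages every cell whose centre lies within the radius (edge cells use whatever cells are available), and distance to shore is a brute-force Euclidean distance to the nearest shoreline cell. The 4 m cell size matches the bathymetry; all function names are hypothetical.

```python
import numpy as np

CELL = 4.0  # grid cell size in metres (4 m LiDAR bathymetry)

def slope_degrees(z, cell=CELL):
    """Slope in degrees from central differences (ArcGIS uses a 3x3 neighbourhood)."""
    dz_drow, dz_dcol = np.gradient(z, cell)
    return np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))

def slope_of_slope(z, cell=CELL):
    """Topographic complexity: the slope operator applied to the slope grid."""
    return slope_degrees(slope_degrees(z, cell), cell)

def circular_focal_mean(grid, radius, cell=CELL):
    """Mean of cells whose centres lie within `radius` metres; edges use available cells."""
    r = int(radius // cell)
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    kernel = (dx * cell) ** 2 + (dy * cell) ** 2 <= radius ** 2  # circular window
    rows, cols = grid.shape
    out = np.empty(grid.shape)
    for i in range(rows):
        for j in range(cols):
            i0, i1 = max(i - r, 0), min(i + r + 1, rows)
            j0, j1 = max(j - r, 0), min(j + r + 1, cols)
            k = kernel[i0 - i + r:i1 - i + r, j0 - j + r:j1 - j + r]  # clip window at edges
            out[i, j] = grid[i0:i1, j0:j1][k].mean()
    return out

def distance_to(mask, cell=CELL):
    """Euclidean distance (m) from each cell to the nearest True cell, e.g. a shoreline mask."""
    targets = np.argwhere(mask) * cell
    cells = np.argwhere(np.ones(mask.shape, dtype=bool)) * cell
    d2 = ((cells[:, None, :] - targets[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1)).reshape(mask.shape)

# Hypothetical plane rising 0.5 m per metre along the columns.
z = np.tile(np.arange(10) * CELL * 0.5, (10, 1))
complexity_25m = circular_focal_mean(slope_of_slope(z), 25.0)
```

The brute-force loops keep the logic visible but are slow for large grids or the 300 m radius; a production workflow would use an optimised focal filter and distance transform.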
Modelling algorithms
Determination of variable importance and development of predictive models were carried out using Stochastic Gradient Boosting with the Boosted Regression Tree (BRT) code in the gbm package for R [33]. BRT is a machine-learning algorithm that combines many simple decision trees into an ‘ensemble’ to iteratively boost the predictive performance of the final model [24]. Each subsequent regression tree is fitted to the residuals of the previous trees, thereby learning from the errors or ‘unsolved cases’ of its predecessors. The BRT models were fitted using presence-absence data from 1018 surveyed sites and 11 environmental predictor variables. The models were developed and evaluated using ten-fold cross-validation (CV) to determine the optimal combination of learning rate (lr) and tree complexity (tc), which provided the optimal number of trees (nt) by minimizing a loss function (i.e., deviance) [34]. The learning rate controlled the contribution of each tree to the model, and a slow learning rate was used for all species (0.0001–0.001); tree complexity determined the extent to which statistical interactions were fitted; for instance, a tc of two fits a model with two-way interactions. To control for overfitting, BRT uses a regularisation process that shrinks the contribution of individual regression trees, while providing sufficient flexibility to fit complex non-linear relationships. Interaction strength was estimated using the techniques of Elith et al. [30]. The relative contribution of the predictor variables to the final models was determined using the variable importance score, based on the improvements of all splits associated with a given variable across all trees in the model, then rescaled so that the most important variable received a score of 100. Other variables received scores relative to their contribution to the model’s predictive power [34].
Maximum entropy species distribution models were developed with the MaxEnt software (MaxEnt v3.3 beta) [25]. MaxEnt relies on presence-only occurrence records to estimate the probability of occurrence for a species, which can then be used to discriminate suitable from unsuitable areas. MaxEnt estimates the probability distribution of maximum entropy (i.e., the one that is most spread out, or closest to uniform), subject to constraints defined by the environmental variables: the expected value of each variable under the modelled distribution must lie close to its average at locations where the species is known to occur [25]. MaxEnt is thus based on the premise that the unknown probability distribution should have maximum entropy, constrained by the environmental characteristics of the niche. MaxEnt controls overfitting and variable selection using regularisation, which smooths the modelled distribution through a penalised maximum likelihood model that balances model fit against model complexity [35,36]. The regularisation used by MaxEnt allows it to manage correlated variables [35], which is not the case for the BRT models. However, neither modelling algorithm explicitly treats spatial autocorrelation [37,38]. Ten-fold cross-validation was used to assess model performance, and jackknife resampling to measure the importance of each predictor.
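The maximum-entropy principle described above can be illustrated with a deliberately small sketch: a Gibbs distribution q(x) ∝ exp(λ·f(x)) over background cells, with λ fitted by gradient ascent on the log-likelihood of the presence cells, so that the expected feature values under q move towards their averages at presences. This is a simplification of the MaxEnt software, which uses several feature classes (linear, quadratic, hinge and others), L1-type regularisation and its own optimisation algorithm; here only linear features and an optional L2 penalty `beta` are used, and the function names and data are hypothetical.

```python
import numpy as np

def gibbs(bg_feats, lam):
    """q(x) proportional to exp(lam . f(x)) over background cells, computed stably."""
    s = bg_feats @ lam
    w = np.exp(s - s.max())
    return w / w.sum()

def fit_maxent(bg_feats, pres_feats, beta=0.0, step=0.5, n_iter=2000):
    """Fit lambda by gradient ascent; bg_feats is (n_cells, k), pres_feats is (n_presences, k)."""
    target = pres_feats.mean(axis=0)              # empirical feature means at presences
    lam = np.zeros(bg_feats.shape[1])
    for _ in range(n_iter):
        expected = gibbs(bg_feats, lam) @ bg_feats  # feature means under the current model
        # gradient of the (optionally L2-penalised) mean log-likelihood of the presences
        lam += step * (target - expected - 2 * beta * lam)
    return lam, gibbs(bg_feats, lam)

depth = np.arange(50.0)                                # hypothetical background cells
z = ((depth - depth.mean()) / depth.std())[:, None]    # one standardised feature
pres = z[[30, 35, 40, 45]]                             # hypothetical presences in deeper cells
lam, q = fit_maxent(z, pres)
```

With `beta = 0` the fitted expectations match the presence averages; a positive penalty deliberately relaxes that match, which is the smoothing role regularisation plays in the text.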
Receiver operating characteristic (ROC) curves were constructed, and the area under the curve (AUC) was used to compare prediction performance [39]. The AUC is a test statistic that uses presence and absence records to assess model predictive performance across a range of thresholds. Because MaxEnt is a presence-only algorithm, we used the approach of Phillips et al. [25], in which randomly selected pseudo-absences replace observed absences in the ROC AUC calculation. We adopted the interpretation offered by Hosmer and Lemeshow [40], whereby an AUC value of 0.7–0.8 is considered an ‘acceptable’ prediction, 0.8–0.9 ‘excellent’ and >0.9 ‘outstanding’. A value of 0.5 is defined as the predictive ability that could be achieved by chance alone.
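For reference, the AUC equals the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence (or, for MaxEnt, a random background pseudo-absence), which is the Mann–Whitney form computed below. The helper that maps AUC to the Hosmer and Lemeshow categories treats 0.9 itself as ‘excellent’, an assumption because the ranges above share their end points; the function names and scores are hypothetical.

```python
import numpy as np

def roc_auc(presence_scores, absence_scores):
    """AUC as P(presence score > absence score), counting ties as one half."""
    pos = np.asarray(presence_scores, dtype=float)[:, None]
    neg = np.asarray(absence_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

def hosmer_lemeshow_label(auc):
    """Hosmer and Lemeshow categories as used in the text (0.9 itself treated as 'excellent')."""
    if auc > 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"

# Hypothetical predictions at presence sites and at random background (pseudo-absence) cells.
auc = roc_auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2, 0.6])
print(auc, hosmer_lemeshow_label(auc))
```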
Map accuracy was calculated using an independent set of underwater survey data (n = 360) collected with the same technique as the survey data used to build the models. Predicted probability of presence, sometimes referred to as habitat suitability, was mapped to the 4×4 m cells of the predictors, with values scaled between 0 (absence) and 100 (highest probability of presence). Mapped predictions were converted to binary values (>10% probability = suitable habitat; <10% = unsuitable) and quantitatively assessed. Map accuracy was calculated as the percentage of actual species sightings predicted correctly by the predictive map.
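Map accuracy as defined above can be sketched as follows, assuming the suitability map is a 0–100 grid and each independent sighting has already been assigned to a grid cell (row, column); the function name, grid and sightings are hypothetical.

```python
import numpy as np

def map_accuracy(suitability, sighting_cells, threshold=10.0):
    """% of independent sightings that fall in cells predicted as suitable (> threshold)."""
    suitable = np.asarray(suitability) > threshold      # binary habitat map
    rows, cols = np.asarray(sighting_cells).T           # (row, col) of each sighting
    return 100.0 * suitable[rows, cols].mean()

# Hypothetical 0-100 suitability grid and four validation sightings.
grid = np.array([[5, 50, 80],
                 [2, 12, 9],
                 [30, 1, 95]])
sightings = [(0, 1), (1, 1), (2, 2), (1, 2)]
acc = map_accuracy(grid, sightings)
```

Because only sightings are scored, this measure captures omission of occupied habitat but not over-prediction of suitable habitat.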
We used generalized linear mixed models (GLMM) to analyse variation in AUC, since the AUC values were grouped by modelling technique (BRT and MaxEnt), species (five) and spatial scale (five). AUC was included as the response variable (with a Poisson error distribution) and modelling technique was fitted as a fixed effect. Spatial scale, species and the species × scale interaction were fitted as random effects. Models were evaluated by model selection and likelihood ratio tests (LRT).
The GLMM was developed using the glmer function of the lme4
library in the statistical software package R ver. 2.8.1 [41].
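The likelihood ratio tests used here compare nested models through the statistic 2(ℓ_full − ℓ_reduced), referred to a chi-square distribution. A minimal sketch for one added parameter (df = 1 assumed), using the identity that the chi-square survival function with one degree of freedom equals erfc(√(x/2)); in R, `anova()` applied to two nested lme4 fits reports the same kind of test. The function name is hypothetical.

```python
import math

def lrt(loglik_reduced, loglik_full):
    """Likelihood ratio test for one extra parameter (df = 1 assumed)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    p = math.erfc(math.sqrt(stat / 2.0))   # chi-square survival function for df = 1
    return stat, p

stat, p = lrt(-100.0, -98.0792705)          # hypothetical log-likelihoods
```

For example, the statistic χ² = 0.001 reported for modelling technique corresponds to p ≈ 0.97 under this df = 1 assumption.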
Additionally, simple linear regression was used to examine
relationships between fish body length and the spatial scale of
seascape structure that contributed most to models.
Results
Comparison of BRT and MaxEnt models
BRT provided ‘outstanding’ model predictions (AUC > 0.9) for three of five species, with the remaining two considered ‘excellent’ and ‘acceptable’. MaxEnt provided ‘outstanding’ model
predictions for two out of five species with the remaining three
models considered ‘excellent’ (AUC = 0.8–0.9) according to the
criteria of Hosmer and Lemeshow [40] (Table 1). At the species level, BRT and MaxEnt models for C. fulva performed best (BRT AUC = 0.97; MaxEnt AUC = 0.94), followed by BRT models for S. taeniopterus and S. planifrons (AUC = 0.93 and 0.92, respectively). The lowest-performing models were the BRT model for E. guttatus (AUC = 0.74) and the MaxEnt models for S. taeniopterus (mean AUC = 0.84).
The GLMM analysis showed no significant effect of modelling technique on AUC (χ² = 0.001; p > 0.05; LRT); there was also no effect of scale (χ² = 0.002; p > 0.05; LRT), species (χ² = 0.116; p > 0.05; LRT), or the interaction between scale and species (χ² = 0.118; p > 0.05; LRT) on model performance. Additionally, model performance was not significantly (p > 0.05) correlated with species prevalence for BRT or MaxEnt models (r² = 0.07 and 0.24, respectively). For ease of presentation, the following results focus on the best BRT models (Table 2).
Variable contributions and threshold effects
Our findings revealed that the single most influential predictor was geographical location across the shelf, represented by distance to the shelf edge and distance to the shoreline (Fig. 2). Distance to the shelf edge was the primary predictor for the two grouper species, and distance to shore for Princess parrotfish and Queen triggerfish. B. vetula, C. fulva and E. guttatus exhibited similar predictor relationships, whereby species occurrence was predicted to be higher in seascapes that were farthest offshore (Fig. 3). For C. fulva, a threshold effect was evident at approximately 2000 metres from the shelf edge, where species occurrence abruptly increased (Fig. 3). C. fulva and B. vetula responded positively to areas with greater depths (20–25 m). S. taeniopterus also showed a preference for offshore habitat, but with a more gradual pattern of increasing occurrence predicted across the shelf beyond 2000 metres from shore (Fig. 3).
Table 1. Cross-validation AUC values from BRT and MaxEnt with best performing models for each algorithm highlighted in bold.
Species            Scale (m)     BRT AUC         MaxEnt AUC
B. vetula          5             0.846           0.861
B. vetula          25            0.831           0.855
B. vetula          50            0.838           0.854
B. vetula          100           0.852           0.858
B. vetula          300           0.867           0.862
B. vetula          Mean (SE)     0.847 (0.02)    0.858 (0.002)
C. fulva           5             0.952           0.833
C. fulva           25            0.972           0.936
C. fulva           50            0.970           0.940
C. fulva           100           0.973           0.937
C. fulva           300           0.962           0.939
C. fulva           Mean (SE)     0.966 (0.01)    0.917 (0.02)
E. guttatus        5             0.771           0.862
E. guttatus        25            0.774           0.848
E. guttatus        50            0.749           0.854
E. guttatus        100           0.759           0.848
E. guttatus        300           0.770           0.847
E. guttatus        Mean (SE)     0.765 (0.02)    0.851 (0.003)
S. planifrons      5             0.916           0.886
S. planifrons      25            0.925           0.892
S. planifrons      50            0.920           0.900
S. planifrons      100           0.908           0.901
S. planifrons      300           0.894           0.891
S. planifrons      Mean (SE)     0.913 (0.01)    0.894 (0.003)
S. taeniopterus    5             0.911           0.819
S. taeniopterus    25            0.928           0.834
S. taeniopterus    50            0.932           0.848
S. taeniopterus    100           0.931           0.848
S. taeniopterus    300           0.928           0.851
S. taeniopterus    Mean (SE)     0.926 (0.009)   0.840 (0.006)
Total model mean                 0.883           0.872
The models are for Balistes vetula (Queen triggerfish), Cephalopholis fulva (coney), Epinephelus guttatus (red hind), Stegastes planifrons (threespot damselfish) and Scarus taeniopterus (Princess parrotfish). The highest AUC for each modelling technique is shown in bold.
Table 2. Optimal settings and predictive performance for Boosted Regression Tree models.
Species            Scale (m)   No. of trees (nt)   Learning rate (lr)   tc   Deviance   Deviance SE
B. vetula          5           4150                0.0005               5    0.33       0.014
B. vetula          25          2250                0.0009               5    0.326      0.015
B. vetula          50          2800                0.0008               5    0.326      0.015
B. vetula          100         3700                0.0008               4    0.321      0.020
B. vetula          300         6250                0.0004               5    0.308      0.017
C. fulva           5           3000                0.001                3    0.162      0.021
C. fulva           25          3700                0.0009               5    0.163      0.016
C. fulva           50          6900                0.0004               5    0.164      0.013
C. fulva           100         5500                0.0006               5    0.158      0.017
C. fulva           300         4700                0.0006               5    0.158      0.015
E. guttatus        5           3500                0.0003               4    0.297      0.008
E. guttatus        25          3400                0.0003               5    0.296      0.007
E. guttatus        50          4450                0.0002               5    0.301      0.008
E. guttatus        100         8350                0.0001               5    0.297      0.005
E. guttatus        300         9050                0.0001               5    0.293      0.008
S. planifrons      5           8050                0.0009               4    0.502      0.029
S. planifrons      25          8950                0.0007               5    0.461      0.015
S. planifrons      50          7450                0.0007               5    0.478      0.039
S. planifrons      100         6350                0.0007               4    0.509      0.040
S. planifrons      300         7700                0.0006               5    0.548      0.028
S. taeniopterus    5           4800                0.0009               5    0.592      0.026
S. taeniopterus    25          4750                0.0009               4    0.567      0.038
S. taeniopterus    50          6050                0.0008               4    0.549      0.021
S. taeniopterus    100         5650                0.0008               4    0.555      0.028
S. taeniopterus    300         6200                0.0008               4    0.552      0.029
The models are for Balistes vetula (Queen triggerfish), Cephalopholis fulva (coney), Epinephelus guttatus (red hind), Stegastes planifrons (threespot damselfish) and Scarus taeniopterus (Princess parrotfish). The bag fraction is 0.50 for all models unless indicated differently.
Of the morphometrics, topographic complexity (i.e., slope of slope) was most influential in determining occurrence of S. planifrons, S. taeniopterus and E. guttatus. Slope of slope was the
primary predictor for S. planifrons with a strong interaction with
distance to the shelf edge. More specifically, S. planifrons
occurrence was predicted to be highest in high complexity areas
between 4,000 m and 7,500 m from the shelf edge and in depths
shallower than 15 m. The spatial prediction of probability of
presence, or habitat suitability, revealed a high density of highly suitable habitat for S. planifrons over shallow-water aggregated
patch reefs with high topographic complexity and along the
landward slopes of shallow linear reefs fringing offshore cays and
emergent reefs (Fig. 4). The two relationships that contributed
most to regulating the distribution of S. taeniopterus were proximity
to shore (negative relationship) and slope of slope (positive
relationship), suggesting that both geographic and topographic
variables are also important for this species.
The strength of the response curve and the location of the point
at which increasing topographic complexity no longer led to
increasing occurrence differed between species. For S. taeniopterus, a gradual increase in occurrence with increasing slope of slope was predicted, with even very small increases in complexity above zero (flat bottom) resulting in occurrence. Complexity increased habitat suitability until slope of slope values reached approximately 20, beyond which habitat suitability levelled off. A steeper response curve was evident for S. planifrons, with occurrence increasing with complexity up to a slope of slope value of 45 (Fig. 3). Occurrence of E. guttatus increased gradually with slope of slope up to a value of approximately 35, where the response reached a plateau. These findings highlight the existence of species-specific
responses to topographic complexity, as well as some generality in
the importance of the interaction between geographical location
across the shelf and topographic complexity of the seascape for
predicting fish distributions across coral reef ecosystems.
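The plateau points described above (about 20, 45 and 35 for slope of slope) were read from BRT partial dependence curves (Fig. 3). As an illustration only, a plateau of this kind can be located on any sampled response curve with a hinge (broken-stick) regression, y = a + b·min(x, c), keeping the breakpoint c that minimises the residual sum of squares. The function `fit_hinge` and the synthetic curve below are hypothetical and are not part of the original analysis.

```python
"""Locate where a response curve levels off, using a hinge regression.

Illustrative sketch only: fits y = a + b * min(x, c) by ordinary least
squares for each candidate breakpoint c and keeps the best c.
"""


def fit_hinge(xs, ys, candidates):
    """Return (c, a, b, sse) for the breakpoint c with the smallest SSE."""
    n = len(xs)
    y_mean = sum(ys) / n
    best = None
    for c in candidates:
        # Transformed predictor: stops increasing beyond the breakpoint c.
        zs = [min(x, c) for x in xs]
        z_mean = sum(zs) / n
        sxx = sum((z - z_mean) ** 2 for z in zs)
        if sxx == 0:
            continue  # c at or below every x: slope cannot be estimated
        sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
        b = sxy / sxx
        a = y_mean - b * z_mean
        sse = sum((y - (a + b * z)) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[3]:
            best = (c, a, b, sse)
    return best


if __name__ == "__main__":
    # Synthetic response: rises linearly, then flat beyond x = 20.
    xs = list(range(0, 61, 2))
    ys = [0.02 * min(x, 20) for x in xs]
    c, a, b, sse = fit_hinge(xs, ys, candidates=range(2, 60))
    print(c, round(b, 3))  # → 20 0.02
```

A grid search over candidate breakpoints is used rather than a gradient-based optimiser because the residual sum of squares is not smooth in c; the candidates would normally span the observed range of the predictor.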
Variable interactions
Although variable importance did not fluctuate across spatial scales for the fish species investigated, the interactions between topographic and geographic predictors gave a more ecologically meaningful understanding of how multiple predictors combine to determine habitat suitability. This was particularly true for B. vetula, E. guttatus, S. planifrons and S. taeniopterus. For instance, the most important interactions for S. planifrons were consistently between slope of slope and distance to shore (Fig. 5). A similar result was exhibited for E. guttatus, S. planifrons and S. taeniopterus. Aside from the expected interaction between the inversely related distance to shore and distance to shelf edge, interaction strength was highest for: (i) B. vetula, distance to shore and curvature and rugosity; (ii) C. fulva, distance to shore and water depth; (iii) E. guttatus, distance to shore and slope; (iv) S. taeniopterus, distance to shore and slope of slope; and (v) S. planifrons, distance to shelf edge and slope of slope. The model for C. fulva involved the strongest interactions among predictors, whereas E. guttatus exhibited relatively weak interactions. Moreover, the most important interactions for B. vetula and E. guttatus tended to vary across spatial scales, suggesting that the synergistic effects of different predictors are important for regulating species’ distributions across scales in Caribbean coral reef seascapes.
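This section does not state how the interaction strengths in Fig. 5 were computed. One widely used measure that can be applied to any fitted model, including a BRT, is Friedman and Popescu's H² statistic. It compares the joint partial dependence of two predictors with the sum of their individual partial dependences. The sketch below is a swapped-in illustration in plain Python and is not necessarily the measure behind Fig. 5. `model` is assumed to be any callable returning a prediction for a row of predictor values, and the toy models and data are hypothetical.

```python
"""Pairwise interaction strength via Friedman and Popescu's H^2 statistic."""


def partial_dependence(model, data, features, values):
    """Mean prediction over `data` with `features` fixed at `values`."""
    total = 0.0
    for row in data:
        row = list(row)  # copy so the caller's data are not modified
        for f, v in zip(features, values):
            row[f] = v
        total += model(row)
    return total / len(data)


def centred(xs):
    """Subtract the mean so that partial dependence functions are comparable."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def h_squared(model, data, j, k):
    """Share of the joint (j, k) effect that is not additive (0 = no interaction)."""
    pd_j = centred([partial_dependence(model, data, [j], [r[j]]) for r in data])
    pd_k = centred([partial_dependence(model, data, [k], [r[k]]) for r in data])
    pd_jk = centred(
        [partial_dependence(model, data, [j, k], [r[j], r[k]]) for r in data]
    )
    num = sum((jk - a - b) ** 2 for jk, a, b in zip(pd_jk, pd_j, pd_k))
    den = sum(jk ** 2 for jk in pd_jk)
    return num / den if den > 0 else 0.0


def additive_model(row):
    # Hypothetical model: the two predictors act independently.
    return row[0] + 2.0 * row[1]


def interacting_model(row):
    # Hypothetical model: the effect of one predictor depends on the other.
    return row[0] * row[1]


if __name__ == "__main__":
    # Toy predictors scaled to [0, 1], e.g. distance to shore and slope of slope.
    data = [[x / 4, y / 4] for x in range(5) for y in range(5)]
    print(round(h_squared(additive_model, data, 0, 1), 6))  # → 0.0
    print(round(h_squared(interacting_model, data, 0, 1), 3))  # → 0.2
```

The calculation is quadratic in the number of rows, so for a large survey dataset one would usually evaluate it on a random subsample.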
Influential spatial scales
The best BRT models for B. vetula and S. taeniopterus were developed using environmental predictors at the 300 m scale. The best model for C. fulva was at the 100 m scale, for S. planifrons at the 25 m scale, and for E. guttatus at the 5 m scale (data not shown). Although the strength of the response varied across spatial scales, the relative importance of different environmental variables rarely changed across spatial scales for any species' models, because the primary and secondary predictors (i.e., the Euclidean distance from shelf edge and shore) were scale-independent metrics.
Figure 2. Boxplots of percentage contribution of each environmental predictor across all models and spatial scales for five fish species. Horizontal lines in boxes show medians and boxes show upper and lower quartiles, with vertical lines showing minimum and maximum values. Distance is abbreviated "Dist".
Figure 3. Partial dependence plots for Boosted Regression Tree (BRT) analyses relating species occurrence to the four most influential geographical and morphological predictors. (A) Balistes vetula (Queen triggerfish); (B) Cephalopholis fulva (coney); (C) Epinephelus guttatus (red hind); (D) Scarus taeniopterus (Princess parrotfish); and (E) Stegastes planifrons (threespot damselfish). The graphs show the effect of a particular variable on the response: positive fitted function values suggest that species respond favorably and low values suggest the opposite. The relative importance of each variable is shown in parentheses on the x-axis. Increasingly negative values of planar curvature represent increasing convexity in the surface; positive values represent concavity.
Map accuracy of predicted species distributions
Independent map accuracy assessment demonstrated that MaxEnt models produced more reliable spatial predictions of species occurrence than did BRT models (Table 3). Map accuracy for MaxEnt models was consistently high across all five species, with the highest accuracy calculated for S. taeniopterus (97% correct) and C. fulva (95.8% correct). Predictive maps projected from BRT models were less reliable than MaxEnt for all species and more variable, ranging from 48% accuracy for B. vetula to 96.6% for S. taeniopterus (Table 3).
Figure 4. Predicted habitat suitability for Stegastes planifrons for the study area of SW Puerto Rico. (A) MaxEnt model of habitat suitability for S. planifrons overlain on 4 m resolution LiDAR bathymetry; (B) subset of the habitat suitability map for S. planifrons showing a high density of highly suitable habitat (red) around the El Palo reef area within the La Parguera Natural Reserve; and (C) subset of the habitat suitability map showing highly suitable habitat (red) predicted along the shallow landward reef slopes near Corral and Romero cays. Sites of confirmed presence and absence of S. planifrons are represented by white and black dots, respectively.
The spatial modelling approach developed here integrates data and novel tools and techniques from geographical information science with landscape ecology concepts and advanced machine-learning algorithms to model complex non-linear species-environment relationships. We have demonstrated that morphological characteristics of the seafloor and geographical predictors interact to function as effective predictors of fish species distribution across topographically complex coral reef ecosystems. Our results showed that coral reef ecosystems exhibit high spatial variability in habitat suitability at a range of scales for five common fish species. The location of coral reefs across the insular shelf does matter to fish, and coral reefs of equally high topographic complexity will not necessarily offer identical habitat suitability for fish.
Although species showed individualistic responses to predictors, non-linear statistical interactions between geographical location across the shelf and the structural heterogeneity of the seafloor produced reliable models of species distributions. Geographical threshold effects were evident in the ecological responses of several species, indicating distinct zonation in the spatial pattern of habitat suitability. The ability to map spatial patterns in habitat quality for species and groups of species is valuable for mapping essential fish habitat and for conservation planning.
Figure 5. Schematic models of predictor interactions from BRT models. (A) Balistes vetula (Queen triggerfish); (B) Cephalopholis fulva (coney); (C) Epinephelus guttatus (red hind); (D) Scarus taeniopterus (Princess parrotfish); (E) Stegastes planifrons (threespot damselfish). Line thickness is proportional to interaction strength, with thicker lines indicating stronger interactions.
Most importantly, we highlight the importance of using independent validation data to evaluate model predictions, and demonstrate that model performance may not necessarily translate to map accuracy when predictions of habitat suitability are projected across seascapes. The higher map accuracy of MaxEnt predictions may reflect the difference between a maximum-entropy distribution subject to environmental constraints, which can model very complex spatial distributions, and a recursive partitioning approach that splits on predictor values. Splitting may perform better for species with more distinct zonation patterns of distribution. More multi-species studies are required to examine which distributional characteristics specific algorithms are best suited to predict. Moreover, rather than relying solely on AUC, alternative metrics should be used to evaluate model performance, particularly for MaxEnt, which relies on pseudo-absences. Using pseudo-absences may lead to biased AUC values, because this index gives equal weight to omission and commission errors and pseudo-absences tend to inflate the number of false absences [42]. However, for our data, using independent validation data to assess model predictions favoured MaxEnt.
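AUC can be read as the probability that a randomly chosen presence site receives a higher predicted value than a randomly chosen absence site (for MaxEnt, a background or pseudo-absence site). This is why inflated false absences can distort it. The sketch below computes AUC directly from that definition, counting ties as one half, and attaches the "excellent" (0.8–0.9) and "outstanding" (>0.9) labels used in the abstract. The function names and scores are hypothetical, and the exact handling of the label boundaries is an assumption.

```python
"""AUC from its rank (Mann-Whitney) definition, plus descriptive labels."""


def auc(presence_scores, absence_scores):
    """AUC as P(presence score > absence score); ties count one half."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def auc_label(value):
    """Labels as used in the abstract; boundary handling is an assumption."""
    if value >= 0.9:
        return "outstanding"
    if value >= 0.8:
        return "excellent"
    return "below excellent"


if __name__ == "__main__":
    presences = [0.9, 0.8, 0.4]  # hypothetical predicted suitability
    absences = [0.7, 0.3, 0.2]
    value = auc(presences, absences)
    print(round(value, 3), auc_label(value))  # → 0.889 excellent
```

Comparing every presence-absence pair is adequate for validation sets of the size in Table 3, but it scales quadratically. A sort-based rank computation gives the same value for large samples.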
Variable contribution and interactions
Topographic complexity is widely recognised as an important predictor of fish species distributions, with more complex patches and seascapes supporting higher fish abundance and species richness than less complex patches [10,11,21,43]. Our results support this hypothesis: slope of slope and surface rugosity (measures of topographic complexity) were identified as important predictors in distribution models of three of the five fish species. However, we also show that not all coral reefs offer equal habitat suitability, even if they exhibit equal levels of topographic complexity. At broad spatial scales, the suitability of coral reefs for fish species in the study area was mediated by the interaction between topographic complexity and geographical location across the insular shelf. In fact, cross-shelf location, measured by Euclidean distance from both the shelf edge and shoreline, explained more of the variability in fish species occurrence than any other individual predictor.
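Slope of slope is a second-derivative measure. The slope of the bathymetric surface is computed first, and then the slope of that slope surface is computed again, so abrupt changes in steepness (as on structurally complex reef) produce high values. A minimal sketch using Horn's 3×3 finite-difference slope estimator is shown below. The choice of estimator, the edge handling and the function names are assumptions, since the GIS implementation is not described in this section. Depths are assumed to lie on a regular grid of square cells; a 4 m cell matches the LiDAR resolution given in the Figure 4 caption.

```python
"""Slope and slope of slope on a gridded surface (Horn's 3x3 method)."""
import math


def slope_grid(z, cell):
    """Slope in degrees for each interior cell; border or no-data cells are None."""
    rows, cols = len(z), len(z[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # 3x3 window in row-major order: a b cc / d e f / g h i
            w = [z[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            if any(v is None for v in w):
                continue  # a neighbour is missing, so slope is undefined
            a, b, cc, d, _, f, g, h, i = w
            dzdx = ((cc + 2 * f + i) - (a + 2 * d + g)) / (8 * cell)
            dzdy = ((g + 2 * h + i) - (a + 2 * b + cc)) / (8 * cell)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out


def slope_of_slope(z, cell):
    """Rate of change of slope: the slope of the slope grid itself."""
    return slope_grid(slope_grid(z, cell), cell)


if __name__ == "__main__":
    # Planar ramp deepening 2 m per 4 m cell: constant slope, zero slope of slope.
    ramp = [[2.0 * c for c in range(7)] for _ in range(7)]
    print(round(slope_grid(ramp, 4.0)[3][3], 2))  # → 26.57
    print(round(slope_of_slope(ramp, 4.0)[3][3], 2))  # → 0.0
```

Because each pass needs a full 3×3 neighbourhood, slope of slope is undefined within two cells of the raster edge or of any no-data cell.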
Several studies have highlighted the importance of cross-shelf location for fish distributions [44,45,46], yet relative position across the shelf is rarely quantified directly as a potential spatial proxy in ecological studies of marine species distributions. Both distance to coastline and distance to barrier reef emerged as the most important predictors for a wide range of fish species on the Great Barrier Reef, Queensland, Australia [47,48]. A disadvantage of using a geographical predictor is that the exact causal patterns and processes relevant to cross-shelf location are ambiguous. An advantage, however, is that geographical predictors provide a relatively static, easily quantified proxy that may indirectly represent changes across a wide range of dynamic environmental gradients (e.g., depth, temperature, salinity, turbidity, connectivity), including those that are problematic to quantify accurately at appropriate spatial and temporal scales.
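Cross-shelf position is measured as straight-line (Euclidean) distance, so it can be computed directly from survey coordinates and digitised shoreline and shelf-edge lines. The sketch below assumes projected coordinates in metres (for example a UTM zone) and represents each line as a list of vertices. The function names and the straight test lines are hypothetical. A raster GIS would typically produce the same quantity as a Euclidean distance surface.

```python
"""Euclidean distance from a survey site to a shoreline or shelf-edge polyline."""
import math


def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a single point
    # Project p onto the infinite line, then clamp to the segment end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, vertices):
    """Distance from p to a polyline given as a list of (x, y) vertices."""
    return min(
        point_segment_distance(p, a, b) for a, b in zip(vertices, vertices[1:])
    )


if __name__ == "__main__":
    shoreline = [(0.0, 0.0), (10000.0, 0.0)]  # hypothetical lines, metres
    shelf_edge = [(0.0, 10000.0), (10000.0, 10000.0)]
    site = (5000.0, 3000.0)
    print(distance_to_line(site, shoreline), distance_to_line(site, shelf_edge))
    # → 3000.0 7000.0
```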
Compared with geographical predictors and topographic complexity, other predictors such as curvature, aspect and slope each had a mean variable contribution of less than 12% across all species. These variables have been found to be important predictors of vegetation distribution in terrestrial landscapes, yet very little is known about their importance as drivers of ecological patterns across the seascape. Slope and aspect could influence hydrodynamics and the irradiance received by photosynthetic organisms (e.g., algae and scleractinian corals), with implications for fish distributions, but these characteristics of terrain morphology have yet to be explored in relation to biological function in coral reef ecosystems.
Threshold effects
This study identified several thresholds in species' responses to geographical location, which defined discrete constraints on habitat suitability across the shelf. This pattern indicates ecologically meaningful zonation across the shelf, likely mediated by local coastal geomorphology. Geographical threshold effects may be related to life-history strategies and tactics, such as whether a species is a habitat specialist with a critical dependence on a single habitat type or a seascape generalist capable of using multiple habitat types and geomorphological zones. Evidence from terrestrial species [49] and a few marine examples [26,50] indicates that threshold effects are species-specific, a result supported by our findings. Past studies have focused on changes or spatial differences in the abundance of patch types represented as two-dimensional flat surfaces, rather than on spatial gradients of three-dimensional surfaces as in these analyses. Our study suggests that understanding the three-dimensional structural conditions under which thresholds are likely to be exceeded, and the mechanisms underlying the threshold response, is critical to predicting change, examining options for management intervention and setting targets for structural restoration.
Table 3. Comparison of map accuracy for predicted fish species distributions using BRT and MaxEnt algorithms.
Species | Presence sites | BRT % correct | BRT % misclassified | MaxEnt % correct | MaxEnt % misclassified
B. vetula | 44 | 48 | 52 | 90 | 10
C. fulva | 24 | 54.2 | 45.8 | 95.8 | 4.2
E. guttatus | 10 | 70 | 30 | 90 | 10
S. planifrons | 30 | 73.3 | 26.7 | 90 | 10
S. taeniopterus | 87 | 96.6 | 3.4 | 97 | 3
Mean | | 68.4 | 31.6 | 92.6 | 7.4
A prediction probability threshold of >10% was used for mapping suitable habitat. Mean values are given in the final row.
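The percentages in Table 3 appear to give the share of independent presence sites that fall inside mapped suitable habitat once the continuous suitability surface is thresholded. A minimal sketch of that calculation follows. It assumes the suitability value at each confirmed presence site has already been extracted from the map, and that a site counts as correctly mapped when its value exceeds the 10% threshold in the table footnote. The function name and values are hypothetical, and the paper's exact validation procedure is not described in this section.

```python
"""Map accuracy at independent presence sites for a thresholded suitability map."""


def map_accuracy(site_suitability, threshold=0.10):
    """Return (percent correct, percent misclassified) for presence sites.

    A presence site is counted as correctly mapped when its predicted
    suitability exceeds `threshold` (assumed strict, matching ">10%").
    """
    if not site_suitability:
        raise ValueError("no presence sites supplied")
    correct = sum(1 for s in site_suitability if s > threshold)
    pct_correct = 100.0 * correct / len(site_suitability)
    return pct_correct, 100.0 - pct_correct


if __name__ == "__main__":
    # Ten hypothetical presence sites; seven exceed the threshold.
    values = [0.05, 0.08, 0.09, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    print(map_accuracy(values))  # → (70.0, 30.0)
```

Because only presence sites enter the calculation, this measure captures omission but not commission error. This is one reason the text recommends complementing AUC with independent validation.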
Spatial scale
Our multiscale approach, adapted from landscape ecology, allowed us to examine scale-dependent effects in species responses to environmental heterogeneity. A range of spatial scales emerged as the characteristic scales of response for the five fish species. The scale of response was species-specific, with no positive allometric scaling relationship evident between fish body size and the size of seascapes. For instance, one of the grouper species (E. guttatus) was best predicted using spatial complexity quantified at the 5 m radial extent, while the smallest-bodied species (S. planifrons) was best predicted at the 25 m radial extent. This may, however, reflect a site-specific preference by E. guttatus for highly complex structure in close proximity. Limited information is available on the scale of movements for most tropical species, which limits meaningful scale selection in ecological studies. Where data are available, behavioural studies have shown that S. planifrons is a highly territorial, site-attached fish with a home range of several metres and no evidence of nocturnal migrations [51]. In contrast, behavioural observations of S. taeniopterus in Barbados found that fish moved 20 to 375 m to structurally complex and deeper reef slopes, or to nearby areas with high coral colony density, to find night resting areas [52]. It is likely that suitable habitat includes day and night use areas in close proximity that together offer sufficient structural complexity to provide abundant food and refuge from predators. The proximity of suitable habitats determines the spatial scale of the daily home range and, therefore, the spatial scales at which individuals respond to the environment. For species such as S. taeniopterus, an exploratory, multi-scale approach that includes surrounding structural heterogeneity at a range of scales is more likely to capture the structurally complex night resting areas that exist within a few hundred metres of the locations of daytime occurrence.
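A multi-scale predictor of the kind described here can be built by summarising a terrain raster (for example, slope of slope) within circular windows of increasing radius around each survey site. The sketch below computes a focal mean at the radii mentioned in the text (5, 25, 100 and 300 m). It assumes a raster with 4 m cells, the LiDAR resolution given in the Figure 4 caption. The use of a mean, the edge handling and all names are illustrative assumptions rather than the paper's exact procedure.

```python
"""Focal mean of a raster within circular windows of several radii."""


def focal_mean(grid, row, col, radius_m, cell_m):
    """Mean of non-missing cells whose centres lie within radius_m of (row, col)."""
    reach = int(radius_m // cell_m)  # furthest whole-cell offset to check
    limit = (radius_m / cell_m) ** 2  # squared radius in cell units
    total, count = 0.0, 0
    for r in range(row - reach, row + reach + 1):
        for c in range(col - reach, col + reach + 1):
            if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
                continue  # window extends past the raster edge
            if (r - row) ** 2 + (c - col) ** 2 > limit:
                continue  # corner cell outside the circular window
            value = grid[r][c]
            if value is not None:  # skip no-data cells
                total += value
                count += 1
    return total / count if count else None


def multiscale_profile(grid, row, col, radii_m=(5, 25, 100, 300), cell_m=4.0):
    """Focal mean at each radius, keyed by radius in metres."""
    return {radius: focal_mean(grid, row, col, radius, cell_m) for radius in radii_m}


if __name__ == "__main__":
    # Flat 101 x 101 raster with one highly complex cell at the survey site.
    grid = [[0.0] * 101 for _ in range(101)]
    grid[50][50] = 10.0
    profile = multiscale_profile(grid, 50, 50)
    print(profile[5])  # → 2.0 (the site plus its four edge-adjacent neighbours)
```

In the usage example, the isolated complex cell dominates the 5 m summary but is progressively diluted at 25, 100 and 300 m. This is the kind of scale dependence a multi-scale design is meant to expose.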
Management implications and future challenges
A systematic assessment of marine species’ distributions and their responses to specific environmental variables at multiple spatial scales provides valuable information for conservation planning and fisheries management. The quantitative and spatially explicit techniques demonstrated here offer a cost-effective and reliable tool for refining the spatial delineation of essential fish habitat within a region and for identifying the suite of site characteristics that are important for priority species [53]. Furthermore, a multi-scale approach mitigates some of the minor geopositional inaccuracies that may occur in field surveys when linking response variables to environmental structure. A multi-scale approach is ecologically appropriate when insufficient information on actual movements and habitat use patterns is available, and when species are likely to respond hierarchically to spatial structure and at different scales to different components of structure [54].
Future studies are now underway to examine the potential applications of our modelling approach for forecasting the influence of differing levels of ‘topographic flattening’ on habitat suitability and the associated contractions and expansions in fish species distributions. Species distributions can also be used as spatial proxies for key ecological processes such as herbivory. Mapped distributions for multiple herbivorous species can be spatially combined to map cumulative patterns of grazing intensity, a key process controlling the dynamics and resilience of coral reef ecosystems [55].
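One simple way to combine per-species suitability maps into a cumulative grazing surface is a cell-by-cell weighted sum. In the sketch below, the weights stand in for hypothetical per-species grazing impacts (for example, relative bite rates). Neither the weights nor the weighted-sum formulation come from this paper, and the species grids are hypothetical.

```python
"""Cumulative grazing surface as a weighted sum of species suitability grids."""


def cumulative_grazing(suitability_maps, weights):
    """Cell-wise weighted sum of per-species suitability grids of equal shape."""
    if len(suitability_maps) != len(weights):
        raise ValueError("need one weight per species map")
    rows, cols = len(suitability_maps[0]), len(suitability_maps[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for grid, weight in zip(suitability_maps, weights):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += weight * grid[r][c]
    return out


if __name__ == "__main__":
    species_a = [[0.5, 1.0], [0.0, 0.2]]  # hypothetical suitability grids
    species_b = [[0.5, 0.0], [1.0, 0.2]]
    print(cumulative_grazing([species_a, species_b], weights=[2.0, 1.0]))
```

A weighted sum assumes that species' grazing contributions are additive. Interactions among grazers would need a different combination rule.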
Additional work is required to determine the portability and generality of the models through application to geographically different regions, and to assess performance for a wider range of species in both fished and unfished areas [56]. Spatial modelling techniques can offer a cost-effective analytical solution both to filling the spatial information gap and to increasing our understanding of macro-ecological relationships, even in relatively data-poor regions of the world. The results emphasise the importance of understanding the architectural complexity of coral reefs, particularly in the Caribbean and other sensitive seascapes that have shown declines in coral cover and a ‘flattening’ of coral topography as a result of catastrophic and sub-catastrophic events including disease, hurricanes and bleaching.
We thank the scientific divers associated with NOAA Biogeography
Branch Coral Reef Ecosystem Monitoring Project (CREM) for the
collection of field data, and B. Costa for processing the LiDAR data.
Author Contributions
Conceived and designed the experiments: SJP KAB. Performed the
experiments: SJP KAB. Analyzed the data: SJP KAB. Contributed
reagents/materials/analysis tools: SJP KAB. Wrote the paper: SJP KAB.
Contributed to field data collection: SJP.
1. Fernandes L, Day J, Lewis A, Slegers S, Kerrigan B, et al. (2005) Establishing
representative no-take areas in the Great Barrier Reef: Large-scale implemen-
tation of theory on marine protected areas. Conservation Biology 19:
2. Douvere F, Ehler CN (2007) International workshop on marine spatial planning,
UNESCO, Paris, 8–10 November 2006: A summary. Marine Policy 31:
3. Klein CJ, Wilson KA, Watts M, Stein J, Carwardine J, et al. (2009) Spatial
conservation prioritization inclusive of wilderness quality: A case study of
Australia’s biodiversity. Biological Conservation 142: 1282–1290.
4. Lourie SA, Vincent ACJ (2004) Using biogeography to help set priorities in
marine conservation. Conservation Biology 18: 1004–1020.
5. Geselbracht L, Torres R, Cumming GS, Dorfman D, Beck M, et al. (2009)
Identification of a spatially efficient portfolio of priority conservation sites in
marine and estuarine areas of Florida. Aquatic Conservation-Marine and
Freshwater Ecosystems 19: 408–420.
6. Maxwell DL, Stelzenmuller V, Eastwood PD, Rogers SI (2009) Modelling the
spatial distribution of plaice (Pleuronectes platessa), sole (Solea solea) and
thornback ray (Raja clavata) in UK waters for marine management and
planning. Journal of Sea Research 61: 258–267.
7. Groffman P, Baron J, Blett T, Gold A, Goodman I, et al. (2006) Ecological
thresholds: The key to successful environmental management or an important
concept with no practical application? Ecosystems 9: 1–13.
8. Millennium Ecosystem Assessment (2005) Ecosystems and human well-being:
biodiversity synthesis. Washington, DC: World Resources Institute.
9. Friedlander AM, Parrish JD (1998) Habitat characteristics affecting fish
assemblages on a Hawaiian coral reef. Journal of Experimental Marine Biology
and Ecology 224: 1–30.
10. Pittman SJ, Caldow C, Hile SD, Monaco ME (2007) Using seascape types to
explain the spatial patterns of fish in the mangroves of SW Puerto Rico. Marine
Ecology-Progress Series 348: 273–284.
11. Wedding LM, Friedlander AM, McGranaghan M, Yost RS, Monaco ME (2008)
Using bathymetric lidar to define nearshore benthic habitat complexity:
Implications for management of reef fish assemblages in Hawaii. Remote
Sensing of Environment 112: 4159–4165.
12. Hughes TP, Baird AH, Bellwood DR, Card M, Connolly SR, et al. (2003)
Climate change, human impacts, and the resilience of coral reefs. Science 301:
13. Gardner TA, Cote IM, Gill JA, Grant A, Watkinson AR (2003) Long-term
region-wide declines in Caribbean corals. Science 301: 958–960.
14. Valiela I, Bowen JL, Cole ML, Kroeger KD, Lawrence D, et al. (2001)
Following up on a Margalevian concept: Interactions and exchanges among
adjacent parcels of coastal landscapes. Scientia Marina 65: 215–229.
15. Alvarez-Filip L, Dulvy NK, Gill JA, Cote IM, Watkinson AR (2009) Flattening
of Caribbean coral reefs: region-wide declines in architectural complexity.
Proceedings of the Royal Society Series B 276: 3019–3025.
16. Paddack MJ, Reynolds JD, Aguilar C, Appeldoorn RS, Beets J, et al. (2009)
Recent Region-wide Declines in Caribbean Reef Fish Abundance. Current
Biology 19: 590–595.
17. Cheal AJ, Wilson SK, Emslie MJ, Dolman AM, Sweatman H (2008) Responses
of reef fish communities to coral declines on the Great Barrier Reef. Marine
Ecology-Progress Series 372: 211–223.
18. Pittman SJ, Christensen JD, Caldow C, Menza C, Monaco ME (2007)
Predictive mapping of fish species richness across shallow-water seascapes in the
Caribbean. Ecological Modelling 204: 9–21.
19. Ready JK, Kaschner K, South AB, Eastwood PD, Rees T, et al. (2010) Predicting
the distributions of marine organisms at the global scale. Ecological Modelling
221: 467–478.
20. Elith J, Graham CH, Anderson RP, Dudik M, Ferrier S, et al. (2006) Novel
methods improve prediction of species’ distributions from occurrence data.
Ecography 29: 129–151.
21. Knudby A, LeDrew E, Brenning A (2010) Predictive mapping of reef fish species
richness, diversity and biomass in Zanzibar using IKONOS imagery and
machine-learning techniques. Remote Sensing of Environment 114: 1230–1241.
22. Elith J, Graham CH, Anderson RP, Dudik M, Ferrier S, et al. (2006) Novel
methods improve prediction of species’ distributions from occurrence data.
Ecography 29: 129–151.
23. Friedman JH (2001) Greedy function approximation: a gradient boosting
machine. Annals of Statistics 29: 1189–1232.
24. Friedman JH (2002) Stochastic gradient boosting. Computational Statistics and
Data Analysis 38: 367–378.
25. Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of
species geographic distributions. Ecological Modelling 190: 231–259.
26. Pittman SJ, Costa BM, Battista TA (2009) Using Lidar Bathymetry and Boosted
Regression Trees to Predict the Diversity and Abundance of Fish and Corals.
Journal of Coastal Research 25: 27–38.
27. Leathwick J, Moilanen A, Francis M, Elith J, Taylor P, et al. (2008) Novel
methods for design and evaluation of marine protected areas in offshore waters.
Conservation Letters 1: 91–102.
28. Menza C, Ault J, Beets J, Bohnsack J, Caldow C, et al. (2006) A guide to
monitoring reef fish in the National Park Service’s South Florida/Caribbean
Network. In: NCCOS NTMN, editor. Silver Spring, Maryland. 166 p.
29. Booth DJ, Beretta GA (1994) Seasonal recruitment, habitat associations and
survival of pomacentrid reef fish in the US Virgin Islands. Coral Reefs 13:
30. Gratwicke B, Speight MR (2005) The relationship between fish species richness,
abundance and habitat complexity in a range of shallow tropical marine
habitats. Journal of Fish Biology 66: 650–667.
31. Costa BM, Battista TA, Pittman SJ (2009) Comparative evaluation of airborne
LiDAR and ship-based multibeam SoNAR bathymetry and intensity for
mapping coral reef ecosystems. Remote Sensing of Environment 113:
32. Brock JC, Purkis SJ (2009) The Emerging Role of Lidar Remote Sensing in
Coastal Research and Resource Management. Journal of Coastal Research 25:
33. Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression
trees. Journal of Animal Ecology 77: 802–813.
34. Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression
trees. Journal of Animal Ecology 77: 802–813.
35. Elith J, Phillips SJ, Hastie T, Dudik M, Chee YE, et al. (2011) A statistical
explanation of MaxEnt for ecologists. Diversity and Distributions 17: 43–57.
36. Phillips SJ, Dudik M (2008) Modeling of species distributions with Maxent: new
extensions and a comprehensive evaluation. Ecography 31: 161–175.
37. Elith J, Leathwick JR (2009) Species Distribution Models: Ecological
Explanation and Prediction Across Space and Time. Annual Review of Ecology
Evolution and Systematics 40: 677–697.
38. Phillips SJ, Dudik M, Elith J, Graham CH, Lehmann A, et al. (2009) Sample
selection bias and presence-only distribution models: implications for back-
ground and pseudo-absence data. Ecological Applications 19: 181–197.
39. Fielding AH, Bell JF (1997) A review of methods for the assessment of prediction
errors in conservation presence/absence models. Environmental Conservation 24:
38–49.
40. Hosmer DW, Lemeshow S (2000) Applied Logistic Regression, second edition.
New York, USA: John Wiley & Sons.
41. R Development Core Team (2010) R: A language and environment for
statistical computing. R Foundation for Statistical Computing. Vienna, Austria.
42. Lobo JM, Jimenez-Valverde A, Real R (2007) AUC: a misleading measure of the
performance of predictive distribution models. Global Ecology and Biogeography
17: 145–151.
43. Gratwicke B, Speight MR (2005) The relationship between fish species richness,
abundance and habitat complexity in a range of shallow tropical marine
habitats. Journal of Fish Biology 66: 650–667.
44. Christensen JD, Jeffrey CFG, Caldow C, Monaco ME, Kendall MS, et al. (2003)
Cross-shelf habitat utilization patterns of reef fishes in southwestern Puerto Rico.
Gulf and Caribbean Research 14: 9–27.
45. Lindeman KC, Diaz GA, Serafy JE, Ault JS (1998) A spatial framework for
assessing cross-shelf habitat use among newly settled grunts and snappers.
Proceedings of Gulf & Caribbean Fisheries Institute 50: 385–416.
46. Williams DM, Hatcher AI (1983) Structure of fish communities on outer slopes
of inshore, mid-shelf and outer-shelf reefs of the Great Barrier Reef. Marine
Ecology Progress Series 10: 239–250.
47. Cappo M, De’ath G, Speare P (2007) Inter-reef vertebrate communities of the
Great Barrier Reef Marine Park determined by baited remote underwater video
stations. Marine Ecology-Progress Series 350: 209–221.
48. Mellin C, Bradshaw CJA, Meekan MG, Caley MJ (2010) Environmental and
spatial predictors of species richness and abundance in coral reef fishes. Global
Ecology and Biogeography 19: 212–222.
49. Andrén H (1994) Effects of habitat fragmentation on birds and mammals in
landscapes with different proportions of suitable habitat: a review. Oikos 71:
50. Pittman SJ, McAlpine CA, Pittman KM (2004) Linking fish and prawns to their
environment: A hierarchical landscape approach. Marine Ecology Progress
Series 283: 233–254.
51. Williams AH (1978) Ecology of threespot damselfish: social organization, age
structure and population stability. Journal of Experimental Marine Biology and
Ecology 34: 197–213.
52. Dubin RE, Baker JD (1982) Two types of cover-seeking behavior at sunset by the
princess parrotfish, Scarus taeniopterus, at Barbados, West Indies. Bulletin Marine
Science 32: 572–583.
53. Valavanis VD, Pierce GJ, Zuur AF, Palialexis A, Saveliev A, et al. (2008)
Modelling of essential fish habitat based on remote sensing, spatial analysis and
GIS. Hydrobiologia 612: 5–20.
54. Pittman SJ, McAlpine CA (2003) Movement of marine fish and decapod
crustaceans: process, theory and application. Advances in Marine Biology 44:
55. Mumby P, Dahlgren C, Harborne A, Kappel C, Micheli F, et al. (2006) Fishing,
trophic cascades, and the process of grazing on coral reefs. Science 311: 98–101.
56. Mellin C, Ferraris J, Galzin R, Harmelin-Vivien M, Kulbicki M, et al. (2008)
Natural and anthropogenic influences on the diversity structure of reef fish
communities in the Tuamotu Archipelago (French Polynesia). Ecological
Modelling 218: 182–187.
Predicting Fish Distributions across Seascapes
PLoS ONE | 12 May 2011 | Volume 6 | Issue 5 | e20583
... Animal abundance and diversity is positively correlated with terrain complexity in most seascapes, as the distribution of many species (for example, cetaceans, fish, turtles and crustaceans) is typically concentrated over complex bathymetric features (for example, reefs, pinnacles, seamounts and artificial structures) (Rex and others 2006;Pittman and others 2007b;Schlacher and others 2010;Rees and others 2014;Borland and others 2022a). Prominent terrain features often harbour diverse faunal assemblages because they support an abundance of food (for example, photosynthetic organisms and invertebrate prey species) (Cameron and others 2014;Rees and others 2018), provide settlement and sheltering sites (Sabaté s and others 2007; Bejarano and others 2011) and offer refugia from various forms of disturbance (for example, hydrodynamic forces and fishing pressure) (Pittman and Brown 2011;Stamoulis and others 2018). These functions of complex terrain features are analogous to many of the services that are provided by nursery habitats (Sheaves and others 2015;Whitfield and Pattrick 2015), and for this reason areas of high bathymetric relief are often viewed as enhanced nurseries in continental shelf and rocky reef seascapes (Giannoulaki and others 2011;Farmer and others 2017;Pirtle and others 2017). ...
... Here, we investigated the importance of seafloor terrain variation for the ecological roles of mangrove and seagrass habitats that provide important nursery functions for a diversity of marine species in eastern Australia (Sheaves and others 2016; Hayes and others 2020). Terrain features in the subtidal zones of these seascapes can provide similar habitat functions to mangroves and seagrass for marine animals and might therefore serve as supplementary nursery habitats (Wedding and others 2008;Pittman and Brown 2011;Borland and others 2022a), but this hypothesis has not been tested with empirical data. Furthermore, many species use mangroves and seagrass habitats as juveniles (that is, nursery species), whilst other transient species (that is, non-nursery species) also occupy these ecosystems opportunistically (for example, during foraging and reproductive migrations) (Harborne and others 2017). ...
Full-text available
Mangroves and seagrasses are important nurseries for many marine species, and this function is linked to the complexity and context of these habitats in coastal seascapes. It is also connected to bathymetric features that influence habitat availability, and the accessibility of refuge habitats, but the significance of terrain variation for nursery function is unknown. To test whether seafloor terrain influences nursery function, we surveyed fish assemblages from mangrove and seagrass habitats in 29 estuaries in eastern Australia with unbaited underwater cameras and quantified the surrounding three-dimensional terrain with a set of complementary surface metrics (that is, depth, aspect, curvature, slope, roughness) applied to sonar-derived bathymetric maps. Terrain metrics explained variability in assemblages in both mangroves and seagrasses, with differing effects for the entire fish assemblage and nursery species composition, and between habitats. Higher depth, plan curvature (concavity or convexity) and roughness (backscatter) were negatively correlated with abundance and diversity in mangroves and positively linked to abundance and diversity in seagrass. Mangrove nursery species (6 species) were most abundant in forests adjacent to flats with concave holes, rough substrates and low-moderate depths, whereas seagrass nursery species (3 species) were most abundant in meadows adjacent to deep channels with soft mounds and ledges. These findings indicate that seafloor terrain influences nursery function and demonstrate contrasting effects of terrain variation in mangroves and seagrass. We suggest that incorporating three-dimensional terrain into coastal conservation and restoration plans could help to improve outcomes for fisheries management, but contrasting strategies might be needed for different nursery habitats.
... This search window can be increased to broaden the spatial scale at which the characteristic is calculated, in turn decreasing the detail that is captured. Studies examining the effect of changing the size of the search window have shown that while smaller windows are often more accurate when compared to in situ measurements, broad-scale measurements can still be of ecological relevance to demersal fish communities (Wedding et al., 2008; Pittman and Brown, 2011). It is likely that both small- and large-scale seafloor characteristics contribute to patterns in fish assemblage metrics, but that the relationship between any one characteristic and metric is likely to change across spatial scales (Kendall et al., 2011). ...
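The trade-off between window size and detail can be illustrated with a simple focal statistic. This is a minimal sketch, assuming roughness is measured as the standard deviation of depth within a square moving window (one of several definitions in use; the cited studies may compute it differently), with a hypothetical `focal_std` helper. Enlarging `radius` broadens the spatial scale and smooths out fine features.

```python
import statistics


def focal_std(grid, radius):
    """Standard deviation of depth in a square (2*radius+1) window centred on each cell.

    Windows are truncated at grid edges, so edge cells use fewer neighbours.
    Larger radii summarise broader-scale terrain variability and dilute fine detail.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            values = [
                grid[r][c]
                for r in range(max(0, i - radius), min(n_rows, i + radius + 1))
                for c in range(max(0, j - radius), min(n_cols, j + radius + 1))
            ]
            # Population standard deviation of all depths inside the window.
            row.append(statistics.pstdev(values))
        out.append(row)
    return out
```

For a single 10 m spike on an otherwise flat 5 × 5 grid, the value at the spike falls from about 3.14 with `radius=1` to about 1.96 with `radius=2`, showing how a local feature is diluted at the broader scale.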
No-take marine reserves are often located in remote areas far from human activity, which limits their perceived impact on extractive users but also reduces their usefulness for investigating the impacts of fishing. This study aimed to establish a benchmark in the distribution of fished species across the Ningaloo Marine Park – Commonwealth (NMP-Commonwealth), and adjacent comparable habitats within the Ningaloo Marine Park - State (NMP-State), in Western Australia to test if there was evidence of an effect of recreational fishing, as no commercial fishing is allowed within either marine park. We also examined whether the remote location of the newly established (2018) No-take Zone (NTZ), in NMP-Commonwealth, limits its use for studying the effects of fishing. Throughout the NMP-Commonwealth and NMP-State, where recreational fishing is permitted, we expected the abundance of recreationally fished species to increase with increasing distance to the nearest boat ramp, as a proxy of recreational fishing effort. Conversely, we did not expect the abundance of non-fished species and overall species richness to vary in response to the proxy for human activity. Distance to the nearest boat ramp was found to be a strong predictor of fished species abundance, indicating that the effect of recreational fishing can be detected across the NMP-Commonwealth. The effect of the NTZ on fished species abundance was weakly positive, but this difference across the NTZ is expected to increase over time. Habitat composition predictors were only found to influence species richness and non-fished species abundance. This study suggests a clear footprint of recreational fishing across the NMP-Commonwealth and, as a result, the new NTZ, despite its remote location, can act as a control in future studies of recreational fishing effects.
... Additionally, seascape structure influences fish assemblages at multiple scales (Anderson et al., 2009; Pittman & Brown, 2011), including scales other than the cm-scale metrics captured in this study (Weijerman et al., 2019). ...
Benthic components of tropical mesophotic coral ecosystems (MCEs) are home to diverse fish assemblages, but the effect of multiscale spatial benthic characteristics on MCE fish is not well understood. To investigate the influence of fine‐scale benthic seascape structure and broad‐scale environmental characteristics on MCE fish, we surveyed fish assemblages in Seychelles at 30, 60 and 120 m depth using submersible video transects. Spatial pattern metrics from seascape ecology were applied to quantify fine‐scale benthic seascape composition, configuration and terrain morphology from structure‐from‐motion photogrammetry and multibeam echosounder bathymetry and to explore seascape–fish associations. Hierarchical clustering using fish abundance and biomass data identified four distinct assemblages separated by the depth and geographic location, but also significantly influenced by variations in fine‐scale seascape structure. Results further revealed variable responses of assemblage characteristics (fish biomass, abundance, trophic group richness, Shannon diversity) to seascape heterogeneity at different depths. Sites with steep slopes and high terrain complexity hosted higher fish abundance and biomass, with shallower fish assemblages (30–60 m) positively associated with aggregated patch mixtures of coral, rubble, sediment and macroalgae with variable patch shapes. Deeper fish assemblages (120 m) were positively associated with relief and structural complexity and local variability in the substratum and benthic cover. Our study demonstrates the potential of spatial pattern metrics quantifying benthic composition, configuration and terrain structure to delineate mesophotic fish–habitat associations. Furthermore, incorporating a finer‐scale perspective proved valuable to explain the compositional patterns of MCE fish assemblages. 
As developments in marine surveying and monitoring of MCEs continue, we suggest that future studies incorporating spatial pattern metrics with multiscale remotely sensed data can provide insights that are both ecologically meaningful to fish and operationally relevant to conservation strategies. To investigate the influence of benthic seascape structure on MCE fish assemblages in Seychelles, this study surveyed benthic structure and fish assemblages using submersible video transects at mesophotic depths. Spatial pattern metrics measuring benthic habitat composition, configuration and terrain structure were extracted from Structure‐from‐Motion photogrammetry models to quantify fish‐habitat associations. The results revealed depth- and site-driven grouping of mesophotic fish assemblages that show significant associations with fine‐scale (cm‐m) terrain structure, seascape composition and configuration.
... Although the loss function can be chosen arbitrarily, for clarity's sake: if the error function is the conventional squared-error loss, the learning strategy reduces to consecutive fitting of the residual errors. In general, the researcher can choose the loss function, given the breadth of already established loss functions and the possibility of constructing one's own task-specific loss (16,17). ...
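To make "consecutive error fitting" concrete, here is a minimal sketch, assuming a single numeric predictor and one-split regression stumps as the weak learners (real GBM/BRT implementations use deeper trees, subsampling and other refinements). With the squared-error loss L(y, F) = (y - F)²/2, the negative gradient with respect to F is just the residual y - F, so each stage fits the current residuals. The names `fit_stump` and `boost_squared_error` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (one split) on a single feature.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost_squared_error(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # initial constant model F0 = mean(y)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # Negative gradient of (y - F)^2 / 2 is the ordinary residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each new learner's contribution by the learning rate.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict
```

With a step-shaped response (y = 1 for x ≤ 3, y = 5 otherwise) and `learning_rate=0.1`, the remaining residual shrinks by a factor of 0.9 each round, so after 50 rounds predictions are within about 0.01 of the targets.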
Aim: The diagnosis of breast cancer can be accomplished using an algorithm or an early-detection model of breast cancer risk based on determining factors. In the present study, gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) models were applied and their performances were compared. Methods: The open-access Breast Cancer Wisconsin Dataset, which includes 10 features of breast tumors and results from 569 patients, was used for this study. The GBM, XGBoost, and LightGBM models for classifying breast cancer were established with a repeated stratified K-fold cross-validation method. Model performance was evaluated with accuracy, recall, precision, and area under the curve (AUC). Results: Accuracy, recall, AUC, and precision values obtained from the GBM, XGBoost, and LightGBM models were as follows: (93.9%, 93.5%, 0.984, 93.8%), (94.6%, 94%, 0.985, 94.6%), and (95.3%, 94.8%, 0.987, 95.5%), respectively. According to these results, the best performance metrics were obtained from the LightGBM model. When the effects of the variables in the dataset on breast cancer were assessed, the five most significant factors for the LightGBM model were, in order, the mean of concave points, texture mean, concavity mean, radius mean, and perimeter mean. Conclusion: According to these findings, the LightGBM model gave more successful predictions for breast cancer classification than the other models. Unlike similar studies examining the same dataset, this study presented variable significance for breast cancer-related variables. Applying the LightGBM approach in the medical field can help doctors make a quick and precise diagnosis.
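For reference, the three threshold-dependent metrics reported above can be computed from the confusion counts of a binary classifier. This is a minimal sketch with a hypothetical `classification_metrics` function, assuming malignant = 1 as the positive class; the abstract does not state which class was treated as positive or how values were averaged across folds or classes.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall (sensitivity) and precision for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    tn = len(pairs) - tp - fp - fn                                    # true negatives
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision}
```

For eight cases with 3 true positives, 1 missed positive, 2 false positives and 2 true negatives, this gives accuracy 0.625, recall 0.75 and precision 0.6.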
... Artificial intelligence (AI) applications in aquatic and marine biodiversity and water resources help optimize the conservation of aquatic and marine flora and fauna and of water resources, and have attracted significant research attention over the last decade. For instance, AI and ML models have been used for predicting stream flow [98][99][100][101][102][103], water quality, water pollution and toxicology [126][127][128][129][130][131][132], aquatic and marine biodiversity and extinction [133][134][135][136][137][138][139][140][141][142][143][144][145][146][147][148][149], species distribution and habitat mapping [150][151][152][153][154][155][156][157][158][159][160][161][162][163][164], and marine and aquatic species recognition and classification [165][166][167][168][169][170][171][172][173][174][175][176][177][178][179][180][181][182][183]. The above-mentioned AI research in aquatic and marine biodiversity and water resource conservation highlights that AI will be key to developing new technology to uncover new aspects of conservation and potential threats to the structure and function of aquatic and marine ecosystems, thereby informing effective monitoring and conservation of aquatic and marine biodiversity and the management of water resources. ...
The recent advancement in data science, coupled with the revolution in digital and satellite technology, has improved the potential for artificial intelligence (AI) applications in the forestry and wildlife sectors. India accounts for 7% of global forest cover and is the 8th most biodiverse region in the world. However, rapid expansion of developmental projects, agriculture, and urban areas threatens the country's rich biodiversity. Therefore, the adoption of new technologies like AI in the Indian forest and biodiversity sectors can help in effective monitoring, management, and conservation of biodiversity and forest resources. We conducted a systematic search of literature related to the application of artificial intelligence (AI) and machine learning algorithms (ML) in the forestry sector and biodiversity conservation across the globe and in India (using ISI Web of Science and Google Scholar). Additionally, we collected data on AI-based startups and non-profits in the forest and wildlife sectors to understand the growth and adoption of AI technology in biodiversity conservation, forest management, and related services. Here, we first provide a global overview of AI research and application in forestry and biodiversity conservation. Next, we discuss challenges to adopting AI technologies in the Indian forestry and biodiversity sectors. Overall, we find that adoption of AI technology in the Indian forestry and biodiversity sectors has been slow compared with both developed and other developing countries. However, improving access to big data related to forests and biodiversity, cloud computing, and digital and satellite technology can help improve adoption of AI technology in India. We hope that this synthesis will motivate forest officials, scientists, and conservationists in India to explore AI technology for biodiversity conservation and forest management.
... The upper limit of 31 cm was selected to correspond with what can be acquired from other remote sensing sources (e.g., occupied aircraft imagery, satellite imagery). This method of varying the resolution scale of input variables (sometimes referred to as coarse-graining) has been used when studying marine environments, often within the context of habitat selection models [33][34][35][36], and was identified as the most appropriate method to use when wanting to characterize specific features or processes [17]. ...
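Coarse-graining can be illustrated by aggregating a fine raster into larger cells. This is a minimal sketch, assuming simple block averaging of a regular grid by an integer factor (for example, 3 cm cells aggregated by a factor of 2 give 6 cm cells); the source does not state which resampling method was used, and bilinear or nearest-neighbour resampling are common alternatives. The name `coarse_grain` is hypothetical.

```python
def coarse_grain(grid, factor):
    """Aggregate a fine-resolution raster to a coarser one by block averaging.

    Each output cell is the mean of a factor x factor block of input cells.
    Trailing rows/columns that do not fill a complete block are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for bi in range(n_rows):
        row = []
        for bj in range(n_cols):
            block = [
                grid[bi * factor + di][bj * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

Applied to a 4 × 4 grid holding 0 to 15 in row order, a factor of 2 returns `[[2.5, 4.5], [10.5, 12.5]]`.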
Monitoring intertidal habitats, such as oyster reefs, salt marshes, and mudflats, is logistically challenging and often cost- and time-intensive. Remote sensing platforms, such as unoccupied aircraft systems (UASs), present an alternative to traditional approaches that can quickly and inexpensively monitor coastal areas. Despite the advantages offered by remote sensing systems, challenges remain concerning the best practices to collect imagery to study these ecosystems. One such challenge is the range of spatial resolutions for imagery that is best suited for intertidal habitat monitoring. Very fine imagery requires longer collection and processing times. However, coarser imagery may not capture the fine-scale patterns necessary to understand relevant ecological processes. This study took UAS imagery captured along the Gulf of Mexico coastline in Florida, USA, and resampled the derived orthomosaic and digital surface model to resolutions ranging from 3 to 31 cm, which correspond to the spatial resolutions achievable by other means (e.g., aerial photography and certain commercial satellites). A geographic object-based image analysis (GEOBIA) workflow was then applied to datasets at each resolution to classify mudflats, salt marshes, oyster reefs, and water. The GEOBIA process was conducted within R, making the workflow open-source. Classification accuracies were largely consistent across the resolutions, with overall accuracies ranging from 78% to 82%. The results indicate that for habitat mapping applications, very fine resolutions may not provide information that increases the discriminative power of the classification algorithm. Multiscale classifications were also conducted and produced higher accuracies than single-scale workflows, as well as a measure of uncertainty between classifications.
... Gradient boosting machine (GBM) is a highly flexible and powerful machine learning technique originally developed by Friedman (2001). It can be tailored to many data-driven tasks (Natekin & Knoll, 2013), and has shown superior predictive capacity and considerable success in both data-mining challenges and soil property modelling (Pittman and Brown, 2011; Johnson & Zhang, 2014; Mishra et al., 2020). GBM produces a regression or classification model in the form of an ensemble of sequentially fitted weak learners. ...
Accurate soil organic carbon content estimation is critical as a proxy for carbon sequestration, and as one of the indicators of soil health. Here, we collected 497 soil samples between 2015 and 2019, as well as five environmental covariates (organic carbon (OC) input from the crops, normalized difference vegetation index (NDVI), elevation, clay content and precipitation) at a resolution of 30 m. We then aggregated these to represent agricultural fields and compiled a soil organic carbon (SOC) content map for the agricultural soils of Wallonia using a Gradient Boosting Machine. We calculated OC input from both main crops and cover crops for each individual field. As cover crops do not appear in the agricultural census, we identified them based on long time series of NDVI values obtained from the Google Earth Engine platform. The quality of the SOC predictions was assessed with validation data, and we obtained an R² of 0.77. The Empirical Mode Decomposition indicated that OC input and NDVI were the dominant factors at field scale, whereas the remaining covariates determined the distribution of SOC at the scale of the entire Walloon region. The SOC map showed an overall northwest to southeast trend, i.e., an increase in SOC contents up to the Ourthe river followed by a decrease further to the south. The map shows both regional trends in SOC and effects of differences in land use and/or management (including crop rotation and frequency of cover crops) between individual fields. The field-scale map can be used as a benchmark and reference by farmers and agencies in maintaining SOC contents at an appropriate level and optimizing decisions for sustainable land use.
... The GBM has been applied in many areas and used to tackle various statistical machine-learning challenges (Bissacco et al., 2007; Hutchinson et al., 2011; Pittman & Brown, 2011; Johnson & Zhang, 2014). Additionally, as pointed out in the introduction, this method has been implemented in GS for the prediction of continuous traits in plant breeding (maize phenotypic traits; Li et al., 2018) and in animal science (body weight phenotypes of Brahman cattle; Westhues et al., 2021), as well as for the prediction of complex phenotypes in outbred mice (Perez et al., 2022). ...
Genomic selection (GS) is a predictive methodology that is changing plant breeding. GS trains a statistical machine‐learning model on available phenotypic and genotypic data and then uses it to predict individuals that were only genotyped. For this reason, several statistical machine‐learning methods are being implemented in GS, but to improve the selection of new genotypes early in the prediction process, the exploration of new statistical machine‐learning algorithms must continue. In this paper, we performed a benchmarking study between the Bayesian threshold genomic best linear unbiased predictor model (TGBLUP; popular in GS) and the gradient boosting machine (GBM). The comparison used four real wheat (Triticum aestivum L.) data sets with categorical traits and was evaluated in the testing set with two metrics: the proportion of cases correctly classified (PCCC) and the Kappa coefficient. Under 10 random partitions with four different testing proportions (20, 40, 60, and 80%), we found that in three of the four data sets the GBM outperformed the TGBLUP model on both metrics (PCCC and Kappa coefficient). In the larger data sets (Data Sets 3 and 4), the gain in prediction accuracy of the GBM was considerable. For this reason, we encourage more research on the GBM in GS to evaluate its prediction performance. Genomic‐enabled prediction was used for categorical traits to capture data patterns in different environments. Two different genome‐based models were used for predicting categorical traits. Genome‐based prediction with genotype × environment interaction was used.
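Both evaluation metrics can be computed from observed and predicted categories. This is a minimal sketch with a hypothetical `pccc_and_kappa` helper, assuming the unweighted Cohen's kappa, which corrects the raw agreement (PCCC) for the agreement expected by chance from the marginal class frequencies; the abstract does not say whether a weighted kappa was used for ordinal traits.

```python
from collections import Counter


def pccc_and_kappa(y_true, y_pred):
    """Proportion of cases correctly classified (PCCC) and unweighted Cohen's kappa."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n  # PCCC
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return observed, kappa
```

For example, with 4 of 6 cases correct and a chance agreement of 1/3, PCCC is about 0.67 while kappa is 0.5, showing how kappa discounts agreement that a random classifier would also achieve.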
... In recent years, the use of spatial planning has emerged as an essential tool to support decision-making in coastal and marine systems vulnerable to rapid changes (Sousa et al., 2011; Halpern et al., 2012; Collie et al., 2013; Pittman and Brown, 2011; Lagabrielle et al., 2018). The identification and development of targeted resource zones can help reduce user conflict, improve restoration and management outcomes, integrate variable and future environmental conditions, and help strategically develop sustainable restoration strategies (e.g., Agardy et al., 2011; Moura et al., 2013; Pinto and Martins, 2013). ...
Eastern oysters (Crassostrea virginica) are a critical ecological and commercial resource in the northern Gulf of Mexico facing changing environmental conditions from river management and climate change. In Louisiana, USA, development of restored reefs and off-bottom aquaculture would benefit from the identification of locations supportive of sustainable oyster populations (i.e., metapopulations) and high, consistent production. This study defines four oyster resource zones across coastal Louisiana based on environmental conditions known to affect oyster survival, growth, and reproduction. Daily data from 2015 to 2019 were interpolated to generate salinity and temperature profiles across Louisiana's estuaries, which were then used to classify zones based on monthly and annual salinity mean and variance. Zones were classified as supportive of (1) broodstock sanctuary reefs (i.e., support reproductive populations), (2) productive reefs during dry (salty) years, (3) productive reefs during wet (fresh) years, and (4) off-bottom aquaculture development. Of the 38,000 km² investigated, over 11,000 km² of potential oyster zone area was identified across the Louisiana coast. The Broodstock Sanctuary Zone was the smallest (∼540 km²) because salinity variance, driven largely by riverine inputs across many estuaries, limited this zone in many areas. Located up-estuary (Dry Restoration Zone) and down-estuary (Wet Restoration Zone) of the Broodstock Sanctuary Zone, the Dry and Wet Restoration Zones covered ∼2400 km² and ∼3900 km², respectively. Mapped reefs in Louisiana currently exist largely within the Dry Restoration Zones, suggesting a potential strategy of focusing reef development in Wet Restoration Zones to ensure reef network sustainability through years with high precipitation and river inflow. The off-bottom Aquaculture Zone was the largest (∼6400 km²) zone identified, with much of this area located further down-estuary and offshore.
Accounting for variable water quality conditions enables the development of a network of reefs resilient to environmental variability, and more stable areas for consistent off-bottom aquaculture production. Spatial planning and identification of oyster resource zones reduces focus on individual reef success and supports management of oyster metapopulation outcomes, while identifying zones supportive of off-bottom aquaculture.
In this study, leaf area prediction models of Dendrobium nobile were developed with machine learning (ML) techniques including multiple linear regression (MLR), support vector regression (SVR), gradient boosting regression (GBR), and artificial neural networks (ANNs). The best model was identified using the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE), and statistically confirmed through average rank (AR). Leaf images were captured with a smartphone, and ImageJ was used to measure leaf length (L), width (W), and leaf area (LA). Three orders of L, W, and their combinations were taken for model building. Multicollinearity was checked using the Variance Inflation Factor (VIF) and Tolerance (T). 80% of the dataset was used for training and the remaining 20% for validation. K-fold (K = 10) cross-validation was used to check for overfitting. In the testing phase, GBR was the best of the ML models (R² = 0.96, MAE = 0.82–0.91 and RMSE = 1.10–1.11 cm²). AR statistically confirmed the outperformance of GBR, which ranked first and appeared with a frequency of 80% among the top ten ML models. Thus, GBR is the best model and is suitable for future use in estimating leaf area in D. nobile.
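The three accuracy measures can be sketched as follows. MAE and RMSE carry the units of the response (here leaf area in cm²), whereas R² is unitless. This minimal sketch uses a hypothetical `regression_metrics` helper and the common definition R² = 1 - SS_res/SS_tot; some studies instead report the squared correlation between observed and predicted values, which can differ on test data.

```python
import math


def regression_metrics(observed, predicted):
    """Coefficient of determination (R^2), mean absolute error and root mean square error."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total variation
    r2 = 1 - ss_res / ss_tot
    return r2, mae, rmse
```

For observed areas `[2, 4, 6, 8]` and predictions `[3, 4, 5, 8]`, the function returns R² = 0.9, MAE = 0.5 and RMSE of about 0.71.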
The area under the receiver operating characteristic (ROC) curve, known as the AUC, is currently considered to be the standard method to assess the accuracy of predictive distribution models. It avoids the supposed subjectivity in the threshold selection process, when continuous probability derived scores are converted to a binary presence–absence variable, by summarizing overall model performance over all possible thresholds. In this manuscript we review some of the features of this measure and bring into question its reliability as a comparative measure of accuracy between model results. We do not recommend using AUC for five reasons: (1) it ignores the predicted probability values and the goodness-of-fit of the model; (2) it summarises the test performance over regions of the ROC space in which one would rarely operate; (3) it weights omission and commission errors equally; (4) it does not give information about the spatial distribution of model errors; and, most importantly, (5) the total extent to which models are carried out highly influences the rate of well-predicted absences and the AUC scores.
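Point (1) can be seen directly from the rank-based definition of AUC: it is the probability that a randomly chosen presence receives a higher score than a randomly chosen absence, which equals the Mann-Whitney U statistic divided by the number of presence-absence pairs. In this minimal sketch (the `auc` function name is hypothetical), any transformation that preserves the order of scores, such as dividing every predicted probability by 10, leaves AUC unchanged even though the probabilities are no longer calibrated.

```python
def auc(scores_presence, scores_absence):
    """AUC as the probability that a random presence outscores a random absence.

    Equivalent to the Mann-Whitney U statistic divided by n_presence * n_absence;
    tied scores count as half a win.
    """
    wins = 0.0
    for sp in scores_presence:
        for sa in scores_absence:
            if sp > sa:
                wins += 1.0
            elif sp == sa:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))
```

With presence scores `[0.9, 0.8, 0.6, 0.4]` and absence scores `[0.7, 0.3, 0.2, 0.1]`, 14 of 16 pairs are ranked correctly, so AUC is 0.875 both before and after dividing all scores by 10.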
In June 2000, the National Ocean Service and University of Puerto Rico initiated a long-term reef-fish-monitoring program in La Parguera, Puerto Rico. Objectives of this ongoing work are to: 1) develop spatially explicit estimates of reef fish habitat utilization patterns to aid in defining essential habitats, and 2) provide a quantitative and ecologically sound foundation to delineate marine reserve boundaries. Central to this effort are recently completed digital and georeferenced benthic habitat maps for the near-shore waters of Puerto Rico. The GIS-based map served as a framework for development of a spatially stratified reef-fish-monitoring program across the shelf. Fish size and abundance data, together with micro-scale habitat distribution and quality data, were collected simultaneously along a 25 × 4 m transect at each monitoring station. Sampling included coral reef, mangrove, and seagrass habitats within three cross-shelf zones unique to the insular shelf of La Parguera (inner lagoon, outer lagoon, and bank-shelf). A total of 106 stations were surveyed during the first year of sampling. Over 50,000 fishes, representing 123 species and 36 families, were counted. Analyses showed clear patterns of habitat utilization across the seascape, and ontogenetic shifts in habitat selection within some species. Results also indicated that habitat type was more important than cross-shelf location in determining spatial patterns among reef fishes in the study area. Mesoscale spatially explicit logistic models were developed to estimate distribution and expected density of some species among habitats.
Many of the most abundant fish species using mangroves in the Caribbean also use other habitat types through daily home range movements and ontogenetic habitat shifts. Few studies, however, have considered the structure of the surrounding seascape when explaining the spatial distribution of fish within mangroves. This study develops an exploratory seascape approach using the geographical location of mangroves and the structure of the surrounding seascape at multiple spatial scales to explain the spatial patterns in fish density and number of species observed within mangroves of SW Puerto Rico. Seascape structure immediately surrounding mangroves was most influential in determining assemblage attributes and the density of juvenile Haemulon flavolineatum, which were significantly higher in mangroves with high seagrass cover (>40%) in close proximity (<100 m) than in mangroves with low (<40%) or no adjacent seagrasses. Highest mean density of juvenile Ocyurus chrysurus was found in offshore mangroves with high seagrass and coral reef cover (>40% and >15%, respectively) in close proximity (<100 m). In contrast, juvenile Lutjanus griseus responded at much broader spatial scales, with highest density found in extensive onshore mangroves with a large proportion (>40%) of seagrass within 600 m of the mangrove edge. We argue that there is an urgent need to incorporate information on the influence of seascape structure into a wide range of marine resource management activities, such as the identification and evaluation of critical or essential fish habitat, the placement of marine protected areas and the design of habitat restoration projects.
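Multi-scale seascape metrics such as "seagrass cover within 100 m" or "within 600 m of the mangrove edge" can be sketched as the share of habitat cells inside circular buffers of increasing radius. This assumes a rasterised habitat map with hypothetical class labels and a hypothetical `habitat_proportion` helper; the exact buffering procedure used in the study is not described here.

```python
import math


def habitat_proportion(habitat_grid, row, col, radius_m, cell_size_m, habitat="seagrass"):
    """Fraction of cells within a circular buffer that belong to a given habitat class.

    habitat_grid: list of rows of class labels; (row, col): focal cell, e.g. a mangrove edge.
    A cell is included if its centre lies within radius_m of the focal cell centre.
    """
    reach = int(radius_m // cell_size_m)
    inside = matches = 0
    for r in range(max(0, row - reach), min(len(habitat_grid), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(habitat_grid[0]), col + reach + 1)):
            if math.hypot(r - row, c - col) * cell_size_m <= radius_m:
                inside += 1
                matches += habitat_grid[r][c] == habitat
    return matches / inside
```

Calling the function with several radii for the same focal cell (for example 100 m and 600 m) yields one predictor per spatial scale, which is the kind of multi-scale comparison described above.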
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.
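The central idea is that each boosting stage fits a base learner to the negative gradient of the chosen loss, evaluated at the current model. A minimal sketch of these "pseudo-residuals" for the three regression losses named above follows; the `pseudo_residuals` name is hypothetical, and for simplicity it uses a fixed Huber threshold `delta`, whereas Friedman's Huber-M algorithm sets it adaptively from a quantile of the absolute residuals at each iteration.

```python
def pseudo_residuals(y, f, loss="ls", delta=1.0):
    """Negative gradient of the loss with respect to the current prediction f.

    ls:    L = (y - f)^2 / 2  -> y - f
    lad:   L = |y - f|        -> sign(y - f)
    huber: quadratic within delta, linear beyond -> y - f clipped to [-delta, delta]
    """
    out = []
    for yi, fi in zip(y, f):
        r = yi - fi
        if loss == "ls":
            out.append(r)
        elif loss == "lad":
            out.append((r > 0) - (r < 0))  # sign of the residual
        elif loss == "huber":
            out.append(max(-delta, min(delta, r)))
        else:
            raise ValueError(f"unknown loss: {loss}")
    return out
```

Least squares passes large residuals through unchanged, least absolute deviation keeps only their sign, and Huber caps them at `delta`, which is why the latter two are more robust to outlying observations.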
Some decades ago Margalef speculated that study of the exchanges across boundaries that separate different types of ecological systems would provide significant insights about properties and processes within the units that make up ecological mosaics. Although such boundaries might be difficult to define, it seemed likely that such exchanges among units would influence the function and structure of the adjoined systems. In this paper we explore exchanges across such ecological boundaries in coastal ecosystems in Cape Cod, Massachusetts, and elsewhere. We find that, indeed, definition of such boundaries is ambiguous, but study of the exchanges is more useful. In the Cape Cod system, water transport down-gradient is the dominant mechanism exerting influence on down-gradient systems. The direction of ecological control across such boundaries is largely asymmetrical, and properties of up-gradient units exert significant influence on down-gradient units. General properties of donor and receptor parcels are hard to discern, but clearly, parcels making up an ecological mosaic are not independent units, but are coupled by transfers from upgradient tesserae. Studies of controls of ecological systems need to include inter-unit influences as well as internal mechanisms.
Introduction; Summary Measures of Goodness-of-Fit; Logistic Regression Diagnostics; Assessment of Fit via External Validation; Interpretation and Presentation of the Results from a Fitted Logistic Regression Model; Exercises