Andrew O. Finley’s research while affiliated with Michigan State University and other places

Publications (191)


Figure 2: Analysis of spatial effect $w$. (Left): Posterior 95% credible interval widths for $w$ from simulated datasets. (Right): Error in estimates $w^*$ for $w$, calculated as $w^* - w$.
Figure 3: Posterior samples for $w^*(s)$ minus the true spatial effect $w(s)$, $\Delta(s) = w^*(s) - w(s)$. The 2.5, 50, and 97.5 percentiles of $\Delta(s)$ are shown in each column as $\Delta(s)_{2.5}$, $\Delta(s)_{50}$, and $\Delta(s)_{97.5}$, respectively.
Figure 6: (Top): Predicted biomass values for all GEDI locations. (Bottom): Standard error of predicted biomass values.
Figure A4: Model comparison for GEDI holdout data. (Left): Predicted biomass credible interval widths for cNNGP and NNGP. (Right): Error in biomass estimates calculated as predicted (posterior mean) minus true biomass values.
Clustering the Nearest Neighbor Gaussian Process
  • Preprint

January 2025 · 6 Reads

Ashlynn Crisp · Andrew O. Finley

Gaussian processes are ubiquitous as the primary tool for modeling spatial data. However, the Gaussian process is limited by its $\mathcal{O}(n^3)$ cost, making direct parameter fitting algorithms infeasible for the scale of modern data collection initiatives. The Nearest Neighbor Gaussian Process (NNGP) was introduced as a scalable approximation to dense Gaussian processes which has been successful for $n \sim 10^6$ observations. This project introduces the \textit{clustered Nearest Neighbor Gaussian Process} (cNNGP) which reduces the computational and storage cost of the NNGP. The accuracy of parameter estimation and reduction in computational and memory storage requirements are demonstrated with simulated data, where the cNNGP provided comparable inference to that obtained with the NNGP, in a fraction of the sampling time. To showcase the method's performance, we modeled biomass over the state of Maine using data collected by the Global Ecosystem Dynamics Investigation (GEDI) to generate wall-to-wall predictions over the state. In 16% of the time, the cNNGP produced nearly indistinguishable inference and biomass prediction maps to those obtained with the NNGP.
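
As a rough illustration of the cost contrast the abstract describes, the sketch below compares a dense Gaussian process log-likelihood, which needs an O(n^3) Cholesky factorization, with a nearest-neighbor factorization in which each observation conditions on a single preceding neighbor. This is not the cNNGP or the authors' implementation. It is a toy one-dimensional case under stated assumptions: a zero-mean process, an exponential covariance with hypothetical parameters `sigma2` and `phi`, and strictly increasing locations. The function names are invented for the example. In this special case the process is Markov, so the one-neighbor factorization reproduces the dense likelihood exactly; in general a nearest-neighbor factorization is only an approximation.

```python
import math

def exp_cov(d, sigma2, phi):
    """Exponential covariance at distance d (assumed form for this sketch)."""
    return sigma2 * math.exp(-phi * d)

def dense_loglik(s, y, sigma2, phi):
    """Zero-mean GP log-likelihood via a dense Cholesky factor: O(n^3) time."""
    n = len(s)
    C = [[exp_cov(abs(s[i] - s[j]), sigma2, phi) for j in range(n)]
         for i in range(n)]
    # Cholesky factorization C = L L^T.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    # Forward substitution: solve L z = y.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)

def nn_loglik(s, y, sigma2, phi):
    """One-neighbor factorization: y[i] conditions on y[i-1] only. O(n) time.

    Assumes s is strictly increasing, so the nearest preceding
    location of s[i] is s[i-1].
    """
    ll = -0.5 * (math.log(2.0 * math.pi * sigma2) + y[0] ** 2 / sigma2)
    for i in range(1, len(s)):
        rho = math.exp(-phi * (s[i] - s[i - 1]))   # correlation with neighbor
        mean = rho * y[i - 1]                      # conditional mean
        var = sigma2 * (1.0 - rho * rho)           # conditional variance
        ll += -0.5 * (math.log(2.0 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return ll

# Made-up locations and observations, for illustration only.
s_demo = [0.0, 0.3, 0.7, 1.5, 2.0, 3.1]
y_demo = [0.2, -0.1, 0.4, 1.0, 0.6, -0.3]
gap = abs(dense_loglik(s_demo, y_demo, 2.0, 1.5) - nn_loglik(s_demo, y_demo, 2.0, 1.5))
print("absolute difference between dense and one-neighbor log-likelihoods:", gap)
```

With more neighbors and locations in two or more dimensions, each conditional distribution requires solving a small linear system whose size is the number of neighbors, which is where the per-observation cost of a nearest-neighbor approach comes from.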


Model estimated mean carbon by county and year.
(a) Model estimated carbon trend (Mg/ha/yr). (b) Counties from (a) that have 95% credible intervals that exclude zero. (c) Model estimated carbon trend (Tg/yr). (d) Counties from (c) that have 95% credible intervals that exclude zero.
Direct and model carbon estimates along with TCC values for counties with the greatest negative and positive values given in table 1. The county and year sample size used for the direct estimate is given across the top of each subpanel. Estimate means and medians are shown as points with 95% confidence and credible interval bars for direct and model estimates, respectively. When the sample size is ⩽2 or all observations are zero, the direct estimate is not available, e.g. Escambia, Alabama in 2012 and Lake, California in 2008, respectively.
Toward spatio-temporal models to support national-scale forest carbon monitoring and reporting

December 2024 · 46 Reads

Elliot S Shannon · Andrew O Finley · [...]

National forest inventory (NFI) programs provide vital information on forest parameters’ status, trend, and change. Most NFI designs and estimation methods are tailored to estimate status over large areas but are not well suited to estimate trend and change, especially over small spatial areas and/or over short time periods (e.g. annual estimates). Fine-scale space-time indexed estimates are critical to a variety of environmental, ecological, and economic monitoring efforts. In the United States, for example, NFI data are used to estimate forest carbon status, trend, and change to support national, state, and local user group needs. Increasingly, these users seek finer spatial and temporal scale estimates to evaluate existing land use policies and management practices, and inform future activities. Here we propose a spatio-temporal Bayesian small area estimation modeling framework that delivers statistically valid estimates with complete uncertainty quantification for status, trend, and change. The framework accommodates a variety of space and time dependency structures, and we detail model configurations for different settings. The proposed framework is used to quantify forest carbon dynamics at an annual county-level across a 14 year period for the contiguous United States. Also, using an analysis of simulated data, we compare the proposed framework with traditional NFI estimators and offer computationally efficient algorithms, software, and data to reproduce results for benchmarking.



Calibrating Satellite Maps With Field Data for Improved Predictions of Forest Biomass

November 2024 · 1 Read · 2 Citations

Environmetrics

Spatially explicit quantification of forest biomass is important for forest‐health monitoring and carbon accounting. Direct field measurements of biomass are laborious and expensive, typically limiting their spatial and temporal sampling density and therefore the precision and resolution of the resulting inference. Satellites can provide biomass predictions at a far greater density, but these predictions are often biased relative to field measurements and exhibit heterogeneous errors. We developed and implemented a coregionalization model between sparse field measurements and a predictive satellite map to deliver improved predictions of biomass density at a 1 km resolution throughout the Pacific states of California, Oregon and Washington. The model accounts for zero‐inflation in the field measurements and the heterogeneous errors in the satellite predictions. A stochastic partial differential equation approach to spatial modeling is applied to handle the magnitude of the satellite data. The spatial detail rendered by the model is much finer than would be possible with the field measurements alone, and the model provides substantial noise‐filtering and bias‐correction to the satellite map.


Model-assisted estimation of domain totals, areas, and densities in two-stage sample survey designs

August 2024 · 29 Reads · 1 Citation

Model-assisted, two-stage forest survey sampling designs provide a means to combine airborne remote sensing data, collected in a sampling mode, with field plot data to increase the precision of national forest inventory estimates, while maintaining important properties of design-based inventories, such as unbiased estimation and quantification of uncertainty. In this study, we present a comprehensive set of model-assisted estimators for domain-level attributes in a two-stage sampling design, including new estimators for densities, and compare the performance of these estimators with standard poststratified estimators. Simulation was used to assess the statistical properties (bias, variability) of these estimators, with both simple random and systematic sampling configurations, and indicated that (1) all estimators were generally unbiased and (2) the use of lidar in a sampling mode increased the precision of the estimators at all assessed field sampling intensities, with particularly marked increases in precision at lower field sampling intensities. Variance estimators are generally unbiased for model-assisted estimators without poststratification, while variance estimators for model-assisted estimators with poststratification were increasingly biased as field sampling intensity decreased. In general, these results indicate that airborne remote sensing data, collected as an intermediate level of sampling, can be used to increase the efficiency of national forest inventories in remote regions.


Toward improved uncertainty quantification in predictions of forest dynamics: A dynamical model of forest change

July 2024 · 53 Reads

Models of forest dynamics are an important tool to understand and predict forest responses to global change. Despite recent model development, predictions of forest dynamics under global change remain highly variable, reflecting uncertainty in future conditions, forest demographic processes, and the data used to parameterize and validate models. Quantifying this uncertainty and accounting for it when making adaptive management decisions is critical to our ability to conserve forest ecosystems in the face of rapidly changing conditions. Dynamical spatiotemporal models (DSTMs) are a particularly powerful tool in this setting given they quantify and partition uncertainty in demographic models and noisy forest observations, propagate uncertainty to predictions of forest dynamics, and support refinement of predictions based on new data and improved ecological understanding. A major challenge to the application of DSTMs in applied forest ecology has been the lack of a scalable, theoretical model of forest dynamics that generates predictions at the stand level, the scale at which management decisions are made. We address this challenge by integrating a matrix projection model motivated by the well-known McKendrick-von Foerster partial differential equation for size-structured population dynamics within a Bayesian hierarchical DSTM informed by continuous forest inventory data. The model provides probabilistic predictions of species-specific demographic rates and changes in the size-species distribution over time. The model is applied to predict long-term dynamics (60+ years) within the Penobscot Experimental Forest in Maine, USA, quantifying and partitioning uncertainty in inventory observations, process-based predictions, and model parameters for nine Acadian Forest species. We find that uncertainty in inventory observations drives variability in predictions for most species and limits the inclusion of ecological detail within the DSTM. We conclude with a discussion of how DSTMs can be used to reduce uncertainty in predictions of forest dynamics under global change through informed model refinement and the assimilation of multiple forest data sources.


Spatio-temporal areal models to support small area estimation: An application to national-scale forest carbon monitoring

July 2024 · 40 Reads

National Forest Inventory (NFI) programs can provide vital information on the status, trend, and change in forest parameters. These programs are being increasingly asked to provide forest parameter estimates for spatial and temporal extents smaller than their current design and accompanying design-based methods can deliver with desired levels of uncertainty. Many NFI designs and estimation methods focus on status and are not well equipped to provide acceptable estimates for trend and change parameters, especially over small spatial domains and/or short time periods. Fine-scale space-time indexed estimates are critical to a variety of environmental, ecological, and economic monitoring efforts. Estimates for forest carbon status, trend, and change are of particular importance to international initiatives to track carbon dynamics. Model-based small area estimation (SAE) methods for NFI and similar ecological monitoring data typically pursue inference on status within small spatial domains, with few demonstrated methods that account for spatio-temporal dependence needed for trend and change estimation. We propose a spatio-temporal Bayesian model framework that delivers statistically valid estimates with full uncertainty quantification for status, trend, and change. The framework accommodates a variety of space and time dependency structures, and we detail model configurations for different settings. Through analysis of simulated datasets, we compare the relative performance of candidate models and a traditional direct estimator. We then apply candidate models to a large-scale NFI dataset to demonstrate the utility of the proposed framework for providing unique quantification of forest carbon dynamics in the contiguous United States. We also provide computationally efficient algorithms, software, and data to reproduce our results and for benchmarking.


Calibrating satellite maps with field data for improved predictions of forest biomass

July 2024 · 11 Reads

Spatially explicit quantification of forest biomass is important for forest-health monitoring and carbon accounting. Direct field measurements of biomass are laborious and expensive, typically limiting their spatial and temporal sampling density and therefore the precision and resolution of the resulting inference. Satellites can provide biomass predictions at a far greater density, but these predictions are often biased relative to field measurements and exhibit heterogeneous errors. We developed and implemented a coregionalization model between sparse field measurements and a predictive satellite map to deliver improved predictions of biomass density at a 1-by-1 km resolution throughout the Pacific states of California, Oregon and Washington. The model accounts for zero-inflation in the field measurements and the heterogeneous errors in the satellite predictions. A stochastic partial differential equation approach to spatial modeling is applied to handle the magnitude of the satellite data. The spatial detail rendered by the model is much finer than would be possible with the field measurements alone, and the model provides substantial noise-filtering and bias-correction to the satellite map.


Spatial Prediction of Diameter Distributions for the Alpine Protection Forests in Ebensee, Austria, Using ALS/PLS and Spatial Distributional Regression Models

June 2024 · 55 Reads · 1 Citation

A novel Bayesian spatial distributional regression model is presented to predict forest structural diversity in terms of the distributions of the stem diameter at breast height (DBH) in the protection forests in Ebensee, Austria. The distributional regression approach overcomes the limitations and uncertainties of traditional regression modeling, in which the conditional mean of the response is regressed against explanatory variables. Distributional regression instead addresses the complete conditional response distribution. In total, 36,338 sample trees were measured via a handheld mobile personal laser scanning system (PLS) on 273 sample plots, each having a 20 m radius. Recent airborne laser scanning (ALS) data were used to derive regression covariates from the normalized digital vegetation height model (DVHM) and the digital terrain model (DTM). Candidate models were constructed that differed in their linear predictors of the two gamma distribution parameters. In the distributional regression approach, covariates can enter the model in a flexible form, such as via nonlinear smooth curves, cyclic smooths, or spatial effects. Supported by Bayesian diagnostics DIC and WAIC, nonlinear smoothing splines outperformed linear parametric slope coefficients, and the best implementation of spatial structured effects was achieved by a Gaussian process smooth. Model fitting and posterior parameter inference were achieved by using full Bayesian methodology and MCMC sampling algorithms implemented in the R-package BAMLSS. With BAMLSS, spatial interval predictions of the DBH distribution at any new geo-locations were enabled via straightforward access to the posterior predictive distributions of the model terms and by offering simple plug-in solutions for new covariate values. A cross-validation analysis validated the robustness of the proposed method’s parameter estimation and out-of-sample prediction. Spatial predictions of stem count proportions per DBH class revealed that regeneration of smaller trees was lacking in certain areas of the protection forest landscape. Therefore, the intensity of final felling needs to be increased to reduce shading from the dense, overmature shelter trees and to promote sunlight for the young regeneration trees.


Species‐specific effects of forest cover on density (a) and relationship between detection probability and distance from the observer (b) in the central Florida bird case study. Panel (a) shows the estimated mean (dark line), 50% credible interval (box), and 95% credible interval (whiskers) for the effect of forest cover on the overall community (COMM) and 16 individual species. In Panel (b), lines show the posterior mean detection probabilities for each species. The black line represents the average across the community (i.e. the community‐level effect), and the grey region is the associated 95% credible interval. See Supplemental Information S1 for species code definitions.
Data and predictions from the forest biomass case study. Panel (a) shows the observed locations of the 86,933 Forest Inventory and Analysis plots. Note these are the publicly available perturbed locations in which FIA adds a small amount of random noise to the true plot locations. Panel (b) shows the estimated random effect of tree canopy cover on forest biomass within distinct ecoregions. Panel (c) shows predicted biomass (posterior median) across the continental USA (tons per acre), with associated uncertainty (95% credible interval [CI] width) depicted in Panel (d).
spAbundance: An R package for single‐species and multi‐species spatially explicit abundance models

May 2024 · 1,614 Reads · 5 Citations

Numerous modelling techniques exist to estimate abundance of plant and animal populations. The most accurate methods account for multiple complexities found in ecological data, such as observational biases, spatial autocorrelation, and species correlations. There is, however, a lack of user‐friendly and computationally efficient software to implement the various models, particularly for large data sets. We developed the spAbundance R package for fitting spatially explicit Bayesian single‐species and multi‐species hierarchical distance sampling models, N‐mixture models, and generalized linear mixed models. The models within the package can account for spatial autocorrelation using Nearest Neighbour Gaussian Processes and accommodate species correlations in multi‐species models using a latent factor approach, which enables model fitting for data sets with large numbers of sites and/or species. We provide three vignettes and three case studies that highlight spAbundance functionality. We used spatially explicit multi‐species distance sampling models to estimate density of 16 bird species in Florida, USA, an N‐mixture model to estimate black‐throated blue warbler (Setophaga caerulescens) abundance in New Hampshire, USA, and a spatial linear mixed model to estimate forest above‐ground biomass across the continental USA. spAbundance provides a user‐friendly, formula‐based interface to fit a variety of univariate and multivariate spatially explicit abundance models. The package serves as a useful tool for ecologists and conservation practitioners to generate improved inference and predictions on the spatial drivers of abundance in populations and communities.


Citations (53)


... It highlighted the need for local models and the integration of GEDI with Sentinel-1, ALOS-2 PALSAR-2, and Sentinel-2 data for accurate mapping. A coregionalization model was developed to combine sparse field data with satellite maps, enhancing biomass density predictions at a 1 km² resolution in the Pacific states of the USA, addressing zero-inflation and heterogeneous errors for better accuracy and spatial detail (67,68). ...

Reference:

A critical review of exploring the recent trends and technological advancements in forest biomass estimation
Calibrating Satellite Maps With Field Data for Improved Predictions of Forest Biomass
  • Citing Article
  • November 2024

Environmetrics

... In a novel approach, we used multi-species hierarchical distance sampling in a Bayesian framework with custom Markov Chain Monte Carlo samplers implemented in package spAbundance (Doser et al., 2023) to analyse the effect of pre- and post-disturbance management and post-disturbance forest succession on abundance patterns in bird communities. We considered only species with a minimum of 10 recorded individuals (Doser et al., 2023), which resulted in a species set of 37 species. ...

spAbundance: An R package for single‐species and multi‐species spatially explicit abundance models

... To meet these evolving user needs, estimation methods for inference on small areas, known as small area estimation (SAE), have recently been applied to NFI data [9][10][11][12][13]. Here, 'small area' refers to any domain of interest that contains too few observations to deliver accurate direct (i.e. ...

Models to Support Forest Inventory and Small Area Estimation Using Sparsely Sampled LiDAR: A Case Study Involving G-LiHT LiDAR in Tanana, Alaska
  • Citing Article
  • March 2024

Journal of Agricultural Biological and Environmental Statistics

... More broadly, these phenomena result in residual spatiotemporal variability (non-stationarity) in the occurrences and abundances of marine organisms that is unexplained by local environmental conditions but can nonetheless be associated with indices such as the El Niño Southern Oscillation (ENSO), AMO, or sea ice extent. Where sufficient data exist (Doser et al. 2024), these effects can be approximated inside an expanded correlative SDM framework using spatially varying coefficients (SVCs; Hastie and Tibshirani 1993; Thorson 2019a, 2019b; Thorson et al. 2023). Yet, few studies in marine or aquatic systems leverage SVCs: in nearly 3000 SDM studies in aquatic ecosystems, we found only 10 (< 0.5%) employing SVCs to model species range shifts, yet 38 mention oceanographic indices and 100 mention biogeographic or movement barriers in their abstract or keywords. ...

Guidelines for the use of spatially varying coefficients in species distribution models

Global Ecology and Biogeography

... For example, in hidden Markov models (HMMs; Rabiner, 1989), the observables are indeed independent conditionally on the latent indicators, but the latent indicators themselves form a Markov chain: the probability of the current state depends on the previous state(s). Other types of mixture models might exhibit other kinds of dependencies between observables or states; depending on the exact nature of these dependencies, other forms of factorizations might be possible (for examples, see Ambroise, Dang, & Govaert, 1997; Hadj-Amar et al., 2023; May, Finley, & Dubayah, 2024; Samé, 2020). In any case, estimating mixture parameters and mixture memberships requires evaluating the likelihood density. ...

A Spatial Mixture Model for Spaceborne Lidar Observations Over Mixed Forest and Non-forest Land Types
  • Citing Article
  • January 2024

Journal of Agricultural Biological and Environmental Statistics

... Additionally, spatially varying coefficient (SVC) models have been applied to reveal ecological relationships with spatial variations among different mammalian species (Pease et al. 2022). These methods have been successfully used to evaluate spatially varying trends for forest birds, assess the spatially heterogeneous impact of land cover on grasshopper sparrow occurrence, and quantify the probability of occurrence of grassland bird species in the United States (Doser et al. 2024a, 2024b). SVC models, which are similar to geographically weighted models, are used to construct a series of local spatially varying models to handle spatial heterogeneity. ...

Modeling Complex Species-Environment Relationships Through Spatially-Varying Coefficient Occupancy Models

Journal of Agricultural Biological and Environmental Statistics

... Finally, we addressed the problem of geolocation error of GEDI measurements, which some studies consider to be the main factor affecting GEDI accuracy (Milenković et al., 2017; Roy et al., 2021), by subsetting analysed GEDI data by preprocessed forest subcompartments and collocating them with simulated waveforms. However, methods to minimize geolocation error require precise ALS point clouds (Shannon et al., 2024) or high-resolution DTM (Schleich et al., 2023), which limits their application to areas where such data is available. An obvious limitation of the application of GEDI forest height measurements is mission coverage, which is confined between 51.6°N and 51.6°S latitudes, meaning that wide areas of boreal forests in the northern hemisphere are not covered by GEDI data (e.g. ...

Quantifying and correcting geolocation error in spaceborne LiDAR forest canopy observations using high spatial accuracy data: A Bayesian model approach

Environmetrics

... The methodology for the simulation study is presented in section S2 and is based on a population generated using fixed and known parameters that mimic qualities of the FIA annual county-level carbon data. The simulation is designed to evaluate estimator performance at the county-level, which is the level of inferential interest (rather than at the unit-level as pursued in other related studies [51]). Direct and model estimates for carbon status, trend, and change are generated from each of a large number of independent samples taken from the simulated population. ...

An approach to estimating forest biomass while quantifying estimate uncertainty and correcting bias in machine learning maps
  • Citing Article
  • September 2023

Remote Sensing of Environment

... More technically, eDNA sampling makes it more feasible to collect multiple sample replicates, which would allow combining a JSDM with a detection model to account for observation error (Guillera-Arroita et al. 2017, Tobler et al. 2019, Doser et al. 2023, Diana et al. 2024, Hartig et al. 2024). Also, only two species in our dataset showed strong signals of dispersal limitation (Fig. 2), but this low number could be because near-neighbour ponds were not sampled in our dataset, removing the possibility of detecting fine-scale spatial autocorrelation and thereby possibly reducing the relative importance of dispersal that would support source-sink relations among closely adjacent ponds. ...

Joint species distribution models with imperfect detection for high‐dimensional spatial data

... Table A1 in Appendix B also shows that, even for large offsets in each direction, the obtained R 2 is still low. Moreover, Shannon et al. (2022) [62] found that extending the limitation from 10 m to 20 m improves the geolocation's accuracy. This suggests that, for even better co-location, a higher maximum offset should be considered. ...

Quantifying and correcting geolocation error in sampling LiDAR forest canopy observations using high spatial accuracy ALS: A case study involving GEDI