Article

How to Evaluate Models: Observed vs. Predicted or Predicted vs. Observed?

Authors:
  • Facultad de Agronomía, Universidad de Buenos Aires- IFEVA- CONICET, and Facultad de Agronomía, Universidad de la República, Uruguay

Abstract

A common and simple approach to evaluating models is to regress predicted vs. observed values (or vice versa) and compare the slope and intercept parameters against the 1:1 line. However, a review of the literature suggests there is no consensus on which variable (predicted or observed) should be placed on each axis. Although some researchers consider the two choices equivalent, probably because r² is the same for both regressions, the intercept and the slope of each regression differ and, in turn, may change the result of the model evaluation. We present mathematical evidence showing that regressing predicted (on the y-axis) vs. observed data (on the x-axis) (PO) to evaluate models is incorrect and leads to an erroneous estimate of the slope and intercept. In other words, a spurious effect is added to the regression parameters when regressing PO values and comparing them against the 1:1 line. Observed (on the y-axis) vs. predicted (on the x-axis) (OP) regressions should be used instead. We also sh
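The difference between the two regressions described in the abstract can be illustrated with a minimal sketch. The data below are made up for illustration: the observations equal the predictions plus zero-mean scatter that is uncorrelated with the predictions, i.e. an unbiased model. The helper name `ols` is hypothetical, not something from the paper.

```python
def ols(x, y):
    """Least-squares regression of y on x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Illustrative (made-up) data: an unbiased model whose observations scatter
# around the predictions with zero-mean, uncorrelated error.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scatter = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]
observed = [p + e for p, e in zip(predicted, scatter)]

# OP: observed on the y-axis, predicted on the x-axis
op_slope, op_intercept, op_r2 = ols(predicted, observed)
# PO: predicted on the y-axis, observed on the x-axis
po_slope, po_intercept, po_r2 = ols(observed, predicted)

print(f"OP: slope={op_slope:.3f}, intercept={op_intercept:.3f}, r2={op_r2:.3f}")
print(f"PO: slope={po_slope:.3f}, intercept={po_intercept:.3f}, r2={po_r2:.3f}")
```

In this constructed case the OP regression recovers the 1:1 line, while the PO regression yields a slope below 1 and a positive intercept even though the model is unbiased; r² is identical for both.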


... The percentage contribution (2), referred to as the contribution ratio or ρ, is calculated by dividing the total sum of squared deviations (SS_P) by the total sum of squares (SS_T) and determines the impact of a particular factor on the process being tested [16]: ...
... The predicted AUDPC values were linearly regressed against the AUDPC values from the original experiments. The fitted model, statistical significance (P = 0.05) and R² are presented, with the original (explanatory variable) values on the y-axis and the predicted (response variable) values on the x-axis, as advocated in Piñeiro et al. (2008) [16]. Linear regression and ANOVA were done using Genstat 22nd edition (VSN International Ltd, UK). ...
Article
Full-text available
BACKGROUND: Identifying robust integrated pest management (IPM) strategies requires testing multiple factors at the same time and assessing their combined effects, e.g., on disease control. This makes field‐based experiments large, resource intensive and expensive. Hence, there are limits to the number of treatment combinations that can be practically tested under field conditions. The Taguchi approach to design of experiments (DOE) is commonly employed to enhance the quality of industrial products. It uses smaller experiments than classical DOE, but its applicability to late blight research, and to agricultural research more broadly, has not been widely evaluated. RESULTS: Two existing datasets, following the same protocol and investigating the effectiveness of different IPM treatments to control late blight, caused by Phytophthora infestans, on potato, were used to test the Taguchi approach. Disease severity was quantified as area under the disease progress curve (AUDPC). The method could accurately predict the performance of a cultivar and fungicide‐based integrated disease management strategy from a small dataset and identified cultivar as a key factor for disease control. Linear regression demonstrated a strong and statistically significant relationship between AUDPC values collected during the original experiments and the predicted disease severity values generated using the Taguchi method. CONCLUSIONS: The Taguchi approach can accurately predict disease severity, with predicted values similar to those collected during the original experiments. Moreover, associated analyses identified the most effective treatment combinations and the factors that exert the greatest influence on disease control. The relevance of this approach when designing and interpreting IPM strategies is discussed. © 2025 Society of Chemical Industry.
... Most often, a common approach to measure the performance of a quantitative model is to plot the scatter diagram of predicted and observed values, fit them using a simple linear regression model Y_observed = a·Y_predicted + b, and then compare the slope and intercept parameters with the 1:1 line (Piñeiro et al., 2008). In this simple linear regression, if the least squares method is used for parameter estimation, the square of the PCC value between the independent variable and dependent variable (corresponding to the predicted values and observed values in the original quantitative prediction model, respectively) is exactly equal to the R² score of this simple linear regression model (not the R² score of the original quantitative prediction model). ...
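The distinction drawn in this snippet, between the squared PCC and the R² score of the prediction model itself, can be sketched as follows. The data are made up, the function names are hypothetical, and the model R² score is assumed here to be the usual 1 − SS_res/SS_tot computed directly from predictions, which the snippet does not spell out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def model_r2_score(observed, predicted):
    """R2 score of the predictions themselves: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up example: predictions are perfectly correlated with the observations
# but strongly biased (predicted = 2 * observed + 1).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0 * o + 1.0 for o in observed]

pcc_squared = pearson_r(observed, predicted) ** 2
score = model_r2_score(observed, predicted)

print(f"PCC^2 = {pcc_squared:.3f}, model R2 score = {score:.3f}")
```

Here the squared PCC is 1 while the model R² score is strongly negative, which is why a high correlation alone does not show that predictions lie on the 1:1 line.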
... For example, in crop breeding scenarios, THR@P% or BHR@P% may be more suitable than PCC for measuring the performance of the model, as breeders are more concerned with how to select the top-K individuals or eliminate the bottom-K individuals. It is recommended to employ a combination of multiple metrics such as MAE, RMSE, R² score, NDCG and root mean squared deviation (RMSD) (Piñeiro et al., 2008) rather than using PCC as the sole metric to assess the accuracy of a quantitative trait prediction model. In addition, the Bland-Altman method (Bland and Altman, 2003) and visual assessment, such as a scatter plot of predicted and observed values, are also valuable supplements for evaluating the accuracy of the model (Piñeiro et al., 2008). ...
... It is recommended to employ a combination of multiple metrics such as MAE, RMSE, R² score, NDCG and root mean squared deviation (RMSD) (Piñeiro et al., 2008) rather than using PCC as the sole metric to assess the accuracy of a quantitative trait prediction model. In addition, the Bland-Altman method (Bland and Altman, 2003) and visual assessment, such as a scatter plot of predicted and observed values, are also valuable supplements for evaluating the accuracy of the model (Piñeiro et al., 2008). To improve operability in practical applications, clear guidance and detailed steps on how to select and apply evaluation metrics in several typical scenarios are provided (Supplementary Table S42). ...
Article
Full-text available
How to evaluate the accuracy of quantitative trait prediction is crucial to choose the best model among several possible choices in plant breeding. Pearson’s correlation coefficient (PCC), serving as a metric for quantifying the strength of the linear association between two variables, is widely used to evaluate the accuracy of the quantitative trait prediction models, and generally performs well in most circumstances. However, PCC may not always offer a comprehensive view of predictive accuracy, especially in cases involving nonlinear relationships or complex dependencies in machine learning-based methods. It has been found that many papers on quantitative trait prediction solely use PCC as a single metric to evaluate the accuracy of their models, which is insufficient and limited from a formal perspective. This study addresses this crucial issue by presenting a typical example and conducting a comparative analysis of PCC and nine other evaluation metrics using four traditional methods and four machine learning-based methods, thereby contributing to the improvement of practical applicability and reliability of plant quantitative trait prediction models. It is recommended to employ PCC in conjunction with other evaluation metrics in a targeted manner based on specific application scenarios to reduce the likelihood of drawing misleading conclusions.
... These stations were selected because they had comparatively more extensive data coverage for the study period, ensuring a representative sample across various agroclimatic zones. The validation process involved comparing the satellite data with ground-based observations using statistical metrics such as root mean square error (RMSE), mean error (ME), coefficient of determination (R²), and Nash-Sutcliffe efficiency coefficient (NSE) (Hodson 2022; Piñeiro et al. 2008; Singh and Chowdhury 2007). The R packages "Metrics" and "hydroGOF" (Hamner and Frasco 2018; Zambrano-Bigiarini 2020) were used to carry out the statistical analyses. ...
... The R packages "Metrics" and "hydroGOF" (Hamner and Frasco 2018; Zambrano-Bigiarini 2020) were used to carry out the statistical analyses. According to Piñeiro et al. (2008), the following equation is used to calculate the coefficient of determination (R²): ...
Article
Full-text available
This research investigates rainfall variability and drought patterns in West Africa and their consequential impacts on rainfed agriculture, with a particular focus on vulnerability linked to weather extremes. Utilizing NASA POWER/Agro-climatology data, cross-validated against observed meteorological records in the targeted countries, this study spans the years 1981 to 2021, with a particular focus on Ghana and Burkina Faso. The Standardized Precipitation Index (SPI), Standardized Precipitation Evapotranspiration Index (SPEI), and different statistical methods were employed to evaluate the variations in rainfall, including intensity and frequency, as well as analyze drought patterns in the study areas. Despite increased rainfall in the last decade, seasonal and decadal shifts have been noticed, and drought and irregular patterns still threaten the study areas. Temporal analysis reveals fluctuations in temperature and rainfall. SPI and SPEI results indicated a decline in drought frequency, aligned with global trends, though the monthly scale showed no evident decline. The spatial analysis highlights regional variations in rainfall and drought dynamics. The study emphasizes the importance of region-specific mitigation and adaptation strategies, providing valuable insights for informed decision-making in West Africa's agriculture and water resource management under climate change. The findings underscore the continued threat of irregular rainfall patterns and drought, emphasizing the need for tailored approaches to address these challenges.
... Evaluating model predictions is an important part of research evaluation. Piñeiro et al. (2008) analyzed how to correctly use regression analysis to evaluate model accuracy. In this paper, we follow the suggestion of Piñeiro et al. (2008) and construct an observed vs. predicted regression model to carry out the model evaluation. ...
... Piñeiro et al. (2008) analyzed how to correctly use regression analysis to evaluate model accuracy. In this paper, we follow the suggestion of Piñeiro et al. (2008) and construct an observed vs. predicted regression model to carry out the model evaluation. The summary of parameters and metric values is given in Table 4. ...
Article
Full-text available
Objective: To investigate the nonlinear relationship between Gross Domestic Product (GDP) and Foreign Direct Investment (FDI) in Malaysia, with the aim of providing insights into their bidirectional interactions. Theoretical Framework: The main concepts and theories that underpin the research are nonlinear regression techniques and economic growth models. These frameworks provide a solid basis for understanding the dynamic interaction between GDP and FDI in the context of Malaysia's economy. Method: The methodology comprises writing Scilab code to analyze nonlinear regression models. Malaysia's economic data on GDP and FDI were used as inputs for the analysis. The study involves modeling the dynamics between these indicators and evaluating their relationship over time. Results and Discussion: The results revealed a significant nonlinear relationship between GDP and FDI in Malaysia. These results are contextualized within the theoretical framework, highlighting the bidirectional nature of the relationship. Possible discrepancies and limitations, including data constraints and assumptions of the nonlinear models, are also considered. Research Implications: This research offers valuable insights for policymakers, helping shape economic planning and investment strategies. It highlights the importance of FDI in promoting sustainable economic growth and industrial development towards United Nations SDG 8 and SDG 9. Originality/Value: This study contributes by using nonlinear regression techniques and Scilab programming to analyze the complex relationship between GDP and FDI, an approach not widely applied to Malaysia's context. The relevance and value of this research are evidenced by its potential impact on economic policy and its contributions to a deeper understanding of Malaysia's economic development.
... This reference line signifies an ideal scenario where the model's predictions and actual values perfectly coincide. When data points cluster closely around this line, it signifies the model's precision in making predictions [51]. In Figure 6, the predicted and actual compressive strength values derived from the MLR model are compared. ...
... To evaluate the spatial error structure of the models concerning uncertainty in variogram (Marchant and Lark 2004) across all scenarios, we divided our data into two parts during the model development: 70% for training and 30% for validation. We employed root mean squared error (RMSE) and mean absolute error (MAE) to validate the fitted model by comparing predicted and observed values (Piñeiro et al. 2008). RMSE measures the square root of the average squared differences between predicted and observed values. ...
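The two error metrics named in this snippet can be sketched in a few lines. This is a minimal illustration with made-up numbers and hypothetical function names; it does not reproduce the cited study's 70/30 split or its data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute differences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Made-up validation values, for illustration only
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [1.0, 4.0, 8.0, 8.0]

rmse_value = rmse(observed, predicted)
mae_value = mae(observed, predicted)

print(f"RMSE = {rmse_value:.4f}, MAE = {mae_value:.4f}")
```

Because RMSE squares the differences before averaging, it weights large errors more heavily than MAE, so reporting both gives some indication of how uneven the errors are.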
Article
Full-text available
Understanding phosphate distribution and its interactions with heavy metals using multivariate and geostatistical analyses is essential for sustainable water resources management in semi-arid basins. This study aims to investigate the potential sources of phosphate and its relationship with heavy metals in shallow groundwater over sixty-three (63) sampling locations during wet, dry, and intermediate seasons. Physical parameters (TDS, EC, DO, pH, and turbidity) were analysed in situ using hand-held meters, while phosphates and heavy metals were analysed in the laboratory. Results were further analysed using Several Samples Anova (i.e., Kruskal–Wallis Test), Ordinary Kriging (OK), Correlation (Pearson’s (r)) and Principal Component Analysis (PCA). Results revealed a significant difference in pH level and dissolved oxygen (DO) concentration (p ≤ 0.001). However, based on seasonal analysis, there is no significant difference in phosphate, EC, TDS, and turbidity (p ≥ 0.001); DO and phosphate concentrations are above WHO and Nigeria's drinking water quality guidelines. Pearson’s Correlation (r) revealed a positive relationship between phosphate, vanadium, nickel, and cobalt. In contrast, phosphate correlates weakly and negatively with heavy metals (Zn, Cd, Ti, Ba, Cu, Ni, As, Co, Pb, Mn, Cr, and Al). The weak negative correlation between phosphate and Cr, Cd, and Zn suggests phosphate can increase metal immobilisation via precipitation as insoluble metal-phosphate compounds. The positive correlation between phosphate and heavy metals (Co and Ni) suggests competing geochemical processes. The PCA revealed that most of the variability in phosphate concentration is explained by the seasonality, which accounts for 88.196% of the variance. Shallow groundwater classification using ordinary kriging (OK) revealed poor-quality water. Phosphate concentrations exceed the WHO reference guideline value (≤ 1.00 mg/l). Theoretically, this study has increased our understanding of how high phosphate concentrations correlate with heavy metals in shallow groundwater aquifers. This study’s results lay the foundation for future studies over broader geographical and temporal scopes, increasing coverage of water quality and sustainable drinking water quality management in semi-arid environments. Thus, a sustainable water quality management framework has been proposed to guide future assessment and the effective implementation of remediation strategies for water pollution in semi-arid regions.
... The coefficient of variation (CV) of the predicted soil biogeochemistry attributes was calculated from 1000 parallel runs of the Random Forest model. We followed the method of Piñeiro et al. [63] to validate the modeling approach by returning the predicted values (y-axis) vs. the observed values (x-axis). All the gridded data were aggregated to 1° × 1° spatial resolution. ...
Article
Full-text available
Fires alter the stability of organic matter and promote soil erosion which threatens the fundamental coupling of soil biogeochemical cycles. Yet, how soil biogeochemistry and its environmental drivers respond to fire remain virtually unknown globally. Here, we integrate experimental observations and random forest model, and reveal significant divergence in the responses of soil biogeochemical attributes to fire, including soil carbon (C), nitrogen (N), and phosphorus (P) contents worldwide. Fire generally decreases soil C, has non-significant impacts on total N, while it increases the contents of inorganic N and P, with some effects persisting for decades. The impacts of fire are most strongly negative in cold climates, conifer forests, and under wildfires with high intensity and frequency. Our work provides evidence that fire decouples soil biogeochemistry globally and helps to identify high-priority ecosystems where critical components of soil biogeochemistry are especially unbalanced by fire, which is fundamental for the management of ecosystems in a world subjected to more severe, recurrent, and further-reaching wildfires.
... The average value of each metric was then calculated for both training and test datasets. According to Piñeiro et al. (2008), the coefficient of determination (R²) and the significance of the linear regression between measured (y) and predicted (x) N canopy content were calculated. Furthermore, to assess the agreement between predicted (N) and observed (N) data, the root mean square deviation (RMSD) and the index of agreement (d) were calculated as follows: ...
Article
Full-text available
Remote sensing with Unpiloted Aerial Systems can provide information on the Nitrogen status of forage crops more quickly than destructive sampling techniques, which are not compliant with the need for fast and sustainable methodologies to support farmers' decisions on livestock feeding. The study aimed to assess a remote sensing algorithm based on the Canopy Chlorophyll Content Index (CCCI) and the Canopy Nitrogen Index (CNI) to predict the canopy N content of forage crops under Mediterranean rainfed conditions. A dataset from a two-year field experiment on four forage crops, as both pure stands and mixtures, under two different mowing intensities was used to calculate CNI from plant N concentration and aboveground biomass. Multispectral data from an Unpiloted Aerial System were collected during the two-year cropping system to calculate CCCI. The N canopy content was then predicted based on the relationship between CNI and CCCI. A good agreement (RMSD = 4.72 g m⁻², d = 0.92; P < 0.001) between the predicted and observed N canopy content (g m⁻² of N) was found. The estimation of canopy N content improved under high cover of rigid ryegrass (RMSD = 5.56 g m⁻², index of agreement = 0.95) and in frequently mowed plots. Overall, the agreement between observed and predicted N content improved under the threshold of 12.4 g m⁻². The N content of different forage crops can be predicted from the remote-sensed CCCI starting from N dilution curves. The prediction accuracy is influenced by the mowing intensity and the differences in the relative abundance of species, and it is limited over a threshold of N corresponding to a high biomass level. The results can represent a basis for developing decision support tools for livestock farmers for a real-time field estimation of the forage quality in extensively managed grasslands. Further insights are needed to assess the predictive ability in relation to the relative abundance of legumes in mixtures and above the saturation threshold.
... For both hypotheses, the residuals of the regression models were inspected visually to check homoscedasticity (residual plot) and their distribution (Q-Q plot and residual histogram). Following the suggestions of Piñeiro et al. (2008), the observed values were plotted against the predicted values using the intercept as an indicator of model bias. ...
Article
Full-text available
This study examined whether supermarkets can be considered patches in the marginal value theorem (MVT) sense despite their particular features and whether they are models of human food foraging in resource-dense conditions. On the basis of the MVT, the quantitative relationship between gains in the Euro and patch residence time was modeled as an exponential growth function toward an upper asymptote, allowing the choice of an optimal strategy under diminishing returns. N = 61 participants were interviewed about their current shopping trip and contextual variables at a German supermarket and provided data to estimate relevant model parameters. A nonlinear model of the patch residence time and resulting gain based on an exponential function was fitted via nonlinear orthogonal distance regression. The results generally revealed the relationships predicted by the model, with some uncertainty regarding the estimation of the upper asymptote due to a lack of data from participants with long residence times. Despite this limitation, the data support the applicability of the MVT-based model. The results show that approaches from optimal foraging theory, such as the MVT, can be used successfully to model human shopping behavior even when participants’ verbal reports are used.
... Here outliers were masked from the 0.95 quantile of the chi-square distribution with 11 degrees of freedom [69]. The modelling approach was then validated by returning the predicted values (x axis) versus the observed values (y axis) [70]. ...
Article
Full-text available
Dryland grazing sustains millions of people worldwide but, when poorly managed, threatens food security. Here we combine livestock and wild herbivore dung mass data from surveys at 760 dryland sites worldwide, representing independent measurements of herbivory, to generate high-resolution maps. We show that livestock and wild herbivore grazing is globally disconnected, and identify hotspots of herbivore activity across Africa, the Eurasian grasslands, India, Australia and the United States. Wild herbivore dung mass was negatively correlated with total organic nitrogen, yet strong site-level correlations exist between our livestock dung estimates and total soil organic nitrogen. Using dung mass as a proxy of herbivore abundance enables standardized, field-based measures of grazing pressure that account for different herbivore types. This can improve herbivore density modelling and guide better management practices for populations that rely on dryland-grazing livestock for food.
... In all cases, we evaluated the final prediction model by plotting the observed values on the Y-axis and the model-predicted values on the X-axis, following the approach recommended by Piñeiro et al. (2008). In addition, residuals are plotted against fitted values to assess the model's fit and potential bias. ...
Article
Full-text available
Crown trait-based ecology has significantly advanced, yet a substantial knowledge gap remains regarding its predictive power on internal tree morphology and future growth potential across different species. Our research addressed this gap by examining how crown traits of various species manifest in temporal tree growth dynamics by answering three research questions: (QI) Are there general relationships between crown structure and secondary growth across species? (QII) How do crown structure and secondary growth links vary among species and functional groups? (QIII) To what extent does a tree's crown structure explain the variation in secondary growth over different periods? We conducted a comprehensive study to answer these questions, utilizing high-resolution 3D crown structure data from terrestrial laser scanning and growth ring data from tree coring. Our research explored the relationship between crown structure and growth rings in six dominant tree species in Europe (Picea abies (L.) H. Karst., Pinus sylvestris L., Pseudotsuga menziesii (Mirbel) Franco, Larix decidua Mill., Fagus sylvatica L., and Quercus robur L.) across Germany and Spain. These species, representing diverse functional identities, provide a comprehensive spectrum for analysis, ensuring the robustness of our findings. Our findings demonstrated a consistent link between crown structure (explained by projection area, crown length, ratio, slenderness, top-heaviness, etc.) and secondary growth, varying across species and functional identities. Coniferous species (e.g., P. abies, P. menziesii, L. decidua) generally exhibited stronger associations between crown structure and tree ring width than broadleaf species; in contrast, P. sylvestris and Q. robur demonstrated weaker predictive relationships due to their lower crown plasticity and more episodic growth. The current crown structure was a reliable predictor of tree secondary growth over decadal scales. However, stronger relationships emerged when considering the entire tree ring series, suggesting a long-term morphological legacy effect of crown structure. These results show the importance of understanding how specific traits related to different tree species and their functional roles impact tree structure and growth. By better understanding how trees adapt (plasticity) and how their past growth affects future development (crown legacy), we can manage environmental challenges more effectively. Therefore, this knowledge is valuable for improving forest management practices and offers useful insights for professionals working in forestry.
... The live weights calculated with the various formulas were compared with the actually measured live weights by regression analysis. We tested to what extent the relationship between the observed values and the predicted values deviated from a slope of 1 (observed vs. predicted regression after Piñeiro et al. 2008). Further details on the methods can be found in Weiss & Linde (2022). ...
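One way such a test of the slope against 1 could be set up is sketched below. This is an assumption about the general procedure, not the cited authors' code: the weights are made up, the function name is hypothetical, and the sketch stops at the t statistic, which would then be compared with a critical value of the t distribution with n − 2 degrees of freedom.

```python
import math


def slope_test_against_one(predicted, observed):
    """Regress observed (y) on predicted (x) and return
    (slope, standard error of slope, t statistic for H0: slope = 1, degrees of freedom)."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(predicted, observed))
    dof = n - 2
    se_slope = math.sqrt(ss_res / dof / sxx)
    t_stat = (slope - 1.0) / se_slope
    return slope, se_slope, t_stat, dof


# Made-up weights (arbitrary units), for illustration only
predicted_weight = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
observed_weight = [1.5, 2.0, 4.5, 5.0, 7.5, 8.0]

slope, se_slope, t_stat, dof = slope_test_against_one(predicted_weight, observed_weight)
print(f"slope = {slope:.3f}, SE = {se_slope:.3f}, t = {t_stat:.3f}, df = {dof}")
```

A slope well above 1, as in this constructed example, would indicate that the equation increasingly underestimates the observed values as they grow.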
Conference Paper
Full-text available
Size-weight equations as an effective method to determine carabid biomass: An evaluation of existing equations, suggestions for further improvement and recommendations for practical application
Insect biomass has been used as an ecological indicator in the past, but in recent years has become a key metric in the study of insect population trends, especially since the first reports about the so-called insect decline. However, directly measuring insect biomass can be methodologically challenging, labour intensive and, in the case of purely digital data, not possible; depending on the methods used, it can even introduce bias. Size-weight equations provide a straightforward method for estimating insect biomass when the insect's body length is known. Although they are widely used in insect research, they have rarely been tested with independent data so far. We evaluated two size-weight equations for carabid beetles (Coleoptera: Carabidae), by Szyszko (1983) and Booij & al. (1994), drawing on independent data by comparing model predictions with actual measurements of biomass. We found that both models produced systematically biased results: Szyszko's equation yielded more accurate results for larger species, while the equation of Booij & al. did so for smaller species. Moreover, we found that the inclusion of additional taxonomic parameters, in this case subfamily, could generally improve results. However, limited data availability currently does not permit the development of a taxonomically informed equation with practical applicability. Meanwhile, we recommend a combined use of both evaluated equations: Szyszko's for carabids ≥ 11.8 mm, and the equation of Booij & al. for carabids < 11.8 mm, respectively, and give further advice for the practical application of size-weight equations.
... The PUK kernel exhibits excellent flexibility and can serve as a versatile kernel for SVMs, outperforming common kernels such as linear and RBF kernels, in both classification and regression tasks [53]. The terminal nodes of M5P trees consist of linear regression functions, allowing M5P trees to generate continuous numerical values and be applicable to regression problems [52]. In addition to the success of the machine learning models, these results demonstrate the natural characteristics of the plum fruit, enabling the non-destructive determination of its weight based solely on its two-dimensional components. ...
Article
Full-text available
Plum fruit fresh weight (FW) estimation is crucial for various agricultural practices, including yield prediction, quality control, and market pricing. Traditional methods for estimating fruit weight are often destructive, time-consuming, and labor-intensive. In this study, we addressed the problem of predicting plum FW using artificial intelligence (AI) methods based on fruit dimensions. We aimed to evaluate various machine learning (ML) techniques for this purpose. Images of fruit samples were captured using a smartphone camera, processed to extract binary images, and used to calculate dimensions. We tested several ML methods, including Support Vector Regression (SVR), Multivariate Linear Regression (MLR), Multi-Layer Perceptron (MLP), and Decision Tree (DT). The SVR model with a Pearson-VII kernel (PUK) function and penalty value (c) of 0.1 was the most accurate, achieving an R² of 0.9369 and root mean squared error (RMSE) of 0.4850 (gr) during training, and 0.9267 and 0.4863 (gr) during testing. This method is important for researchers and practitioners seeking efficient, quick, and non-destructive ways to estimate fruit weight. Future research can build on these findings by applying the model to other fruit types and conditions.
... The lower R² of 0.46 when UAS was used with Sentinel-2 compared to an R² of 0.56 when Sentinel was used alone may be due to compounded errors (in satellite prediction of biomass, drone imagery prediction of pasture height, smaller field biomass datasets, and subsequent conversion to pasture biomass). Lower RMSE of the combined UAS-S2RF approach suggests that predictions from this approach are more accurate overall [33,54,56]. It is important to note that although the RMSE of the UAS model is slightly higher (1240 kg DM/ha) than the SEM of the field biomass (1020 kg DM/ha) (Figure 5), integrating the calibrated drone data with Sentinel-2 reduced the SEM of the S2RF model from 1642 kg DM/ha to 1473 kg DM/ha (Figures 6 and 7). ...
Article
Full-text available
Effective agricultural management hinges upon timely decision-making. Here, we evaluated whether drone and satellite imagery could improve real-time and remote monitoring of pasture management. Using unmanned aerial systems (UAS), we quantified grassland biomass through changes in sward height pre- and post-grazing by sheep. As optical spectral data from Sentinel-2 satellite imagery is often hindered by cloud contamination, we assessed whether machine learning could help improve the accuracy of pasture biomass prognostics. The calibration of UAS biomass using field measurements from sward height change through 3D photogrammetry resulted in an improved regression (R² = 0.75, RMSE = 1240 kg DM/ha, and MAE = 980 kg DM/ha) compared with using the same field measurements with random forest-machine learning and Sentinel-2 imagery (R² = 0.56, RMSE = 2140 kg DM/ha, and MAE = 1585 kg DM/ha). The standard error of the mean (SEM) for the field biomass, derived from UAS-measured sward height changes, was 1240 kg DM/ha. When UAS data were integrated with the Sentinel-2-random forest model, SEM reduced from 1642 kg DM/ha to 1473 kg DM/ha, demonstrating that integration of UAS data improved model accuracy. We show that modelled biomass from 3D photogrammetry has significantly higher accuracy than that predicted from Sentinel-2 imagery with random forest modelling (S2-RF). Our study demonstrates that timely, accurate quantification of pasture biomass is conducive to improved decision-making agility, and that coupling of UAS with satellite imagery may improve the accuracy and timeliness of agricultural biomass prognostics.
... The validation process consisted of comparing the coefficient of determination (R²) resulting from the scatter plots of measured vs. predicted values, the root mean square deviation (RMSD), the Nash-Sutcliffe efficiency (NSE) and the mean absolute error (MAE) among models. The RMSD indicates the mean deviation of predicted values with respect to the measured values and was calculated according to Piñeiro et al. (2008). The NSE indicates how well the relationship between measured vs. predicted values fits the 1:1 line (the closer to 1 the better) and was calculated according to Moriasi et al. (2007). ...
Article
Being able to predict soil moisture dynamics offers water managers the possibility to better plan irrigation events and prevent soil moisture deficits from reaching levels that reduce crop production. Machine learning (ML) model predictions can potentially assist farmers in managing irrigation water more efficiently. In this study, we aimed to assess the accuracy of a set of ML models in predicting soil matric potential seven days ahead in gravity-surface irrigated cotton paddocks and evaluate the models' performance for longer term predictions (14 days). The ML models used past soil moisture, weather, and satellite-derived crop-related data as features for the input parameters. Input data were structured in tuples that were organised following a 20-day 'window' approach that 'slid' one position forward after each training round. A convolutional neural network (CNN) model outperformed a Long Short-Term Memory, Dense Multilayer Perceptron, and Linear Regression model, the latter of which produced the least accurate predictions. The accuracy of the soil matric potential predictions with the CNN model was stable over time (R² ≥ 0.92 and root mean square deviation ≤ 7.5 kPa). However, less accurate predictions were obtained for a short period after emergence and at crop senescence. This study demonstrates the feasibility of producing accurate predictions of soil matric potential in cotton fields at 0.20 m soil depth with a CNN model, which can be integrated into irrigation decision support systems.
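The validation metrics named in the excerpt above (RMSD, NSE and MAE) can be sketched in a few lines. This is a minimal illustration, not the cited study's code: the function names, the `ddof` parameter and the sample values are assumptions made here. Some authors divide the squared deviations by n − 1 rather than n when computing RMSD, so the divisor is left configurable.

```python
import math

def rmsd(observed, predicted, ddof=0):
    """Root mean square deviation of predictions from observations.

    ddof=0 divides by n; ddof=1 divides by n - 1 (an assumption to
    cover both conventions found in the literature).
    """
    n = len(observed)
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / (n - ddof))

def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus residual SS over SS about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(rmsd(observed, predicted), mae(observed, predicted), nse(observed, predicted))
```

An NSE of 1 means the predictions fall exactly on the 1:1 line; values at or below 0 mean the model does no better than the observed mean.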
... The models in this study were evaluated using MAE and percentage root mean squared error (PRMSE) as shown in Equations (2) and (3), respectively. The MAE was used because, compared to the RMSE, it is less sensitive to outliers [46,47]. ...
Article
Full-text available
Tree- and block-level prediction of mango yield is important for farm operations, but current manual methods are inefficient. Previous research has identified the accuracies of mango yield forecasting using very-high-resolution (VHR) satellite imagery and an ’18-tree’ stratified sampling method. However, this approach still requires infield sampling to calibrate canopy reflectance and the derived block-level algorithms are unable to translate to other orchards due to the influences of abiotic and biotic conditions. To better appreciate these influences, individual tree yields and corresponding canopy reflectance properties were collected from 2015 to 2021 for 1958 individual mango trees from 55 orchard blocks across 14 farms located in three mango growing regions of Australia. A linear regression analysis of the block-level data revealed the non-existence of a universal relationship between the 24 vegetation indices (VIs) derived from VHR satellite data and fruit count per tree, an outcome likely due to the influence of location, season, management and cultivar. The tree-level fruit count predicted using a random forest (RF) model trained on all calibration data produced a percentage root mean squared error (PRMSE) of 26.5% and a mean absolute error (MAE) of 48 fruits/tree. The lowest PRMSEs produced from RF-based models developed from location, season and cultivar subsets at the individual tree level ranged from 19.3% to 32.6%. At the block level, the PRMSE for the combined model was 10.1% and the lowest values for the location, seasonal and cultivar subset models varied between 7.2% and 10.0% upon validation. Generally, the block-level predictions outperformed the individual tree-level models. Maps were produced to provide mango growers with a visual representation of yield variability across orchards. This enables better identification and management of the influence of abiotic and biotic constraints on production. 
Future research could investigate the causes of spatial yield variability in mango orchards.
... Model applications come inevitably with model evaluation, which is the process of building trust in the model input, model structure, and model output. Model validation is the most common strategy to evaluate models, in which models are compared against observations: observed data (y-axis) against model predictions (x-axis) on a 1:1 line [26,27]. Validation is often a benchmark for scientists to determine the quality of the model performance (e.g., [28]). ...
Article
Full-text available
Validating large-scale water quality models is challenging because of the variety of water quality constituents, and scales for which observations are limited. Here, in this perspective, we propose 13 alternative strategies to build trust in large-scale water quality models beyond validation and discuss their strengths and weaknesses regarding their validity, reliability, and applicability. Our alternative strategies aim to evaluate separately model inputs (Strategies 1–4), outputs (Strategies 5–6) and structures (Strategy 7) as well as these aspects together (Strategies 8–13). This is done via methods such as comparisons (Strategies 1–3, 6–8, 12–13), sensitivity analysis (Strategy 5), use of innovations (Strategy 9), expert knowledge (Strategy 11) and local models (Strategy 13). The proposed strategies vary in their validity, reliability, and applicability. Validation is an important starting point but should be used in combination with other strategies. Our proposed list opens the discussion to improve methods to evaluate global water quality models.
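The choice of axes mentioned in the excerpt above matters because the two regressions share r² but not slope or intercept. The sketch below uses a small synthetic data set (an assumption, constructed so that the prediction errors are uncorrelated with the predictions) to compare the observed-vs-predicted (OP) and predicted-vs-observed (PO) fits. The helper `ols` is a hypothetical name for a plain least-squares fit.

```python
def ols(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

# Synthetic example: observed = predicted + error, with the error
# chosen to be uncorrelated with the predictions.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [2.0, 0.0, 5.0, 2.0, 6.0]

op = ols(predicted, observed)  # observed on the y-axis (OP)
po = ols(observed, predicted)  # predicted on the y-axis (PO)
print("OP slope, intercept, r2:", op)
print("PO slope, intercept, r2:", po)
```

With these values the OP regression recovers a slope of 1 and an intercept of 0, while the PO regression on the same data gives a slope well below 1 and a positive intercept, even though r² is identical. The product of the two slopes equals r².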
... The selection of models to estimate volume was based on the following adjustment and precision criteria: correlation coefficient (r), standard error of the estimate in percentage (Syx%), and verification of homoscedasticity and normality through the graphic analysis of the residues [52]. The observed versus predicted graph [53] and the frequency histogram in relative error classes [27] were also used to select the best model. The determination coefficient was replaced by the correlation coefficient (r) for the nonlinear models. ...
Article
Full-text available
The Cerrado has high plant and vertebrate diversity and is an important biome for conserving species and provisioning ecosystem services. Volume equations in this biome are scarce because of their size and physiognomic diversity. This study was conducted to develop specific volumetric models for the phytophysiognomies Gallery Forest, Dry Forest, Forest Savannah, and Savannah Woodland, a generic model and a model for Cerrado forest formation. Twelve 10 m × 10 m (100 m²) (National Forest Inventory) plots were used for each phytophysiognomy at different sites (regions) of the Federal District (FD) where trees had a diameter at breast height (DBH; 1.30 m) ≥5 cm in forest formations and a diameter at base height (Db; 0.30 m) ≥5 cm in savanna formations. Their diameters and heights were measured, they were cut and cubed, and the volume of each tree was obtained according to the Smalian methodology. Linear and nonlinear models were adjusted. Criteria for the selection of models were determined using correlation coefficients, the standard error of the estimates, and a graphical analysis of the residues. They were later validated by the chi-square test. The resultant models indicated that fit by specific phytophysiognomy was ideal; however, the generic and forest formation models exhibited similar performance to specific models and could be used in extensive areas of the Cerrado, where they represent a high potential for generalization. To further increase our understanding, similar research is recommended for the development of specific and generic models of the total volume in Cerrado areas.
... A.K. Osei et al. Geoderma Regional 39 (2024) e00866 representing estimate of the mean deviation of simulated values with respect to the measured values; coefficient of residual mass (CRM), which measured the tendency of the model to over- or underestimate the measurements; and modeling efficiency (EF) which compared measured values to simulated data (Steel and Torrie, 1980; Piñeiro et al., 2008). ...
... 34,35,36,37,38,39,40,41 and 42). The observed data were placed on the y-axis and the simulated values on the x-axis (Piñeiro et al. 2008). Each plot depicts the y = x line (45 degrees or slope equal to 1) in black, the trendline (y = a + b x) in red and the regression formulas (a is the intercept, b the slope, r² the coefficient of determination). ...
Article
Full-text available
Sediment rating curves (SRCs) are tools of satisfactory reliability in the attempt to describe the sediment regime in catchments with limited or poor-quality records. The study valorised the most suitable SRC development method for the estimation of the coarse suspended sediment load at the outlet of nine Mediterranean sub-watersheds. Four established grouping techniques were assessed, to minimize the uncertainty of the results, namely simple rating curve, different ratings for the dry and wet season of the year, hydrographic classification, and broken line interpolation, at three major Greek rivers (Aliakmon, Acheloos – upper route, Arachthos). The methods’ performance was benchmarked against sediment discharge field records, utilizing statistical measures and graphical analyses. The necessary observations were conducted by the Greek Public Power Corporation. The results were site/station dependent, and no methodology emerged as universally accepted. The analysis designated that the simple rating curve performs best at the cross-sections Moni Ilarion, Moni Prodromou, and Arta bridge, the different ratings for the dry and wet season of the year at Grevena bridge and Gogo bridge, the hydrographic classification at Velventos and Plaka bridge, and the broken line interpolation at Avlaki dam and Tsimovo bridge. In this regard, the study advocates the use of multiple SRC methods. Despite its limitations, the method merits a rather simple and cost-effective generation of a (continuous, detailed, sufficiently accurate) synthetic suspended sediment discharge timeseries, with high interpolating, extrapolating and reproducibility potential. The success of the application could benefit, among others, water quality restoration and dam management operations.
... For non-linear and generalized least squares regressions we report the R² of the ordinary least squares regression of observed vs. predicted values and denote it R²_p. We tested for deviation of this regression from the 1:1 line as indicated by a significant intercept of this regression, and a significant slope of the regression of measured values minus predicted values vs. predicted values (Piñeiro et al., 2008). For mixed effects models, we report the fraction of variance explained by fixed effects (marginal R², R²_m) and by fixed and random effects (conditional R², R²_c), using the function 'r. ...
Article
Full-text available
Tree stems exchange greenhouse gases with the atmosphere but the magnitude, variability and drivers of these fluxes remain poorly understood. Here, we report stem fluxes of carbon dioxide (CO2), methane (CH4) and nitrous oxide (N2O) in a boreal riparian forest, and investigate their spatiotemporal variability and ecosystem level importance. For two years, we measured CO2 and CH4 fluxes on a monthly basis in 14 spruces (Picea abies) and 14 birches (Betula pendula) growing near a headwater stream affected by historic ditching. We also measured N2O fluxes on three occasions. All tree stems were net emitters of CO2 and CH4, while N2O fluxes were around zero. CO2 fluxes correlated strongly with air temperature and peaked in summer. CH4 fluxes correlated modestly with air temperature and solar radiation and peaked in late winter and summer. Trees with larger stem diameter emitted more CO2 and less CH4 and trees closer to the stream emitted more CO2 and CH4. The CO2 and CH4 fluxes did not differ between spruce and birch, but correlations of CO2 fluxes with stem diameter and distance to stream differed between the tree species. The absence of vertical trends in CO2 and CH4 fluxes along the stems and their low correlation with groundwater levels and soil CO2 and CH4 partial pressures suggest tree internal production as the primary source of stem emissions. At the ecosystem level, the stem CO2, CH4 and N2O emissions represented 52 ± 16 % of the forest floor CO2 emissions and 3 ± 1 % and 11 ± 40 % of the forest floor CH4 and N2O uptake, respectively, during the snow-free period (median ± SE). The six month snow-cover period contributed 11 ± 45 % and 40 ± 29 % to annual stem CO2 and CH4 emissions, respectively. Overall, the stem gas fluxes were more typical for upland rather than wetland ecosystems likely due to historic ditching and subsequent groundwater level decrease.
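The 1:1 test described in the excerpt above can be sketched as two simple regressions: observed on predicted (testing the intercept against 0) and observed minus predicted on predicted (testing the slope against 0, which is equivalent to testing the observed-vs-predicted slope against 1). The function names and data below are assumptions for illustration. The sketch returns t statistics only; p-values would come from a Student's t distribution with n − 2 degrees of freedom.

```python
import math

def ols_with_se(x, y):
    """OLS of y on x; returns slope, intercept and their standard errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ss_res / (n - 2)  # residual variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept

def one_to_one_t_stats(observed, predicted):
    """t statistics for H0: intercept = 0 and H0: slope = 1 (observed vs. predicted)."""
    slope, intercept, _, se_intercept = ols_with_se(predicted, observed)
    # Regressing (observed - predicted) on predicted yields a slope of (slope - 1).
    diff = [o - p for o, p in zip(observed, predicted)]
    d_slope, _, d_se_slope, _ = ols_with_se(predicted, diff)
    return {
        "t_intercept": intercept / se_intercept,
        "t_slope": d_slope / d_se_slope,
        "slope_minus_one": slope - 1.0,
        "df": len(observed) - 2,
    }

# Hypothetical values, for illustration only.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [0.0, 3.0, 2.0, 5.0, 5.0]
result = one_to_one_t_stats(observed, predicted)
print(result)
```

Because subtracting the predictions leaves the residuals unchanged, the standard error of the slope is the same in both regressions, so only the hypothesised slope value differs.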
... The fit was evaluated by plotting observed and predicted values against each other and comparing them to the 1:1 line, as suggested by Piñeiro et al. (2008), and by comparing the root mean squared error (RMSE) and mean absolute percentage error (MAPE), where the measure of decision was the MAPE. The index of agreement (IA) as suggested by Willmott (1981) was used as a more general indication of model fit. ...
Preprint
Full-text available
Recycling nutrients contained in urban wastes to agriculture is essential in a circular society. This study simultaneously compares different recycled fertilizers (household waste compost, sewage sludge, human urine) with mineral fertilization and animal manures. We tested their long-term effects on yield, nutrient budgets, potentially toxic element (PTE) accumulation, and the nitrogen (N)/carbon cycle (among others, N efficiency, N losses, soil carbon). Therefore, data from a long-term field trial and predictions from the soil-plant-atmosphere model DAISY were evaluated. Based on trial data, human urine performed similarly to mineral fertilization for yield, N efficiency (MEF = 81%), and nutrient budget, while sewage sludge and compost were more like animal manures with lower yields, N efficiencies (MEF 70% & 19% respectively) and higher nutrient imbalances, especially P and S surpluses. Compost and sewage sludge applications resulted in net PTE inputs. Yet, plant uptake and soil accumulation seemed negligible. Model outputs predicted N losses of 34–55% of supplied N. Losses were highest for compost, followed by deep litter, manure, sewage sludge, human urine, mineral fertilization, and slurry. Nitrate leaching was the main loss pathway (14–41% of N input). Within the compost and straw-rich manure treatments, about 25% of applied N was stored in the soil, which was accompanied by an increase in soil carbon. The study suggests substitution of established fertilizers with recycled ones is feasible. Each fertilizer has advantages and disadvantages and thus should be used according to its strengths or in mixtures.
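Two of the fit measures named in the excerpt above, MAPE and Willmott's index of agreement, can be sketched as follows. The function names and sample values are assumptions for illustration. The index is written in its commonly cited form, d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², where Ō is the observed mean.

```python
def mape(observed, predicted):
    """Mean absolute percentage error (in %); undefined if any observation is 0."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n

def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d, ranging from 0 to 1 (1 = perfect agreement)."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - numerator / denominator

# Hypothetical values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(mape(observed, predicted), index_of_agreement(observed, predicted))
```

MAPE weights each error by the size of the observation, so it is sensitive to small observed values, whereas the index of agreement is dimensionless and bounded.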
... We assessed a variety of predictor variables (table S2) in the models by examining their influence on the adjusted R², mean absolute error (MAE), regression diagnostics, and coefficient t-statistic/effect size and p-values [57]. Final model performance was assessed using conventional methods that include the coefficient of determination (R²), F-statistic, MAE, observed vs. predicted plots (figures S6-S8 [58]), and residual diagnostics. Spatial plots of the final predictor variables are shown in figure S4. ...
Article
Full-text available
Streamflow droughts are receiving increased attention worldwide due to their impact on the environment and economy. One region of concern is the Midwestern United States, whose agricultural productivity depends on subsurface pipes known as tile drains to improve trafficability and soil conditions for crop growth. Tile drains accomplish this by rapidly transporting surplus soil moisture and shallow groundwater from fields, resulting in reduced watershed storage. However, no work has previously examined the connection between tile drainage and streamflow drought. Here, we pose the question: does the extent of watershed-level tile drainage lead to an increased susceptibility to and magnitude of streamflow droughts? To answer this, we use daily streamflow data for 122 watersheds throughout the Midwestern United States to quantify streamflow drought duration, frequency, and intensity. Using spatial multiple regression models, we find that agricultural tile drainage generates statistically significant (p < 0.05) increases in streamflow drought duration and intensity while significantly reducing drought frequency. The magnitude of the effect of tile drainage on streamflow drought characteristics is similar to that of water table depth and precipitation seasonality, both of which are known to influence streamflow droughts. Furthermore, projected changes in regional precipitation characteristics will likely drive the installation of additional tile drainage. We find that for each 10% increase in tile-drained watershed area, streamflow drought duration and intensity increase by 0.03 d and 12%, respectively, while frequency decreases by 0.10 events/year. Such increases in tile drainage may lead to more severe streamflow droughts and have a detrimental effect on the socio-environmental usage of streams throughout the Midwest.
... This value of R² ranges from 0 to 1, with a higher value indicating a smaller difference between the predicted and observed data (Frost, 2017). Smith and Rose (1995), Mesplé et al. (1996), and Piñeiro et al. (2008) have all discussed this concept in detail. For our validation, we separated our data into the training set (set of data used to create a model) and testing set (set of validation data that you use to compare your model's accuracy) (Grant, 2021). ...
Article
Maple syrup is an important part of the economy in various regions of the United States. Studies on maple syrup production potential mostly use climatic factors as determinants and, therefore, fail to account for non-climatic factors. In this study, we applied a stochastic production function framework to establish a relationship between maple syrup yield and a set of climatic (temperature and tapping season length) and non-climatic determining factors, such as the number of maple trees and utilization rate of the potential number of taps. Tree characteristics, climatic, and other factors had mixed effects on syrup yield. The number of maple trees, the number of taps, and the minimum temperature had marginal negative effects on average syrup yield, while the length of the season and the maximum temperature had positive effects. A predictive model was developed and used to estimate the potential production of maple syrup under low, medium and high utilization levels in Kentucky, a likely region for maple syrup production. This model could be useful for maple syrup research, education, and extension in maple-producing states.
... We used Pearson's correlation coefficient (r > .7) as an indicator of agreement between observed (y-axis) and predicted (x-axis) values and fitted a simple linear regression to provide information on predictions' bias and consistency (Piñeiro et al., 2008; Potts & Elith, 2006). ...
Article
Full-text available
Species–environment relationships have been extensively explored through species distribution models (SDM) and species abundance models (SAM), which have become key components to understand the spatial ecology and population dynamics directed at biodiversity conservation. Nonetheless, within the internal structure of species' ranges, habitat suitability and species abundance do not always show similar patterns, and using information derived from either SDM or SAM could be incomplete and mislead conservation efforts. We gauged support for the abundance–suitability relationship and used the combined information to prioritize the conservation of South American dwarf caimans (Paleosuchus palpebrosus and P. trigonatus). We used 7 environmental predictor sets (surface water, human impact, topography, precipitation, temperature, dynamic habitat indices, soil temperature), 2 regressions methods (Generalized Linear Models—GLM, Generalized Additive Models—GAM), and 4 parametric distributions (Binomial, Poisson, Negative binomial, Gamma) to develop distribution and abundance models. We used the best predictive models to define four categories (low, medium, high, very high) to plan species conservation. The best distribution and abundance models for both Paleosuchus species included a combination of all predictor sets, except for the best abundance model for P. trigonatus which incorporated only temperature, precipitation, surface water, human impact, and topography. We found non‐consistent and low explanatory power of environmental suitability to predict abundance which aligns with previous studies relating SDM‐SAM. We extracted the most relevant information from each optimal SDM and SAM and created a consensus model (2,790,583 km²) that we categorized as low (39.6%), medium (42.7%), high (14.9%), and very high (2.8%) conservation priorities. We identified 279,338 km² where conservation must be critically prioritized and only 29% of these areas are under protection. 
We concluded that optimal models from correlative methods can be used to provide a systematic prioritization scheme to promote conservation and as surrogates to generate insights for quantifying ecological patterns.
... Model validation was conducted by comparing predicted values (x axis) and observed values (y axis), following ref. 44. ...
Article
Full-text available
Terrestrial ecosystems are subjected to multiple global changes simultaneously. Yet, how an increasing number of global changes impact the resistance of ecosystems to global change remains virtually unknown. Here we present a global synthesis including 14,000 observations from seven ecosystem services (functions and biodiversity), as well as data from a 15-year field experiment. We found that the resistance of multiple ecosystem services to global change declines with an increasing number of global change factors, particularly after long-term exposure to these factors. Biodiversity had a higher resistance to multiple global changes compared with ecosystem functions. Our work suggests that we need to consider the combined effects of multiple global changes on the magnitude and resistance of ecosystem services worldwide, as ecosystem responses will be enhanced by the number of environmental stressors and time of exposure.
... Differences in the first quartile (based on the distribution of extracted canopy heights) were also determined to test for negligible changes in short-stature vegetation as found in the forest understory or in open upland sites. Subsequently, to quantify deviations between the sensors' characterizations of height, the covariance of each corresponding Titan (observed) and ALTM 3100 (predicted) lidar metric was derived for sampled grid cells by comparing the slope against the 1:1 line of correspondence between observed and predicted values (Piñeiro et al. 2008). Lidar metrics retained as candidates for the final bi-temporal shrub-to-tree AGB model development were selected based on demonstrating ≤ ±2% deviation from unity (1:1) in slope and ≥95% in explained variance (R²). ...
Article
Full-text available
Monitoring aboveground biomass (AGB) is critical for carbon reporting and quantifying ecosystem change. AGB from field data can be scaled to the region using airborne lidar. However, lidar-based AGB products emphasize upland forests, which may not represent the conditions in rapidly changing peatland complexes in the southern Taiga of western Canada. In addition, to ensure that modeled AGB changes do not incorporate systematic error due to differences between older and newer lidar technologies, model transfer tests are required. The aim of this study was to develop one bi-temporal lidar-based AGB model applicable to (1) vegetation structures at varying vertical and horizontal continuity in this region and to (2) data collected with an earlier generation lidar system for which Canada-wide aerial coverage is available. Goodness-of-fit metrics show that AGB can be modeled with moderate (R² = 48%–58% Taiga Shield, peatlands) to high accuracies (R² = 83%–89% Taiga Plains, upland/permafrost plateau forests including ecotones) by using the point clouds average height and 90th height percentile within a weighted approach as function of modeled AGB and calibrating the earlier lidar data. These results are important for quantifying climate change effects on forest to peatland ecotones.
... In our comparisons, there is measurement variance in both the observed and predicted values. For this reason, we use Median Deviation (Dv50) and the square root of the sum of squared deviations between x and y values (RMSD), rather than regression fit (R² and RMSE) as our primary measures of the combined accuracy and precision of measured vs calculated data or predicted vs observed data (see Kobayashi and Salam, 2000; Gauch et al., 2003; Piñeiro et al., 2008). Dv50 is calculated as the 50th percentile of the absolute values of deviations (Dv) of predicted values (x) from observed values (y), both expressed as the log10 of the discharge values being evaluated. ...
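The Dv50 definition in the excerpt above translates directly into code: take the log10 of both series, compute the absolute deviations, and report their median. The function name and sample discharges below are assumptions for illustration, and all values must be strictly positive for the logarithm to be defined.

```python
import math
import statistics

def dv50(observed, predicted):
    """Median absolute deviation between log10(predicted) and log10(observed)."""
    deviations = [
        abs(math.log10(p) - math.log10(o))
        for o, p in zip(observed, predicted)
    ]
    return statistics.median(deviations)

# Hypothetical discharge values, for illustration only.
observed = [10.0, 100.0, 1000.0]
predicted = [100.0, 100.0, 10.0]
print(dv50(observed, predicted))
```

Because the deviations are in log10 units, a Dv50 of 1 means the typical prediction is off by a factor of ten in either direction.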
... The fit proves to be suitable for describing the distribution of the obtained data (R² = 0.999, adj-R² = 0.994, and RMSE = 10⁻⁵). Hypothesis tests applied to the parameters obtained from the linear regression of observed vs. predicted data [36] have positive results at the 5% significance level. ...
Article
Full-text available
Exposure to ambient ultraviolet radiation is associated with various ocular pathologies. Estimating the irradiance received by the eyes is therefore essential from a preventive perspective and to study the relationship between light exposure and eye diseases. However, measuring ambient irradiance on the ocular surface is challenging. Current methods are either approximations or rely on simplified setups. Additionally, factors like head rotation further complicate measurements for prolonged exposures. This study proposes a novel numerical approach to address this issue by developing an analytical model for calculating irradiance received by the eye and surrounding ocular area. The model takes into account local ambient irradiance, sun position, and head orientation. It offers a versatile and cost-effective means of calculating ocular irradiance, adaptable to diverse scenarios, and serves both as a predictive tool and as a way to compute correction factors, such as the fraction of diffuse irradiance received by the eyes. Furthermore, it can be tailored for prolonged durations, facilitating the calculation of radiant dose obtained during extended exposures.
... Next, we sought to validate our modeling approach by evaluating the ability of SGM parameters obtained automatically via SBI to predict age, PDR, and the aperiodic exponent. We regressed predicted vs observed values, on the x and y axes, respectively, following Piñeiro et al. for prediction of age, PDR, and aperiodic exponent [126]. The PDR and aperiodic exponent were automatically detected using the Fitting Oscillations & One Over F (FOOOF) Python package as demonstrated in Supplementary Figure 1A [18]. ...
Article
Full-text available
The spectral content of macroscopic neural activity evolves throughout development, yet how this maturation relates to underlying brain network formation and dynamics remains unknown. Here, we assess the developmental maturation of electroencephalogram spectra via Bayesian model inversion of the spectral graph model, a parsimonious whole-brain model of spatiospectral neural activity derived from linearized neural field models coupled by the structural connectome. Simulation-based inference was used to estimate age-varying spectral graph model parameter posterior distributions from electroencephalogram spectra spanning the developmental period. This model-fitting approach accurately captures observed developmental electroencephalogram spectral maturation via a neurobiologically consistent progression of key neural parameters: long-range coupling, axonal conduction speed, and excitatory:inhibitory balance. These results suggest that the spectral maturation of macroscopic neural activity observed during typical development is supported by age-dependent functional adaptations in localized neural dynamics and their long-range coupling across the macroscopic structural network.
... Although the Century model partitions SOC stocks into active, slow, and passive fractions, all the field-measured SOC stocks from our sites were total stocks, not categorized into these fractions. Hence, measured total SOC stocks in 2021 and 2022, along with values from previous studies at the same site (Abohassan, 2004;Mann, 2012;Coleman et al., 2018;Bazrgar et al., (Steel et al., 1997;Piñeiro et al., 2008) (Table 5.4). ...
... In order to assess the monthly accuracy of climate models, this study employed five statistical metrics, including the coefficient of determination (R²) (Eq. 1), which assesses the linear correlation between the estimated and measured data. The value of R² ranges from 0 to 1; a value near 1 reflects better model performance, while correlation becomes weaker as it approaches zero (Moriasi et al. 2007; Piñeiro et al. 2008). Root mean squared error (RMSE) (Eq. ...
Article
Full-text available
A comprehensive analysis of regional climate changes is essential in arid and semi-arid regions to optimize water resources management. This research aims to evaluate the changes in temperature and precipitation across the Mujib Basin in Jordan, using the most recent Coupled Model Inter-comparison Project Phase 6 (CMIP6) model. Firstly, the performance of six CMIP6 general circulation models (GCMs) to reproduce historical temperature and precipitation from 1985 to 2014 was evaluated using observed climate data. The most suitable GCM was then bias-corrected using the linear scaling approach. The findings demonstrate that the EC-Earth3–Veg model could reasonably simulate the historical climate pattern of the Mujib Basin, with coefficient of determination (R²) values of 0.90, 0.83, and 0.65 for monthly Tmin, Tmax, and precipitation, respectively. Under both the SSP2-4.5 and SSP5-8.5 scenarios, Tmax is projected to increase by 1.4 to 3.9 °C and 1.6 to 6.8 °C, respectively, whereas Tmin increases from 1.4 to 3.4 °C and 1.6 to 6.4 °C. Furthermore, precipitation is projected to decrease by 4.61–23.2% at the end of the 21st century. These findings could help policymakers in formulating better adaptation strategies to reduce the impact of climate change in Jordan. This is a crucial step toward becoming a climate-resilient nation.
... To assess the performance of both the calibration and validation models, we used the adjusted coefficient of determination (adj. R²), the root mean square error (RMSE), relative RMSE (%RMSE), Willmott's index of agreement (d) (Willmott, 1981), and the slope and intercepts of each linear model comparing the observed (measured) value against the predicted value from PLSR models (Piñeiro et al., 2008). Finally, we used the Variable Influence on Projection (VIP; Wold et al., 2001) scores calculated with the 'plsVarSel' package (Mehmood et al., 2012) to determine the most important wavelengths for predicting a given residue biochemical trait, which has been used in prior studies of a similar nature (Wang et al., 2021. ...
Article
Purpose: Cover crops and reduced tillage are two key climate smart agricultural practices that can provide agroecosystem services including improved soil health, increased soil carbon sequestration, and reduced fertilizer needs. Crop residue carbon traits (i.e., lignin, holocellulose, non-structural carbohydrates) and nitrogen concentrations largely mediate decomposition rates and amount of plant-available nitrogen accessible to cash crops and determine soil carbon residence time. Non-destructive approaches to quantify these important traits are possible using spectroscopy. Methods: The objective of this study was to evaluate the efficacy of spectroscopy instruments to quantify crop residue biochemical traits in cover crop agriculture systems using partial least squares regression models and a combination of (1) the band equivalent reflectance (BER) of the PRecursore IperSpettrale della Missione Applicativa (PRISMA) imaging spectroscopy sensor derived from laboratory collected Analytical Spectral Devices (ASD) spectra (n = 296) of 11 cover crop species and three cash crop species, and (2) spaceborne PRISMA imagery that coincided with destructive crop residue collections in the spring of 2022 (n = 65). Spectral range was constrained to 1200 to 2400 nm to reduce the likelihood of confounding relationships in wavelengths sensitive to plant pigments or those related to canopy structure for both analytical approaches. Results: Models using laboratory BER of PRISMA all demonstrated high accuracies and low errors for estimation of nitrogen and carbon traits (adj. R² = 0.86–0.98; RMSE = 0.24–4.25%) and results indicate that a single model may be used for a given trait across all species. Models using spaceborne imaging spectroscopy demonstrated that crop residue carbon traits can be successfully estimated using PRISMA imagery (adj. R² = 0.65–0.75; RMSE = 2.71–4.16%). We found moderate relationships between nitrogen concentration and PRISMA imagery (adj. R² = 0.52; RMSE = 0.25%), which is partly related to the range of nitrogen in these senesced crop residues (0.38–1.85%). PRISMA imagery models were also influenced by atmospheric absorption, variability in surface moisture content, and some presence of green vegetation. Conclusion: As spaceborne imaging spectroscopy data become more widely available from upcoming missions, crop residue trait estimates could be regularly generated and integrated into decision support tools to calculate decomposition rates and associated nitrogen credits to inform precision field management, as well as to enable measurement, monitoring, reporting, and verification of net carbon benefits from climate smart agricultural practice adoption in an emerging carbon marketplace.
... The formula for calculating the coefficient of determination is as follows [35]: ...
Article
The permanent magnetic properties of Nd-Fe-B magnets strongly depend on the alloy composition. Machine learning is based on mathematical and information science methods and uses existing Nd-Fe-B data to predict the magnetic properties of Nd-Fe-B materials. We use the ensemble learning boosting method to establish the gradient boosting regression tree (GBRT) model for Nd2Fe14B melt-spun bonded magnets, in comparison with three other methods of machine learning: support vector machine (SVR), multiple linear regression (MLR), and random forest (RFR). The results show that the machine learning GBRT model developed using the ensemble learning algorithm has higher prediction accuracy and better stability than those three traditional machine learning (SVR, MLR, RFR) models used in the past to predict the magnetic properties of melt-spun Nd-Fe-B bonded magnets. We also used the GBRT model to predict hard magnetic properties of melt-spun Nd2Fe14B/Fe3B composite materials. Several new alloy compositions of melt-spun Nd-Fe-B bonded magnets and Nd2Fe14B/Fe3B composite materials with high-performances were also predicted. Machine learning based on the GBRT model can play an important role in the design, preparation, and development of melt-spun Nd2Fe14B bonded magnets and Nd2Fe14B/Fe3B composite materials.
... The modelling approach was then validated by comparing the predicted values (x-axis) versus the observed values (y-axis), following ref. 68. ...
Article
Soils support a vast amount of carbon (C) that is vulnerable to climatic and anthropogenic global change stressors (for example, drought and human-induced nitrogen deposition). However, the simultaneous effects of an increasing number of global change stressors on soil C storage and persistence across ecosystems are virtually unknown. Here, using 1,880 surface soil samples from 68 countries across all continents, we show that increases in the number of global change stressors simultaneously exceeding medium–high levels of stress (that is, relative to their maximum levels observed in nature) are negatively and significantly correlated with soil C stocks and mineral association across global biomes. Soil C is particularly vulnerable in low-productivity ecosystems (for example, deserts), which are subjected to a greater number of global change stressors exceeding medium–high levels of stress simultaneously. Taken together, our work indicates that the number of global change stressors is a crucial factor for soil C storage and persistence worldwide.
Article
Warmer temperatures associated with climate change have affected the phenology of most plants, but limited information exists for the American cranberry (Vaccinium macrocarpon Ait.), an important specialty crop. We examined long-term spatiotemporal trends in spring development of cranberry buds using field observations of cranberry bud stages over a 65-yr period, from 1958 to 2022. A growing degree day (GDD) model was further used to interpret the observed trends in bud development over the study period. To assess spatial variability in cranberry bud development, the GDDs were computed using gridded weather data for four counties of Massachusetts, representing 85% of the state’s cranberry acreage. A Theil-Sen linear regression model was implemented to determine trends in the occurrence of the bud stages. Field observations revealed significant temporal trends (p-value < 0.01) in the annual timing of white bud and cabbage head stages, which now occur 18–20 days earlier in the spring than 65 years ago. This earlier bud development can increase the risk of frost damage, especially during late-spring freezes. GDDs accumulated at a faster rate towards the end of the study period due to rising air temperatures. Analysis of 65 years of gridded data revealed a significant trend of earlier development across the four counties. The rate of advancement in cabbage head stage ranged from -0.15 to -0.25 d yr⁻¹ across the study area. These findings highlight the need for updated frost forecasting models that account for the changing growth schedule of cranberry.
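For readers unfamiliar with the Theil-Sen estimator mentioned above, a minimal sketch follows. It uses the textbook definition (slope as the median of all pairwise slopes, intercept as the median of the residual offsets) and is not the authors' implementation; the function name and the numbers are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) of the Theil-Sen line through (x, y).

    The slope is the median of the slopes over all point pairs with
    distinct x; the intercept is the median of y - slope * x.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


# Illustrative series with one gross outlier (not data from the study).
years = [0, 1, 2, 3, 4]
day_of_stage = [10, 8, 6, 4, 100]
slope, intercept = theil_sen(years, day_of_stage)
```

Because the estimate is a median, the single outlier in the last year does not pull the trend away from the decline seen in the other four years, which is why the method is popular for phenological time series.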
Article
The large spatial coverage of commercial tuna longline operations makes it crucial to compare the impact of different spatial scales on catch per unit effort (CPUE) standardization. This study utilized catch data from yellowfin tuna caught by the longline vessels in waters near Micronesia from 2019 to 2022 to compare the impact of different spatial scales (0.25° × 0.25°, 1° × 1°, and 2° × 2°) on the CPUE standardization and to explore the spatiotemporal distribution characteristics of CPUE. Spatiotemporal environmental factors and operational factors were incorporated across spatial scales using generalized additive models (GAMs). The study found that the results of CPUE standardization were significantly affected by spatial scale. The most effective model corresponded to 1° × 1° when the longline operation spanned approximately 1° in the longitudinal direction. The standardization models effectively mitigated the impact of spatiotemporal, gear-related, and environmental effects to a certain extent. The relatively high abundance index was in the first and fourth quarters, and the high standardized CPUE (value > 10) was predominantly located at 4° N–0°. The GAM, incorporating spatiotemporal effects, has demonstrated good performance in CPUE standardization and can be an effective method when the data dimensionality is relatively low.
Article
The Himalayan rivers are prone to frequent floods and pose serious risks to human lives and infrastructure. Accurate discharge prediction is crucial for effective flood mitigation and sustainable water resource management. This study focuses on the Sindh River, a vital water source in the Kashmir valley, supporting hydropower and irrigation. Advanced artificial intelligence techniques were applied to analyze 40 years of historical discharge data to address its complex hydrological dynamics. The study evaluated various machine learning models, including K-Nearest Neighbors, Support Vector Regression, Gradient Boosting, Extreme Gradient Boosting, Random Forest (RF), Artificial Neural Network (ANN), and Seasonal Autoregressive Integrated Moving Average (SARIMA). A hybrid RF-SARIMA model was also developed to improve prediction accuracy. The dataset was split into 80% for training and 20% for testing. Model performance was assessed using statistical metrics such as coefficient of determination (R²), mean squared error, mean absolute error, and root mean squared error, along with visual tools like box plots, scatter plots, Taylor diagrams, and time series analyses. Results revealed that RF, SARIMA, and ANN performed well among standalone models. However, the hybrid RF-SARIMA model delivered the best results, with an R² of 0.88 and a correlation coefficient above 0.9 for monthly discharge predictions. This study highlights the hybrid model’s potential to enhance discharge forecasting for the Sindh River, providing valuable insights for flood management and sustainable water planning in the Himalayan regions.
Article
Visual interactions play an instrumental role in collective-motion-related decision-making. However, our understanding of the various tentative mechanisms that can serve the visual-based decision-making is limited. We investigated the role that different attributes of the visual stimuli play in the collective-motion-related motor response of locust nymphs. We monitored and analyzed the behavioral responses of individual locusts tethered in a natural-like walking posture over an airflow-suspended trackball to carefully selected stimuli comprising various black rectangular shapes. The experimental findings together with a prediction model relating the level of behavioral response to the visual stimuli attributes indicate a major role of the number of objects in the visual field, and a further important effect of the object's vertical moving edges. While the object's horizontal edges can be utilized in the estimation of conspecifics’ heading, the overall area or visual angle subtended by the stimuli do not seem to play any role in inducing the response. Our results offer important novel insights regarding the fundamental visual-based mechanisms underlying animal collective motion and can be useful also in swarm robotics.
Article
Potassium (K) availability in plant cells is critical for maintaining plant productivity across many terrestrial ecosystems. Yet, there is no comprehensive assessment of the mechanisms by which plants respond to potassium application in such conditions, despite the global challenge of escalating osmotic stress. Herein, we conducted a meta-analysis using data from 2381 paired observations to investigate plant responses to potassium application across various morphological, physiological, and biochemical parameters under both osmotic and non-osmotic stress. Globally, our results showed the significant effectiveness of potassium application in promoting plant productivity (e.g., +12~30% in total dry weight), elevating photosynthesis (+12~30%), and alleviating osmotic damage (e.g., -19~26% in malonaldehyde), particularly under osmotic stress. Moreover, we found evidence of interactive effects between osmotic stress and potassium on plant traits, which were more pronounced under drought than salt stress, and more evident in C3 than C4 plants. Our synthesis verifies a global potassium control over osmotic stress, and further offers valuable insights into its management and utilization in agriculture and restoration efforts.
Article
Island ecosystems have significant conservation value owing to their higher endemic biotas. Moreover, studies of regional communities that compare differences in species composition (species dissimilarity) among islands and the mainland suggest that community assembly on islands is different from that on the mainland. However, the uniqueness of island biotic assembly has been little studied at the global scale, nor have phylogenetic information or alien species been considered in these patterns. We evaluate taxonomic and phylogenetic change from one community to the next, focusing on differences in species composition between mainland-mainland (M-M) pairs compared to differences between mainland-island pairs (M-I) and between island-island pairs (I-I), using herpetofauna on islands and adjacent mainland areas worldwide. Our analyses detect greater taxonomic and phylogenetic dissimilarity for M-I and I-I comparisons than predicted by the M-M model, indicating different island herpetofauna assembly patterns compared with mainland counterparts across the world. However, this higher M-I dissimilarity decreases significantly once alien species are considered. Our results provide global evidence on the importance of island biodiversity conservation from the aspect of both the taxonomic and phylogenetic uniqueness of island biotic assembly.
Article
The production of contrasted polymer-coated cardboards is necessary to establish the structure/properties relationships and produce the "just necessary" packaging materials. For that purpose, the impact of the processing parameters on the resulting structure of polymer-coated cardboards was modeled using a statistical Design of Experiment (DOE) approach. Five independent factors were considered: three numeric continuous factors, the pressure, the temperature, and the duration of thermocompression, one numeric discrete factor, the initial polymer film thickness, and one categorical factor, the cardboard type. Four responses were considered: the thicknesses of each of the three layers constituting polymer-coated cardboards (i.e., the free polymer, the impregnated, and the free cardboard) and the material’s curvature. The choice of the adequate DOE was first assessed using a Scoping Design and coupled with the characterization of the used polymer, i.e., poly(3-hydroxybutyrate-co-3-hydroxyvalerate) (PHBV). The I-optimal DOE was found to be the most suitable and was therefore implemented. To validate the model, the DOE was then used to produce two targeted structures, one with no impregnation of the polymer and another one with a complete impregnation of the cardboard. The values of thicknesses and curvature were not significantly different from the model predictions, therefore verifying the model.
Article
Automotive biofuels offer a promising alternative to traditional fossil fuels. Accurate evaluation of combustion and emissions in IC engines and vehicles is crucial. This research aimed at developing and validating an engine and vehicle simulation methodology to assess the fuel effect on vehicle consumption and emissions considering different driving cycles and the road slope (barely evaluated for fuels widely used in emerging markets). Two blends were tested: 20 % biodiesel (B20) and 20 % hydrotreated vegetable oil (HVO20) with Ultra-Low-Sulfur Diesel (ULSD). A light-duty diesel vehicle model was developed in GTSuite®, using emission maps from a calibrated steady-state engine model. Good agreement with experiments was found. Road slope in local DC significantly increased fuel consumption and CO, CO2, NOx, and PN emissions, reducing HC. Compared to ULSD, B20 reduced PN and HC by 27–35 % and 12–22.5 %, respectively. HVO20 had a smaller effect on PN but reduced HC emissions by up to 19.5 %. Neither blend significantly affected CO and CO2. B20 slightly increased NOx and fuel consumption, while HVO20 had no significant impact on these.
Article
When output (x) of a mechanistic model is compared with measurement (y), it is common practice to calculate the correlation coefficient between x and y, and to regress y on x. There are, however, problems in this approach. The assumption of the regression, that y is linearly related to x, is not guaranteed and is unnecessary for the x-y comparison. The correlation and regression coefficients are not explicitly related to other commonly used statistics [e.g., root mean squared deviation (RMSD)]. We present an approach based on the mean squared deviation (MSD = RMSD²) and show that it is better suited to the x-y comparison than regression. Mean squared deviation is the sum of three components: squared bias (SB), squared difference between standard deviations (SDSD), and lack of correlation weighted by the standard deviations (LCS). To show examples, the MSD-based analysis was applied to simulation vs. measurement comparisons in literature, and the results were compared with those from regression analysis. The analysis of MSD clearly identified the simulation vs. measurement contrasts with larger deviation than others; the correlation-regression approach tended to focus on the contrasts with lower correlation and regression line far from the equality line. It was also shown that results of the MSD-based analysis were easier to interpret than those of regression analysis. This is because the three MSD components are simply additive and all constituents of the MSD components are explicit. This approach will be useful to quantify the deviation of calculated values obtained with this model from measurements.
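The abstract names the three MSD components but does not give their formulas. The sketch below uses the definitions as they are usually stated for this decomposition, with population (divide-by-n) standard deviations: SB = (mean x − mean y)², SDSD = (SD_x − SD_y)², and LCS = 2·SD_x·SD_y·(1 − r). Treat these formulas as an assumption to check against the paper; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def msd_sb_sdsd_lcs(x, y):
    """Split MSD between model output x and measurement y into SB, SDSD, LCS."""
    mx, my = fmean(x), fmean(y)
    # Population (divide-by-n) moments, so the components sum exactly to MSD.
    sd_x = sqrt(fmean([(xi - mx) ** 2 for xi in x]))
    sd_y = sqrt(fmean([(yi - my) ** 2 for yi in y]))
    cov = fmean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])

    sb = (mx - my) ** 2
    sdsd = (sd_x - sd_y) ** 2
    # 2*SD_x*SD_y*(1 - r) rewritten with r = cov / (SD_x*SD_y) to avoid
    # dividing by zero when one series is constant.
    lcs = 2.0 * (sd_x * sd_y - cov)
    msd = fmean([(xi - yi) ** 2 for xi, yi in zip(x, y)])
    return {"MSD": msd, "SB": sb, "SDSD": sdsd, "LCS": lcs}


# Illustrative values only.
parts = msd_sb_sdsd_lcs([1, 2, 3, 4], [2, 2, 4, 6])
```

For these illustrative values MSD is 1.5 and SB is 1.0, so two thirds of the deviation would be attributed to a constant offset rather than to differences in spread or pattern.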
Article
Improved understanding of the factors that limit crop yields in farmers' fields will play an important role in increasing regional food production while minimizing environmental impacts. However, causes of spatial variability in crop yields are poorly known in many regions because of limited data availability and analysis methods. In this study, we assessed sources of between-field wheat (Triticum aestivum L.) yield variability for two growing seasons in the Yaqui Valley, Mexico. Field surveys conducted in 2001 and 2003 provided data on management practices for 68 and 80 wheat fields throughout the Valley, respectively, while yields on these fields were estimated using concurrent Landsat satellite imagery. Management-yield relationships were analyzed with t tests, linear regression, and regression trees, all of which revealed significant but year-dependent impacts of management on yields. In 2001, an unusually cool year that favored high yields, N fertilizer was the most important source of between-field variability. In 2003, a warmer year with reduced irrigation water allocations, the timing of the first postplanting irrigation was found to be the most important control. Management explained at least 50% of spatial yield variability in both years. Regression tree models, which were able to capture important nonlinearities and interactions, were more appropriate for analyzing yield controls than traditional linear models. The results of this study indicate that adjustments in management can significantly improve wheat production in the Yaqui Valley but that the relevant controls change from year to year.
Article
We analyzed the similarity of structural and functional characteristics of temperate grassland and shrubland ecosystems of North and South America. We based our analyses on correlative models that describe the climatic controls of grassland and shrubland structure and functioning at regional scales. We evaluated models that describe the regional distribution of plant functional types (C3 and C4 grasses and shrubs), soil organic carbon (SOC), and aboveground net primary production (ANPP) and its seasonality. To evaluate the predictive power of the models, we compared their estimates against observed data. We derived data sets, independent from those used to generate the models in North America, from climatically similar areas in South America. Our results support the notion that, in climatically similar regions, structural and functional attributes such as plant functional type composition, SOC, ANPP, and ANPP seasonality have similar environmental controls, independent of the evolutionary history of the regions. The study suggests the existence of an important degree of convergence at regional scales in both functional and structural attributes. It also points out differences in the regional patterns of some attributes that require further analyses.
Article
Solar irradiance is an important input parameter for many simulation models dealing with plant responses to the environment and is not measured in as many locations as temperature and precipitation. This situation has led to the development of algorithms to simulate solar irradiance with measured temperature and precipitation. The objectives of this research effort were two-fold: (1) to determine if a location-specific algorithm to simulate solar irradiance (based on temperature and precipitation) developed in the central Great Plains of the US could be used in other locations and (2) to determine if these results could then be used to develop a generalized algorithm. Data (temperature, precipitation, and solar irradiance) from a wide variety of locations (climates) were used to develop the location-specific and generalized algorithm. Simulated values of solar irradiance from both algorithms were evaluated with independent data from the same locations. A second independent data set was used to evaluate the generalized algorithm. For the first independent data set using the location-specific algorithm, root mean square errors (RMSE) varied from 2.6 to 5.5 MJ m⁻² per day. Using the generalized algorithm, RMSE varied from 3.2 to 5.5 MJ m⁻² per day, excluding Barrow, AK, which had an RMSE of 6.8 MJ m⁻² per day. For the second independent data set using the generalized algorithm, RMSE ranged from 4.0 to 5.8 MJ m⁻² per day. The RMSE for simulated solar irradiance was relatively large when the mean annual temperature term (ΔT), which is used in the generalized algorithm, was relatively small and precipitation was relatively high. With this caveat, the generalized algorithm can be used to simulate solar irradiance at locations where only daily values of temperature and precipitation are measured. The RMSE values reported in this study compare favorably to results for other algorithms used to simulate solar irradiance. An advantage of this approach is its relative simplicity and ease of use.
Article
In the present paper, the principles of Empirically Based Uncertainty Analysis (EBUA) are described. EBUA is based on the evaluation of ‘performance indices’ that express the level of agreement between the model and sets of empirical independent data collected in different experimental circumstances. Some of these indices may be used to evaluate the confidence limits of the model output. The method is based on the statistical analysis of the distribution of the index values and on the quantitative relationship of these values with the ratio ‘experimental data/model output’. Some performance indices are described in the present paper. Among these, the so called ‘functional distance’ (d) between the logarithm of model output and the logarithm of the experimental data, defined as where Mi is the ith experimental value, Oi the corresponding model evaluation and n the number of the couplets ‘experimental value, predicted value’, is an important tool for the EBUA method. From the statistical distribution of this performance index, it is possible to infer the characteristics of the distribution of the ratio ‘experimental data/model output’ and, consequently to evaluate the confidence limits for the model predictions. This method was applied to calculate the uncertainty level of a model developed to predict the migration of radiocaesium in lacustrine systems. Unfortunately performance indices are affected by the uncertainty of the experimental data used in validation. Indeed, measurement results of environmental levels of contamination are generally associated with large uncertainty due to the measurement and sampling techniques and to the large variability in space and time of the measured quantities. It is demonstrated that this non-desired effect, in some circumstances, may be corrected by means of simple formulae.
Article
A general linear model (GLM) was used to evaluate the deviation of predicted values from expected values for a complex environmental model. For this demonstration, we used the default level interface of the regional mercury cycling model (R-MCM) to simulate epilimnetic total mercury concentrations in Vermont and New Hampshire lakes based on data gathered through the EPA's Regional Environmental Monitoring and Assessment Program (REMAP). The response variable for the GLM was defined as R-MCM's predictive error: the difference between observed mercury concentrations and modeled mercury concentrations in each lake. Least square means of the response variable are used as an estimate of the magnitude and significance of bias, i.e., a statistically discernible trend in predictive errors for a given lake type, e.g., acidic, stratified, or oligotrophic. Using our approach, we determined lake types where significant over-prediction and under-prediction of epilimnetic total mercury concentration were occurring, i.e., regions in parameter space where the model demonstrated significant bias were distinguished from regions where no significant bias existed. This technique is most effective for finding regions of parameter space where bias is significant. Drawing conclusions concerning regions that show no significant bias can be misleading. The significant interaction terms in the GLM demonstrated that addressing this problem using univariate statistical techniques would lead to a loss of important information.
Article
Statistical and deterministic simulation modelling rely on a complex process made of trials, errors, and gradual improvement of the simulations. The major problem is to be able to quantify the quality of the simulations in order to know if a modification of the concepts, the laws simulating the processes, or the parameters improves it. To try to quantify the quality of simulations using a mathematical criterion we focus on simple linear regression parameters: the values of the slope (a) and the y-intercept (b). The estimated values of these parameters differ depending on which kind of regression model (model I or II) is used. An artificial dataset illustrates that ordinary least-squares regression (OLS; model I regression) leads to results that are not those expected; but using major axis regression (MA; model II regression) instead of OLS leads to the correct answer. The value of a, when it significantly differs from 1, indicates a difference between observed and simulated values proportional to the values of the variable. The value of b, when it significantly differs from 0, indicates a systematic and constant difference between observations and simulations. Taking into account the values of a and b, we define four possible outcomes that give a first assessment of the quality of a simulation without considering the coefficient of determination, r²: (i) a n.s.d. (not significantly different from) 1 and b n.s.d. 0 (perfect agreement between observations and simulations), (ii) a n.s.d. 1 and b s.d. 0 (significant constant difference between observations and simulations) or a s.d. 1 and a s.d. 0 and b n.s.d. 0 (differences proportional to the values of the variable), (iii) a s.d. 1 and a s.d. 0 and b s.d. 0 (superimposition of a constant difference and a proportional difference), and (iv) a n.s.d. 0 (no relation between simulations and observations). The value of r² is used to rank two simulations pertaining to the same group. That classification of the quality of the simulations is applied to a real-data example: a simulation of the temporal change in chlorophyll a in a high-rate algal pond.
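To make the contrast between model I and model II regression concrete, the sketch below fits both lines to the same points. It uses the standard closed-form major axis slope computed from the variances and covariance; it does not reproduce the paper's artificial dataset or its significance tests, and the function names and numbers are hypothetical. Following the abstract, a denotes the slope and b the intercept.

```python
from math import sqrt
from statistics import fmean


def _moments(x, y):
    """Means, population variances and covariance of two equal-length series."""
    mx, my = fmean(x), fmean(y)
    sxx = fmean([(xi - mx) ** 2 for xi in x])
    syy = fmean([(yi - my) ** 2 for yi in y])
    sxy = fmean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])
    return mx, my, sxx, syy, sxy


def ols_fit(x, y):
    """Model I: ordinary least squares of y on x. Returns (a, b)."""
    mx, my, sxx, _, sxy = _moments(x, y)
    if sxx == 0:
        raise ValueError("x has zero variance")
    a = sxy / sxx
    return a, my - a * mx


def major_axis_fit(x, y):
    """Model II: major axis (first principal axis). Returns (a, b)."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    if sxy == 0:
        raise ValueError("zero covariance: major axis slope is undefined")
    d = syy - sxx
    a = (d + sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    return a, my - a * mx


# Illustrative noisy data (not from the paper).
x = [1, 2, 3, 4, 5]
y = [1.2, 1.9, 3.4, 3.8, 5.1]
ols_a, ols_b = ols_fit(x, y)
ma_a, ma_b = major_axis_fit(x, y)
```

With scatter in both variables, the OLS slope is pulled toward zero relative to the major axis slope, so a test of a against 1 can give different answers under the two models. When the points fall exactly on a line the two fits coincide.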
Article
The appropriateness of a statistical analysis for evaluating a model depends on the model's purpose. A common purpose for models in agricultural research and environmental management is accurate prediction. In this context, correlation and linear regression are frequently used to test or compare models, including tests of intercept a = 0 and slope b = 1, but unfortunately such results are related only obliquely to the specific matter of predictive success. The mean squared deviation (MSD) between model predictions X and measured values Y has been proposed as a directly relevant measure of predictive success, with MSD partitioned into three components to gain further insight into model performance. This paper proposes a different and better partitioning of MSD: squared bias (SB), nonunity slope (NU), and lack of correlation (LC). These MSD components are distinct and additive, they have straightforward geometric and analysis of variance (ANOVA) interpretations, and they relate transparently to regression parameters. Our MSD components are illustrated using several models for wheat (Triticum aestivum L.) yield. The MSD statistic and its components nicely complement correlation and linear regression in evaluating the predictive accuracy of models.
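The SB, NU and LC partition described above can be sketched as follows. The formulas are the ones usually quoted for this partition and should be checked against the paper: SB = (mean X − mean Y)², NU = (1 − b)²·var(X), and LC = (1 − r²)·var(Y), where b is the slope of the regression of Y on X, r is the correlation, and variances use the divide-by-n form. The function name and data are hypothetical.

```python
from statistics import fmean


def msd_sb_nu_lc(pred, meas):
    """Split MSD between predictions X (pred) and measurements Y (meas)."""
    mx, my = fmean(pred), fmean(meas)
    # Population (divide-by-n) moments, so the components sum exactly to MSD.
    sxx = fmean([(xi - mx) ** 2 for xi in pred])
    syy = fmean([(yi - my) ** 2 for yi in meas])
    sxy = fmean([(xi - mx) * (yi - my) for xi, yi in zip(pred, meas)])
    if sxx == 0 or syy == 0:
        raise ValueError("both series need non-zero variance")

    b = sxy / sxx                # slope of the regression of Y on X
    r2 = sxy * sxy / (sxx * syy)  # squared correlation

    sb = (mx - my) ** 2          # squared bias
    nu = (1.0 - b) ** 2 * sxx    # nonunity slope
    lc = (1.0 - r2) * syy        # lack of correlation
    msd = fmean([(xi - yi) ** 2 for xi, yi in zip(pred, meas)])
    return {"MSD": msd, "SB": sb, "NU": nu, "LC": lc}


# Illustrative values only.
gauch_parts = msd_sb_nu_lc([1, 2, 3, 4], [2, 2, 4, 6])
```

Here NU depends only on the regression slope and LC only on the correlation, which is what lets the components relate directly to the regression parameters, as the abstract notes.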
Article
Four related approaches for assessing model goodness-of-fit (GOF) are discussed in this paper: linear regression of observed versus predicted values, the sum of squared prediction errors, a reliability index summarizing predictions as being within a factor Ks of observed values, and correlation-like measures of fit that normalize the sum of squared prediction error to be between zero and one. Relationships among the four measures are derived and alternative decompositions of the measures into components relating to bias, variance, and consistency are presented. The measures are extended to include lack-of-fit terms when multiple observations are available for each prediction, and except for the reliability index, extended to include the multivariate case of multiple prediction variables. Application of the GOF measures to model predictions and observed radon-222 concentrations (a univariate example) showed significant lack-of-fit for high concentrations. Application of the GOF measures to predicted and
Article
We present the application of a simple physiological model (3-PG) for estimating biomass accumulation in New Zealand vegetation. The simulation was performed utilising monthly surfaces of temperature, precipitation, and radiation coupled with digital soil maps classified into fertility classes at a 1-km2 resolution. From 3-PG simulations, we investigated trends in long-term biomass accumulation by simulating vegetation growth over the entire country for 100 years. The predictions were compared with data collected at a number of spatial scales including: (1) individual plot measurements of forest and scrub stem biomass; (2) average stem biomass for South Island forest types; and (3) total vegetation biomass for forest and scrub vegetation types for North and South Islands. The model was calibrated by comparing simulated stem biomass data to literature values for individual plots. Simulated and plot based estimates of stem biomass were highly correlated (r2=0.98) once key parameters were calibrated for New Zealand vegetation. 3-PG predictions correlated well with regional estimates of aboveground stem biomass for the South Island of New Zealand (r2=0.82) and also with total vegetation biomass for the entire country (r2=0.72). However, both were slightly underestimated due to factors associated with assumptions about allocation of biomass to roots, soil characteristics, and stand age. The results indicate that climate and soil fertility exert considerable control on biomass accumulation at the broad scale. Application of the 3-PG model can provide accurate national estimates of biomass that supplement field-based programs by improving large-scale field sampling efficiency and highlight research related to physiology, biomass allocation patterns, and environmental mapping.
Article
Model evaluation is argued to necessitate the use of a hypothesis testing framework instead of the use of goodness-of-fit (GOF) against time series data. A test statistic T is developed, based on model deviation from expected system behaviors. If a model does not exceed the error bounds on expected behavior, then we cannot say that it differs from the real system. Precision is measured not by degree of fit to a set of data but by the width of the confidence limits around the expected system behavior. Implementation of this approach is enhanced by development of new test criteria that evaluate biological and ecological realism based on aggregate indices, failure modes, extreme condition tests, and others. Model performance is assessed by the extent to which the model falls within these expected bounds. Using this test statistic as a basis, revised approaches to sensitivity and uncertainty analysis are recommended. Factors not addressed by these techniques require structural analysis, which is based on comparisons between models with different structures. The application of the advocated approach should enhance confidence in ecosystem models applied to policy issues such as natural resource management or global change impacts.
Article
Model evaluation is an essential aspect of the process of development of system models. When the main purpose of the model is prediction, a reasonable criterion of model quality is the mean squared error of prediction. This criterion is defined here, and it is shown how it can be estimated from available data in a number of situations, including the situation where the parameters of the model are adjusted to the data. An example of the use of this criterion for choosing between alternative models is presented.
Article
The use of regression, ordination and dynamic ecosystem modelling in limnology is discussed by evaluating some of the vices and virtues of these techniques. Both general characteristics of the approaches and a few examples are used to stress the importance of analysis of the residuals for evaluation of the models. Not entirely unexpectedly, a simple ordination model did not perform worse than a complicated ecosystem model when both were applied to the same lake system. Both models are shown to fail for prediction purposes, mainly because the error variance in the data used for parameter estimation is large compared to the variance that can be explained. Integration of regression, multivariate analysis and dynamic ecosystem modelling, all followed by analysis of the residuals, is advised. Nested models are proposed as a solution for the problem of changing parameters and for the problem of parameters being different among lakes.
Article
NOAA AVHRR has been used extensively for monitoring vegetation condition and changes across the United States. Integration of crop growth models with MODIS imagery at 250 m resolution from the Terra Satellite potentially offers an opportunity for operational assessment of the crop condition and yield at both field and regional scales. The primary objective of this research was to evaluate the quality of the MODIS 250 m resolution data for retrieval of crop biophysical parameters that could be integrated in crop yield simulation models. A secondary objective was evaluating the potential use of MODIS 250 m resolution data for crop classification. A field study (24 fields) was conducted during the 2000 crop season in McLean County, Illinois, in the U.S. Midwest to evaluate the applicability of the MODIS 8-day, 250 m resolution composite imagery (version 4) for operational assessment of crop condition and yields. Ground-based canopy and leaf reflectance and leaf area index (LAI) measurements were used to calibrate a radiative transfer model to create a look up table (LUT) that was used to simulate LAI. The seasonal trend of MODIS derived LAI was used to find crop model parameters by adjusting the LAI simulated from the climate-based crop yield model. Other intermediate products such as crop phenological events were adjusted from the LAI seasonal profile. Corn (Zea mays L.) and soybean (Glycine max (L.) Merr.) yield simulations were conducted on a 1.6 × 1.6 km2 spatial resolution grid and the results integrated to the county level. The results were within 10% of county yields reported by the USDA National Agricultural Statistics Service (NASS).
Article
Regression of observations from the real system on model predictions is sometimes used for empirical validation. The arguments against this procedure are: (1) that it is a misapplication of regression; (2) that null hypothesis tests are ambiguous; (3) that regression lacks sensitivity in this context because distinguishing the points from a random cloud is rarely necessary at this stage of model development; (4) that the fitted line is irrelevant to model performance; and (5) that the assumptions for regression can be difficult to satisfy. An alternative method is given which concentrates on the deviations (prediction minus observation) and where the modeller has to define criteria for an adequate model with reference to its purpose. This method can be rigorous, objective and quantitative, and is also easy for non-modellers to understand.
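The deviation-based alternative described in this abstract can be sketched roughly as follows. This is only one possible reading: it assumes the adequacy criteria take the form of an absolute tolerance on each deviation plus a minimum fraction of points that must fall within it. The function name `deviation_check`, both thresholds, and the sample data are hypothetical and would be set by the modeller according to the model's purpose.

```python
import numpy as np

def deviation_check(predicted, observed, tolerance, min_fraction):
    """Judge adequacy from deviations (prediction minus observation).

    tolerance    : largest acceptable absolute deviation (assumed criterion)
    min_fraction : share of points that must fall within the tolerance
    """
    deviations = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    within = np.abs(deviations) <= tolerance
    fraction = float(within.mean())
    return fraction, fraction >= min_fraction

# Hypothetical data and criteria, for illustration only.
predicted = [10.0, 12.0, 15.0, 20.0]
observed = [11.0, 12.5, 14.0, 26.0]
fraction, adequate = deviation_check(predicted, observed, tolerance=2.0, min_fraction=0.75)
print(fraction, adequate)
```

The sketch works on the deviations directly rather than on a fitted line, which is the point the abstract makes against regression-based validation.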