Article

Achieving robustness across season, location and cultivar for a NIRS model for intact mango fruit dry matter content. II. Local PLS and nonlinear models

Authors:

Abstract

A range of modelling techniques was used to estimate the dry matter content of intact mango fruit from short-wave near-infrared spectra collected in an interactance geometry. Models were developed on a data set collected across three seasons (n = 10,243) and tested on data from a fourth season (n = 1,448). Model types included Artificial Neural Network (ANN), Gaussian Process Regression (GPR), Local Optimized by Variance Regression (LOVR), Local Partial Least Squares Regression (LPLS), Local PLS Scores (LPLS-S) and Memory Based Learner (MBL), with parameters tuned manually. Additionally, two commercially available cloud-based chemometric packages for automated model development were trialled. All of these models gave better results than a global PLS model. The best result (lowest RMSEP) was achieved with an ensemble of ANN, GPR and LPLS-S, and the best individual model was LOVR, with RMSEP of 0.839 % and 0.881 %, respectively, compared to the global PLS result of 1.014 %. The best precision was achieved with the LPLS model, with a SEP of 0.846 %, compared to the global PLS result of 1.012 %. LOVR was more than twice as fast as LPLS-S-cv, a generalized latent variable selection method, in predicting the independent validation set (58.7 × 10⁻³ s compared to 163 × 10⁻³ s). The ANN model was satisfactory in all categories (prediction speed, model build speed and prediction statistics) and insensitive to tuning: e.g., 33 of the 70 parameter combinations were within 0.05 units of RMSEP of the best (minimum-RMSEP) combination. However, the ANN learning rate was low. For applications that require 'real-time' prediction, such as fruit packlines, ANN and GPR models are recommended. For non-cloud-based handheld NIR devices lacking the computational power to perform local modelling, ANN is recommended, while LOVR or a model ensemble is recommended for cloud-based implementation.
The automated cloud-based systems performed well (RMSEP of 0.850 % and 0.963 % for the Hone Create and DataRobot ensemble models, respectively), without human intervention in choosing and tuning the models.
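The abstract reports both RMSEP (overall accuracy, including any bias) and SEP (precision, bias-corrected). A minimal sketch of the usual chemometric definitions follows, assuming SEP uses an n − 1 denominator (the abstract does not state the convention). `average_ensemble` is a hypothetical helper that assumes an unweighted mean, since the abstract does not say how the ANN, GPR and LPLS-S predictions were combined.

```python
import numpy as np


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction (includes any systematic bias)."""
    e = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return float(np.sqrt(np.mean(e ** 2)))


def bias(y_true, y_pred):
    """Mean prediction error (prediction minus reference value)."""
    return float(np.mean(np.asarray(y_pred, float) - np.asarray(y_true, float)))


def sep(y_true, y_pred):
    """Bias-corrected standard error of prediction, n - 1 denominator (assumed)."""
    e = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return float(np.sqrt(np.sum((e - e.mean()) ** 2) / (len(e) - 1)))


def average_ensemble(*predictions):
    """Hypothetical ensemble: unweighted mean of the member models' predictions."""
    return np.mean(np.vstack(predictions), axis=0)
```

Because RMSEP² = SEP²·(n − 1)/n + bias², a model can have the best SEP, as LPLS does here, without having the lowest RMSEP if its predictions carry a systematic bias.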


... Recently, growing research on NIRS has led to more complex predictive methods in various fields of application such as food, agriculture and medicine. These methods include support vector machines (SVMs), which use kernel functions, and neural networks, including Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) (Anderson et al., 2021; Mishra et al., 2022). Although they are capable of capturing complex patterns, including data non-linearity, they require extensive training datasets, laborious and time-consuming hyperparameter tuning, and significant computational power. ...
... In plant breeding, models tailored to specific years, genetic material or growing conditions may not generalize well to new contexts (lack of robustness). A robust model typically requires a carefully selected training set that is representative of the chemical, physical (e.g., colour, seed size) and spectral variability present in a specific crop (Anderson et al., 2021; Nicolai et al., 2007). Training set samples are often randomly selected. ...
... The training set size is also a pivotal factor in building robust models. Once calibration models have been established, it is advisable to validate their accuracy with external samples (independent validation sets) (Anderson et al., 2021; Nicolai et al., 2007). This involves testing models developed using data from specific field trials on their ability to predict the chemical composition of seeds harvested in different locations, in different years or from different genetic material. ...
Article
Near-infrared spectroscopy (NIRS) provides a high-throughput phenotyping technique to assist breeding for improved faba bean seed quality. We combined chemical analysis of protein and oil content (and composition) with NIRS through chemometrics, using Partial Least Squares (PLS), Elastic Net (EN), Memory-based Learning (MBL) and Bayes B (BB) as prediction models. Protein was the most reliably predicted trait (R² = 0.96–0.98) across field trials, followed by oil (R² = 0.82–0.86) and oleic acid (R² = 0.31–0.68). Samples for training the models were selected using K-means clustering. The optimal statistical approach for prediction was compound-specific: PLS for protein (root mean squared error, RMSE = 0.46), BB for oil (RMSE = 0.067) and EN for oleic acid content (RMSE = 2.83). Simulations with reduced training sets showed that the effect on prediction accuracy depended on the model and the compound. Several NIR regions were identified as highly informative for the compounds, using the shrinkage and variable selection capabilities of EN and BB.
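The abstract states only that training samples were selected with K-means clustering. One common variant, assumed here, clusters the spectra (or their PCA scores) into as many clusters as samples are wanted and keeps the sample nearest to each centroid. `kmeans_select` is a hypothetical helper built on scikit-learn's `KMeans`; it is a sketch, not the study's selection procedure.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_select(X, n_select, random_state=0):
    """Return sorted indices of n_select samples, one per K-means cluster.

    Keeping the sample closest to each centroid is an assumed variant,
    not necessarily the procedure used in the faba bean study.
    """
    km = KMeans(n_clusters=n_select, n_init=10, random_state=random_state)
    labels = km.fit_predict(X)
    chosen = []
    for c, centre in enumerate(km.cluster_centers_):
        members = np.flatnonzero(labels == c)          # samples in cluster c
        d = np.linalg.norm(X[members] - centre, axis=1)
        chosen.append(members[np.argmin(d)])            # most central sample
    return np.array(sorted(chosen))
```

Because every cluster contributes one sample, the selected training set spans the spectral space more evenly than a random draw of the same size.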
... Here, four datasets of each type are used, making a total of 8 NIR spectroscopy datasets. For the regression problems, the four datasets are adulteration of honey (AH) [18], active substance in a pharmaceutical tablet (AST) [21], dry matter content within mango fruit (DMM) [22] and moisture content of grain protein (MGP) [23]. The four datasets used for the classification task are adulteration of rice (RA) [6], coffee type (CF) [24], strawberry fruit (SB) [25] and starch powder classification (SP) [12]. ...
... Let us call it the DMM dataset. Originally, the intact mango fruit data were collected as short-wave near-infrared spectra using an interactance geometry, with a data set collected across three seasons (n = 10,243) and that of a fourth season (n = 1,448), each spectrum consisting of 306 wavelength points [22]. After pre-processing, the dataset was reduced to 11,362 samples with 103 wavelength points (features), as published in [27]. ...
... In the paper [22], PLS regression and ANN were used as regression models and achieved similar performance, with a root mean square error of prediction (RMSEP) of around 1%. ...
Article
Near infrared spectroscopy (NIRS) is a widely used analytical technique for non-destructive analysis of various materials, including food fraud detection. However, accurate calibration of NIRS data can be challenging because of the complexity of the underlying relationships between the spectral data and the target variables of interest. Ensemble learning, which combines multiple models to make predictions, has been shown to improve the accuracy and robustness of predictive models in various domains. This paper proposes stacking ensemble machine learning (SEML) for calibration of NIRS data, with two levels of learning. Eight (8) spectroscopy datasets from a public repository and previously published works by the authors are used as case studies. The model generalized well in the respective regression tasks, with R² of at least about 0.8 on the test samples, and in the respective classification tasks, with classification accuracy (CA) of at least about 0.8. In addition, the proposed SEML can improve on, or at least match, the accuracy of the individual base learners on both training and test samples for all regression and classification datasets. It shows superior performance on test samples for both regression and classification datasets, with R² ranging from 0.86 to nearly 1 and CA ranging from 0.89 to 1.
ABSTRAK (Malay abstract, in translation): Near-infrared spectroscopy (NIRS) is an analytical technique widely used for non-destructive analysis of various materials, including detecting food fraud. However, accurate calibration of NIRS data is very challenging because the relationship between the spectral data and the target variables of interest is complex. Ensemble learning, i.e., combining multiple models to make predictions, has been shown to improve the accuracy and efficiency of prediction models in various forms. This study proposes Stacking Ensemble Machine Learning (SEML) for NIRS data calibration, involving two levels of learning. Eight (8) spectroscopy datasets from public repositories and previous studies by the authors were used as case studies. The model generalized the data in the respective regression tasks with R² of about 0.8 for the test samples, and in the respective classification tasks with classification accuracy (CA) of at least about 0.8. In addition, the proposed SEML can improve on, or at least match, the accuracy of the individual learners in the ensemble on both training and test samples for all regression and classification datasets. It shows the best performance on test samples for both the regression and classification dataset groups, with R² ranging from 0.86 to nearly 1 and CA ranging from 0.89 to 1.
... However, no justification was given for the choice of this architecture beyond its use in other (non-spectroscopy) applications. The CNN model improved on the previous best reported RMSEP of 0.84% FW (achieved by Anderson et al. [89] with an ensemble of multiple non-linear models), reaching 0.79% FW [77]. However, the authors raised the concern that the CNN model might be overfitted to the data set, and a new CNN model was proposed in a subsequent paper based on testing on fruit from a new season and cultivar, scanned with a new instrument [79]. The results of Mishra and co-workers have yet to be replicated by other research groups. ...
... However, the platform did not have a PLS regression or CNN toolkit. Anderson et al. [89] evaluated the DataRobot and Hone cloud chemometric platforms on the mango dry matter data set used by Mishra and Passos [77]. Again, neither platform provided a CNN resource. ...
... As the largest publicly available fruit data set, it has been used in several studies exploring global modelling techniques [76–81, 87, 89, 94]. In these works, data from the first three seasons are used for training, and data from the fourth season for independent validation. Table 5 presents a direct comparison of the reported RMSEP on the same independent test set. ...
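The season-based partition (three seasons for calibration, the fourth as independent test) differs from a random split in that no fruit from the test season is seen during training. A minimal sketch, assuming a season label is stored for every sample; `season_split` and `leave_one_season_out` are hypothetical helper names.

```python
import numpy as np


def season_split(seasons, test_season):
    """Boolean (train, test) masks: all samples of test_season form the test set."""
    seasons = np.asarray(seasons)
    test_mask = seasons == test_season
    return ~test_mask, test_mask


def leave_one_season_out(seasons):
    """Yield (season, train_idx, test_idx), holding out one season at a time."""
    seasons = np.asarray(seasons)
    for s in np.unique(seasons):
        train_mask, test_mask = season_split(seasons, s)
        yield s, np.flatnonzero(train_mask), np.flatnonzero(test_mask)
```

For the mango benchmark, `season_split(seasons, 2018)` reproduces the calibration/test partition described above; scikit-learn's `LeaveOneGroupOut`, with seasons as groups, gives the same folds as `leave_one_season_out`.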
Article
The Part 1 prequel to this review evaluated the evolution of modelling techniques used in the evaluation of fruit quality over the past three decades and noted a progression towards the use of artificial neural networks (ANNs) and convolutional neural networks (CNNs). In this review, Part 2, the use of CNNs for NIR fruit quality evaluation is explored, given the success of CNNs in various other fields, such as image, video, speech and audio processing, and the availability of large (open source) datasets of fruit spectra and reference quality attributes, which are required for training CNN models. The review provides an overview of deep learning and of the CNN architectures and techniques used in NIR spectroscopy for regression modelling, identifying their advantages and disadvantages. Studies using CNNs for NIR-based fruit quality evaluation are then critically examined. Eight publications have presented models using the same open-source mango dry matter calibration and test set, enabling inter-method comparisons. CNN models have been demonstrated to be accurate, precise and robust. Transfer learning techniques for CNN models offer an alternative to the model updating and calibration transfer methods applied in traditional chemometrics. The review highlights crucial areas that require resolution and exploration through future research, including (i) data requirements for training a CNN, (ii) optimal spectral pre-processing for CNNs, (iii) CNN architecture and hyperparameter selection and tuning for fruit quality evaluation, and (iv) CNN model interpretability and explainability. Future studies must make clearer comparisons with partial least squares (PLS) regression and shallow ANNs to better assess the prospective benefit of using a CNN, a more complex model. The potential for visualising spectral relevance to the CNN model using techniques such as Grad-CAM, currently employed in visualising 2D-CNN models, remains to be explored.
... During ripening, most of the starch is broken down into soluble sugars, which determine the final sweetness and flavour of mango. Hence, DM is a promising additional parameter for predicting fruit quality at harvest (Anderson et al., 2021, 2020a, 2020b). It has been confirmed that DM at harvest is highly correlated with the soluble solids content (SSC) of ripe, ready-to-eat fruit (de Freitas et al., 2022). ...
... NIR spectroscopy is a well-developed analytical technique and has been used to predict DM in a wide variety of fruit such as apples (Teh et al., 2020), pears (Mishra et al., 2021b), avocados (Mishra et al., 2022a; …) and olives (Sun et al., 2020a, 2020b). In addition, NIR spectroscopy has been used for the analysis of mango fruit, including prediction of DM (Anderson et al., 2021, 2020a, 2020b) and firmness (Kasim et al., 2021), and detection of internal defects (Gabriëls et al., 2020). ...
... Often a calibration model performs well on one batch of samples but fails on a new batch of data because of unmodelled variability present in the new data. To address such challenges, different approaches have been tested, such as model updating by including new samples, measuring a large number of samples with wider variation for calibration (Anderson et al., 2021, 2020a, 2020b), and new advanced data modelling approaches such as local modelling (Lesnoff et al., 2020; Shen et al., 2019), artificial intelligence (Mishra & Passos, 2021b) and domain adaptation. ...
Article
A novel approach to developing robust calibration models for predicting dry matter in mango fruit is presented. The robust methodology includes automatic iterative downweighting of outlying samples during chemometric modelling to learn a robust mathematical relationship between near-infrared (NIR) spectra and dry matter (DM). The robust models were compared with traditional partial least-squares (PLS) modelling by validation on independent data from an open access mango dataset, derived from four different cultivars of another origin and measured with a different instrument (without calibration transfer). The results showed that downweighting several outliers in the robust modelling approach reduced the root mean squared error of prediction (RMSEP) from 1.03 % DM, obtained with a PLS model, to 0.75 % DM when tested on samples of different cultivars. Furthermore, independent tests of the robust model on sample sets composed of data from different cultivars and origins, measured with a different NIR instrument, reduced the RMSEP from 2.06 % DM to 0.89 % DM without any need for model updating or transfer. The robust models can help improve the prediction of fruit traits and are a further step in broadening the application of NIR spectroscopy in horticultural practice.
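The iterative downweighting idea can be sketched as iteratively reweighted PLS. The abstract does not give the weighting scheme, so this sketch swaps in Huber weights on MAD-scaled residuals (tuning constant c = 1.345) around a weighted PLS1 (NIPALS) written in NumPy. The function names, constants and iteration count are assumptions, not the published method.

```python
import numpy as np


def weighted_pls1(X, y, w, n_comp):
    """Weighted PLS1 via NIPALS; returns (x_mean, y_mean, coefficients)."""
    w = w / w.sum()
    x_mean, y_mean = w @ X, w @ y          # weighted centring
    sw = np.sqrt(w)
    Xk = (X - x_mean) * sw[:, None]
    yk = (y - y_mean) * sw
    W, P, Q = [], [], []
    for _ in range(n_comp):
        wk = Xk.T @ yk
        wk /= np.linalg.norm(wk)
        t = Xk @ wk
        tt = t @ t
        p, q = Xk.T @ t / tt, yk @ t / tt
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - q * t                    # deflate y
        W.append(wk); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, Q)


def robust_pls1(X, y, n_comp=5, c=1.345, n_iter=20):
    """Assumed scheme: repeatedly refit, downweighting large-residual samples."""
    w = np.ones(len(y))
    for _ in range(n_iter):
        x_mean, y_mean, coef = weighted_pls1(X, y, w, n_comp)
        r = y - ((X - x_mean) @ coef + y_mean)
        # robust residual scale: normalised median absolute deviation
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        u = np.abs(r) / max(s, 1e-12)
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))  # Huber weights
    return x_mean, y_mean, coef, w
```

Samples with small residuals keep full weight, while gross outliers end up with weights near zero, so they barely influence the final calibration.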
... Classical chemometric approaches (mainly principal component analysis (PCA)- and PLS-based techniques) combined with knowledge-driven spectroscopic pre-processing have proved successful in modelling NIR data. However, to push model performance further and to handle data non-linearities, non-linear methods have been used [9]. ...
... It is expected that the continued adoption of new spectral technologies (e.g., handheld spectrometers) will help reverse this tendency. Large open NIR datasets (e.g., more than 10⁴ independent samples) are not yet common in the spectral community and are currently limited to fruit [9,30,31], soil [5,6] and seeds [28,32]. Most of the initial works on the application of DL to NIR spectral analysis have shown the potential of DL on small spectral data sets [4,12,14,16], in some cases comprising fewer than 100 samples. ...
... In another example, DL was applied to NIR spectra for dry matter prediction of mango fruit [7] with ~10⁴ samples; the authors achieved a lower root mean square error of prediction (RMSEP) than a wide range of non-DL chemometric approaches [9,30]. Another work on soil spectroscopy [6], which also explored DL approaches based on ~10⁴ spectra and the corresponding reference property values, achieved better performance than non-DL chemometric approaches. ...
Article
Deep learning (DL) is emerging as a new tool to model spectral data acquired in analytical experiments. Applications are flourishing, and there is currently much interest in the scientific community in the use of DL for spectral data modelling. This paper provides a critical and comprehensive review of the major benefits and potential pitfalls of DL for spectral data modelling. Although this work focuses on DL for modelling near-infrared (NIR) spectral data in chemometric tasks, many of the findings can be extended to other spectral techniques. Finally, empirical guidelines on best practice for the use of DL in modelling spectral data are provided.
... The original mango data set is composed of 4 harvest seasons 2015, 2016, 2017 and 2018 [75], and further detailed in Refs. [76,77]. The data set has Vis-NIR spectra (with 103 features per spectrum) of mango (the input variables, x) and corresponding dry matter content (DM in percentage units, %) measured in laboratory (target variables, y). ...
... As usual, the test set is used only after model optimization, to assess the model's final performance. The data partition used here is the same as in [19,76,77] to facilitate comparison of results. For this specific chemometric problem related to fruit spectra, one of the main objectives is to produce models that are robust across harvest seasons. ...
... As a final remark, it might be useful to remember that this problem deals with inter-seasonal predictions of fruit properties, which makes it a hard problem to solve due to the biological variability of fruit. Nonetheless, the results obtained here using a simple CNN architecture and the proposed optimization of only 4 hyperparameters (for 500 trials) led to results on par with the best result presented in the broad machine learning benchmark of [77] obtained using an ensemble of models (see their Table 3). For this specific type of problem, a much more laborious optimization strategy that involves a more complex target objective function like the one presented in Ref. [19] can lead to even better results. ...
Article
Deep spectral modelling for regression and classification is gaining popularity in the chemometrics domain. A major topic in deep learning (DL) modelling of spectral data is the choice and optimization of a deep neural network architecture suited to the specific spectral modelling task. Although several recent research articles in the chemometric domain show advanced approaches to deep spectral modelling, there is currently a lack of hands-on tutorial articles that supply non-expert users with practical tools to learn and implement advanced DL optimization methodologies for spectral data. Hence, this tutorial article aims to close the gap between non-expert users of DL in the chemometric community and the implementation of DL models for daily use. The tutorial supplies a quick introduction to state-of-the-art deep spectral modelling and related DL concepts, and presents a set of methodologies for DL hyperparameter optimization. To this end, it shows two practical examples of how to implement and optimize DL models for spectral regression and classification tasks. The models are implemented in Python and TensorFlow, and the complete code is supplied as two complementary notebooks.
... [6] Measurement of samples covering wide variability is the first step towards global models; the second step is to model the data properly, developing robust models that learn not only the relationship between the spectra and the property of interest but also complex patterns such as the causes of seasonal, cultivar and ripeness-related variability [22,23]. Several authors have tried linear, non-linear, global and local modelling approaches to achieve robust models and have obtained satisfactory performance [6,22,23]. A key feature of portable spectroscopy is its wide-scale usage compared with laboratory-based scientific spectrometers [14,15,24]. ...
... Such wide-scale usage has the advantage that it can generate enormous, information-rich data sets. ...
... [25] The data set was explored in several studies with traditional chemometric and machine learning approaches, and the best root-mean-square error of prediction (RMSEP) was reported as 0.84% [22,23]. Taking advantage of the size of such a large data set, a deep learning model was recently proposed [26]. That DL model outperformed all the earlier linear, non-linear, local and global modelling approaches and reached an RMSEP of 0.79% on the same test set reported in earlier studies. ...
Article
Recently, a large near-infrared spectroscopy data set for mango fruit quality assessment was made available online. Based on that data, a deep learning (DL) model outperformed all major chemometrics and machine learning approaches. However, in earlier studies, model validation was limited to the test set from the same data set, which was measured with the same instrument on samples of similar origin. From a DL perspective, once a model is trained it is expected to generalise well when applied to a new batch of data. Hence, this study aims to validate the generalisability of the earlier developed DL model for DM prediction in mango on a different test set measured in a local laboratory setting with a different instrument. First, the performance of the old DL model was presented. Next, a new DL model was crafted to cover the seasonal variability related to fruit harvest season. Finally, a DL model transfer method was used to apply the model to a new instrument. Direct application of the old DL model led to a higher error than the PLS model. However, the performance of the DL model improved drastically when it was tuned to cover the seasonal variability. The updated DL model performed best, compared with both a new PLS model and an update of the existing PLS model. A final root-mean-square error of prediction (RMSEP) of 0.518% was reached. This result supports the view that, when large data sets are available, DL modelling can outperform chemometric approaches.
KEYWORDS: artificial intelligence, calibration transfer, deep learning, fruit-chemistry, spectroscopy
... In the case of fresh fruit analysis, implementing NIR is not as straightforward, and very often the NIR calibration does not accurately predict the property of interest when the model is used on a new batch of fruit [1,2,5,9]. The term 'new batch' is broad, but it can relate to, e.g., samples measured at a different moment in time [4], in different seasons [10], of different cultivars/varieties [11], from different geographic locations [10] or at different ripeness levels [10]. In addition, models may need adjustment if measurements are performed at different temperatures [10] or if components of the instrument, such as the light source, are changed. ...
Article
Near-infrared (NIR) spectroscopy models for fresh fruit quality prediction often fail when used on a new batch or scenario containing variability that was absent from the primary calibration. In this study, to address the challenge of updating NIR models for fresh fruit quality properties, a semi-supervised parameter-free calibration enhancement (PFCE) approach is proposed. Model updating with PFCE was demonstrated in two ways: first, where the model on the primary batch was updated individually for each new fruit batch, and second, where the model was sequentially updated for the subsequent batches. Further, for the first time, a case of updating an instrument-transferred model is also presented. The PFCE approach was demonstrated on two real cases related to moisture and total soluble solids prediction in pear and kiwifruit. In the case of pear, the model was later updated for 3 new measurement batches, while for kiwifruit, a commercial model was updated to incorporate the variability of a new experiment carried out in a laboratory environment. For each modelling demonstration, the performance was benchmarked against partial least-squares (PLS) regression on the primary batch. The results showed that the models updated with the semi-supervised approach maintained high predictive performance on new measurement batches, without any need for parameter optimization. An instrument-transferred model was also updated to maintain its performance on different batches. Further, the sequential updating approach performed better than updating for individual batches, as the models were able to learn from multiple batches. Model updating with a semi-supervised approach makes NIR spectroscopy of fresh fruit scalable and allows models to be shared across the scientific and application communities.
... The performance of the parallel CNNs is compared with the traditional single block CNNs based DL modelling [27] and with the commonly used multiblock chemometric approach called sequentially orthogonalized partial least-squares (SO-PLS) regression [17]. Finally, the results from the deep multiblock modelling approach were compared with the best reported results on the mango data set [32] using only the NIR part of the spectra [33,34]. ...
... In total, the data comprise 11,691 Vis-NIR spectra (350–1200 nm at 3 nm sampling) and reference DM measurements performed on mango fruit across 4 harvest seasons, 2015–2018. According to the description of the data [33,34], the spectral measurements were performed with an F750 Produce Quality Meter (Felix Instruments, Camas, USA), while DM (%) was measured by oven drying (UltraFD1000, Ezidri, Beverley, Australia). At the source repository, the spectra come pre-partitioned into training and test sets, allowing a fair comparison with the results previously reported on the data set [33,34]. ...
... Of the 11,691 spectra, the 10,243 training spectra are from the first three harvest seasons (2015–2017), while the remaining 1,448 spectra (from 2018) form the independent test set. ...
Article
In the domain of chemometrics, multiblock data analysis is widely performed for exploring or fusing data from multiple sources. Commonly used methods for multiblock predictive analysis are extensions of latent space modelling approaches. However, DL approaches such as convolutional neural networks (CNNs) have recently outperformed traditional single-block latent space chemometric approaches such as partial least-squares (PLS) regression. CNN-based DL modelling can also handle multiblock data simultaneously, but this had not been explored before this study. Hence, this study presents, for the first time, the concept of parallel-input CNN-based DL modelling for multiblock predictive chemometric analysis. Parallel-input CNN-based DL modelling uses individual convolutional layers for each data block to extract key features, which are then combined and passed to a regression module composed of fully connected layers. The method was tested on a large, real visible and near-infrared (Vis-NIR) data set related to dry matter prediction in mango fruit. To obtain multiblock data, the visible (Vis) and near-infrared (NIR) parts were treated as two separate blocks. The performance of the parallel-input CNN was compared with traditional single-block CNN-based DL modelling, as well as with a commonly used multiblock chemometric approach, sequentially orthogonalized partial least-squares (SO-PLS) regression. The results showed that the proposed parallel-input CNN-based deep multiblock analysis outperformed both single-block CNN-based DL modelling and SO-PLS regression. The root mean squared error of prediction obtained with deep multiblock analysis was 0.818 %, which is 4 % and 20 % lower (relative) than that of single-block CNNs and SO-PLS regression, respectively. Furthermore, the deep multiblock approach attained an RMSE ∼3 % lower than the best known result on the mango data set used in this study.
The deep multiblock analysis approach based on parallel input CNNs could be considered as a useful tool for fusing data from multiple sources.
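SO-PLS, the multiblock baseline named in this abstract, fits block 1 first, removes from block 2 whatever is already explained by the block-1 scores, and then fits the remaining response on the orthogonalised block 2. A minimal NumPy sketch, assuming one PLS1 model per block and user-chosen component counts `a1` and `a2`; the helper names are hypothetical, and this is neither the parallel-input CNN nor the authors' SO-PLS code.

```python
import numpy as np


def pls1(X, y, n_comp):
    """Plain PLS1 (NIPALS). Returns means, score weights R, and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p, q = Xk.T @ t / tt, yk @ t / tt
        Xk, yk = Xk - np.outer(t, p), yk - q * t   # deflation
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    R = W @ np.linalg.inv(P.T @ W)   # scores of any data: (X - x_mean) @ R
    return x_mean, y_mean, R, R @ Q


def so_pls_fit(X1, X2, y, a1, a2):
    """Sequential, orthogonalised fit: block 1, then block 2 on what is left."""
    x1m, y1m, R1, b1 = m1 = pls1(X1, y, a1)
    T1 = (X1 - x1m) @ R1
    y_hat1 = (X1 - x1m) @ b1 + y1m
    x2m = X2.mean(axis=0)
    D = np.linalg.solve(T1.T @ T1, T1.T @ (X2 - x2m))  # X2 regressed on T1
    X2o = (X2 - x2m) - T1 @ D                          # orthogonalised block 2
    m2 = pls1(X2o, y - y_hat1, a2)                     # fit residual response
    return {"m1": m1, "x2m": x2m, "D": D, "m2": m2}


def so_pls_predict(model, X1, X2):
    x1m, y1m, R1, b1 = model["m1"]
    T1 = (X1 - x1m) @ R1
    X2o = (X2 - model["x2m"]) - T1 @ model["D"]
    x2om, y2m, _, b2 = model["m2"]
    return (X1 - x1m) @ b1 + y1m + (X2o - x2om) @ b2 + y2m
```

With Vis and NIR treated as `X1` and `X2`, the block order matters: the second block only contributes information not already captured by the first.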
... Each individual fruit data set was randomly split into train (80%) and test (20%) sets, distributed according to Table 1, and then concatenated into monolithic train and holdout test sets. Given the number of samples in this data set, this partition allows the deep learning models to capture intricate patterns and to undergo proper CNN hyperparameter optimization, while still leaving a sizeable portion of data for testing [14,26,45,47,48]. Spectra and DM distributions of individual fruit data types are shown in Fig. 1. ...
... The independent mango data set is partitioned in the same way as in Refs. [14,45], i.e., the training split is composed of data from harvest seasons 2015, 2016 and 2017 (n = 7413), while the test split contains data from the 2018 harvest season (n = 1448). To take advantage of all the samples in the multi-fruit data set, both PLS and CNN-R_v1E were trained on the full multi-fruit data set (train and test data aggregated). ...
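Unlike a random split, a season-based split holds out an entire harvest year. The test error then reflects extrapolation to an unseen season. A minimal NumPy sketch, assuming each sample carries a season label; the function name and toy data are hypothetical:

```python
import numpy as np


def season_split(X, y, season, train_seasons=(2015, 2016, 2017), test_season=2018):
    """Train on earlier harvest seasons, test on a later, unseen season."""
    season = np.asarray(season)
    train = np.isin(season, train_seasons)
    test = season == test_season
    return X[train], y[train], X[test], y[test]


# Hypothetical season labels for six samples
season = np.array([2015, 2016, 2017, 2018, 2016, 2018])
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.arange(6, dtype=float)
X_tr, y_tr, X_te, y_te = season_split(X, y, season)
```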
Article
Convolutional Neural Networks (CNNs) have proven to be a valuable Deep Learning (DL) algorithm to model near-infrared spectral data in Chemometrics. However, optimizing CNN architectures and their associated hyperparameters for specific tasks is challenging. In this study, we explore the development of 1D-CNN architectures for the task of fruit dry matter (DM) estimation, testing various designs and optimization strategies to achieve a generic DL model that is robust against data fluctuations. The models are built using a multi-fruit data set (n=2397) that includes NIR spectra of apples, kiwis, mangoes, and pears. The obtained CNN models are compared with PLS (taken as baseline), and to Locally Weighted PLS (LW-PLS) models. In general, the optimized CNN architectures obtained lower RMSEs (best RMSE=0.605 %) than PLS (RMSE=0.892 %) and LW-PLS (RMSE=0.687 %) on a holdout test set. For this specific task, CNNs start outperforming PLS when the number of training samples is around 500. Furthermore, it is also shown how a global CNN model, trained on multi-fruit data, performs against PLS models of individual fruits in the sub-tasks of individual fruit DM prediction and generalization to an external mango data set. Overall, with proper architecture optimization, CNNs show strong performance and generalization for NIR-based dry matter estimation across diverse fruits.
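The LW-PLS baseline above belongs to the family of local PLS methods. These fit a separate PLS model around each query spectrum. The NumPy sketch below rests on two assumptions:

- It uses a simple k-nearest-neighbour variant with uniform weights on the k closest training spectra, rather than the similarity weighting used by LW-PLS proper.
- It fits each local model with a from-scratch NIPALS PLS1 routine.

`k`, `n_comp` and the function names are illustrative choices.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Single-response PLS via NIPALS; returns (x_mean, y_mean, regression vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)       # weight vector
        t = E @ w                    # scores
        tt = t @ t
        p = E.T @ t / tt             # X loadings
        c = f @ t / tt               # y loading
        E = E - np.outer(t, p)       # deflate X
        f = f - c * t                # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # b = W (P^T W)^-1 q
    return x_mean, y_mean, b


def local_pls_predict(X_train, y_train, X_query, k=50, n_comp=5):
    """Predict each query spectrum with a PLS model fitted on its k nearest neighbours."""
    preds = []
    for x in X_query:
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dist)[:k]
        x_mean, y_mean, b = pls1_fit(X_train[nearest], y_train[nearest], n_comp)
        preds.append(y_mean + (x - x_mean) @ b)
    return np.array(preds)
```

Because a new model is fitted for every query, prediction costs far more than with a global PLS model. This is the speed trade-off that recurs in comparisons of local and global models.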
... Both modalities are complementary and are useful for non-destructive fruit analysis at different stages of the fresh fruit chain. For example, point-based sensing, which is available in portable or pocket-sized forms, is highly suitable for measuring fruit during early stages, such as while the fruit are still on the tree or plant [8,9]. This is possible because the portable spectrometer can be brought close to the fruit on the tree or plant without any need to pick it. This minimises fruit wastage, removes the need for destructive chemical analysis, and enables individual fruit to be monitored over time. ...
... In practice, different models should be created for different kiwifruit varieties/cultivars. Nevertheless, when data from several kiwifruit varieties/cultivars are available, it is advisable to develop global models, as was done in a recent study on mango fruit [8,9]. However, developing global models was out of scope in this work because of the limited sample size and will be explored in future work. ...
... Extrapolating in time is often difficult because the future is different from the past. The mango data set by Anderson et al. [4,5] is a good example: using spectra from 3 years, the goal is to predict dry matter (DM) content in the following year. Thus, we want a neural network configuration that does not overfit the past but extrapolates well to the future. ...
... As in previous works [4,5,3], the 2018 harvest season is used as the test set for evaluation. For a predictive model in the fruit industry to be practical, it must be robust to year-to-year variations. ...
Preprint
Neural networks are configured by choosing an architecture and hyperparameter values; doing so often involves expert intuition and hand-tuning to find a configuration that extrapolates well without overfitting. This paper considers automatic methods for configuring a neural network that extrapolates in time in the domain of visible and near-infrared (VNIR) spectroscopy. In particular, we study the effect of (a) selecting samples for validating configurations and (b) using ensembles. Most of the time, models are built on past data to predict the future. To encourage the neural network model to extrapolate, we consider validating model configurations on samples that are shifted in time, similar to the test set. We experiment with three validation set choices: (1) a random sample of 1/3 of the non-test data (the technique used in previous work), (2) the latest 1/3 (sorted by time), and (3) a semantically meaningful subset of the data. Hyperparameter optimization relies on the validation set to estimate test-set error, but the variance of neural network training obscures the true error value. Ensemble averaging (computing the average across many neural networks) can reduce the variance of prediction errors. To test these methods, we conduct a comprehensive study on a held-out 2018 harvest season of mango fruit, given VNIR spectra from the 3 prior years. We find that ensembling improves both the variance and the accuracy of the state-of-the-art model. Furthermore, hyperparameter optimization experiments (with and without ensemble averaging, and with each validation set choice) show that when ensembling is combined with using the latest 1/3 of samples as the validation set, a neural network configuration on par with the state of the art is found automatically.
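Validation choices (1) and (2) and ensemble averaging can be sketched as follows. This is a hedged NumPy illustration, not the authors' code. It assumes each non-test sample has a sortable acquisition time. It omits the semantically meaningful subset (3), which depends on metadata not described here. The function names are hypothetical.

```python
import numpy as np


def validation_split(times, frac=1/3, mode="latest", seed=0):
    """Split non-test samples into (fit_idx, val_idx).

    mode="latest": hold out the most recent `frac` of samples (sorted by time).
    mode="random": hold out a random `frac` of samples."""
    times = np.asarray(times)
    n = times.size
    n_val = int(round(frac * n))
    if mode == "latest":
        order = np.argsort(times, kind="stable")  # oldest first
        val_idx = order[n - n_val:]
    elif mode == "random":
        val_idx = np.random.default_rng(seed).choice(n, size=n_val, replace=False)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    fit_idx = np.setdiff1d(np.arange(n), val_idx)
    return fit_idx, np.sort(val_idx)


def ensemble_average(predictions):
    """Average predictions of several independently trained networks.

    `predictions` has shape (n_models, n_samples)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


# Hypothetical acquisition times (e.g. days since the first measurement)
times = [3, 1, 2, 6, 5, 4]
fit_idx, val_idx = validation_split(times)
```

Holding out the latest third mimics the time shift between training seasons and the test season. Averaging several networks trained with different random seeds damps the run-to-run variance that makes single-network validation scores noisy.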
... Essentially, this is a case of standard-free calibration transfer, in which a freely accessible data set measured with a similar instrument was generalized to a new local instrument. The free-access data set used in this study is the mango data set [40][41][42], for which both spectra and reference moisture content values are available. Note that the mango data set is based on mango samples measured in Australia. ...
... After drying, the dried fruit weight was measured, and the MC was calculated and expressed in %. The spectral range used for modelling (684–990 nm) was the same as that recommended in earlier publications [40,42,43] on the open-access mango data set. Note that the original free-access mango data set has more than 10k measurements. ...
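The sketch below restricts spectra to the recommended 684–990 nm window and computes moisture content from fresh and dried weights. It assumes moisture content on a wet (fresh-weight) basis, so that DM = 100 − MC. The wavelength grid and function names are hypothetical.

```python
import numpy as np


def select_window(spectra, wavelengths, lo=684.0, hi=990.0):
    """Keep only the columns whose wavelength lies within [lo, hi] nm."""
    wavelengths = np.asarray(wavelengths)
    mask = (wavelengths >= lo) & (wavelengths <= hi)
    return spectra[:, mask], wavelengths[mask]


def moisture_content(fresh_weight, dry_weight):
    """Wet-basis moisture content (%) from fresh and oven-dried weights."""
    fresh_weight = np.asarray(fresh_weight, dtype=float)
    dry_weight = np.asarray(dry_weight, dtype=float)
    return 100.0 * (fresh_weight - dry_weight) / fresh_weight


# Hypothetical 3 nm wavelength grid and two blank spectra
wl = np.arange(300, 1101, 3)
spectra = np.zeros((2, wl.size))
sub, wl_sub = select_window(spectra, wl)
mc = moisture_content(10.0, 1.5)   # 10 g fresh sample, 1.5 g after drying
dm = 100.0 - mc                    # dry matter on the same wet basis
```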
Article
Multivariate spectral signals are highly correlated. Variable selection techniques are often deployed for model optimization, for identifying key variables to explore the underlying physicochemical system, or for developing a cheap multispectral system based on key variables. However, the selected variables often do not provide a good estimate of properties when tested in a new setting. Examples include new measurements on a different spectrometer, a different physical or chemical state of the samples, or different environmental conditions during the experiment. A model based on variables selected in the first domain (specific conditions/instrument) often does not generalize to the new domain (specific conditions/instrument). To deal with this, the present work proposes a new variable selection method called domain invariant covariate selection (di-CovSel). The method selects the most informative variables that are invariant to differences in instruments, the physical or chemical state of the samples, and the environmental factors around the experiment. The method is inspired by domain invariant partial least squares (di-PLS) and covariate selection (CovSel). Its potential is demonstrated on four real cases related to the calibration of near-infrared (NIR) spectroscopy on agri-food materials. The results show that, in all cases, the domain invariant features selected by di-CovSel give lower prediction errors than standard variable selection with CovSel when the models are tested on a new data domain. In summary, domain invariant features selected across domains support the development of calibration models with good generalization. They also give a better understanding of the system by bypassing external factors originating from differences in instruments, the physical or chemical states of the samples, and the environmental factors around the experiment.
Note that one key feature of the proposed method is that the most important variables which generalize well across domains can be identified without requiring reference measurements in the target domain.
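di-CovSel builds on the standard CovSel algorithm. CovSel greedily selects the variable with the largest squared covariance with the response(s). It then orthogonalises both X and Y against that variable before the next selection. Below is a minimal NumPy sketch of plain CovSel only. The domain-invariant extension of di-CovSel is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def covsel(X, Y, n_vars):
    """CovSel: greedily pick the variable with the largest summed squared
    covariance with the response(s), then orthogonalise X and Y against it."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    Xr = X - X.mean(axis=0)
    Yr = Y - Y.mean(axis=0)
    selected = []
    for _ in range(n_vars):
        score = np.sum((Xr.T @ Yr) ** 2, axis=1)  # one score per variable
        if selected:
            score[selected] = -np.inf             # never re-select a variable
        j = int(np.argmax(score))
        v = Xr[:, j].copy()
        vv = v @ v
        if vv < 1e-12:  # remaining variables carry no independent information
            break
        selected.append(j)
        # Project X and Y onto the orthogonal complement of the chosen variable
        Xr = Xr - np.outer(v, v @ Xr) / vv
        Yr = Yr - np.outer(v, v @ Yr) / vv
    return selected
```

The deflation step is what distinguishes CovSel from simply ranking variables by covariance. Once a variable is chosen, its neighbours, which carry largely the same information, lose most of their covariance with the residual response.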
... A typical example concerning colour is that many fruits, such as mango and some green-skinned fruits, show no significant change in peel colour during ripening, while colour changes take place in the flesh. Many other, less common internal quality attributes can also be estimated by near-infrared spectroscopy [40], [44], [45]. However, peel thickness must be taken into account when applying this method, and an appropriate choice must be made among the common NIR spectral acquisition modes. ...
... The physiological maturity of fruit can be assessed and is usually positively correlated with the flesh proportion or starch content. Correctly assessing physiological maturity allows harvesting at the right time. This reduces economic losses in cases where ripe fruit have reduced transportability and shorter storage time and post-harvest shelf life [41], [53]. Therefore, the quality of some fruits, such as mango and kiwifruit, can be assessed for grading through DM [40], [45]. ... or, in particular, methods based on spectral analysis may not be suitable for general application to many cases (for example, different cultivars from the same growing region, or the same cultivar from different growing regions). ...
Article
Non-destructive quality assessment and classification of fruit has attracted considerable interest in recent years. To guide research on non-destructive quality analysis and to suggest promising fruit targets for future studies, 140 studies on non-destructive fruit quality assessment indexed in the Scopus database from 2016 to June 2021 were compiled and analysed. From these, the level of interest in each fruit type, the methods used, and the best results obtained with each method were identified. The results show that visible and near-infrared spectroscopy is of particular interest. Mango and apple are the two most studied fruits. Research should also target promising fruits in the lower-interest group, since the number of related scientific publications may not yet meet practical needs. In particular, non-destructive analysis technologies suited and specific to the cultivar and growing region should also be developed for fruits with geographical indications and high economic value.
... Previous studies have shown the potential benefits of using artificial neural networks (ANN) to assess the quality of foods and agricultural products such as potatoes, coconuts, durians, and olive oil (Caladcad et al., 2020;Rady et al., 2015;Sankar et al., 2017). Anderson et al. (2021) used an ANN model to predict the dry matter content of mangoes and obtained satisfactory results in all evaluation categories. Sadhu et al. (2020) established the link between fried fish temperature, cooking time, oil content, and nutritional value. ...
Article
The food industry uses artificial intelligence (AI) to enhance food quality and security while offering significant capital savings and resource optimization. Understanding machine learning (ML) techniques is also essential for their effective use. The gap therefore lies in examining the crucial role that industrial automation plays in successfully implementing this new technology. To address this gap, this review explores AI's potential to significantly enhance food safety by creating a more transparent supply chain management system. The primary focus is on potential AI applications, such as artificial neural networks (ANN) and convolutional neural networks (CNN), for detecting the quality of food and agricultural products. The primary goal of these AI applications is to reduce human intervention and effort. These methodologies have advantages and disadvantages regarding theoretical knowledge and model interpretation.
... However, this approach may fail if the predicted samples differ substantially from the calibration samples. Anderson et al. [40] developed artificial neural network models on different batches of mangoes spanning four seasons and predicted different categories of samples with high accuracy. However, this approach requires large sample sets and substantial computational power. The results of this study suggest that update sample selection should be combined with the model updating strategy, which not only improves the predictive power of the updated models but also simplifies the models. ...
Article
When near-infrared (NIR) techniques are combined with machine learning to identify the origin of apples, the construction and updating of discrimination models is essential. However, the high dimensionality of NIR data and the redundancy of the update sample set cause problems during model construction and updating. They can lead to unreasonable sample partitioning and to update samples that are not effective enough, which in turn degrades the prediction performance of the NIR models. Therefore, this study proposes local linear reconstruction combined with spectral information entropy (LLR–SIE) as a sample selection method. Two traditional methods, Kennard-Stone (KS) and sample set partitioning based on joint x–y distance (SPXY), are used for comparison to verify the superiority of the proposed method. Discrimination models were constructed after dividing the sample set with the LLR–SIE, KS and SPXY methods, and the resulting divisions were visualized. The three methods were also used to select update sample sets for updating the initial discrimination models, and the spectral features of the update sample sets selected by each method were presented. The results show that the models constructed after LLR–SIE sample partitioning achieved accuracies of 92.7% and 91% in predicting the two batches of samples, with both recall and precision above 0.9. This is an improvement in prediction accuracy of at least 4% over models constructed after partitioning the sample set with KS. The models constructed after partitioning with SPXY achieved prediction accuracies of 91% and 86.5%, with low recall and precision.
In terms of model updating, for every number of update samples tested, updating the discrimination models with samples selected by LLR–SIE gave higher accuracy than updating with samples selected by KS or SPXY. With 150 update samples selected by LLR–SIE, the updated model reached a prediction accuracy of 90% on the F-measure metric. The results show that models built after LLR–SIE sample partitioning predict better than those built after partitioning with the two traditional methods. In addition, for the same number of selected update samples, the samples selected by LLR–SIE are more effective than those selected by the traditional methods, giving the updated model higher comprehensive evaluation indexes. The LLR–SIE method combined with NIR spectroscopy can provide a new solution to the problem of model construction and updating.
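The two reference sample-selection methods can be sketched in NumPy:

- Kennard-Stone starts from the two most distant samples and repeatedly adds the sample farthest from its nearest already-selected neighbour.
- SPXY applies the same rule to the sum of the X-space and y-space distances, each divided by its maximum.

This is a minimal illustration of those two baselines only; LLR–SIE is not reproduced. The function names are hypothetical, and the function always returns at least two samples.

```python
import numpy as np


def _pairwise(A):
    """Euclidean distance matrix between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def kennard_stone(X, n_select, y=None):
    """Kennard-Stone selection on X distances or, if y is given, SPXY selection
    on the sum of max-normalised X and y distances. Returns selected row indices."""
    X = np.asarray(X, dtype=float)
    D = _pairwise(X)
    if y is not None:
        y = np.asarray(y, dtype=float).reshape(len(y), -1)
        Dy = _pairwise(y)
        D = D / D.max() + Dy / Dy.max()
    # Start with the two most distant samples
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # Distance from each remaining sample to its nearest selected sample
        min_d = D[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected
```

Both methods spread the calibration set evenly over the measured space. SPXY additionally accounts for the spread of the reference values.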
... Combining spectral data with chemometrics and machine learning makes better use of the chemical composition and structural information in a sample to achieve accurate identification and analysis. Researchers are exploring various calibration methods to build more robust models for quantitative analysis (Anderson et al., 2021;Zhang et al., 2020). These include linear methods such as linear discriminant analysis (LDA) and partial least squares regression (PLSR), and nonlinear methods such as least squares support vector machine (LSSVM) and support vector regression (SVR). These studies have revealed that nonlinear modeling approaches often outperform traditional linear techniques. ...
Article
To enhance the precision of detecting moldy core disease in apples, near-infrared (NIR) spectroscopy was employed for rapid, non-destructive detection. The impact of lighting patterns and sample positioning on detection efficacy was investigated using optical simulation methods. Discrimination models for moldy core were developed using support vector machines (SVM) and particle swarm optimization-least squares support vector machine (PSO-LSSVM), and the optimal lighting pattern was determined from the results of these models. Discrimination models for moldy core were then developed for three sample positionings, and the optimal sample positioning was determined. Finally, the interval combination optimization (ICO)-competitive adaptive reweighted sampling (CARS) method was used to screen the feature wavelengths for moldy core under the optimal lighting pattern and sample positioning. The results show that the 90° + 180° combined lighting pattern is optimal for detecting moldy core in apples. The PSO-LSSVM model with normalization + Gaussian filter smoothing + detrended fluctuation analysis (NOR + GFS + Detrend) preprocessing performed best, with sensitivity, specificity, and accuracy in the prediction set of 93.75%, 100%, and 96.83%, respectively. T1 is the optimal sample positioning under the 90° + 180° combined lighting pattern, and the sensitivity, specificity, and accuracy of the best SVM model were 91.89%, 94.44%, and 93.15%, respectively. After ICO-CARS screening, the number of modeling variables was only 1.6% of the original wavelength variables, effectively simplifying the classification model. This study provides technical support for rapid, non-destructive, high-precision detection of moldy core in apples.
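Sensitivity, specificity and accuracy follow directly from the confusion matrix of a two-class (moldy core vs healthy) discrimination model. Below is a minimal pure-Python sketch using the standard definitions. The function name and the labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn)   # fraction of diseased fruit detected
    specificity = tn / (tn + fp)   # fraction of healthy fruit correctly passed
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical labels: 1 = moldy core, 0 = healthy
sens, spec, acc = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                                 [1, 1, 1, 0, 0, 0, 0, 0])
```

A specificity of 100 % with a lower sensitivity, as reported for the PSO-LSSVM model, means no healthy fruit were rejected but some diseased fruit were missed.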
... However, the high occurrence of non-linearity in the data set, such as the variation mentioned above, is the primary cause of significant errors in analyte determinations (Blanco et al., 2000). Researchers are trying various nonlinear calibration methods to build more robust models for quantitative analysis (Anderson et al., 2021;Balabin and Lomakina, 2011;Balabin et al., 2007). These include polynomial partial least squares regression (Poly-PLSR), spline partial least squares regression (Spline-PLSR), Gaussian process regression (GPR), support vector regression (SVR) and artificial neural networks (ANN). They found that nonlinear modelling methods could achieve better results than conventional linear methods. ...
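Of the nonlinear calibration methods listed above, Gaussian process regression (GPR) is compact enough to sketch from scratch. The following NumPy code is a minimal illustration rather than a production implementation. It assumes a squared-exponential (RBF) kernel with fixed, hand-set hyperparameters; in practice these are usually fitted by maximising the marginal likelihood. It returns the posterior mean and variance, and all names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between rows of A and rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gpr_predict(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-4):
    """Posterior mean and variance of a GP with fixed hyperparameters.

    Targets are centred, so the prior mean equals the training mean."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale, variance)
    L = np.linalg.cholesky(K + noise * np.eye(len(X_train)))
    # alpha = (K + noise*I)^-1 (y - mean), solved with the Cholesky factor and its transpose
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_star = rbf_kernel(X_test, X_train, length_scale, variance)
    mean = y_mean + K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = variance - np.sum(v ** 2, axis=0)  # latent-function variance (noise excluded)
    return mean, var


# Hypothetical 1-D example: noise-free samples of a smooth nonlinear function
X_train = np.linspace(0.0, 6.0, 30)[:, None]
y_train = np.sin(X_train[:, 0])
X_test = np.array([[1.0], [2.5], [4.2]])
mean, var = gpr_predict(X_train, y_train, X_test)
```

The posterior variance is one practical advantage of GPR over other nonlinear regressors: it flags predictions far from the training data.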
Article
Visible/near-infrared (Vis/NIR) spectroscopy has been widely used for the online detection of soluble solids content (SSC) in apples. However, external factors such as sample size and detection position can cause spectral distortion, resulting in reduced detection accuracy. In this study, we aimed to develop a more robust prediction model that resists the impact of sample size and detection position. First, we collected and analyzed transmission spectra of apple samples of different sizes and at different detection positions using a self-designed Vis/NIR online acquisition device. The effect of fruit size and detection position on Vis/NIR spectra was found to be caused by optical path differences. Thus, a diameter correction method was used to uniformly correct the absorbance spectra, after which the local models performed better. Global models with various preprocessing methods were then developed. To further improve model performance, changeable size moving window (CSMW) and competitive adaptive reweighted sampling (CARS) were used to select effective wavelengths. Finally, a one-dimensional convolutional neural network (1D-CNN) model was constructed. Without any preprocessing or optimization, it outperformed the other models, with R²cal, RMSEC, R²pre, and RMSEP values of 0.953, 0.254 %, 0.900, and 0.371 %, respectively. Conventional PLSR modelling and deep 1D-CNN modelling were compared under the influence of the two external factors. The results showed that 1D-CNN can serve as a more convenient alternative for online apple SSC determination and could significantly reduce the complexity of the Vis/NIR spectroscopy modeling process.
... Nonetheless, the ANN predicted CO2 and ethylene production with lower error than the SVR models. Depending on the data set and the aim of prediction, many researchers have compared the accuracy and error of ANN and support vector machine (SVM) models with different algorithms (Anderson et al., 2021;Cao et al., 2022;Mogollon et al., 2020). ...
Article
In this study, fresh and in-package apricots were treated with dielectric barrier discharge (DBD) cold plasma for 5, 10, and 15 min and then stored at 21°C for 12 days to simulate the shelf life of the apricot. The effects of DBD treatment on the main quality attributes of apricot were investigated during storage. These attributes included physicochemical traits (mass loss, pH, soluble solid content, titratable acidity, and skin color), mechanical properties (Young's modulus, tangent modulus, and bioyield stress), in-package gas composition and ethylene production. In addition, the bruise susceptibility (BS) of control and treated samples at the microscale was evaluated using a pendulum test and scanning electron microscopy. Mass loss, pH, soluble solid content, titratable acidity, skin color, and bioyield stress were used as input parameters for the developed artificial neural network (ANN) and support vector regression (SVR) models to predict CO2 and ethylene production. The developed ANN predicted CO2 (R² = 0.983, root mean square error [RMSE] = 0.486) and ethylene production (R² = 0.933, RMSE = 5.376) better than SVR (CO2: R² = 0.894, RMSE = 6.077; ethylene: R² = 0.759, RMSE = 14.117). These results indicate that intelligent methods are effective and robust tools for predicting the quality parameters of fresh fruits in the postharvest process.
Practical Applications
Advances in technology and artificial intelligence have led many food industry engineers and R&D researchers to use this technology to classify and predict various food industry processes. Product packaging faces challenges such as the amount of carbon dioxide and ethylene produced inside the package, which affects the shelf life of fresh agricultural products.
In this research, artificial neural networks and support vector machine methods have been used to predict the two mentioned parameters. This approach can be used for the digitalization of the packaging process of agricultural and horticultural products and reduce the number of experimental activities. In addition, it is possible to improve the digitalization process by the fusion of smart sensors with machine learning methods.
... Malvandi et al. (2022) and Wang et al. (2022a) compared multiple linear regression (MLR), artificial neural networks (ANN), and PLS algorithms for predicting fruit ripeness, and both concluded that ANN has more advantages (Clark et al., 2003). However, Anderson et al. (2020, 2021) and Pissard et al. (2021) obtained very good results using MLR and PLS. This suggests that applying the right preprocessing, prediction, or classification methods can maximize the benefits of spectra. ...
Article
Objectives: Fruit quality strongly affects the economic value of fruit. It is related to many ripening parameters, such as soluble solid content (SSC), pH, and firmness (FM), and ripening is a complex process. Traditional methods are inefficient, do not guarantee quality, and do not keep pace with the current rhythm of the fruit market. In this paper, a device was designed and implemented for quality prediction and maturity level classification of Philippine Cavendish bananas.
Materials and Methods: The quality changes of bananas at different stages were analyzed. Twelve light intensity reflectance values for each maturity stage were compared with conventionally measured SSC, FM, pH, and color space.
Results: The device's measurements were comparable with traditional quality measurements. The experimental results show that the established predictive model, with specific preprocessing and modeling algorithms, can effectively determine various banana quality parameters (SSC, pH, FM, L*, a*, and b*). The RPD values of SSC and a* were greater than 3.0, those of L* and b* were between 2.5 and 3.0, and those of pH and FM were between 2.0 and 2.5. In addition, a new banana maturity level classification method (FSC) was proposed, and the results showed that it could effectively classify four maturity levels with an accuracy of up to 97.5%. Finally, the MLR and FSC models were imported into the MCU to realize near-range and long-range real-time display of data.
Conclusions: These methods can also be applied more broadly to fruit quality detection, providing a basic framework for future research.
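The RPD thresholds quoted above (3.0, 2.5, 2.0) refer to the ratio of performance to deviation. The minimal sketch below assumes the common definition RPD = SD of the reference values / SEP. Some authors divide by RMSEP instead, which gives slightly lower values when predictions are biased. The function name and data are hypothetical.

```python
import numpy as np


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD(reference) / SEP."""
    y_true = np.asarray(y_true, dtype=float)
    residuals = np.asarray(y_pred, dtype=float) - y_true
    return np.std(y_true, ddof=1) / np.std(residuals, ddof=1)


# Hypothetical reference values and predictions
value = rpd([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.1, 3.9, 5.0])
```

A higher RPD means the prediction error is small relative to the natural spread of the attribute. This is why the same absolute error can be adequate for one parameter and poor for another.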
... Finally, Ding et al. [113] compared Radial Basis Function Neural Networks (RBF-NN) and PLS on dehydrated tomato samples. RBF-NN performed better for lycopene, total phenolic content, and total antioxidant capacity measured by the DPPH and ABTS assays, while PLS performed better for total antioxidant capacity measured by the FRAP method. Moreover, this was the only study to use deep learning or ANN algorithms (in this case, RBF-NN), although machine learning is a topic of increasing interest in other areas of IR spectroscopy [149,150]. ...
Article
Infrared spectroscopy (wavelengths ranging from 750 to 25,000 nm) offers a rapid means of assessing the chemical composition of a wide range of sample types, for both qualitative and quantitative analyses. Its use in the food industry has increased significantly over the past five decades, and it is now an accepted analytical technique for the routine analysis of certain analytes. It is also commonly used for routine screening and quality control in numerous industry settings, although not typically for the analysis of bioactive compounds. A systematic literature search of the Scopus database covering the five years from 2016 to 2020 identified 45 studies using near-infrared and 17 studies using mid-infrared spectroscopy for the quantification of bioactive compounds in food products. The most common bioactive compounds assessed were polyphenols, anthocyanins, carotenoids and ascorbic acid. Numerous factors affect the accuracy of the developed model, including the analyte class and concentration, matrix type, instrument geometry, wavelength selection and spectral processing/pre-processing methods. Additionally, only a few studies were validated on independently sourced samples. Nevertheless, the results demonstrate some promise of infrared spectroscopy for the rapid estimation of a wide range of bioactive compounds in food matrices.
... In the weighted averaging method, the predictions of multiple base models are combined as a weighted average. Finally, the stacking method builds a meta-model that uses the predictions of multiple base models as its features (Anderson et al., 2021;Shen et al., 2020). In this study, we adopted ensemble learning with PLS, SVM, and ANN models as the base models, and SVM was used as the meta-model in the stacking method. ...
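The two combination schemes described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the study's implementation:

- The base learners are ridge regression and k-nearest-neighbour regression. They stand in for the PLS, SVM and ANN base models.
- A near-unregularised ridge (linear) meta-model replaces the SVM meta-model.
- All function names are hypothetical.

The meta-model is trained on out-of-fold predictions, so it never sees base-model predictions for samples those models were trained on.

```python
import numpy as np


def ridge_fitter(lam):
    """Return a fitter for ridge regression with an unpenalised intercept."""
    def fit(X, y):
        Xa = np.column_stack([np.ones(len(X)), X])
        penalty = lam * np.eye(Xa.shape[1])
        penalty[0, 0] = 0.0
        coef = np.linalg.solve(Xa.T @ Xa + penalty, Xa.T @ y)
        return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef
    return fit


def knn_fitter(k):
    """Return a fitter for k-nearest-neighbour regression (Euclidean distance)."""
    def fit(X, y):
        def predict(Z):
            d = np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=-1)
            nearest = np.argsort(d, axis=1)[:, :k]
            return y[nearest].mean(axis=1)
        return predict
    return fit


def stacking_fit(X, y, base_fitters, meta_fitter, n_folds=5, seed=0):
    """Stacking: train a meta-model on out-of-fold base-model predictions."""
    n = len(X)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    oof = np.zeros((n, len(base_fitters)))
    for val in folds:
        train = np.setdiff1d(np.arange(n), val)
        for m, fit in enumerate(base_fitters):
            oof[val, m] = fit(X[train], y[train])(X[val])
    meta = meta_fitter(oof, y)
    bases = [fit(X, y) for fit in base_fitters]  # refit base models on all data
    return lambda Z: meta(np.column_stack([b(Z) for b in bases]))


def weighted_average(predictions, weights):
    """Weighted-averaging ensemble; predictions have shape (n_models, n_samples)."""
    return np.average(np.asarray(predictions, dtype=float), axis=0, weights=weights)


# Hypothetical data with a linear structure plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 2 * X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=120)
model = stacking_fit(X[:100], y[:100], [ridge_fitter(1.0), knn_fitter(5)], ridge_fitter(1e-6))
pred = model(X[100:])
```

On this linear toy problem, the meta-model learns to rely almost entirely on the ridge base model. The point of stacking is that these weights are learned from data rather than fixed by hand, as they are in weighted averaging.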
Article
Mealiness is a phenomenon in which intercellular adhesions in apples loosen during storage, causing a soft and floury texture at the time of eating and lowering consumer preference. Although apples can be stored and commercially sold throughout the year, the occurrence of mealiness is not monitored during storage. Therefore, the objective of this research was to non-destructively estimate the mealiness of apple fruit by means of laser scattering measurement. This method is based on laser light backscattering imaging, but it can quantify a wider range of backscattered light than the conventional method by using high dynamic range (HDR) rendering techniques. Lasers with wavelengths of 633 nm and 850 nm were used as light sources, and profiles and images were obtained from the acquired backscattered images. Profile features, such as curve fitting coefficients and profile slopes, were extracted from the profiles. Image features, such as statistical image features and texture features, were extracted from the images. PLS, SVM, and ANN models were used to estimate mealiness. The results showed that the ANN model combining both wavelengths performed better (R = 0.634, RMSE = 7.621) than the models constructed from features of single-wavelength measurements. To further improve model performance, we applied various ensemble learning methods to combine different estimation models. As a result, the ensemble model showed the highest performance (R = 0.682, RMSE = 7.281). These results suggest that laser scattering measurement is a promising method for estimating apple fruit mealiness.
... Predictions of the LC of different batches of snow pear samples by a single master NIR calibration model have consistently shown large errors (Nicolaï et al., 2008). "Different batches" usually refers to differences in measurement time, season, geographical location, and fruit maturity of the snow pear samples (Anderson et al., 2021). Moreover, changes in the ambient temperature during NIR spectrum acquisition and in instrument components (such as the light source) could affect the accuracy and robustness of the calibration model. ...
Article
Snow pear is very popular in southwest China thanks to its fruit texture and potential medicinal value. Lignin content (LC) plays a direct and negative role in determining the fruit texture of snow pears, since a higher concentration and larger size of stone cells lead to thicker pulp and poorer taste. It also affects consumer purchasing decisions for fresh pears. In this study, we assessed the robustness of a calibration model for predicting LC in different batches of snow pears using a portable near-infrared (NIR) spectrometer covering 1033–2300 nm. Average NIR spectra at nine different measurement positions were collected for snow pear samples purchased in four different periods (batches A, B, C and D). We developed a model combining standard normal variate (SNV) transformation, genetic algorithm (GA) wavelength selection and partial least squares regression (PLSR). This model (master model A) predicted LC in batch A based on 80 selected effective wavelengths, with a correlation coefficient of the prediction set (Rp) of 0.854 and a root mean square error of the prediction set (RMSEP) of 0.624. It was then used to predict LC in the three other batches of snow pear samples. Applying master model A directly to batches B, C and D performed poorly, with lower Rp and higher RMSEP. The independent semi-supervised free parameter model enhancement (SS-FPME) method and the sequential SS-FPME method were used and compared for updating master model A to predict the LC of snow pears. For batch B, the predictive ability of the updated model (Ind-model AB) improved, with an Rp of 0.837 and an RMSEP of 0.614. For batch C, the performance of Seq-model ABC improved greatly, with an Rp of 0.952 and an RMSEP of 0.383. For batch D, the performance of Seq-model ABCD also improved, with an Rp of 0.831 and an RMSEP of 0.309.
Therefore, updating the model with the sequential SS-FPME method, which supervises and learns from samples of each new batch, can improve the robustness and transferability of the model for detecting the LC of snow pears. This provides technical support for the development and practical application of portable detection devices.
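The abstract names SNV preprocessing without defining it. Below is a minimal NumPy sketch of the usual row-wise definition, assuming one spectrum per row. The GA wavelength selection and PLSR steps are not shown. The use of `ddof=1` for the standard deviation is an assumption, because implementations differ.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)  # ddof=1 is an assumption; some tools use ddof=0
    return (spectra - mean) / std

# Hypothetical toy data: the second spectrum is the first with a multiplicative
# and an additive scatter effect applied.
base = np.array([0.2, 0.5, 0.9, 0.4])
X = np.vstack([base, 3.0 * base + 1.0])
Z = snv(X)
print(np.allclose(Z[0], Z[1]))  # True: SNV removes the offset and scale difference
```

Each spectrum is scaled by its own statistics, so SNV needs no parameters from the calibration set. It can therefore be applied in the same way to spectra from new batches.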
... However, over-fitting and slow training remain major challenges in ANN applications. Anderson et al. (2020, 2021) suggest that ANN is best not used directly for real-time monitoring, but that it gives the best results when used in automated cloud-based systems. ...
Article
The global fruit industry is continually confronted with new technological challenges in meeting people's expectations for material quality of life. Fruit maturity is strongly associated with the receiving time, transportation technique and storage method of the fruit, and it directly affects fruit quality. How to carry out rapid, non-destructive fruit quality testing has become a prominent topic in recent years. Visible and near-infrared (Vis/NIR) spectroscopy offers high repeatability, ease of operation, freedom from pollution and measurement stability. For these reasons it has become the most advanced non-destructive quality assessment technique in the field of non-destructive monitoring, in terms of equipment, applications and data analysis methods. An overview of the use of Vis/NIR optical biosensors in fruit internal quality monitoring and variety identification is presented. The benefits and drawbacks of various types of optical biosensors, and the practicality of various measurement modes, are explored. Commonly used spectral data processing methods are summarized, including preprocessing, variable selection, calibration and validation. Finally, the transition from expensive handheld NIR equipment to more cost-effective photodiode-based fruit maturity estimation devices is identified as an issue for further investigation.
... Recently, visible-near infrared (Vis-NIR) spectroscopy has been increasingly used for the non-destructive determination of fruit constituents (32–36), especially in tomato (22, 23, 37, 38). Rapid determination of SSC in an intact fruit, or in a sample homogenized from it, is difficult because of the fruit's high water content (39). ...
Article
Tomato-based products are significant components of vegetable consumption. The processing tomato industry clearly needs a rapid method for measuring soluble solids content (SSC) and lycopene content. The objective was to find the best chemometric method for estimating SSC and lycopene content from visible and near-infrared (Vis-NIR) absorbance and reflectance data, so that these attributes could be determined without chemicals. A total of 326 Vis-NIR absorbance and reflectance spectra, with reference measurements, were available to calibrate and validate prediction models. The spectra can be treated with different preprocessing methods and multivariate data analysis techniques to develop prediction models for these two main quality attributes of tomato fruit. Eight method combinations were compared on homogenized and intact fruit samples. For SSC prediction in intact fruit in the Vis-NIR range, the lowest root mean squared error of cross-validation (RMSECV) was obtained with raw absorbance data (0.58) and with multiplicative scatter correction (MSC) (0.59). For homogenate in the short-wave infrared (SWIR) region, the best result came from first derivatives of reflectance (R² = 0.41). The predictive ability for lycopene content of homogenate in the SWIR range (R² = 0.47; RMSECV = 17.95 mg kg⁻¹) was slightly lower than in the Vis-NIR range (R² = 0.68; RMSECV = 15.07 mg kg⁻¹). This study reports the suitability of two Vis-NIR spectrometers, absorbance and reflectance spectra, preprocessing methods and partial least squares (PLS) regression for predicting the SSC and lycopene content of intact tomato fruit and its homogenate.
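MSC is one of the preprocessing options compared above. Below is a minimal NumPy sketch of the standard formulation. It assumes the mean calibration spectrum as the default reference and one spectrum per row. The function name and toy data are hypothetical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is regressed on a reference spectrum r as x ≈ a + b * r
    and corrected to (x - a) / b. By default r is the mean calibration spectrum.
    Returns the corrected spectra and the reference used, so the same reference
    can be applied to new samples.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected, ref

# Hypothetical toy data: two spectra that are affine transforms of one profile.
profile = np.array([0.1, 0.4, 0.8, 0.3, 0.6])
X_cal = np.vstack([0.5 + 2.0 * profile, -0.1 + 0.7 * profile])
X_cal_msc, ref = msc(X_cal)
X_new_msc, _ = msc(0.2 + 1.5 * profile[None, :], reference=ref)  # reuse calibration reference
```

The reference is returned so that test or new-season spectra are corrected against the calibration mean rather than their own mean. Otherwise calibration and prediction preprocessing would differ.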
... Introducing samples from a new season usually further improves the test performance of direct model extrapolation (Figs. 6 and 7, Tables A8 and A9). Existing studies have shown that a model developed on Seasons 1 to 3 can provide satisfactory predictions for Season 4 samples when optimized ensemble modeling and data preprocessing techniques are used (Anderson et al., 2020, 2021). Our results indicate that fine-tuning the model with limited updating data, e.g., 10% to 20% of the new season's samples, has the potential to improve performance. It may also reduce the optimization effort needed to develop models that are robust across multiple seasons. ...
Article
Model updating for developed calibrations is critical for robust spectral analysis in fruit quality control. Existing methods typically require sufficient samples for model recalibration and are mainly designed for conventional linear models. This study proposes a model fine-tuning approach that updates nonlinear deep learning models using limited sample sizes for fruit detection under interseason variation. The approach gives RMSEs of 0.407, 1.035 and 0.642 for predicting soluble solids content (%) or dry matter content (%) in the Cuiguan pear, Rocha pear and mango datasets, respectively. It reduces test RMSE by at least 9.2%, 17.5% and 11.6% on the three datasets compared with conventional model updating methods. These methods include the global model (the better of the global CNN and global PLS models), recalibration, and slope/bias correction based on the PLS model. The fine-tuning approach shows improved reliability across updating sample sizes ranging from 5% to 20% of the new season's samples. Using cumulative data from multiple previous seasons further improves performance. This study may facilitate the implementation of high-performance deep learning approaches in on-site fruit quality control.
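Slope/bias correction is one of the conventional baselines the abstract compares against; it is not the proposed fine-tuning method. Below is a minimal sketch of the usual formulation. It assumes that the unchanged model's predictions and the reference values are available for a small set of new-season samples. The function names and toy numbers are hypothetical.

```python
import numpy as np

def fit_slope_bias(y_pred_update, y_ref_update):
    """Fit y_ref ≈ slope * y_pred + bias on a few new-season samples."""
    slope, bias = np.polyfit(np.asarray(y_pred_update, float),
                             np.asarray(y_ref_update, float), deg=1)
    return slope, bias

def apply_slope_bias(y_pred, slope, bias):
    """Correct predictions of the unchanged model for the new season."""
    return slope * np.asarray(y_pred, float) + bias

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical example: the old model's predictions are systematically off.
y_pred = np.array([14.0, 15.5, 16.0, 17.2, 18.1])
y_ref = 1.1 * y_pred - 1.5
slope, bias = fit_slope_bias(y_pred, y_ref)
print(round(slope, 3), round(bias, 3))  # 1.1 -1.5
```

A common simpler variant is bias-only correction, which fixes the slope at 1 and adds `mean(y_ref - y_pred)`. Neither variant can fix errors that are not a linear function of the old predictions, which is one motivation for updating the model itself.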
... Using the LeNet architecture, a classification accuracy of 100% was achieved with 140 training samples and 20 epochs. In a recent study, Anderson et al. (2021) predicted the dry matter content of mango with an ANN model, which gave satisfactory results in all categories (prediction statistics, model build speed and prediction speed). Estrada-Pérez et al. (2021) reported the application of a CNN model to detect adulteration in rice flour. ...
Article
Rising awareness of the need for quality inspection of food and agricultural products has driven a growing effort to develop rapid, non-destructive techniques. Quality detection is of prime importance at various stages of processing, given the laborious processes involved and the inability of current systems to measure the whole of food production. Food quality detection has previously depended on destructive techniques that require sample destruction and cause large postharvest losses. Artificial intelligence (AI), together with big data technologies and high-performance computation, has created new opportunities in the multidisciplinary agri-food domain. This review presents the key concepts of AI, comprising expert systems, artificial neural networks (ANN) and fuzzy logic. A special focus is placed on the strengths of AI applications in determining food quality for producing high and optimum yields. ANN was shown to give the best modelling results and to be effective in real-time monitoring techniques. The future use of AI for quality inspection is promising and could lead to real-time, rapid evaluation of various food and agricultural products.
... These results matched those of other studies and the literature, illustrating the function of biosolids as a rich organic fertilizer supplying both macro- and micronutrients. These results clearly reflect the overall development achieved over the whole season, and they are also a strong indicator of the efficiency of soil texture as a source of nutrients [23]. In other words, the results clearly illustrate the efficient functionality of biosolids as an organic fertilizer and highlight their role in developing good biomass over the whole life cycle of petunia plants. ...
Article
This study evaluated the effect of three application rates of municipal biosolids produced in Qatar on plant characteristics and soil texture, and their potential impacts on groundwater. Petunia atkinsiana was used in this study. The experiment took place in a greenhouse, in pots with soil mixed with 0, 3, 5 and 7 kg/m² biosolids. Pelletized class A biosolids from the Doha North Sewage Treatment Plant were used. Results revealed significant differences between the biosolid treatments and the control in all measured parameters. Electrical conductivity, pH, macro- and micronutrients and heavy metals were significantly affected by biosolid treatments. The measured pollutant levels were compared against international acceptable ceilings. This comparison showed the advantages of using class A biosolids: levels were well below the international acceptable limits and the best test rates were identified. The product can therefore be considered a sustainable and efficient organic fertilizer for ornamental plants. Furthermore, the results indicate no significant potential impact on groundwater from the trace levels of heavy metals. This is owing to the depth of groundwater in Qatar and the use of modern irrigation devices that meet the exact needs of plants in a harsh climate with a high evaporation rate.
Article
Background: pH and total soluble solids (TSS) are important quality parameters of mangoes; they represent the acidity and sweetness of the fruit, respectively. This study predicts the pH and TSS of intact mangoes from near-infrared (NIR) spectra using multi-predictor local polynomial regression (MLPR) modeling. It also compares the prediction performance of kernel partial least squares regression (KPLSR), support vector machine regression (SVMR) and MLPR. Methods: 186 intact mango samples at three maturity stages were used. Prediction models were built using MLPR, KPLSR and SVMR on untreated and treated spectra. The best regression model for predicting pH was MLPR on Gaussian-filter-smoothed spectra, while TSS was most accurately predicted by MLPR on Savitzky–Golay-smoothed spectra. Results: MLPR estimated the pH and TSS of mangoes with high accuracy, with mean absolute percentage error (MAPE) values below 10%. The MLPR model also had the best predictive performance, with the lowest mean squared error (MSE) and root mean squared error (RMSE) values and the highest R² value. Conclusions: NIR spectroscopy combined with multi-predictor local polynomial regression could provide a quick, non-destructive technique for predicting mango quality. The results of this study thus support sustainable production, a sustainable development goal.
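The abstract does not give the MLPR formulation. As an illustration of the general idea only, the sketch below implements multivariate local linear regression with a Gaussian kernel, which is local polynomial regression of degree 1. It also implements the MAPE metric the abstract reports. The sketch assumes a small number of predictors, such as a few compressed spectral variables, because kernel distances degrade in high dimensions. The function names and the bandwidth value are hypothetical; in practice the bandwidth would be tuned.

```python
import numpy as np

def local_linear_predict(X_train, y_train, x_query, bandwidth):
    """Degree-1 local polynomial regression at a single query point.

    Training samples get Gaussian kernel weights based on their Euclidean
    distance to x_query; a weighted least-squares plane is fitted in
    coordinates centred at x_query, so its intercept is the prediction.
    """
    X_train = np.asarray(X_train, float)
    y_train = np.asarray(y_train, float)
    diff = X_train - np.asarray(x_query, float)
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / bandwidth ** 2)
    A = np.column_stack([np.ones(len(X_train)), diff])  # design matrix [1, x - x0]
    sw = np.sqrt(w)                                     # weighted LS via row scaling
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_train * sw, rcond=None)
    return float(coef[0])

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] - X[:, 1]          # hypothetical noise-free linear response
pred = local_linear_predict(X, y, [0.3, 0.7], bandwidth=0.2)
```

A local linear fit reproduces any globally linear response exactly. Its advantage over a single global fit appears only when the response curves across the predictor space.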
Article
Modeling near-infrared (NIR) spectral data to predict fresh fruit properties is a challenging task. The difficulty lies in creating generalized models that work on fruit of different cultivars, seasons and even multiple commodities. Because of intrinsic differences in spectral properties, NIR models often fail in testing, with high bias and prediction errors. One current solution for achieving generalized models is to use large calibration sets measured over multiple cultivars and harvest seasons. However, current practice primarily focuses on calibration sets for single fruit commodities, disregarding the rich information available from other commodities. This study aims to demonstrate the potential of locally weighted partial least squares, an example of just-in-time (JIT) modeling, for developing real-time models from calibration sets consisting of multiple fruit commodities. The study also explores JIT modeling for leveraging relevant information from other fruit commodities or for adapting the model to new samples. The application demonstrated here is the prediction of dry matter in fresh fruit using portable NIR spectroscopy. The results show that JIT modeling is particularly effective when multiple fruit commodities are combined in a single calibration set. The JIT models achieved a root mean squared error of prediction (RMSEP) of 0.69% fresh weight (FW), compared with 0.93% FW for traditional partial least squares (PLS) modeling. JIT modeling can be particularly beneficial when the user has multiple fruit datasets and wants to combine them into a single dataset to use all the relevant information available.
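Locally weighted PLS and related local methods (local PLS, memory-based learning) fit a fresh model around each query spectrum. The sketch below is a simplified, hypothetical variant that uses a hard neighbourhood instead of distance weighting. It selects the k calibration spectra nearest to the query (Euclidean distance) and fits an ordinary PLS1 model (NIPALS) on them. The function names and the values of `k` and `n_components` are placeholders that would normally be tuned. Real implementations often measure distance in PCA score space or with the Mahalanobis distance.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS. Returns (x_mean, y_mean, regression vector B)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean                  # centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                       # weight vector
        t = Xr @ w                                   # scores
        tt = t @ t
        p = Xr.T @ t / tt                            # X loadings
        c = (yr @ t) / tt                            # y loading
        Xr = Xr - np.outer(t, p)                     # deflate X and y
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)              # coefficients in original X space
    return x_mean, y_mean, B

def local_pls_predict(X_cal, y_cal, x_query, k=50, n_components=5):
    """Fit PLS1 on the k calibration spectra nearest to x_query and predict it."""
    X_cal = np.asarray(X_cal, float)
    y_cal = np.asarray(y_cal, float)
    x_query = np.asarray(x_query, float)
    dist = np.linalg.norm(X_cal - x_query, axis=1)
    idx = np.argsort(dist)[:k]                       # the local neighbourhood
    x_mean, y_mean, B = pls1_fit(X_cal[idx], y_cal[idx], n_components)
    return float(y_mean + (x_query - x_mean) @ B)

rng = np.random.default_rng(0)
X_cal = rng.normal(size=(300, 20))                   # hypothetical "spectra"
y_cal = np.sin(X_cal[:, 0]) + 0.5 * X_cal[:, 1]      # hypothetical nonlinear response
y_hat = local_pls_predict(X_cal, y_cal, X_cal[0], k=60, n_components=3)
```

Because a new model is fitted for every query, prediction cost grows with the calibration set size. This is the main speed trade-off of local methods compared with a single global PLS model.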
Article
As the biopharmaceutical industry looks to implement Industry 4.0, rapid and robust analytical characterization of analytes has become a pressing priority. Spectroscopic tools such as near-infrared (NIR) spectroscopy are finding increasing use for real-time quantitative analysis. Yet detecting multiple low-concentration analytes in microbial and mammalian cell cultures remains a challenge, requiring carefully calibrated, resilient chemometrics for each analyte. The convolutional neural network (CNN) is a powerful tool for processing complex data, which makes it a potential approach for automatic multivariate spectral processing. This work proposes an inception module-based two-dimensional (2D) CNN approach (I-CNN) for calibrating multiple analytes using NIR spectral data. The I-CNN model, coupled with orthogonal partial least squares (PLS) preprocessing, converts the NIR spectral data into a 2D data matrix. The critical features are then extracted from this matrix to develop models for multiple analytes. Escherichia coli fermentation broth was taken as a case study, with calibration models developed for 23 analytes, including 20 amino acids, glucose, lactose and acetate. The I-CNN models achieved an average R² of 0.90 for prediction and 0.86 for the external validation data set. Their root mean square error of prediction values (∼0.52) were significantly lower than those of conventional regression models such as PLS. Preprocessing steps were applied to the I-CNN models to evaluate whether they further improved prediction performance. Finally, model reliability was assessed via real-time process monitoring and comparison with offline analytics. The proposed I-CNN method is systematic and novel in extracting distinctive spectral features from a multianalyte bioprocess data set. It could be adapted to other complex cell culture systems requiring rapid quantification by spectroscopy.
Article
Model construction and maintenance updates are essential when near-infrared (NIR) techniques are used for analysis. In machine learning model construction, the sample set is usually divided into a calibration set and a validation set. The representativeness of the calibration set and the distribution of the validation set affect the accuracy of the resulting model. In addition, when maintaining and updating models, selecting the most informative update samples not only improves prediction accuracy but also reduces sample preparation. In this paper, spectral information entropy (SIE) is proposed as a similarity criterion for dividing the sample set and for selecting update samples. The Kennard–Stone (KS) method and sample set partitioning based on joint x–y distance (SPXY) were used for comparison to verify the superiority of the proposed method. Models built after dividing the sample set with the SIE method predicted better than those built with the KS and SPXY methods. When predicting soluble solid content (SSC) and hardness, the coefficient of determination of prediction improved by more than 15%, and the root mean square error (RMSE) of prediction decreased by 50%. For model updating, selecting a small number of update samples with the SIE method raised the correlation coefficient above 80%. The updated models' prediction accuracy was also higher than with the KS and SPXY methods. This confirms that the SIE method can make NIR analysis more reliable.
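The abstract does not define SIE, so it is not sketched here. Kennard–Stone, one of the comparison methods, is well established, and a minimal NumPy sketch is below. It assumes Euclidean distance on the raw feature vectors. The function name is hypothetical. SPXY follows the same selection loop but combines normalized distances in X and y.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard–Stone algorithm."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Full pairwise Euclidean distance matrix (O(n^2) memory).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    # For every sample, the distance to its nearest already-selected sample.
    min_dist = np.minimum(D[i], D[j])
    while len(selected) < n_select:
        min_dist[selected] = -1.0              # never pick a sample twice
        nxt = int(np.argmax(min_dist))         # the sample farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, D[nxt])
    return selected

X = np.array([[0.0], [1.0], [2.0], [10.0], [4.0]])   # hypothetical 1-D "spectra"
print(kennard_stone(X, 3))  # [0, 3, 4]
```

Because KS always picks the sample farthest from those already chosen, it spreads the calibration set over the spectral space. The trade-off is that outliers tend to be selected early.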
Article
The consumer price index (CPI) is an important indicator of inflation or deflation that is closely related to residents' lives and affects the direction of national macroeconomic policy. CPI is commonly discussed from the perspective of economic analysis, but its underlying statistical principles and influencing factors are often ignored. This study therefore examined the impact of different types of CPI on China's overall CPI from three angles: statistical simulation, machine learning prediction, and correlation analysis between various influencing factors and CPI. Real data from the National Bureau of Statistics for 2010 to 2022 were analyzed. Statistical analysis showed that CPI fluctuated in 2015 and 2020 owing to the impact of education and transportation. Four types of statistical models (Gauss, Lorentz, Extreme and Pearson) were compared. The Extreme model gave the highest R² (0.81), and its optimal simulated year was around 2019, which was close to reality. To predict CPI accurately, support vector machine, regression decision tree and Gaussian regression (GPR) models were compared, and GPR was the optimal model (R² = 0.99). In addition, a Spearman correlation matrix was used to analyze the correlation between CPI and various influencing factors. This study provides a new method for determining and predicting the trend of CPI using big data analysis.
Article
Determining sample similarity underlies many foundational principles in analytical chemistry. For example, calibration models are unsuitable for predicting outliers. Calibration transfer methods assume a moderate degree of sample and measurement dissimilarity between a calibration set and target prediction samples. Classification approaches link target sample similarities to groups of similar class samples. Although similarity is ubiquitous in analytical chemistry and everyday life, quantifying sample similarity has no straightforward solution. This is especially true when target domain samples are unlabeled and the only known features are measurable ones, such as spectra (the focus of this paper). The proposed process for assessing sample similarity integrates spectral similarity information with contextual considerations among source analyte contents, the model, and analyte predictions. This hybrid approach, named the physicochemical responsive integrated similarity measure (PRISM), amplifies hidden but essential physicochemical properties encoded within the respective spectra. PRISM is tested on four near-infrared (NIR) data sets from four diverse application areas to show its efficacy. These applications are the assessment of prediction reliability and model updating for model generalizability, outlier detection, and basic matrix matching evaluation. Adapting PRISM to classification problems is also discussed. Results indicate that PRISM collects large amounts of similarity information and effectively integrates it into a quantitative similarity evaluation between the target sample and a source domain. The approach is also useful for biological samples with additional physicochemical variations. Although PRISM is tested here on NIR data, parts of it were previously applied to other data types. PRISM should therefore be applicable to other measurement systems perturbed by matrix effects.
Article
As a critical step in the manufacturing of herbal medicines, the extraction process is mainly used to collect chemical components and has a significant impact on downstream operations. However, the monitoring...
Article
The perishable nature of agricultural produce (e.g. fresh fruits and vegetables) necessitates good decision making to avoid losses and maximize profits. Near infrared (NIR) spectroscopy is one tool that has proven useful for this purpose. However, despite the recent development of fairly low-cost portable instruments (e.g. SCiO, Linksquare, Neospectra), more financially accessible NIR solutions are still needed for the technique to become widely available. We therefore report the development of a prototype low-cost NIR device based on the Hamamatsu C14384MA-01 minispectrometer. The spectrometer was equipped with both filament and LED light sources. The device was used to collect NIR spectra from Nam Dok Mai mangoes. These data were used to develop predictive models for five mango quality parameters: dry matter (DM), total soluble solids (TSS), titratable acidity (TA), pH and firmness. Mango samples for model development were collected in two different time periods. Models were built with partial least squares regression (PLSR) on both unprocessed and preprocessed data (e.g. Savitzky-Golay derivative, standard normal variate (SNV) normalization). The best models were obtained for TA, pH and TSS. The TA model had Rc² and Rcv² values of 0.82 and 0.84, respectively, with RMSEC and RMSECV of 0.37% and 0.36% malic acid, respectively. The pH model had Rc² and Rcv² values of 0.85 and 0.80, respectively, with RMSEC and RMSECV of 0.46 and 0.45, respectively. The TSS model had Rc² and Rcv² values of 0.84 and 0.81, respectively, with RMSEC and RMSECV of 1.22 and 1.07 °Brix, respectively. For the DM and firmness models, either Rcv² or both Rc² and Rcv² were below 0.80.
The results showed that it is possible to construct an operational NIR spectrometer based on the C14384MA-01 sensor that yields spectral data of sufficient quality to develop prediction models for mango quality parameters.
Article
This study proposed a corn variety recognition model based on a convolutional neural network (LeNet-5), combining near-infrared (NIR) spectral processing with deep learning. The aim was to overcome the tedious, time-consuming chemical analysis process and the low accuracy of traditional machine learning models in recognizing varieties. First, 450 groups of NIR spectral data from 6 corn varieties were acquired with a Fourier transform near-infrared spectrometer and randomly divided into training and test sets in a 2:1 ratio. The spectral data were then preprocessed with the detrending algorithm (DT). Next, 114 characteristic wavenumbers were extracted from the original NIR spectral data using competitive adaptive reweighted sampling (CARS). Finally, a corn variety recognition model was constructed with the LeNet-5 convolutional neural network on the optimally selected characteristic wavenumbers. The recognition model achieved an accuracy of 99.20% with an average time of 0.3500 s. Compared with back propagation neural network (BP), K-nearest neighbor (KNN), support vector machine (SVM) and partial least squares (PLS) models, the average recognition accuracy improved by 25.78%. The approach provides a new idea and method for accurate and rapid recognition of corn varieties.
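The abstract applies a detrending (DT) step before CARS wavenumber selection. Below is a minimal sketch of a common form of detrending. It assumes that a second-order polynomial baseline is fitted to each spectrum against the wavenumber axis and then subtracted. The function name, the polynomial order and the toy spectrum are assumptions, and CARS itself is not shown.

```python
import numpy as np

def detrend(spectra, axis_values, order=2):
    """Subtract a least-squares polynomial baseline from each spectrum (row)."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    x = np.asarray(axis_values, dtype=float)
    # Rescale the wavenumber axis to [-1, 1] so the polynomial fit is well conditioned.
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    out = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        coeffs = np.polyfit(xs, s, deg=order)
        out[i] = s - np.polyval(coeffs, xs)
    return out

wavenumbers = np.linspace(4000.0, 10000.0, 200)
peak = np.exp(-((wavenumbers - 7000.0) / 300.0) ** 2)           # hypothetical band
baseline = 0.5 + 1e-4 * wavenumbers + 1e-8 * wavenumbers ** 2   # hypothetical curved baseline
out = detrend(np.vstack([peak, peak + baseline]), wavenumbers)
```

Any baseline of the chosen order is removed exactly. The polynomial also absorbs part of any broad genuine absorption, so the order is a modelling choice rather than a fixed rule.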
Preprint
Mealiness is a phenomenon in which intercellular adhesions in apples loosen during storage, causing a soft, floury texture at the time of eating and lowering consumer preference. Although apples can be stored and sold commercially throughout the year, the occurrence of mealiness is not monitored during storage. The objective of this research was therefore to estimate apple mealiness non-destructively by laser scattering measurement. The method is based on laser light backscattering imaging but can quantify a wider range of backscattered light than the conventional method. Lasers with wavelengths of 633 nm and 850 nm were used as light sources, and profiles and images were derived from the acquired backscattered images. Profile features, such as curve fitting coefficients and profile gradients, were extracted from the profiles. Image features, such as statistical image features and texture features, were extracted from the images. PLS, SVM and ANN models were used to estimate mealiness. The ANN model combining both wavelengths performed better (R = 0.634, RMSE = 7.621) than models built from features obtained at a single wavelength. To improve model performance further, various ensemble learning methods were applied, and the ensemble model showed the highest performance (R = 0.682, RMSE = 7.281). These results suggest that laser scattering measurement is a promising method for estimating apple fruit mealiness.
Article
Configuring a neural network's architecture and hyperparameters so that it extrapolates well without overfitting often involves expert intuition and hand-tuning. This paper considers automatic methods for configuring a neural network for the domain of visible and near-infrared (Vis-NIR) spectroscopy. In particular, we study the effect of (a) the choice of validation set for validating configurations and (b) using ensembles. We consider several validation set choices. These are a random sample of 33% of non-test data (the technique used in previous work), samples from the latest year (a harvest season), and the first, middle and latest 33% of samples sorted by time. To test these methods, we conduct a comprehensive study on a held-out 2018 mango harvest season, given Vis-NIR spectra from the 3 prior years. We find that ensembling improves the state-of-the-art model's variance and accuracy. Furthermore, hyperparameter optimization experiments show that ensembling combined with using the latest 33% of samples as the validation set automatically finds a neural network configuration that performs as well as an expertly chosen one.
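The validation set choices listed above can be expressed as index selections over time-ordered samples. The sketch below is hypothetical. It assumes that each sample has an acquisition time and a season label, and that "latest season" means the season of the most recent sample. It also shows plain averaging as one simple form of ensembling. The scheme names and function names are illustrative, not the paper's code.

```python
import numpy as np

def validation_indices(times, scheme, seasons=None, frac=1/3, seed=0):
    """Indices of validation samples under one of several (hypothetical) schemes."""
    times = np.asarray(times)
    n = len(times)
    m = int(round(frac * n))
    order = np.argsort(times, kind="stable")      # samples sorted oldest to newest
    if scheme == "random":
        rng = np.random.default_rng(seed)
        return np.sort(rng.choice(n, size=m, replace=False))
    if scheme == "first":
        return np.sort(order[:m])
    if scheme == "middle":
        start = (n - m) // 2
        return np.sort(order[start:start + m])
    if scheme == "latest":
        return np.sort(order[n - m:])
    if scheme == "latest_season":
        seasons = np.asarray(seasons)
        return np.flatnonzero(seasons == seasons[order[-1]])
    raise ValueError(f"unknown scheme: {scheme}")

def ensemble_predict(models, X):
    """Average predictions of several fitted models, each a callable X -> y."""
    return np.mean([model(X) for model in models], axis=0)

times = np.arange(9)                               # hypothetical acquisition order
seasons = np.repeat([2015, 2016, 2017], 3)
print(validation_indices(times, "latest"))         # [6 7 8]
```

Time-based schemes make the validation data resemble the future season more closely than a random split does. This plausibly explains why the latest-33% choice helped in the study's setting.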
Article
Anaerobic ammonium oxidation (anammox) is a promising nitrogen removal process, but it is sensitive to environmental factors and toxic substances (antibiotics, heavy metals, sulfides, etc.). In particular, heavy metals greatly inhibit the nitrogen removal efficiency and biological activity of anammox systems. Few studies have systematically investigated the effects of different heavy metals on anammox-based systems or quantitatively deciphered the mechanisms involved. A big data analysis of previous publications showed that ionic heavy metals decreased specific anammox activity (SAA) more than their nanoparticulate forms. The inhibition of SAA was most severe under Cu²⁺ and CuO NPs, reaching 85% and 81%, respectively. Correlation analysis also verified the inhibitory effect of heavy metals on nitrogen removal. Moreover, a logistic kinetic model was suitable for fitting the inhibition periods of heavy metals. A Gaussian model predicted the nitrogen removal rate more accurately (R² over 0.95) than an artificial neural network model. Quantitative calculation of three independent nitrogen removal processes showed that nano heavy metals slightly promoted the ammonia oxidation process, in which amoA proliferated. By contrast, Ni²⁺ and CuO nanoparticles severely inhibited the nitrite oxidation process, accompanied by an increase in resistance genes. Importantly, all heavy metals restrained the anammox process, confirming that anammox plays a key role in the whole nitrogen removal process. This big data analysis suggests that, for anammox-based processes under heavy metal stress, priority should be given to ionic heavy metals and to copper.
Article
Short wave near infrared (NIR) spectroscopy, operated in a partial or full transmission geometry and a point spectroscopy mode, has been increasingly adopted for evaluating the quality of intact fruit, both on-tree and on packing lines. The evolution in hardware has been paralleled by an evolution in the modelling techniques employed. This review documents the range of spectral pre-treatments and modelling techniques employed for this application. Over the last three decades, there has been a shift from multiple linear regression to partial least squares regression. In recent years, attention to model robustness across seasons and instruments has driven a shift to machine learning methods such as artificial neural networks and deep learning. This shift has been enabled by the availability of large and diverse training and test sets.
Article
Anaerobic ammonium oxidation (anammox) is an advanced nitrogen removal process. Because of its high efficiency and energy-saving characteristics, it is widely used to remove nitrogen from various antibiotic-containing wastewaters. However, the inhibitory effect of oxytetracycline (OTC), a widely used antibiotic, on anammox is unclear. In this study, kinetic and machine learning models were used to reveal the effect of OTC on the anammox-based nitrogen removal process. Statistical analysis showed that anammox began to be inhibited when the OTC concentration reached 2 mg/L. The inhibition and recovery periods under OTC stress were simulated. During the inhibition period, the exponential (Exp) model gave a higher R², and the simulated maximum nitrogen removal rate (NRR) was between 0.47 and 17.05 kg/(m³·d). During the recovery period, both the Boltzmann and Gauss models fitted well. In addition, an artificial neural network model predicted the NRR more accurately, indicating that environmental factors were less important than effluent parameters. Spearman correlation analysis showed that the NRR was negatively correlated with OTC under both short-term and long-term OTC stress. Furthermore, hydraulic retention time played an important role in the short-term experiment, and water quality parameters in the long-term experiment. Finally, redundancy analysis showed that the abundance of nitrogen functional genes, such as hydrazine dehydrogenase, nitrite/nitric oxide oxidoreductase and hydrazine synthase, was negatively correlated with the amount of OTC. Antibiotic resistance genes showed the opposite trend.
Article
The application of visible (Vis; 400–750 nm) and near infrared (NIR; 750–2500 nm) spectroscopy to the assessment of fruit and vegetables is reviewed in the context of ‘point’ spectroscopy, as opposed to multi- or hyperspectral imaging. Vis spectroscopy targets colour assessment and pigment analysis, while NIR spectroscopy has been applied in commercial practice to the assessment of macro constituents (principally water) in fresh produce, and in the scientific literature to a wide range of attributes. This review focuses on key issues relevant to the widespread implementation of Vis-NIR technology in the fruit sector. A background to the concepts and technology involved in Vis-NIR spectroscopy is provided, and instrumentation for in-field and in-line applications, available for two and three decades respectively, is described. Scientific effort over the period 2015 to February 2020 is reviewed in terms of application areas, instrumentation, chemometric methods and validation procedures, and this work is critiqued through comparison with techniques in commercial use, with a focus on wavelength region, optical geometry, experimental design and validation procedures. Recommendations for future research in this area are made, e.g., application development that considers the distribution of the attribute of interest within the product and the matching of optically sampled and reference-method sampled volumes, and instrumentation comparisons that consider repeatability, optimum optical geometry and wavelength range. Recommendations are also made for reporting requirements, viz. description of the application, the reference method, the composition of calibration and test populations, chemometric reporting, and benchmarking against a known instrument or method, with the aim of maximising the useful conclusions drawn from the extensive work being done around the world.
Article
This paper investigates the use of least squares support vector machines and Gaussian process regression for multivariate spectroscopic calibration. The performance of these two non-linear regression models is assessed on an agricultural example and compared with that of the traditional linear model, partial least squares regression. Both non-linear models showed enhanced generalization ability, especially in maintaining homogeneous prediction accuracy over the range. The two non-linear models generally have similar prediction performance, but behaved differently in some situations, notably when the size of the training set varied. This is due to fundamental differences in the fitting criteria of the two models.
Article
Near-infrared (NIR) spectroscopy is one of the most prominent non-destructive techniques for classifying the quality of agricultural products. An important issue is that when new target data from a different environment (e.g., area, orchard, harvesting age or harvesting year) are classified by an existing classifier, predictive performance may be poor. Furthermore, rebuilding the classification model incurs costs in labour and time. In this paper, an ensemble classification is applied to predict the quality of mango cv. Nam Dok Mai Si Thong fruit. Ensemble classification combines multiple classifiers so that they work as a single classifier. In addition, each individual classifier (sub-classifier) is assigned a weight according to its similarity to the target dataset, allowing the ensemble to adapt to the new environment. In this work, the ensemble contains classification models, multiple linear regression (MLR) equations, trained on data from different harvesting years and from different harvesting periods within each year. The weight of each sub-classifier is obtained from the difference between its prediction error and the prediction error of the target dataset. The proposed approach was evaluated experimentally by comparing its performance with that of traditional single classifiers and of naïve ensemble classifiers without weighting. The standard error of prediction (SEP) values of the proposed ensemble classifier, the naïve ensemble classifier and the single classifier were 0.95, 1.08 and 0.99, respectively. These results illustrate that the proposed ensemble classifier can perform well in changing environments.
Article
A new procedure (LOCAL) for local calibrations is presented. LOCAL selects spectra from a library of samples and computes a PLS calibration equation for each constituent of the sample. This study evaluated the performance of LOCAL on the prediction of ground corn grain and haylage using several different combinations of data transformations, wavelength segment reduction, number of PLS factors and number of samples used in calibration. LOCAL resulted in lower SEP values for all the constituents of corn and for dry matter of haylage, with improvements ranging from 6 to 13%. Global calibrations had only a small advantage over LOCAL (1-2%) in the prediction of acid detergent fibre and crude protein in haylage. The two most important variables controlling the accuracy of predictions were the number of samples in the calibration and the number of PLS factors in the solution. Best results were obtained using 150 to 225 samples and more than 20 PLS factors per calibration equation. The speed of the LOCAL procedure is 0.5-2 s per sample on a 90 MHz computer. With this speed and accuracy, LOCAL is now available for real-time routine operation on a Windows platform.
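A minimal Python/NumPy sketch of the LOCAL idea described above: rank library spectra by similarity to the query, keep the closest ones, fit a PLS equation on them, and predict. Assumptions: similarity is Euclidean distance on already pre-treated spectra (the selection criterion used by LOCAL is not stated here), the PLS is a plain NIPALS PLS1, and `pls1_fit` and `local_pls_predict` are hypothetical names, not the LOCAL software's API. The defaults loosely follow the 150 to 225 samples and roughly 20 factors reported above.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1 for a single response; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc                      # weight: covariance of X with y
        w = w / np.linalg.norm(w)
        t = Xc @ w                         # score vector
        tt = t @ t
        p = Xc.T @ t / tt                  # X loading
        q = (yc @ t) / tt                  # y loading
        Xc = Xc - np.outer(t, p)           # deflate X and y
        yc = yc - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q
    return coef, x_mean, y_mean


def local_pls_predict(X_lib, y_lib, x_query, n_neighbours=150, n_components=20):
    """Predict one query spectrum from a PLS model fitted on its nearest library spectra."""
    dist = np.linalg.norm(X_lib - x_query, axis=1)
    idx = np.argsort(dist)[:n_neighbours]
    coef, x_mean, y_mean = pls1_fit(X_lib[idx], y_lib[idx], n_components)
    return float((x_query - x_mean) @ coef + y_mean)
```

Because a new model is fitted for every query, prediction cost grows with library and neighbourhood size, which is the speed trade-off the abstract quantifies (0.5-2 s per sample).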
Article
Infrared (IR) or near-infrared (NIR) spectroscopy is a method used to identify a compound or to analyze the composition of a material. Calibration of NIR spectra refers to the use of the spectra as multivariate descriptors to predict concentrations of the constituents. To build a calibration model, state-of-the-art software predominantly uses linear regression techniques. For nonlinear calibration problems, neural network-based models have proved to be an interesting alternative. In this paper, we propose a novel extension of the conventional neural network-based approach, the use of an ensemble of neural network models. The individual neural networks are obtained by resampling the available training data with bootstrapping or cross-validation techniques. The results obtained for a realistic calibration example show that the ensemble-based approach produces a significantly more accurate and robust calibration model than conventional regression methods.
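The resampling idea above can be sketched as bootstrap aggregation: each member is trained on rows drawn with replacement from the training set, and the ensemble prediction is the mean of the member predictions. To keep the sketch short, ridge regression stands in for the neural network base learner (a substitution, not the cited method); any function shaped like the hypothetical `fit_ridge`, taking `(X, y)` and returning a predictor (for example a neural network trainer), can be passed as `fit`.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-2):
    """Stand-in base learner: ridge regression on mean-centred data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda X_new: (X_new - x_mean) @ coef + y_mean


def bootstrap_ensemble(X, y, n_members=25, fit=fit_ridge, seed=0):
    """Train each member on a bootstrap resample; predict with the member mean."""
    rng = np.random.default_rng(seed)
    n = len(y)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)   # n row indices drawn with replacement
        members.append(fit(X[idx], y[idx]))
    return lambda X_new: np.mean([m(X_new) for m in members], axis=0)
```

Averaging members trained on different resamples mainly reduces variance, which is where unstable learners such as neural networks gain most.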
Conference Paper
We present a method for constructing ensembles from libraries of thousands of models. Model libraries are generated using different learning algorithms and parameter settings. Forward stepwise selection is used to add to the ensemble the models that maximize its performance. Ensemble selection allows ensembles to be optimized to a performance metric such as accuracy, cross entropy, mean precision, or ROC area. Experiments with seven test problems and ten metrics demonstrate the benefit of ensemble selection.
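Forward stepwise ensemble selection, as described above, can be sketched as a greedy loop over a library of stored predictions on a held-out selection set: at each step, add the model whose inclusion gives the lowest error for the averaged ensemble. This sketch assumes a regression setting with RMSE as the metric and allows a model to be added more than once, so repeated picks act as weights; `ensemble_selection` is a hypothetical name.

```python
import numpy as np


def ensemble_selection(preds, y, n_steps=20):
    """Greedy forward selection (with replacement) from a model library.

    preds: array of shape (n_models, n_samples) holding each library model's
    predictions on a held-out selection set; y: the matching reference values.
    Returns the list of chosen model indices, in the order they were added.
    """
    preds = np.asarray(preds, dtype=float)
    chosen = []
    running_sum = np.zeros(preds.shape[1])
    for _ in range(n_steps):
        # Averaged ensemble obtained by adding each candidate model in turn
        candidates = (running_sum + preds) / (len(chosen) + 1)
        rmse = np.sqrt(((candidates - y) ** 2).mean(axis=1))
        best = int(np.argmin(rmse))
        chosen.append(best)
        running_sum += preds[best]
    return chosen
```

The ensemble weight of each model is its share of the picks, e.g. `np.bincount(chosen, minlength=len(preds)) / len(chosen)`. For instance, with `y = np.arange(10.0)` and a library `[y + 1, y - 2, y + 5]`, three steps select `[0, 1, 0]`, whose average reproduces `y` exactly.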
Article
Short wave near infrared spectroscopy has found use in the non-invasive assessment of dry matter content (DMC, % fresh weight) of mango fruit, both as a guide to harvest maturity and to ensure the eating quality of ripened fruit. In this study, this application is optimised in terms of spectral pre-processing, the sources of variation important to model performance are documented, and the performance of cultivar- or physiological-stage-specific partial least squares regression (PLSR) models, a global PLSR model and an artificial neural network (ANN) model is compared. The data set consisted of 4675 samples acquired across four seasons, ten cultivars and two growing regions, with harvest populations used as cross-validation groups. The data of the fourth season were reserved as an independent test set. A spectral pre-treatment of mean-centred Savitzky-Golay second derivative (second-order polynomial, 17-point window) and use of the wavelength range 684–990 nm gave the lowest RMSECV for PLSR models, although other ranges had similar statistics. Fruit physiological stage had a greater impact on PLSR model performance than cultivar, year or growing region, as estimated using a ‘variable importance metric’ devised and implemented using random forest regression. The use of specific (cultivar or physiological stage) PLSR models improved the prediction of the independent validation set (RMSEP on DMC decreased from 1.01 to 0.88 %), a result similar to that of a global ANN model (0.89 %). The ANN model is recommended because a single model can be used across all cultivars.
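The pre-treatment reported above (Savitzky-Golay second derivative with a second-order polynomial and a 17-point window, restriction to 684–990 nm, then mean centring) can be sketched with NumPy alone. The filter coefficients come from a least-squares polynomial fit over the window. Assumptions of this sketch: unit point spacing for the derivative, the 8 edge points at each end are dropped rather than padded, the derivative is taken over the full spectrum before cropping the wavelength range, and all function names are hypothetical.

```python
import numpy as np
from math import factorial


def savgol_coeffs(window=17, polyorder=2, deriv=2):
    """Least-squares Savitzky-Golay coefficients for the centre point."""
    half = window // 2
    j = np.arange(-half, half + 1, dtype=float)
    A = np.vander(j, polyorder + 1, increasing=True)   # columns 1, j, j**2, ...
    # Row `deriv` of pinv(A) gives that polynomial coefficient; scale to a derivative
    return np.linalg.pinv(A)[deriv] * factorial(deriv)


def sg_second_derivative(spectra, window=17, polyorder=2):
    """Second derivative of each row; edge points without a full window are dropped."""
    c = savgol_coeffs(window, polyorder, deriv=2)
    # np.convolve flips its second argument, so pass c reversed to correlate
    return np.array([np.convolve(s, c[::-1], mode="valid") for s in spectra])


def preprocess(spectra, wavelengths, lo=684.0, hi=990.0, window=17):
    """SG second derivative, crop to [lo, hi] nm, then mean-centre each column."""
    d2 = sg_second_derivative(spectra, window)
    half = window // 2
    wl = wavelengths[half:len(wavelengths) - half]     # wavelengths kept by 'valid'
    keep = (wl >= lo) & (wl <= hi)
    d2 = d2[:, keep]
    return d2 - d2.mean(axis=0), wl[keep]
```

If SciPy is available, `scipy.signal.savgol_filter(spectra, 17, 2, deriv=2, axis=1)` gives the same interior values but handles the edges instead of dropping them.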
Article
Vibrational spectroscopy methods are widely investigated as fast and non-destructive alternatives for postharvest quality evaluation. As these methods measure spectral responses at a large number of wavebands correlated to the quality traits of interest, multivariate calibration equations have to be built to estimate the quality traits from the acquired spectra. This paper provides an overview of the most important multivariate data analysis techniques for exploring spectral data, detecting outliers and building calibration models for predicting the quality traits of interest. Both linear and non-linear calibration methods are discussed for quantitative (continuous) and qualitative (discrete) quality traits. For each of the presented methods the theory is explained, followed by illustration of an example case from the postharvest domain and a discussion of applications of this technique for postharvest quality evaluation based on spectral sensors. Because spectral preprocessing, careful validation and calibration transfer are crucial for the successful implementation of spectral sensors for postharvest quality evaluation, special attention is given to these aspects. Finally, conclusions are drawn and recommendations are made with respect to the steps to take and points of attention for successful calibration.
Article
In identifying spectral outliers in near infrared calibration, it is common to use a distance measure related to the Mahalanobis distance. However, different software packages tend to use different variants, which leads to a translation problem if more than one package is used. Here the relationships between the squared Mahalanobis distance D², the GH distance of WinISI, and the T² and leverage (L) statistics of The Unscrambler are established as D² = T² ≈ L × n ≈ GH × k, where n and k are the numbers of samples and variables, respectively, in the set of spectral data used to establish the distance measure. The implications for setting thresholds for outlier detection are discussed. Along the way, the principal component scores from WinISI and The Unscrambler are compared. Both packages scale the scores for a component to have variance proportional to the contribution of that component to total variance, but the WinISI scores, unlike those from The Unscrambler, do not have mean zero.
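The distance relations above can be checked numerically. The sketch below computes the squared Mahalanobis distance D² from the sample covariance, the leverage as the diagonal of the hat matrix of the mean-centred data, and a GH-style distance taken as D²/k purely from the approximate relation quoted above (WinISI's exact GH definition may differ). For centred data the leverage equals D²/(n - 1) exactly, which is why D² ≈ L × n for large n; if a package's leverage also includes the 1/n term for the mean, L = 1/n + D²/(n - 1). In practice these statistics are computed on principal component scores, because raw spectra usually have more variables than samples. `distance_measures` is a hypothetical name.

```python
import numpy as np


def distance_measures(X):
    """Return squared Mahalanobis distance, leverage and a GH-style distance per row."""
    n, k = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(Xc.T @ Xc / (n - 1))            # inverse sample covariance
    D2 = np.einsum("ij,jk,ik->i", Xc, S_inv, Xc)          # squared Mahalanobis distance
    leverage = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(Xc.T @ Xc), Xc)  # hat-matrix diagonal
    GH = D2 / k                                           # assumed from D2 ≈ GH × k
    return D2, leverage, GH
```

A useful consequence for thresholds: the D² values of the calibration set always sum to (n - 1) × k, so the mean GH-style distance is (n - 1)/n, close to 1.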
Article
A local-based method for near-infrared spectroscopy predictions, local partial least squares regression on global PLS scores (LPLS-S), is proposed in this work and compared with the usual local PLS (LPLS) regression approach. LPLS-S replaces the original spectra with a global PLS score matrix before applying the usual LPLS. This is done to increase the speed of the calculations, which can be important for online applications in particular, especially when implemented on large databases. In this study, the performance of the two local approaches was compared in terms of efficiency and speed. The root-mean-square errors of prediction of LPLS and LPLS-S were 1.1962 and 1.1602, respectively, but the calculation speed of LPLS-S was more than 20 times faster than that of the LPLS algorithm. The optimum parameters for LPLS-S were also easier to obtain, and more repeatable regardless of the test set, than those for LPLS.
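The LPLS-S idea above can be sketched as follows: fit one global PLS model, project library and query spectra onto its scores, then run the usual local PLS on the low-dimensional score matrix instead of the full spectra, so both the neighbour search and the local fits work with `n_global` columns rather than every wavelength. Assumptions: Euclidean distance in score space selects neighbours, both PLS steps use a plain NIPALS PLS1, `pls1` and `lpls_s_predict` are hypothetical names, and `n_local` must not exceed `n_global`.

```python
import numpy as np


def pls1(X, y, n_components):
    """NIPALS PLS1; returns means, score projection R and regression coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p, q = Xc.T @ t / tt, (yc @ t) / tt
        Xc, yc = Xc - np.outer(t, p), yc - q * t      # deflation
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    R = W @ np.linalg.inv(P.T @ W)        # centred X @ R gives the scores
    return {"x_mean": x_mean, "y_mean": y_mean, "R": R, "coef": R @ Q}


def lpls_s_predict(X_lib, y_lib, X_query, n_global=10, n_neighbours=100, n_local=5):
    """Local PLS on global PLS scores (LPLS-S style), one local model per query."""
    g = pls1(X_lib, y_lib, n_global)
    T_lib = (X_lib - g["x_mean"]) @ g["R"]          # score matrix replaces the spectra
    T_query = (X_query - g["x_mean"]) @ g["R"]
    preds = []
    for t in T_query:
        idx = np.argsort(np.linalg.norm(T_lib - t, axis=1))[:n_neighbours]
        m = pls1(T_lib[idx], y_lib[idx], n_local)
        preds.append((t - m["x_mean"]) @ m["coef"] + m["y_mean"])
    return np.array(preds)
```

The global model is fitted once, so the per-query cost no longer depends on the number of wavelengths, which is the source of the speed-up reported above.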
Article
Second derivative of interactance spectra (731-926 nm) of intact peaches and Brix values of extracted juice were used to develop a least squares support vector machine (LS-SVM) regression model (based on an RBF kernel) and a PLS regression model. An iterative approach was taken with the LS-SVM regression, involving a grid search combined with a gradient-based optimisation method, using a validation set to tune the hyperparameters, followed by pruning of the LS-SVM model with the optimised hyperparameters. The grid search approach determined the hyperparameters better and five times faster. Fewer than 45% of the initial 1430 calibration samples were kept in the models. In prediction of an independent test set of 120 samples, the pruned LS-SVM models performed better than the PLS model (RMSEP decreased by 9% to 14%).
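An LS-SVM regression with an RBF kernel reduces to one linear system in the bias b and the support values α. The sketch below solves that system, tunes (γ, σ²) by a plain grid search on a validation set (the gradient-based refinement mentioned above is omitted), and prunes by repeatedly dropping the samples with the smallest |α| and refitting, which is the classical LS-SVM sparsification rule; whether the cited study pruned in exactly this way is an assumption. The kernel convention exp(-‖x - z‖²/σ²), the 45% stopping fraction and all function names are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma2) between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)


def lssvm_fit(X, y, gamma, sigma2):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


def grid_search(X_tr, y_tr, X_val, y_val, gammas, sigma2s):
    """Return (rmse, gamma, sigma2) with the lowest validation RMSE."""
    best = None
    for gamma in gammas:
        for sigma2 in sigma2s:
            b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma2)
            resid = lssvm_predict(X_tr, b, alpha, X_val, sigma2) - y_val
            rmse = float(np.sqrt(np.mean(resid ** 2)))
            if best is None or rmse < best[0]:
                best = (rmse, gamma, sigma2)
    return best


def lssvm_prune(X, y, gamma, sigma2, keep_fraction=0.45, step=0.05):
    """Drop the samples with the smallest |alpha|, refit, repeat."""
    idx = np.arange(len(y))
    while len(idx) > keep_fraction * len(y):
        _, alpha = lssvm_fit(X[idx], y[idx], gamma, sigma2)
        n_drop = max(1, int(step * len(idx)))
        idx = idx[np.argsort(np.abs(alpha))[n_drop:]]
    b, alpha = lssvm_fit(X[idx], y[idx], gamma, sigma2)
    return idx, b, alpha
```

After pruning, predictions use only the retained rows, i.e. `lssvm_predict(X[idx], b, alpha, X_new, sigma2)`.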
Article
Pedometrics is the use of quantitative methods for the study of soil distribution and genesis and as a sustainable resource. A common research area in pedometrics and chemometrics is the calibration and prediction of soil properties from diffuse infrared reflectance spectra. The most common method is partial least-squares regression (PLS). In this paper we present an alternative method in the form of regression rules. The regression-rules model consists of a set of rules, in which each rule is a linear model of the predictors, and is analogous to a piecewise linear function. Its accuracy is tested for prediction of soil properties from their mid-infrared (2500-25000 nm) diffuse reflectance spectra. We also tested it with the Chimiometrie 2006 challenge data, which used near-infrared spectra to predict soil properties. The results showed that, in comparison with PLS with spectral pretreatment and another data-mining technique, the regression-rules model provides greater accuracy, is simpler and more parsimonious, produces comprehensible equations, provides an optimal variable selection, and respects the upper and lower limits of the data.
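The regression-rules idea above (conditions that select a region of predictor space, each paired with a linear model) can be sketched as a piecewise linear predictor. The rules below are hypothetical and hand-written; the sketch shows only how a fitted rule set produces a prediction, not how Cubist induces the rules. Averaging the predictions of all rules that cover a sample, falling back to a default when none does, and clipping to optional lower and upper limits are assumptions of this sketch chosen to mirror the properties listed above.

```python
import numpy as np

# Hypothetical rule set for two predictors: (condition, (intercept, coefficients)).
RULES = [
    (lambda x: x[0] <= 0.5, (1.0, np.array([2.0, 0.0]))),
    (lambda x: x[0] > 0.5, (0.5, np.array([3.0, 1.0]))),
    (lambda x: x[1] > 2.0, (4.0, np.array([0.0, 0.5]))),
]


def predict_rules(x, rules=RULES, default=0.0, lower=None, upper=None):
    """Piecewise linear prediction: average the linear models of all covering rules."""
    x = np.asarray(x, dtype=float)
    preds = [b0 + coef @ x for condition, (b0, coef) in rules if condition(x)]
    y = float(np.mean(preds)) if preds else default
    if lower is not None:
        y = max(y, lower)          # keep predictions within the data limits
    if upper is not None:
        y = min(y, upper)
    return y
```

For example, `[0.2, 1.0]` is covered only by the first rule (prediction 1.4), while `[1.0, 3.0]` is covered by the second and third rules (6.5 and 5.5, averaged to 6.0).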
Article
This paper investigates the effect of multiplicative scatter correction (MSC) and of nonlinear regression based on the first two and three principal components for near-infrared reflectance (NIR) spectroscopy data. The focus is on linearity/nonlinearity as well as the treatment of outliers. The main contribution of the paper is the presentation and testing of a calibration method based on classification and local least-squares regression. The theory is illustrated by three examples from NIR analysis. The local linear calibration outperformed traditional methods in two of the examples.
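Multiplicative scatter correction, mentioned above, regresses each spectrum on a reference spectrum (usually the mean spectrum) and removes the fitted offset and slope. A minimal NumPy sketch, assuming the calibration-set mean as the reference; `msc` is a hypothetical name.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        slope, offset = np.polyfit(ref, s, 1)   # fit s ≈ offset + slope * ref
        corrected[i] = (s - offset) / slope
    return corrected, ref
```

New spectra should be corrected against the same reference returned for the calibration set, e.g. `msc(new_spectra, reference=ref)`.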
Article
The frequent non-linearity of the calibration models used in near-infrared reflectance spectroscopy (NIRRS) is the main source of large errors in analyte determinations with this technique. Non-linearity in this type of system arises from factors such as the multiplicative effect of differences in particle size among samples or an intrinsically non-linear absorbance–concentration relationship resulting from interactions between components, hydrogen bonding, etc. In this work, calibration methods including partial least-squares (PLS) regression, linear quadratic PLS (LQ-PLS), quadratic PLS (QPLS) and artificial neural networks (ANNs) were used in conjunction with NIRRS to determine the moisture content of acrylic fibres, whose wide variability in linear density results in differing multiplicative effects among samples. Based on the results, PC-ANN (an ANN using principal component scores as inputs) is the best choice for the intended application. However, combining an effective spectral pretreatment with computational methods such as PLS and LQ-PLS, whose optimization is much less labour-intensive, provides comparable results. Standard normal variate (SNV) was found to be the best of the spectral pretreatments compared for reducing the non-linearity introduced by scattering. The subsequent application of PLS provides accurate results with linear systems (absorption band at 1450 nm). If the system concerned is intrinsically non-linear, however, a non-linear calibration model must be applied instead. Under these conditions, the three methods tested for this purpose (LQ-PLS, QPLS and ANN) provide comparable results.
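Standard normal variate, the pre-treatment found best above, centres each spectrum on its own mean and scales it by its own standard deviation, which removes additive and multiplicative scatter effects spectrum by spectrum. A minimal sketch; `snv` is a hypothetical name, and the use of the sample standard deviation (`ddof=1`) is an assumption, as conventions vary between packages.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std
```

Because each row is standardized on its own, a spectrum and any positive rescaling plus offset of it (a + b × spectrum with b > 0) give identical SNV output.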
Article
This study compares the performance of partial least squares (PLS) regression analysis and artificial neural networks (ANN) for the prediction of total anthocyanin concentration in red-grape homogenates from their visible-near-infrared (Vis-NIR) spectra. The PLS prediction of anthocyanin concentrations for new-season samples from Vis-NIR spectra was characterised by regression non-linearity and prediction bias. In practice, this usually requires the inclusion of some samples from the new vintage to improve the prediction. The use of WinISI LOCAL partly alleviated these problems but still resulted in increased error at the high and low extremes of the anthocyanin concentration range. Artificial neural network regression was investigated as an alternative to PLS, because of the inherent advantages of ANN for modelling non-linear systems. The method proposed here combines the data reduction capabilities of PLS regression with the non-linear modelling capabilities of ANN. With PLS scores as inputs for ANN regression, the model was quicker and easier to train than with raw full-spectrum data. The ANN calibration for prediction of new-vintage grape data, using PLS scores as inputs, was more linear and accurate than global and LOCAL PLS models and appears to reduce the need to refresh the calibration with new-season samples. ANN with PLS scores required fewer inputs and was less prone to overfitting than ANN with PCA scores. A variation of the ANN method, using carefully selected spectral frequencies as inputs, gave prediction accuracy comparable to that obtained with PLS scores but, as with PCA inputs, was also prone to overfitting with redundant wavelengths.
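The approach above (PLS scores as inputs to an ANN) can be sketched with NumPy: a NIPALS PLS1 supplies a projection from spectra to scores, and a one-hidden-layer tanh network is trained on the standardized scores by full-batch gradient descent on the squared error. Network size, learning rate, epoch count and the function names `pls_score_projection` and `train_mlp` are illustrative assumptions, not the settings of the cited study.

```python
import numpy as np


def pls_score_projection(X, y, n_components):
    """NIPALS PLS1; returns a function mapping spectra to PLS scores."""
    x_mean = X.mean(axis=0)
    Xc, yc = X - x_mean, y - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p = Xc.T @ t / tt
        Xc, yc = Xc - np.outer(t, p), yc - (yc @ t) / tt * t   # deflation
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    R = W @ np.linalg.inv(P.T @ W)
    return lambda X_new: (X_new - x_mean) @ R


def train_mlp(T, y, n_hidden=8, lr=0.05, epochs=3000, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    t_mean, t_std = T.mean(axis=0), T.std(axis=0)
    y_mean, y_std = y.mean(), y.std()
    Z, u = (T - t_mean) / t_std, (y - y_mean) / y_std   # standardized inputs/target
    W1 = rng.normal(scale=0.5, size=(Z.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    n = len(u)
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)
        err = H @ w2 + b2 - u
        # Gradients of 0.5 * mean squared error (backpropagation)
        dH = np.outer(err, w2) * (1.0 - H ** 2)
        W1 -= lr * (Z.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
        w2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()

    def predict(T_new):
        H_new = np.tanh(((T_new - t_mean) / t_std) @ W1 + b1)
        return (H_new @ w2 + b2) * y_std + y_mean

    return predict
```

Feeding a handful of scores instead of hundreds of wavelengths keeps the network small, which is consistent with the faster training and lower overfitting risk reported above.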
Kuhn, M., Weston, S., Keefer, C., Coulter, N., Quinlan, R., Rulequest Research Pty Ltd, 2020. Cubist: (v0.2.3) Regression Modeling Using Rules With Added Instance-based Corrections. https://cran.r-project.org/web/packages/Cubist/index.html.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., Chang, C.C., Lin, C.C., 2019. e1071: (v1.7-3) Functions for Latent Class Analysis, Short Time Fourier Transform, Fuzzy Clustering, Support Vector Machines, Shortest Path Computation, Bagged Clustering, Naive Bayes Classifier. https://cran.r-project.org/web/packages/e1071/index.html.
Norgaard, L., Lagerholm, M., Westerhaus, M., 2013. Artificial Neural Networks and Near Infrared Spectroscopy-A Case Study on Protein Content in Whole Wheat Grain. https://pdfs.semanticscholar.org/5fb4/3fd836031b67f9fc8372244ecb14aae99d89.pdf.
Ramirez-Lopez, L., Stevens, A., 2016. resemble: (v1.2.2) Regression and Similarity Evaluation for Memory-Based Learning in Spectral Chemometrics. https://cran.r-project.org/web/packages/resemble/index.html.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Demattê, J.A.M., Scholten, T., 2013. The spectrum-based learner: a new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma 195, 268-279. https://doi.org/10.1016/j.geoderma.2012.12.014.
Ripley, B., Venables, W., 2020. Nnet: (v7.3-13) Software for Feed-forward Neural Networks With a Single Hidden Layer, and for Multinomial Log-linear Models. https://cran.r-project.org/web/packages/nnet/index.html.