Origin and Variety Identification of Dried Kelp Based on Fluorescence Fingerprinting and Machine Learning Approaches

MDPI
Applied Sciences
Academic Editors: Oana Bianca Oprea and Ignat Tolstorebrov
Received: 13 January 2025
Revised: 8 February 2025
Accepted: 8 February 2025
Published: 10 February 2025
Citation: Suzuki, K.; Akiyama, R.; Llave, Y.; Matsumoto, T. Origin and Variety Identification of Dried Kelp Based on Fluorescence Fingerprinting and Machine Learning Approaches. Appl. Sci. 2025, 15, 1803. https://doi.org/10.3390/app15041803
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Kana Suzuki, Rikuto Akiyama, Yvan Llave and Takashi Matsumoto *
Department of Food Science and Technology, Tokyo University of Marine Science and Technology,
4-5-7 Konan, Minato-ku, Tokyo 108-8477, Japan; m246009@edu.kaiyodai.ac.jp (K.S.);
s213065@edu.kaiyodai.ac.jp (R.A.); pllave1@kaiyodai.ac.jp (Y.L.)
*Correspondence: tmatsu55@kaiyodai.ac.jp; Tel.: +81-3-5463-0635
Abstract: Accurate labeling of the origin of food ingredients is essential to ensure quality
and safety; however, establishing a reliable identification method remains an urgent task.
The origin and variety of dried kelp are generally identified based on their morphological
characteristics; however, they are difficult to distinguish unless experts are involved. In
addition, genetically close varieties have almost no differences in their base sequences;
therefore, the accuracy of conventional identification methods using genetic analysis is
limited. This study aimed to develop a system for identifying the origin and variety of
dried kelp using fluorescence fingerprint data obtained by fluorescence spectroscopy and a
convolutional neural network (CNN). The fluorescence characteristics of dried kelp were
measured in the range between 250 and 550 nm. The obtained fluorescence fingerprint
data were converted into image data and analyzed using a CNN model implemented in
Python, TensorFlow, and Keras. Unlike conventional methods that rely on morphological
characteristics and genetic analyses, by combining fluorescence spectroscopy and CNN,
a high identification accuracy of 98.86% was achieved even for genetically close varieties.
These results highlight the excellent potential of fluorescence fingerprints for identifying
the origin and variety of food and are expected to contribute to food fraud prevention and
quality control.
Keywords: machine learning; convolutional neural network (CNN); fluorescence spectroscopy;
fluorescence fingerprinting; food origin identification; food variety identification; kelp; kombu
1. Introduction
Labeling the origin of food ingredients is an important factor when consumers assess
the quality and safety of a product. In Japan, labeling the origin of ingredients in processed
foods has been mandatory since April 2022 [1], and interest in the accuracy of this labeling
has been increasing. This trend is not limited to Japan but is observed globally. For example,
labeling the origin of food is mandatory under European Union Law [2] and in the US
under the Code of Federal Regulations [3]. The introduction of such systems has enabled
consumers to obtain transparent information, and the labeling of origin contributes to
improving the reliability of food.
In contrast, the problem of food fraud remains a significant issue. People try to profit
by fraudulently using the names of production areas and varieties with high market value.
Fraudulent acts, such as false labeling of the origin of high-value products, pose a risk to
consumer trust and safety. Therefore, a reliable method is required to verify the origin of
food and to support accurate labeling.
The flavor and components of kelp differ depending on the origin and variety,
and there are large differences in distribution prices. Identifying the origin and variety of
kelp is important for food quality assurance, distribution management, and regional brand
protection. Among physicochemical tests, DNA analyses and chemical analyses based on the
soil composition of the production area have been used to identify the origin of kelp. Hattori
et al. [4] have performed inorganic element analysis of kelp produced in Japan and China
using inductively coupled plasma mass spectrometry (ICP–MS) and developed a method
for determining the origin of kelp using inorganic elements as indicators. Morohashi
et al. [5] have also demonstrated the effectiveness of elemental analysis of wakame seaweed
using ICP–MS for identifying the origin of kelp. Shimizu et al. [6] have developed a
DNA extraction method with high amplification by polymerase chain reaction, extracted
mitochondrial DNA from 10 types of kelp, and performed genetic analysis. In that study,
six types of kelp were identified; however, almost no difference was found in base
sequences between Ma-kombu (Laminaria japonica) and its varieties Rishiri-kombu (Laminaria
ochotensis) and Rausu-kombu (L. ochotensis), indicating that further detailed studies are
necessary. Ma-kombu and its varieties, Rausu-kombu and Rishiri-kombu, have a low
degree of speciation, and the base sequences of rDNA internal transcribed spacer 1 (ITS-1)
and the RuBisCO spacer, which are effective in detecting species differences in many brown
algae, are completely identical [7].
Various studies have been conducted to identify the geographical origins of food, such
as coffee beans, fruit spirits, and Chinese red wine [8–11]. The methods used to distinguish
the origins include terahertz (THz) spectroscopy, Raman spectroscopy, ultraviolet–visible
(UV–VIS) spectroscopy, and NIR spectroscopy.
Fluorescence spectroscopy measures food characteristics by exploiting the property of
substances to absorb light at certain wavelengths and emit light at different
wavelengths. Hu et al. [12] have demonstrated the possibility of using excitation–emission
matrix fluorescence spectroscopy combined with chemometrics to distinguish the
geographical origins of rice from other varieties. Riza et al. [13] have identified the variety and
geographical origin of Italian olive oil. Karoui et al. [14] have proposed a method for
identifying the botanical origin of Swiss honey. Strelec et al. [15] have detected insects infesting
wheat grains using front-face fluorescence and UV–VIS spectroscopy. The effectiveness
of fluorescence spectroscopy has been confirmed for a wide variety of foods, including
rice, olive oil, honey, and wheat. Fluorescence spectroscopy yields a large amount of
information more simply and efficiently than conventional physicochemical analysis. Because
it is highly sensitive to minute differences in composition and is not easily affected by
moisture, it was considered an effective method for identifying the origin and variety
of dried kelp with high accuracy. This perspective has not been sufficiently examined in
previous research.
Various machine learning methods have been developed and combined with other
techniques, including VIS–NIR spectroscopy, fluorescence spectroscopy, image-processing
technology, ¹H nuclear magnetic resonance spectroscopy, and hyperspectral imaging (HSI)
technology, to identify the geographical origin of foods such as peaches, Chinese Longjing
tea, and Pu’er tea [16–18]. Chen et al. [19] have proposed a method for identifying the
adulteration of camellia oil and quantifying the level of adulteration using excitation–emission
matrix spectroscopy and a CNN. Wu et al. [20] have proposed a method for detecting
the adulteration of nine types of vegetable oils using three-dimensional (3D) fluorescence
spectroscopy and a CNN. In addition to a CNN, k-nearest neighbor (KNN), Random Forest
(RF), Support Vector Machine (SVM), and Partial Least Squares have been used for
comparison. Hu et al. [21] have reported that a CNN shows high classification accuracy when
3D fluorescence spectroscopy is combined with SVM, RF, and CNN classifiers to detect
counterfeit camellia oil.
In this study, fluorescence fingerprint data assessed using fluorescence spectroscopy
were used in a machine learning model to evaluate the identification of kelp varieties.
Building on the findings of previous studies, this study makes new academic and practical
contributions to the literature. First, it is novel as it uses fluorescence fingerprint data
obtained by fluorescence spectroscopy to identify the origin and variety of dried kelp.
Fluorescence spectroscopy has not been used for dried kelp, and no attempts have been
made to analyze the data with high accuracy using machine learning. This study aims to fill
this technological gap. Second, it aims to extract information on the origin and variety from
fluorescence fingerprint data with high accuracy using a CNN, a type of deep learning. A
CNN can automatically learn the features of complex patterns and is expected to have a
higher classification accuracy than conventional machine learning algorithms. Therefore,
improved accuracy is expected by applying this method to identify the origin and variety
of dried kelp. Furthermore, this study aims to provide a highly practical method for
identification to prevent food fraud and ensure quality assurance.
2. Materials and Methods
2.1. Target Foods
In this study, we investigated the origins and varieties of kombu in Hokkaido, a
representative kombu-producing region in Japan. Three types of dried kombu of different
origins and varieties were studied: Rausu-kombu, Rishiri-kombu (L. ochotensis), and
Mitsuishi-kombu (Laminaria angustata; labeled Hidaka in the figures and referred to below
as Hidaka-kombu), which are genetically similar to Ma-kombu (L. japonica). Eleven
commercially available dried kombu samples were collected. Each product was
procured from a different manufacturer or with a different expiry date, with the intention
of obtaining data that reflect differences in origin, season, and manufacturing process. The
samples were ground and homogenized, then sealed in a powder cell, and
their fluorescence fingerprints were measured using a spectrofluorometer. A total of 570
fluorescence fingerprints (190 for each variety) were obtained. This resulted in the
construction of a dataset for identifying differences in the chemical and physical properties
between varieties using machine learning.
2.2. Analysis Method
A fluorescence spectrophotometer (F-7100; Hitachi High-Tech Science, Ibaraki, Japan)
was used to measure the fluorescence properties of the samples. The excitation and
fluorescence wavelength ranges were both 250–550 nm, the excitation sampling interval
was 10.0 nm, and the fluorescence sampling interval was 5.0 nm. The excitation and
emission wavelengths were selected in the range of 250–550 nm so that they did not
overlap, as shown in Figures 1 and 2.
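The sampling settings above define the grid on which each fingerprint is recorded. The following minimal sketch builds that grid with NumPy. The function name `wavelength_grid` is hypothetical, and the rule that a pair is kept only when the emission wavelength exceeds the excitation wavelength is an assumption about what "did not overlap" means, not a detail confirmed by the text.

```python
import numpy as np

# Wavelength settings quoted in the text (nm).
EX_START, EX_STOP, EX_STEP = 250, 550, 10   # excitation axis
EM_START, EM_STOP, EM_STEP = 250, 550, 5    # emission (fluorescence) axis


def wavelength_grid():
    """Return the excitation axis, the emission axis, and a mask of kept pairs."""
    excitation = np.arange(EX_START, EX_STOP + EX_STEP, EX_STEP)
    emission = np.arange(EM_START, EM_STOP + EM_STEP, EM_STEP)
    # Assumption: keep a pair only when emission is longer than excitation,
    # one possible reading of "so that they did not overlap".
    valid = emission[:, None] > excitation[None, :]
    return excitation, emission, valid


excitation, emission, valid = wavelength_grid()
print(excitation.size, emission.size, int(valid.sum()))
```

Under these assumptions the grid has 31 excitation and 61 emission wavelengths, and the mask marks the region of the matrix that carries fluorescence signal.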
Figure 1. Unified scale dataset: (a) Rausu, (b) Rishiri, and (c) Hidaka.
Figure 2. Individual scale dataset: (a) Rausu, (b) Rishiri, and (c) Hidaka.
Before measurement, the dried kelp samples were homogenized by grinding in a
mill (DM-7452; DR MILLS, Guangzhou, Guangdong, China) for approximately 1 min
(10 s × six times) to obtain uniform fluorescence properties. The ground samples were
stored in an airtight container to prevent moisture uptake and oxidation, which may change
the fluorescence properties. No solvent was used for sample treatment. The sample was
placed in a solid sample holder, and measurements were performed in an orthogonal
configuration, with the excitation light and the detected fluorescence at 90 degrees to
each other. To maintain consistency, minimize sample-to-sample variation, and ensure
accurate data collection, approximately 0.3 g of sample was sealed in a powder cell and
placed in the fluorescence spectrophotometer just before measurement.
2.3. Development of Machine Learning Model
2.3.1. Data Acquisition and Preprocessing
The photomultiplier voltage was set to 480 V. Based on the spectra acquired under
these measurement conditions, a 3D fluorescence matrix (FD3 file) was generated for each
sample. The obtained data were presented as 2D maps.
The maximum and minimum fluorescence intensity values of all samples were ex-
tracted and unified as the maximum and minimum values of the scale for all 2D images.
The maximum and minimum values of the fluorescence intensity scale were 1712 and 38,
respectively.
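The difference between a unified intensity scale and a per-image scale can be illustrated with a small sketch. It assumes the fingerprint is available as a numeric matrix and that scaling is a linear min–max mapping to [0, 1]. The function names and the example matrix are hypothetical; the rendering into colored 2D maps used in the study is not reproduced here.

```python
import numpy as np

GLOBAL_MIN, GLOBAL_MAX = 38.0, 1712.0  # unified scale limits quoted in the text


def scale_unified(matrix, lo=GLOBAL_MIN, hi=GLOBAL_MAX):
    """Map intensities to [0, 1] using one scale shared by every sample."""
    matrix = np.asarray(matrix, dtype=float)
    return np.clip((matrix - lo) / (hi - lo), 0.0, 1.0)


def scale_individual(matrix):
    """Map intensities to [0, 1] using each sample's own minimum and maximum."""
    matrix = np.asarray(matrix, dtype=float)
    lo, hi = matrix.min(), matrix.max()
    if hi == lo:  # flat matrix: avoid division by zero
        return np.zeros_like(matrix)
    return (matrix - lo) / (hi - lo)


# Hypothetical weak sample whose intensities lie far below the global maximum.
weak = np.array([[38.0, 100.0], [200.0, 456.5]])
print(scale_unified(weak).max())     # stays well below 1 on the shared scale
print(scale_individual(weak).max())  # stretched to 1 on its own scale
```

On the shared scale a weak sample stays dark, so absolute intensity differences between samples are preserved. On an individual scale every sample uses the full range, which emphasizes the shape of the pattern instead.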
Furthermore, the 2D images were cropped to a square (360 px vertical × 360 px
horizontal) using Python (ver. 3.13.2), and the graduations, scales, and unnecessary white
spaces common to all images were removed. These 2D images and the CSV file containing
the image classes (variety) were used as input data for the machine learning model.
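The cropping step can be sketched as a fixed window cut from each rendered image. The helper `crop_square`, the 480 × 640 image size, and the offsets are placeholders chosen for illustration; the actual margins depend on how the 2D maps were exported and are not given in the text.

```python
import numpy as np

CROP_SIZE = 360  # px, side of the square quoted in the text


def crop_square(image, top, left, size=CROP_SIZE):
    """Cut a size x size window out of an (H, W, C) image array."""
    image = np.asarray(image)
    if top < 0 or left < 0:
        raise ValueError("crop offsets must be non-negative")
    if top + size > image.shape[0] or left + size > image.shape[1]:
        raise ValueError("crop window extends beyond the image")
    return image[top:top + size, left:left + size]


# Hypothetical 480 x 640 RGB rendering with axes and margins around the map.
rendered = np.zeros((480, 640, 3), dtype=np.uint8)
cropped = crop_square(rendered, top=60, left=140)  # offsets are placeholders
print(cropped.shape)
```

Using the same window for every image keeps pixel positions aligned with the same excitation and emission wavelengths across samples.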
2.3.2. Model Construction
Several candidate models were built using a CNN as well as KNN, RF, SVM, and logistic
regression (LR) classifiers; the CNN, a deep learning algorithm, was adopted because it had
the highest validation accuracy. The CNN was implemented using Python, TensorFlow
(ver. 2.18.0), and the Keras (ver. 3.9) library. The CNN architecture was simple and was
designed with reference to the LeNet-5 architecture. Specifically, multiple Conv2D and
MaxPooling2D layers were used, and the data were flattened in a Flatten layer and
connected to a fully connected layer.
Convolutional layers: The first convolutional layer used 16 filters (3 × 3 kernel size)
and applied the ReLU function as an activation function for feature extraction. This
process extracts the local patterns from the image. L2 regularization was also applied,
and the weights were penalized to suppress the overfitting of the model and
improve stability;
Pooling layers: A 2 × 2 max pooling was performed to shrink the feature map and
retain important information. Consequently, the computational load was reduced, and
overfitting was suppressed;
Additional convolutional and pooling layers: Convolutional layers with 32 and
64 filters were combined with subsequent pooling layers to extract higher-level
features. L2 regularization was also applied to these convolutional layers;
Fully connected layers: After converting the feature map to one dimension in the
Flatten layer, it was passed through a fully connected layer of 256 units for the final
classification. L2 regularization was also applied to this fully connected layer;
Dropout layer: Randomly invalidating 30% (Dropout = 0.3) of the output of the fully
connected layer suppressed overfitting and improved robustness;
Output layer: The probability of each variety was calculated using the SoftMax
activation function. This outputs the probability that the sample belongs to a particular
variety, and the final prediction is made.
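The layer stack listed above can be written as a short Keras sketch. This is an illustration, not the authors' code: the three-channel 360 × 360 input, the default 'valid' padding, the ReLU activation in the dense layer, and the L2 strength of 1e-3 are assumptions, because the text does not state them. The helper `trace_shapes` is a hypothetical addition that follows the feature-map size through the network without requiring TensorFlow.

```python
def trace_shapes(size=360, filters=(16, 32, 64), kernel=3, pool=2):
    """Feature-map shape after each Conv2D ('valid' padding) + MaxPooling2D pair."""
    shapes = []
    for n_filters in filters:
        size = size - kernel + 1   # 3 x 3 convolution without padding
        size = size // pool        # 2 x 2 max pooling with stride 2
        shapes.append((size, size, n_filters))
    return shapes


def build_cnn(input_shape=(360, 360, 3), n_classes=3, l2_strength=1e-3):
    """Build the described layer stack (requires TensorFlow to be installed)."""
    # Imported here so that trace_shapes() works without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    def l2():
        # Assumption: the same L2 strength is used in every regularized layer.
        return regularizers.l2(l2_strength)

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu", kernel_regularizer=l2()),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu", kernel_regularizer=l2()),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", kernel_regularizer=l2()),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu", kernel_regularizer=l2()),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


print(trace_shapes())
```

Under these assumptions the last pooling layer outputs a 43 × 43 × 64 feature map, so the Flatten layer passes 118,336 values to the 256-unit dense layer.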
2.3.3. Training and Optimization
The Adam optimizer was used to train the model. Adam adapts the step size of each
parameter using running estimates of the first and second moments of the gradients, so
optimization is performed efficiently and effectively. Categorical cross-entropy was
adopted as the loss function, which is suitable for multiclass classification problems. The
batch size was set to 32. In addition, early stopping was introduced to improve training
efficiency while preventing overfitting. Because this was a multiclass classification
problem with the same number of samples in each class, and because the accuracy of the
model was important, the validation accuracy was set as the monitored metric. Furthermore,
patience (waiting period) was set to 10, and training was terminated if no improvement in
validation loss was observed within 10 epochs. The number of epochs was set to 60 so that
learning could continue until no improvement in the validation accuracy was observed.
Additionally, the prediction results of all fold models were integrated rather than relying
on a single-fold model, with the aim of making the system more robust and improving its
generalization performance. The test data were evaluated using the ensemble model.
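The stopping rule described above (patience of 10, at most 60 epochs) can be traced with a small pure-Python helper. The function `early_stopping_epoch` and the score history are hypothetical. The helper assumes a higher-is-better score such as validation accuracy; in Keras the equivalent behavior would normally come from `keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10)`.

```python
def early_stopping_epoch(scores, patience=10, max_epochs=60):
    """Return (stop_epoch, best_epoch), both 1-based, for a score history.

    `scores` holds one higher-is-better value per epoch, for example the
    validation accuracy. Training stops once `patience` consecutive epochs
    pass without a new best, or when `max_epochs` is reached.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores[:max_epochs], start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return min(len(scores), max_epochs), best_epoch


# Hypothetical history: improves for 5 epochs, then plateaus.
history = [0.80, 0.90, 0.95, 0.97, 0.98] + [0.98] * 30
print(early_stopping_epoch(history))
```

In this example training stops at epoch 15, ten epochs after the best score at epoch 5. A history that keeps improving runs to the 60-epoch limit instead.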
2.3.4. Evaluation of Model Performance
In this study, ensemble learning was performed based on the prediction results of each
model obtained from five folds, and the final evaluation was performed using a weighted
average method based on the validation accuracy. Specifically, weights were calculated
based on the validation accuracy of the models obtained in each fold, and the predictions
of the test data by the models in each fold were weighted to emphasize the predictions of
the models with high validation accuracy. Finally, the accuracy of the test data obtained via
ensemble learning was used as an evaluation index for the model.
2.3.5. Prediction for Unknown Data
For unknown samples, the trained model was used to predict the probability of each
variety. The variety with the highest predicted probability was determined as the origin
and variety of the unknown sample. This enabled the identification of new samples.
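The weighted averaging of fold predictions and the final choice of the most probable variety can be sketched as follows. The text does not give the exact weighting formula, so weights proportional to validation accuracy are an assumption. The function `weighted_ensemble`, the class order, and all numbers in the example are hypothetical.

```python
import numpy as np


def weighted_ensemble(fold_probs, val_accuracies):
    """Average per-fold class probabilities, weighted by validation accuracy.

    fold_probs: array of shape (n_folds, n_samples, n_classes)
    val_accuracies: one validation accuracy per fold
    """
    fold_probs = np.asarray(fold_probs, dtype=float)
    weights = np.asarray(val_accuracies, dtype=float)
    # Assumption: weights are proportional to accuracy and sum to 1.
    weights = weights / weights.sum()
    # Contract the fold axis; the result has shape (n_samples, n_classes).
    return np.tensordot(weights, fold_probs, axes=1)


CLASSES = ["Rausu", "Rishiri", "Hidaka"]  # hypothetical class order

# Hypothetical outputs of two folds for two unknown samples.
probs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]],
]
ensemble = weighted_ensemble(probs, val_accuracies=[0.99, 0.97])
labels = [CLASSES[i] for i in ensemble.argmax(axis=1)]
print(labels)
```

Because validation accuracies of well-trained folds are usually close to one another, proportional weights stay close to a plain average. A stronger emphasis on the best folds would need a different weighting rule.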
3. Results
3.1. Development of the Variety Identification System
The kelp variety identification system was developed based on three steps: data
acquisition and preprocessing, model building, and training.
During preprocessing, the fluorescence intensity scale was unified to a maximum
value of 1712 and a minimum value of 38 for all image data to minimize data variability. In
addition, unnecessary scales and white spaces were removed from the images to reduce the
influence of noise. This process reduces computational load and improves data consistency.
The dataset was split into training (80%) and test (20%) data. The training data were
further split into five folds using stratified k-fold cross-validation. This method improved
the generalization performance of the model.
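A stratified 80/20 split followed by five stratified folds can be sketched with NumPy alone. The function `stratified_split` is hypothetical; in practice scikit-learn's `train_test_split(..., stratify=labels)` and `StratifiedKFold` are the usual tools. The seed and the round-robin fold assignment are assumptions.

```python
import numpy as np


def stratified_split(labels, test_fraction=0.2, n_folds=5, seed=0):
    """Return (test_idx, folds): a stratified hold-out set and k validation folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    test_idx = []
    folds = [[] for _ in range(n_folds)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * idx.size))
        test_idx.extend(idx[:n_test])
        # Deal the remaining samples of this class round-robin into the folds.
        for position, sample in enumerate(idx[n_test:]):
            folds[position % n_folds].append(sample)
    return np.array(test_idx), [np.array(fold) for fold in folds]


# 570 fingerprints, 190 per variety, as stated in Section 2.1.
labels = np.repeat([0, 1, 2], 190)
test_idx, folds = stratified_split(labels)
print(test_idx.size, [fold.size for fold in folds])
```

Each fold serves once as the validation set while the other four are used for training, which yields the five fold models that are later combined.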
A convolutional neural network (CNN) was used to build the model. This ensemble
model achieved an accuracy of 98.86% for the test data and effectively captured the fine
characteristics of the varieties.
By combining fluorescent fingerprint data and a CNN, this system achieved high
accuracy in identifying the origin and variety of dried kelp, demonstrating that it is a
promising method for preventing food fraud and ensuring quality control.
3.2. Influence of Fluorescence Intensity Scale Setting Method
In the preprocessing of the fluorescence fingerprint data, two image datasets were
created: a uniform scale dataset, in which the maximum and minimum values of the
fluorescence intensity scale were shared by all images, and a non-uniform scale dataset, in
which the maximum and minimum values of the scale were set separately for each
image.
Examples from the uniform scale and non-uniform scale datasets are shown in
Figures 1 and 2, and the learning curves are shown in Figures 3 and 4.
The images of these datasets revealed the following points. The uniform scale images
clearly visualized the differences in fluorescence intensity between kelp varieties, while the
non-uniform scale images emphasized the trends and patterns of each image, making it
easier to capture the features.
Figure 3. Unified scale dataset learning curve.
Figure 4. Individual scale dataset learning curve.
The uniform scale images were expected to reduce noise and improve the learning
accuracy of the model because the data were provided to the model in a relatively consistent
state. However, discriminant analysis using a CNN indicated that the uniform scale
images did not show a significant improvement in accuracy compared with that of the
nonuniform scale images. A comparison of the learning curves showed that training on the
dataset of nonuniform scale images (individual scale dataset) converged in
relatively few epochs.
Furthermore, the dataset of nonuniform scale images confirmed that the fluorescence
patterns of Rausu-kombu and Rishiri-kombu were similar; however, the dataset of uniform
scale images revealed differences in fluorescence intensities between the two.
3.3. Identification of Kombu Varieties
The average training and validation accuracies were 0.9982 and 0.9796, respectively,
indicating excellent performance of the entire model (Table 1). In addition, the training
and validation losses were low, and no significant discrepancy was observed between training
and validation. The test accuracy was 0.9886, confirming that predictions could be made
with an accuracy of >90%, even for samples of unknown varieties (Table 2). In the process
of model construction, Hidaka-kombu showed a characteristic fluorescent fingerprint
pattern and achieved high classification accuracy. However, the identification accuracies
of Rausu-kombu and Rishiri-kombu were slightly lower than that of Hidaka-kombu,
and some samples tended to be misclassified. This difference
was probably owing to similarities in the fluorescence characteristics of Rausu-kombu
and Rishiri-kombu. However, the misclassification was reduced by reviewing the data
preprocessing method (method of setting the fluorescence intensity scale). Analysis using
the confusion matrix revealed the accuracy rate and misclassification tendency for each
variety. The overall model performance was high, with the Precision, Recall, and F1-Score
exceeding 0.95 (Table 3). In the evaluation of each class, Hidaka-kombu had a Recall of 1.0,
whereas Precision was slightly lower at 0.97, indicating that a few samples of other
varieties were predicted as Hidaka-kombu (false positives). The Precision and Recall values
confirmed that Rausu-kombu achieved the highest classification accuracy. Rishiri-kombu
had slightly lower Recall than Precision, indicating that some Rishiri-kombu samples were
assigned to other varieties.
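The per-class Precision, Recall, and F1-Score discussed above follow directly from a confusion matrix. The sketch below shows the calculation on a made-up matrix; the counts are hypothetical and are not the values behind Table 3. Rows are taken as true classes and columns as predicted classes, which is an assumed convention.

```python
import numpy as np


def per_class_metrics(confusion):
    """Precision, recall, and F1 per class; rows = true class, columns = predicted."""
    confusion = np.asarray(confusion, dtype=float)
    true_pos = np.diag(confusion)
    precision = true_pos / confusion.sum(axis=0)  # column sums: predicted counts
    recall = true_pos / confusion.sum(axis=1)     # row sums: actual counts
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical confusion matrix (NOT the data behind Table 3).
# Class order: Rausu, Rishiri, Hidaka.
confusion = [
    [38, 0, 0],
    [1, 36, 1],
    [0, 0, 38],
]
precision, recall, f1 = per_class_metrics(confusion)
for name, p, r, f in zip(["Rausu", "Rishiri", "Hidaka"], precision, recall, f1):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this made-up case one Rishiri sample predicted as Hidaka lowers the Precision of Hidaka while its Recall stays at 1.0, and it lowers the Recall of Rishiri while its Precision stays at 1.0. The helper assumes every class is predicted at least once; otherwise a column sum of zero would cause a division by zero.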
Table 1. Model evaluation: Accuracy and loss of learning data by weighted average ensemble method
based on the accuracy of each fold of 10 trials.
Trial Number   Training Accuracy   Validation Accuracy   Training Loss   Validation Loss
1              0.9968              0.9763                0.07083         0.1483
2              1.000               0.9869                0.03911         0.1055
3              0.9941              0.9739                0.05920         0.1300
4              1.000               0.9870                0.03515         0.1164
5              0.9984              0.9849                0.03445         0.1012
6              0.9978              0.9849                0.04721         0.09288
7              0.9979              0.9682                0.04644         0.1457
8              1.000               0.9828                0.03829         0.1376
9              0.9969              0.9808                0.04789         0.1359
10             1.000               0.9699                0.04979         0.1211
Average        0.9982              0.9796                0.04684         0.1235
Analysis of the learning curve showed that the difference between the training and
validation losses was small, and no signs of overfitting were noticed. Accuracy approached
1 as the number of epochs increased, and the overall performance of the model was
judged to be good (Figure 5). Furthermore, the validation accuracy of each fold
was stable and high. Therefore, the model performed consistently across the entire dataset.
However, the model converged very early, suggesting that the learning rate was high or
that the initial weights had an influence. In addition, the validation loss temporarily increased
partway through training in some folds (Fold 4), suggesting temporary instability of the model.
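The two learning-curve observations above (a small gap between training and validation loss, and a temporary rise in validation loss) can be checked numerically with a short sketch. The helper names and loss values are hypothetical. With Keras, such lists would typically come from the `history` dictionary of the object returned by `model.fit`.

```python
def loss_increase_epochs(val_loss, tolerance=0.0):
    """Return 1-based epochs whose validation loss rose above the previous epoch's."""
    return [
        i + 1
        for i in range(1, len(val_loss))
        if val_loss[i] - val_loss[i - 1] > tolerance
    ]

def generalization_gap(train_loss, val_loss):
    """Difference between final validation loss and final training loss."""
    return val_loss[-1] - train_loss[-1]

# Hypothetical per-epoch losses for one fold (not the values behind Figure 5)
train_loss = [0.90, 0.40, 0.20, 0.10, 0.06, 0.05]
val_loss = [0.95, 0.50, 0.30, 0.35, 0.15, 0.12]

spikes = loss_increase_epochs(val_loss)
gap = generalization_gap(train_loss, val_loss)
print(spikes, round(gap, 4))
```

A single isolated epoch in the list, followed by a continued decrease, corresponds to the kind of temporary instability described for Fold 4.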
Figure 5. Learning curves showing the trends of training and validation losses during model training.
Furthermore, analysis of the fluorescence fingerprint images confirmed that charac-
teristic peaks of fluorescence intensity appeared for each variety. A strong fluorescence
peak derived from amino acids at 290/330 nm excitation/emission region was observed
for Hidaka-kombu, and while the trend patterns of Rishiri-kombu and Rausu-kombu were
similar at approximately 280–330 and 350–440 nm, Rausu-kombu showed a relatively weak
overall fluorescence intensity. Further localization of this peak may provide additional
insights but was not determined here as it is beyond the scope of this study. This sug-
gests that the differences in the chemical components of each variety are reflected in their
fluorescent properties, providing useful information for identification.
These results confirmed that the combination of fluorescence spectroscopy and a CNN
is an effective method for identifying the origin and variety of kelp.
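The peak positions reported above, such as the 290/330 nm excitation/emission region, correspond to the coordinates of the intensity maximum in an excitation-emission matrix. The following minimal sketch locates that maximum on synthetic data. The 10 nm grid, the Gaussian peak shape, and the function name `locate_peak` are assumptions for illustration, not the measurement settings of this study.

```python
import numpy as np

def locate_peak(excitation, emission, intensity):
    """Return (excitation, emission, value) at the maximum of the intensity matrix."""
    i, j = np.unravel_index(np.argmax(intensity), intensity.shape)
    return excitation[i], emission[j], intensity[i, j]

# Synthetic grid over the 250-550 nm range (10 nm step is an assumption)
excitation = np.arange(250, 551, 10)
emission = np.arange(250, 551, 10)
ex_grid, em_grid = np.meshgrid(excitation, emission, indexing="ij")

# Synthetic Gaussian peak centered at 290/330 nm, standing in for a measured fingerprint
intensity = np.exp(-((ex_grid - 290) ** 2 + (em_grid - 330) ** 2) / (2 * 20.0 ** 2))

peak_ex, peak_em, peak_value = locate_peak(excitation, emission, intensity)
print(peak_ex, peak_em, peak_value)
```

Measured fingerprints also contain scattering lines and several overlapping peaks, so a global maximum is only a first approximation of the peak analysis.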
Table 2. Accuracy and loss of test data by weighted average ensemble method based on the accuracy of each fold of 10 trials.
Trial Number   Accuracy   Loss
1              0.9912     0.5934
2              0.9912     0.5709
3              0.9737     0.5884
4              0.9912     0.5728
5              0.9912     0.5735
6              0.9912     0.5758
7              0.9912     0.5777
8              0.9825     0.5755
9              0.9912     0.5837
10             0.9912     0.5742
Average        0.9886     0.5786
Table 3. Classification performance evaluation index values for each variety (average of 10 trials).
Class     Precision   Recall   F1-Score
Hidaka    0.970       1.000    0.990
Rausu     0.995       0.997    0.996
Rishiri   0.997       0.965    0.985
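The indices in Table 3 follow the standard definitions: Precision is the share of predictions for a class that are correct, Recall is the share of true samples of a class that are recovered, and F1-Score is their harmonic mean. The sketch below computes them from a confusion matrix. The counts are hypothetical and are not the confusion matrix of this study; they were chosen only to mimic the pattern of a class with perfect Recall but lower Precision.

```python
import numpy as np

def per_class_metrics(confusion):
    """Compute precision, recall, and F1 per class (rows: true, columns: predicted)."""
    cm = np.asarray(confusion, dtype=float)
    true_positive = np.diag(cm)
    precision = true_positive / cm.sum(axis=0)   # column sums = number predicted as class
    recall = true_positive / cm.sum(axis=1)      # row sums = number truly in class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

labels = ["Hidaka", "Rausu", "Rishiri"]
# Hypothetical counts: one Rishiri sample is predicted as Hidaka
confusion = [
    [40, 0, 0],
    [0, 38, 0],
    [1, 0, 35],
]

precision, recall, f1 = per_class_metrics(confusion)
for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

The sketch assumes every class is predicted at least once. A class with no predictions would need explicit handling of the zero denominator.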
4. Discussion
In this study, by combining fluorescence fingerprint data with a CNN, the origin and
variety of dried kelp from Hokkaido were identified with high accuracy. The statistical
validity of the model was evaluated using accuracy, precision, recall, and F1 score (Table 3).
The test accuracy was 0.9886 (average of 10 trials), showing high prediction accuracy even
for samples of unknown variety. Comparison of the unified and individual scale datasets
(Figures 1 and 2) showed that scale standardization improved classification performance.
These results confirm the robustness of the proposed method. The accuracy was similar
to or better than that reported in previous machine learning studies [8,9]. In addition, although
the training loss was very low at 0.04684 (average of 10 trials), the validation loss (0.1235,
average of 10 trials) was higher, suggesting the possibility of overfitting and the need for
improved preprocessing methods.
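The difference between the unified and individual intensity scales compared above can be sketched as follows. The exact scaling procedure of this study is not specified here, so min-max scaling to 8-bit gray levels and the function names are assumptions for illustration.

```python
import numpy as np

def to_uint8(matrix, vmin, vmax):
    """Map intensities from [vmin, vmax] to 8-bit gray levels (assumes vmax > vmin)."""
    scaled = (np.asarray(matrix, dtype=float) - vmin) / (vmax - vmin)
    return np.round(np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)

def individual_scale(samples):
    """Scale each sample by its own minimum and maximum."""
    return [to_uint8(s, np.min(s), np.max(s)) for s in samples]

def unified_scale(samples):
    """Scale all samples by one shared minimum and maximum."""
    vmin = min(np.min(s) for s in samples)
    vmax = max(np.max(s) for s in samples)
    return [to_uint8(s, vmin, vmax) for s in samples]

# Two hypothetical fingerprints with the same shape but different overall intensity
strong = np.array([[0.0, 500.0], [1000.0, 250.0]])
weak = strong * 0.5

individual_images = individual_scale([strong, weak])
unified_images = unified_scale([strong, weak])
print(individual_images[1].max(), unified_images[1].max())
```

In this toy case the individual scale makes the two samples indistinguishable, whereas the unified scale preserves the difference in overall intensity. This is one plausible reason a shared scale could help separate varieties whose patterns are similar but whose intensities differ.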
Although elemental analysis by ICP-MS [4] and DNA analysis [6] have a certain degree
of accuracy, ICP-MS requires expensive equipment and skilled techniques, and DNA analy-
sis has the problem that the accuracy of identification decreases when the genetic similarity
is high. The method proposed in this study was particularly effective in identifying Rausu-
kombu and Rishiri-kombu. Although these varieties have similar fluorescence properties,
CNN analysis enabled capturing subtle differences in the fluorescent patterns. Moreover,
Hidaka-kombu had a prominent fluorescent peak derived from amino acids, which helped
in achieving high identification accuracy. Therefore, the fluorescent fingerprints reflected
the differences in chemical properties among the kombu varieties. Identification methods
that utilize such differences in chemical properties are expected to be applied in the fields
of food quality assurance and origin certification, as used for identifying the geographical
origin of rice [12].
In this study, fluorescence fingerprint data were converted from text data to image data
and analyzed using a CNN model. The results indicate that a CNN is superior to conventional
machine learning methods (such as KNN and SVM) in pattern recognition of image data. This
result is consistent with that previously reported [21]. However, because the CNN model
requires a large amount of data and computational resources, a hybrid approach using
lightweight models (e.g., MobileNet and EfficientNet) or other algorithms (e.g., XGBoost
and RF) can be considered for future applications.
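The conversion from text data to image data mentioned at the start of this paragraph can be sketched as follows. The file layout exported by the spectrofluorometer is not described here, so the comma-separated layout (emission wavelengths in the first row, excitation wavelengths in the first column) and the function names are assumptions.

```python
import io
import numpy as np

def eem_text_to_matrix(text):
    """Parse a comma-separated excitation-emission table (layout is an assumption)."""
    table = np.loadtxt(io.StringIO(text), delimiter=",")
    emission = table[0, 1:]      # first row: emission wavelengths (nm)
    excitation = table[1:, 0]    # first column: excitation wavelengths (nm)
    intensity = table[1:, 1:]    # remaining cells: fluorescence intensity
    return excitation, emission, intensity

def matrix_to_gray_image(intensity):
    """Min-max scale an intensity matrix to an 8-bit grayscale image array."""
    low, high = intensity.min(), intensity.max()
    return np.round((intensity - low) / (high - low) * 255).astype(np.uint8)

# Hypothetical miniature export; the top-left cell is an unused placeholder
text = "0,300,310,320\n250,1.0,2.0,3.0\n260,4.0,5.0,6.0\n"

excitation, emission, intensity = eem_text_to_matrix(text)
image = matrix_to_gray_image(intensity)
print(image)
```

The resulting array can be passed to an image-based classifier after resizing and adding a channel dimension as required by the chosen model.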
The fluorescence properties of dried kelp are significantly affected by components
such as amino acids and polyphenols. In particular, fluorescence peaks derived from amino
acids were prominent in Hidaka-kombu, and this characteristic may be the main factor
in achieving high identification accuracy. However, Rausu-kombu and Rishiri-kombu
had similar fluorescence characteristics at approximately 280–330 and 350–440 nm, and
this similarity may have been a cause of misclassification. Lia et al. [22] have identified
fluorescent substances, such as chlorophyll and tocopherol, and distinguished olive oil
from Malta and other origins using fluorescence spectroscopy. Xie et al. [23] have used
the fluorescence properties of caffeine to determine coffee quality using a fluorescence
fingerprinting method. Ali et al. [24] have determined the quality of Sidr honey using a
fluorescence fingerprinting method by focusing on specific phenolic compounds (caffeic
improved by combining fluorescence fingerprinting with chemical component analysis.
This model has the potential to be applied to different food categories and to identify
geographical origins, production areas, and other varieties. Fluorescence spectroscopy has
been used to identify the geographic origin and other varieties of rice [15], varieties and
geographic origin of olive oil [16], and botanical origin of Swiss honey [17]. Therefore, the
effectiveness of fluorescence spectroscopy has been confirmed for a wide variety of foods,
and its application for identifying kelp varieties may contribute to strengthening quality
control and traceability in the food manufacturing industry.
In future research, it is expected that the accuracy of identification will be improved
by integrating other spectroscopic techniques, such as NIR. NIR is excellent for measuring
the chemical components of food, and by combining it with fluorescence spectroscopy,
more multifaceted analysis will be possible. By utilizing multimodal machine learning
and integrating multiple spectroscopic data, it is expected that the accuracy of origin
identification will be improved.
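One simple way to integrate several spectroscopic data sources is feature-level fusion, in which each block is standardized and the blocks are concatenated before classification. The sketch below shows this idea on random placeholder data. It is an assumption about how such integration could be done, not a method evaluated in this study.

```python
import numpy as np

def standardize(block):
    """Scale each feature column to zero mean and unit variance."""
    block = np.asarray(block, dtype=float)
    std = block.std(axis=0)
    std[std == 0] = 1.0                      # avoid division by zero for constant columns
    return (block - block.mean(axis=0)) / std

def fuse_features(fluorescence, nir):
    """Concatenate standardized fluorescence and NIR features sample by sample."""
    return np.hstack([standardize(fluorescence), standardize(nir)])

# Random placeholder data: 6 samples, 16 fluorescence features, 8 NIR features
rng = np.random.default_rng(0)
fluorescence = rng.random((6, 16)) * 1000.0   # large intensity scale
nir = rng.random((6, 8))                      # small absorbance-like scale

fused = fuse_features(fluorescence, nir)
print(fused.shape)
```

Standardizing each block first keeps the block with the larger numeric range from dominating the fused features. In practice, the mean and standard deviation would be estimated on the training set only.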
In this study, only three types of kelp from Hokkaido were used, and the effects of
differences in region and harvest time were not taken into consideration. In the future, the
versatility of the model will be improved by using samples from other regions and different
harvest times.
5. Conclusions
The combination of fluorescence spectroscopy and a CNN is effective in identifying
the origin and variety of kelp, demonstrating its versatility and potential applications.
Future studies should aim to contribute to fraud prevention and quality control in
the food industry.
Expanding the dataset is necessary for practically using this model in the food manu-
facturing industry. In addition to the three varieties of kelp studied here, the versatility of
the model can be increased by targeting samples from other regions and different harvest
times. Furthermore, highly accurate models can be built using small datasets by utilizing
transfer learning to apply them to various food categories.
The identification accuracy can also be improved by integrating fluorescent fingerprint
data with other analytical methods. For example, through the supplementary use of
component analysis and spectral data, evaluation of food characteristics from multiple
angles and building an identification system with high accuracy and reliability are possible.
Ultimately, the results of this study will contribute to strengthening quality control and
prevention of origin fraud in the food manufacturing industry and improve
consumer trust.
Author Contributions: Conceptualization, T.M.; methodology, K.S. and R.A.; software, K.S., R.A. and
T.M.; validation, T.M.; data curation, K.S. and R.A.; writing—original draft preparation, K.S.; review,
Y.L.; writing—review and editing, T.M.; project administration, T.M.; funding acquisition, T.M. All
authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by JSPS KAKENHI Grant Number JP22K02156. https://www.jsps.go.jp/j-grantsinaid/16_rule/rule.html (accessed on 25 December 2024).
Data Availability Statement: The data that support the findings of this study are available from the
corresponding author upon reasonable request.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Ministry of Agriculture, Forestry and Fisheries. About the System for Labeling the Origin of Ingredients in Processed Foods. Available online: https://www.maff.go.jp/j/syouan/hyoji/gengen_hyoji.html (accessed on 25 December 2024). (In Japanese)
2. European Union Laws. Available online: https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:32018R0775 (accessed on 25 December 2024).
3. Code of Federal Regulations. Available online: https://www.ecfr.gov/current/title-19/chapter-I/part-134 (accessed on 25 December 2024).
4. Hattori, S.; Tsukuda, M.; Homura, Y. Determination of the geographic origin of dried kelp by inorganic analysis. Nippon Suisan Gakkaishi 2009, 75, 77–82. (In Japanese)
5. Morohashi, T.; Aoyama, K.; Namikoshi, A.; Kimura, Y.; Hattori, S. Determination of the geographic origin of boiled and salted wakame Undaria pinnatifida products by element analysis. Nippon Suisan Gakkaishi 2011, 77, 243–245. (In Japanese)
6. Shimizu, K.; Kato, Y.; Kato, S.; Inoue, A.; Ojima, T.; Yasokawa, D. Technology for Geographic Origin Identification of Edible Kelps; Report No. 11; Hokkaido Industrial Technology Center: Sapporo, Japan, 2010. (In Japanese)
7. Kawai, T.; Yotsukura, N. Current remarks of phylogeny and taxonomy on genus Laminaria. Rishiri Stud. 2005, 24, 37–47. (In Japanese)
8. Yang, S.; Li, C.; Mei, Y.; Liu, W.; Liu, R.; Chen, W.; Han, D.; Xu, K. Determination of the geographical origin of coffee beans using terahertz spectroscopy combined with machine learning methods. Front. Nutr. 2021, 8, 680627.
9. Berghian-Grosan, C.; Magdas, D.A. Application of Raman spectroscopy and machine learning algorithms for fruit distillates discrimination. Sci. Rep. 2020, 10, 21152.
10. Gu, H.W.; Zhou, H.H.; Lv, Y.; Wu, Q.; Pan, Y.; Peng, Z.X.; Zhang, X.H.; Yin, X.L. Geographical origin identification of Chinese red wines using ultraviolet-visible spectroscopy coupled with machine learning techniques. J. Food Compos. Anal. 2023, 119, 105265.
11. Yan, X.; Xie, Y.; Chen, J.; Yuan, T.; Leng, T.; Chen, Y.; Xie, J.; Yu, Q. NIR spectrometric approach for geographical origin identification and taste related compounds content prediction of Lushan Yunwu tea. Foods 2022, 11, 2976.
12. Hu, L.; Zhang, Y.; Ju, Y.; Meng, X.; Yin, C. Rapid identification of rice geographical origin and adulteration by excitation-emission matrix fluorescence spectroscopy combined with chemometrics based on fluorescence probe. Food Control 2023, 146, 109547.
13. Al Riza, D.F.; Kondo, N.; Rotich, V.K.; Perone, C.; Giametta, F. Cultivar and geographical origin authentication of Italian extra virgin olive oil using front-face fluorescence spectroscopy and chemometrics. Food Control 2021, 121, 107604.
14. Karoui, R.; Dufour, E.; Bosset, J.-O.; De Baerdemaeker, J. The use of front face fluorescence spectroscopy to classify the botanical origin of honey samples produced in Switzerland. Food Chem. 2007, 101, 314–323.
15. Strelec, I.; Kucko, L.; Roknic, D.; Mrsa, V.; Ugarcic-Hardi, Z. Spectrofluorimetric, spectrophotometric and chemometric analysis of wheat grains infested by Sitophilus granarius. J. Stored Prod. Res. 2012, 50, 42–48.
16. Yang, Q.; Tian, S.; Xu, H. Identification of the geographic origin of peaches by VIS-NIR spectroscopy, fluorescence spectroscopy and image processing technology. J. Food Compos. Anal. 2022, 114, 104843.
17. Hou, Z.; Jin, Y.; Gu, Z.; Zhang, R.; Su, Z.; Liu, S. 1H NMR spectroscopy combined with machine-learning algorithm for origin recognition of Chinese famous green tea Longjing tea. Foods 2024, 13, 2702.
18. Chen, M.; Guo, W.; Yi, X.; Jiang, Q.; Hu, X.; Peng, J.; Tian, J. Hyperspectral imaging combined with convolutional neural network for Pu’er ripe tea origin recognition. J. Food Compos. Anal. 2025, 139, 107093.
19. Chen, A.Q.; Wu, H.L.; Wang, T.; Wang, X.Z.; Sun, H.B.; Yu, R.Q. Intelligent analysis of excitation-emission matrix fluorescence fingerprint to identify and quantify adulteration in camellia oil based on machine learning. Talanta 2023, 251, 123733.
20. Wu, M.; Li, M.; Fan, B.; Sun, Y.; Tong, L.; Wang, F.; Li, L. A rapid and low-cost method for detection of nine kinds of vegetable oil adulteration based on 3-D fluorescence spectroscopy. LWT 2023, 188, 115419.
21. Hu, Y.; Wei, C.; Wang, X.; Wang, W.; Jiao, Y. Using three-dimensional fluorescence spectroscopy and machine learning for rapid detection of adulteration in camellia oil. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2025, 329, 125524.
22. Lia, F.; Formosa, J.P.; Zammit-Mangion, M.; Farrugia, C. The first identification of the uniqueness and authentication of Maltese extra virgin olive oil using 3D-fluorescence spectroscopy coupled with multi-way data analysis. Foods 2020, 9, 498.
23. Xie, J.Y.; Tan, J. Front-face synchronous fluorescence spectroscopy: A rapid and non-destructive authentication method for Arabica coffee adulterated with maize and soybean flours. J. Consum. Prot. Food Saf. 2022, 17, 209–219.
24. Ali, H.; Khan, S.; Ullah, R.; Khan, B. Fluorescence fingerprints of Sidr honey in comparison with uni/polyfloral honey samples. Eur. Food Res. Technol. 2020, 246, 1829–1837.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.