Machine Learning in Agriculture: A Review
Konstantinos G. Liakos 1, Patrizia Busato 2, Dimitrios Moshou 1,3, Simon Pearson 4
and Dionysis Bochtis 1,*
1Institute for Bio-Economy and Agri-Technology (IBO), Centre of Research and
Technology—Hellas (CERTH), 6th km Charilaou-Thermi Rd, GR 57001 Thessaloniki, Greece; (K.G.L.); (D.M.)
2Department of Agriculture, Forestry and Food Sciences (DISAFA), Faculty of Agriculture,
University of Turin, Largo Braccini 2, 10095 Grugliasco, Italy;
3Agricultural Engineering Laboratory, Faculty of Agriculture, Aristotle University of Thessaloniki,
54124 Thessaloniki, Greece
4Lincoln Institute for Agri-food Technology (LIAT), University of Lincoln, Brayford Way, Brayford Pool,
Lincoln LN6 7TS, UK,
*Correspondence:; Tel.: +30-2310-498210
Received: 27 June 2018; Accepted: 7 August 2018; Published: 14 August 2018
Abstract: Machine learning has emerged with big data technologies and high-performance computing
to create new opportunities for data intensive science in the multi-disciplinary agri-technologies
domain. In this paper, we present a comprehensive review of research dedicated to applications
of machine learning in agricultural production systems. The works analyzed were categorized in
(a) crop management, including applications on yield prediction, disease detection, weed detection,
crop quality, and species recognition; (b) livestock management, including applications on animal
welfare and livestock production; (c) water management; and (d) soil management. The filtering
and classification of the presented articles demonstrate how agriculture will benefit from machine
learning technologies. By applying machine learning to sensor data, farm management systems are
evolving into real time artificial intelligence enabled programs that provide rich recommendations
and insights for farmer decision support and action.
Keywords: crop management; water management; soil management; livestock management; artificial
intelligence; planning; precision agriculture
1. Introduction
Agriculture plays a critical role in the global economy. Pressure on the agricultural system will
increase with the continuing expansion of the human population. Agri-technology and precision
farming, now also termed digital agriculture, have arisen as new scientific fields that use
data-intensive approaches to drive agricultural productivity while minimizing its environmental impact.
The data generated in modern agricultural operations is provided by a variety of different sensors that
enable a better understanding of the operational environment (an interaction of dynamic crop, soil,
and weather conditions) and the operation itself (machinery data), leading to more accurate and faster
decision making.
Machine learning (ML) has emerged together with big data technologies and high-performance
computing to create new opportunities to unravel, quantify, and understand data intensive processes
in agricultural operational environments. Among other definitions, ML is defined as the scientific field
that gives machines the ability to learn without being strictly programmed [ ]. Year by year, ML is applied
in more and more scientific fields including, for example, bioinformatics [ ], biochemistry [ ],
medicine [ ], meteorology [ ], economic sciences [ ], robotics [ ], aquaculture [ ],
food security [19,20], and climatology [21].
Sensors 2018, 18, 2674; doi:10.3390/s18082674
In this paper, we present a comprehensive review of the application of ML in agriculture.
A number of relevant papers are presented that emphasise key and unique features of popular
ML models. The structure of the present work is as follows: the ML terminology, definition, learning
tasks, and analysis are initially given in Section 2, along with the most popular learning models and
algorithms. Section 3 presents the implemented methodology for the collection and categorization of
the presented works. Finally, in Section 4, the advantages derived from the implementation of ML in
agri-technology are listed, as well as the future expectations in the domain.
Because of the large number of abbreviations used in the relevant scientific works, Tables 1–4 list
the abbreviations that appear in this work, categorized into ML models, algorithms, statistical measures,
and general abbreviations, respectively.
Table 1. Abbreviations for machine learning models.
Abbreviation Model
ANNs artificial neural networks
BM bayesian models
DL deep learning
DR dimensionality reduction
DT decision trees
EL ensemble learning
IBM instance based models
SVMs support vector machines
Table 2. Abbreviations for machine learning algorithms.
Abbreviation Algorithm
ANFIS adaptive-neuro fuzzy inference systems
Bagging bootstrap aggregating
BBN bayesian belief network
BN bayesian network
BPN back-propagation network
CART classification and regression trees
CHAID chi-square automatic interaction detector
CNNs convolutional neural networks
CP counter propagation
DBM deep boltzmann machine
DBN deep belief network
DNN deep neural networks
ELMs extreme learning machines
EM expectation maximisation
ENNs ensemble neural networks
GNB gaussian naive bayes
GRNN generalized regression neural network
KNN k-nearest neighbor
LDA linear discriminant analysis
LS-SVM least squares-support vector machine
LVQ learning vector quantization
LWL locally weighted learning
MARS multivariate adaptive regression splines
MLP multi-layer perceptron
MLR multiple linear regression
MOG mixture of gaussians
OLSR ordinary least squares regression
PCA principal component analysis
PLSR partial least squares regression
RBFN radial basis function networks
RF random forest
SaE-ELM self adaptive evolutionary-extreme learning machine
SKNs supervised kohonen networks
SOMs self-organising maps
SPA-SVM successive projection algorithm-support vector machine
SVR support vector regression
Table 3. Abbreviations for statistical measures for the validation of machine learning algorithms.
Abbreviation Measure
APE average prediction error
MABE mean absolute bias error
MAE mean absolute error
MAPE mean absolute percentage error
MPE mean percentage error
NS nash-sutcliffe coefficient
R radius
R2 coefficient of determination
RMSE root mean squared error
RMSEP root mean square error of prediction
RPD relative percentage difference
RRMSE average relative root mean square error
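Several of the measures in Table 3 follow directly from their names. The following is a minimal sketch of MAE, RMSE, MAPE, and R2 using their standard textbook definitions; the observed and predicted values are hypothetical and are not taken from any reviewed study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error (undefined if any true value is zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted values (illustration only).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```

Note that individual studies may use slightly different conventions (for example, normalisation by n or n − 1), so reported values are not always directly comparable.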
Table 4. General abbreviations.
AUS aircraft unmanned system
Cd cadmium
FBG fiber bragg grating
HSV hue saturation value color space
K potassium
MC moisture content
Mg magnesium
ML machine learning
NDVI normalized difference vegetation index
NIR near infrared
OC organic carbon
Rb rubidium
RGB red green blue
TN total nitrogen
UAV unmanned aerial vehicle
VIS-NIR visible-near infrared
2. An Overview on Machine Learning
2.1. Machine Learning Terminology and Definitions
Typically, ML methodologies involve a learning process with the objective to learn from
“experience” (training data) to perform a task. Data in ML consists of a set of examples. Usually,
an individual example is described by a set of attributes, also known as features or variables. A feature
can be nominal (enumeration), binary (i.e., 0 or 1), ordinal (e.g., A+ or B−), or numeric (integer, real
number, etc.). The performance of the ML model in a specific task is measured by a performance
metric that is improved with experience over time. To calculate the performance of ML models and
algorithms, various statistical and mathematical models are used. After the end of the learning process,
the trained model can be used to classify, predict, or cluster new examples (testing data) using the
experience obtained during the training process. Figure 1 shows a typical ML approach.
Figure 1. A typical machine learning approach.
ML tasks are typically classified into different broad categories depending on the learning type
(supervised/unsupervised), learning models (classification, regression, clustering, and dimensionality
reduction), or the learning models employed to implement the selected task.
2.2. Tasks of Learning
ML tasks are classified into two main categories, that is, supervised and unsupervised learning,
depending on the learning signal of the learning system. In supervised learning, data are presented
with example inputs and the corresponding outputs, and the objective is to construct a general rule that
maps inputs to outputs. In some cases, inputs can be only partially available with some of the target
outputs missing or given only as feedback to the actions in a dynamic environment (reinforcement
learning). In the supervised setting, the acquired expertise (trained model) is used to predict the
missing outputs (labels) for the test data. In unsupervised learning, however, there is no distinction
between training and test sets with data being unlabeled. The learner processes input data with the
goal of discovering hidden patterns.
2.3. Analysis of Learning
Dimensionality reduction (DR) is an analysis that is executed in both families of supervised
and unsupervised learning types, with the aim of providing a more compact, lower-dimensional
representation of a dataset that preserves as much information as possible from the original data. It is
usually performed prior to applying a classification or regression model in order to avoid the adverse
effects of high dimensionality. Some of the most common DR algorithms are the following: (i) principal
component analysis [22], (ii) partial least squares regression [23], and (iii) linear discriminant analysis [24].
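As an illustration of DR, the following sketch reduces two-dimensional points to a single coordinate along their first principal component. It is a simplified illustration only: the component is found by power iteration on the 2 × 2 covariance matrix (one of several ways to compute it), and the data points and function names are hypothetical.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (unit direction of greatest variance, mean) for 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    # Power iteration: repeatedly multiply by the covariance matrix and normalise.
    v = (1.0, 0.0)
    for _ in range(iterations):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v, (mean_x, mean_y)

def project(points, component, mean):
    """Project each centred point onto the component (2-D -> 1-D)."""
    return [(p[0] - mean[0]) * component[0] + (p[1] - mean[1]) * component[1]
            for p in points]

# Hypothetical, strongly correlated measurements (illustration only).
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
pc, mean = first_principal_component(points)
scores = project(points, pc, mean)
print(pc, scores)
```

Because the two features are almost perfectly correlated here, the single score per point retains nearly all of the variance in the original pairs.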
2.4. Learning Models
The presentation of the learning models in ML is limited to the ones that have been implemented
in works presented in this review.
2.4.1. Regression
Regression constitutes a supervised learning model, which aims to predict an
output variable from known input variables. The best-known algorithms include
linear regression and logistic regression [ ], as well as stepwise regression [ ]. Also, more complex
regression algorithms have been developed, such as ordinary least squares regression [ ], multivariate
adaptive regression splines [ ], multiple linear regression, cubist [ ], and locally estimated scatterplot
smoothing [30].
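The simplest case, ordinary least squares with a single input variable, can be sketched as follows. The closed-form slope and intercept are the standard ones; the input and output values (a fertiliser rate and a yield) are hypothetical and serve only to make the example concrete.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    return intercept + slope * x

# Hypothetical training data: input (e.g., fertiliser rate) and output (e.g., yield).
xs = [50.0, 100.0, 150.0, 200.0]
ys = [3.1, 4.0, 5.2, 5.9]

slope, intercept = fit_line(xs, ys)
prediction = predict(120.0, slope, intercept)
print(slope, intercept, prediction)
```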
2.4.2. Clustering
Clustering [ ] is a typical application of the unsupervised learning model, typically used to
find natural groupings of data (clusters). Well established clustering techniques are the k-means
technique [32], the hierarchical technique [33], and the expectation maximisation technique [34].
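A minimal sketch of the k-means technique is given below. It alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its points. The initialisation (the first k points) is a deliberate simplification, and the data are hypothetical; practical implementations usually use randomised or more careful initialisation.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])  # simplistic initialisation: first k points
    for _ in range(iterations):
        # Assignment step: group points by nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical unlabeled data forming two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```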
2.4.3. Bayesian Models
Bayesian models (BM) are a family of probabilistic graphical models in which the analysis is
undertaken within the context of Bayesian inference. This type of model belongs to the supervised
learning category and can be employed for solving either classification or regression problems.
Naive bayes [ ], gaussian naive bayes, multinomial naive bayes, bayesian network [ ], mixture
of gaussians [ ], and bayesian belief network [ ] are some of the most prominent algorithms in
the literature.
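As an example of this family, a gaussian naive bayes classifier can be sketched in a few lines: each class is summarised by a prior and by a per-feature mean and variance, and a new example is assigned to the class with the highest log-posterior. The feature values and class names below are hypothetical (they loosely imitate color features of image regions) and are not taken from any reviewed study.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate a class prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant avoids a zero variance for constant features.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def predict_gnb(model, features):
    """Return the class maximising log prior + sum of Gaussian log-likelihoods."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for value, (mean, var) in zip(features, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical two-feature training examples (illustration only).
X = [(0.30, 0.60), (0.32, 0.58), (0.28, 0.62),
     (0.02, 0.90), (0.04, 0.85), (0.03, 0.88)]
y = ["leaf", "leaf", "leaf", "fruit", "fruit", "fruit"]

model = fit_gnb(X, y)
print(predict_gnb(model, (0.29, 0.61)), predict_gnb(model, (0.03, 0.87)))
```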
2.4.4. Instance Based Models
Instance based models (IBM) are memory-based models that learn by comparing new examples
with instances in the training database. They construct hypotheses directly from the data available,
while they do not maintain a set of abstractions, and generate classification or regression predictions
using only specific instances. The disadvantage of these models is that their complexity grows with
the amount of data. The most common learning algorithms in this category are the k-nearest neighbor [ ],
locally weighted learning [40], and learning vector quantization [41].
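The k-nearest neighbor algorithm illustrates the instance based idea well: no explicit model is built, and each prediction is a majority vote among the k stored training examples closest to the query. The sketch below also shows the usual evaluation on separate test data; all examples and labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training examples nearest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training examples (the "memory" of the model).
train_X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["crop", "crop", "crop", "weed", "weed", "weed"]

# Hypothetical held-out test examples used only for evaluation.
test_X = [(1.1, 1.1), (3.0, 3.0)]
test_y = ["crop", "weed"]

predictions = [knn_predict(train_X, train_y, q) for q in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

Every prediction scans the whole training set, which makes concrete the disadvantage noted above: the cost of prediction grows with the amount of stored data.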
2.4.5. Decision Trees
Decision trees (DT) are classification or regression models formulated in a tree-like
architecture [ ]. With DT, the dataset is progressively organized in smaller homogeneous subsets
(sub-populations), while at the same time, an associated tree graph is generated. Each internal node
of the tree structure represents a different pairwise comparison on a selected feature, whereas each
branch represents the outcome of this comparison. Leaf nodes represent the final decision or prediction
taken after following the path from root to leaf (expressed as a classification rule). The most common
learning algorithms in this category are the classification and regression trees [ ], the chi-square
automatic interaction detector [44], and the iterative dichotomiser [45].
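The way a single internal node is chosen can be sketched as follows. The example searches all features and candidate thresholds for the split whose two subsets are most homogeneous, measured here by Gini impurity (the criterion commonly associated with classification and regression trees). It builds only one node rather than a full tree, and the data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous subset."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return (weighted impurity, feature index, threshold) of the best split."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

# Hypothetical examples with two features each (illustration only).
X = [(0.2, 5.0), (0.3, 7.0), (0.7, 6.0), (0.8, 4.0)]
y = ["healthy", "healthy", "diseased", "diseased"]

impurity, feature, threshold = best_split(X, y)
print(impurity, feature, threshold)
```

A full tree would apply the same search recursively to the left and right subsets until the leaves are sufficiently homogeneous.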
2.4.6. Artificial Neural Networks
Artificial neural networks (ANNs) are divided into two categories: “Traditional ANNs” and
“Deep ANNs”.
ANNs are inspired by the human brain functionality, emulating complex functions such as
pattern generation, cognition, learning, and decision making [ ]. The human brain consists of billions
of neurons that inter-communicate and process any information provided. Similarly, an ANN, as a
simplified model of the structure of the biological neural network, consists of interconnected processing
units organized in a specific topology. A number of nodes are arranged in multiple layers including
the following:
1. An input layer where the data is fed into the system,
2. One or more hidden layers where the learning takes place, and
3. An output layer where the decision/prediction is given.
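The three layer types listed above can be sketched as a single forward pass. In this illustration, three input values pass through one hidden layer of two nodes and one output node, each node applying a sigmoid to a weighted sum of its inputs. The weights and biases are arbitrary placeholders, not trained values; in practice they would be fitted by a learning algorithm such as back-propagation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid, per node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Arbitrary (untrained) parameters, for illustration only.
hidden_weights = [[0.5, -0.4, 0.1],   # hidden node 1: one weight per input
                  [0.3, 0.8, -0.6]]   # hidden node 2
hidden_biases = [0.0, 0.1]
output_weights = [[1.2, -0.7]]        # output node: one weight per hidden node
output_biases = [0.05]

x = [0.2, 0.5, 0.9]                                     # input layer
hidden = dense(x, hidden_weights, hidden_biases)        # hidden layer
output = dense(hidden, output_weights, output_biases)   # output layer
print(hidden, output)
```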
ANNs are supervised models that are typically used for regression and classification problems.
The learning algorithms commonly used in ANNs include the radial basis function networks [ ],
perceptron algorithms [ ], back-propagation [ ], and resilient back-propagation [ ]. Also, a large
number of ANN-based learning algorithms have been reported, such as counter propagation
algorithms [ ], adaptive-neuro fuzzy inference systems [ ], autoencoder, XY-Fusion, and supervised
Kohonen networks [ ], as well as Hopfield networks [ ], multilayer perceptron [ ], self-organising
maps [ ], extreme learning machines [ ], generalized regression neural network [ ], ensemble neural
networks or ensemble averaging, and self-adaptive evolutionary extreme learning machines [59].
Deep ANNs are most widely referred to as deep learning (DL) or deep neural networks
(DNNs) [ ]. They are a relatively new area of ML research allowing computational models that
are composed of multiple processing layers to learn complex data representations using multiple
levels of abstraction. One of the main advantages of DL is that in some cases, the step of feature
extraction is performed by the model itself. DL models have dramatically improved the state-of-the-art
in many different sectors and industries, including agriculture. A DNN is simply an ANN with
multiple hidden layers between the input and output layers and can be either supervised, partially
supervised, or even unsupervised. A common DL model is the convolutional neural network (CNN),
where feature maps are extracted by performing convolutions in the image domain. A comprehensive
introduction on CNNs is given in the literature [ ]. Other typical DL architectures include deep
Boltzmann machine, deep belief network [62], and auto-encoders [63].
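The extraction of a feature map by convolution can be illustrated with a small example. The sketch below slides a 2 × 2 kernel over a 4 × 4 image without padding. As in most DL frameworks, the kernel is not flipped, so the operation is strictly a cross-correlation. The image and the hand-written kernel are hypothetical; in a CNN the kernel values would be learned from data.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Hypothetical image: dark left half, bright right half.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge lies, which is the sense in which a convolution "extracts" a feature.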
2.4.7. Support Vector Machines
Support vector machines (SVMs) were first introduced in the work of [ ] on the foundation of
statistical learning theory. SVM is intrinsically a binary classifier that constructs a linear separating
hyperplane to classify data instances. The classification capabilities of traditional SVMs can be
substantially enhanced through transformation of the original feature space into a feature space of
a higher dimension by using the “kernel trick”. SVMs have been used for classification, regression,
and clustering. Based on global optimization, SVMs mitigate overfitting problems, which appear
in high-dimensional spaces, making them appealing in various applications [ ]. The most used SVM
algorithms include the support vector regression [ ], least squares support vector machine [ ],
and successive projection algorithm-support vector machine [69].
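The idea behind the “kernel trick” can be illustrated without training an SVM. In the sketch below, one-dimensional points labelled by whether |x| > 1 cannot be separated by a single threshold on x, but become linearly separable after the mapping φ(x) = (x², √2·x, 1). The polynomial kernel k(x, y) = (xy + 1)² returns the same value as the dot product φ(x)·φ(y) without computing φ explicitly. The data, the mapping, and the hand-picked hyperplane are illustrative assumptions; an actual SVM would find the maximum-margin hyperplane by optimisation.

```python
import math

def phi(x):
    """Explicit feature map associated with the degree-2 polynomial kernel in 1-D."""
    return (x * x, math.sqrt(2) * x, 1.0)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def poly_kernel(x, y):
    """Degree-2 polynomial kernel; equals dot(phi(x), phi(y))."""
    return (x * y + 1.0) ** 2

# Hypothetical 1-D data: class +1 if |x| > 1, else -1 (not separable by one threshold).
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1 if abs(x) > 1 else -1 for x in xs]

# Hand-picked hyperplane in the mapped space: x^2 - 1 = 0 (not a trained SVM).
w, b = (1.0, 0.0, 0.0), -1.0
predictions = [1 if dot(w, phi(x)) + b > 0 else -1 for x in xs]

# The kernel reproduces the mapped-space dot product for every pair of points.
max_gap = max(abs(poly_kernel(x, y) - dot(phi(x), phi(y))) for x in xs for y in xs)
print(predictions == labels, max_gap)
```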
2.4.8. Ensemble Learning
Ensemble learning (EL) models aim at improving the predictive performance of a given statistical
learning or model fitting technique by constructing a linear combination of simpler base learners.
Considering that each trained ensemble represents a single hypothesis, these multiple-classifier systems
enable hybridization of hypotheses not induced by the same base learner, thus yielding better results in
the case of significant diversity among the single models. Decision trees have been typically used as the
base learner in EL models, for example, random forest [ ], whereas a large number of boosting and
bagging implementations have been also proposed, for example, boosting technique [ ], adaboost [ ],
and bootstrap aggregating or bagging algorithm [73].
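Bootstrap aggregating can be sketched with very simple base learners. In the example below, each base learner is a one-level decision tree (a threshold on a single feature) trained on a bootstrap sample, that is, a sample drawn with replacement from the training data; the ensemble prediction is an equal-weight majority vote. The data, the number of estimators, and the fixed random seed are illustrative assumptions.

```python
import random
from collections import Counter

def train_stump(samples):
    """Pick the threshold on a 1-D feature that misclassifies the fewest samples.

    The stump predicts class 1 when x >= threshold and class 0 otherwise.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in samples}):
        errors = sum((1 if x >= threshold else 0) != label for x, label in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(samples, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(samples) for _ in samples])
            for _ in range(n_estimators)]

def bagging_predict(thresholds, x):
    """Equal-weight majority vote over all stumps."""
    votes = Counter(1 if x >= t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D training data: (feature value, class label).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

ensemble = bagging_fit(data)
print(bagging_predict(ensemble, 0.05), bagging_predict(ensemble, 0.95))
```

Random forest follows the same pattern with full decision trees as base learners and additional randomisation of the features considered at each split.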
3. Review
The reviewed articles have been, on a first level, classified into four generic categories, namely, crop
management, livestock management, water management, and soil management. The applications of
ML in the crop section were divided into sub-categories including yield prediction, disease detection,
weed detection, crop quality, and species recognition. The applications of ML in the livestock section
were divided into two sub-categories: animal welfare and livestock production.
The search engines used were Scopus, ScienceDirect and PubMed. The selected articles
regard works presented solely in journal papers. Climate prediction, although very important for
agricultural production, has not been included in the presented review, considering the fact that ML
applications for climate prediction are a complete area by themselves. Finally, all articles presented
here regard the period from 2004 up to the present.
3.1. Crop Management
3.1.1. Yield Prediction
Yield prediction, one of the most significant topics in precision agriculture, is of high importance
for yield mapping, yield estimation, matching of crop supply with demand, and crop management to
increase productivity. One example of an ML application is the work of [74], an efficient,
low-cost, and non-destructive method that automatically counts coffee fruits on a branch. The method
classifies the coffee fruits into three categories: harvestable, not harvestable, and fruits with disregarded
maturation stage. In addition, the method estimates the weight and the maturation percentage of the
coffee fruits. The aim of this work was to provide information to coffee growers to optimise economic
benefits and plan their agricultural work. Another study on yield prediction is that by
the authors of [75], in which they developed a machine vision system for automating shaking and
catching cherries during harvest. The system segments and detects occluded cherry branches with
full foliage even when these are inconspicuous. The main aim of the system was to reduce labor
requirements in manual harvesting and handling operations. In another study [76], the authors developed
an early yield mapping system for the identification of immature green citrus in a citrus grove under
outdoor conditions. As in all other related studies, the aim of the study was to provide growers with
yield-specific information to assist them to optimise their grove in terms of profit and increased yield.
In another study [77], the authors developed a model for the estimation of grassland biomass (kg dry
matter/ha/day) based on ANNs and multitemporal remote sensing data. Another study dedicated
to yield prediction, and specifically to wheat yield prediction, was presented in [78].
The developed method used satellite imagery and received crop growth characteristics fused with
soil data for a more accurate prediction. The authors of [79] presented a method for the detection of
tomatoes based on EM and remotely sensed red green blue (RGB) images, which were captured by an
unmanned aerial vehicle (UAV). Also, in the work of [80], the authors developed a method for rice
development stage prediction based on SVM and basic geographic information obtained from weather
stations in China. Finally, a generalized method for agricultural yield predictions was presented in
another study [81]. The method is based on an ENN application on long-period generated agronomical
data (1997–2014). The study regards regional predictions (specifically in Taiwan) focused on
supporting farmers to avoid imbalances in market supply and demand caused or hastened by harvest
crop quality.
Table 5 summarizes the above papers for the case of the yield prediction sub-category.
Table 5. Crop: yield prediction table.
[74] Coffee
Observed features: Forty-two (42) color features in digital images illustrating coffee fruits
Functionality: Automatic count of coffee fruits on a coffee branch
Models/Algorithms: SVM
Results: Ripe/overripe: 82.54–87.83% visibility percentage; semi-ripe: 68.25–85.36% visibility percentage; not harvestable, unripe: 76.91–81.39% visibility percentage
[75] Cherry
Observed features: Colored digital images depicting leaves, branches, cherry fruits, and the background
Functionality: Detection of cherry branches with full foliage
Models/Algorithms: BM/GNB
Results: 89.6% accuracy
[76] Green citrus
Observed features: Image features (from 20 × 20 pixel digital images of unripe green citrus fruits) such as coarseness, contrast, directionality, line-likeness, regularity, roughness, granularity, irregularity, brightness, smoothness, and fineness
Functionality: Identification of the number of immature green citrus fruit under natural outdoor conditions
Models/Algorithms: SVM
Results: 80.4% accuracy
[77] Grass
Observed features: Vegetation indices, spectral bands of red and NIR
Functionality: Estimation of grassland biomass (kg dry matter/ha/day) for two managed grassland farms in Ireland; Moorepark and Grange
Models/Algorithms: ANN
Results: R2 = 0.85, RMSE = 11.07; R2 = 0.76, RMSE = 15.35
[78] Wheat
Observed features: Normalized values of on-line predicted soil parameters and the satellite NDVI
Functionality: Wheat yield prediction within field variation
Models/Algorithms: ANN/SKNs
Results: 81.65% accuracy
[79] Tomato
Observed features: High spatial resolution RGB images
Functionality: Detection of tomatoes via RGB images captured by UAV
Models/Algorithms: EM
Results: Recall: 0.6066; Precision: 0.9191; F-Measure: 0.7308
[80] Rice
Observed features: Agricultural, surface weather, and soil physico-chemical data with yield or development records
Functionality: Rice development stage prediction and yield prediction
Models/Algorithms: SVM
Results, RMSE (kg h−1 m2):
Middle-season rice: tillering stage = 126.8; heading stage = 96.4; milk stage = 109.4
Early rice: tillering stage = 88.3; heading stage = 68.0; milk stage = 36.4
Late rice: tillering stage = 89.2; heading stage = 69.7; milk stage = 46.5
[81] General
Observed features: Agriculture data: meteorological, environmental, economic, and harvest
Functionality: Method for the accurate analysis for agricultural yield predictions
Models/Algorithms: BPN based
Results: 1.3% error rate
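The recall, precision, and F-measure reported for tomato detection [79] in Table 5 are related by a simple formula. The sketch below assumes that the reported F-measure is the balanced F1 score, the harmonic mean of precision and recall; under that assumption the reported values are mutually consistent.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Values reported for tomato detection in Table 5.
precision, recall = 0.9191, 0.6066
f1 = f_measure(precision, recall)
print(round(f1, 4))  # 0.7308
```

The high precision combined with the lower recall indicates that most detections were true tomatoes, while a substantial share of the tomatoes present was missed.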
3.1.2. Disease Detection
Disease detection and yield prediction are the sub-categories with the highest number of articles
presented in this review. One of the most significant concerns in agriculture is pest and disease
control in open-air (arable farming) and greenhouse conditions. The most widely used practice
in pest and disease control is to uniformly spray pesticides over the cropping area. This practice,
although effective, has a high financial and significant environmental cost. Environmental impacts
can include residues in crop products, ground water contamination, impacts on local
wildlife and eco-systems, and so on. ML is an integrated part of precision agriculture management,
where agro-chemicals input is targeted in terms of time and place. In [82], a tool
is presented for the detection and discrimination of healthy Silybum marianum plants and those
infected by the smut fungus Microbotryum silybum during vegetative growth. In the work of [83], the authors
developed a new method based on an image processing procedure for the classification of parasites
and the automatic detection of thrips in a strawberry greenhouse environment, for real-time control.
The authors of [84] presented a method for detection and screening of Bakanae disease in rice seedlings.
More specifically, the aim of the study was the accurate detection of the pathogen Fusarium fujikuroi for
two rice cultivars. The automated detection of infected plants increased grain yield and was less
time-consuming compared with naked eye examination.
Wheat is one of the most economically significant crops worldwide. The next five studies presented
in this sub-category are dedicated to the detection and discrimination between diseased and healthy
wheat crops. The authors of [85] developed a new system for the detection of nitrogen stressed,
yellow rust infected, and healthy winter wheat canopies based on a hierarchical self-organizing
classifier and hyperspectral reflectance imaging data. The study aimed at the accurate detection of these
categories for a more effective usage of fungicides and fertilizers according to the plant’s needs. In the
next case study [86], the development of a system was presented that automatically discriminated
between water stressed, Septoria tritici infected, and healthy winter wheat canopies. The approach used
a least squares (LS)-SVM classifier with optical multisensor fusion. The authors of [87] presented a
method to detect either yellow rust infected or healthy wheat, based on ANN models and spectral
reflectance features. The accurate detection of either infected or healthy plants enables the precise
targeting of pesticides in the field. In the work of [88], a real time remote sensing system is presented
for the detection of yellow rust infected and healthy wheat. The system is based on a self-organising
map (SOM) neural network and data fusion of hyper-spectral reflection and multi-spectral fluorescence
imaging. The goal of the study was the accurate detection of yellow rust infected winter wheat
of cultivar “Madrigal” before the infection can be visibly detected. In addition, the authors of [89]
presented a method for
the simultaneous identification and discrimination of yellow rust infected, nitrogen stressed, and
healthy wheat plants of cultivar “Madrigal”. The approach is based on an SOM neural network and
hyperspectral reflectance imaging. The aim of the study was the accurate discrimination between
plant stress caused by disease and stress caused by nutrient deficiency under field conditions.
Finally, the author of [ ] presented a CNN-based method for disease detection and diagnosis based
on simple leaf images, with sufficient accuracy to classify between healthy and diseased leaves in
various plants.
Table 6 summarizes the above papers for the case of the disease detection sub-category.
Table 6. Crop: disease detection table.
[82] Silybum marianum
Observed features: Images with leaf spectra using a handheld visible and NIR spectrometer
Functionality: Detection and discrimination between healthy Silybum marianum plants and those that are infected by smut fungus Microbotryum silybum
Models/Algorithms: ANN/XY-Fusion
Results: 95.16% accuracy
[83] Strawberry
Observed features: Region index: ratio of major diameter to minor diameter; and color indexes: hue, saturation, and intensity
Functionality: Classification of parasites and automatic detection of thrips
Models/Algorithms: SVM
Results: MPE = 2.25%
[84] Rice
Observed features: Morphological and color traits from healthy and Bakanae-infected rice seedlings, for cultivars Tainan 11 and Toyonishiki
Functionality: Detection of Bakanae disease, Fusarium fujikuroi, in rice seedlings
Models/Algorithms: SVM
Results: 87.9% accuracy
[85] Wheat
Observed features: Hyperspectral reflectance imaging data
Functionality: Detection of nitrogen stressed, yellow rust infected and healthy winter wheat canopies
Models/Algorithms: hierarchical self-organizing classifier
Results: Nitrogen stressed: 99.63% accuracy; yellow rust: 99.83% accuracy; healthy: 97.27% accuracy
[86] Wheat
Observed features: Spectral reflectance and fluorescence features
Functionality: Detection of water stressed, Septoria tritici infected, and healthy winter wheat canopies
Models/Algorithms: LS-SVM
Results, four scenarios:
Control treatment, healthy and well supplied with water: 100% accuracy
Inoculated treatment, with Septoria tritici and well supplied with water: 98.75% accuracy
Healthy treatment and deficient water supply: 100% accuracy
Inoculated treatment and deficient water supply: 98.7% accuracy
[87] Wheat
Observed features: Spectral reflectance features
Functionality: Detection of yellow rust infected and healthy winter wheat canopies
Models/Algorithms: ANN/MLP
Results: Yellow rust infected wheat: 99.4% accuracy; healthy: 98.9% accuracy
[88] Wheat
Observed features: Data fusion of hyper-spectral reflection and multi-spectral fluorescence imaging
Functionality: Detection of yellow rust infected and healthy winter wheat under field conditions
Models/Algorithms: ANN/SOM
Results: Yellow rust infected wheat: 99.4% accuracy; healthy: 98.7% accuracy
[89] Wheat
Observed features: Hyperspectral reflectance images
Functionality: Identification and discrimination of yellow rust infected, nitrogen stressed, and healthy winter wheat in field conditions
Models/Algorithms: SOM neural network
Results: Yellow rust infected wheat: 99.92% accuracy; nitrogen stressed: 100% accuracy; healthy: 99.39% accuracy
[ ] Various crops (25 in total)
Observed features: Simple leaves images of healthy and diseased plants
Functionality: Detection and diagnosis of plant diseases
Models/Algorithms: DNN/CNN
Results: 99.53% accuracy
3.1.3. Weed Detection
Weed detection and management is another significant problem in agriculture. Many producers
regard weeds as the most important threat to crop production. Accurate weed detection is
of high importance to sustainable agriculture, because weeds are difficult to detect and to discriminate
from crops. Again, ML algorithms in conjunction with sensors can lead to accurate detection and
discrimination of weeds at low cost and without environmental side effects. ML for
weed detection can enable the development of tools and robots that destroy weeds and thereby minimise
the need for herbicides. Three studies on ML applications for weed detection in agriculture
are presented here. In the first study [ ], the authors presented a new method based on counter
propagation (CP)-ANN and multispectral images captured by unmanned aircraft systems (UAS) for
the identification of Silybum marianum, a weed that is hard to eradicate and causes major losses in
crop yield. In the second study [ ], the authors developed a new method based on ML techniques
and hyperspectral imaging for crop and weed species recognition. More specifically, the authors
created an active learning system for the recognition of maize (Zea mays) as the crop plant species
and Ranunculus repens, Cirsium arvense, Sinapis arvensis, Stellaria media, Taraxacum officinale,
Poa annua, Polygonum persicaria, Urtica dioica, Oxalis europaea, and Medicago lupulina as weed
species. The main goal was the accurate recognition and discrimination of these species for economic
and environmental purposes. In the third study [ ], the authors developed a weed detection method
based on SVM in grassland cropping.
Table 7 summarizes the above papers for the weed detection sub-category.
Table 7. Crop: Weed detection table.
| Author | Observed Features | Functionality | Models/Algorithms | Results |
|  | Spectral bands of red, green, and NIR and texture layer | Detection and mapping of Silybum marianum | ANN/CP | 98.87% accuracy |
|  | Spectral features from hyperspectral imaging | Recognition and discrimination of Zea mays and weed species | SOM and MOG | Zea mays: SOM = 100% accuracy, MOG = 100% accuracy; Weed species: SOM = 53–94% accuracy, MOG = 31–98% accuracy |
|  | Camera images of grass and various weed types | Reporting on performance of methods for grass vs. weed detection |  | 97.9% Rumex; 94.65% Urtica; 95.1% for mixed weed and mixed weather conditions |
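The per-class accuracies reported in Table 7, and in the other tables of Section 3.1, can be computed from paired true and predicted labels. The following is a minimal sketch in Python, assuming that the accuracy of a class means the fraction of its samples that are labelled correctly; the function name and the labels are hypothetical and are not taken from the reviewed studies.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in total.items()}

# Hypothetical labels for illustration only (not data from any reviewed study).
y_true = ["crop", "crop", "crop", "crop", "weed", "weed", "weed", "weed", "weed"]
y_pred = ["crop", "crop", "crop", "crop", "weed", "weed", "weed", "crop", "weed"]

scores = per_class_accuracy(y_true, y_pred)
for label, acc in scores.items():
    print(f"{label}: {100 * acc:.1f}% accuracy")
```

Reporting one value per class, rather than a single overall figure, shows when a model recognises the crop reliably but struggles with individual weed species, as in the second row of Table 7.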
3.1.4. Crop Quality
The penultimate sub-category of the crop category comprises studies on the identification
of features connected with crop quality. The accurate detection and classification of crop quality
characteristics can increase product price and reduce waste. In the first study [ ], the authors presented
a new method for the detection and classification of botanical and non-botanical
foreign matter embedded inside cotton lint during harvesting. The aim of the study was quality
improvement while minimising fiber damage. Another study [ ] concerns pear production and,
more specifically, presents a method for the identification and differentiation of Korla fragrant
pears into deciduous-calyx or persistent-calyx categories. The approach applied ML methods with
hyperspectral reflectance imaging. The final study for this sub-category [ ]
presented a method for the prediction and classification of the geographical origin of
rice samples. The method was based on ML techniques applied to the chemical components of the samples.
More specifically, the main goal was the classification of the geographical origin of rice for two
different climate regions in Brazil: Goias and Rio Grande do Sul. The results showed that Cd, Rb, Mg,
and K are the four most relevant chemical components for the classification of samples.
Table 8 summarizes the above presented articles.
Table 8. Crop: crop quality table.
| Author | Crop | Observed Features | Functionality | Models/Algorithms | Results |
| [94] | Cotton | Short wave infrared transmittance images depicting cotton along with botanical and non-botanical types of foreign matter | Detection and classification of common types of botanical and non-botanical foreign matter that are embedded inside the cotton lint |  | According to the optimal selected the classification accuracies are over 95% for the spectra and the images. |
| [95] | Pears | Hyperspectral reflectance imaging | Identification and differentiation of Korla fragrant pears into deciduous-calyx or persistent-calyx categories |  | Deciduous-calyx pears: 93.3% accuracy; Persistent-calyx pears: 96.7% accuracy |
| [96] | Rice | Twenty (20) chemical components that were found in composition of rice samples with inductively coupled plasma mass spectrometry | Prediction and classification of geographical origin of a rice sample | EL/RF | 93.83% accuracy |
3.1.5. Species Recognition
The last sub-category of the crop category is species recognition. The main goal is the automatic
identification and classification of plant species in order to avoid the use of human experts, as well as
to reduce the classification time. A method for the identification and classification of three legume
species, namely, white beans, red beans, and soybean, via leaf vein patterns has been presented in [ ].
Vein morphology carries accurate information about the properties of the leaf. It is an ideal tool for
plant identification in comparison with color and shape.
Table 9 summarizes the above study for the species recognition sub-category.
Table 9. Crop: Species recognition.
| Author | Crop | Observed Features | Functionality | Models/Algorithms | Results |
| [97] | Legume | Vein leaf images of white and red beans as well as soybean | Identification and classification of three legume species: soybean, and white and red bean |  | White bean: 90.2%; Red bean: 98.3%; Soybean: 98.8% accuracy for five CNN layers |
3.2. Livestock Management
The livestock category consists of two sub-categories, namely, animal welfare and livestock
production. Animal welfare deals with the health and wellbeing of animals, with the main application
of ML in monitoring animal behaviour for the early detection of diseases. On the other hand, livestock
production deals with issues in the production system, where the main scope of ML applications is the
accurate estimation of economic balances for the producers based on production line monitoring.
3.2.1. Animal Welfare
Three articles belong to the animal welfare sub-category. In the first article [98],
a method is presented for the classification of cattle behaviour based on ML models using data collected
by collar sensors with magnetometers and three-axis accelerometers. The aim of the study was the
prediction of events such as oestrus and the recognition of dietary changes in cattle. In the second
article [ ], a system was presented for the automatic identification and classification of chewing
patterns in calves. The authors created a system based on ML using data from chewing signals of
dietary supplements, such as hay and ryegrass, combined with behaviour data, such as rumination
and idleness. Data were collected by optical fiber Bragg grating (FBG) sensors. In the third study [ ], an automated
monitoring system based on ML was presented for animal behaviour tracking, including tracking of
animal movements by depth video cameras, for monitoring various activities of the animal (standing,
moving, feeding, and drinking).
Table 10 summarizes the features of the above presented articles.
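The first study above is listed in Table 10 with an ensemble model of the type "Bagging with tree learner". As an illustration of the general idea only, and not of the implementation used in that study, the sketch below trains depth-1 decision trees on bootstrap resamples and combines them by majority vote. The two features (mean and variance of acceleration magnitude per time window), the behaviour labels, and all function names are assumptions made for this example.

```python
import random
from collections import Counter

def fit_stump(rows, labels):
    """Fit a depth-1 tree: the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= thr]
            right = [lab for r, lab in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue  # not a real split
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_lab for lab in left)
                      + sum(lab != right_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_lab, right_lab)
    if best is None:  # all rows identical: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    f, thr, left_lab, right_lab = stump
    return left_lab if row[f] <= thr else right_lab

def fit_bagging(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble

def predict_bagging(ensemble, row):
    """Majority vote over the individual tree predictions."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical window features: (mean acceleration magnitude, variance).
rows = [(0.10, 0.01), (0.12, 0.02), (0.09, 0.01), (0.11, 0.02),
        (0.80, 0.30), (0.90, 0.35), (0.85, 0.28), (0.95, 0.40)]
labels = ["resting"] * 4 + ["walking"] * 4

ensemble = fit_bagging(rows, labels)
print(predict_bagging(ensemble, (0.05, 0.01)))
print(predict_bagging(ensemble, (0.90, 0.33)))
```

Averaging many trees trained on resampled data reduces the variance of a single tree, which is useful when sensor signals are noisy. A practical system would use deeper trees and many more labelled windows.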
3.2.2. Livestock Production
The sub-category of livestock production concerns studies developed for the accurate prediction
and estimation of farming parameters to optimize the economic efficiency of the production system.
This sub-category comprises five articles: three on cattle production, one on
hens’ egg production, and one on pig identification. In the work of [ ], a method for the prediction of the rumen fermentation
pattern from milk fatty acids was presented. The main aim of the study was to achieve the most
accurate prediction of rumen fermentation, which plays a significant role in the evaluation of diets for
milk production. In addition, this work showed that milk fatty acids have ideal features to predict
the molar proportions of volatile fatty acids in the rumen. The next study [ ] was related to hen
production. Specifically, a method based on an SVM model was presented for the early detection and
warning of problems in the commercial production of eggs. In [ ], a method based on SVM was presented for
the accurate estimation of bovine weight trajectories over time. The accurate estimation
of cattle weights is very important for breeders. Another article [ ] deals with the
development of a function for the prediction of carcass weight for beef cattle of the Asturiana de los
Valles breed based on SVR models and zoometric measurement features. The results show that the
presented method can predict carcass weights 150 days prior to the slaughter day. Finally, the authors of [ ]
presented a method based on convolutional neural networks (CNNs) applied to digital images for
pig face recognition. The main aim of the research was the identification of animals without the need
for radio frequency identification (RFID) tags, whose application is distressing for the animal, limited
in range, and time-consuming.
Table 11 summarizes the features of the above presented works.
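Several of the livestock production studies in Table 11 report the mean absolute percentage error (MAPE), that is, the average of |actual − predicted| / |actual| expressed in percent. A minimal sketch of this metric follows; the weights are hypothetical values chosen for illustration and are not data from the reviewed studies.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical carcass weights in kg (illustrative only).
actual = [300.0, 320.0, 280.0, 350.0]
predicted = [285.0, 336.0, 280.0, 343.0]

# Relative errors are 5%, 5%, 0% and 2%, so the mean is 3%.
print(f"MAPE = {mape(actual, predicted):.2f}%")
```

Because MAPE is relative to the actual value, it allows errors to be compared across animals or breeds with different typical weights.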
3.3. Water Management
Water management in agricultural production requires significant efforts and plays a significant
role in hydrological, climatological, and agronomical balance.
Table 10. Livestock: animal welfare.
| Author | Animal Species | Observed Features | Functionality | Models/Algorithms | Results |
| [98] | Cattle | Features like grazing, ruminating, resting, and walking, which were recorded using collar systems with three-axis accelerometer and magnetometer | Classification of cattle behaviour | EL/Bagging with tree learner | 96% accuracy |
| [99] | Calf | Data: chewing signals from dietary supplement, Tifton hay, ryegrass, rumination, and idleness. Signals were collected from optical FBG sensors | Identification and classification of chewing patterns in calves | DT/C4.5 | 94% accuracy |
| [100] | Pigs | 3D motion data by using two depth cameras | Animal tracking and behavior annotation of the pigs to measure behavioral changes in pigs for welfare and health monitoring | BM: Gaussian Mixture Models (GMMs) | Animal tracking: mean multi-object tracking precision (MOTP) = 0.89 accuracy; behavior annotation: standing: control R² = 0.94, treatment R² = 0.97; feeding: control R² = 0.86, treatment R² = 0.49 |
Table 11. Livestock: livestock production table.
| Author | Animal Species | Observed Features | Functionality | Models/Algorithms | Results |
| [101] | Cattle | Milk fatty acids | Prediction of rumen fermentation pattern from milk fatty acids |  | RMSE = 2.65%; Propionate: RMSE = 7.67%; Butyrate: RMSE = 7.61% |
| [102] | Hens | Six (6) features, which were created from mathematical models related to farm’s egg production line and collected over a period of seven (7) years. | Early detection and warning of problems in production curves of commercial hens eggs | SVM | 98% accuracy |
| [103] | Bovine | Geometrical relationships of the trajectories of weights along the time | Estimation of cattle weight trajectories for future evolution with only one or a few weights. |  | Angus bulls from Indiana Beef Evaluation Program: weights 1, MAPE = 3.9 + 3.0%; Bulls from Association of Breeder of Asturiana de los Valles: weights 1, MAPE = 5.3 + 4.4%; Cow from Wokalup Selection Experiment in Western Australia: weights 1, MAPE = 9.3 + 6.7% |
| [104] | Cattle | Zoometric measurements of the animals 2 to 222 days before the slaughter | Prediction of carcass weight for beef cattle 150 days before the slaughter | SVM/SVR | Average MAPE = 4.27% |
| [105] | Pigs | 1553 color images with pigs faces | Pigs face recognition | DNNs: Convolutional Neural Networks (CNNs) | 96.7% Accuracy |
This section consists of four studies that were mostly developed for the estimation of daily, weekly,
or monthly evapotranspiration. The accurate estimation of evapotranspiration is a complex process
and is of high importance for resource management in crop production, as well as for the design
and operation management of irrigation systems. In the first study [ ], the authors developed a
computational method for the estimation of monthly mean evapotranspiration for arid and semi-arid
regions. It used monthly mean climatic data of 44 meteorological stations for the period 1951–2010.
In another study dedicated to ML applications in agricultural water management [ ], two scenarios
were presented for the estimation of daily evapotranspiration from temperature data collected
from six meteorological stations of a region over a long period (1961–2014). Finally, in another
study [ ], the authors developed a method based on an ELM model fed with temperature data for the
weekly estimation of evapotranspiration for two meteorological weather stations. The purpose was
the accurate estimation of weekly evapotranspiration in arid regions of India under a limited-data
scenario for crop water management.
Daily dew point temperature, on the other hand, is a significant element for the identification of
expected weather phenomena, as well as for the estimation of evapotranspiration and evaporation.
In another article [ ], a model is presented for the prediction of daily dew point temperature, based
on ML. The weather data were collected from two different weather stations.
Table 12 summarizes the above papers for the case of the water management sub-category.
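The water management studies in Table 12 are compared through error statistics such as RMSE, MAE, the correlation coefficient R, and NS. The sketch below shows how these quantities can be computed, assuming that NS denotes the Nash–Sutcliffe efficiency; the observed and estimated evapotranspiration values are hypothetical and serve only to make the arithmetic easy to follow.

```python
import math

def rmse(obs, est):
    """Root mean square error."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)

def nash_sutcliffe(obs, est):
    """1 minus the ratio of residual variance to the variance of the observations."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def pearson_r(obs, est):
    """Pearson correlation coefficient between observations and estimates."""
    mo, me = sum(obs) / len(obs), sum(est) / len(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov / math.sqrt(var_o * var_e)

# Hypothetical daily evapotranspiration in mm per day (illustrative only).
observed = [2.0, 4.0, 6.0, 8.0]
estimated = [2.5, 3.5, 6.5, 7.5]

print(f"RMSE = {rmse(observed, estimated):.3f} mm/d")
print(f"MAE  = {mae(observed, estimated):.3f} mm/d")
print(f"NS   = {nash_sutcliffe(observed, estimated):.3f}")
print(f"R    = {pearson_r(observed, estimated):.4f}")
```

RMSE and MAE carry the units of the estimated quantity, whereas NS and R are dimensionless, which is why the studies often report both kinds of measure.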
3.4. Soil Management
The final category of this review concerns ML applications for the prediction and identification of
agricultural soil properties, such as the estimation of soil drying, condition, temperature, and moisture
content. Soil is a heterogeneous natural resource, with complex processes and mechanisms that are
difficult to understand. Soil properties allow researchers to understand the dynamics of ecosystems
and their impact on agriculture. The accurate estimation of soil conditions can lead to improved
soil management. Soil temperature alone plays a significant role in the accurate analysis of the
climate change effects of a region and of eco-environmental conditions. It is a significant meteorological
parameter controlling the interactive processes between ground and atmosphere. In addition, soil
moisture has an important role in crop yield variability. However, soil measurements are generally
time-consuming and expensive, so a low-cost and reliable solution for the accurate estimation of soil
properties can be achieved with computational analysis based on ML techniques. The first study for
this last category is the work of [ ]. More specifically, this study presented a method for the
evaluation of soil drying for agricultural planning. The method accurately evaluates soil drying,
using evapotranspiration and precipitation data, in a region located in Urbana, IL, in the United States.
The goal of this method was the provision of remote agricultural management decisions. The second
study [ ] was developed for the prediction of soil condition. In particular, the study presented the
comparison of four regression models for the prediction of soil organic carbon (OC), moisture content
(MC), and total nitrogen (TN). More specifically, the authors used a visible-near infrared (VIS-NIR)
spectrophotometer to collect soil spectra from 140 unprocessed and wet samples of the top layer of
Luvisol soil types. The samples were collected from an arable field in Premslin, Germany, in August
2013, after the harvest of wheat crops. They concluded that the accurate prediction of soil properties
can optimize soil management. In a third study [ ], the authors developed a new method based on a
self-adaptive evolutionary extreme learning machine (SaE-ELM) model and daily weather data for
the estimation of daily soil temperature at six different depths of 5, 10, 20, 30, 50, and 100 cm in two
Iranian regions with different climate conditions: Bandar Abbas and Kerman. The aim was the accurate
estimation of soil temperature for agricultural management. The last study [ ] presented a novel
method for the estimation of soil moisture, based on ANN models using data from force sensors on a
no-till chisel opener.
Table 13 summarizes the above papers for the case of soil management sub-category.
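One of the models listed in Table 13 for the evaluation of soil drying is k-nearest neighbours (KNN), applied to precipitation and evapotranspiration data. As a rough illustration of how such an instance-based model works, and not a reproduction of the reviewed study, the sketch below classifies a day as "dry" or "wet" by a majority vote among its nearest training examples. The feature values, class labels, and function name are assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical daily records: (precipitation in mm, evapotranspiration in mm).
train_x = [(0.0, 6.0), (1.0, 5.5), (0.5, 6.5),
           (12.0, 2.0), (15.0, 1.5), (10.0, 2.5)]
train_y = ["dry", "dry", "dry", "wet", "wet", "wet"]

print(knn_predict(train_x, train_y, (0.8, 5.8)))
print(knn_predict(train_x, train_y, (13.0, 1.8)))
```

In this sketch the two features are used on their raw scales; when features have very different ranges, they are usually standardised first so that no single one dominates the distance.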
Table 12. Water: Water management table.
| Author | Property | Observed Features | Functionality | Models/Algorithms | Results |
| [106] | Evapotranspiration | Data such as maximum, minimum, and mean temperature; relative humidity; solar radiation; and wind speed | Estimation of monthly mean reference evapotranspiration arid and semi-arid regions |  | MAE = 0.05; RMSE = 0.07; R = 0.9999 |
| [107] | Evapotranspiration | Temperature data: maximum and minimum temperature, air temperature at 2 m height, mean relative humidity, wind speed at 10 m height, and sunshine duration | Estimation of daily evapotranspiration for two scenarios (six regional meteorological stations). Scenario A: Models trained and tested from local data of each Station (2). Scenario B: Models trained from pooled data from all stations |  | Scenario A: RRMSE = 0.198, MAE = 0.267 mm d⁻¹, NS = 0.891; Scenario B: RRMSE = 0.194, MAE = 0.263 mm d⁻¹, NS = 0.895 |
| [108] | Evapotranspiration | Locally maximum and minimum air temperature, extraterrestrial radiation, and extrinsic | Estimation of weekly evapotranspiration based on data from two weather stations | ANN/ELM | Station A: RMSE = 0.43 mm d⁻¹; Station B: RMSE = 0.33 mm d⁻¹ |
| [109] | Daily dew point temperature | Weather data such as average air temperature, relative humidity, atmospheric pressure, vapor pressure, and horizontal global solar radiation | Prediction of daily dew point temperature | ANN/ELM | Region case A: MABE = 0.3240 °C, RMSE = 0.5662 °C, R = 0.9933; Region case B: MABE = 0.5203 °C, RMSE = 0.6709 °C, R = 0.9877 |
Table 13. Soil management table.
| Author | Property | Observed Features | Functionality | Models/Algorithms | Results |
| [110] | Soil drying | Precipitation and evapotranspiration data | Evaluation of soil drying for agricultural planning | IBM/KNN and ANN/BP | Both performed with 91–94% accuracy |
| [111] | Soil condition | 140 soil samples from top soil layer of an arable field | Prediction of soil OC, MC, and TN |  | OC: RMSEP = 0.062% & RPD = 2.20 (LS-SVM); MC: RMSEP = 0.457% & RPD = 2.24 (LS-SVM); TN: RMSEP = 0.071% & RPD = 1.96 (Cubist) |
| [112] | Soil temperature | Daily weather data: maximum, minimum, and average air temperature; global solar radiation; and atmospheric pressure. Data were collected for the period of 1996–2005 for Bandar Abbas and for the period of 1998–2004 for Kerman | Estimation of soil temperature for six (6) different depths 5, 10, 20, 30, 50, and 100 cm, in two Iranian regions with different climate conditions: Bandar Abbas and Kerman |  | Bandar Abbas station: MABE = 0.8046 to 1.5338 °C, RMSE = 1.0958 to 1.9029 °C, R = 0.9084 to 0.9893; Kerman station: MABE = 1.5415 to 2.3422 °C, RMSE = 2.0017 to 2.9018 °C, R = 0.8736 to 0.9831 depending on the depth |
| [113] | Soil moisture | Dataset of forces acting on a chisel and speed | Estimation of soil moisture | ANN/MLP and RBF | RMSE = 1.27%, R² = 0.79, APE = 3.77%; RMSE = 1.30%, R² = 0.80, APE = 3.75% |
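Table 13 reports, for the soil condition study, the root mean square error of prediction (RMSEP) together with the ratio of performance to deviation (RPD). A common definition, assumed here, is RPD = SD / RMSEP, where SD is the standard deviation of the reference (laboratory) values. The following sketch illustrates the calculation with hypothetical values; it is not the procedure of the reviewed study.

```python
import math
import statistics

def rmsep(reference, predicted):
    """Root mean square error of prediction."""
    return math.sqrt(
        sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    )

def rpd(reference, predicted):
    """Ratio of performance to deviation: sample SD of reference values / RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)

# Hypothetical soil property values in percent (illustrative only).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.0, 4.5, 4.5]

print(f"RMSEP = {rmsep(reference, predicted):.4f}")
print(f"RPD   = {rpd(reference, predicted):.4f}")
```

A larger RPD means that the prediction error is small relative to the natural spread of the property, so the value can be compared across properties with different units and ranges.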
4. Discussion and Conclusions
The number of articles included in this review was 40 in total. Twenty-five (25) of the presented
articles were published in the journal «Computers and Electronics in Agriculture», six were published
in the journal «Biosystems Engineering», and the rest of the articles were published in the
following journals: «Sensors», «Sustainability», «Real-Time Imaging», «Precision Agriculture»,
«Earth Observations and Remote Sensing», «Saudi Journal of Biological Sciences», «Scientific Reports»,
and «Computers in Industry». Among the articles, eight are related to applications of ML
in livestock management, four to applications of ML in water management, and four
to soil management, while the largest number (i.e., 24 articles) are related to
applications of ML in crop management. Figure 2 presents the distribution of the articles according to
these application domains and to the defined sub-categories.
Figure 2. Pie chart presenting the papers according to the application domains.
From the analysis of these articles, it was found that eight ML models have been implemented in
total. More specifically, five ML models were implemented in the approaches on crop management,
where the most popular models were ANNs (with wheat being the most frequent crop). In the livestock
management category, four ML models were implemented, with the most popular models being SVMs
(with cattle being the most frequent livestock type). For water management, and evapotranspiration
estimation in particular, two ML models were implemented, and the most frequently implemented were ANNs.
Finally, in the soil management category, four ML models were implemented, with the most popular
one again being the ANN model. In Figure 3, the eight ML models with their total rates are presented,
and in Figure 4 and Table 14, the ML models for all studies according to the sub-category are
presented. Finally, in Figure 5 and Table 15, the feature collection techniques that were used according to
each sub-category are presented (it is worth noting that the figure and table provide the same information,
presented in different forms).
Figure 3. Presentation of machine learning (ML) models with their total rate.
Figure 4. The total number of ML models according to each sub-category of the four main categories.
Table 14. The total number of ML models according to each sub-category of the four main categories.
ML Models Per Section
Crop Livestock Water Soil
Model Yield
models 1 1
3 3 1 3 3 1
learning 1 1
Artificial &
3 6 2 1 2 4 4
Regression 1 1
trees 1
Clustering 1 1
Total 8 9 4 4 1 3 5 5 7
Figure 5. Data resources usage according to each sub-category. NDVI: normalized difference
vegetation index; NIR: near infrared.
Table 15. Data resources usage according to each sub-category.
Feature Collection
Crop Livestock Water Soil
Technique Yield
images and
4 3 1 1 1 1
NIR 1 1 1
Data records 2 2 1 2 4 4 4
Spectral 2 2
4 1 2
Fluorescence 2
From the above figures and tables, we show that ML models have been applied in multiple
applications for crop management (61%), mostly yield prediction (20%) and disease detection (22%).
This trend in the distribution of applications reflects the data-intensive applications within crop
management and the high use of images (spectral, hyperspectral, NIR, etc.). Data analysis, as a mature scientific field,
provides the ground for the development of numerous applications related to crop management
because, in most cases, ML-based predictions can be extracted without the need for fusion of data from
other resources. In contrast, when data recordings are involved, occasionally at the level of big data,
the implementations of ML are fewer in number, mainly because of the increased efforts required for
the data analysis task and not for the ML models per se. This fact partially explains the almost equal
distribution of ML applications in livestock management (19%), water management (10%), and soil
management (10%). It is also evident from the analysis that most of the studies used ANN and SVM
ML models. More specifically, ANNs were used mostly for implementations in crop, water, and soil
management, while SVMs were used mostly for livestock management.
By applying machine learning to sensor data, farm management systems are evolving into real
artificial intelligence systems, providing richer recommendations and insights for the subsequent
decisions and actions, with the ultimate aim of production improvement. To this end, it is expected
that the usage of ML models will become even more widespread in the future, allowing for the possibility
of integrated and applicable tools. At the moment, all of the approaches are individual
solutions that are not adequately connected with the decision-making process, as seen in other
application domains. This integration of automated data recording, data analysis, ML implementation,
and decision-making or support will provide practical tools that come in line with the so-called
knowledge-based agriculture for increasing production levels and bio-products quality.
Author Contributions: Writing-Original Draft Preparation, K.G.L., D.B. and P.B.; Methodology, D.M., S.P.
and P.B.; Investigation, K.G.L. and D.M.; Conceptualization, D.B. and D.M.; Writing-Review & Editing, S.P.;
Supervision, D.B.
Funding: This review work was partly supported by the project “Research Synergy to address major challenges
in the nexus: energy–environment–agricultural production (Food, Water, Materials)”—NEXUS, funded by the
Greek Secretariat for Research and Technology (GSRT)—Pr. No. MIS 5002496.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Samuel, A.L. Some Studies in Machine Learning Using the Game of Checkers. IBM J. Res. Dev. 206–226.
2. Kong, L.; Zhang, Y.; Ye, Z.Q.; Liu, X.Q.; Zhao, S.Q.; Wei, L.; Gao, G. CPC: Assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, 345–349.
3. Mackowiak, S.D.; Zauber, H.; Bielow, C.; Thiel, D.; Kutz, K.; Calviello, L.; Mastrobuoni, G.; Rajewsky, N.; Kempa, S.; Selbach, M.; et al. Extensive identification and analysis of conserved small ORFs in animals. Genome Biol. 2015, 16, 179.
4. Richardson, A.; Signor, B.M.; Lidbury, B.A.; Badrick, T. Clinical chemistry in higher dimensions: Machine-learning and enhanced prediction from routine clinical chemistry data. Clin. Biochem. 49, 1213–1220.
5. Wildenhain, J.; Spitzer, M.; Dolma, S.; Jarvik, N.; White, R.; Roy, M.; Griffiths, E.; Bellows, D.S.; Wright, G.D.; Tyers, M. Prediction of Synergism from Chemical-Genetic Interactions by Machine Learning. Cell Syst. 1, 383–395.
6. Kang, J.; Schwartz, R.; Flickinger, J.; Beriwal, S. Machine learning approaches for predicting radiation therapy outcomes: A clinician’s perspective. Int. J. Radiat. Oncol. Biol. Phys. 93, 1127–1135.
7. Asadi, H.; Dowling, R.; Yan, B.; Mitchell, P. Machine learning for outcome prediction of acute ischemic stroke post intra-arterial therapy. PLoS ONE 2014, 9, e88225.
8. Zhang, B.; He, X.; Ouyang, F.; Gu, D.; Dong, Y.; Zhang, L.; Mo, X.; Huang, W.; Tian, J.; Zhang, S. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett. 2017, 403, 21–27.
9. Cramer, S.; Kampouridis, M.; Freitas, A.A.; Alexandridis, A.K. An extensive evaluation of seven machine learning methods for rainfall prediction in weather derivatives. Expert Syst. Appl. 85, 169–181.
10. Rhee, J.; Im, J. Meteorological drought forecasting for ungauged areas based on machine learning: Using long-range climate forecast and remote sensing data. Agric. For. Meteorol. 237–238, 105–122.
11. Aybar-Ruiz, A.; Jiménez-Fernández, S.; Cornejo-Bueno, L.; Casanova-Mateo, C.; Sanz-Justo, J.; Salvador-González, P.; Salcedo-Sanz, S. A novel Grouping Genetic Algorithm-Extreme Learning Machine approach for global solar radiation prediction from numerical weather models inputs. Sol. Energy 129–142.
12. Barboza, F.; Kimura, H.; Altman, E. Machine learning models and bankruptcy prediction. Expert Syst. Appl. 2017, 83, 405–417.
13. Zhao, Y.; Li, J.; Yu, L. A deep learning ensemble approach for crude oil price forecasting. Energy Econ. 66, 9–16.
14. Bohanec, M.; Kljajić Borštnar, M.; Robnik-Šikonja, M. Explaining machine learning models in sales predictions. Expert Syst. Appl. 2017, 71, 416–428.
15. Takahashi, K.; Kim, K.; Ogata, T.; Sugano, S. Tool-body assimilation model considering grasping motion through deep learning. Rob. Auton. Syst. 2017, 91, 115–127.
16. Gastaldo, P.; Pinna, L.; Seminara, L.; Valle, M.; Zunino, R. A tensor-based approach to touch modality classification by using machine learning. Rob. Auton. Syst. 2015, 63, 268–278.
17. López-Cortés, X.A.; Nachtigall, F.M.; Olate, V.R.; Araya, M.; Oyanedel, S.; Diaz, V.; Jakob, E.; Ríos-Momberg, M.; Santos, L.S. Fast detection of pathogens in salmon farming industry. Aquaculture 2017, 470, 17–24.
18. Zhou, C.; Lin, K.; Xu, D.; Chen, L.; Guo, Q.; Sun, C.; Yang, X. Near infrared computer vision and neuro-fuzzy model-based feeding decision system for fish in aquaculture. Comput. Electron. Agric. 146, 114–124.
19. Fragni, R.; Trifirò, A.; Nucci, A.; Seno, A.; Allodi, A.; Di Rocco, M. Italian tomato-based products authentication by multi-element approach: A mineral elements database to distinguish the domestic provenance. Food Control 2018, 93, 211–218.
20. Maione, C.; Barbosa, R.M. Recent applications of multivariate data analysis methods in the authentication of rice and the most analyzed parameters: A review. Crit. Rev. Food Sci. Nutr. 1–12.
21. Fang, K.; Shen, C.; Kifer, D.; Yang, X. Prolongation of SMAP to Spatiotemporally Seamless Coverage of Continental U.S. Using a Deep Learning Neural Network. Geophys. Res. Lett. 44, 11030–11039.
22. Pearson, K. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1901, 2, 559–572.
23. Wold, H. Partial Least Squares. In Encyclopedia of Statistical Sciences; John Wiley & Sons: Chichester, NY, USA, 1985; Volume 6, pp. 581–591, ISBN 9788578110796.
24. Fisher, R.A. The use of multiple measures in taxonomic problems. Ann. Eugen. 7, 179–188.
25. Cox, D.R. The Regression Analysis of Binary Sequences. J. R. Stat. Soc. Ser. B 1958, 20, 215–242.
26. Efroymson, M.A. Multiple regression analysis. Math. Methods Digit. Comput. 1960, 1, 191–203.
27. Craven, B.D.; Islam, S.M.N. Ordinary least-squares regression. SAGE Dict. Quant. Manag. Res. 224–228.
28. Friedman, J.H. Multivariate Adaptive Regression Splines. Ann. Stat. 1991, 19, 1–67.
29. Quinlan, J.R. Learning with continuous classes. Mach. Learn. 1992, 92, 343–348.
30. Cleveland, W.S. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 829–836.
31. Tryon, R.C. Communality of a variable: Formulation by cluster analysis. Psychometrika 22, 241–260.
32. Lloyd, S.P. Least Squares Quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
33. Johnson, S.C. Hierarchical clustering schemes. Psychometrika 1967, 32, 241–254.
34. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Methodol. 1977, 39, 1–38.
35. Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach; Prentice Hall: Upper Saddle River, NJ, USA, 1995; Volume 9, ISBN 9780131038059.
36. Pearl, J. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann San Mateo 1988, 88, 552.
37. Duda, R.O.; Hart, P.E. Pattern Classification and Scene Analysis; Wiley: Hoboken, NJ, USA, 1973; Volume 7, ISBN 0471223611.
38. Neapolitan, R.E. Models for reasoning under uncertainty. Appl. Artif. Intell. 1987, 1, 337–366.
39. Fix, E.; Hodges, J.L. Discriminatory Analysis–Nonparametric discrimination consistency properties. Int. Stat. Rev. 1951, 57, 238–247.
40. Atkeson, C.G.; Moore, A.W.; Schaal, S. Locally Weighted Learning. Artif. Intell. 1997, 11, 11–73.
41. Kohonen, T. Learning vector quantization. Neural Netw. 1988, 1, 303.
42. Belson, W.A. Matching and Prediction on the Principle of Biological Classification. Appl. Stat. 8, 65–75.
43. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: Abingdon, UK, 1984; Volume 19, ISBN 0412048418.
44. Kass, G.V. An Exploratory Technique for Investigating Large Quantities of Categorical Data. Appl. Stat. 29, 119.
45. Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1992; Volume 1, ISBN 1558602380.
46. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133.
47. Broomhead, D.S.; Lowe, D. Multivariable Functional Interpolation and Adaptive Networks. Complex Syst. 1988, 2, 321–355.
48. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386–408.
49. Linnainmaa, S. Taylor expansion of the accumulated rounding error. BIT 1976, 16, 146–160.
50. Riedmiller, M.; Braun, H. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, USA, 28 March–1 April 1993; pp. 586–591.
51. Hecht-Nielsen, R. Counterpropagation networks. Appl. Opt. 1987, 26, 4979–4983.
52. Jang, J.S.R. ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Syst. Man Cybern. 23, 665–685.
53. Melssen, W.; Wehrens, R.; Buydens, L. Supervised Kohonen networks for classification problems. Chemom. Intell. Lab. Syst. 2006, 83, 99–113.
54. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558.
55. Pal, S.K.; Mitra, S. Multilayer Perceptron, Fuzzy Sets, and Classification. IEEE Trans. Neural Netw. 683–697.
56. Kohonen, T. The Self-Organizing Map. Proc. IEEE 1990, 78, 1464–1480.
57. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
58. Specht, D.F. A general regression neural network. IEEE Trans. Neural Netw. 2, 568–576.
59. Cao, J.; Lin, Z.; Huang, G.-B. Self-adaptive evolutionary extreme learning machine. Neural Process. Lett. 2012, 36, 285–305.
60. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
61. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; pp. 216–261.
62. Salakhutdinov, R.; Hinton, G. Deep Boltzmann Machines. Aistats 2009, 1, 448–455.
63. Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.-A. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408.
Sensors 2018,18, 2674 27 of 29
64. Vapnik, V. Support vector machine. Mach. Learn. 1995,20, 273–297.
Suykens, J.A.K.; Vandewalle, J. Least Squares Support Vector Machine Classifiers. Neural Process. Lett.
9, 293–300. [CrossRef]
Chang, C.; Lin, C. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol.
1–39. [CrossRef]
Smola, A. Regression Estimation with Support Vector Learning Machines. Master’s Thesis, The Technical
University of Munich, Munich, Germany, 1996; pp. 1–78.
Suykens, J.A.K.; Van Gestel, T.; De Brabanter, J.; De Moor, B.; Vandewalle, J. Least Squares Support Vector
Machines; World Scientific: Singapore, 2002; ISBN 9812381511.
Galvão, R.K.H.; Araújo, M.C.U.; Fragoso, W.D.; Silva, E.C.; José, G.E.; Soares, S.F.C.; Paiva, H.M. A variable
elimination method to improve the parsimony of MLR models using the successive projections algorithm.
Chemom. Intell. Lab. Syst. 2008,92, 83–91. [CrossRef]
70. Breiman, L. Random Forests. Mach. Learn. 2001,45, 5–32. [CrossRef]
Schapire, R.E. A brief introduction to boosting. In Proceedings of the IJCAI International Joint Conference
on Artificial Intelligence, Stockholm, Sweden, 31 July–6 August 1999; Volume 2, pp. 1401–1406.
Freund, Y.; Schapire, R.E. Experiments with a New Boosting Algorithm. In Proceedings of the Thirteenth
International Conference on International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; Morgan
Kaufmann Publishers Inc.: San Francisco, CA, USA, 1996; pp. 148–156.
73. Breiman, L. Bagging Predictors. Mach. Learn. 1996,24, 123–140. [CrossRef]
Ramos, P.J.; Prieto, F.A.; Montoya, E.C.; Oliveros, C.E. Automatic fruit count on coffee branches using
computer vision. Comput. Electron. Agric. 2017,137, 9–22. [CrossRef]
Amatya, S.; Karkee, M.; Gongal, A.; Zhang, Q.; Whiting, M.D. Detection of cherry tree branches with full
foliage in planar architecture for automated sweet-cherry harvesting. Biosyst. Eng.
,146, 3–15. [CrossRef]
Sengupta, S.; Lee, W.S. Identification and determination of the number of immature green citrus fruit in a
canopy under different ambient light conditions. Biosyst. Eng. 2014,117, 51–61. [CrossRef]
Ali, I.; Cawkwell, F.; Dwyer, E.; Green, S. Modeling Managed Grassland Biomass Estimation by Using
Multitemporal Remote Sensing Data—A Machine Learning Approach. IEEE J. Sel. Top. Appl. Earth Obs.
Remote Sens. 2016,10, 3254–3264. [CrossRef]
Pantazi, X.-E.; Moshou, D.; Alexandridis, T.K.; Whetton, R.L.; Mouazen, A.M. Wheat yield prediction using
machine learning and advanced sensing techniques. Comput. Electron. Agric. 2016,121, 57–65. [CrossRef]
Senthilnath, J.; Dokania, A.; Kandukuri, M.; Ramesh, K.N.; Anand, G.; Omkar, S.N. Detection of tomatoes
using spectral-spatial methods in remotely sensed RGB images captured by UAV. Biosyst. Eng.
16–32. [CrossRef]
Su, Y.; Xu, H.; Yan, L. Support vector machine-based open crop model (SBOCM): Case of rice production in
China. Saudi J. Biol. Sci. 2017,24, 537–547. [CrossRef] [PubMed]
Kung, H.-Y.; Kuo, T.-H.; Chen, C.-H.; Tsai, P.-Y. Accuracy Analysis Mechanism for Agriculture Data Using
the Ensemble Neural Network Method. Sustainability 2016,8, 735. [CrossRef]
Pantazi, X.E.; Tamouridou, A.A.; Alexandridis, T.K.; Lagopodi, A.L.; Kontouris, G.; Moshou, D.
Detection of Silybum marianum infection with Microbotryum silybum using VNIR field spectroscopy.
Comput. Electron. Agric. 2017,137, 130–137. [CrossRef]
Ebrahimi, M.A.; Khoshtaghaza, M.H.; Minaei, S.; Jamshidi, B. Vision-based pest detection based on SVM
classification method. Comput. Electron. Agric. 2017,137, 52–58. [CrossRef]
Chung, C.L.; Huang, K.J.; Chen, S.Y.; Lai, M.H.; Chen, Y.C.; Kuo, Y.F. Detecting Bakanae disease in rice
seedlings by machine vision. Comput. Electron. Agric. 2016,121, 404–411. [CrossRef]
Pantazi, X.E.; Moshou, D.; Oberti, R.; West, J.; Mouazen, A.M.; Bochtis, D. Detection of biotic and abiotic
stresses in crops by using hierarchical self organizing classifiers. Precis. Agric.
,18, 383–393. [CrossRef]
Moshou, D.; Pantazi, X.-E.; Kateris, D.; Gravalos, I. Water stress detection based on optical multisensor
fusion with a least squares support vector machine classifier. Biosyst. Eng. 2014,117, 15–22. [CrossRef]
Moshou, D.; Bravo, C.; West, J.; Wahlen, S.; McCartney, A.; Ramon, H. Automatic detection of “yellow rust”
in wheat using reflectance measurements and neural networks. Comput. Electron. Agric.
,44, 173–188.
Sensors 2018,18, 2674 28 of 29
Moshou, D.; Bravo, C.; Oberti, R.; West, J.; Bodria, L.; McCartney, A.; Ramon, H. Plant disease detection
based on data fusion of hyper-spectral and multi-spectral fluorescence imaging using Kohonen maps.
Real-Time Imaging 2005,11, 75–83. [CrossRef]
Moshou, D.; Bravo, C.; Wahlen, S.; West, J.; McCartney, A.; De Baerdemaeker, J.; Ramon, H. Simultaneous
identification of plant stresses and diseases in arable crops using proximal optical sensing and self-organising
maps. Precis. Agric. 2006,7, 149–164. [CrossRef]
Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric.
2018,145, 311–318. [CrossRef]
Pantazi, X.E.; Tamouridou, A.A.; Alexandridis, T.K.; Lagopodi, A.L.; Kashefi, J.; Moshou, D. Evaluation of
hierarchical self-organising maps for weed mapping using UAS multispectral imagery.
Comput. Electron. Agric.
2017,139, 224–230. [CrossRef]
Pantazi, X.-E.; Moshou, D.; Bravo, C. Active learning system for weed species recognition based on
hyperspectral sensing. Biosyst. Eng. 2016,146, 193–202. [CrossRef]
Binch, A.; Fox, C.W. Controlled comparison of machine vision algorithms for Rumex and Urtica detection in
grassland. Comput. Electron. Agric. 2017,140, 123–138. [CrossRef]
Zhang, M.; Li, C.; Yang, F. Classification of foreign matter embedded inside cotton lint using short wave
infrared (SWIR) hyperspectral transmittance imaging. Comput. Electron. Agric.
,139, 75–90. [CrossRef]
Hu, H.; Pan, L.; Sun, K.; Tu, S.; Sun, Y.; Wei, Y.; Tu, K. Differentiation of deciduous-calyx and persistent-calyx
pears using hyperspectral reflectance imaging and multivariate analysis. Comput. Electron. Agric.
150–156. [CrossRef]
Maione, C.; Batista, B.L.; Campiglia, A.D.; Barbosa, F.; Barbosa, R.M. Classification of geographic origin of
rice by data mining and inductively coupled plasma mass spectrometry. Comput. Electron. Agric.
101–107. [CrossRef]
Grinblat, G.L.; Uzal, L.C.; Larese, M.G.; Granitto, P.M. Deep learning for plant identification using vein
morphological patterns. Comput. Electron. Agric. 2016,127, 418–424. [CrossRef]
Dutta, R.; Smith, D.; Rawnsley, R.; Bishop-Hurley, G.; Hills, J.; Timms, G.; Henry, D. Dynamic cattle
behavioural classification using supervised ensemble classifiers. Comput. Electron. Agric.
,111, 18–28.
Pegorini, V.; Karam, L.Z.; Pitta, C.S.R.; Cardoso, R.; da Silva, J.C.C.; Kalinowski, H.J.; Ribeiro, R.; Bertotti, F.L.;
Assmann, T.S.
In vivo
pattern classification of ingestive behavior in ruminants using FBG sensors and
machine learning. Sensors 2015,15, 28456–28471. [CrossRef] [PubMed]
Matthews, S.G.; Miller, A.L.; PlÖtz, T.; Kyriazakis, I. Automated tracking to measure behavioural changes in
pigs for health and welfare monitoring. Sci. Rep. 2017,7, 17582. [CrossRef] [PubMed]
Craninx, M.; Fievez, V.; Vlaeminck, B.; De Baets, B. Artificial neural network models of the rumen
fermentation pattern in dairy cattle. Comput. Electron. Agric. 2008,60, 226–238. [CrossRef]
Morales, I.R.; Cebrián, D.R.; Fernandez-Blanco, E.; Sierra, A.P. Early warning in egg production curves from
commercial hens: A SVM approach. Comput. Electron. Agric. 2016,121, 169–179. [CrossRef]
Alonso, J.; Villa, A.; Bahamonde, A. Improved estimation of bovine weight trajectories using Support Vector
Machine Classification. Comput. Electron. Agric. 2015,110, 36–41. [CrossRef]
Alonso, J.; Castañón, Á.R.; Bahamonde, A. Support Vector Regression to predict carcass weight in beef cattle
in advance of the slaughter. Comput. Electron. Agric. 2013,91, 116–120. [CrossRef]
Hansen, M.F.; Smith, M.L.; Smith, L.N.; Salter, M.G.; Baxter, E.M.; Farish, M.; Grieve, B. Towards on-farm pig
face recognition using convolutional neural networks. Comput. Ind. 2018,98, 145–152. [CrossRef]
Mehdizadeh, S.; Behmanesh, J.; Khalili, K. Using MARS, SVM, GEP and empirical equations for estimation
of monthly mean reference evapotranspiration. Comput. Electron. Agric. 2017,139, 103–114. [CrossRef]
Feng, Y.; Peng, Y.; Cui, N.; Gong, D.; Zhang, K. Modeling reference evapotranspiration using extreme learning
machine and generalized regression neural network only with temperature data.
Comput. Electron. Agric.
2017,136, 71–78. [CrossRef]
Patil, A.P.; Deka, P.C. An extreme learning machine approach for modeling evapotranspiration using extrinsic
inputs. Comput. Electron. Agric. 2016,121, 385–392. [CrossRef]
Mohammadi, K.; Shamshirband, S.; Motamedi, S.; Petkovi´c, D.; Hashim, R.; Gocic, M. Extreme learning
machine based prediction of daily dew point temperature. Comput. Electron. Agric.
,117, 214–225.
Sensors 2018,18, 2674 29 of 29
Coopersmith, E.J.; Minsker, B.S.; Wenzel, C.E.; Gilmore, B.J. Machine learning assessments of soil drying for
agricultural planning. Comput. Electron. Agric. 2014,104, 93–104. [CrossRef]
Morellos, A.; Pantazi, X.-E.; Moshou, D.; Alexandridis, T.; Whetton, R.; Tziotzios, G.; Wiebensohn, J.; Bill, R.;
Mouazen, A.M. Machine learning based prediction of soil total nitrogen, organic carbon and moisture
content by using VIS-NIR spectroscopy. Biosyst. Eng. 2016,152, 104–116. [CrossRef]
Nahvi, B.; Habibi, J.; Mohammadi, K.; Shamshirband, S.; Al Razgan, O.S. Using self-adaptive evolutionary
algorithm to improve the performance of an extreme learning machine for estimating soil temperature.
Comput. Electron. Agric. 2016,124, 150–160. [CrossRef]
Johann, A.L.; de Araújo, A.G.; Delalibera, H.C.; Hirakawa, A.R. Soil moisture modeling based on stochastic
behavior of forces on a no-till chisel opener. Comput. Electron. Agric. 2016,121, 420–428. [CrossRef]
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
The main bottleneck to accelerating the development of new sugarcane varieties with desirable traits to meet the demands of the sugar-energy sector and adaptation to climate change is the absence of high-throughput phenotyping methods for evaluating varieties in the field. Traditional methods of field phenotyping depend on trained specialists for visual evaluations that are slow, laborious, and subjective. In this study, we investigated UAV-based multispectral data and machine learning algorithms to improve efficiency in the evaluation of field phenotyping of sugarcane varieties regarding the resistance to infection by orange and brown rusts. Spectral data from five bands (Blue, Green, Red, Red-edge, and NIR) and 14 vegetation indices were tested in direct correlations with infection scores collected in the field for the two types of rust. Sugarcane varieties were classified according to their resistance to rusts using three machine learning algorithms (Random Forest, radial SVM, and KNN). Correlations between the Red band data and infection scores of the two types of rust were significant (r = 0.67) for evaluations made at 165 days after planting (DAP). Conversely, regarding the varietal classification into three resistance classes, a high level of overall (88.1%) and balanced (Resistant = 90.3, Moderately resistant = 88.6, and Susceptible = 82.1) accuracy was reached at 195 DAP with the radial SVM model. UAV-based multispectral data is able to assist in the phenotyping of new sugarcane varieties regarding the resistance to these diseases.
AI applications have significantly evolved over the past few years and have found its applications in almost every business sector. AI is making a huge impact in all domains of the industry. Every industry looking to automate certain jobs through the use of intelligent machinery. This paper reviews the work of numerous researchers to get a brief overview about the current implementation of automation in agriculture. The aim of this paper is to identify gaps within the agricultural literature, and gaps in AI guidelines, that may need to be addressed. Moreover, Artificial Intelligence is now a reality in the Higher Education sector as we have started experimenting with the technology and reaping the benefits from the same. But this reality is marginal as there is still a long way to go for AI in the context of development and application in the Education sector. Healthcare organizations in different specialties are also getting more interested in how artificial intelligence can make accurate readings and results to several biological reports and thus gain a better diagnosis of the disease. The aim of this review is to keep track of new scientific accomplishments, to understand the availability of technologies, to appreciate the tremendous potential of AI in biomedicine, and to provide researchers in related fields with inspiration. New progress and breakthroughs will continue to push the frontier and widen the scope of AI applications, and fast developments are envisioned in the foreseeable future.
Der Ökologische Landbau hat als Ziel die Biodiversität, die vor allem durch die Flora beeinflusst wird, zu stärken. In Verbindung mit dem Ertragsziel der landwirtschaftlichen Produktion ergibt sich dadurch die Frage, mit welchen Maßnahmen sowohl eine höhere Beikrautbiodiversität als auch die Ertragssicherung kombiniert werden können. Hierzu wurde in der Nähe von Osnabrück ein einjähriger Versuch mit ökologisch angebauter Wintertriticale angelegt. Der Versuch wurde als Split-Plot-Design mit 4 Blöcken und Variationen von Saattermin, Saatstärke und mechanischer Beikrautregulierung an zwei Terminen im Frühjahr, auch mit Hackvarianten in 25 cm Reihenabstand, angelegt und von März 2022 bis zur Ernte im Juli 2022 regelmäßig auf vegetative und später auch generative Triticaleparameter sowie auf das Auftreten verschiedener Beikrautarten und den Beikrautdeckungsgrad geprüft. Durch den frühen Saattermin am 18.10.2021 konnte verglichen mit dem späten Saattermin am 01.11.2021 aufgrund der längeren Wachstumsphase ein erhöhtes vegetatives Wachstum der Triticale erreicht werden. Ebenso wurde die Zahl der Beikrautindividuen und -arten gefördert, da sich insbesondere Herbstkeimer durch die zeitige Aussaat besser etablieren konnten. Eine erhöhte Konkurrenzwirkung konnte aufgrund von indifferentem Beikrautdeckungsgrad und gleichbleibender Beikrautbiomasse nicht festgestellt werden. Der Kornertrag wurde nicht beeinflusst, aber der Proteingehalt stieg um etwa 0,5 Prozentpunkte an. Entsprechend kann geschlussfolgert werden, dass die Beikrautbiodiversität durch die frühere Aussaat gefördert wurde, während der Strohertrag stieg, der Kornertrag gleichblieb und der Proteingehalt marginal sank. Die erhöhte Saatstärke führte wie zu erwarten zu einer höheren Bestandesdichte, die auch zu einem höheren Kulturdeckungsgrad führte. Die daraus resultierende höhere Konkurrenzkraft der Triticale führte zu weniger Beikrautbiomasse und weniger Beikrautindividuen sowie -arten. 
Allerdings konnte in puncto Kornertrag kein Einfluss der Saatstärke festgestellt werden, da die höhere Ährendichte bei höherer Saatstärke durch das Ährengewicht ausgeglichen wurde. Der Proteingehalt stieg sogar durch die reduzierte Saatstärke. Entsprechend kann die reduzierte Aussaatstärke empfohlen werden, da sie positiv für die Beikräuter, indifferent für Stroh- und Kornertrag und positiv für den Proteingehalt eingeschätzt wird. Darüber hinaus können Saatgutkosten eingespart werden. Den stärksten Einfluss auf Beikräuter und Triticale hatte allerdings die mechanische Beikrautregulierung. Das vegetative und generative Wachstum der einfachen Hackvariante in 25 cm Reihenabstand hat im Vergleich zur Striegelvariante deutliche Vorteile gezeigt. Auch der Proteingehalt und -ertrag konnte erhöht werden. Die intensive Hackvariante mit zusätzlichem Einsatz einer Rotary Hoe verhielt sich intermediär, da sie vermutlich die Kulturpflanze geschädigt hat. Dagegen wies die Striegelvariante im Vergleich zur intensiven Hackvariante die höchste Zahl an Beikrautarten auf, während allerdings auch Beikrautdeckungsgrad, -biomasse und -individuenzahl maximal waren. Die intensive Hackvariante wies dagegen das ausgeglichenste Beikrautartenverhältnis auf (geringe Dominanz, hohes SDI). Bezüglich der Beikrautparameter verhielt sich die normale Hackvariante intermediär. Bei der normalen Hackvariante konnten entsprechend sämtliche Ertragsparameter maximiert werden, während sich die Beikrautparameter nicht signifikant veränderten. Die Beikrautvielfalt und -abundanz tendierte zu sinken, während die Ausgeglichenheit der Beikrautarten zunahm und entsprechend dominante Arten, wie Acker-Frauenmantel (Aphanes arvensis L.) an Bedeutung verloren haben, was eine Zönose vorbeugt.
Full-text available
Identification of individual livestock such as pigs and cows has become a pressing issue in recent years as intensification practices continue to be adopted and precise objective measurements are required (e.g. weight). Current best practice involves the use of RFID tags which are time-consuming for the farmer and distressing for the animal to fit. To overcome this, non-invasive biometrics are proposed by using the face of the animal. We test this in a farm environment, on 10 individual pigs using three techniques adopted from the human face recognition literature: Fisherfaces, the VGG-Face pre-trained face convolutional neural network (CNN) model and our own CNN model that we train using an artificially augmented data set. Our results show that accurate individual pig recognition is possible with accuracy rates of 96.7% on 1553 images. Class Activated Mapping using Grad-CAM is used to show the regions that our network uses to discriminate between pigs.
Full-text available
Rice is one of the most important staple foods around the world. Authentication of rice is one of the most addressed concerns in the present literature, which includes recognition of its geographical origin and variety, certification of organic rice and many other issues. Good results have been achieved by multivariate data analysis and data mining techniques when combined with specific parameters for ascertaining authenticity and many other useful characteristics of rice, such as quality, yield and others. This paper brings a review of the recent research projects on discrimination and authentication of rice using multivariate data analysis and data mining techniques. We found that data obtained from image processing, molecular and atomic spectroscopy, elemental fingerprinting, genetic markers, molecular content and others are promising sources of information regarding geographical origin, variety and other aspects of rice, being widely used combined with multivariate data analysis techniques. Principal component analysis and linear discriminant analysis are the preferred methods, but several other data classification techniques such as support vector machines, artificial neural networks and others are also frequently present in some studies and show high performance for discrimination of rice.
Full-text available
Since animals express their internal state through behaviour, changes in said behaviour may be used to detect early signs of problems, such as in animal health. Continuous observation of livestock by farm staff is impractical in a commercial setting to the degree required to detect behavioural changes relevant for early intervention. An automated monitoring system is developed; it automatically tracks pig movement with depth video cameras, and automatically measures standing, feeding, drinking, and locomotor activities from 3D trajectories. Predictions of standing, feeding, and drinking were validated, but not locomotor activities. An artificial, disruptive challenge; i.e., introduction of a novel object, is used to cause reproducible behavioural changes to enable development of a system to detect the changes automatically. Validation of the automated monitoring system with the controlled challenge study provides a reproducible framework for further development of robust early warning systems for pigs. The automated system is practical in commercial settings because it provides continuous monitoring of multiple behaviours, with metrics of behaviours that may be considered more intuitive and have diagnostic validity. The method has the potential to transform how livestock are monitored, directly impact their health and welfare, and address issues in livestock farming, such as antimicrobial use.
The Soil Moisture Active Passive (SMAP) mission has delivered high-quality and valuable sensing of surface soil moisture since 2015. However, its short time span, coarse resolution, and irregular revisit schedule have limited its use. Utilizing a state-of-the-art deep-in-time neural network, Long Short-Term Memory (LSTM), we created a system that predicts SMAP level-3 soil moisture data using climate forcing, model-simulated moisture, and static physical attributes as inputs. The system removes most of the bias with model simulations and also improves predicted moisture climatology, achieving a testing accuracy of 0.025 to 0.3 in many parts of Continental United States (CONUS). As the first application of LSTM in hydrology, we show that it is more robust than simpler methods in either temporal or spatial extrapolation tests. LSTM generalizes better across regions with distinct climates and physiography by synthesizing model simulations and environmental variables. With high fidelity to SMAP products, our data can aid various applications including data assimilation and weather forecasting.
We aimed to identify optimal machine-learning methods for radiomics-based prediction of local failure and distant failure in advanced nasopharyngeal carcinoma (NPC). We enrolled 110 patients with advanced NPC. A total of 970 radiomic features were extracted from MRI images for each patient. Six feature selection methods and nine classification methods were evaluated in terms of their performance. We applied 10-fold cross-validation as the criterion for feature selection and classification. We repeated each combination 50 times to obtain the mean area under the curve (AUC) and test error. We observed that the combination methods Random Forest (RF) + RF (AUC, 0.8464 ± 0.0069; test error, 0.3135 ± 0.0088) had the highest prognostic performance, followed by RF + Adaptive Boosting (AdaBoost) (AUC, 0.8204 ± 0.0095; test error, 0.3384 ± 0.0097), and Sure Independence Screening (SIS) + Linear Support Vector Machines (LSVM) (AUC, 0.7883 ± 0.0096; test error, 0.3985 ± 0.0100). Our radiomics study identified optimal machine-learning methods for the radiomics-based prediction of local failure and distant failure in advanced NPC, which could enhance the applications of radiomics in precision oncology and clinical practice.
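The evaluation protocol described in this abstract (10-fold cross-validation repeated 50 times, summarised as a mean AUC) can be sketched roughly as follows. This is a minimal illustration, not the study's code: the function names, the `fit_predict` callback, and the choice to pool out-of-fold scores before computing one AUC per repeat are all assumptions made here.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n_samples, k, rng):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_auc(X, y, fit_predict, k=10, repeats=50, seed=0):
    """Mean AUC over `repeats` runs of k-fold cross-validation.

    `fit_predict(X_train, y_train, X_test)` is a hypothetical callback that
    trains any model and returns one score per test sample.
    """
    rng = random.Random(seed)
    run_aucs = []
    for _ in range(repeats):
        scores = [None] * len(y)
        for test_idx in kfold_indices(len(y), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            preds = fit_predict(
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            for i, p in zip(test_idx, preds):
                scores[i] = p
        # One AUC per repeat, computed on the pooled out-of-fold scores.
        run_aucs.append(auc(scores, y))
    return mean(run_aucs)
```

Any classifier can be plugged in through `fit_predict`; feature selection, if used, would belong inside that callback so that it only ever sees the training folds.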
It is clear that the learning speed of feedforward neural networks is in general far slower than required and it has been a major bottleneck in their applications for past decades. Two key reasons behind this may be: (1) the slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by using such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs) which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results based on a few artificial and real benchmark function approximation and classification problems including very large complex applications show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
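The two steps named in this abstract (hidden nodes chosen at random, output weights determined analytically) can be illustrated with a short NumPy sketch. It assumes a tanh activation, Gaussian random weights, and a Moore-Penrose pseudoinverse for the least-squares solve; the function names and the tiny XOR data set are illustrative choices, not taken from the paper.

```python
import numpy as np


def elm_fit(X, T, n_hidden=20, seed=0):
    """Fit a single-hidden-layer network in the ELM style.

    The hidden weights W and biases b are drawn at random and never trained;
    only the output weights beta are computed, in one least-squares step.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input-to-hidden weights
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                 # minimum-norm least-squares solution
    return W, b, beta


def elm_predict(X, model):
    """Apply the fixed random hidden layer, then the fitted output weights."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


# Toy problem (assumed for illustration): XOR, which no linear model can fit.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([0.0, 1.0, 1.0, 0.0])
model = elm_fit(X, T)
pred = elm_predict(X, model)
```

Because no gradient descent is involved, training cost is dominated by a single pseudoinverse of the hidden-layer matrix, which is where the speed advantage claimed in the abstract comes from.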
In this study, we propose a novel mineral elements database for the authentication of Italian processed tomato, able to discriminate the domestic provenance from the Chinese, US and Spanish ones. Multi-element analyses by Inductively Coupled Plasma orthogonal acceleration Time-of-Flight Mass Spectrometry (ICP-oa-TOF-MS) and Inductively Coupled Plasma Optical Emission Spectrometry (ICP-OES) were used for quantifying 26 mineral elements (Li, Be, Na, Mg, Al, K, Ca, V, Cr, Mn, Co, Cu, Zn, Ga, As, Rb, Sr, Ag, Cd, In, Cs, Ba, Tl, Pb, Bi and U) in 183 tomato-based samples of different origin (Italy, China, US and Spain) collected in three different years of production (2013, 2015 and 2017). Linear Discriminant Analysis (LDA) applied to 28 variables (single elements + elemental ratios) allowed excellent separation between Italian and non-Italian tomato samples. Three elemental ratios (Li/Cu, Co/Rb and Sr/Cd) proved highly effective in identifying the domestic provenance of tomato (100% prediction ability of the model and 98.8% in cross-validation). This result highlighted that ratios between elements were more important than single elements in discrimination.
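To make the idea of discriminating on elemental ratios concrete, here is a small two-class Fisher LDA sketch in NumPy. It is only an illustration under stated assumptions: the helper names are hypothetical, the concentrations are made-up numbers rather than measurements from the study, and only two of the ratios mentioned above are used to keep the example short.

```python
import numpy as np


def ratio_features(samples, pairs):
    """Turn per-sample element concentrations into ratio features."""
    return np.array([[s[a] / s[b] for a, b in pairs] for s in samples])


def fit_lda(X, y):
    """Two-class Fisher LDA: direction w and a midpoint decision threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2.0
    return w, threshold


def predict_lda(X, w, threshold):
    """Label 1 when the projection onto w exceeds the threshold, else 0."""
    return (X @ w > threshold).astype(int)


# Made-up concentrations (arbitrary units), NOT data from the study.
samples = [
    {"Li": 0.50, "Cu": 1.0, "Co": 0.20, "Rb": 1.0},
    {"Li": 0.60, "Cu": 1.0, "Co": 0.25, "Rb": 1.0},
    {"Li": 0.55, "Cu": 1.0, "Co": 0.15, "Rb": 1.0},
    {"Li": 1.50, "Cu": 1.0, "Co": 0.80, "Rb": 1.0},
    {"Li": 1.60, "Cu": 1.0, "Co": 0.70, "Rb": 1.0},
    {"Li": 1.40, "Cu": 1.0, "Co": 0.90, "Rb": 1.0},
]
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = one origin, 1 = the other (labels assumed)

X = ratio_features(samples, [("Li", "Cu"), ("Co", "Rb")])
w, threshold = fit_lda(X, y)
labels = predict_lda(X, w, threshold)
```

Ratios are attractive features because they cancel out effects that scale all concentrations together, such as dilution during processing, which may be one reason they discriminate better than single elements.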
In aquaculture, the feeding efficiency of fish is of great significance for improving production and reducing costs. In recent years, automatic adjustments of the feeding amount based on the needs of the fish have become a developing trend. The purpose of this study was to achieve automatic feeding decision making based on the appetite of fish. In this study, a feeding control method based on near infrared computer vision and neuro-fuzzy model was proposed. The specific objectives of this study were as follows: (1) to develop an algorithm to extract an index that can describe and quantify the feeding behavior of fish in near infrared images, (2) to design an algorithm to realize feeding decision (continue or stop) during the feeding process, and (3) to evaluate the performance of the method. The specific implementation process of this study was as follows: (1) the quantitative index of feeding behavior (flocking level and snatching strength) was extracted by Delaunay Triangulation and image texture; (2) the adaptive network-based fuzzy inference system (ANFIS) was established based on fuzzy control rules and used to achieve automatic on-demand feeding; and (3) the performance of the method was evaluated by the specific growth rate, weight gain rate, feed conversion rate and water quality parameters. The results indicated that the feeding decision accuracy of the ANFIS model was 98%. In addition, compared with the feeding table, although this method did not present significant differences in promoting fish growth, the feed conversion rate (FCR) was reduced by 10.77% and water pollution was also reduced. This system provides an important contribution to realizing the real-time control of fish feeding processes and feeding decision on demand, and it lays a theoretical foundation for developing fine feeding equipment and guiding practice.
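The growth and feed metrics named in this abstract can be computed as in the sketch below. The formulas are the conventional aquaculture definitions and are assumed here, since the abstract does not spell them out; the function names and example weights are illustrative only.

```python
import math


def weight_gain_rate(initial_weight, final_weight):
    """Weight gain as a percentage of the initial weight."""
    return 100.0 * (final_weight - initial_weight) / initial_weight


def specific_growth_rate(initial_weight, final_weight, days):
    """Specific growth rate in percent per day (log-weight difference over time)."""
    return 100.0 * (math.log(final_weight) - math.log(initial_weight)) / days


def feed_conversion_rate(feed_supplied, initial_weight, final_weight):
    """Feed mass supplied per unit of weight gained; lower is better."""
    gain = final_weight - initial_weight
    if gain <= 0:
        raise ValueError("FCR is undefined without positive weight gain")
    return feed_supplied / gain


def relative_reduction(baseline, treatment):
    """Percentage by which `treatment` is lower than `baseline`."""
    return 100.0 * (baseline - treatment) / baseline


# Made-up example values in grams and days, not figures from the study.
wgr = weight_gain_rate(100.0, 150.0)
sgr = specific_growth_rate(100.0, 150.0, days=30)
fcr = feed_conversion_rate(75.0, 100.0, 150.0)
```

A reported FCR reduction such as the one in the abstract would correspond to `relative_reduction(fcr_feeding_table, fcr_on_demand)` under these definitions.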
In this paper, convolutional neural network models were developed to perform plant disease detection and diagnosis using simple leaf images of healthy and diseased plants, through deep learning methodologies. Training of the models was performed with the use of an open database of 87,848 images, containing 25 different plants in a set of 58 distinct classes of [plant, disease] combinations, including healthy plants. Several model architectures were trained, with the best performance reaching a 99.53% success rate in identifying the corresponding [plant, disease] combination (or healthy plant). The significantly high success rate makes the model a very useful advisory or early warning tool, and an approach that could be further expanded to support an integrated plant disease identification system to operate in real cultivation conditions.
Automated robotic weeding of grassland will improve the productivity of dairy and sheep farms while helping to conserve their environments. Previous studies have reported results of machine vision methods to separate grass from grassland weeds but each use their own datasets and report only performance of their own algorithm, making it impossible to compare them. A definitive, large-scale independent study is presented of all major known grassland weed detection methods evaluated on a new standardised data set under a wider range of environment conditions. This allows for a fair, unbiased, independent and statistically significant comparison of these and future methods for the first time. We test features including linear binary patterns, BRISK, Fourier and Watershed; and classifiers including support vector machines, linear discriminants, nearest neighbour, and meta-classifier combinations. The most accurate method is found to use linear binary patterns together with a support vector machine.
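As a rough illustration of the best-performing feature above, the following sketch computes a basic 3x3 binary-pattern texture histogram in plain Python. It assumes that "linear binary patterns" refers to the texture descriptor usually called local binary patterns (LBP); the bit ordering, the `>=` comparison, and the function names are choices made for this example and may differ from the study's implementation.

```python
def lbp_code(img, r, c):
    """8-bit pattern for pixel (r, c): one bit per neighbour >= the centre.

    Neighbours are visited clockwise starting at the top-left corner,
    so the top-left neighbour sets bit 0 and the left neighbour sets bit 7.
    """
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of pattern codes over all interior pixels.

    The histogram is the texture feature vector that would be passed
    to a classifier such as a support vector machine.
    """
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Tiny grey-level patch (made-up values) with a single interior pixel.
patch = [
    [1, 2, 3],
    [8, 5, 4],
    [7, 6, 9],
]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
```

Because each code depends only on whether neighbours are brighter or darker than the centre, the descriptor is insensitive to uniform changes in illumination, which is useful for outdoor imagery.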