AGRES An International e. Journal (2017) Vol. 6, Issue 4:637-645 ISSN : 2277-9663
______________________________________________________________________________
www.arkgroup.co.in Page 637
DATA MINING TRENDS IN AGRICULTURE : A REVIEW
*PATEL, AMIKSHA A.1 AND KATHIRIYA, DHAVAL R.2
CENTER FOR AGRICULTURAL INFORMATION AND COMMUNICATION
TECHNOLOGY
SARDARKRUSHINAGAR DANTIWADA AGRICULTURAL UNIVERSITY
SARDARKRUSHINAGAR 385 506, GUJARAT, INDIA
*E-MAIL: amiksha_patel@yahoo.com
______________________________________________________________________________
1.Assistant Professor (Computer Science), Centre for Agricultural Information & Computer Technology, Sardarkrushinagar
Dantiwada Agricultural University, Sardarkrushinagar 385506 Dist. Banaskantha, Gujarat.
2.Director, Information Technology, Anand Agricultural University, Anand, Gujarat
ABSTRACT
Data mining in agriculture is a relatively novel research field. Agricultural data are
highly diversified in terms of nature, interdependency and use of resources for farming. The
major problem in applying data mining to agriculture is to solve issues on the basis of the
available data and to obtain meaningful outcomes from it. In data mining, clustering and
classification techniques yield useful information for research and knowledge acquisition
from integrated farming, which in turn gives farmers better solutions regarding their
cultivation (yield). Crop yield analysis is an emerging research area for data mining in
agriculture. Different data mining techniques such as K-Means, K-Nearest Neighbor (KNN),
Artificial Neural Networks (ANN) and Support Vector Machines (SVM) are used in new
application research in agriculture. In this paper, techniques for predicting the yield of a
crop are the centre of focus. Yield prediction is very important in agriculture, and the
problem of yield prediction can be addressed by utilizing data mining techniques.
KEY WORDS: Artificial Neural Networks, Data mining, K-Means, K-Nearest Neighbour,
Support Vector Machines
INTRODUCTION
Indian agriculture is highly
differentiated in terms of climate, soil,
water, crops, horticultural crops, plantation
crops, medicinal crops, livestock, etc.
Today, India ranks second worldwide in
farm output. Agriculture faces the problem
of changes in the resources that directly
affect crop yield, so agricultural
productivity in India is unpredictable. For
balanced and sustainable growth of
agriculture, these resources need to be
evaluated, monitored and analysed, so that
proper methods can be devised.
Accurate and reliable information
about crop yield is important for decisions
on agricultural risk management. Crop yield
prediction is also important for the supply
chain operations of companies engaged in
industries that use agricultural produce as
raw material (Gleason, 1982). Livestock,
food, animal feed, chemical, poultry,
fertilizer, pesticide, seed, paper and many
other industries use agricultural products as
ingredients in their production processes.
An accurate estimate of crop yield helps
these companies in planning supply chain
decisions such as production scheduling,
and it is useful for businesses such as the
seed, fertilizer, agrochemical and
agricultural machinery industries and for
marketing activities based on crop yield.
Data mining techniques, until now
used widely in the business and corporate
sectors, may be used in agriculture for data
characterization, discrimination, prediction
and forecasting. Some use of data mining in
soil characteristic evaluation has already
been attempted.
Different techniques have been
proposed for mining data over the years. The
most used data mining techniques are
discussed in this paper, in particular the
application of techniques such as k-means,
bi-clustering, k-nearest neighbor, Artificial
Neural Networks, Support Vector Machines
and the Naïve Bayes classifier in the
agriculture field. An analysis of the results
of crop cutting experiments for the yield of
various crops is given by Deshpande (2003).
Data mining techniques can be grouped in
two different ways. They can be clustering
or classification techniques. Additionally,
some of them provide a list of information
for clustering or classification purposes,
while others learn from the available data
how to perform classifications.
So we can say that “Data mining is
the meaningful extraction, from a previously
unexplored set of records, of potentially
useful and practical information. It is the
process of examining data from different
perspectives and summarizing it into useful
information.”
Data mining techniques
Data mining techniques can be
divided into two groups: classification and
clustering techniques. Classification
techniques are designed for classifying
unknown samples using information
provided by a set of classified samples. This
set is usually referred to as a training set
because, in general, it is used to train the
classification technique to perform its
classification. For example, Neural
Networks and Support Vector Machines
exploit training sets to tune their
parameters in order to solve a particular
classification problem. In other words, these
two classification techniques learn from a
training set how to classify unknown
samples. Another classification technique,
the k-nearest neighbor, does not have any
learning phase, because it uses the training
set every time a classification must be
performed. If a training set is not
available, there is no previous knowledge
about the data to classify. In this case,
clustering techniques can be used to split a
set of unknown samples into clusters. One of
the most used clustering techniques is the
k-means method. This paper mainly focuses
on the techniques most used in
agriculture-related fields.
The main techniques for data mining
include association rules, classification,
clustering and regression. The different data
mining techniques used for solving different
agricultural problems have been discussed
by Mucherino et al. (2009).
Association rules
Association rule mining is one of the
most efficient data mining techniques for
searching for hidden or desired patterns in
vast amounts of data. In this method, the
focus is on finding relationships between the
different items in a transactional database.
Association rules are used to find elements
that co-occur repeatedly within a dataset
consisting of many independent selections of
elements (such as purchasing transactions),
and to discover rules.
The simple problem statement is:
given a set of transactions, where each
transaction is a set of literals, an association
rule is an expression of the form A => B,
where A and B are sets of items. The
intuitive meaning of such a rule is that
transactions of the database which contain A
tend to contain B (Srikant et al., 1997).
Applications of association rule mining
include market basket analysis, customer
segmentation, store layout, catalog design
and telecommunication alarm prediction. The
different association rule mining algorithms
are the Apriori Algorithm (AA), Partition,
Direct Hashing and Pruning (DHP),
Dynamic Itemset Counting (DIC),
FP-Growth (FPG), SEAR, Spear, Eclat and
Declat, MaxEclat, etc. (Zaki, 1999).
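The strength of a rule A => B is commonly measured by its support and confidence. These two measures are standard in association rule mining but are not defined in the text above, so the following is only a minimal sketch; the function names and the farm records are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(B | A) for the rule A => B: support(A u B) / support(A)."""
    a = set(antecedent)
    both = a | set(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical records: each transaction lists conditions observed together.
transactions = [
    {"high_rainfall", "clay_soil", "high_yield"},
    {"high_rainfall", "sandy_soil", "low_yield"},
    {"high_rainfall", "clay_soil", "high_yield"},
    {"low_rainfall", "clay_soil", "low_yield"},
]

rule_a = {"high_rainfall", "clay_soil"}
rule_b = {"high_yield"}
print(support(rule_a | rule_b, transactions))    # 0.5
print(confidence(rule_a, rule_b, transactions))  # 1.0
```

Algorithms such as Apriori search for all rules whose support and confidence exceed user-chosen thresholds; the sketch only evaluates one given rule.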
Classification
Classification and prediction are two
forms of data analysis that can be used to
extract models describing important data
classes or to predict future data trends.
Classification is a process in which a model
learns to predict a class label from a set of
training data; the model can then be used to
predict discrete class labels for new samples.
One of the major goals of a classification
algorithm is to maximize the predictive
accuracy obtained by the classification
model when classifying examples in the test
set, which are unseen during training. Data
mining algorithms can follow three different
learning approaches: supervised learning,
semi-supervised learning and unsupervised
learning. The different classification
techniques for discovering knowledge are
Rule-Based Classifiers, Bayesian Networks
(BN), Decision Trees (DT), Nearest
Neighbour (NN), Artificial Neural Networks
(ANN), Support Vector Machines (SVM),
Rough Sets, Fuzzy Logic, Genetic
Algorithms, etc. (Beniwal and Arora, 2012).
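The idea of learning from a training set and measuring accuracy on an unseen test set can be illustrated with a very small example. The sketch below uses a nearest-centroid classifier, chosen here only for brevity and not taken from the reviewed studies; the rainfall and temperature figures are hypothetical.

```python
def fit_centroids(samples, labels):
    """Learning phase: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)


# Hypothetical data: (rainfall in mm, temperature in deg C) -> yield class.
train_x = [(900, 24), (950, 25), (870, 23), (400, 33), (450, 34), (380, 32)]
train_y = ["high", "high", "high", "low", "low", "low"]
test_x = [(920, 26), (420, 31), (600, 30)]   # unseen during training
test_y = ["high", "low", "low"]

model = fit_centroids(train_x, train_y)
print(accuracy(model, test_x, test_y))  # 1.0
```

Accuracy is computed only on the test samples, which is what distinguishes predictive accuracy from a simple fit to the training data.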
Clustering
In clustering, the focus is on finding
a partition of data records into clusters such
that the points within each cluster are close
to one another. Clustering groups the data
instances into subsets in such a manner that
similar instances are assembled together,
while dissimilar instances belong to different
groups. Since the aim of clustering is to
discover a new set of categories, the new
groups are of interest in themselves, and
their assessment is intrinsic (Xu and
Wunsch, 2005). There is no prior knowledge
about the data. The different clustering
methods are Hierarchical Methods (HM),
Partitioning Methods (PM), Density-Based
Methods (DBM), Model-Based Clustering
Methods (MBCM), Grid-Based Methods,
Soft-Computing Methods (fuzzy, neural
network based), Squared Error-Based
Clustering (Vector Quantization) and
clustering of graph and network data
(Fayyad et al., 1996).
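As an illustration of a partitioning method, the following is a minimal sketch of the k-means procedure (Lloyd's algorithm). The initial centroids are passed in explicitly to keep the example deterministic; the soil readings and all names are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid-update steps until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignment


# Hypothetical soil samples: (pH, nitrogen content).
soil = [(5.1, 0.9), (5.3, 1.1), (5.0, 1.0), (7.8, 3.0), (8.0, 3.2), (7.9, 2.9)]
centres, labels = kmeans(soil, centroids=[soil[0], soil[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No class labels are supplied; the two groups emerge only from the distances between samples, which is the sense in which clustering needs no prior knowledge about the data.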
Regression
Regression is learning a function that
maps a data item to a real-valued prediction
variable. Applications of regression include
predicting the amount of biomass present in
a forest, estimating the probability that a
patient will survive given the results of a set
of diagnostic tests, and predicting consumer
demand for a new product (Sawaitul et al.,
2012). Here the model is trained to predict a
continuous target. Regression tasks are often
treated as classification tasks with
quantitative class labels. The methods for
prediction are Linear Regression (LR) and
Nonlinear Regression (NLR).
Sellam and Poovammal (2016)
applied regression analysis to the prediction
of crop yield. They described various
environmental factors that influence crop
yield and also established the relationship
among these parameters.
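For a single explanatory variable, linear regression reduces to the ordinary least squares formulas for slope and intercept. The sketch below is a generic illustration and not the model of any cited study; the rainfall and yield figures are hypothetical and were chosen to lie exactly on a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical seasonal rainfall (mm) and crop yield (t/ha).
rainfall = [500, 600, 700, 800]
crop_yield = [2.0, 2.2, 2.4, 2.6]

slope, intercept = fit_line(rainfall, crop_yield)
predicted = slope * 750 + intercept   # continuous prediction for 750 mm
print(round(slope, 4), round(intercept, 4), round(predicted, 4))
```

The output of the model is a real number rather than a class label, which is the distinction between regression and classification drawn above.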
Applications of data mining techniques in
agriculture
Many studies have been carried out
on the application of data mining techniques
to agricultural data sets. The Naive Bayes
data mining technique is used to classify
soils by analysing large experimental soil
profile data sets. The decision tree algorithm
is used for predicting soil fertility. By using
clustering techniques based on partitioning
algorithms and hierarchical algorithms, land
utilization for agricultural and
non-agricultural areas over the past ten years
has been determined.
As early in the growing season as
possible, a farmer is concerned with how
much yield to expect from his crop. In the
past, yield prediction relied on the farmer's
experience of particular fields, crops and
climatic conditions. However, this
knowledge may be available but not exactly
at the small scale. Accurate data can be
collected over a multitude of seasons. Faster
advancement and consistency of agricultural
production is one of the basic necessities
for agricultural development.
In India, the area under and the yield
of different crops reflect the combined
effect of many factors, such as agro-climatic
conditions, resource endowment,
technology level, infrastructure, and social
and economic conditions. Many schemes
have been devised to maximize the
productivity of various crops in different
agro-climatic regions; institutions, seed,
fertilizer, pesticide and fungicide companies
and many others are actively engaged in
improving the productivity of different crops
in different regions and under different
conditions (Just and Weinenger, 1999;
Veenadhari et al., 2011).
Nowadays, IT has become more and
more a part of our day-to-day activities.
With IT, improvements in efficiency can be
made in almost any part of industry and
services, and this is now true for agriculture
as well. A farmer harvests not only crops but
also a growing quantity of data. These data
are accurate and small-scale. A lot of data is
available containing information about
agricultural assets, here soil and yield
properties, which should be used in a way
that benefits farmers. This is a general
problem which data mining addresses. Data
mining techniques aim at finding the
information in the data that is both
important and beneficial to the farmer. A
common specific problem for farmers is
yield prediction.
Neural networks
Sawaitul et al. (2012) focused on
weather information. Recorded parameters
are used to forecast the weather. If any one
of the recorded parameters such as wind
speed, wind direction, temperature, rainfall
or humidity changes, the upcoming climatic
condition can be predicted using artificial
neural networks trained with back
propagation. An increase in signal range
allows the approach to work over large areas
as well. Elizondo et al. (1994) developed
neural network models for predicting
flowering and physiological maturity of
soybean. Somvanshi et al. (2006) discussed
the modelling and prediction of rainfall
using artificial neural networks and the
Box-Jenkins methodology. Other
applications of artificial neural networks in
hydrology are forecasting daily water
demand and flow forecasting.
Maier and Dandy (2000) used a back
propagation (BP) neural network and
simulated the results using MATLAB. They
found a suitable data model for achieving
high accuracy in price prediction. The
prediction is based mainly on price alone.
Neural networks are also used for the
prediction and forecasting of water
resources variables.
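The back propagation training mentioned above can be sketched for a network with one hidden layer. This is a generic textbook illustration under stated assumptions (sigmoid units, squared error, per-sample gradient descent), not the configuration of any cited study; the normalised weather readings and all names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, targets, hidden=3, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer network; return (error before, error after, predictor)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # The last entry of each weight list is the bias term.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
        out = sigmoid(sum(w * v for w, v in zip(w2, h)) + w2[-1])
        return h, out

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in zip(samples, targets)) / len(samples)

    before = mse()
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            h, out = forward(x)
            # Backward pass: output error first, then the hidden-layer errors.
            delta_out = (out - t) * out * (1 - out)
            delta_h = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient descent updates for both layers.
            for j in range(hidden):
                w2[j] -= lr * delta_out * h[j]
            w2[-1] -= lr * delta_out
            for j in range(hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * delta_h[j] * x[i]
                w1[j][-1] -= lr * delta_h[j]
    return before, mse(), lambda x: forward(x)[1]


# Hypothetical normalised readings (humidity, temperature) -> rain (1) or no rain (0).
readings = [(0.9, 0.3), (0.8, 0.4), (0.2, 0.9), (0.1, 0.8)]
rained = [1, 1, 0, 0]

before, after, predict = train(readings, rained)
print(before, after)
```

The error after training should be lower than before; in practice the learning rate, the number of hidden units and the number of epochs all need tuning for real weather data.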
K-means
Data mining is the process of
discovering meaningful patterns and trends
by sifting through huge amounts of data,
using pattern detection technologies as well
as statistical and mathematical techniques.
Data mining techniques are often used to
study soil characteristics. As an example, the
K-Means approach is used for classifying
soils in combination with GPS-based
techniques (Verheyen et al., 2001).
Urtubia et al. (2007) stated that wine
fermentation problems can be predicted by
using a k-means approach. Knowing in
advance that the wine fermentation process
could get stuck or be slow can help the
enologist to correct it and ensure a good
fermentation process. The K-Means
algorithm is used in forecasting atmospheric
pollution (Jorquera et al., 2001). Verheyen
et al. (2001) used the K-Means approach to
classify soils and plants, and Camps-Valls et
al. (2003) used SVMs to classify crops.
Apples are checked using different
approaches before they are sent to the
market (Breiman et al., 1984).
Fathima and Geetha (2014) applied
the k-means and Apriori algorithms to crop
type and irrigation parameters and focused
on the policies that government could frame
on the basis of farmers' cropping practices.
Fuzzy set
Jagielska et al. (1999) described
applications to agriculture-related areas;
yield prediction, for example, is a very
important agricultural problem. Any farmer
might be interested in knowing how much
yield to expect. In the past, yield prediction
was achieved by considering the farmer's
experience of a particular field, crop and
climatic condition. They discussed
additional information attached to data,
such as probability in probability theory and
grade of membership in fuzzy set theory.
Tellaeche et al. (2007) presented an
automatic computer vision system for the
detection and differential spraying of Avena
sterilis, a noxious weed growing in cereal
crops. For this purpose, a hybrid
decision-making system was designed based
on the Bayesian and Fuzzy k-Means
classifiers, where the a priori probability
required by the Bayes framework is supplied
by the Fuzzy k-Means. Meyer et al. (2004)
used fuzzy clusters to classify plant, soil and
residue regions of interest from colour
images.
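The grade of membership used by fuzzy clustering can be made concrete with the standard Fuzzy k-Means (fuzzy c-means) membership formula, in which a sample belongs to every cluster to a degree between 0 and 1. The sketch below is a generic illustration; the fuzzifier value m = 2, the one-dimensional feature and the cluster centres are assumptions, not values from the cited systems.

```python
import math


def fuzzy_memberships(x, centres, m=2.0):
    """Membership grades of sample `x` in each cluster.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the distance
    from x to centre j and m > 1 is the fuzzifier.
    """
    dists = [math.dist(x, c) for c in centres]
    # A sample lying exactly on a centre belongs fully to that cluster.
    if 0.0 in dists:
        zeros = dists.count(0.0)
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


# Hypothetical one-dimensional pixel feature with "soil" and "plant" centres.
centres = [(0.0,), (4.0,)]
grades = fuzzy_memberships((1.0,), centres)
print([round(g, 3) for g in grades])  # [0.9, 0.1]
```

Because the grades of one sample sum to one, they can be read as soft class weights; this is one plausible way in which such values could serve as prior probabilities for a Bayesian classifier.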
Naive Bayes, J48, random forests,
support vector machines and artificial neural
networks were implemented by Sujatha and
Isakki (2016). Bhargavi and Jyothi (2009)
used climate data and crop parameters for
crop yield prediction. Hemageetha (2016)
predicted yield using the Naive Bayes and
Apriori algorithms; the main focus was on
various soil parameters such as pH, nitrogen
and moisture, and a comparison of accuracy
was also presented. An accuracy of only 77
per cent was achieved.
Decision tree and Bayesian classification
Veenadhari (2007) considered the
influence of climatic factors on the
production of major kharif and rabi crops in
Bhopal District of Madhya Pradesh State.
The decision tree analysis indicated that the
productivity of the soybean crop was mostly
influenced by relative humidity, followed by
temperature and rainfall. The productivity of
the paddy crop was mostly influenced by
rainfall, followed by relative evaporation
and humidity. For the wheat crop, the
analysis showed that productivity was
mostly influenced by temperature, followed
by relative humidity and rainfall. The results
of the decision tree were confirmed by
Bayesian classification. The rules formed
from the decision tree are useful for
identifying the conditions leading to high or
low crop productivity.
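One common way in which a decision tree ranks the influence of factors is information gain, the criterion used by ID3-style trees. Whether the cited study used this exact criterion is not stated, so the sketch below is only an assumption-based illustration; the season records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical season records (not data from the cited study).
records = [
    {"humidity": "high", "rainfall": "high", "productivity": "high"},
    {"humidity": "high", "rainfall": "low", "productivity": "high"},
    {"humidity": "low", "rainfall": "high", "productivity": "low"},
    {"humidity": "low", "rainfall": "low", "productivity": "low"},
]

gains = {a: information_gain(records, a, "productivity") for a in ("humidity", "rainfall")}
best = max(gains, key=gains.get)
print(best)  # humidity
```

The attribute with the largest gain is placed at the root of the tree, which corresponds to the "mostly influenced by" factor in the findings above; each root-to-leaf path then reads as an if-then rule.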
Shalvi and De Claris (1998) stated
that the Bayesian network is a powerful tool
that is broadly used with agricultural
datasets. A model for agricultural
applications was developed based on the
Bayesian network learning method. The
results showed that Bayesian networks are
feasible and efficient. A Bayesian approach
improves hydrogeological site
characterization even when low-resolution
resistivity surveys are used.
K-nearest neighbour
The k-nearest neighbor classification
algorithm may be divided into two phases: a
training phase and a testing phase. Bermejo
and Cabestany proposed an adaptive
learning algorithm that allows fewer data
points to be used in the training data set.
Several other techniques have been proposed
to reduce the computational burden of
k-nearest neighbour algorithms
(Chinchulunn et al., 2010). A number of
studies have been carried out on the
application of data mining techniques to
agricultural data sets. For example, the
K-Nearest Neighbor approach is applied for
simulating daily precipitation and other
weather variables (Rajagopalan and Lall,
1999). The K-Nearest Neighbor approach
was also used to estimate forest variables by
analysing satellite imagery (Holmgren and
Thuresson, 1998).
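The k-nearest neighbor rule itself is short: a new sample receives the majority label of its k closest training samples. The following minimal sketch assumes Euclidean distance and k = 3; the weather readings and names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical records: (temperature in deg C, relative humidity in %) -> day type.
train = [
    ((30, 40), "dry"), ((32, 35), "dry"), ((31, 45), "dry"),
    ((24, 85), "wet"), ((23, 90), "wet"), ((25, 80), "wet"),
]
print(knn_predict(train, (26, 78)))  # wet
```

Every prediction scans the whole training set, which is the computational burden that the techniques cited above aim to reduce. Features measured on different scales would normally be normalised before distances are computed.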
Support vector machine
The main idea of the Support Vector
Machine (SVM) is to classify data samples
into two disjoint categories. The basic idea
is to classify sample data that are linearly
separable. Support Vector Machines (SVMs)
are a group of related supervised learning
methods used for classification and
regression (Veenadhari et al., 2011).
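For linearly separable data, a linear SVM classifies a sample by the sign of w.x + b, and training seeks the hyperplane with the largest margin. The sketch below shows only the decision rule and the hinge loss that measures margin violations; it does not train the model, and the hyperplane, samples and names are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def hinge_loss(w, b, samples, labels):
    """Mean of max(0, 1 - y * (w.x + b)); zero when every sample lies on the
    correct side of the hyperplane with a margin of at least 1."""
    losses = [max(0.0, 1.0 - y * decision_value(w, b, x)) for x, y in zip(samples, labels)]
    return sum(losses) / len(losses)


# Hypothetical two-class samples, labelled +1 and -1.
samples = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]
w, b = (0.5, 0.5), 0.0   # hand-picked hyperplane, not the result of training

print(hinge_loss(w, b, samples, labels))  # 0.0
print(classify(w, b, (1.0, 0.5)))         # 1
```

An SVM solver would choose w and b by minimising this loss together with a penalty on the size of w; kernel functions extend the same idea to data that are not linearly separable.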
SVM-based data mining is applied to
climate predictions from the second
generation Coupled Global Climate Model
(CGCM2) to obtain future projections of
precipitation. The results are then analysed
to assess the impact of climate change on
rainfall over India. It is shown that SVMs
provide a promising alternative to
conventional artificial neural networks for
statistical downscaling and are appropriate
for conducting climate impact studies
(Tripathi et al., 2006).
Various changes in weather
scenarios are analysed using SVMs. Data
mining techniques are also applied to sound
recognition problems. Fagerlund (2007)
used SVMs to classify the sounds of birds
and other sounds. Camps-Valls et al. (2003)
used support vector machines for crop
classification using hyperspectral data.
In the field of agriculture, two or more
data mining techniques can be applied
together. Some are related to weather
conditions or forecasts.
Ramesh and Vishnu Vardhan (2013)
applied two data mining techniques to
rainfall data and compared yield prediction
based on rainfall between the MLR
technique and K-Means. The reported
accuracy of the estimated average
production was 98 per cent using the MLR
technique and 96 per cent using the
K-Means algorithm. Veenadhari et al. (2014)
discussed in detail the use in agriculture of
data mining techniques such as k-means, the
ID3 algorithm, the k-nearest neighbor,
support vector machines and artificial neural
networks.
For weather forecasting, Bendre et
al. (2015) used MapReduce and a linear
regression algorithm. An effective model for
improving the accuracy of rainfall
forecasting was investigated. The
forecasting was based on weather data only.
CONCLUSION
Agriculture is a most important
application area, particularly in developing
countries like India. The use of information
technology in agriculture can change
decision making, and farmers can obtain
better yields. Data mining plays a crucial
role in decision making on several issues
related to agriculture. This paper discusses
the role of data mining in agriculture and
related work by several authors in the
agricultural domain. There are growing
applications for data mining techniques in
agriculture. This is a relatively new research
field and it is expected to grow in the future.
Using data mining techniques in agriculture
can revolutionize the current state of
decision making and help farmers improve
their yields. This overview of data mining
techniques related to the agricultural domain
is useful for researchers seeking information
on the current state of data mining
techniques and applications in agriculture.
REFERENCES
Bendre, M. R.; Thool, R. C. and Thool, V.
R. (2015). Big data in precision
agriculture: weather forecasting for
future farming. 1st International
Conference on Next Generation
Computing Technologies, pp.744-
750.
Beniwal, S. and Arora, J. (2012).
Classification and feature selection
techniques in data mining, Int. J.
Engg. Res. Tech. 1(6): 1-6.
Bhargavi, P. and Jyothi, S. (2009). Applying
Naive Bayes data mining technique
for classification of agricultural land
soils. Int. J. Compt. Sci. Network
Security, 9(8): 117-122.
Breiman, L.; Friedman, J. H.; Olshen, A. R.
and Stone, C. J. (1984).
Classification and Regression Trees.
Monterey, Calif., U.S.A.:
Wadsworth, Inc.
Camps-Valls, G.; Gómez-Chova, L.; Calpe-
Maravilla, J.; Soria-Olivas, E.;
Martín-Guerrero, J. D. and Moreno,
J. (2003). Support vector machines
for crop classification using hyper
spectral data. In: Iberian Conference
on Pattern Recognition and Image
Analysis Pattern. pp. 134-141.
Chinchulunn, A.; Xanthopoulos, P.;
Tomaino, V. and Pardalos, P. M.
(2010). Data Mining Techniques in
Agricultural and Environmental
Sciences. Int. J. Agril. Environ. Info.
Syst., 1(1): 26-40.
Deshpande, R. S. (2003). An analysis of the
results of crop cutting experiments.
Agricultural Development and Rural
Transformation Unit, Institute for
Social and Economic Change
Nagarbhavi, Bangalore, p. 6.
Elizondo, D. A.; McClendon, R. W. and
Hoogenboom, G. (1994). Neural
network models for predicting
flowering and physiological maturity
of soybean. Transactions of the
American Soc. Agric. Engineers
(USA), 37(3): 981-988.
Fagerlund, S. (2007). Bird species
recognition using Support Vector
Machines. EURASIP J. Adv. Signal
Processing, Article ID 8637: 1-8.
Fathima, G. N. and Geetha, R. (2014).
Agriculture crop pattern using data
mining techniques. Int. J. Adv. Res.
Computer Sci. Engg., 4(5): 781-786.
Fayyad, U.; Piatetsky-Shapiro, G. and
Smyth, P. (1996). From data mining
to knowledge discovery in databases.
AI Magazine, 17(3): 37-54.
Gleason, C. P. (1982). Large area yield
estimation/forecasting using plant
process models. Presentation at the
Winter Meeting American Society of
Agricultural Engineers, Palmer
House, Chicago, Illinois. December
14-17, 1982.
Hemageetha, N. (2016). A survey on
application of data mining
techniques to analyse the soil for
agricultural purpose. 3rd International
Conference on Computing for
Sustainable Global Development
(INDIACom), pp.3112-3117.
Holmgren, P. and Thuresson, T. (1998).
Satellite remote sensing for forestry
planning: a review. Scand. J. For.
Res., 13(1): 90-110.
Jagielska, L.; Matthews, C. and Whitfort,
T. (1999). An investigation into the
application of neural networks, fuzzy
logic, genetic algorithms, and rough
sets to automated knowledge
acquisition for classification
problems. Neurocomputing, 24: 37-
54.
Jorquera, H.; Perez, R.; Cipriano, A. and
Acuna, G. (2001). Short term
forecasting of air pollution episodes.
In: Zannetti, P. (eds). Environmental
Modeling 4. WIT Press, UK.
Just, R. E. and Weinenger, Q. (1999). Are
crop yields normally distributed?
American J. Agric. Econ., 81: 287-
304.
Maier, H. R. and Dandy, G. C. (2000).
Neural networks for the prediction
and forecasting of water resources
variables: a review of modelling
issues and applications. Environ.
Modeling Software, 15(1): 101-124.
Meyer, G. E.; Camargo Neto, J.; Jones, D.
D. and Hindman, T. W. (2004).
Intensified fuzzy clusters for
classifying plant, soil, and residue
regions of interest from color
images. Comput. Electronics Agric.,
42(3): 161-180.
Mucherino, A.; Papajorgji, P. and
Pardalos, P. M. (2009). A survey of
data mining technique applied to
agriculture. Operational Res., 9(2):
121-140.
Rajagopalan, B. and Lall, U. (1999). A K-
nearest neighbor simulator for daily
precipitation and other weather
variable, Water Resour. Res., 35:
3089-3101.
Ramesh, D. and Vishnu Vardhan, B. (2013).
Data mining techniques and
applications to agricultural yield
data. Int. J. Adv. Res. Compu.
Communi. Engg., 2(9): 3477-3480.
Sawaitul, S. D.; Wagh, K. P. and Chatur, P.
N. (2012). Classification and
prediction of future weather by using
back propagation algorithm - An
approach. Int. J. Emerging Tech.
Adv. Engg., 2(1): 110-113.
Sellam, V. and Poovammal, E. (2016).
Prediction of crop yield using
regression analysis. Indian J. Sci.
Tech., 9(38): 1-5.
Shalvi, D. and De Claris, N. (1998).
Unsupervised neural network
approach to medical data mining
techniques. In: Proceedings of IEEE
International Joint Conference on
Neural Networks, (Alaska), pp. 171-
176.
Somvanshi, V. K.; Pandey, O. P.; Agrawal,
P. K.; Kalanker, N. V.; Ravi Prakash,
M. and Chand, R. (2006). Modeling
and prediction of rainfall using
artificial neural and ARIMA
techniques. J. Indian Geophys
Union, 10(2): 141-151.
Srikant, R.; Quoc, V. and Agrawal, R.
(1997). Mining association rules
with item constraints. Proceedings of
the Third International Conference
on Knowledge Discovery and Data
Mining. pp. 67-73.
Sujatha, R. and Isakki, P. (2016). A study on
crop yield forecasting using
classification techniques,
International Conference on
Computing Technologies and
Intelligent Data Engineering
(ICCTIDE), pp.1-4.
Tellaeche, A.; Burgos-Artizzu, X. P.; Pajares,
G. and Ribeiro, A. (2007). A vision-
based hybrid classifier for weeds
detection in precision agriculture
through the Bayesian and Fuzzy k-
Means paradigms. In: Innovations in
Hybrid Intelligent Systems. Springer
Berlin Heidelberg.
Tripathi, S.; Srinivas, V. V. and Nanjundiah,
R. S. (2006). Downscaling of
precipitation for climate change
scenarios: a support vector machine
approach. J. Hydro., 330(3): 621-
640.
Urtubia, A.; Pérez-Correa, J. R.; Soto, A.,
and Pszczolkowski, P. (2007). Using
data mining techniques to predict
industrial wine problem
fermentations. Food Control, 18(12):
1512-1517.
Veenadhari, S. (2007). Crop productivity
mapping based on decision tree and
Bayesian classification. M. Tech
Thesis (Unpublished) submitted to
Makhanlal Chaturvedi National
University of Journalism and
Communication, Bhopal.
Veenadhari, S.; Misra, B. and Singh, C. D.
(2011). Data mining techniques for
predicting crop productivity A
review article. Int. J. Comp. Sci.
Tech., 2(1): 98-100.
Veenadhari, S.; Misra, B. and Singh, C. D.
(2014). Machine learning approach
for forecasting crop yield based on
climatic parameters. International
Conference on Computer
Communication and Informatics,
pp.1-5.
Verheyen, K.; Adriaens, D.; Hermy, M. and
Deckers, S. (2001). High resolution
continuous soil classification using
morphological soil profile
descriptions. Geoderma, 101: 31-48.
Xu, R. and Wunsch, D. (2005). Survey of
clustering algorithms. IEEE
Transactions on Neural Networks,
16(3): 645-678.
Zaki, M. J. (1999). Parallel and distributed
association mining: A survey. IEEE
Concurrency, 7(4): 14-25.
[MS received : September 22, 2017] [MS accepted : October 07, 2017]
... Data mining in agriculture has been the rece method adopted in minimizing agricultural risk and ensuring food security. Data mining is an expressively production of previously unrevealed set of records to probably useful and materialistic data [32]. Data mining techniques can be divided into two groups which are classification and cluster techniques (Patel et al., 2017). ...
... Data mining is an expressively production of previously unrevealed set of records to probably useful and materialistic data [32]. Data mining techniques can be divided into two groups which are classification and cluster techniques (Patel et al., 2017). Naive Bayes data mining techniques is used to classify soils that analyse large soil profile experimental data set. ...
... Naive Bayes data mining techniques is used to classify soils that analyse large soil profile experimental data set. Likewise, Decision tree algorithm in data mining is used for predicting soil fertility [32]. Data mining techniques in agriculture are but not limited to: neural network, k-means, Decision tree, Bayasian classification and k-nearest neighbour. ...
Conference Paper
Achieving food and environmental security under future climate change is a great challenge for the agricultural community. Food security is an essential precursor to environmental protection, and food production is likely to maintain priority over environmental protection. This article reviews the potential of applying control systems and digital techniques in agricultural operations for food and environmental security. The likely impacts of control systems and digital techniques on the other important dimensions of food security are discussed qualitatively. Finally, current assessment studies are discussed, with suggested improvements and proposed techniques for new approaches. In modern agriculture, the application of smart control systems and digital techniques is therefore crucial for sustaining future food and environmental security. Such systems make it possible to integrate and manage natural resources, human resources, pests, diseases, climatic conditions, nutrients, and other resources efficiently and sustainably. This publication analyzes 50 years of data from 108 countries and intends to summarize and discuss past and current evidence, suggest improvements, and propose control systems and digital techniques for achieving smarter agriculture for environmental and food security.
... Besides, every farmer is eager to estimate their crops' yields. Applying data mining techniques makes the process of crop yield prediction simple, accurate, and rapid [13]. Many studies have applied data mining techniques such as k-nearest neighbor, Artificial Neural Network, bi-clustering, k-means, Naïve Bayes Classifier, and Support Vector Machine to predict crop yield [13]. As an example, in [14] the researchers used data mining to predict crop yield from crop parameters and climate data. Classification and clustering algorithms can be used to cluster and group yields [13]. Climate is one of the conditions that affect crop yield. ...
... The subdivisions of this tree are placed in several possible subtrees according to the regression function (linear model), usually in the leaves (Arora & Dhir, 2017; Yu-Xun et al., 2014). IBk (lazy.IBk) is a distance-weighted K-nearest neighbors classifier that selects the appropriate value of K based on cross-validation; MLP (MultilayerPerceptron) is composed of an output layer and one or two intermediate layers; and NaiveBayes is a probabilistic classifier that treats the attributes as independent (Frank et al., 2016; Patel & Kathiriya, 2017; Harrison, 2019). ...
Article
The seed sector faces several challenges in making quick and accurate decisions when working with large amounts of data on the physiological quality of seed lots, which makes the process time-consuming and inefficient. Thus, artificial intelligence (AI) emerges as a new technological option in the seed sector to solve database problems in the post-harvest stages. This study aims to use machine learning to classify maize seed lots. Data were obtained from eight maize seed crops from a private company. These data were mined using the following classifiers: J48 (DecisionTree), RandomForest, CVR (ClassificationViaRegression), IBk (lazy.IBk), MLP (MultilayerPerceptron), and NaiveBayes. Cross-validation was used for evaluation, with the data set, including training and testing data, being divided into 10 subsets. The described steps were performed using the Weka software. It is concluded that the results obtained allow the classification of maize seed lots with high accuracy and precision, and that these algorithms can better classify maize seed lots through vigor attributes, thus enabling more accurate decision making based on vigor tests in a reduced evaluation time. Keywords: quality control; classification; artificial intelligence; corn; data mining
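The division into 10 subsets described in this abstract can be sketched as follows. This is a minimal illustration in Python, not the Weka procedure the study used; the function names `k_fold_indices` and `train_test_splits`, the shuffling seed, and the 25-sample size are assumptions made only for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 25 seed lots, evaluated with 10 folds.
splits = list(train_test_splits(n_samples=25, k=10))
```

Each of the ten rounds trains on nine folds and tests on the remaining one, so every sample is used for testing exactly once.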
... Patel Amiksha et al. suggested that accurate and reliable information about crop yield prediction is important for taking decisions in agricultural risk management. Crop yield prediction is also important for the supply chain operations of companies engaged in industries that use agricultural produce as raw material [9]. Carmen Ana Anton et al. proposed that the collaborative data mining technique involves an extensive process of analysing the participating data and their sources, requiring a correlation between sources and high accuracy for the data used [10]. ...
Article
Dispersed agricultural production, diversified agricultural consumption, and the lack of cooperation and docking between small-scale farmers and the market push farmers to exit agriculture. This paper presents an agricultural merchandising information endorsement system that uses data sets processed by data mining to suggest precisely which crops to sow, and which fertilizers and pesticides to use, based on the season and location, using the K-means clustering algorithm and the hierarchical clustering algorithm. The system apprises farmers of the crops to be sown in a specific season and also makes them aware of the current market price of the agricultural product. A system of this sort helps the young generation adapt to long-established traditional farming techniques.
... K-Means, KNN, ANN, and SVM are used to predict harvest yields and to increase the benefit for agri-business. The prediction problem was to analyse crop yields in agriculture [1]. ...
... The best hyperplane is defined as the one that has the maximum distance to the closest points of the two classes. They are popular for their capability to handle numerous continuous and categorical variables [10]. The margin classifier is shown in Fig. 5. ...
... Data mining is the process of extracting useful and important information from large sets of data. Data mining in agriculture is a relatively novel research field [11] [14]. Data mining, through better management and data analysis, can assist agricultural organizations to achieve greater profit. ...
Article
Agriculture is the most important source of livelihood and forms the backbone of India. Present challenges of the agriculture domain include uncertain climatic changes, poor irrigation facilities, and weather uncertainty. Machine learning is one technique that is employed to predict the fertility of the soil in agriculture. Ensemble machine learning techniques aim to create meta-classifiers to produce better predictive performance. The primary focus of this paper is to analyze soil data collected from a soil testing laboratory and to predict fertility using multiple ensemble machine learning algorithms, such as bagging, boosting, and stacking, for better prediction, accuracy, and higher consistency. The soil fertility classes were evaluated using 10 selected attributes. Measurements of different soil parameters have been used for predicting soil fertility. The experimental result shows that the boosting method on the C5.0 algorithm achieved higher accuracy than the other ensemble classifiers, with 98.15%.
Chapter
In the 21st century, the application of technology in the agriculture sector is an area of attention for researchers. Technology is applied for smart farming in all the different stages, including preparation of soil, sowing, adding manure and fertilizers, irrigation, harvesting, and storage. To date, image processing, machine learning, deep learning, the internet of things, data mining, and wireless sensor networks are employed in the agriculture sector. In this article, we survey almost 170 articles in which the latest methodologies are applied. Further, we examine the suggested methods and their reported advantages and limitations. This chapter aims to provide a brief summary for researchers who are working in this field. The study raises awareness of the existing expertise gap and identifies possible future research opportunities for smart farming and precision farming.
Article
Advances in computing and information storage have provided vast amounts of data. The challenge has been to extract knowledge from this raw data, which has led to new methods and techniques, such as data mining, that can bridge the knowledge gap. This research aimed to assess these new data mining techniques and apply them to a soil science database to establish whether meaningful relationships can be found. A large soil data set was extracted from the Department of Soil Sciences and Agricultural Chemistry, S V Agricultural College, Tirupati. The database contains measurements of soil profile data from various locations of Chandragiri Mandal, Chittoor District. The research establishes whether soils can be classified using various data mining techniques. In addition, a comparison involving Naive Bayes classification was made to identify the most effective technique. The outcome of the research may have many benefits for agriculture, soil management and the environment.
Article
In this paper an attempt has been made to review the research studies on the application of data mining techniques in the field of agriculture. Some of the techniques applied in the field of agriculture, such as ID3 algorithms, k-means, k-nearest neighbor, artificial neural networks and support vector machines, are presented. The application of data mining in agriculture is a relatively new approach for forecasting and predicting in agricultural crop and animal management. This article explores the applications of data mining techniques in the field of agriculture and allied sciences.
Article
Data mining is a form of knowledge discovery essential for solving problems in a specific domain. Classification is a technique used for discovering classes of unknown data. Various methods for classification exist, such as Bayesian, decision tree, rule-based and neural network methods. Before applying any mining technique, irrelevant attributes need to be filtered out. Filtering is done using different feature selection techniques, such as wrapper, filter and embedded techniques. This paper is an introductory paper on the different techniques used for classification and feature selection.
Conference Paper
This paper gives an idea of how to discover additional insights from precision agriculture data through a big data approach. We present a scenario for the use of Information and Communication Technology (ICT) services in an agricultural big data environment to collect huge amounts of data. Big data analytics in agricultural applications provides new insight for making advance weather decisions, improving yield productivity, and avoiding unnecessary costs related to harvesting and the use of pesticides and fertilizers. The paper lists the different sources of big data in precision agriculture using ICT components, and the types of structured and unstructured data. It also discusses big data in precision agriculture, an ICT scenario for agricultural big data, the platform, and its future applications and challenges in precision agriculture. Finally, we discuss results using a programming model and a distributed algorithm for data processing and a weather forecasting application.
Article
It is important for farmers to know when various plant development stages occur for making appropriate and timely crop management decisions. Although computer simulation models have been developed to simulate plant growth and development, these models have not always been very accurate in predicting plant development for a wide range of environmental conditions. The objective of this study was to develop a neural network model to predict flowering and physiological maturity for soybean (Glycine max L. Merr.). An artificial neural network is a computer software system consisting of various simple and highly interconnected processing elements similar to the neuron structure found in the human brain. A neural network model was used because it has the capabilities to identify relationships between variables of rather large and complex data bases. For this study, field-observed flowering dates for the cultivar 'Bragg' from experimental studies conducted in Gainesville and Quincy, Florida, and Clayton, North Carolina, were used. Inputs considered for the neural network model were daily maximum and minimum air temperature, photoperiod, and days after planting or days after flowering. The data sets were split into training sets to develop the models and independent data sets to test the models. The average relative error of the test data sets for date of flowering prediction was + 0.143 days (n = 21, R2 = 0.987) and for date of physiological maturity prediction was + 2.19 days (n = 21, R2 = 0.950). It can be concluded from this study that the use of neural network models to predict flowering and physiological maturity dates is promising and needs to be explored further.
Article
Data mining techniques are largely used in different sectors of the economy and they increasingly are playing an important role in agriculture and environment-related areas. This paper aims to show our vision on the importance of knowing and efficiently using data mining and machine learning-related techniques for knowledge discovery in the field of agriculture and environment. Efforts for searching hidden patterns in data are not a recent phenomenon. History shows that extensive observations on data have helped discover empirical laws in different fields of research. Therefore, it is important to provide researchers in agriculture and environmental-related areas with the most advanced knowledge discovery techniques. Data mining is the process of extracting important and useful information from large sets of data. This information can be converted into useful knowledge that could help to better understand the problem in study and to better predict future developments. The paper presents the state of the art in data mining and knowledge discovery techniques and provides discussions for future directions.
Article
Climate impact studies in hydrology often rely on climate change information at fine spatial resolution. However, general circulation models (GCMs), which are among the most advanced tools for estimating future climate change scenarios, operate on a coarse scale. Therefore the output from a GCM has to be downscaled to obtain the information relevant to hydrologic studies. In this paper, a support vector machine (SVM) approach is proposed for statistical downscaling of precipitation at monthly time scale. The effectiveness of this approach is illustrated through its application to meteorological sub-divisions (MSDs) in India. First, climate variables affecting spatio-temporal variation of precipitation at each MSD in India are identified. Following this, the data pertaining to the identified climate variables (predictors) at each MSD are classified using cluster analysis to form two groups, representing wet and dry seasons. For each MSD, an SVM-based downscaling model (DM) is developed for season(s) with significant rainfall using principal components extracted from the predictors as input and the contemporaneous precipitation observed at the MSD as an output. The proposed DM is shown to be superior to conventional downscaling using multi-layer back-propagation artificial neural networks. Subsequently, the SVM-based DM is applied to future climate predictions from the second generation Coupled Global Climate Model (CGCM2) to obtain future projections of precipitation for the MSDs. The results are then analyzed to assess the impact of climate change on precipitation over India. It is shown that SVMs provide a promising alternative to conventional artificial neural networks for statistical downscaling, and are suitable for conducting climate impact studies.
Article
Soil grid data were gathered from 156 points in the 30-ha Muizen forest (Ranst, Belgium). At each grid point, soil profiles were examined morphologically by augering to 120-cm depth. In the laboratory, pH(KCl) was determined on samples from every horizon. To allow numerical analyses, all the morphological attributes were given ordinal scores. The analysis consisted of two parts. First, the master horizons were split up into subtypes using Principal Components Analysis and a non-hierarchical clustering technique. This was necessary to overcome the problem of the anisotropy of the soil profiles, which makes it impossible to pool the data of all the horizons and analyse them together. Next, the distinguished horizon subtypes were used as input for the continuous soil profile classification with the ‘fuzzy k-means with extragrades’ algorithm. Five different soil classes plus an extragrade class were distinguished. The distinguished soil classes exhibited a fair degree of spatial autocorrelation and correlated well with the Belgian Soil Map. The technique developed ensures the compatibility with national or global soil classification systems based on diagnostic horizons and properties on the one hand and the production of high-resolution soil classes for local use on the other. Furthermore, the developed technique allows reanalysis and optimisation of data from previous surveys.
Article
Winemakers currently lack the tools to identify early signs of undesirable fermentation behavior and so are unable to take possible mitigating actions. Data collected from tracking 24 industrial fermentations of Cabernet Sauvignon were used in this study to explore how useful data mining is for detecting anomalous behaviors in advance. A database held periodic measurements of 29 components that included sugar, alcohols, organic acids and amino acids. Owing to the scale of the problem, we used a two-stage classification procedure. First, PCA was used to reduce system dimensionality while preserving metabolite interaction information. Cluster analysis (K-Means) was then performed on the lower-dimensioned system to group fermentations into clusters of similar behavior. Numerous classifications were explored depending on the data used. Initially data from just the first three days were assessed, and then the entire data set was used. Information from the first three days' fermentation behavior provides important clues about the final classification. We also found a strong association between problematic fermentations and specific patterns found by the data mining tools. In short, data from the first three days contain sufficient information to establish the likelihood of a fermentation finishing normally. Results from this study are most encouraging. Data from many more fermentations and of different varieties need to be collected, however, to develop a reliable and more broadly applicable diagnostic tool.
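The second stage of the procedure described in this abstract, K-Means clustering of the reduced data, can be sketched in plain Python. The PCA stage is omitted here, the six two-dimensional points are hypothetical stand-ins for principal-component scores, and the function name `k_means` and the choice to seed the centroids with the first k points (for reproducibility) are assumptions, not the authors' setup.

```python
import math


def k_means(points, k, iterations=100):
    """Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    # Assumption: seed centroids with the first k points so the run is
    # reproducible. Random or k-means++ seeding is more usual in practice.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


# Hypothetical scores on two principal components for six fermentations.
scores = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4),
          (5.0, 5.0), (5.3, 4.8), (4.9, 5.4)]
centroids, labels = k_means(scores, k=2)
```

With these illustrative points the first three and the last three observations end up in separate clusters; real fermentation data would need the PCA step first and a check of how sensitive the grouping is to the initial centroids.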