
Big Data in Agriculture: A Challenge for the Future



Submitted Article
Keith H. Coble*, Ashok K. Mishra, Shannon Ferrell, and
Terry Griffin
Keith H. Coble is the Giles Distinguished Professor, Department of Agricultural
Economics, Mississippi State University. Ashok Mishra is the Kemper and Ethel
Marley Foundation Chair in Food Management, W. P. Carey Morrison School of
Agribusiness, Arizona State University. Shannon Ferrell is an associate professor,
Department of Agricultural Economics, Oklahoma State University. Terry Griffin is
an assistant professor, Department of Agricultural Economics, Kansas State
University.
*Correspondence to be sent to:
Submitted 27 March 2017; editorial decision 21 October 2017.
Abstract This article examines the challenges and opportunities of Big Data and
concludes that these technologies will lead to relevant analysis at every stage
of the agricultural value chain. Big Data is defined by several characteristics
beyond size, particularly the volume, velocity, variety, and veracity of the
data. We discuss a set of analytical techniques that are increasingly relevant
to our profession as it addresses these issues. Ultimately, we conclude that
agricultural and applied economists are uniquely positioned to contribute to
the research and outreach agenda on Big Data. We believe there are relevant
policy, farm management, supply chain, consumer demand, and sustainability
issues where our profession can make major contributions. The authors are
thankful to the anonymous reviewers and editor Craig Gundersen for helpful
comments. Support was provided by the Mississippi Agricultural and Forestry
Experiment Station Special Research Initiative.
Key words: Big Data, precision agriculture, analytical methods.
JEL codes: K11, Q12, Q16, Q18.
A variety of indicators suggest that the availability of sensors, mapping
technology, and tracking technologies has changed many farming systems
and the management of the food system as it flows from producers to
consumers. Big Data has significant potential to address the issues of modern
societies, including the needs of consumers, financial analysts, marketing
agents, producers, and decision makers. While some of these information
technologies have been available for some time, adoption surveys such as
Griffin et al. (2017), Schimmelpfennig (2016), Erickson and Widmar (2015),
and Hennessy, Läpple, and Moran (2016) suggest continued increases in rates
of adoption of the various forms of these technologies.
© The Author(s) 2018. Published by Oxford University Press on behalf of the Agricultural and Applied
Economics Association. All rights reserved. For Permissions, please e-mail:
Applied Economic Perspectives and Policy (2018) volume 40, number 1, pp. 79–96.
Dyer (2016) suggests we have moved to an informational revolution in
the agricultural sector. In many cases, sensor technology and data analytics
from other industries are now applied to agricultural applications. Robert
Fraley, Chief Technology Officer of Monsanto, has stated that “Monsanto
executives are seeking to reposition the company as a business built on data
science and services, as well as its traditional chemicals, seeds and genetic
traits operations”. The $930 million acquisition of Climate Corp in 2013 by
Monsanto evidences this trend (Upbin 2013). The AgFunder Agtech
Investing Report (AgFunder 2017) identifies approximately $1.4 billion of
total investments in two categories in 2015, including robotics, mechaniza-
tion, and other hardware, along with farm management software, sensing,
and Internet of Things (IoT).
A variety of technological advances have created the opportunities of Big
Data (Sonka 2015). In many cases, computational capacity, both in terms of
speed and volume, allows for novel analyses that were previously not possible.
It is now possible to conduct analysis on large volumes of data (such as
weather data) and use it for actionable decision-making. In addition, data
from multiple sources, including public data, machine and sensor data, and
other privately held data, are often integrated. In some applications,
“macro” level analysis is possible that aggregates data to provide useful
industry- or market-level analysis. Conversely, data can affordably be
obtained and utilized at a “micro” scale. In this case, management can occur
at a site-specific or unit level such as sub-field areas. This may mean site-
specific fertilization in crop agriculture, or tracking of the cuts from a beef
carcass to final consumer. These trends lead to numerous discussions of the
present and future impact of “Big Data,” but to date those discussions have
lacked a clear definition of what Big Data means. Coble et al. (2016) suggest
that it refers to “large, diverse, complex, longitudinal, and/or distributed
data sets generated from click streams, email, instruments, Internet
transactions, satellites, sensors, video, and/or all other digital sources
available today and in the future.” Stubbs suggests that the term Big Data, as
applied to agriculture, is less about the size of the data and more about the
combination of technology and advanced analytics that makes the processing of
information more useful and timely. Coble et al. (2016) support this approach
by defining the data in terms of volume, velocity, variety, and veracity, with
“volume” referring to the size of the data, “velocity” measuring the flow of
data, “variety” reflecting the frequent lack of structure or design to the
data, and finally “veracity” reflecting the accuracy and credibility of the
data.
Information technologies provide new and useful data for decision making
and analysis; therefore, they naturally align with the skills and interests
of applied economists. In fact, pockets of this type of economic analysis
have existed for some time. For example, the use of retail scanner data
(Capps 1989) has largely met the Big Data definition provided earlier. Other
areas legitimately claiming the Big Data label include some large-scale
ecological models and certain government-collected survey and government
program data. However, we sense that many new challenges are ahead. Here
are a few issues that appear imminent.
Farm management, long a mainstay of the agricultural economics profes-
sion, has been a relatively dormant area for research in recent decades.
With precision agriculture advances and adoption, new opportunities
and requests are likely to confront our profession. Liu, Swinton, and
Miller (2006) provide a useful case study of how precision agriculture
poses new and relevant questions to our profession. In the present issue,
Featherstone (2018) also provides a useful forward-looking discussion of
these issues.
Food scanner and similar data may be classified as Big Data, given their
volume, velocity, and variety. Scanner data have a wide variety of appli-
cations, including research projects, program evaluations, regulatory im-
pact analysis, and data products. Real-time store scanner data can be
used to study healthy diets (Kuchler, Tegene, and Harris 2005). Food
scanner data have been linked to USDA nutritional data and the USDA’s
National Household Food Acquisition and Purchase Survey (FoodAPS)
to provide information about the food environment, such as prices and
offerings at stores where the consumers did not go shopping.
The use of geo-spatial techniques could improve modeling of crop yield
and, by extension, pricing of crop insurance products (Ker and Coble
2003; Ozaki, Ghosh, and Goodwin 2008; Annan et al. 2014; Woodard and
Verteramo-Chiu 2017). We expect the profession to find use for these
techniques across many sub-disciplines, including environmental
economics (e.g., determining demand for non-production services using
commercial satellite imagery data).
Food and agricultural policy analysis is likely to evolve with new data
and analytical techniques. Environmental management at a micro- and
macro-scale will be enhanced. For example, nitrogen management at the
sub-field level or for a major watershed will be possible with precision in
places where it was not possible before. Another example is the use of low-cost
commercial satellite imagery and Big Data to develop daily models
of non-market values of wildernesses.
The role of program data and government data collection will be changed
in fundamental ways to reflect new data sources and analytics. In some
instances, digital agriculture may allow enhanced analysis of government
program data (Woodard 2016). Tack et al. (2017) discuss potential
competition between public surveys and private data. This discussion is
illustrated by the USDA National Agricultural Statistics Service requesting a
National Academies of Sciences, Engineering, and Medicine panel review
of yield and cash rent estimation methods used by the USDA. A crucial
question addressed in this report was the integration of survey data with
government program data and models based on imagery (National
Academies of Sciences, Engineering, and Medicine 2017).
Much of the useful Big Data produced today is in the hands of the private
sector. Coble et al. (2016) suggest that the landscape of public and private
farm data is likely to change and access to data for research will be a critical
issue. In many cases the data are held by a variety of firms ranging from
small individual farms to large corporate input suppliers. Tremblay (2017)
argues that agricultural research must reach beyond Fisher’s experimental
design and utilize analytical techniques capable of learning from
machine and sensor data; that is, it must rely upon observational farm
production data along with data from controlled experiments.
Economic Research Priorities for Agricultural Big Data
Within the changing agricultural management landscape created by
advancements in Big Data, what areas should be prioritized for economic re-
search? Numerous opportunities present themselves, ranging from farm-
level to societal benefits.
Precision Agriculture and Farm Management
To understand the connection between “Small Data” and “Big Data” in
agriculture, it is useful to discuss the prevalence of farm-level sensors and
other precision agriculture technology (figure 1). Perhaps ironically, the evo-
lution and revolution in agricultural Big Data comes from the expansion of
“Small Data” in agriculture; that is, the remarkable growth in producers’
ability to collect data pertaining only to their own operation through the
growth of techniques and technologies such as grid soil sampling, telematics
systems for farm equipment, Global Navigation Satellite Systems (GNSS),
farm aerial imagery acquired via small unmanned aerial systems (sUAS),
and the like. In simplest terms, farms use “Small Data” when data are iso-
lated to the fields where the data originated. Farmers who use information
technology to conduct their own on-farm experiments, document yield pen-
alties from poor drainage, or negotiate crop share agreements are using data
that is considered “small.” Producer adoption of these information technolo-
gies has increased dramatically in recent years (Griffin et al. 2017), giving
rise to a profusion of agricultural data heretofore unseen (Erickson and
Widmar 2015).
The new abundance of field-level information provided by these
technologies could improve the ability of producers to make profit-maximizing
decisions that benefit the producer operating the field, that is, “Small Data”
(Griffin et al. 2017). However, pooling the datasets of hundreds or
thousands of fields could hold a much greater potential value both to individual
producers and the agricultural industry as a whole. Agricultural Big Data
(farm data that has been combined into an aggregate form) has the
potential to reveal undiscovered insights. Currently, only limited quantitative
evidence exists regarding the value of assembling data from precision
agriculture technology into a community; however, indirect evidence
suggests that farm data has economic value.
One conceptual example of farm-level decision making is analysis based
upon product-by-environment-by-management scenarios, or the so-called
G x E x M relationship. Historically, agricultural research focused on the
interaction between inputs (such as a specific grain variety) and the environment
(such as the presence of a given profile of soil nutrients and expected
precipitation). One could refer to this as Genetics by Environment (“G x E”)
analysis. This approach generally excluded farmers’ management practice
variables in the analyses, in part because focus was still on the one-field-
at-a-time paradigm such that the specific farmer’s management practices
were held constant for that field. Big Data’s inclusion of outcomes from dif-
fering management strategies, from numerous fields employing a variety of
inputs and environmental conditions could enable evaluation of the pro-
ducers’ management decisions as a variable as well, creating genetics by en-
vironment by management (“G x E x M”) analyses where “genetics” loosely
represents any product or system. Traditional agricultural research has
focused on the phenotypic product-by-environment interaction rather than
including the farmer and their management practices as a variable in the
analyses. The utilization of farm data originating from precision agricultural
technology is guiding decisions not only at the farm level but also for the
manufacturers of inputs and equipment. This possibility opens innumerable
avenues for research on the impacts of management practices on production
outcomes, and could profoundly impact the sub-discipline of farm
management. Farmers are but one of the many players attempting to benefit from
Big Data. The marginal benefit differs not only across groups of players
but also along the Big Data lifecycle. The economics of networks, that is,
network externalities, describe how individuals benefit from participation in
a community or network (Varian 1999). The data from farms aggregated
into the community are more valuable than data from any one farm would
be individually. Given the network effects, the value of the data community
is a function of the number of members of the system, and the data service
provider enjoys much greater benefits than any other groups in the long
run. However, in the short run, data service providers are likely to entice
farms to join their network at least up to the point where a critical number
of farms have joined (Varian 1999).
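The G x E x M idea discussed above is not written out formally in the text. One conventional way to express it, offered only as a sketch, is a three-way factorial model. The symbols below are illustrative notation and do not appear in the original discussion:

```latex
y_{ijk} = \mu + G_i + E_j + M_k
        + (GE)_{ij} + (GM)_{ik} + (EM)_{jk} + (GEM)_{ijk}
        + \varepsilon_{ijk}
```

Here $y_{ijk}$ would be an outcome such as yield for product $i$ in environment $j$ under management practice $k$. Under the one-field-at-a-time paradigm, $k$ is effectively fixed, so the terms involving $M$ cannot be separated from the others; pooled data from many farms with differing practices is what would make them estimable.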
When farm data are aggregated into a community, the secondary uses of
the data have a greater value than the summation of the initial uses of that
data (Mayer-Schönberger and Cukier 2014). The distinction between Small
Data and Big Data can be made clear by examining how the data fit into
the initial or primary use of data versus the re-use or secondary use of that
same data. For example, the initial uses of yield monitors may include
documenting yields near drainage structures, while farm-level data on soil
nutrient testing and subsequent as-applied variable rate fertility information
are used by the farmer to fine-tune sub-field production. In the aggregate, these
same data (site-specific yield and soil tests plus as-applied fertility),
combined with similar data from thousands of other farmers, provide insights
into nutrient run-off. It is the re-use of farm data that gives rise to Big Data
and the ability to assess environmental issues.
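The primary versus secondary use distinction above can be illustrated with a toy calculation. The sketch below pools field records from several farms and computes a partial nitrogen balance (nitrogen applied minus nitrogen removed in grain) by watershed. Every name in it (`FieldRecord`, `nitrogen_surplus_by_watershed`, the watershed labels) is hypothetical, the records are made up, and the removal coefficient is a placeholder supplied by the user rather than a value from the article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FieldRecord:
    """One field-season of farm data (all field names are hypothetical)."""
    farm_id: str
    watershed: str                 # grouping key for the secondary, aggregate use
    acres: float
    yield_bu_per_acre: float       # e.g., from a yield monitor
    n_applied_lb_per_acre: float   # e.g., from an as-applied fertilizer map


def nitrogen_surplus_by_watershed(records, n_removed_lb_per_bu):
    """Sum (N applied - N removed in grain) over fields, grouped by watershed.

    n_removed_lb_per_bu is a user-supplied agronomic coefficient; no value
    for it is taken from the article.
    """
    totals = defaultdict(float)
    for r in records:
        removed_per_acre = r.yield_bu_per_acre * n_removed_lb_per_bu
        totals[r.watershed] += (r.n_applied_lb_per_acre - removed_per_acre) * r.acres
    return dict(totals)


# Made-up records from three farms; 0.7 lb N per bushel is only a placeholder.
pooled = [
    FieldRecord("farm_a", "W1", acres=100, yield_bu_per_acre=180, n_applied_lb_per_acre=200),
    FieldRecord("farm_b", "W1", acres=50, yield_bu_per_acre=150, n_applied_lb_per_acre=140),
    FieldRecord("farm_c", "W2", acres=80, yield_bu_per_acre=200, n_applied_lb_per_acre=160),
]
surplus = nitrogen_surplus_by_watershed(pooled, n_removed_lb_per_bu=0.7)
print(surplus)
```

Each farmer would use the same yield and as-applied records for field-level decisions; only the pooled view supports a watershed-level estimate.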
Figure 1 Proportion of Kansas farms using precision agriculture technology
(N = 455). Horizontal axis: year, 1995 to 2015; vertical axis: percent of farms.
At the current position along the lifecycle of Big Data, data service
providers strive to entice a critical mass of farmers to submit farm data so that
the repository is replete (Coble et al. 2016). This is in part due to the fact that
the value of a farm data community eventually depends on the number of
farms and acres in the system, that is, the size of the network. Early in the
farm data community lifecycle, data service providers may entice farmers,
especially given the nonhomogeneous characteristics of farm data and
farmers. This lack of homogeneity may result from farms with varying levels of
data quality; for example, some farmers are known to calibrate yield
monitors properly, while other farmers may not correctly tag corn hybrids to
fields. Further, some farms may be able to provide quality data from
substantially larger acreages, while other farms may have limited acreage on
which precision agriculture sensors were utilized. In addition to quantity and
quality concerns, some farms may be perceived as local leaders. When these
local leaders join the data community, other farmers are likely to follow.
However, it should be noted that only a few exceptions fit the above criteria,
and the overwhelming majority of farms are likely to voluntarily join the
system, with most even paying a fee. Essentially, data service providers are
vying to become what is expected to be a natural monopoly. In the long run,
the group that controls the data system enjoys the majority of the value
(Mayer-Schönberger and Cukier 2014). Therefore, the next wave of farm
management education is likely to focus on farm data issues and whether
farms should relinquish control of farm data to third parties.
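The critical-mass argument above can be made concrete with a simple threshold model. The sketch below is an assumption-laden toy, not a model from the article: the benefit of membership is assumed to be proportional to the number of current members, joining costs are assumed to be spread evenly across farms, and the names `simulate_membership`, `join_costs`, and `benefit_per_member` are hypothetical.

```python
import numpy as np


def simulate_membership(seed_members, join_costs, benefit_per_member, max_steps=100):
    """Iterate a threshold model of network membership.

    Each period, a farm participates if the network benefit (assumed
    proportional to last period's membership) covers its own joining cost.
    Returns the membership count in each period.
    """
    join_costs = np.asarray(join_costs, dtype=float)
    members = int(seed_members)
    path = [members]
    for _ in range(max_steps):
        benefit = benefit_per_member * members
        next_members = int(np.sum(join_costs <= benefit))
        path.append(next_members)
        if next_members == members:  # membership has stopped changing
            break
        members = next_members
    return path


# 1,000 hypothetical farms with joining costs spread evenly from 20 to 80.
costs = np.linspace(20.0, 80.0, 1000)
below = simulate_membership(400, costs, benefit_per_member=0.1)
above = simulate_membership(600, costs, benefit_per_member=0.1)
print("seeded with 400 farms:", below)
print("seeded with 600 farms:", above)
```

Under these assumptions the tipping point sits near 500 farms: a network seeded below it unravels, while one seeded above it grows until every farm has joined. That is one way to read why a provider might subsidize early participants only until critical mass is reached.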
Policy and Legal Implications
As mentioned above, Big Data has the potential to expand and deepen
the tools for evaluation of farm-level decisions; by the same token, it could
also expand the ability to evaluate the effect of policy interventions on the
agricultural macroeconomy. In the long term, the growth of Big Data may
give rise to new models for the evaluation of policy shocks to economic sys-
tems, but in the near term, the availability of larger and potentially more ro-
bust datasets may increase the accuracy of existing model outputs.
While Big Data eventually may impact the ability to evaluate policy deci-
sions, its growth necessitates policy decisions today. Big Data carries the
ability for potentially market-distorting actions as discussed below, but be-
fore one can have Big Data, individual producers must be willing to share
their data. Concerns about data ownership and protections against both de-
liberate and inadvertent data disclosure abound among producers.
Currently, there is no federal legislation protecting farm data as there is
for health data (such as HIPAA) or personal financial data (FCRA 1970).
Significant discussion on these points has led to several public dialogues
calling for both public and private policies regarding farm data protections,
such as the “Privacy and Security Principles for Farm Data” coordinated by
the American Farm Bureau Federation (American Farm Bureau Federation
2017). Federal policymakers have taken note of these issues as well (House
Agriculture Committee 2015). While these policy discussions continue, there
has been no action at the federal level regarding farm data protections.
Until the data privacy issues are resolved, Big Data systems are reliant
upon farmers both trusting data aggregators and sharing farm-level data
for use in the aggregate. Farmers have typically readily shared their farm
production and financial data, including geo-referenced farm data (Griffin,
Reichlin, and Small 2008) with trusted partners such as universities; how-
ever, existing transfer systems are time-consuming and inefficient. Both par-
ties would benefit from an improved system of transferring data, preferably
wirelessly and in real-time. Farmers are being incentivized to share farm
data via low-cost or “freemium” models, and some services are providing
rudimentary comparative analysis, that is, agronomic and financial bench-
marking, in exchange for providing data. Given that the agricultural indus-
try is currently in the infancy of Big Data services, it is expected that farmers
must be enticed to join data systems and share farm data. In the longer run,
it is expected that farmers will freely join and even pay to participate in Big
Data services; however, it is unclear when a critical mass of farms and acre-
age will enroll.
Economic theory applied to networks suggests that when a critical mass
of users, that is, farms or acreage, join the system, the membership will ex-
ponentially increase. However, until critical mass is achieved, the growth of
data services is expected to be slow. At least one example of farmers being
paid for data exists; in 2016, Farmobile guaranteed their customers in south-
ern Minnesota that they would receive at least $2 per acre (Farmobile 2016).
In 2017, the company expanded a similar offering to other customers, but at
$1 per acre. During the infancy of Big Data, incentives such as this may be
relatively more common than at any other time along the life cycle. Once a
critical mass of acreage exists with one data company, farmers are expected
to freely join that company and submit data from their farm. Essentially,
farmers are poised to share farm data with third parties, especially if clear
benefit-cost analyses indicate some perceived tangible or even intangible
benefit.
Asymmetric Information Implications
The Holy Grail for market participants is to obtain perfect information as
soon as it is knowable, and preferably before it is knowable to others. While
Big Data has a long way to go before achieving this, bigger steps
toward that goal are being taken faster than ever before. Thus, a significant
concern with aggregating agricultural data is whether, legitimately
or not, a small number of market participants (or a single actor) could
gain access to information sufficient to move (or even manipulate) markets
faster than, or to the exclusion of, other market participants. While there are
numerous rules in place to deal with a broad range of market-manipulating
activities, none of these current rules contemplate the type of actions that
could take place with a sufficiently large aggregated dataset. Currently,
there are various rules restricting insider trading (see 17 C.F.R. §1.59(a); 17
C.F.R. § 1.3(ee)), and government employees are prohibited from using data
for financial gain that has not been disseminated to the public (7 U.S.C.
§6c(a)(3)). However, there are no rules governing “very good market
information” such as that which could be obtained through completely legal
means by aggregating sufficient telematics data (as an example). As a result,
research on the potential market effects of growing market asymmetries that
could be triggered by growing Big Data aggregations, and on the implications
of policies restricting the use of aggregated data in commodity market
transactions, could do much to inform the development of law in this arena.
A farmer’s decision to join a farm data network is likely a function of how
they perceive their data and its value. Farmers who view farm data as an in-
tangible resource may fear that relinquishing data may reduce their local ne-
gotiation power with landowners, retailers, and other service providers
(Griffin et al. 2016). Further, some farmers may opt not to participate in data
communities for fear that others may disproportionately benefit from their
participation or the data that they bring into the system.
Sustainability and Traceability
While Big Data holds the promise of numerous economic benefits, it also
creates opportunities for environmental benefits. Agricultural pollutant run-
off has been a source of growing concern for water quality in the
Chesapeake Bay, the Gulf of Mexico, and the Great Lakes. Traditionally,
agricultural runoff has been difficult to regulate by virtue of the “non-point”
nature of such runoff, and the fact that it is not directly regulated under the
Clean Water Act’s National Pollutant Discharge Elimination System
(NPDES; 40 C.F.R. § 122.3). Historically, nutrient runoff was addressed
through voluntary programs through the Clean Water Act’s non-point
source management program, which provides funding to state programs
aimed at reducing nutrient releases in agricultural storm water runoff (33
U.S.C. § 1329). However, in some circumstances (such as with pollution con-
cerns in the Chesapeake Bay), a “total maximum daily load” or TMDL may
be imposed with the effect of requiring states to develop enforceable nutri-
ent management plans (33 U.S.C. § 1313(d)). Both “small ag data” and Big
Data have prospective roles to play in helping address nutrient runoff con-
cerns. The increased adoption of precision agricultural tools at the farm level
holds the potential to actually decrease nutrient application by matching nu-
trient inputs more closely to plant needs; at the regional level, this could re-
duce overall nutrient loading to sensitive waterways. The “as-applied”
maps generated from precision agriculture tools could also facilitate farm-
ers’ ability to demonstrate compliance with nutrient management plans by
showing the specific amount and location of nutrient applications (though
this raises separate issues of sensor calibration and accuracy; Sisung 2016).
Big Data tools could also significantly advance the tools used to manage
nutrient concerns at the regional level through improved evaluation of policy
instruments and modeling of nutrient management strategies such as
nutrient “cap and trade” systems.
Concerns about food safety and consumer desires for more information
about the sourcing of their food could also be addressed through small and
Big Data. Telematics systems from the tractor to the retail center
create the possibility of complete “farm to fork” tracking of foodstuffs, which
would enable disease traceability, while metadata collected along the
distribution chain could be used to provide support for source verification and
compliance with any number of production practice requirements.
From this discussion, it can be seen that there are numerous potential
economic research questions to be answered through the application of
precision agriculture and Big Data tools, and as the power of those tools grows, so
will the calls for agricultural economists to respond to these and other
questions. But how will those tools actually help find answers? To unlock the
potential of evidence-based decision-making, entities or organizations need to
convert the high volume, high frequency, and diverse data into meaningful
insights. In this process, Labrinidis and Jagadish (2012) note that the
extraction of insights can be broken down into two stages, namely, data
management and analytics. Data management, on the one hand, includes processes
and supporting technologies to acquire and store data. Data are then
prepared, transformed, and retrieved for analysis. Diebold (2012) notes that Big
Data can lead to much stronger conclusions for data-mining applications.
On the other hand, analytics refers to techniques that can be used to analyze
and acquire information or intelligence from Big Data. Several Big Data
techniques can be used to analyze both structured and unstructured data.
These include (a) text analytics; (b) audio analytics; (c) social media analytics;
(d) video analytics; and (e) predictive analytics. In applied economics, the
main focus is on predictive analytics.
Research Methods Using Big Data
Predictive analytics includes a variety of techniques or procedures that
can predict future outcomes based on historical and/or current
data. For example, we can predict consumers’ buying habits based on what
they buy, when they buy, and even what they write about the product
that they bought on social media. One of the hallmarks of predictive
analytics is seeking to uncover patterns and relationships in data; tools for
accomplishing this can be subdivided into two groups. First, techniques such as
moving averages attempt to discover historical patterns in the outcome
variables and then extrapolate them to predict the future. Second is regression
analysis, which is well known in our profession.
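The two groups of predictive tools named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the yield series is made up, the function names `moving_average_forecast` and `linear_trend_forecast` are hypothetical, and the regression is a simple linear time trend fitted by ordinary least squares rather than any specification used by the authors.

```python
import numpy as np


def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    y = np.asarray(series, dtype=float)
    if window < 1 or window > y.size:
        raise ValueError("window must be between 1 and the series length")
    return float(y[-window:].mean())


def linear_trend_forecast(series, steps_ahead=1):
    """Fit y = a + b*t by ordinary least squares and extrapolate the trend."""
    y = np.asarray(series, dtype=float)
    t = np.arange(y.size, dtype=float)
    design = np.column_stack([np.ones_like(t), t])      # intercept and time index
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)   # coef = [a, b]
    t_future = y.size - 1 + steps_ahead
    return float(coef[0] + coef[1] * t_future)


# Made-up annual yields (bushels per acre) for illustration only.
yields = [150, 152, 155, 157, 160, 163]
print("3-year moving average forecast:", moving_average_forecast(yields, window=3))
print("linear trend forecast:", linear_trend_forecast(yields))
```

The moving average only smooths the recent history of the outcome itself, whereas the regression relates the outcome to an explanatory variable (here, time) and could be extended with further regressors.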
Recall that all precision agriculture is based on statistical methods, and
the conventional statistical methods behind it may not apply to the problems
being addressed by Big Data. There are several reasons for this. For
example, conventional statistical methods are based on statistical significance,
where results from a small sample (obtained from the population) are
examined for the significance of particular relationships, and the
conclusions are generalized to the entire population. However, in the
case of Big Data, which are massive in size, the “sample” may actually
represent the majority of, or the entire, population; that is, the sample size
equals “all” (Mayer-Schönberger and Cukier 2014). Therefore, statistical
significance tests, especially those aimed at samples from a population, are
not relevant to Big Data. Finally, Fan, Han, and Liu (2014) point out that
Big Data has heterogeneity, noise accumulation, spurious correlations, and
incidental endogeneity. In other words, the underlying concept of Big Data
relies relatively more on correlation and less on causation than the
theory-based science upon which agricultural economics analyses have largely
been based.
Big Data is heterogeneous because it represents information from different
sub-populations and from different sources. The sheer size of Big Data
helps us in modeling heterogeneity and requires sophisticated statistical
techniques. Since estimation of predictive models using Big Data often
involves the simultaneous estimation of several parameters, it may give rise
to accumulated error terms. As a result, the true effect of variables may be
masked. Fan and Lv (2008) show, through simulation modeling,
that the spurious correlation between independent variables tends to increase
with the dimensionality of the dataset. Therefore, in Big Data analysis,
because of high dimensionality, some variables that should not be in the model
(because they are unrelated) may nonetheless appear correlated. Finally, recall
that in regression modeling, we assume exogeneity: the error term is
independent of the predictors or the
explanatory variables. The assumption of exogeneity is usually met in small
samples, but incidental endogeneity is commonly present in Big Data.
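The spurious-correlation point can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the design used by Fan and Lv (2008): the sample size, the dimensions, the seed, and the helper name `max_abs_spurious_correlation` are all chosen here for convenience. Every predictor is generated independently of the outcome, yet the largest sample correlation found grows as more predictors are screened.

```python
import numpy as np

def max_abs_spurious_correlation(x, y):
    """Largest |sample correlation| between y and any column of x."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    corr = (xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))
    return float(np.max(np.abs(corr)))

rng = np.random.default_rng(0)
n_obs, n_vars = 50, 2000
x = rng.standard_normal((n_obs, n_vars))   # predictors, generated independently of y
y = rng.standard_normal(n_obs)             # outcome, unrelated to every predictor

# The column sets are nested, so each larger set contains the smaller ones.
results = {p: max_abs_spurious_correlation(x[:, :p], y) for p in (10, 100, 2000)}
for p, r in results.items():
    print(f"p = {p:5d}   max |corr| = {r:.2f}")
```

Because the predictor sets are nested, the maximum cannot fall as the dimension rises; the practical concern is how large it becomes even though no true relationship exists.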
Machine Learning
Machine learning, a branch of computer science and one of the major
areas of artificial intelligence, can be used to construct algorithms to exploit
the potential value of Big Data. Note that for machines to become intelligent
like humans, they must learn like humans; human minds learn from
past data and experiences and then apply this learning to future decisions.
Machine learning is a two-step process. First, the machine has to learn the
input data; second, the machine has to interpret it and analyze the input
and output data to create machine algorithms. The algorithms can then
construct a system model, which is used to predict future values. Machine
learning methods are more flexible than conventional statistical methods
because they do not rely on user-specified models. Instead, they improve
themselves using the available volume of data.
There are three types of machine learning algorithms. The first is supervised
learning (SL): if the output variables are provided, then the learning becomes
supervised. In SL, the algorithm is given some training examples and the machine
studies inputs and corresponding outputs. Popular SL algorithms
include artificial neural networks (Kaul et al. 2005; Uno 2005; Chen and
Mcnairn 2006; Khoshnevisan et al. 2014), decision trees (Veenadhari,
Mishra, and Singh 2011), K-means clustering (Shawe-Taylor and Cristianini
2004), support vector machines (Radhika and Shashi 2009), and Bayesian
networks (Bakker and Heskes 2003). The artificial neural network (ANN)
algorithm has been widely used in the agricultural field. An ANN is an
interconnected set of input and output units where a weight is associated with
each connection (see Drummond, Sudduth, and Birrell 2008). The ANN has
an advantage over multiple regression because it can select independent
variables in the data, can learn complex relationships, and does not place
strict a priori requirements on a functional form. The neural network
can discover more complex variables.
The second type of algorithm is unsupervised learning (UL). In UL, the
algorithm is not provided with outputs, and learning helps us find interesting
information about our dataset by looking at its features alone. Popular
UL algorithms are self-organizing maps (SOM), partition-based clustering,
hierarchical clustering, K-means clustering, COBWEB, and density-based
spatial clustering. To date, these techniques have rarely been used in the
agricultural and economics fields.
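To make the idea of learning from features alone concrete, here is a minimal sketch of K-means clustering (Lloyd's algorithm) in NumPy. The soil readings are hypothetical values invented for illustration (organic matter in percent and pH), and the farthest-point initialization is a convenience chosen here so the example is deterministic; neither comes from the article.

```python
import numpy as np

def k_means(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # next center: the point farthest from every center chosen so far
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # assignment step: each point joins its nearest center
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # update step: each center moves to the mean of its members
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical soil samples: [organic matter (%), pH]. No output labels are given.
soil = np.array([
    [1.0, 5.5], [1.2, 5.7], [0.9, 5.4], [1.1, 5.6],
    [3.9, 7.0], [4.1, 7.2], [4.0, 6.9], [4.2, 7.1],
])
labels, centers = k_means(soil, k=2)
print(labels)
print(centers)
```

The algorithm is never told which samples belong together; the grouping emerges from the distances between feature vectors.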
Applications of machine learning are multi-disciplinary.
See Mucherino et al. 2009.
See Cheng and Titterington (1994) and Warner and Misra (1996). Cheng and
Titterington (1994) review the artificial neural network (ANN) methodology,
whereas Warner and Misra (1996) emphasize understanding ANN as a statistical
tool. The accuracy of ANN increases with the volume of data. The advantages of
ANN are that (a) ANN are capable of adapting their complexity without knowing
the underlying principles, and (b) ANN can derive relationships between
input and output for any process.
Bayesian networks focus on two issues: estimating the conditional probability
tables from training data when the structure of the network is known; and
learning a network’s structure from training data.
The ANN can be used in flood forecasting and in modeling rainfall and run-off
relationships.
See Moshou et al. 2006.
The COBWEB is an incremental and unsupervised clustering algorithm that
produces a hierarchy of classes; its incremental nature allows clustering of
new data without having to repeat the existing clustering. See Fisher (1987).
The third type of algorithm is reinforcement learning (RL). With RL, the
learning process works on the principle of feedback. The notion is that every
action has an impact on the system; the impact or information is then
reported back to the algorithm. Consequently, the algorithm modifies its
behavior. Popular algorithms include genetic algorithms and Markov decision
algorithms (e.g., Matis, Birkett, and Boudreaux 1989; Jain and
Ramasubramalliall 1998; Osman, Inglada, and Dejoux 2015).
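The feedback loop described above can be sketched with a two-action, epsilon-greedy learner, about the simplest reinforcement learning setting. This is an illustration only and is not one of the genetic or Markov decision algorithms cited; the payoffs, the exploration rate, and the learning rate are hypothetical values chosen here.

```python
import numpy as np

rng = np.random.default_rng(4)
rewards = np.array([1.0, 2.0])   # hypothetical payoff of two management actions
q = np.zeros(2)                  # the learner's running estimate of each payoff
epsilon, learning_rate = 0.2, 0.1

for _ in range(1000):
    if rng.random() < epsilon:
        action = int(rng.integers(2))     # explore: try a random action
    else:
        action = int(np.argmax(q))        # exploit: best action found so far
    feedback = rewards[action]            # the system reports the impact back
    q[action] += learning_rate * (feedback - q[action])   # behavior is modified

best_action = int(np.argmax(q))
print("estimated payoffs:", q, "preferred action:", best_action)
```

The learner starts out favoring the first action and switches only after occasional exploration reveals that the second action pays more, which is the sense in which feedback modifies behavior.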
An example of machine learning in agricultural economics is the prediction
of farmland values. Academic research and at least one commercial offering
have focused on predicting the value that a parcel of land will be sold
for using current and historical land sales, soil characteristics, climatic and
weather data, cropping systems, remotely sensed imagery, potential of urban
sprawl (Livanis et al. 2006; Castle, Wu, and Weber 2011), and the general
economic situation, including commodity prices and interest rates
(Irwin and Sanders 2011). The Big Data implication of predicted farmland
values is to determine if expected sales prices are over- or under-valued. A
commercial example is Granular AcreValue from DuPont.
The commonly used models are linear regression models (Shibayama
1991), polynomial regression models (Wilcox et al. 2000), and nonlinear
regression models (House 1979). Variable selection for the models can be
based on several methods including stepwise regression, principal component
regression, Bayesian information criterion, Akaike information criterion,
and partial least squares (for details, see Castle, Qin, and Reed 2009).
Varian (2014) shows that the classical multivariate regression model can be
used to predict the outcome variable using predictor variables and adding a
penalty term to the classical minimization of the sum of squared residuals,
a technique called elastic net regression (ENR). Big Data often brings
predictors that are large in number and size, and the penalty shrinks the least
squares coefficients toward zero, which can make ENR an attractive technique for
working with such datasets. The researcher chooses the penalty weights in
ENR. In the case of both ENR and the least absolute shrinkage and selection
operator (LASSO), some of the coefficients are set to be exactly zero, leading
to computational efficiency, feasibility, and good predictions (Varian
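A minimal sketch of the penalized regression idea follows, written as coordinate descent in NumPy. The objective uses one common parameterization of the elastic net, with `alpha` as the overall penalty weight and `l1_ratio` as the split between the absolute-value and squared penalties; these names, the simulated data, and the penalty values are assumptions made here for illustration and are not taken from Varian (2014).

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def elastic_net(x, y, alpha=0.5, l1_ratio=0.5, n_sweeps=200):
    """Coordinate descent for
    (1 / 2n) * ||y - x b||^2 + alpha * l1_ratio * ||b||_1
                             + 0.5 * alpha * (1 - l1_ratio) * ||b||_2^2.
    Assumes x and y are centered, so no intercept is fitted."""
    n_obs, n_vars = x.shape
    beta = np.zeros(n_vars)
    col_scale = (x ** 2).sum(axis=0) / n_obs
    for _ in range(n_sweeps):
        for j in range(n_vars):
            # residual with predictor j's current contribution added back
            partial_resid = y - x @ beta + x[:, j] * beta[j]
            rho = x[:, j] @ partial_resid / n_obs
            beta[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_scale[j] + alpha * (1.0 - l1_ratio))
    return beta

rng = np.random.default_rng(1)
n_obs, n_vars = 200, 20
x = rng.standard_normal((n_obs, n_vars))
true_beta = np.zeros(n_vars)
true_beta[:3] = [3.0, -2.0, 1.5]           # only three predictors matter
y = x @ true_beta + rng.standard_normal(n_obs)
x = x - x.mean(axis=0)
y = y - y.mean()

beta_ols = np.linalg.lstsq(x, y, rcond=None)[0]
beta_enet = elastic_net(x, y)
print("coefficients set exactly to zero:", int(np.sum(beta_enet == 0.0)))
print("OLS norm:", np.linalg.norm(beta_ols), "ENR norm:", np.linalg.norm(beta_enet))
```

The soft-thresholding step is what produces exact zeros, while the squared penalty in the denominator shrinks the surviving coefficients. In practice the penalty weights would be chosen by cross-validation rather than fixed in advance.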
Spike and Slab Regression Analysis
Another regression technique useful for Big Data is spike and slab regression.
This is a Bayesian technique whose name, originally coined by Mitchell and
Beauchamp (1988), refers to a type of prior probability distribution
(“prior”) used for the regression coefficients in linear regression models.
Note that the use of a normal prior was instrumental in facilitating efficient
Gibbs sampling of the posterior; this, in turn, made the spike and slab
variable selection method computationally attractive. Ishwaran and
Rao (2010) developed a generalized ridge regression (GRR), which possesses
unique advantages in high-dimensional correlated settings to estimate
the model; the weighted GRR is more effective than other tools in
many circumstances.
It is assumed that the regression coefficients are mutually independent with a
two-point mixture distribution made up of a uniform flat distribution (the
slab) and a degenerate distribution at zero (the spike).
A technique related, but not identical, to the spike-and-slab method is
Bayesian model averaging (BMA). Bayesian methods are becoming
increasingly popular as frameworks for model selection and as forecasting
tools. In some cases, analysts ignore the uncertainty in model selection,
resulting in overconfident inferences and riskier decisions. The
BMA techniques are designed to account for this uncertainty. By averaging
over several different competing models, BMA incorporates the model
uncertainty into the parameters and predictions (see Jacobs et al. 1991). Zhou
et al. (2012) have proposed model selection and comparison in the case of
BMA. These authors constructed posterior probabilities and
model parameters based on sequential Monte Carlo sampling, and used
them to compare different models. A final model is obtained as a
weighted average of all models, where the weight of each model is its
posterior probability. Varian (2014) concludes that Bayesian techniques are
computationally efficient and preferred to exhaustive searches. Finally, using
Big Data, Ley and Steel (2009) have compared LASSO, Bayesian model averaging,
and spike-and-slab methods to show which variables are important
predictors of economic growth.
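The weighted-average idea can be sketched on a small problem where every candidate model can be enumerated. The example below approximates each model's posterior probability with weights proportional to exp(-BIC/2), assuming equal prior probability for all models; that approximation, the simulated data, and the helper name `ols_bic` are assumptions made here for illustration. It is not the sequential Monte Carlo approach of Zhou et al. (2012).

```python
import itertools
import numpy as np

def ols_bic(x, y, columns):
    """OLS with an intercept on the chosen columns; return padded coefficients and BIC."""
    cols = np.array(columns, dtype=int)
    design = np.column_stack([np.ones(len(y)), x[:, cols]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n_obs, n_par = design.shape
    bic = n_obs * np.log(resid @ resid / n_obs) + n_par * np.log(n_obs)
    full_coef = np.zeros(x.shape[1] + 1)   # intercept plus one slot per predictor
    full_coef[0] = coef[0]
    full_coef[cols + 1] = coef[1:]         # excluded predictors keep a zero
    return full_coef, bic

rng = np.random.default_rng(2)
n_obs, n_vars = 200, 4
x = rng.standard_normal((n_obs, n_vars))
y = 1.0 + 2.0 * x[:, 0] + rng.standard_normal(n_obs)   # only predictor 0 matters

# Every subset of the predictors is a candidate model.
models = [c for size in range(n_vars + 1)
          for c in itertools.combinations(range(n_vars), size)]
fits = [ols_bic(x, y, m) for m in models]
bics = np.array([bic for _, bic in fits])

# Approximate posterior model probabilities (equal prior weight on each model).
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()

avg_coef = sum(w * coef for w, (coef, _) in zip(weights, fits))
inclusion = np.array([sum(w for w, m in zip(weights, models) if j in m)
                      for j in range(n_vars)])
print("averaged coefficients:", avg_coef)
print("inclusion probabilities:", inclusion)
```

With many predictors the model space is too large to enumerate, which is where sampling methods come in.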
Time Series Analysis Using Big Data
Time series forecasting uses a model to predict future values based on
previously observed values. Time series analysis is important for crop
forecasting, stock prices, price movements, and futures and options. There
are various types of time series analysis methods: parametric or
non-parametric; frequency domain or time domain; and linear, univariate,
or multivariate. Note that frequency domain analysis includes spectral
analysis and wavelet analysis; time domain analysis includes auto-correlation
and cross-correlation. Parametric approaches include autoregressive or moving
average models; non-parametric approaches include covariance or spectrum
analysis and usually focus on a smooth spectral density. The Bayesian Structural
Time Series (BSTS) model works well for handling the variable selection
problem in the case of time series analysis. Banbura, Giannone, and Reichlin
(2011) introduced “nowcasting” as a term in econometric time series analysis,
which refers to forecasting a current value instead of the future value. The
nowcasting model has two components, namely a general trend and a seasonal
pattern in the data. In the case of Big Data, where the number of
potential predictors in the regression model is large (often larger than the
number of observations available to fit the model), a Markov chain Monte
Carlo (MCMC) sampling algorithm can be used to simulate from the posterior
distribution. Finally, one can use Bayesian model averaging to smooth
the predictions over a large number of potential models.
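The two components named above, a general trend and a seasonal pattern, can be illustrated with an ordinary least squares fit on a simulated quarterly series. This is a deliberately simple stand-in and not the BSTS model, which is Bayesian and also handles variable selection; the series, its parameters, and the helper name `trend_season_design` are hypothetical.

```python
import numpy as np

def trend_season_design(time_index, season_length):
    """Columns: intercept, linear trend, and a dummy for each season after the first."""
    season = time_index % season_length
    dummies = (season[:, None] == np.arange(1, season_length)[None, :]).astype(float)
    return np.column_stack([np.ones(len(time_index)), time_index, dummies])

rng = np.random.default_rng(3)
n_periods, season_length = 40, 4
t = np.arange(n_periods)
seasonal_effect = np.array([0.0, 2.0, -1.0, 1.0])   # hypothetical quarterly pattern
y = (10.0 + 0.5 * t + seasonal_effect[t % season_length]
     + 0.2 * rng.standard_normal(n_periods))

# Estimate the trend and seasonal components jointly by least squares.
coef, *_ = np.linalg.lstsq(trend_season_design(t, season_length), y, rcond=None)

# Predict the value for the period just after the sample ends.
next_period = np.array([n_periods])
forecast = float((trend_season_design(next_period, season_length) @ coef)[0])
print("estimated trend per period:", coef[1])
print("predicted value for period", n_periods, ":", forecast)
```

A nowcast would add contemporaneous signals as further columns of the design matrix, which is the point at which the number of predictors can exceed the number of observations.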
Applications in Agriculture and Applied Economics
Several applications of the above-mentioned techniques can be used to en-
hance the productivity of farms along with reducing their use of inputs.
Banbura et al. (2011) conclude that a good or effective nowcasting model should
consider both past behavior of the series and easily observed contemporaneous
signals. Nowcasting is a contraction of “now” and “forecasting” (Giannone,
Reichlin, and Small 2008).
Weather forecasting. Environmental factors like weather influence crop
growth and development as well as recreational demand for both agricultural
and non-agricultural lands. Production agriculture has spatial yield
variability, partly because of spatial variability in soil properties and
interactions with the weather, which also varies spatially. Machine learning
techniques like Support Vector Machines (see Vapnik 1998) can be used to
predict the weather for farmers to aid in their decision-making (see Agrawal
and Mehta 2007 and Radhika and Shashi 2009).
Crop yield prediction and crop selection. Machine learning provides many
effective algorithms, which can identify input and output relationships in
crop selection and yield prediction. Popular techniques such as artificial
neural networks, K-nearest neighbors, and decision trees have proven to be
effective in crop selection, which is based on various factors like climate,
soils, natural calamities, famine, and other inputs. Using several soil
characteristics (e.g., topsoil depth, phosphorus, potassium, salt, organic
matter, and magnesium saturation) as inputs to artificial neural networks,
Drummond, Sudduth, and Birrell (2008) accurately predicted corn and
soybean yields.
Irrigation systems. Agriculture consumes a major portion of the world’s fresh
water. Variability in rainfall, climate change, and the dropping of the water
table in developing countries are alarming. Smart irrigation systems and
data collected by sensors can be used to make better decisions regarding
water usage. Several studies using artificial neural network algorithms have
been able to accurately predict water levels and rainfall runoff (Ashaary,
Ishak, and Ku-Mahamud 2015; Chakravarti, Joshi, and Panjiar 2015).
Crop disease prediction. Early crop disease detection can be accomplished
through machine learning. Drummond, Sudduth, and
Birrell (2008) note that ANN could be helpful in predicting pest attacks in
advance. Such models deal well with noisy and multi-faceted data and
account for wide ranges of possible factors (e.g., historical data, satellite/
sensor data, field conditions, images of leaves) to effectively learn and
predict crop diseases. Rumpf et al. (2010) used Support Vector Machines to
develop early crop disease detection algorithms.
Agricultural policy and trade. A large quantity of data on production output
of crops, changes in input costs, market demand and supply, market price
trends, cultivation costs, wages, transportation costs, and marketing costs
could be used by ANN algorithms to predict the support prices that governments
set for farmers in both developed and developing countries. For instance, Big
Data can be beneficial when simulating agricultural policy impacts. The
application to the Individual Farm Model for Common Agricultural Policy
Analysis (IFM-CAP) model in the European Union illustrates the capability
for assessing policy impacts at the farm level (Louhichi et al. 2015).
Factors like soil quality, crop rotation cycle, and seed quality can help detect crop diseases.
There is one more critical application (or, rather, implication) of Big
Data for the agricultural and applied economics profession. All the
challenges discussed here raise the question of whether agricultural
economics departments should provide more graduate student instruction
on Big Data issues. Supplementing the traditional analytical tools our
students learn in classrooms and during their thesis research with computer
science and non-traditional statistics may increase the rate at which
agricultural economists can make meaningful contributions not only in applied
economics but also in other disciplines. This may mean that the departments of
agricultural economics dedicated to providing research and education on Big
Data employ non-economist faculty in their ranks to provide specific
expertise. At the 2017 AAEA Symposium on Big Data, graduate students
were challenged to consider how they could replace themselves with an
algorithm in lieu of physically interacting with data. The challenge was
extended to include the students considering how to identify outliers without
ever having the opportunity to “see” the data, but rather to build models to
anticipate erroneous data, flag it for omission, and continue the analysis.
Three reasons exist for the need to automate analytic processes: data may
be considered confidential such that no set of eyes can be on the data;
replacing human capital with an algorithm substantially lowers per unit
costs of analysis; and there are likely not enough analysts available to meet
the demand for analysis in the future.
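As one hedged illustration of flagging erroneous data without viewing it, the sketch below applies a robust rule based on the median absolute deviation (the modified z-score with a cutoff of 3.5, a common convention). The rule, the cutoff, and the yield values are choices made here for illustration and are not a method proposed at the symposium.

```python
import numpy as np

def flag_outliers(values, threshold=3.5):
    """Flag observations whose modified z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))   # median absolute deviation
    if mad == 0:
        return np.zeros(values.shape, dtype=bool)
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold

# Hypothetical yield monitor readings (bushels per acre) with two bad records.
yields = np.array([172, 168, 175, 181, 0, 170, 178, 950, 174, 169], dtype=float)
flags = flag_outliers(yields)
clean_mean = yields[~flags].mean()   # continue the analysis without flagged data
print("flagged positions:", np.flatnonzero(flags))
print("mean of retained readings:", clean_mean)
```

The median and the median absolute deviation are used because, unlike the mean and standard deviation, they are not distorted by the very records the procedure is trying to detect.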
Given our assessment of the needs and opportunities arising from the Big
Data expansion, we come to a few significant conclusions for our profession
and those who draw upon our work. First, there is a unique and important
role for agricultural and applied economists in this changing technological
environment. We see an opportunity for our profession to stand at the hub
of work within multi-disciplinary teams. Our profession is trained to handle
and draw valid inferences from non-experimental data. Further, most applied
economists are trained in and comfortable with unstructured, messy
data. Many in our discipline have already engaged in some form of Big Data
analysis and we understand the important distinctions between causation
and simple predictive models. Having noted some comparative advantages
of our profession, we also challenge agricultural and applied economists to
prepare a next generation of our profession with training in geo-spatial
analysis and the analytical techniques described in this paper. Furthermore, we
need to be the champions for the merit of research with these types of data,
and advocate for non-experimental data access and research funding.
We perceive an important role for academic researchers and land grant
personnel in this venue. First, there is a need for basic and applied multi-
disciplinary research that provides objective third-party analysis. Ground
truthing seed varieties may morph into ground truthing software and other
roles. There is also a clear role for extension to help train and educate
producers and agri-business professionals on how to manage new tools and data.
Clearly, educational topics like data ownership and evaluation of precision
agriculture investment will be in demand.
Finally, we have touched upon several looming policy issues, which is not
surprising as many policy debates are stimulated by technological change.
First, there is room for discussion regarding ownership of these data.
The returns and development of these technologies depend on the ownership
rules in place. Second, we find that infrastructure needs such as rural
broadband are potentially limiting the use of these technologies, as rural
broadband access provides a critical bridge between small data and Big
Data. Thus, to the extent that access to these technologies provides a
comparative advantage to certain areas, largely rural areas are disadvantaged.
Third, we perceive opportunities and threats to public objective data collec-
tion and government program data. Ultimately, we advocate for a reimagin-
ing of agricultural data collection such that the greatest synergism can be
obtained from integrating private data, government program data, and spe-
cific data collection surveys meant to complement other available tools.
AgFunder. 2017. AgTech Investing Report Year in Review 2016. Available at:
Agrawal, R., and S.C. Mehta. 2007. Weather Based Forecasting of Crop Yields, Pests,
and Diseases - IASRI Models. Journal of the Indian Society of Agricultural Statistics 62 (2):
American Farm Bureau Federation. Privacy and Security Principles for Farm Data.
Available at:
Annan, F., J.B. Tack, A. Harri, and K.H. Coble. 2014. Spatial Pattern of Yield
Distributions: Implications for Crop Insurance. American Journal of Agricultural
Economics 96 (1): 253–68.
Ashaary, N., W. Ishak, and K. Ku-Mahamud. 2015. Neural Network Application in
the Change of Reservoir Water Level Stage Forecasting. Indian Journal of Science and
Technology 8 (13): 1–6.
Bakker, B., and T. Heskes. 2003. Task Clustering and Gating for Bayesian
Multitasking Learning. Journal of Machine Learning Research 4 (5): 83–99.
Banbura, M., D. Giannone, and L. Reichlin. 2011. Nowcasting. In Oxford Handbook of
Economic Forecasting, eds. M.P. Clements and D.F. Hendry, 193–224. Oxford
University Press.
Capps, O. 1989. Utilizing Scanner Data to Estimate Retail Demand Functions for Meat
Products. American Journal of Agricultural Economics 71 (3): 750–60.
Castle, J., X. Qin, and R. Reed. 2009. How to Pick the Best Regression Equation:
A Review and Comparison of Model Selection Algorithms. Working Paper No. 13/
2009 Department of Economics and Finance College of Business and Economics
University of Canterbury.
Castle, E., J.J. Wu, and B. Weber. 2011. Place Orientation and Rural-Urban
Interdependence. Applied Economic Perspectives and Policy 33 (2): 179–204.
Chakravarti, A., N. Joshi, and H. Panjiar. 2015. Rainfall Runoff Analysis Using the
Artificial Neural Network. Indian Journal of Science and Technology 8 (14): 1–7.
Chen, C., and H. Mcnairn. 2006. A Neural Network Integrated Approach for Rice
Crop Monitoring. International Journal of Remote Sensing 27 (7): 1367–93.
Cheng, B., and D.M. Titterington. 1994. Neural Networks: A Review from Statistical
Perspective: Rejoinder. Statistical Science 9 (1): 49–54.
Co, H., and R. Boosarawongse. 2007. Forecasting Thailand’s Rice Export: Statistical
Techniques vs. Artificial Neural Networks. Computers and Industrial Engineering 53
(4): 610–27.
Coble, K., T.W. Griffin, M. Ahearn, S. Ferrell, J. McFadden, S. Sonka, and J. Fulton.
2016. Advancing U.S. Agricultural Competitiveness with Big Data and Agricultural
Economic Market Information, Analysis, and Research (No. 249847). Washington
DC: Council on Food, Agricultural, and Resource Economics.
Diebold, F.X. 2012. A Personal Perspective on the Origin(s) and Development of “Big
Data”: The Phenomenon, the Term, and the Discipline, Second Version. University
of Pennsylvania, Penn Institute for Economic Research, Working Paper No. 13-003.
Dyer, J. 2016. The Data Farm: An Investigation of the Implications of Collecting Data
on the Farm. Taunton, Somerset: Nuffield Australia Project No 1506.
Erickson, B., and D.A. Widmar. 2015. Precision Agricultural Services Dealership
Survey Results. West Lafayette, IN: Purdue University. Available at: http://agri–crop–life–purdue–precision–dealer–
Fair Credit Reporting Act (FCRA). 1970. 15 U.S. Code §§ 1681. Washington DC: U.S.
Fan, J., F. Han, and H. Liu. 2014. Challenges of Big Data Analysis. National Science
Review 1 (2): 293–314.
Fan, J., and J. Lv. 2008. Sure Independence Screening for Ultrahigh Dimensional
Feature Space. Journal of the Royal Statistical Society: Series B (Statistical Methodology)
70 (5): 849–911.
Farmobile. 2016. Farmobile Announces Data Store, Guarantees Minnesota Farmers at
Least $2 per Acre for Electronic Field Records. Available at: https://www.farmo–Data–Store–Press–Release.pdf.
Featherstone, A.M. 2018. The Farm Economy: Future Research and Education
Priorities. Applied Economic Perspectives and Policy 40 (1): 136–54.
Fisher, D.H. 1987. Knowledge Acquisition Via Incremental Conceptual Clustering.
Machine Learning 2: 139–72.
Giannone, D., L. Reichlin, and D. Small. 2008. Nowcasting: The Real-time
Informational Content of Macroeconomic Data. Journal of Monetary Economics 55 (4):
Griffin, T.W., C.L. Dobbins, T.J. Vyn, R.J.G.M. Florax, and J.M. Lowenberg-DeBoer.
2008. Spatial Analysis of Yield Monitor Data: Case Studies of On-farm Trials and
Farm Management Decision Making. Precision Agriculture 9 (5): 269–83.
Griffin, T.W., T.B. Mark, S. Ferrell, T. Janzen, G. Ibendahl, J.D. Bennett, J.L. Maurer,
and A. Shanoyan. 2016. Big Data Considerations for Rural Property Professionals.
Journal of American Society of Farm Managers and Rural Appraisers 79: 167–80.
Griffin, T.W., N.J. Miller, J. Bergtold, A. Shanoyan, A. Sharda, and I.A. Ciampitti.
2017. Farm’s Sequence of Adoption of Information-Intensive Precision Agricultural
Technology. Applied Engineering in Agriculture 33 (4): 521–7.
Health Insurance Portability and Accountability Act. 42 U.S. Code §§ 201 et seq., and
45 C.F.R Parts 160 and Part 164. Washington DC: U.S. Congress.
Hennessy, T., D. Läpple, and B. Moran. 2016. The Digital Divide in Farming: A
Problem of Access or Engagement? Applied Economic Perspectives and Policy 38 (3):
House, C.C. 1979. Forecasting Corn Yields: A Comparison Study Using 1977 Missouri
Data. U.S. Department of Agriculture, Economics, Statistics and Cooperatives
Service, Statistical Research Division. June 1979, 66.
Irwin, S., and D. Sanders. 2011. Index Funds, Financialization, and Commodity
Futures Markets. Applied Economic Perspectives and Policy 33 (1): 1–31.
Ishwaran, H., U.B. Kogalur, and J.S. Rao. 2010. Spikeslab: Prediction and Variable
Selection Using Spike and Slab Regression. The R Journal 2 (2): 68–73.
Ishwaran, H., J.S. Rao, and U.B. Kogalur. 2013. Spikeslab: Prediction and Variable
Selection Using Spike and Slab Regression. R Package Version 1.1.5.
Jacobs, R.A., M.I. Jordan, S.J. Nowlan, and G.E. Hinton. 1991. Adaptive Mixtures of
Local Experts. Neural Computation 3 (1): 79–87.
Jain, R., and V. Ramasubramalliall. 1998. Forecasting of Crop Yields Using Second
Order Markov Chains. Journal of the Indian Society of Agricultural Statistics 51: 61–72.
Kaul, M., L. Robert, H. Hill, and C. Walthall. 2005. Artificial Neural Networks for
Corn and Soybean Yield Prediction. Agricultural Systems 85 (1): 1–18.
Ker, A., and K. Coble. 2003. Modeling Conditional Yield Densities. American Journal of
Agricultural Economics 85 (2): 291–304.
Khoshnevisan, B., S. Rafiee, M. Omid, H. Mousazadeh, and M.A. Rajaeifar. 2014.
Application of Artificial Neural Networks for Prediction of Output Energy and
GHG Emissions in Potato Production in Iran. Agricultural Systems 123: 120–27.
Kuchler, F., A. Tegene, and J.M. Harris. 2005. Taxing Snack Foods: Manipulating Diet
Quality or Financing Information Programs? Applied Economic Perspectives and
Policy 27 (1): 4–20.
Labrinidis, A., and H.V. Jagadish. 2012. Challenges and Opportunities with Big Data.
Proceedings of the VLDB Endowment 5 (12): 2032–3.
Ley, E., and M.K. Steel. 2009. On the Effect of Prior Assumptions in Bayesian Model
Averaging with Applications to Growth Regression. Journal of Applied Econometrics
24 (4): 651–74.
Liu, Y., S.M. Swinton, and N.R. Miller. 2006. Is Site-specific Yield Response
Consistent Over Time? Does It Pay? American Journal of Agricultural Economics 88
(2): 471–83.
Livanis, G., C.B. Moss, V. Brennan, and R. Nehring. 2006. Urban Sprawl and
Farmland Prices. American Journal of Agricultural Economics 88 (4): 915–29.
Louhichi, K., P. Ciaian, M. Espinosa, L. Colen, A. Perni, and S. Gomez y Paloma.
2015. EU-wide Individual Farm Model for CAP Analysis (IFM-CAP): Application
to Crop Diversification Policy. European Commission, Joint Research Center,
Sevilla, Spain. Available at:
Matis, J.H., T. Birkett, and D. Boudreaux. 1989. An Application of the Markov Chain
Approach to Forecasting Cotton Yields from Surveys. Agricultural Systems 29 (4):
Mayer-Schönberger, V., and K. Cukier. 2014. Big Data: A Revolution That Will
Transform How We Live, Work, and Think. New York, NY: Houghton Mifflin
Harcourt Publishing Company.
Mitchell, T., and J. Beauchamp. 1988. Bayesian Variable Selection in Linear
Regression. Journal of the American Statistical Association 83: 1023–36.
Moshou, D., C. Bravo, S. Wahlen, J. West, A. McCartney, J. Baerdemaeker, and H.
Ramon. 2006. Simultaneous Identification of Plant Stresses and Diseases in Arable
Crops Using Proximal Optical Sensing and Self-organising Maps. Precision
Agriculture 7 (3): 149–64.
Mucherino, A., P. Papajorgji, and M. Paradalos. 2009. A Survey of Data Mining
Techniques Applied to Agriculture. Operational Research 9 (2): 121–40.
National Academies of Sciences, Engineering, and Medicine. 2017. Improving Crop
Estimates by Integrating Multiple Data Sources. Washington DC: The National
Academies Press.
Osman, J., J. Inglada, and J.F. Dejoux. 2015. Assessment of a Markov Logic Model of
Crop Rotations for Early Crop Mapping. Computers and Electronics in Agriculture
113: 234–43.
Ozaki, V.A., S.K. Ghosh, and B.K. Goodwin. 2008. Spatio-Temporal Modeling of
Agricultural Yield Data with an Application to Pricing Crop Insurance Contracts.
American Journal of Agricultural Economics 90 (4): 951–61.
R Core Team. 2017. R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria. Available at: https://www.
Radhika, Y., and M. Shashi. 2009. Atmospheric Temperature Prediction Using
Support Vector Machines. International Journal of Computer Theory and Engineering 1
(1): 55–9.
Rumpf, T., A. Mahlein, U. Steiner, E. Oerke, H. Dehne, and L. Lumer. 2010. Early
Detection and Classification of Plant Diseases with Support Vector Machines Based
on Hyperspectral Reflectance. Computers and Electronics in Agriculture 74 (1): 91–9.
Schimmelpfennig, D., and R. Ebel. 2016. Sequential Adoption and Cost Savings from
Precision Agriculture. Journal of Agricultural and Resource Economics 41 (1): 97–115.
Sisung, T. 2016. Soil Testing and Nutrient Application Practices of Agricultural
Retailers in the Great Lakes Region. Master of Agribusiness Thesis. Department of
Agricultural Economics, Kansas State University.
Sonka, S. 2015. Big Data: From Hype to Agricultural Tool. Farm Policy Journal 12 (1):
Stubbs, M. 2016. Big Data in U.S. Agriculture. Washington DC: Congressional
Research Service, Report R44331.
Tack, J., K.H. Coble, R. Johansson, A. Harri, and B. Barnett. 2017. The Potential
Implications of “Big Ag Data” for USDA Forecasts. Available at:
Tremblay, N. 2017. Confronting the Challenges of Big Data for Precision Agriculture.
Presentation at International Society of Precision Agriculture Annual Meetings,
February 17. Saint-Jean-sur-Richelieu, Quebec.
Uno, Y. 2005. Artificial Neural Networks to Predict Corn Yield from Compact
Airborne Spectrographic Imager Data. Computers and Electronics in Agriculture 47
(2): 149–61.
Upbin, B. 2013. Monsanto Buys Climate Corp for $930 Million. Forbes (online edition),
October 2, 2013. Available at:
U.S. House of Representatives Committee on Agriculture, Big Data and Agriculture:
Innovations and Implications. 2015. Washington DC: Committee Hearing
Vapnik, V.N. 1998. Statistical Learning Theory. New York: Wiley.
Varian, H. 2014. Big Data: New Tricks for Econometrics. Journal of Economic
Perspectives 28 (2): 3–28.
Varian, H.R. 1999. Market Structure in the Network Age. Understanding the Digital
Economy Conference. Washington DC: U.S. Department of Commerce.
Veenadhari, S., B. Mishra, and C.D. Singh. 2011. Soybean Productivity Modelling
Using Decision Tree Algorithms. International Journal of Computer Applications 27
(7): 11–15.
Warner, B., and M. Misra. 1996. Understanding Neural Networks as Statistical Tools.
The American Statistician 50 (4): 284–93.
Wilcox, A., N.H. Perry, N.D. Boatman, and K. Chaney. 2000. Factors Affecting the
Yield of Winter Cereals in Crop Margins. Journal of Agricultural Science 135 (4):
Woodard, J.D. 2016. Data Science and Management for Large Scale Empirical
Applications in Agricultural and Applied Economics Research. Applied Economic
Perspectives and Policy 38 (3): 373–88.
Woodard, J.D., and L.J. Verteramo-Chiu. 2017. Efficiency Impacts of Utilizing Soil
Data in the Pricing of the Federal Crop Insurance Program. American Journal of
Agricultural Economics 99 (3): 757–72.
Zhou, Y., M.A. Johansen, and J.A.D. Aston. 2012. Bayesian Model Comparison Via
Path-sampling Sequential Monte Carlo. Proceedings of the IEEE Workshop on
Statistical Signal Processing.
Applied Economic Perspectives and Policy
Downloaded from
by Kansas State University Libraries user
on 21 February 2018
... Notwithstanding the benefits of data-enabled collaborations, several studies point out the challenges that they may present. For instance, in their analysis of inter-firm data partnerships in the agricultural sector, Coble et al. (2018) draw attention to the issues of data ownership and integration as well as to the inherent disadvantages of collaboration partners located in rural areas with poor technology infrastructure. In the context of sustainability development in smart cities, Pincetl and Newell (2017) discuss the issues of data quality and completeness, which occur when partners use the data for purposes that differ from those for which the data were originally collected. ...
... Second, organisations have been criticised for obtaining data-driven efficiency gains at the expense of continuous location-based tracking and the pervasive performance control of their workforce (La Torre et al., 2018). Third, while there is a growing expectation from companies to become more data-driven, it is not a level playing field for smaller firms in rural areas or developing countries, which suffer from undeveloped internet infrastructure and cannot afford the "luxury" of BDA without a concrete understanding of the return on BDA investment (Coble et al., 2018; Fawcett and Waller, 2014; Grover et al., 2018). ...
Purpose This paper aims to identify, synthesise and critically examine the extant academic research on the relation between big data analytics (BDA), corporate accountability and non-financial disclosure (NFD) across several disciplines. Design/methodology/approach This paper uses a structured literature review methodology and applies “insight-critique-transformative redefinition” framework to interpret the findings, develop critique and formulate future research directions. Findings This paper identifies and critically examines 12 research themes across four macro categories. The insights presented in this paper indicate that the nature of the relationship between BDA and accountability depends on whether an organisation considers BDA as a value creation instrument or as a revenue generation source. This paper discusses how NFD can effectively increase corporate accountability for ethical, social and environmental consequences of BDA. Practical implications This paper presents the results of a structured literature review exploring the state-of-the-art of academic research on the relation between BDA, NFD and corporate accountability. This paper uses a systematic approach, to provide an exhaustive analysis of the phenomenon with rigorous and reproducible research criteria. This paper also presents a series of actionable insights of how corporate accountability for the use of big data and algorithmic decision-making can be enhanced. Social implications This paper discusses how NFD can reduce negative social and environmental impact stemming from the corporate use of BDA. Originality/value To the best of the authors’ knowledge, this paper is the first one to provide a comprehensive synthesis of academic literature, identify research gaps and outline a prospective research agenda on the implications of big data technologies for NFD and corporate accountability along social, environmental and ethical dimensions.
... It comes thus as no surprise that machine learning can offer new possibilities to analyse big data in agriculture (30,31,32,33,34,35,36,37). Indeed, the list of possible applications of machine learning to agriculture is potentially vast, especially when considered in combination with other research domains, such as climate change (38,39,40,41). ...
... This is where the power of machine learning can be explored to its full potential (36,43). Not only can biophysical variables such as microclimate effects, soil structure and quality be included in the analysis, but also socio-economic variables, such as local land use, urban-farm water accessibility, water-related policies, farm size, demographic data and access to markets can also be included, allowing analysis at every step of the agricultural value chain (32,44,45). The resulting datasets, just for one farm, could encompass millions of data point combinations. ...
Machine learning and statistical modeling methods were used to analyze the impact of climate change on financial wellbeing of fruit farmers in Tunisia and Chile. The analysis was based on face to face interviews with 801 farmers. Three research questions were investigated. First, whether climate change impacts had an effect on how well the farm was doing financially. Second, if climate change was not influential, what factors were important for predicting financial wellbeing of the farm. And third, ascertain whether observed effects on the financial wellbeing of the farm were a result of interactions between predictor variables. This is the first report directly comparing climate change with other factors potentially impacting financial wellbeing of farms. Certain climate change factors, namely increases in temperature and reductions in precipitation, can regionally impact self-perceived financial wellbeing of fruit farmers. Specifically, increases in temperature and reduction in precipitation can have a measurable negative impact on the financial wellbeing of farms in Chile. This effect is less pronounced in Tunisia. Climate impact differences were observed within Chile but not in Tunisia. However, climate change is only of minor importance for predicting farm financial wellbeing, especially for farms already doing financially well. Factors that are more important, mainly in Tunisia, included trust in information sources and prior farm ownership. Other important factors include farm size, water management systems used and diversity of fruit crops grown. Moreover, some of the important factors identified differed between farms doing and not doing well financially. Interactions between factors may improve or worsen farm financial wellbeing.
... With the increased potential to store and process data quickly, innovation and productivity can be increased (Wolfert et al., 2017; Krisnawijaya et al., 2022; Weersink et al., 2018; Coble et al., 2018). Sensor technology, the internet of things, and cloud computing real-time data can be used to support the various agriculture domains such as soil management, pest and weed management, disease management, crop management and water-use management (Basnet & Bang, 2018. ...
Innovations in digital technologies, especially in artificial intelligence (AI), promise substantial benefits to the agricultural sector. Agriculture is increasingly expected to ensure food security and food safety while at the same time considering the environmental aspects. AI in the agricultural sector offers the potential to feed a continuously growing global population and still contribute to achieving the UN's Sustainable Development Goals (SDGs). Despite its promises, the use of AI in agriculture is still limited. We argue that the slow uptake is due to the diverse ways in which AI impacts the agri-food industry, given the diversity of foods, supply chains, climates, and land in the agricultural sector. We propose that this is also exacerbated by ethical concerns arising from AI use, the varying degrees of technological development and skills, and the economic impacts of agricultural AI. The analysis draws on a literature review of multiple disciplines in agricultural AI (economic, environmental, social, ethical, and technological) and a focus group of experts. AI-powered systems in agriculture raise various sets of concerns in multiple disciplines that need to be aligned to provide sustainable AI solutions for the agriculture domain. Our research proposes that it is important to adopt an interdisciplinary approach when developing AI in agriculture. AI in agriculture should be developed by interdisciplinary collaboration because it then has a greater chance of being robust, economically valuable and socially desirable, which may lead to greater acceptance and trust among farmers when using it.
... On the other hand, because of the cultivated land quality's ambiguous meaning, the fundamental connection between the object of the assessment and the aim of the evaluation, real-world application scenarios, etc., the features and appropriate technological methods of the data on the cultivated land quality are imperfect and out of sync. Big data technology is a significant new path for assessment research in the fields of ecology, geography, and other related disciplines [32][33][34][35][36]. A study framework for cultivated land quality analysis based on the perspective of data must therefore be established in order to fully realize the value of data. ...
As cultivated land quality has been paid more and more scientific attention, its connotation generalization and cognitive bias are widespread, bringing many challenges to the investigation and evaluation of regional cultivated land quality and its data analysis and mining. Establishing a systematic and interdisciplinary cognitive approach to cultivated land quality is urgent and necessary. Therefore, we explored and developed a conceptual framework of the model for the cultivated land quality analysis from the data perspective, including cultivated land quality ontology, mapping, correlation, and decision models. We identified the primary content of cultivated land quality perceptions and four cognitive mechanisms. We built vital technologies, such as the collaborative perception of the quality of cultivated land, intelligent treatment, diagnostic evaluation, and simulation prediction. Applying this analysis framework, we sorted out the frequency of indicators that characterize the function of cultivated land according to the literature in recent years and have built the cognitive system of cultivated land quality in the black soil region of Northeast China. The system’s central component was production capacity and it had three components: a foundation, a guarantee, and an effect. The black soil region cultivated land quality evaluation system has seven purposes involving 20–31 key indicators: production supply, threat control, farmland infrastructure regulation, cultivated land ecological maintenance, economics, social culture, and environmental protection. In various application contexts, the system had many critical supporting technologies. The results demonstrate that the framework has strong adaptability, efficiency, and scalability, which might offer a theoretical direction for further studies on the evaluation of the quality of cultivated land in the area. 
The analysis framework established in this study is helpful to deepen the understanding of cultivated land quality systems from the perspective of big data. Taking the big data of cultivated land quality as the driving force, combined with the technical methods of cultivated land quality analysis, the evaluation results of cultivated land quality under different scenarios and different objectives are optimized. In addition, the framework can serve the practice of farmland management and engineering improvement, adapt to the management needs of different objects and different scales, and achieve the combination of theory and practice.
Digital transformation has become one of the main trends of the modern world, supporting changes in all spheres of life. Digitalization allows easier access to data, simplifies reporting processes, reduces bureaucracy and corruption in public administration and improves the efficiency of business processes. With Russia’s invasion of Ukraine, the priorities and goals of Ukraine’s digital transformation have changed significantly. People and the state faced new challenges, and there was a need to adapt quickly to the new conditions in all spheres of activity. This article examines the processes of digitalization in Ukraine in the pre-war period and the development of this trend in various spheres of activity during wartime: education, security, public services for both the population and businesses, the agricultural sector, banking, the financial sector and others. The prospects for digital transformation in the post-war period and the necessary steps are identified. Keywords: Digitalization, Government, Business processes, War, Ukraine
We discuss a little‐known but highly successful approach to innovation and data governance observed in the U.S. dairy sector. The National Cooperative Dairy Herd Improvement Program (NCDHIP) is a century‐old institution that coordinates farm data collection to support research on dairy cattle breeding and genetic selection. After discussing the program's history, we discuss how its evolution can inform data governance in agriculture today. We identify three key attributes that make the NCDHIP a successful model in agriculture: overcoming free‐riding with member benefits to data providers, ensuring data interoperability with uniform data standards, and controlling data access and use with cooperative governance.
Digitization of agricultural production processes is a prerequisite for a successful strategy for the development of the agricultural sector. The process of implementing Data Science methods and algorithms into business processes has already begun and is gradually accelerating. However, it is far from clear how Data Science could make agriculture more effective and profitable. When data on each agricultural object are available and combined with well-developed algorithms and methods, Data Science makes it possible to build mathematical models of the agricultural sector and farms, accurately calculate the algorithm of actions and predict the result. AgroTech, AgroFinTech, FoodTech and other businesses are actively developing in the technologically underdeveloped agricultural sector. This makes the agricultural sector attractive for investment and opens up new prospects for attracting financial resources. We show that Data Science tools in agriculture do not simply replace analogue technologies used in traditional agriculture. They offer new options for agriculture, including opportunities for more effective tools for agricultural producers and management. They allow for in-depth analysis and understanding of business processes and facilitate the structuring of problems and the systematization of the agricultural sector as a whole and of individual farms. However, the use of Data Science tools is accompanied by certain risks, which are also discussed in the paper.
The emergence of precision farming technologies has increased the amount and detail of farming data collected by producers. Data increases farm profitability by complementing digitally connected equipment and improving on‐farm decision making. The value generated by farm data may be capitalized into the underlying farmland asset, potentially raising sale and rental prices. However, the absence of clear property rights over farmland and farm operating practice data limits the ability to capture this value. We explore the issue of farm data through a property rights and transaction costs lens, and propose a conceptual framework of farm data valuation to identify conditions under which landowners and tenant‐operators can engage in mutually beneficial negotiation over farm data. We conclude that establishing property rights to farm data within the farmland lease can facilitate welfare‐improving exchange, allowing farm data records to be allocated to their highest valued use.
Precision agriculture (PA) has been commercially available for decades; however, only specific technologies have been readily adopted. The overall goal of this study was to provide information on the historical changes (from 2000 to 2016), the current status of PA utilization, and sales expectations in the next time period. Within this overarching objective, specific goals included 1) determining the specific technologies that farmers adopt and 2) estimating the probability of transitioning from one bundle of PA technologies to another. The three information-intensive technologies included 1) yield monitor (YM) with or without GNSS, 2) variable rate (VR) application of inputs, and 3) precision soil sampling (PSS). Combinations of these three technologies in addition to a possible “no technology adopted” response resulted in eight categories of PA technology bundles. Each year, farms were classified as having one of these eight possible bundles of PA technology. Adoption of PA technologies has increased over time, with the use of only YMs and the bundle of all three PA technologies (YM, PSS, and VR) as the two primary bundles being adopted. When only VR was adopted, there was a 47% probability that the farm would add a YM by next year. When a farm used YM, VR, and PSS, there was a 99% probability that a farm would continue using the bundle in the following year. The results are useful for farmers, extension professionals, and policymakers to understand prior adoption paths for bundles of PA technology. Future steps can connect this database on adoption of PA technology with farm meta-descriptors such as acreage, type of crop, rotation, other relevant management practices, and financial variables so as to better understand how farmers are integrating technologies into their farming operations. Keywords: Adoption, Information-intensive, Markov chain, Precision agriculture, Sequential, Site specific, Soil sampling, Transition probability, Variable rate, Yield monitor.
The promise of "big data" has been praised by the popular media. Concepts and impediments surrounding big data are discussed relative to both the current status and anticipated direction of the industry. Rural property professionals, such as farm managers and rural appraisers, have an opportunity to position themselves and their clients to make effective use of big data. Topics relevant to big data in agriculture include farmland values, lease arrangements, data ownership, data as an asset and its valuation, and the ramifications of wireless connectivity. The challenges that rural property professionals may encounter when integrating big data into their portfolio of services are described.
Precision agricultural (PA) technologies can decrease input costs by providing farmers with more detailed information and application control, but adoption has been sluggish, especially for variable-rate technologies (VRT). Is it possible that farmers have difficulty realizing these cost savings? Combinations of PA technologies are considered as complements, testing several patterns of PA technology adoption that may show different levels of costs. The USDA's Agricultural Resource Management Survey of corn producers is used to estimate a treatment-effects model that allows for selection bias. VRT contributes additional production cost savings when added to soil mapping, but not when done with yield mapping alone.
Data mining in agriculture is a relatively new approach to forecasting and predicting outcomes in agricultural crop and animal management. In the present study an attempt has been made to study the influence of climatic parameters on soybean productivity using the decision tree induction technique. The findings of the decision tree were framed into different rules for better understanding by end users. The study findings will help researchers, policy makers and farmers in predicting the crop yield in advance for market dynamics.
Research priorities for the U.S. farm economy include increasing the productivity and cost efficiency on current land resources while understanding production agriculture across the globe. Providing unbiased objective analysis to policymakers with regard to commodity programs, insurance markets, agricultural credit, and the production of bioenergy are important issues that directly affect not only the U.S. farm economy but other agricultural regions. The ability to manage risk, the increasing complexity of farm operations, the ability of the U.S. farm sector to be nimble to changes in individual and societal preferences, and the efficient discovery of information through efficient markets offer a wealth of research opportunities. © The Author(s) 2018. Published by Oxford University Press on behalf of the Agricultural and Applied Economics Association. All rights reserved.
This article explores farmers’ use of computers for farm business purposes by analyzing the computer access and usage decisions of almost 900 Irish farmers. The findings reveal that computer ownership is influenced by a combination of farm business and household characteristics, but that farm business characteristics dominate if the computer is used for the business. More detailed findings suggest that computers are most likely to be used on larger dairy farms, while farmers who are living alone have limited access to computers. Public policy needs to support the adoption of information technologies, and the role of computers in tackling social isolation and providing farm information is critically discussed.
The increased availability of high resolution data and computing power has spurred enormous interest in “Big Data”. While analysts typically source data from a wide variety of agencies, even within the USDA no comprehensive data warehouse exists with which researchers can interact. This leads to massive duplication in efforts, inefficient data sourcing, and great potential for error. The purpose of this article is to provide a brief overview of this state of affairs within the community. An overview of a prototype warehouse is also provided, as are thoughts on future directions.