REGRESSION AND CORRELATION ANALYSIS OF GRID SOIL DATA VERSUS
CELL SPATIAL DATA¹
J.P. MOLIN, H.T.Z. COUTO
University of São Paulo, Piracicaba, Brazil
E-mail: jpmoli@esalq.usp.br, htzcouto@ciagri.usp.br
L.M. GIMENEZ, V. PAULETTI, R. MOLIN
ABC Foundation, Castro, Brazil
E-mail: fabc@fundacaoabc.com.br
S.R. VIEIRA
Instituto Agronômico, Campinas, Brazil.
E-mail: sidney@cec.iac.br
ABSTRACT
One of the many issues in precision agriculture is the correlation between yield maps and the
yield factors involved. Low correlations have led to questions on how to make fertilizer
prescriptions, for instance, based on yield and soil sampling information. Some software packages
run correlations and regressions relating each cell of the field generated from interpolation.
Another way to analyze the data is to correlate the yield only at grid points with the information
that comes from the laboratory, but a yield has to be attributed to each point because the yield
data come from a set of spot readings not related to the grid soil sampling. In this study some
attempts were made to represent the yield for each soil sample point and to run the analysis in
the two ways with data from one field. The field was sampled at three depths (0-5, 0-10 and
0-20 cm) on a 30 m regular grid. Both cases resulted in low coefficients of determination (R²).
INTRODUCTION
The complexity of explaining yield variability using specific factors is well known. Recent
literature presents many examples where yield factors have been listed based on the correlation
between yield and soil fertility parameters. Acock and Pachepsky (1997) state that it is not
possible to explain yield changes in the field by only a few limiting factors. They discuss the
use of mechanistic crop models as a powerful tool to help understand the complex
soil/plant/atmosphere system. Nevertheless, because plant behavior is not yet completely
understood, these models still have limitations and need improvement.
Regression analysis and other techniques relating yield to the factors most likely to cause yield
variability have been used frequently. Drummond et al. (1995) presented a study in which they
applied different regression strategies to kriging-interpolated values on 10 m cells. Clay et al.
(1998) followed the same idea: they evaluated the impact of grid distance on spatial analysis and
profitability, correlating yield and its limiting factors to soil fertility.
¹ This work is being partially supported by FAPESP (São Paulo State Research Foundation).
An interesting aspect of correlating yield with soil fertility and other factors is the procedure
generally used. With yield monitor data, geostatistics and GIS, spatialized cell data are
frequently used to run correlations and regressions. Mallarino et al. (1999) collected grain yields
by hand around soil sampling points instead of using a monitor and ran simple correlations and
multiple regressions between the factors. In the first case, estimation takes place; in the
second, the yield data have to be manipulated in order to estimate the yield that represents some
neighborhood of the point where soil samples were collected. Some specialized software packages
used for data collection and analysis in precision agriculture run correlations and regressions
relating each cell of the field. Those cells result from data already interpolated and rest on
assumptions related to geostatistical criteria. Another way to analyze the data is to correlate
crop yield with the laboratory information for each point in the field. The difficulty is
attributing a yield to each point, because the yield comes from a set of spot readings collected by
the yield monitor and not related to the grid soil sampling. This paper deals with the two
scenarios: multiple regression analyses were run between grain yield and possible causes of yield
variability related to soil fertility, using data from one field that has been cultivated under
no-till for years.
METHODS
The study is part of a project involving local farmers, a private organization and a university in
Paraná State, southern Brazil (long. 49° 55' W, lat. 24° 51' S). A 23.6 ha field is being
monitored; it has been cultivated under no-till, rotating corn, soybeans and winter crops, for more
than 17 years. Soybean yield data were collected in 1999 using a commercial yield monitor on a
combine. Before planting the soybeans in 1999, grid soil samples were taken at 0-5, 0-10 and
0-20 cm depth on a 30 m grid, totaling 225 samples for each depth. The soil samples were sent to
the laboratory for soil fertility analysis.
Fertility data were analyzed using geostatistical procedures, and the best model for each
semivariogram was used for kriging to 10 m cells with SSToolbox software (SST Development
Group, Inc.). This process resulted in a total of 2360 cells common to yield (interpolated by the
inverse distance method) and the soil chemical variables. This constituted the first scenario for
regression analysis.
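The cell count and the inverse distance step can be illustrated with a minimal sketch. It is not the SSToolbox implementation: the function name `idw_interpolate`, the weighting power of 2 and the use of all readings (no search neighborhood) are assumptions made only for illustration, since the text does not report these settings.

```python
import numpy as np

# Cell count implied by the text: a 23.6 ha field cut into 10 m x 10 m cells.
FIELD_AREA_M2 = 23.6 * 10_000
CELL_SIZE_M = 10.0
n_cells = round(FIELD_AREA_M2 / CELL_SIZE_M ** 2)  # matches the 2360 cells reported


def idw_interpolate(points_xy, values, cell_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at each cell center.

    power=2 is an assumed setting; the paper does not state the one used.
    """
    points_xy = np.asarray(points_xy, dtype=float)
    values = np.asarray(values, dtype=float)
    cell_xy = np.asarray(cell_xy, dtype=float)
    # Pairwise distances, shape (n_cells, n_points).
    d = np.linalg.norm(cell_xy[:, None, :] - points_xy[None, :, :], axis=2)
    # eps avoids division by zero when a cell center coincides with a reading.
    w = 1.0 / np.maximum(d, eps) ** power
    return (w @ values) / w.sum(axis=1)


# Hypothetical yield readings (t/ha) and two cell centers.
readings_xy = [(0.0, 0.0), (10.0, 0.0)]
readings = [2.0, 4.0]
cells_xy = [(5.0, 0.0), (0.0, 0.0)]
cell_yield = idw_interpolate(readings_xy, readings, cells_xy)
```

In this toy case the cell midway between the two readings receives their mean, and the cell that coincides with a reading reproduces that reading.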
The second approach correlated yields and soil fertility factors using the grid sampling results,
a population of 225 points for each depth. As the yield data do not have the same coordinates as
the soil samples, a procedure was developed using spreadsheet tools to define a representative
yield for each sampling point. The procedure allowed the search radius around the point to be
selected, so circles of 5, 10 and 15 m radius were tested, each giving an average yield for each
sampling location. This was the second scenario for the regression analysis.
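The search-radius procedure was built by the authors with spreadsheet tools; the sketch below only restates the same idea in Python. The function name `yield_at_sample_points`, the planar x/y coordinates in meters and the toy readings are assumptions for illustration.

```python
import numpy as np


def yield_at_sample_points(sample_xy, yield_xy, yield_values, radius_m):
    """Mean of the yield readings that fall within radius_m of each soil sample.

    Returns (means, counts); the mean is NaN where no reading falls in the circle.
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    yield_xy = np.asarray(yield_xy, dtype=float)
    yield_values = np.asarray(yield_values, dtype=float)
    # Distance from every soil sample to every yield reading.
    d = np.linalg.norm(sample_xy[:, None, :] - yield_xy[None, :, :], axis=2)
    inside = d <= radius_m
    counts = inside.sum(axis=1)
    sums = (inside * yield_values[None, :]).sum(axis=1)
    means = np.full(len(sample_xy), np.nan)
    has_data = counts > 0
    means[has_data] = sums[has_data] / counts[has_data]
    return means, counts


# Two soil samples 30 m apart and four hypothetical yield readings (t/ha).
samples = [(0.0, 0.0), (30.0, 0.0)]
monitor_xy = [(1.0, 0.0), (4.0, 0.0), (12.0, 0.0), (29.0, 0.0)]
monitor_yield = [3.0, 3.4, 2.0, 2.6]

mean_5m, n_5m = yield_at_sample_points(samples, monitor_xy, monitor_yield, 5.0)
mean_15m, n_15m = yield_at_sample_points(samples, monitor_xy, monitor_yield, 15.0)
```

Widening the circle from 5 m to 15 m pulls a third reading into the average for the first sample, which is the smoothing effect the three tested radii are meant to expose.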
Multiple regression analysis was run in SAS with the stepwise procedure. Five models were tested
for yield (linear, logarithmic, quadratic, inverse and square root) involving 10 factors from soil
fertility laboratory tests: phosphorus (P), organic matter (OM), pH, hydrogen plus aluminum
(H+Al), potassium (K), calcium (Ca), magnesium (Mg), base saturation (V%), cation exchange capacity
(CEC) and sum of bases – Ca²⁺, Mg²⁺ and K⁺ (SB). A significance level of 5% was used for the
regression analyses and the results were expressed by the corresponding coefficients of
determination (R²). The whole analysis was run for each depth for both grid data and cell data.
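As a rough illustration of how the five models can be compared by R², the sketch below fits each one by ordinary least squares with NumPy. It is not the SAS stepwise procedure: variable selection and the 5% significance test are omitted, and the way each model transforms the predictors (for example, the quadratic model using each factor and its square) is an assumption, because the text does not spell it out. The data are synthetic.

```python
import numpy as np

# Assumed predictor transformations for the five models named in the text.
TRANSFORMS = {
    "linear": lambda X: X,
    "logarithmic": lambda X: np.log(X),          # requires positive values
    "quadratic": lambda X: np.hstack([X, X ** 2]),
    "inverse": lambda X: 1.0 / X,                # requires non-zero values
    "square root": lambda X: np.sqrt(X),         # requires non-negative values
}


def r_squared(X, y):
    """Coefficient of determination of an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def fit_all_models(X, y):
    """R² of each of the five models for the same predictors and response."""
    X = np.asarray(X, dtype=float)
    return {name: r_squared(f(X), y) for name, f in TRANSFORMS.items()}


# Synthetic stand-in for 225 grid points and three soil factors (not the paper's data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 10.0, size=(225, 3))
y_demo = 2.0 + 0.5 * np.sqrt(X_demo[:, 0]) + rng.normal(0.0, 1.0, size=225)
scores = fit_all_models(X_demo, y_demo)
```

Because the quadratic model as written here contains the linear terms, its R² can never fall below the linear one on the same data; a stepwise procedure with a significance threshold, as used in the paper, can still retain fewer variables in one model than in another.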
RESULTS AND DISCUSSION
All the regression results are presented as R² values when significant at the 5% level. The
results for cell data are presented in Table 1 and those for grid data are summarized in
Table 2.
Table 1 – Multiple regression results with variables involved and coefficients of determination
(R²) for yield versus soil fertility factors for each model and depth for cell data analysis.
Model         No. of var.   R²      Variables
Depth: 0-5 cm
Linear        6             0.120   P, OM, pH, H+Al, Al, Mg
Logarithmic   8             0.147   P, OM, pH, H+Al, Al, K, Ca, CEC
Quadratic     8             0.210   P, H+Al, Al, K, Ca, Mg, SB, V%
Inverse       10            0.160   P, OM, pH, H+Al, K, Ca, Mg, SB, CEC, V%
Square root   9             0.179   P, pH, H+Al, Al, K, Ca, Mg, SB, CEC
Depth: 0-10 cm
Linear        5             0.156   P, pH, H+Al, Al, CEC
Logarithmic   3             0.193   pH, H+Al, Mg
Quadratic     10            0.228   P, OM, pH, H+Al, Al, Ca, Mg, SB, CEC, V%
Inverse       3             0.191   H+Al, Mg, CEC
Square root   3             0.158   pH, Mg, CEC
Depth: 0-20 cm
Linear        4             0.232   P, H+Al, Al, K
Logarithmic   6             0.247   pH, H+Al, Al, K, SB, CEC
Quadratic     8             0.284   P, OM, H+Al, K, Ca, Mg, SB, CEC
Inverse       6             0.255   H+Al, Al, K, SB, CEC, V%
Square root   6             0.244   P, pH, H+Al, Al, K, SB
Table 2 – Multiple regression results with variables involved and coefficients of determination
(R²) for yield versus soil fertility factors for each model and depth for grid data analysis.
Model         No. of var.   R²      Variables
Depth: 0-5 cm; Search radius: 5 m
Linear        -             NS*     -
Logarithmic   -             NS      -
Quadratic     -             NS      -
Inverse       -             NS      -
Square root   -             NS      -
Depth: 0-5 cm; Search radius: 10 m
Linear        -             NS      -
Logarithmic   -             NS      -
Quadratic     -             NS      -
Inverse       1             0.065   K
Square root   -             NS      -
Depth: 0-5 cm; Search radius: 15 m
Linear        -             NS      -
Logarithmic   -             NS      -
Quadratic     -             NS      -
Inverse       -             NS      -
Square root   -             NS      -
Depth: 0-10 cm; Search radius: 5 m
Linear        2             0.159   pH, H+Al
Logarithmic   2             0.164   pH, H+Al
Quadratic     1             0.100   OM
Inverse       4             0.244   pH, H+Al, Al, K
Square root   2             0.163   pH, H+Al
Depth: 0-10 cm; Search radius: 10 m
Linear        1             0.113   CEC
Logarithmic   3             0.223   pH, Mg, CEC
Quadratic     3             0.196   pH, Mg, CEC
Inverse       2             0.172   K, CEC
Square root   3             0.222   pH, Mg, CEC
Depth: 0-10 cm; Search radius: 15 m
Linear        2             0.213   Mg, CEC
Logarithmic   3             0.280   pH, Mg, CEC
Quadratic     3             0.251   pH, Mg, CEC
Inverse       3             0.263   pH, Mg, CEC
Square root   3             0.281   pH, Mg, CEC
Depth: 0-20 cm; Search radius: 5 m
Linear        2             0.134   OM, CEC
Logarithmic   -             NS      -
Quadratic     2             0.216   pH, V%
Inverse       3             0.205   OM, K, CEC
Square root   3             0.172   OM, K, CEC
Depth: 0-20 cm; Search radius: 10 m
Linear        3             0.279   pH, K, V%
Logarithmic   -             NS      -
Quadratic     3             0.248   pH, K, V%
Inverse       3             0.271   OM, K, CEC
Square root   3             0.290   pH, K, V%
Depth: 0-20 cm; Search radius: 15 m
Linear        4             0.406   pH, Al, K, Ca
Logarithmic   4             0.415   pH, Al, K, Ca
Quadratic     5             0.417   pH, Al, Mg, CEC, V%
Inverse       3             0.336   pH, K, CEC
Square root   4             0.415   pH, Al, K, Ca
* NS – not significant at 5%
The cell data set brought more variables into the regression equations, but the coefficients of
determination (R²) were low (0.120 to 0.284), showing that, even with several factors, yield
limitations cannot be attributed to soil fertility as simply as is generally believed. The
regressions based on grid data reached higher R² values (up to 0.417 at 0-20 cm with a 15 m search
radius) with fewer factors involved, which indicates that using interpolated cell data attenuates
spatial variability. The R² values for grid data increase as the search radius and the sampling
depth increase, which suggests that both the yield and the surface soil layer have local
variability that is not well explained.
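The trend with search radius and depth can be checked directly against Table 2. The sketch below copies the significant R² values from that table and averages them per depth and radius; averaging only the significant regressions is a summarizing choice made here for illustration, not a statistic reported by the authors.

```python
from statistics import mean

# Significant R² values from Table 2, keyed by (depth, search radius in m).
# Regressions reported as NS are left out, so some lists are short or empty.
TABLE2_R2 = {
    ("0-5 cm", 5): [],
    ("0-5 cm", 10): [0.065],
    ("0-5 cm", 15): [],
    ("0-10 cm", 5): [0.159, 0.164, 0.100, 0.244, 0.163],
    ("0-10 cm", 10): [0.113, 0.223, 0.196, 0.172, 0.222],
    ("0-10 cm", 15): [0.213, 0.280, 0.251, 0.263, 0.281],
    ("0-20 cm", 5): [0.134, 0.216, 0.205, 0.172],
    ("0-20 cm", 10): [0.279, 0.248, 0.271, 0.290],
    ("0-20 cm", 15): [0.406, 0.415, 0.417, 0.336, 0.415],
}


def mean_r2(depth, radius_m):
    """Mean R² of the significant regressions, or None if there were none."""
    values = TABLE2_R2[(depth, radius_m)]
    return mean(values) if values else None


summary = {key: mean_r2(*key) for key in TABLE2_R2}
n_significant = {key: len(values) for key, values in TABLE2_R2.items()}

for (depth, radius_m), value in summary.items():
    shown = "NS" if value is None else f"{value:.3f}"
    print(f"{depth:8s} radius {radius_m:2d} m  mean R2 = {shown}")
```

For the 0-10 cm and 0-20 cm depths the mean rises with each increase in radius, while the 0-5 cm depth yields a single significant regression out of fifteen.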
As the yield averaging area increases, the averaged yield tends to be more related to some of the
limiting soil fertility factors, which are represented by a composite sample taken at the center
of the area that the yield represents. On the one hand, the spatial variability of yield is
attenuated by increasing the search radius; on the other, the estimation of soil parameters
carries uncertainties due to interpolation. The increase in the coefficients of determination with
search radius may also be related to the fact that yield is strongly spatially dependent, or to
local errors in yield readings that cause distortions.
Soil sampling depth is known to be an important issue, especially under no-till. As the depth
increased, the R² values of the regressions increased. For the grid data, the 0-5 cm depth gave
almost no significant regressions (one of 15 in Table 2). This shows that spatial variability is
higher in the topsoil and that more work has to be done on soil sampling depth and its criteria.
In these cases pH, Al, K and Ca frequently appear to explain part of the yield variability.
Nevertheless, the results were not consistent, which corroborates the idea of Acock and
Pachepsky (1997) that it is not correct to try to explain yield variability by only a few factors,
normally restricted to soil fertility.
CONCLUSIONS
- Increasing the size of the yield sampling area increased the coefficients of determination,
meaning that the local variability of yield depends on the sampling area.
- Increasing soil sampling depth also increased the coefficients of determination, showing that
local soil variability is related to depth, especially under no-till, and deserves further study.
- Coefficients of determination were low, showing that, even with several factors, yield
limitations cannot be attributed to soil fertility as simply as is generally believed.
- The regressions based on grid data resulted in higher R² values with fewer factors involved,
indicating the attenuation caused by interpolation in the cell data.
REFERENCES
Acock, B. and Pachepsky, Ya. (1997) Holes in precision farming: mechanistic crop models.
Precision Agriculture '97. Stafford, J. (Ed.), pp. 397-404.
Clay, D.E., Carlson, C.G., Chang, J., Clay, S.A., Malo, D.D., Ellsbury, M.M. and Lee, J. (1998)
Systematic evaluation of precision farming soil sampling requirements. In: Precision
Agriculture, Robert, P.C., Rust, R.H., Larson, W.E. (Eds.), Minneapolis, MN, USA,
pp. 253-265.
Drummond, S.T., Sudduth, K.A. and Birrell, S.J. (1995) Analysis and correlation methods from
spatial data, ASAE Paper 95-1335, ASAE, St. Joseph, MI, USA.
Mallarino, A.P., Oyarzabal, E.S. and Hinz, P.N. (1999) Interpreting within-field relationships
between crop yields and soil and plant variables using factor analysis. Precision
Agriculture, 1 (1), pp. 15-25.