REGRESSION AND CORRELATION ANALYSIS OF GRID SOIL DATA VERSUS CELL SPATIAL DATA¹

J.P. MOLIN, H.T.Z. COUTO

University of São Paulo, Piracicaba, Brazil

E-mail: jpmoli@esalq.usp.br, htzcouto@ciagri.usp.br

L.M. GIMENEZ, V. PAULETTI, R. MOLIN

ABC Foundation, Castro, Brazil

E-mail: fabc@fundacaoabc.com.br

S.R. VIEIRA

Instituto Agronômico, Campinas, Brazil.

E-mail: sidney@cec.iac.br

ABSTRACT

One of the many issues in precision agriculture is the correlation between yield maps and the yield factors involved. Low correlations have led to questions about how to make fertilizer prescriptions, for instance, based on yield and soil sampling information. Some software packages run correlations and regressions relating each cell of the field generated by interpolation. Another way to analyze the data is to correlate the yield only at grid points with the information that comes from the laboratory. However, a yield then has to be attributed to each point, because the yield data come from a set of spot readings not related to the grid soil sampling. In this study, some attempts were made to represent the yield for each soil sample point and to run the analysis both ways with data from one field. The field was sampled at three depths (0-5, 0-10 and 0-20 cm) on a 30 m regular grid. Both cases resulted in low coefficients of determination (R²).

INTRODUCTION

The complexity of explaining yield variability using specific factors is well known. Recent literature presents many examples where yield factors have been listed based on the correlation between yield and soil fertility parameters. Acock and Pachepsky (1997) state that it is not possible to explain yield changes in the field by only a few limiting factors. They discuss the use of mechanistic crop models as a powerful tool for understanding the complex soil/plant/atmosphere system. Nevertheless, because plant behavior is not yet completely understood, these models still have limitations and need improvement.

Regression analysis and other techniques relating yield to the factors most likely to cause yield variability have been used frequently. Drummond et al. (1995) presented a study in which they applied different regression strategies to kriging-interpolated values in 10 m cells. Clay et al. (1998) used the same idea. They evaluated the impact of grid spacing on spatial analysis and profitability, correlating yield and its limiting factors with soil fertility.

¹ This work is partially supported by FAPESP (São Paulo State Research Foundation).

An interesting aspect of correlating yield with soil fertility and other factors is the procedure generally used. With yield monitor data, geostatistics and GIS, spatialized cell data are frequently used to run correlations and regressions. Mallarino et al. (1999) instead harvested grain yields by hand around the soil sampling points rather than using a yield monitor, and ran simple correlations and multiple regressions between the factors. In the first case, values are estimated by interpolation. In the second, the yield data have to be manipulated to estimate a yield that represents some neighborhood of the point where the soil samples were collected. Some specialized software used for data collection and analysis in precision agriculture runs correlations and regressions relating each cell of the field. Those cells result from data that have already been interpolated, based on assumptions related to geostatistical criteria. Another approach is to correlate crop yield with the laboratory results for each sampling point in the field. The difficulty is attributing a yield to each point, because the yield comes from a set of spot readings collected by the yield monitor that are not related to the grid soil sampling. This paper deals with these two scenarios. Multiple regression analyses were run between grain yield and possible causes of yield variability related to soil fertility, using data from one field that has been cultivated under no-till for years.

METHODS

The study is part of a project involving local farmers, a private organization and a university in Paraná State, southern Brazil (long. 49°55' W, lat. 24°51' S). A 23.6 ha field is being monitored. It has been cultivated under no-till for more than 17 years, rotating corn, soybeans and winter crops. Soybean yield data were collected in 1999 using a commercial yield monitor on a combine. Before the soybeans were planted in 1999, grid soil samples were taken at 0-5, 0-10 and 0-20 cm depth on a 30 m grid, totaling 225 samples for each depth. The soil samples were sent to the laboratory for soil fertility analysis.

Fertility data were analyzed using a geostatistical procedure. The best-fitting semivariogram model for each variable was used for kriging onto 10 m cells with SSToolbox software (SST Development Group, Inc.). Yield was interpolated onto the same cells by the inverse distance method. This process resulted in a total of 2360 cells common to yield and the soil chemical variables, which constituted the first scenario for regression analysis.
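The yield-to-cell step can be sketched as inverse distance weighting (IDW). The IDW settings used in SSToolbox are not reported here. The power of 2, the optional cut-off distance and the hypothetical function names `idw` and `cell_centres` below are therefore assumptions for illustration only. As a check on the cell count, a 23.6 ha field covers 236,000 m², and 236,000 m² / (10 m × 10 m) = 2,360 cells, which matches the 2360 common cells reported.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (x, y, value) with coordinates in metres, e.g. projected yield-monitor readings
Reading = Tuple[float, float, float]


def idw(x: float, y: float, readings: Sequence[Reading],
        power: float = 2.0, max_dist: Optional[float] = None) -> Optional[float]:
    """Inverse-distance-weighted estimate at (x, y).

    Assumed settings: weight = 1 / d**power and an optional cut-off distance.
    Returns None when no reading lies within max_dist.
    """
    num = 0.0
    den = 0.0
    for rx, ry, value in readings:
        d = math.hypot(rx - x, ry - y)
        if d == 0.0:
            return value  # the cell centre coincides with a reading
        if max_dist is not None and d > max_dist:
            continue  # ignore readings beyond the assumed cut-off
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den if den > 0.0 else None


def cell_centres(xmin: float, ymin: float, xmax: float, ymax: float,
                 size: float = 10.0) -> List[Tuple[float, float]]:
    """Centres of square cells of side `size` tiling the bounding box."""
    nx = int(math.ceil((xmax - xmin) / size))
    ny = int(math.ceil((ymax - ymin) / size))
    return [(xmin + (i + 0.5) * size, ymin + (j + 0.5) * size)
            for j in range(ny) for i in range(nx)]


if __name__ == "__main__":
    readings = [(0.0, 0.0, 2.0), (10.0, 0.0, 4.0), (0.0, 10.0, 3.0)]
    for cx, cy in cell_centres(0.0, 0.0, 20.0, 20.0):
        print(cx, cy, idw(cx, cy, readings, max_dist=30.0))
```

IDW honours each reading exactly and never extrapolates beyond the range of the data. That is one reason interpolated cell values tend to be smoother than the raw monitor readings.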

A second approach to correlating yield and soil fertility factors was based on the grid sampling results, with a population of 225 points for each depth. Because the yield data do not have the same coordinates as the soil samples, a procedure was developed with spreadsheet tools to define a representative yield for each sampling point. The procedure allowed the search radius around the point to be selected. Circles of 5, 10 and 15 m radius were tested, each giving an average yield for each sampling location. This was the second scenario for the regression analysis.
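The spreadsheet procedure for attributing a yield to each grid point can be sketched as a plain circular search. Every yield reading whose distance to the sampling point is at most the chosen radius is averaged. The inclusive distance test, the unweighted mean and the hypothetical function names are assumptions. The paper only states that 5, 10 and 15 m circles were tested.

```python
import math
from typing import Dict, Iterable, Optional, Sequence, Tuple

Reading = Tuple[float, float, float]  # (x_m, y_m, yield) from the yield monitor


def mean_yield_within(px: float, py: float, readings: Iterable[Reading],
                      radius: float) -> Optional[float]:
    """Unweighted mean of the readings within `radius` metres of (px, py).

    Returns None when the circle contains no reading, so the point can be
    left out of the regression rather than given an invented yield.
    """
    inside = [value for x, y, value in readings
              if math.hypot(x - px, y - py) <= radius]
    return sum(inside) / len(inside) if inside else None


def yields_by_radius(px: float, py: float, readings: Sequence[Reading],
                     radii: Sequence[float] = (5.0, 10.0, 15.0)
                     ) -> Dict[float, Optional[float]]:
    """Representative yield for one sampling point at each tested radius."""
    return {r: mean_yield_within(px, py, readings, r) for r in radii}


if __name__ == "__main__":
    readings = [(3.0, 0.0, 1.0), (0.0, 7.0, 2.0), (12.0, 0.0, 3.0)]
    print(yields_by_radius(0.0, 0.0, readings))
    # {5.0: 1.0, 10.0: 1.5, 15.0: 2.0}
```

Using the same readings for every radius keeps the 5, 10 and 15 m results directly comparable. A spatial index would be faster for full yield-monitor files, but a linear scan keeps the sketch short.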

Multiple regression analysis was run in SAS using the stepwise procedure. Five model forms were tested for yield (linear, logarithmic, quadratic, inverse and square root). They involved 10 factors from the soil fertility laboratory tests:

- phosphorus (P)
- organic matter (OM)
- pH
- hydrogen plus aluminum (H+Al)
- potassium (K)
- calcium (Ca)
- magnesium (Mg)
- base saturation (V%)
- cation exchange capacity (CEC)
- sum of bases (SB = Ca²⁺ + Mg²⁺ + K⁺)

A 5% significance level was used for the regression analyses, and the results were expressed by the corresponding coefficients of determination (R²). The whole analysis was run for each depth, for both the grid data and the cell data.
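The model-fitting step can be approximated with a short forward-selection sketch. It is not the SAS stepwise procedure. SAS can also remove variables that lose significance, and it tests entry against a 5% p-value. This sketch only adds variables and uses a fixed F-to-enter of 4.0 as a rough stand-in, because the Python standard library has no F distribution. The paper also does not say whether the five model forms transform yield or the soil factors. Here, as an assumption, each soil factor is transformed (x, ln x, x², 1/x, √x) and yield is left as measured. All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed reading of the five model forms: a transform applied to every soil factor.
TRANSFORMS = {
    "linear": lambda v: v,
    "logarithmic": math.log,       # needs v > 0
    "quadratic": lambda v: v * v,
    "inverse": lambda v: 1.0 / v,  # needs v != 0
    "square root": math.sqrt,      # needs v >= 0
}


def transform(column: Sequence[float], kind: str) -> List[float]:
    """Apply one of the assumed model-form transforms to a factor column."""
    f = TRANSFORMS[kind]
    return [f(v) for v in column]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (collinear factors)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R² of an ordinary least-squares fit of y on the columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = _solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(factors: Dict[str, Sequence[float]], y: Sequence[float],
                   f_enter: float = 4.0) -> Tuple[List[str], float]:
    """Add the factor that raises R² most while its partial F >= f_enter."""
    n = len(y)
    chosen: List[str] = []
    r2 = 0.0
    while len(chosen) < len(factors):
        best_name: Optional[str] = None
        best_r2 = r2
        for name, col in factors.items():
            if name in chosen:
                continue
            try:
                trial = r_squared([factors[c] for c in chosen] + [col], y)
            except ValueError:
                continue  # collinear with the factors already in the model
            if trial > best_r2:
                best_name, best_r2 = name, trial
        df_res = n - (len(chosen) + 1) - 1
        if best_name is None or df_res <= 0:
            break
        if best_r2 < 1.0 - 1e-12:  # skip the F test for an exact fit
            partial_f = (best_r2 - r2) / ((1.0 - best_r2) / df_res)
            if partial_f < f_enter:
                break
        chosen.append(best_name)
        r2 = best_r2
    return chosen, r2


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2]
    for kind in ("linear", "logarithmic"):
        factors = {"x1": transform(x1, kind), "x2": transform(x2, kind)}
        print(kind, forward_select(factors, y))
```

Running the selection once per model form and per depth mirrors the five-by-three structure of Table 1. For the grid data it would also be repeated for each search radius.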

RESULTS AND DISCUSSION

All regression results are reported as R² values when significant at the 5% level. Results for the cell data are presented in Table 1, and those for the grid data are summarized in Table 2.
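Significance and R² are reported together, so the standard identity linking them may help. For a least-squares model with k predictors fitted to n observations, the overall regression F statistic is F = (R²/k) / ((1 - R²)/(n - k - 1)), with k and n - k - 1 degrees of freedom. The sketch below applies it. Taking n = 225 for a grid regression is an assumption, since points without yield readings inside the search circle may have been dropped. The function name is hypothetical.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F statistic of a least-squares regression from its R².

    r2: coefficient of determination; n: observations; k: predictors
    (intercept not counted). Degrees of freedom are (k, n - k - 1).
    """
    df_res = n - k - 1
    if not 0.0 <= r2 < 1.0 or k < 1 or df_res < 1:
        raise ValueError("need 0 <= r2 < 1, k >= 1 and n > k + 1")
    return (r2 / k) / ((1.0 - r2) / df_res)


if __name__ == "__main__":
    # Table 2, 0-5 cm depth, 10 m radius, inverse model: one variable (K), R² = 0.065.
    print(round(overall_f(0.065, 225, 1), 1))  # 15.5, assuming n = 225
```

The critical value for 1 and 223 degrees of freedom at the 5% level is roughly 3.9. Against it, that single-variable model is significant even though it explains only 6.5% of the yield variance. By the same identity, with n = 2360 cell values even small R² values tend to pass a 5% test.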

Table 1 – Multiple regression results, with the variables involved and coefficients of determination (R²), for yield versus soil fertility factors for each model and depth (cell data analysis).

Model         No. of var.  R²     Variables

Depth: 0-5 cm
Linear        6            0.120  P, OM, pH, H+Al, Al, Mg
Logarithmic   8            0.147  P, OM, pH, H+Al, Al, K, Ca, CEC
Quadratic     8            0.210  P, H+Al, Al, K, Ca, Mg, SB, V%
Inverse       10           0.160  P, OM, pH, H+Al, K, Ca, Mg, SB, CEC, V%
Square root   9            0.179  P, pH, H+Al, Al, K, Ca, Mg, SB, CEC

Depth: 0-10 cm
Linear        5            0.156  P, pH, H+Al, Al, CEC
Logarithmic   3            0.193  pH, H+Al, Mg
Quadratic     10           0.228  P, OM, pH, H+Al, Al, Ca, Mg, SB, CEC, V%
Inverse       3            0.191  H+Al, Mg, CEC
Square root   3            0.158  pH, Mg, CEC

Depth: 0-20 cm
Linear        4            0.232  P, H+Al, Al, K
Logarithmic   6            0.247  pH, H+Al, Al, K, SB, CEC
Quadratic     8            0.284  P, OM, H+Al, K, Ca, Mg, SB, CEC
Inverse       6            0.255  H+Al, Al, K, SB, CEC, V%
Square root   6            0.244  P, pH, H+Al, Al, K, SB

Table 2 – Multiple regression results, with the variables involved and coefficients of determination (R²), for yield versus soil fertility factors for each model, depth and search radius (grid data analysis).

Model         No. of var.  R²     Variables

Depth: 0-5 cm; search radius: 5 m
Linear        -            NS*
Logarithmic   -            NS
Quadratic     -            NS
Inverse       -            NS
Square root   -            NS

Depth: 0-5 cm; search radius: 10 m
Linear        -            NS
Logarithmic   -            NS
Quadratic     -            NS
Inverse       1            0.065  K
Square root   -            NS

Depth: 0-5 cm; search radius: 15 m
Linear        -            NS
Logarithmic   -            NS
Quadratic     -            NS
Inverse       -            NS
Square root   -            NS

Depth: 0-10 cm; search radius: 5 m
Linear        2            0.159  pH, H+Al
Logarithmic   2            0.164  pH, H+Al
Quadratic     1            0.100  OM
Inverse       4            0.244  pH, H+Al, Al, K
Square root   2            0.163  pH, H+Al

Depth: 0-10 cm; search radius: 10 m
Linear        1            0.113  CEC
Logarithmic   3            0.223  pH, Mg, CEC
Quadratic     3            0.196  pH, Mg, CEC
Inverse       2            0.172  K, CEC
Square root   3            0.222  pH, Mg, CEC

Depth: 0-10 cm; search radius: 15 m
Linear        2            0.213  Mg, CEC
Logarithmic   3            0.280  pH, Mg, CEC
Quadratic     3            0.251  pH, Mg, CEC
Inverse       3            0.263  pH, Mg, CEC
Square root   3            0.281  pH, Mg, CEC

Depth: 0-20 cm; search radius: 5 m
Linear        2            0.134  OM, CEC
Logarithmic   -            NS
Quadratic     2            0.216  pH, V%
Inverse       3            0.205  OM, K, CEC
Square root   3            0.172  OM, K, CEC

Depth: 0-20 cm; search radius: 10 m
Linear        3            0.279  pH, K, V%
Logarithmic   -            NS
Quadratic     3            0.248  pH, K, V%
Inverse       3            0.271  OM, K, CEC
Square root   3            0.290  pH, K, V%

Depth: 0-20 cm; search radius: 15 m
Linear        4            0.406  pH, Al, K, Ca
Logarithmic   4            0.415  pH, Al, K, Ca
Quadratic     5            0.417  pH, Al, Mg, CEC, V%
Inverse       3            0.336  pH, K, CEC
Square root   4            0.415  pH, Al, K, Ca

* NS: not significant at the 5% level.

The cell data produced regression equations with more variables, but the coefficients of determination (R²) were low. This shows that even with several factors, yield limitations cannot be attributed to soil fertility as simply as is generally believed. The regressions based on grid data resulted in higher R² values with fewer factors involved. This indicates that the interpolation behind the cell data attenuates spatial variability.

For the grid data, R² values increase as both the search radius and the sampling depth increase. This means that yield and the surface soil layer also have local variability that is not well explained.

As the yield averaging area increases, yield tends to be more closely related to some of the limiting soil fertility factors. These factors are represented by a composite sample taken at the center of the area that the yield represents. On the one hand, increasing the search radius attenuates the spatial variability of yield. On the other, the estimated soil parameters carry uncertainties due to interpolation. The increase in R² with search radius may also reflect strong spatial dependence of yield, or distortions caused by local errors in the yield readings.
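The averaging argument can be made concrete with a simple idealization, which is an assumption here and not part of the study. Suppose yield readings are spread uniformly at some density and their local errors are independent with standard deviation σ. A circle of radius r then holds about m = density × πr² readings, and the standard error of their mean is σ/√m. Since m grows with r², the standard error shrinks in proportion to 1/r, so a 15 m circle would cut it to one third of the 5 m value. Real yield data are spatially dependent, as noted above, so this idealization likely overstates the reduction. The density and σ values below are hypothetical.

```python
import math


def readings_in_circle(radius_m: float, density_per_m2: float) -> float:
    """Expected number of yield readings in a search circle (uniform density assumed)."""
    return density_per_m2 * math.pi * radius_m ** 2


def standard_error(sigma: float, m: float) -> float:
    """Standard error of the mean of m independent readings with std. dev. sigma."""
    return sigma / math.sqrt(m)


if __name__ == "__main__":
    density = 0.1  # hypothetical: one reading per 10 m² of field
    sigma = 1.0    # hypothetical reading error, in yield units
    for r in (5.0, 10.0, 15.0):
        m = readings_in_circle(r, density)
        print(f"r = {r:4.1f} m  readings = {m:5.1f}  SE = {standard_error(sigma, m):.3f}")
```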

Soil sampling depth is known to be an important issue, especially under no-till. As the sampling depth increased, the R² values of the regressions were higher. For the grid data, the 0-5 cm depth gave almost no significant regressions. This shows that spatial variability in the topsoil is higher, and that more work is needed on soil sampling depth and the criteria for choosing it.

In these cases, pH, Al, K and Ca frequently appear among the variables that explain part of the yield variability. Nevertheless, the results were not consistent. This corroborates the view of Acock and Pachepsky (1997) that it is not correct to try to explain yield variability by only a few factors, usually restricted to soil fertility.

CONCLUSIONS

- Increasing the size of the yield sampling area increased the coefficients of determination, indicating that local yield variability depends on the sampling area.

- Increasing the soil sampling depth also increased the coefficients of determination. This shows that local soil variability is related to depth, especially under no-till, and deserves further study.

- The coefficients of determination were low, showing that even with several factors, yield limitations cannot be attributed to soil fertility as simply as is generally believed.

- The regressions based on grid data resulted in higher R² values with fewer factors involved, indicating the attenuation caused by interpolation in the cell data.

REFERENCES

Acock, B. and Pachepsky, Ya. (1997) Holes in precision farming: mechanistic crop models. In: Precision Agriculture '97, Stafford, J. (Ed.), pp. 397-404.

Clay, D.E., Carlson, C.G., Chang, J., Clay, S.A., Malo, D.D., Ellsbury, M.M. and Lee, J. (1998) Systematic evaluation of precision farming soil sampling requirements. In: Precision Agriculture, Robert, P.C., Rust, R.H. and Larson, W.E. (Eds.), Minneapolis, MN, USA, pp. 253-265.

Drummond, S.T., Sudduth, K.A. and Birrell, S.J. (1995) Analysis and correlation methods from spatial data. ASAE Paper 95-1335, ASAE, St. Joseph, MI, USA.

Mallarino, A.P., Oyarzabal, E.S. and Hinz, P.N. (1999) Interpreting within-field relationships between crop yields and soil and plant variables using factor analysis. Precision Agriculture, 1(1), pp. 15-25.