LETTER TO THE EDITOR
Response to van Pul, van Zanten and Wichink Kruit
Van Pul et al. offer a critique of our recent contribution in Soil
Use and Management. Regrettably, their critique contains a
number of confusing statements and unstated assumptions,
some of which we identify here.
Van Pul et al. state that the ‘statistical correlation between
hourly ammonia concentrations between regional
measurement stations is weak’, which, as we pointed out, is
true. They ascribe this weakness to ‘large variability in local
agricultural practice and in weather conditions’, which is
also true. They then say: ‘If data are aggregated to longer
timescales, correlations between stations clearly increase’,
which happens, but they claim this increased correlation is
‘due to the removal of noise at the hourly timescale’.
First, the ‘removal of noise’ logically assumes that their
statistical model is more real than the measured values. By
implication, the measured values seem to be viewed by Van Pul
et al. as a corrupted form of reality in the sense there is a ‘true’
value to which ‘noise’ is added, and that their statistical model
can discover this ‘true’ value. This clearly is false. Instead, the
sum total of all causes results in a certain ammonia concentration
at time x and location y, recorded in µg/m³. These
measured values are the experienced values; that is, what is
experienced is not some mysterious truth that remains after noise has
somehow been subtracted. The subsequent measured data are
the actual values. As we made clear in our study, no
assumptions are made by us concerning the data from the
Dutch National Air Quality Monitoring Network (LML).
This brings us to our second point: that data aggregation
increases correlation. What Van Pul et al. ignore, and we
pointed out (Hanekamp et al., 2017, p. 284), is that ‘smoothing
the data’ does not entail increased causation. As an experiment,
take any two sets of numbers, wholly unrelated to each other,
sampled at some finite timescale. Clearly, the correlation
between the two series will be quite small. Then, aggregate the
series by averaging (at least two) time points. The absolute value
of the correlation will, on average, increase. Aggregate
again at even coarser levels and the correlation increases once
more and may reach ‘significance’. Aggregation is a form of
statistical smoothing, and smoothing any two (or more) series
increases their correlation, on average, as Briggs (2016, pp. 242–243)
formally proved. Thus, how much of the correlation in the
ammonia series Van Pul et al. report is an artefact of
smoothing, and how much reflects larger-scale causes, is
unknown. Yet, Van Pul et al. simply assume, without proof,
the latter to be the dominant, or indeed the only source of the
increased correlation.
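The experiment described above can be sketched as a small simulation. This is a minimal illustration, not the formal argument in Briggs (2016): it assumes independent Gaussian white noise rather than measured ammonia data, and the function names (pearson, block_mean, mean_abs_corr), the series length of 720 points and the block sizes 1, 6 and 24 are all assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def block_mean(series, k):
    """Aggregate by averaging consecutive, non-overlapping blocks of k points."""
    return [sum(series[i:i + k]) / k for i in range(0, len(series) - k + 1, k)]


def mean_abs_corr(k, n_points=720, n_trials=200, seed=1):
    """Average |correlation| between pairs of unrelated series after aggregation.

    The same seed is used for every k, so each block size is applied to the
    same underlying pairs of random series (an illustrative assumption).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Two series that are independent by construction: no shared cause.
        a = [rng.gauss(0, 1) for _ in range(n_points)]
        b = [rng.gauss(0, 1) for _ in range(n_points)]
        total += abs(pearson(block_mean(a, k), block_mean(b, k)))
    return total / n_trials


# Block size 1 leaves the series unaggregated; 6 and 24 are coarser averages.
results = {k: mean_abs_corr(k) for k in (1, 6, 24)}
for k, value in results.items():
    print(f"block size {k:>2}: mean |r| = {value:.3f}")
```

Under these assumptions the average absolute correlation grows as the blocks get coarser, although the two series share no cause by construction.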
Van Pul et al. next state that ‘annual concentrations at the
various stations over the period 1993–2014 tend to be
remarkably similar when normalized for the station’s
concentration level.’ By implication, they were not similar
until after statistical manipulation. There are, of course,
models that help characterize uncertainty in heterogeneous
time series like this, such as hierarchical modelling, which
might suggest larger-scale correlations, but, of course, never
causes. But even without these, it is obvious from the data
themselves that local causes dominate larger, more regional
causes. Otherwise, there would be no need for statistical
manipulation.
We reiterate that the mean is an inadequate value to
represent average exposure. Indeed, ecosystems do not
experience average exposures at all, but a continuum of
differing concentrations in time, of which the high values are
short-lived, as is clear from the LML data. Indubitably, it
would be best not to use point measures at all for complex
phenomena and to use distributions instead. If the interest were in total
deposition, then it would be better to form a real measure of
total deposition and not use a proxy that gives undue weight
to temporarily high values, which does not represent average
system behaviour (Galton, 1907).
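The point about the mean can be illustrated with a short sketch. The hourly values below are hypothetical and chosen only for illustration (they are not LML measurements): a mostly low series with two short-lived peaks, with units assumed to be µg/m³.

```python
import statistics
from collections import Counter

# Hypothetical hourly concentrations for one day (assumed units: µg/m³).
# 22 hours at a low level and 2 hours at a short-lived peak.
hourly = [5.0] * 10 + [80.0] * 2 + [5.0] * 12

mean_value = statistics.mean(hourly)      # pulled upwards by the two peaks
median_value = statistics.median(hourly)  # the level seen in a typical hour

# Number of hours in which the concentration was below the daily mean.
hours_below_mean = sum(1 for v in hourly if v < mean_value)

# The full distribution keeps the information that the mean discards.
distribution = Counter(hourly)

print(mean_value, median_value, hours_below_mean, dict(distribution))
```

In this made-up series the mean is 11.25 while the concentration is 5.0 for 22 of the 24 hours, so the mean describes no hour that actually occurred.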
Van Pul et al. claim ‘annual concentrations of ammonia ...
appear to be only piecewise linear’ over various times.
They then suggest that ‘most stations have the largest
positive trends somewhere after the year 2000’ and indicate
ways of statistically handling these pieces. This we consider
to be their greatest mistake. First, the periods Van Pul et al.
chose were biased and were not selected with respect to known causes.
Anyone can pick arbitrary periods. Second, statistical models
are not needed to say whether a trend was present or absent
in any set of data. All one needs is a definition of ‘trend’
and then just observe the data. Does ‘trend’ mean more ‘ups’
than ‘downs’, or greater averages at the end than the
beginning, or so many per cent more ‘ups’, or higher or
lower at the end? Many more definitions of trends can be
proffered. Even if the definition is the trend coefficient of
(say) a linear regression model, then whether that ‘trend’ is
‘significant’ carries no meaning. It is certainly not an
indication that some linear cause was in effect, as the data
themselves by definition never harbour causal relations
(Briggs, 2014, 2016). No: something causes each data point,
and if the causes are unknown, which clearly is the case
here, then statistical models are only worthwhile for making
predictions of what has not happened yet (Briggs, 2016).
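That different definitions of ‘trend’ can disagree on the same data can be shown with a brief sketch. The series below is hypothetical, and the four definitions coded here (more ‘ups’ than ‘downs’, higher at the end than at the start, greater average in the later part, and the slope of a least-squares line) are only a selection of those listed above; the function and key names are assumptions for illustration.

```python
def trend_definitions(series):
    """Evaluate several possible definitions of 'trend' on one series."""
    n = len(series)
    steps = [b - a for a, b in zip(series, series[1:])]
    ups = sum(1 for s in steps if s > 0)
    downs = sum(1 for s in steps if s < 0)

    # Compare the average of the first and last n // 2 points.
    half = n // 2
    first_mean = sum(series[:half]) / half
    last_mean = sum(series[-half:]) / half

    # Slope of an ordinary least-squares line against the index 0..n-1.
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    sxx = sum((i - x_mean) ** 2 for i in range(n))

    return {
        "more_ups_than_downs": ups > downs,
        "end_higher_than_start": series[-1] > series[0],
        "later_mean_higher": last_mean > first_mean,
        "ols_slope": sxy / sxx,
    }


# Hypothetical annual values: steady rises interrupted by one large drop.
annual = [10, 11, 12, 13, 14, 6, 7, 8, 9]
result = trend_definitions(annual)
print(result)
```

For this series the first definition reports an upward ‘trend’ (seven ‘ups’ against one ‘down’), while the other three report a downward one, so the answer depends entirely on the definition chosen.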
Put differently: What makes the time periods noted by
Van Pul et al. the ‘correct’ periods? They simply observed
these periods after all. Why not, say, 2001 to 2002, or 2002
to 2003? Was there a trend in any of these series?
© 2017 British Society of Soil Science. Soil Use and Management, December 2017, 33, 609–610. doi: 10.1111/sum.12385
Absolutely, and everywhere: no data series with real-world
causes remain constant. If the period of one year is
considered too short: why? After all, something (actually
many things) causes the series to change values. By a seemingly
arbitrary and biased choice of time periods, Van Pul et al. have
ostensibly demonstrated trends and, in so doing,
beg the question about the cause.
If some large-scale, multi-year and linear cause is suspected,
then we first have to guess or deduce how that cause will affect
the series at hand, then verify if the data are indicative of that
cause. That is not the case here. If some cause is suspected but
its nature cannot be deduced, then various statistical models
can be posited to make predictions. If a strictly linear forcing,
for example, is suspected, then probabilistic predictions can be
made. This Van Pul et al. have never done; they simply assume
the large-scale, multi-year and linear cause, namely
agricultural activities, and once again beg the question.
Considering the above, Van Pul et al. did not produce a case
against our critique. We therefore invite Van Pul et al. to do
the concrete predictive work.
W. M. BRIGGS¹, J. C. HANEKAMP²,³ & M. CROK⁴
¹Independent researcher
²University College Roosevelt, Middelburg, The Netherlands
³Environmental Health Sciences, University of Massachusetts, Amherst, MA, USA
⁴Independent researcher
E-mails: matt@wmbriggs.com; j.hanekamp@ucr.nl;
hjaap@xs4all.nl; marcel.crok@gmail.com
References
Briggs, W.M. 2014. Common statistical fallacies. Journal of
American Physicians and Surgeons, 19, 58–60.
Briggs, W.M. 2016. Uncertainty – The soul of modeling, probability
& statistics. Springer, Switzerland.
Galton, F. 1907. One vote, one value. Nature, 75, 414.
Hanekamp, J.C., Briggs, W.M. & Crok, M. 2017. A volatile
discourse – reviewing aspects of ammonia emissions, models and
atmospheric concentrations in The Netherlands. Soil Use and
Management, 33, 276–287.