All content in this area was uploaded by J.C. Hanekamp on Mar 23, 2021

Outlining A New Method To Quantify Uncertainty In Nitrogen Critical Loads

William M. Briggs

matt@wmbriggs.com

New York City, NY

Jaap Hanekamp

j.hanekamp@ucr.nl; hjaap@xs4all.nl

University College Roosevelt, Middelburg, the Netherlands

Environmental Health Sciences

University of Massachusetts, Amherst, MA, USA

March 22, 2021

Abstract

We highlight deficiencies in, and improvements to, a nitrogen critical load model. An original model using logistic regression augmented observations with fictitious data. We replace that with actual data, and show how to incorporate uncertainty in nitrogen measurement into the modeling process. In the end, however, we show a basic logistic regression model has irremovable deficiencies, giving positive probability of harmful effects of nitrogen even when no nitrogen is present.

Keywords: critical loads, logistic regression, nitrogen, uncertainty

1 Introduction

Nitrogen loads play a decisive role in environmental policies. Critical loads (CL) for nitrogen are usually defined as follows: “A quantitative estimate of an exposure to one or more pollutants below which significant harmful effects on specified sensitive elements of the environment do not occur according to present knowledge” [3].

To examine how CLs for nitrogen are modeled in practice, we explore the methods of [1]. We then present an improvement of their methods, and suggest new directions modeling should take.

Their method was as follows:

Data were collected from a series of planned nitrogen experiments, in which known amounts of nitrogen were added, on top of background atmospheric amounts, to small plots. Various measures of plant growth on these plots were then taken. If the growth was higher on experimental than on control (no added nitrogen) plots, in a statistical sense, an adverse or harmful effect was noted. The amounts added were then scaled from the small plots up to the hectare.

They produced the following table:

Figure 1: Table 2 from [1]. The explanation of the columns is in the text.

The “Reference” column points to the papers from which the experimental data were extracted, and “Location” gives where the experiments were performed. The “Effect” column records whether the nitrogen-added plots had statistically greater plant growth (1) than the control plots, or not (0), as noted in the references. The “First level of N deposition where effect was observed (kg ha⁻¹ yr⁻¹)” is the amount of nitrogen scaled up from the small-plot experiments, and “Lowest estimated background deposition (kg ha⁻¹ yr⁻¹)” was also gleaned from the references.

After reviewing the papers referenced, we find there is room for different interpretations of the statistical results, leading to values different from those presented in the Table. For example, Banin et al. [1] used the lowest background nitrogen levels, but a good case can be made to pick the average value observed. However, for the purposes of our simple demonstration, we take all values here as they are presented in the Table.

The next step was to estimate a function by which a known amount of nitrogen, in kg ha⁻¹ yr⁻¹, corresponds to a probability of a harmful effect. A probability of 20% was picked as the threshold requiring action. A standard logistic regression was picked for this function.

Banin et al. used only the nitrogen-added data in the model, and not the background levels per se. This turns out to be a crucial point. Since there was only one instance of effect = 0 in the added-nitrogen column, the logistic regression’s parameters could not be estimated (this is a standard statistical limitation). To overcome this difficulty, the authors added 90 zeros to both the effect and the levels of added nitrogen. This represents a sort of pseudo data. In other words, the authors padded the 19 data points with 90 fictitious observations, each with nitrogen = 0 and effect = 0.

The authors gave no justification for the number of fictitious data points used (why not 80? why not 100?). As for adding the fictitious data itself, they surmised that zero nitrogen would produce no harmful effect of nitrogen, as defined, which is surely true.

2 Suggested Modeling Approach

The data need not be padded with zeros. In place of the pseudo data, the observed background levels could be used instead; these are actual measurements and are associated with effect = 0 (no harmful effects due to nitrogen).

The substitution of the actual background rates for the fictitious zeros represents the first point of departure of our proposed method from theirs.
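The two ways of building the data set can be sketched side by side. This is a minimal illustration only: the nitrogen values below are invented (the entries of Table 2 in [1] are not reproduced here), and the helper names `fit_logistic`, `expit`, `softplus` and `log_lik` are hypothetical. The fit uses plain Newton iterations with step halving, which is one standard way to obtain the maximum likelihood estimates and is not necessarily the routine used in [1].

```python
import math

def expit(z):
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(z):
    """log(1 + exp(z)), computed stably."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_lik(b0, b1, xs, ys):
    """Bernoulli log-likelihood of a logistic regression with one predictor."""
    return sum(-softplus(-(b0 + b1 * x)) if y == 1 else -softplus(b0 + b1 * x)
               for x, y in zip(xs, ys))

def fit_logistic(xs, ys, iters=200, tol=1e-10):
    """Maximum likelihood estimates by Newton's method with step halving."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step, base = 1.0, log_lik(b0, b1, xs, ys)
        # Halve the step until the log-likelihood does not decrease.
        while step > 1e-8 and log_lik(b0 + step * d0, b1 + step * d1, xs, ys) < base:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1

# HYPOTHETICAL data, invented for illustration; not the values in Table 2 of [1].
added_n = [7, 9, 10, 12, 15, 15, 18, 20, 22, 25, 25, 30, 30, 35, 40, 45, 50, 60, 70]
added_effect = [1] * 19
added_effect[9] = 0  # a single no-effect case among the added-nitrogen rows
background_n = [3, 4, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 13, 14, 15, 16, 18, 20]

# (a) 0-padding: 90 fictitious rows with nitrogen = 0 and effect = 0.
b0_pad, b1_pad = fit_logistic(added_n + [0.0] * 90, added_effect + [0] * 90)
# (b) Background substitution: background levels, all with effect = 0.
b0_bg, b1_bg = fit_logistic(added_n + background_n, added_effect + [0] * 19)

for label, b0, b1 in [("0-padded", b0_pad, b1_pad), ("background", b0_bg, b1_bg)]:
    print(f"{label}: intercept={b0:.3f}, slope={b1:.3f}, "
          f"Pr(effect | N=0)={expit(b0):.3f}")
```

Because the inputs are made up, the printed numbers show only the mechanics of the two constructions and should not be compared with the figures in this paper.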

The second is to account for the uncertainty in the measures themselves.

This arises in two ways.

First, in the references themselves, the nitrogen values are given not as certain values, but values with a plus-and-minus attached, or with standard deviations or other statistical measures of variability (usually because of variability in the background measurements). We intend to use this variability, though since we do not yet have a complete survey of all references (those from the Table plus quite a few others), we do not know what the variability is for all entries in the Table.

Merely for demonstration purposes, we take the square root of the Table nitrogen entries as representing the standard deviation (square root of the variance). This is because in the data we have collected so far, this is a reasonable if imperfect approximation. For example, a mean of 20.6 kg ha⁻¹ yr⁻¹ (the first entry) is assigned a standard deviation of 4.54 kg ha⁻¹ yr⁻¹. Again, we stress this approximation is only for the purposes of illustration.
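As a quick check of the arithmetic, the square-root rule can be written out as below. The function name `sd_from_mean` is hypothetical, and the plus-or-minus two standard deviation band is added here only as an illustration under a normal approximation.

```python
import math

def sd_from_mean(mean_n):
    """Demonstration rule from the text: standard deviation = sqrt(mean)."""
    return math.sqrt(mean_n)

mean_n = 20.6  # kg per hectare per year, the first Table entry quoted in the text
sd = sd_from_mean(mean_n)

# Illustrative band only (an assumption): mean +/- 2 sd under a normal approximation.
low, high = mean_n - 2.0 * sd, mean_n + 2.0 * sd
print(f"mean = {mean_n}, sd = {sd:.2f}, approx. band = ({low:.1f}, {high:.1f})")
```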

Second, to derive the correct variances from the reported data in the references, we need to account for the scaling of the plot-sized nitrogen values to the hectare. This scaling induces variability that must be accounted for, simply because we can’t be certain the amounts added to a square-meter plot scale up exactly to a hectare, which is 10,000 times larger. Here we use the approximation that the amounts of nitrogen can be represented with normal distributions.

The amounts of nitrogen added were in g m⁻², but with the durations of the experiments not yet noted. Converting to kg ha⁻¹ yr⁻¹ amounts to a factor of 10, as long as we assume the duration of the experiments is the same, which it likely was not for each. We have yet to explore this, but will in future efforts. In any case, given the plot-sized variance is $v_p$, it is easy to figure the variance of the hectare-scaled data, which is $v_h = 10^2 v_p$, a standard calculation. Using that on a few entries of the table gives rise to the square-root approximation mentioned earlier.
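The factor of 10 and the variance rule can be written out in a few lines. The symbols $N_p$ and $N_h$ (the plot-scale and hectare-scale amounts) and $\sigma_p$, $\sigma_h$ (their standard deviations) are notation introduced here for the sketch; only $v_p$ and $v_h$ appear in the text.

```latex
\begin{align*}
1~\mathrm{g\,m^{-2}} &= \frac{10^{-3}~\mathrm{kg}}{10^{-4}~\mathrm{ha}} = 10~\mathrm{kg\,ha^{-1}}, \\
N_h &= 10\, N_p, \\
v_h = \operatorname{Var}(N_h) &= 10^2 \operatorname{Var}(N_p) = 10^2\, v_p, \\
\sigma_h = \sqrt{v_h} &= 10\, \sigma_p .
\end{align*}
```

The third line is the usual rule $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$ with $a = 10$, so standard deviations scale by the same factor of 10 as the amounts themselves.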

Here in Fig. 2 are the methods of [1] using added zeros (0-padding) compared to a second logistic regression using the background low rates instead (all with effect = 0). The variability in measurement is not yet accounted for here. The 0-padded data is in black, and the background-added data is in green. This plot represents the central estimate of the logistic regression in a thick line, and the uncertainty due to the parameter estimation, i.e. the 95% confidence intervals, in thin lines. A horizontal line at 20% is overlaid.

[Figure 2 plot: “Parametric Uncertainty of Nitrogen Critical Loads”. x-axis: Nitrogen (kg ha⁻¹ yr⁻¹), 0 to 70; y-axis: Pr(Effect = 1 | data), 0 to 1; legend: “Original; 0-padded” and “With lows no padding”.]

Figure 2: Parametric uncertainty of nitrogen critical loads, using 0-padded (black) and background low levels (green), with central estimates (thick lines) and 95% confidence intervals (thin lines).

The 0-padded estimate crosses the 20% threshold at about 7 kg ha⁻¹ yr⁻¹ of N, with a range of about 2 to 11 kg ha⁻¹ yr⁻¹ N. The background-level data central estimate begins above 20%, with a range of 0 to about 9 kg ha⁻¹ yr⁻¹ N. This means that, even with background levels, and with no nitrogen whatsoever, there is an estimated greater than 20% chance of a harmful effect due to nitrogen.

This is, of course, not possible. The likely explanations are that not nearly enough data is available, that different interpretations can be given to the presently measured data, or that the model form itself is inadequate. We think all explanations are partly true. However, none of these ideas are explored in this paper.

In any case, it is clear something has gone wrong with a model that gives positive probability of nitrogen having an ill effect at 0 levels of nitrogen. This result is also found (but not noted) in [1], as there was a definite positive probability of an ill effect with 0 nitrogen in the 0-padded data. I.e., the black line at 0 kg ha⁻¹ yr⁻¹ N is about 5% in their Fig. 2 (not shown here).
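Why the probability at zero nitrogen can never be zero follows directly from the form of the model. In the sketch below, $\beta_0$ and $\beta_1$ denote the intercept and slope of a one-predictor logistic regression, and $N_{0.2}$ denotes the level at which the curve crosses 20%; this notation is introduced here and is not used in [1].

```latex
\begin{align*}
\Pr(\mathrm{Effect} = 1 \mid N) &= \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 N)\}}, \\
\Pr(\mathrm{Effect} = 1 \mid N = 0) &= \frac{1}{1 + e^{-\beta_0}} > 0
  \quad \text{for every finite } \beta_0, \\
N_{0.2} &= \frac{\ln(0.2 / 0.8) - \beta_0}{\beta_1}
  = \frac{-\ln 4 - \beta_0}{\beta_1} .
\end{align*}
```

With a positive slope, the crossing $N_{0.2}$ is positive only when $\beta_0 < -\ln 4 \approx -1.39$. A fitted intercept above that value means the curve already starts above 20% at $N = 0$, which is the behaviour described for the background-level model.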

[Figure 3 plot: “Predictive Uncertainty of Nitrogen Critical Loads”. x-axis: Nitrogen (kg ha⁻¹ yr⁻¹), 0 to 70; y-axis: Pr(Effect = 1 | data), 0 to 1; legend: “Original; 0-padded” and “With lows no padding”.]

Figure 3: Predictive uncertainty of nitrogen critical loads, using 0-padded (black) and background low levels (green). This shows the probability of an effect with a given level of N.

Passing over these impossibilities, the next step is to account for the

uncertainty in the parameter estimates, presenting the curves in a predictive

way instead. This is pictured in Fig. 3.

This plot gives direct statements of Pr(Effect | N level, model, data) (the condition is shown as just “data” in the plots). This represents a Bayesian approach to the model, showing the predictive posterior distributions of the model; see [2]. This is a more actionable form than the standard parametric uncertainty displays, because it’s never clear what to do with the confidence interval. Here, once a level of nitrogen is specified, direct probabilities are given, with no ambiguity in interpretation.
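One simple way to obtain such predictive probabilities is to average the logistic curve over the posterior of the two parameters. The sketch below does this on a coarse grid with a flat prior inside a bounded box. Both the grid approximation and its bounds are assumptions made for illustration, the data are invented, and the paper does not state which computational method was used.

```python
import math

def expit(z):
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softplus(z):
    """log(1 + exp(z)), computed stably."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# HYPOTHETICAL data, invented for illustration; not the values in Table 2 of [1].
added_n = [7, 9, 10, 12, 15, 15, 18, 20, 22, 25, 25, 30, 30, 35, 40, 45, 50, 60, 70]
added_effect = [1] * 19
added_effect[9] = 0
background_n = [3, 4, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 13, 14, 15, 16, 18, 20]
nitrogen = added_n + background_n
effect = added_effect + [0] * 19

def log_lik(b0, b1):
    """Bernoulli log-likelihood of the logistic model at (b0, b1)."""
    return sum(-softplus(-(b0 + b1 * x)) if y == 1 else -softplus(b0 + b1 * x)
               for x, y in zip(nitrogen, effect))

# Grid over (intercept, slope). The box [-8, 4] x [-0.2, 1.0] is an assumption
# and acts as a flat prior restricted to that region.
grid = [(-8.0 + 0.1 * i, -0.2 + 0.01 * j) for i in range(121) for j in range(121)]
log_post = [log_lik(b0, b1) for b0, b1 in grid]
top = max(log_post)
weights = [math.exp(lp - top) for lp in log_post]
total = sum(weights)
weights = [w / total for w in weights]  # normalised posterior weights

def predictive(n):
    """Pr(Effect = 1 | N = n, data): the logistic curve averaged over the posterior."""
    return sum(w * expit(b0 + b1 * n) for (b0, b1), w in zip(grid, weights))

# Plug-in curve at the highest-likelihood grid point, for comparison.
b0_hat, b1_hat = grid[log_post.index(top)]
for n in (0, 5, 10, 20, 40):
    print(f"N={n:>2}: plug-in={expit(b0_hat + b1_hat * n):.3f}, "
          f"predictive={predictive(n):.3f}")
```

The predictive value at each N is a single probability rather than an interval, which is the sense in which the text calls this form more actionable.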

In any case, the story is the same. The level at which the threshold is crossed for the 0-padded data is about 5 kg ha⁻¹ yr⁻¹ N. And again, even with 0 nitrogen in the atmosphere, there is a greater than 20% chance of nitrogen causing an effect with the background-data model.

Even though it is by now clear the data, or the model, or both, have difficulties, we present a picture of how to incorporate measurement uncertainty into the model, using the variance approximation mentioned above. In other words, the regression model now makes use of the plus-or-minus attached to each observation. This is pictured in Fig. 4.

[Figure 4 plot: “Parametric Mean Uncertainty of Nitrogen Critical Load With Error”. x-axis: Nitrogen (kg ha⁻¹ yr⁻¹), 0 to 70; y-axis: Pr(Effect = 1 | data), 0 to 1; legend: “Ordinary logistic” and “Uncertainty logistic”.]

Figure 4: Model with measurement uncertainty added.

Here we ignore the 0-padded data (which has no uncertainties). Because of the difficulties mentioned, and because our variance estimates are only approximations, it is the shape that is important here, and not the exact values. The green lines are the ordinary logistic regression with confidence interval, using the background low data. The red line is the model expanded to allow uncertainty in the measurements.

This model is choppier than the green because there is a great increase in the number of parameters due to the measurement uncertainties, which makes estimates a bit more difficult to make. In any case, it is clear there are many changes in the final model, compared against the ordinary, uncertainty-free model. These changes will very likely be present in the actual data, once it is compiled.
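One common way to write a logistic regression with measurement uncertainty is given below, as a sketch of where the extra parameters come from. The exact formulation behind Fig. 4 is not stated in the text, so this is an assumption; $n_i$ is the reported nitrogen value for observation $i$, $\nu_i$ its unknown true value, $s_i$ the assigned standard deviation, and $m$ the number of observations.

```latex
\begin{align*}
n_i \mid \nu_i &\sim \mathrm{Normal}(\nu_i,\, s_i^2),
  \qquad s_i = \sqrt{n_i} \ \text{(the demonstration approximation)}, \\
\mathrm{Effect}_i \mid \nu_i &\sim \mathrm{Bernoulli}(p_i),
  \qquad p_i = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 \nu_i)\}}, \\
\text{unknowns: } & \beta_0,\ \beta_1,\ \nu_1, \dots, \nu_m
  \quad (m + 2 \text{ in place of } 2).
\end{align*}
```

Under this formulation each observation contributes its own latent value $\nu_i$, which is consistent with the large increase in the number of parameters noted above.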

References

[1] L. Banin, B. Bealey, R. Smith, M. Sutton, C. Campbell, and N. Dise. Quantifying uncertainty in critical loads. Technical report, CEH Report to SEPA, 2014.

[2] W. M. Briggs. Uncertainty: The Soul of Probability, Modeling & Statistics. Springer, New York, 2016.

[3] J. Nilsson and P. Grennfelt. Critical loads for sulphur and nitrogen. Report from a workshop held at Skokloster, Sweden, 19-24 March 1988. Technical report, Nordisk Ministerraad, 1988.
