Conference PaperPDF Available

Home Depot, Hogwarts & Excess Deaths at the CDC

Authors:

Abstract and Figures

What would the Securities & Exchange Commission do if Home Depot reported as its 2020 “net income” only its revenues (not expenses), from only its best eight months, in only its best 18 states? What if we replace “Home Depot” with “CDC”? Which organization should be held to higher accounting and disclosure standards? In 2020, the CDC’s budget ballooned from $8 billion to roughly $46 billion largely in response to Covid-19 mortality figures reported by the CDC. How many deaths has Covid-19 caused? Causal analysis is always a tricky business, but pinpointing cause of death is notoriously error-prone, more so when a reported cause is used to solicit billions in funding. Indeed, the CDC emphasizes the funding connection twice on the first page of its65-page death certificate handbook. In support of its remarkable budget increase, and claiming concern that death-certificate-based counts were under-counting Covid-19 deaths, in fall 2020, the CDC began estimating “excess” deaths as a proxy for Covid-19 deaths. Is the CDC’s estimate realistic? To find out, we explore the CDC’s all-cause death data and evaluate their expected deaths model and related disclosures, using as benchmarks SEC Rule 10b-5, AICPA SAS 143 (on auditing estimates), and the fraud triangle. We find that in order to meet minimum SEC standards, which fundamentally require that statements and disclosures be "not materially misleading," the CDC’s excess deaths estimate, disclosures, and presentation thereof would need to change significantly. The analysis was performed in R Studio using, primarily, the dplyr and ggplot2 packages.
Content may be subject to copyright.
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 1/28
Home Depot, Hogwarts & Excess Deaths
at the CDC
Kurt Schulzke, JD, CPA, CFE
I. Introduction
What would the Securities & Exchange Commission do if Home Depot reported as its 2020 “net income” only its
revenues (not expenses), from only its best eight months, in only its best 18 states? What if we replace “Home Depot”
with “CDC”? Which organization should be held to higher accounting and disclosure standards? Which has more power
to damage the United States with misleading information?
In 2020, the CDC’s budget ballooned from $8 billion (https://www.cdc.gov/budget/documents/fy2020/fy-2020-cdc-
operating-plan.pdf) to roughly $46 billion (https://www.cdc.gov/budget/fact-sheets/covid-19/index.html), all in response to
perceived Covid-19 mortality. How many deaths has Covid-19 caused? Causal analysis is always a tricky business, but
pinpointing cause of death is notoriously error-prone, more so when a reported cause is used to solicit billions in
funding. Indeed, the CDC emphasizes the funding connection twice on the first page of its 65-page death certificate
handbook. In support of its remarkable budget increase, and claiming concern that death-certificate-based counts were
under-counting Covid-19 deaths, in fall 2020, the CDC began estimating “excess” deaths as a proxy for Covid-19 deaths.
To be clear, the CDC’s estimate is not data. It is an estimate, which means it is one judgmental interpretation of data
among multiple possible interpretations. Is the CDC’s estimate realistic? To find out, we will explore the CDC’s data and
evaluate the model they used to make their estimate.
The key high-level questions are these: What is the plausible range of “excess” deaths in a relevant place over a
relevant time period? What share of excess deaths might be fairly attributed to Covid-19? And how transparently does
the CDC present the range and share to readers? We offer some answers to these questions in Part V. We’ll keep things
simple and (mostly) visual, avoiding complex statistical tools.
At the outset, we note that the SEC’s fundamental “anti-fraud” standard, Rule 10b-5
(https://www.law.cornell.edu/cfr/text/17/240.10b-5), declares it illegal, in buying or selling stocks, to (a) employ any
device, scheme, or artifice to defraud, (b) mislead readers by statements or omissions, or (c) engage in any act, practice,
or course of business which operates as a fraud or deceit upon anyone.
Let’s see how the CDC measures up to stock market disclosure requirements. In doing so, we’ll generally follow the
AICPA standard for auditing management estimates, SAS 143, which admonishes that the importance of professional
skepticism:
…increases when accounting estimates are subject to a greater degree of estimation
uncertainty or are affected to a greater degree by complexity, subjectivity, or other inherent
risk factors. Similarly, the exercise of professional skepticism is important when there is
greater susceptibility to misstatement due to management bias or fraud.
In this connection, we should also consider the “fraud triangle
(https://corporatefinanceinstitute.com/resources/knowledge/accounting/fraud-triangle/),” which holds that opportunities,
incentives, and easily accessible rationalizations heighten fraud risk. All three legs of the triangle are abundantly evident
here.
Code
1
2
3
4
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 2/28
Before we engage the CDC’s model, we’ll peek at the “raw” data (scaled to per thousand population) while keeping in
mind that it is curated by the NCHS (https://www.cdc.gov/nchs/about/organization.htm), a division of the CDC. We’ll
assume that the data are what the CDC says they are (https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-
and-Select-Causes/muzy-jte6): unvarnished, reported deaths data. An audit would be required to verify the validity of this
assumption. For context, the U.S. crude annual mortality rate (https://data.worldbank.org/indicator/SP.DYN.CDRT.IN?
locations=US) has recently averaged 8-9 nationally, ranging between 12.5 (WV) and 9 (Alaska). And, according to the
CDC, the 1918 flu killed 675,000 (https://www.cdc.gov/flu/pandemic-resources/1918-pandemic-h1n1.html) out of 103
million Americans (6.6 DPT).
Keep in mind that we are looking at total deaths data, not Covid-19 deaths data. While the popular narrative makes it
difficult to remember this, even the CDC admits in their Technical Notes that they do not know what share of “excess”
deaths can be blamed on Covid-19. This means that they cannot say how many people have truly died from Covid-19.
No one can. Not the WHO, not Johns Hopkins U., not Emory U., not Georgia Tech. No one knows. For all we know, the
Covid-19 narrative may be masking other, more deadly causes.
Figure 1 gets us started. It visualizes weekly total U.S. DPT from all causes and all jurisdictions since 2014. The CDC
says that the data for pre-2020 years is “final” and that 2020-21 data are “provisional,” though the “all cause” deaths
figures (from most states) appear pretty stable after about four weeks.
Even at this altitude, the 2020 data paint a strange picture compared to prior years. The horizontal green line marks the
high-point of the 2018 flu season. The y-axis scale runs from 0.10 to 0.95 deaths per thousand (DPT) to accommodate
spring 2020 spikes in the Tri-state area (NY, CT, NJ). These will appear later. The 2019 and 2020 highs stand out,
respectively, as low and high, but why and by how much? We take a closer look in Figure 2.1, which shrinks the y-axis
(vertical) range and drops the 2014 data. This brings smaller patterns into view.
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 3/28
Figure 2.1 shows a spring 2020 spike in pink. At its apex, it rises by roughly 0.025 DPT above the 2018 peak (blue). A 4-
week-long peak of 0.025 adds 0.10 deaths or about 1 percent of the annual total of 8-9 DTP. Spikes in 2015 and 2018
were followed by down years. If this pattern holds, we’ll see the 2020 peak followed by lows in 2021-22. That is, unless –
as cancer diagnosis data suggest (https://www.statnews.com/2020/08/04/new-cancer-diagnoses-fell-sharply-after-
coronavirus/) – Covid-19 interventions drive up deaths from other causes.
In addition, 2019 missed the typical January - March spike. Instead, deaths plateaued. Later, after the summer 2019
trough, they climbed Aug to Jan, then plateaued again – reminiscent of 2016 – in January and February 2020, before
spiking well above the 2018 peak between March 28 and April 25. Then they dropped, with an after-shock in mid July to
early August, before falling again until mid-October. In short, lots of ups and downs but no 2020 summer trough. This is
all at the national level.
How does the picture change if we exclude the Tri-state area? Figure 2.2 tells the tail, as it were. Focus on the flat green
line (representing the 2018 peak in flu deaths) and the squiggly purple and pink ones.
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 4/28
Removing the Tri-state drops the national spring 2020 peak below January 2018. The line then morphs into a serpentine
spring-summer-fall that would make Hogwarts and Loch Ness proud. It’s almost as if the peak that no-showed in January
2019 was “apparated (https://harrypotter.fandom.com/wiki/Apparition)” into spring and summer 2020. Maybe, outside of
the first-hit Tri-state – which spiked the 2020 national high over the (green) 2018 flu line – the spring 2020 school and
business closures that occurred in many states (e.g., Georgia) merely smoothed deaths into the summer. We can search
for the serpent in the state data, though this won’t guarantee that the data can be trusted. First, let’s talk modeling
excess deaths.
Excess deaths are estimated, not counted. They are a function of both the estimator’s subjective judgments and
objective data. It is widely accepted that estimated excess deaths equals reported deaths from all causes in excess of
estimated “expected” deaths over a given time period in a chosen geographical area. In theory, excess deaths may
result from some novel cause or causes like, e.g., Covid-19 and societal responses to it. However, the CDC states
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm) that Covid-19 may not be the cause of any of the
CDC’s estimated excess deaths. Similarly, an excess of zero or less is some evidence that no novel cause is killing
large numbers of people who would not die anyway. So the debate centers on the meaning and estimation of the
plausible range of expected deaths, the proper time and geography subsets to use for comparing expected to actual
deaths, how to present the comparison, and what share of the excess, if any, to attribute to particular plausible causes
(e.g., Covid-10).
In its October 23, 2020 Morbidity and Mortality Weekly Report
(https://www.cdc.gov/mmwr/volumes/69/wr/mm6942e2.htm) (MMWR), the CDC published 299,028 as its estimate of
nationwide excess deaths, from Feb 1 to Oct 3, 2020. It attributed ~2/3 of these (198,081) to Covid-19. This 198,081
number, apparently based largely on PCR test positives, may be significantly overstated, particularly in light of the
WHO’s Jan 20, 2021 Notice (https://www.who.int/news/item/20-01-2021-who-information-notice-for-ivd-users-2020-05)
warning, contrary to longstanding CDC Covid-19 dogma, that a single PCR test positive alone does not justify a “Covid-
19” diagnosis.
That said, in the MMWR, the CDC states that excess deaths “have been used to estimate the impact of public health
pandemics or disasters, particularly when there are questions about underascertainment of deaths directly attributable to
a given event or cause.” This implies, misleadingly, that excess deaths are not also useful for exposing suspected over-
counts.
5
6
7
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 5/28
In any case, after updating for data available through mid-January 2021, the CDC’s gross estimated excess for the Feb 1
- Oct 3, 2020 period is either 311K or 206K, depending on which expected death benchmark (mean or upper-bound) is
used. If we assume that the CDC’s 2/3 multiplier holds for all numbers of excess deaths, these would translate to 208K
or 137K gross estimated Covid-19 deaths.
Excess
<dbl>
207711.3
1 row
Excess
<dbl>
137296.7
1 row
But this 8-month, gross national estimate, based on multiple by-state-by-week statistical models
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm), is misleading. This is partly because it arbitrarily
cherry-picks eight months instead of twelve. And not just any eight months. Starting the count just as deaths were about
to spike in the Tri-state is like Home Depot reporting profits for only its best eight months. It also implies that a
nationwide estimate is meaningful when, because of diversity among states and regions, it is not.
Yet questions regarding the CDC estimate – involving binning, netting, algorithm choice, estimation uncertainty, and
training data – run much deeper. And the net impact of Covid-19 where you live is probably much less than the CDC’s
national estimate suggests. In most states, according to the CDC’s own “provisional” numbers, the net impact of Covid-
19 for the year ended September 30, 2020 was zero or less, if we use the lower side of the CDC’s estimate. What? The
CDC’s estimate is two-sided, you ask? Yes, of course. More than two-sided. This is true of every estimate. Numbers can
be spun in many directions. Estimates and their interpretation are as much art as science, and estimates cover a range
of possibilities. In the CDC’s case, the range is very wide. Let’s consider how and why.
II. Questions & choices
The CDC’s model (https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm) relies in large part on five subjective
choices: 1. time and geography segments or “bins” for prediction and reporting, 2. whether to ignore deficits or net them
against surpluses, 3. what to say about estimation uncertainty (e.g., report high and low point estimates or probability
distributions over the plausible range), 4. which algorithm or other tool to use in predicting (i.e., “modeling”) expected
deaths, and 5. which historical data to use for “training” the model. These choices significantly impact outcomes.
Overshadowing these modeling choices are visceral value-based questions: What is the appropriate “loss function” for
evaluating possible regulatory responses to novel causes of mortality like Covid-19? In other words, what should society
be trying to maximize or minimize with regulatory choices? Do we want to maximize the joys of association with other
human beings or minimize exposure to a single risk (e.g., contracting Covid-19)?
We’ll comment briefly on each of these issues in the sections that follow.
A. Netting & binning
Code
Code
8
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 6/28
The CDC’s excess deaths model (https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm) (we’ll call them
“surplus” deaths) predicts how many people will die in each week of the year in each state. These weekly-by-state
predictions are then compared to reported deaths. If the reported deaths exceed predicted deaths in a given week, that
excess counts toward surplus deaths. However, any deficit (i.e., where predicted deaths exceed reported) is counted as
zero. The CDC then reports the sum of these weekly, gross surpluses as U.S. excess deaths.
Let’s use Georgia as an example. Figure 3, copied from the CDC’s website
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm), illustrates the impact of the CDC’s netting and binning
choices on Georgia’s surplus deaths. The blue vertical bars show reported deaths adjusted for the CDC’s assumptions
about incomplete reporting. The brown line visualizes the mean CDC expected deaths produced by the model. The
differences between the brown line and the top of the blue bars–in yellow highlight–equals the deficit or surplus deaths
per week, but only if we use the mean expected-death threshold. If the CDC’s dashboard allowed us to use the upper-
bound expected deaths, the brown line would rise and the yellow zone would expand considerably.
Fig. 3
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm)
The mean-based deficits highlighted in yellow were ignored by the CDC’s national estimate of 198K Covid-19 deaths.
Ignoring the yellow zone inflates estimated Covid-19 deaths, just as if Home Depot were to report its gross revenues
(without deducting expenses) as net income. To illustrate the impact of this one-sided accounting, gross Georgia excess
deaths (ignoring the yellow zone) for the two years, 2019-2020, using the CDC’s mean expected deaths would be
14,144, while the net (counting the yellow zone) would be 2,156 less, or 11,988:
sum(`Excess Higher Estimate`)
<dbl>
14144
1 row
sum(`Observed Number` - `Average Expected Count`)
<dbl>
11988
1 row
9
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 7/28
By contrast, if measured against the CDC’s upper bound expected deaths, the CDC’s estimate of Georgia’s gross
surplus (excess) deaths would be 8,997, with a net deficit of 185:
sum(`Excess Lower Estimate`)
<dbl>
8997
1 row
sum(`Observed Number` - `Upper Bound Threshold`)
<dbl>
-185
1 row
The CDC attempts to rationalize this surplus-only accounting by arguing that some weekly deficits in some states may
eventually flip to surpluses because of “incomplete” reporting. This argument does not pass the straight-face test.
Reporting is always incomplete and every organization has accounting cutoffs. Furthermore, data scientists routinely
model missing or incomplete data (https://www.jstatsoft.org/article/view/v045i03), especially in the health sciences. It’s
data science 101 (https://amices.org/mice/index.html), as the CDC well knows. The CDC’s own model “weights” weekly
by-state reported deaths by their predicted incompleteness. That the CDC models these reporting lags means they can
also model the deficits.
And let’s not forget that the numbers used by the CDC to recommend nationwide lock downs in the spring of 2020 were
beyond “incomplete”: they were largely imaginary. So while deficits for some slow jurisdictions (e.g., North Carolina)
might warrant “deferred netting,” permanently hiding the deficits finds no scientific justification.
At the national level, by netting estimated deficits against estimated surpluses and reporting the entire Sep 30, 2020
year (instead of just eight months), the CDC’s 299K mean-based estimate drops to about 60K. Why didn’t the CDC bin
annually? One possible motive is suggested by the R package – surveillance (https://cran.r-
project.org/web/packages/surveillance/surveillance.pdf) – used by the CDC to develop its model. But annual binning is
not the only option. Figure 2 suggests that total deaths are seasonal, cyclical, and gradually trending upward, naturally
cycling through ever-higher highs every 2-3 years. Thus, biennial or triennial bins might be more informative. Whatever
the binning window, deficits must be counted.
Beyond netting and time binning, geographical binning also affects the reported surplus. It does so at two levels:
prediction and reporting. First, the CDC’s state-based prediction model produces surpluses that are 25% higher
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm) than they would be if the model were run at the
national level. Second, putting the entire U.S. into a single geographic bucket obscures important regional and state-
level differences. For example, using annual binning (9-30 YE) and dropping the Tri-state area from the national estimate
reduces national surplus deaths to ~13K. Tri-state added ~41K surplus deaths to the national total. We’ll look more
closely at state and regional differences in Part IV.
B. Uncertainty
Two issues are of concern here. First, the CDC MMWR claims an impossible degree of estimating precision, reporting its
national (and state) estimates to single deaths, e.g., “299,028,” instead of rounding, e.g., “299K,” or stating a range,
“50K-299K.” Given the many subjective assumptions involved, such precision is misleading. While the CDC explains
most of its assumptions and notes that its model produces “expected” and “upper bound” point estimates, it
Code
Code
10
11
12
13
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 8/28
nevertheless promotes the illusion of precision through its public-facing media like the MMWR. This conveys the false
impression that the estimate is more certain than it is. Modeling is a mix of subjective judgment and objective science.
The CDC should make this clear.
Second, aside from inflated claims of precision, point estimates – which is all the CDC offers on its dashboard – are
inadequate to support responsible policy. Are 10-20K excess deaths more or less probable than 190-200K? While the
notes refer to upper-bound and mean expected deaths, the dashboard (see Fig. 3, brown line) shows only the mean
value and prevents readers from accessing the upper-bound.
The notes say that the model relies on the historical mean (the “expected value”, in statistical terms) for each MMWR
week to establish the expectation for the current week’s deaths. Thus, every week in every state gets it’s own little
model. It’s not clear how many data points feed into each weekly model, but – with only five years to pick from – there
might be only five (2015-2019). For illustrative purposes, let’s look at those five (and their means) for New York City and
Nebraska in week 15.
Juris
<chr>
Year
<fctr>
Deaths
<dbl>
DeathsPM
<dbl>
Nebraska 2015 332 0.1755428
Nebraska 2016 372 0.1952125
Nebraska 2017 318 0.1659754
Nebraska 2018 342 0.1776057
Nebraska 2019 313 0.1618066
New York City 2015 1019 0.1204058
New York City 2016 1094 0.1291747
New York City 2017 1049 0.1243263
New York City 2018 1085 0.1293194
New York City 2019 1052 0.1261872
1-10 of 10 rows
Juris
<chr>
Mean
<dbl>
DPTmu
<dbl>
Nebraska 335.4 0.1752286
New York City 1059.8 0.1258827
2 rows
Of these numbers, which deserves to be the chosen, “expected” one in Nebraska and NYC in 2020 Week 15?
Statistically speaking, none of them. That’s right: None of these numbers was individually probable at all because
probability attaches only to ranges, not to point-estimates. For the record, here are the reported Week 15 results:
Code
Code
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 9/28
Juris
<chr>
Year
<fctr>
Deaths
<dbl>
DeathsPM
<dbl>
Juris
<chr>
Year
<fctr>
Deaths
<dbl>
DeathsPM
<dbl>
Nebraska 2020 343 0.1773152
New York City 2020 7860 0.9428059
2 rows
NYC’s reported 2020 7,860 deaths (0.942 DPT) were 7.4 times the 2015-19 mean, 1060 (0.126 DPT), but Nebraska’s
reported 343 (0.177 DPT) came remarkably close to its mean, 335 (0.176 DPT). Yet, prospectively, none of the historical
numbers or their mean should carry any more weight than the others in establishing the expectation that defines an
“excess.” And why limit the benchmarking period to the past five years? Why not ten or twenty? Why not go back to the
1918 flu, which killed 6.6 DPT? There’s no science to the choice; it is completely arbitrary and should be decided through
open debate among elected representatives, not by cloistered, unaccountable data modelers.
For a different perspective on the relative plausibility of the CDC’s mean and upper-bound, consider Figure 4.
Figure 4 shows a close correlation, in weeks 5-48, between Georgia’s reported Covid-19 deaths (classified as such
based primarily on positive PCR test results) and the CDC’s modeled, low-end excess deaths estimate (based on the
CDC’s upper-bound expected deaths). If we assume the accuracy of Georgia’s PCR-identified Covid-19 deaths and that
all excess deaths were caused by Covid-19, this correlation suggests that the CDC’s upper-bound expectation is much
closer to reality than the mean shown on the CDC dashboard. This points to net excess deaths (from all causes, for the
9-30 year) centered on 19K and 60K for the U.S. without and with Tri-state, respectively.
<dbl>
Up_Ex_noTri_DPT
<dbl>
18632 0.05680488
Code
Code
14
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 10/28
1 row
Up_Ex_All
<dbl>
Up_Ex_All_DPT
<dbl>
59571 0.1816189
1 row
In summary, the inherent uncertainty of these estimates demands more clarity and transparency. We need the full
distribution of probabilities over plausible values and the assumptions that support them. We also need an open debate
on what they all mean. Modern data science offers multiple tools to make this happen. One of the best is Bayesian
networks, about which I have previously written (https://www.linkedin.com/pulse/amy-klobuchar-covid-19-tests-false-
positives-kurt-schulzke) in the Covid-19 context. When so much is at stake for so many, there is every reason to treat
uncertainty with the highest degree of data professionalism and transparency. Most importatly, those who make this
fateful decision should be held fully accountable, in every sense of the term, to the stakeholders affected by it. This is
corporate governance 101.
C. Loss function
Whatever the model, the user must decide how many deaths are too many. What number of deaths requires policy
interventions like business closures, social distancing, masks, or vaccination? The CDC’s own excess deaths model
points to ten deaths per state per week as the lower bound. The Technical Notes
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm) state that weekly counts “between 1 and 9 are
suppressed.” In other words, nine or fewer deaths are not tracked at all by the CDC. This suggests that the CDC
believes that less than ten deaths per state per week should not impact public policy. Another benchmark might be the
1918-19 flu pandemic, with an estimated toll of 6.6 DPT. Through September 30, 2020, the CDC’s own 200K gross,
mean-based estimate equated to only 0.60 DPT. After netting, the upper-bound based estimate was 0.18 (0.06) with
(without) Tri-state.
This is not to say that ten is certainly the right threshold but merely that there is a number, ten or higher per week per
state, below which government intervention is over-kill. The threshold is a value judgment related to an equally
judgmental loss function that should balance or “net” costs and benefits. Isolating a nursing home patient may prevent
a Covid-19 infection at the expense of inducing severe depression or suicide. Shuttering restaurants and concerts may
reduce Covid-19 deaths while killing restaurant operators and musicians by starvation and dramatically reducing the
quality of life for millions. Oxford epidemiologist Sunetra Gupta calls (https://gbdeclaration.org/video/) this a violation of
the “social contract:”
Why don’t we do this for the flu? What’s the contract here? … We want schools to remain
open, we want people to flourish, we want inequality not to get any worse. We want the
arts… What are we alive for? We put all of these things on a plane … and we say, ‘Okay,
we’re going to tolerate this level of disease in the population. It will kill some people. We
will try and protect them.’ We just do our best within that context, within those boundaries
to try and make it all work." I think that’s the social contract … and that’s what’s being
ignored."
How do 0.06 to 0.85 deaths per thousand fit into the social contract? Where do you draw the line?
Code
Code
15
16
17
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 11/28
D. Algorithm
The CDC uses the Farrington algorithm to model expected deaths but does not identify alternatives or compare
Farrington to them. Schumacher et al (https://www.jstatsoft.org/article/view/v070i10), cited in the CDC’s Technical
Notes, say that Farrington “only performs one time-point detection” of outbreaks. In order to fully and fairly evaluate the
model, we need to see the code. The CDC should publish its R code and explain why the Farrington method is suitable
at all and why it is better than alternatives.
E. Training Data
Like all models, the CDC’s excess deaths model was “trained” on historical data. However, it’s unclear which data the
CDC used. The Technical Notes (https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm) say both “2013-
present” and “2015-2019” but provide no link to the data itself. We need access to the data. We also need access to data
going back at least to 1918 to cross-check the appropriateness of limiting the model’s historical knowledge to just the last
five years.
III. Data & Code
The Part IV analysis – primarily in the form of visualizations and commentary – focuses on three bodies of data: 1. U.S.
mortality from all causes by age, time, and place (NCHS), 2. Excess Deaths Associated with Covid-19 (NCHS), and
3. a Covid-19 tracking spreadsheet (https://ga-covid19.ondemand.sas.com/docs/ga_covid_data.zip) produced by the
Georgia Department of Public Health. Population denominators for per-thousand figures and, for cross-validation, annual
deaths, are from the U.S. Census.
The code used to wrangle and transform the data appears in the Appendix. All coding was done in R Studio
(https://rstudio.com/) using tidyverse (https://www.tidyverse.org/) and other R packages identified in the first code
chunk. The code can be hidden or revealed by toggling the Code button in the top right corner of this html document and
before each code chunk. The full R Studio .Rmd file can also be downloaded through the same button. I provide the
code, submit to the MIT open source license, to encourage others to extend the analysis.
IV. Analysis
In Part IV, we’ll look more closely at deaths by time period and jurisdiction using simple visualizations. New York City
(NYC) is treated as a separate state, so New York state (NY) excludes NYC.
Parts IV. A-B deal with total deaths per thousand, without regard to modeling the excess. Part IV.C digs directly into the
CDC excess deaths model and data table. In Parts A-C, we see that 2020 mortality in the U.S. varied by state, region,
and time periods mostly inside narrow bands, with the Tri-state area being the far-out exception. In most places, in most
weeks, 2020 deaths from all causes differed from 2014-2019 by less than 0.10 DPT. Within this range, several regional
patterns are distinguishable, with the northeastern United States clearly standing out in terms of maximum weekly DPT.
The table below sorts 2020 reported weekly DPT (which the table calls “DeathsPM”) from high to low. Only four of the
top 50 weekly scores – in MS, LA, and TN – are located south of Washington, D.C. (38 deg. north latitude).
Juris
<chr>
Week Ending Date
<date>
REGION
<dbl>
DeathsPM
<dbl>
New York City 2020-04-11 1 0.9428059
New York City 2020-04-04 1 0.7549644
New York City 2020-04-18 1 0.7075842
New Jersey 2020-04-11 1 0.5369171
18
19
20
21 22
23
24
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 12/28
Next123456... 100Previous
Juris
<chr>
Week Ending Date
<date>
REGION
<dbl>
DeathsPM
<dbl>
New Jersey 2020-04-18 1 0.5324137
New York City 2020-04-25 1 0.4855570
New Jersey 2020-04-25 1 0.4476374
New York 2020-04-11 1 0.4194574
New York 2020-04-18 1 0.4016464
South Dakota 2020-11-07 2 0.3978934
1-10 of 2,809 rows
A. By state & region
1. U.S.
Figure 1, discussed in Part I, plots the weekly nationwide DPT, January 2014 to mid-December 2020. A rough measure
of the difference between 2020 and prior years is the distance or “delta” between the high weekly DPTs, 2018 vs 2020
and 2019 vs 2020. These can be visually evaluated or calculated directly. In Figure 1, these are, respectively, ~ 0.035
and ~ 0.05 DPT. The plots that follow drill down on specific regions and states, north-to-south and east-to-west.
2. NE: NY, NYC, NJ, CT
We begin with NYC (Fig. 5). Boasting an estimated 2019 population of 8.4 million, about 2.2 million less than Georgia,
NYC reached its 2020 DPT high of ~0.94 DPT in Week 15, far surpassing peaks in 2018 (0.15) and 2019 (0.16). The
2018/20 delta of 0.79 per thousand is 22.6 times the overall 2018/20 U.S. delta that includes the NYC data.
Fig. 6 places NYC in its broader regional context. Again, NYC stands out, though other NE players (especially NJ) make
a strong showing. NY state and NJ were rising above the 2018 peak, at the end of December.
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 13/28
3. Mid Atlantic
A bit further south, in Figure 7, MD and PA show 2018/20 deltas of ~ 0.05. VA, with a negligible delta, is elevated
compared to 2018 but not in comparison to MD and PA, which were climbing into late December. The late drop in NC
reflects slow reporting.
4. SE: GA, FL, SC, TN
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 14/28
Moving into SEC territory, the Fig. 8 y-axis shrinks and shifts downward. In the selected states, the deltas max out at
~0.04 per thousand. In contrast to the NE, GA and SC show two relatively minor but distinct highs in spring and late
summer. FL and TN are climbing into December, while the others meander or fall (GA). The trailing volatility in SC is
suspicious.
5. Midwest: MI, IL, OH, IN
In the Midwest, Figure 9, MI busts the curve (and stretches the y-axis), mimicking NJ with a 2018/20 delta of ~0.20. IN
and OH had no or low deltas, with IL sporting ~0.025. All four states were rising into November, but fell in December.
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 15/28
7. Northern Plains
Lots of late-fall runs on the northern plains were followed by dives, where roughly no deltas appeared until mid-
Octoberish. All four states began falling in late November. The SD drop is corroborated by the SD Department of Health
Covid-19 Dashboard (https://doh.sd.gov/COVID/Dashboard.aspx).
8. Rockies: UT, CO, AZ, NV
Fig. 11 shows a bit of action in CO, ID, and WY, with little to show in UT which, nevertheless, bared some flare in
November. By late December, all are falling and miniscule compared to the Tri-state.
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 16/28
9. Southwest: NM, AZ, CA, NV
In the SW (Fig. 12), only AZ had a 2018/20 delta to speak of, at ~ 0.05. By late December, NV and NM were falling, with
CA and AZ rising, though still nothing compared to territory above the 38th parallel.
10. Pac Northwest
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 17/28
In Figure 13, Oregon, and Washington show deficits with respect to their high-water mark in 2017 but are on par with
2019. Idaho and California were included because of geographical proximity (far northern CA is really Pac NW). ID made
a late run but stumbles by late November.
B. Layered years
Layered smooth-line plots offer a different view of state DPT deltas. Faceting the plots facilitates state-to-state
comparisons. Interpret these plots with caution because 1. the y-axes are very narrow; those hills and spikes typically
reach no more than 0.10 DPT over the lows; 2. the y-axes are different for every panel; and 3. the smoothing function
“smooths” extreme values in order to expose larger patterns; thus, the deltas won’t look as wide as they do in previous
figures.
1. U.S.
Consistent with Fig. 1, Fig. 14.1 shows a maximum 2019/20 delta of ~.065 DPT. The typical fall uptick starts at about
week 43.
Code
25
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 18/28
2. South
In the South, Texas (Fig. 14.2) stands out for its muted seasonal pattern and low mean DPT. No wonder Oracle wants to
move to Austin! Alabama’s 2014 data are suspect. Three states have deltas of ~ 0.10. For comparison of smooth and
line plots, an extended times series panel (Fig. 14.3) is included for the same group. MI and AL were climbing through
late December.
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 19/28
3. SE: GA, FL, SC, TN
The layers in Fig. 14.4 highlight GA’s counter-seasonal, bi-modal pattern with deficit deaths by late November, While TN
has been mostly climbing since April or May. Yet it’s all within 0.05 DPT.
3. Flyover
In Fig. 14.5, AK and OK show similar patterns, as do KS and NE, until NE takes a dive at the end.
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 20/28
4. Tri-state
Fig. 14.6 re-emphasizes how different the Tri-state is from the rest of the country. Either the data are corrupt or
something went terribly wrong in the Tri-state, especially NYC. Either way, policies that might apply in the Tri-state
should have zero sway over the remainder of the country.
C. Excess Deaths
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 21/28
We now shift to estimates produced by the CDC’s excess deaths model. The expected deaths are estimated and
compared to reported deaths over time intervals or “bins.” The CDC’s model estimates expected deaths by state by
week, then sums these estimates over some larger time period. We’ll look at the twelve months beginning on October 1,
2019, which is when flu season typically starts in the northern hemisphere. This 9-30-20 year end excludes the last
quarter of 2020, for which data for some states might be incomplete.
For the 9-30-20 year, absolute, net U.S. excess deaths (“net” means surpluses - deficits) based on the CDC’s upper-
bound (mean) expected deaths were 60K (286K) with Tri-state, and 19K (225K) without (see queries below). In other
words, even the CDC’s model recognizes a wide range of plausible estimates. In DPT, with the Tri-state the upper-
bound-to-mean range is 0.18 - 0.87 or 0.06 - 0.69 without Tri-state, against an annual national mean DPT of about 8.6
over the 2017-2019 interval (see queries below). Without Tri-state, this means that the CDC’s upper-bound total excess
deaths, on the high side, were roughly 0.69/8.6 = 8% of a “normal” year. On the low side, this figure drops to 0.06/8.6 =
0.6%.
Up_Ex_All
<dbl>
Up_Ex_All_DPT
<dbl>
Mu_Ex_All
<dbl>
Mu_All_DPT
<dbl>
59571 0.1816189 285608 0.8707561
1 row
Up_Ex_noTri
<dbl>
Up_Ex_noTri_DPT
<dbl>
Mu_Ex_noTri
<dbl>
Mu_Ex_noTri_DPT
<dbl>
18632 0.05680488 225080 0.6862195
1 row
1. Bar chart by state
Figure 15 plots a bar for each state, sorted by estimated upper-bound net excess deaths per thousand for the year
ended September 30, 2020. States to the right of IN – all 32 of them – have deficits, not excesses, over this time period.
26
Code
Code
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 22/28
2. Text plot of states
Figures 16.1 and 16.2 offer a different perspective on Figure 15, plotting the states by their ratios of reported deaths to
the CDC’s upper-bound and mean expected deaths, respectively. The vertical red line marks the 1:1 ratio. States to the
left (right) of the line reported fewer (more) deaths than predicted, meaning that they had no (some) surplus or excess.
This is a good visual illustration of the uncertainty in the CDC’s estimate. Let’s recall, however, that Figure 4 (under Part
II.B., above) suggests that Figure 16.1 (using the upper-bound expected deaths) is closer to reality than Figure 16.2
(using the mean expected deaths). The figures reinforce that New York City and New Jersey should be viewed as a
policy and treatment zone completely distinct from the rest of the country. More investigation is warranted to discover
why NYC and New Jersey did so poorly compared to the rest of the United States.
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 23/28
V. Conclusion
So let’s summarize.
What is the plausible range of total “excess” deaths in a relevant place over a relevant time period?
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 24/28
The answer depends on where you live and the measurement window. No one lives everywhere across the United
States. That said, the range for the year ended September 30, 2020 for United States most likely lies between 0.17 -
0.85 (0.03 - 0.67) DPT, including (excluding) the Tri-state. For most states individually, zero excess is plausible. Yet, this
highly important fact goes unmentioned in the CDC’s October 23 MMWR, which cherry-picks the Feb - Sep period and
ignores deficits to produce 0.91 DPT (299,028/328 million/1,000) as the solitary point estimate for the entire country.
Compare to the 1918 flu with a 6.6 DPT.
What share of excess deaths might be fairly attributed to Covid-19?
Small and getting smaller, especially in light of the WHO’s Jan 20, 2021 Notice (https://www.who.int/news/item/20-01-
2021-who-information-notice-for-ivd-users-2020-05) warning that a PCR test positive should not be counted as “Covid-
19” in the absence of a clinical diagnosis: “The cycle threshold (Ct) needed to detect virus is inversely proportional to the
patient’s viral load. Where test results do not correspond with the clinical presentation, a new specimen should be taken
and retested using the same or different NAT technology.”
The vast majority of the fatalities are either (a) very elderly and, therefore, likely to die soon anyway, or (b) saddled with
major comorbidities and … likely to die soon anyway. Even the CDC admits, in print, that it does not know whether ANY
of its estimated excess deaths were caused by Covid. All of this means that we should be looking beyond Covid-19 for a
full explanation of the Tri-state and other anomalies like the summer 2020 serpent.
How transparently does the CDC present the range and share to readers?
Not transparently at all. They publish a big, falsely precise, nationwide number (i.e., 298,028) – cherry-picked from the
“best” eight months – front and center. They fail to inform readers, “Our estimate is subject to lots of uncertainty. We
don’t even know how many”excess" deaths there are and we don’t know what share of the unknown excess deaths were
caused by Covid." This is all buried in the fine print. They entirely omit the upper-bound expected deaths from their
dashboard, fail to share their model code and data, and say nothing about the inversely proportional relationship
between PCR cycle threshold (Ct) and viral load, which calls into question the Covid-19 label on deaths reported by the
states.
Perhaps most importantly, contrary to standard practice in research, the CDC modelers say nothing about their major
financial interest in fostering the appearance of higher excess deaths. An adequate disclosure might read like this:
CONFLICT OF INTEREST: Covid-19 deaths are a proxy for CDC revenue. The CDC
tends to receive more federal funding when the perceived epidemic death rate is high. In
2020, Congress increased our operating budget from $8.5 billion to $25 billion in response
to Covid-19. Thus, this estimate may be affected by bias induced by our financial interest
in higher reported Covid-19 deaths.
In summary, in order to meet minimum SEC rules, which fundamentally require that statements and disclosures be "not
materially misleading (http://blogs2.law.columbia.edu/jtl/wp-content/uploads/sites/4/2018/01/Schulzke_56-CJTL-6-
1.pdf)," the CDC’s excess deaths estimate, disclosures, and presentation thereof would need to change significantly.
Appendix
I. Data preparation
A. Load libraries
B. Import all data
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 25/28
C. Xforms
Here we transform the CDC, Census, and Georgia data for further analysis.
1. CDC data
CDC: Long format conversion
Before we convert the CDC data to long format to facilitate time series analysis, we check for missing data patterns.
The missing values reflect reported weekly counts of less than 10 deaths that are intentionally “suppressed” by the CDC.
In the CDC’s 2019-2020 table (Dec 30, 2020 update), the “all cause” column is missing only CT week 51 and NC week
43. Many more values are missing for specific causes of death because the states take longer to report such details. We
replace all missing numeric values with 0.
Prepare for DT melt. Melting converts the data from “wide” to “long” format. Long is better for time series analysis.
Check for any remaining missing Deaths
0 rows
2. Census data
Rename some vars & create 2020 population estimate = 2019 pop estimate.
Convert to long format
Extract year from variable variable.
3. Join long CDC & Census tables
The DeathsPM variable was originally intended to report deaths per million but was repurposed to report deaths per
thousand. Much of the literature uses per thousand and thousands are more relateable than millions.
4. CDC Excess deaths
This data transform chunk supports analysis of the CDC’s Excess Deaths worksheet. First, we import state abbrevs &
join with CDC_C10_Ex & PopEst tables. Then we create per-thousand variables like DPTmu (mean deaths per
thousand), EDPT (excess deaths per thousand), and REDPT (reported excess deaths per thousand).
5. GA DPH vs weighted CDC
Code
Code
Code
Code
Code
Code
Code
Code
Code
Code
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 26/28
D. Functions
1. See, e.g., Judea Pearl & Dana Mackenzie, The Book of Why: The New Science of Cause and Effect (2018).
2. See “Principles and Pitfalls: a Guide to Death Certification,” Clin Med Res
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504663/#). 2015 Jun; 13(2): 74-82.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504663/
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504663/).
3. The CDC’s 65-page instruction book, https://www.cdc.gov/nchs/data/misc/hb_cod.pdf
(https://www.cdc.gov/nchs/data/misc/hb_cod.pdf), explains how to complete the death certificate at
https://www.cdc.gov/nchs/data/dvs/blue_form.pdf (https://www.cdc.gov/nchs/data/dvs/blue_form.pdf).
4. SAS 143, para. 7 (AICPA 2020).
5. “Finally, the estimates of excess deaths reported here may not be due to COVID-19, either directly or indirectly.”
Technical Notes, https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm).
6. See <https://www.cdc.gov/mmwr/volumes/69/wr/mm6942e2.htm>
(https://www.cdc.gov/mmwr/volumes/69/wr/mm6942e2.htm). The 198,081 “attributed” to Covid-19 apparently
represents deaths what the CDC calls “confirmed or presumed COVID-19 deaths.” By “confirmed,” the CDC
means that the decedent at some point received a positive PCR test result, suggesting the presence of SARS-
COV-2 virus.
7. “The cycle threshold (Ct) needed to detect virus is inversely proportional to the patient’s viral load. Where test
results do not correspond with the clinical presentation, a new specimen should be taken and retested using the
same or different NAT technology.”
8. See https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm).
9. “Negative values, where the observed count fell below the thresholds, were set to zero… The total number of
excess deaths in each state was calculated by summing the excess deaths in each week, from February 1, 2020
to present.” https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm).
10. The underlying CDC figures are in the worksheet at https://data.cdc.gov/api/views/xkkf-xrst/rows.csv?
accessType=DOWNLOAD&bom=true&format=true%20target= (https://data.cdc.gov/api/views/xkkf-xrst/rows.csv?
accessType=DOWNLOAD&bom=true&format=true%20target=)
11. For its national estimate, the CDC omitted North Carolina because of incomplete reporting.
12. The surveillance package was built to assist public health authorities in “lessening disease burden by, e.g., timely
recognizing emerging outbreaks in case of infectious diseases.” In other words, it was created to locate epidemic
outbreaks for possible short-term intervention, not to estimate excess deaths for long-term policy formulation. See
Maëlle Salmon, Dirk Schumacher & Michael Höhle, “Monitoring Count Time Series in R: Aberration Detection in
Public Health Surveillance,” J. Statistical Software (2016), https://www.jstatsoft.org/article/view/v070i10
(https://www.jstatsoft.org/article/view/v070i10).
13. See https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm).
Code
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 27/28
14. Georgia is a sample of one. We would need to test this correlation for more states in order to reach a more reliable
conclusion on its meaning. Note, also, that the WHO’s January 20, 2021 Notice
(https://www.who.int/news/item/20-01-2021-who-information-notice-for-ivd-users-2020-05) on Covid-19 diagnosis
calls into question the accuracy of Covid-19 death reports from any state, like Georgia, which relies primarily on a
single PCR test positive for classifying “cases” and deaths as Covid-19.
15. CDC, Excess Deaths Associated with COVID-19, Technical Notes,
https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm)
16. See, generally, Austin Frakt, “Putting a Dollar Value on Life? Governments Already Do,” NYT (May 11, 2020),
https://www.nytimes.com/2020/05/11/upshot/virus-price-human-life.html
(https://www.nytimes.com/2020/05/11/upshot/virus-price-human-life.html).
17. Sunetra Gupta discusses the epidemiological “social contract” at 1:46 - 1:51 of the video at
https://gbdeclaration.org/video/ (https://gbdeclaration.org/video/).
18. I gather from the footnotes (https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm), that “Farrington”
means farringtonFlexible, discussed in the surveillance package user manual (https://cran.r-
project.org/web/packages/surveillance/surveillance.pdf). Other possibilities are algo.farrington and algo.cdc
(based on Farrington).The Farrington algo assumes that events (e.g., deaths) occur frequencies prescribed by the
Poisson distribution, which is nicely explained in this zedstatistics video (https://www.youtube.com/watch?
v=cPOChr_kuQs). It would help greatly if the CDC were to publicly share its code.
19. Salmon M, Schumacher D, Hohle M. Monitoring Count Time Series in R: Aberration Detection in Public Health
Surveillance. Journal of Statistical Software 2016;70(10):1-35, 14.
20. The analysis was inspired in part by Genevieve Briand’s November 2020 presentation
(https://www.youtube.com/watch?v=3TKJN61aflI&feature=youtu.be) at John Hopkins University in which Briand
suggested that Covid-19 should not be credited with any excess deaths in the United States.
21. CDC data sources: https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-W/vsak-wrfu
(https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-W/vsak-wrfu),
https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr,
(https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr,) and
https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/muzy-jte6.
(https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/muzy-jte6.)
22. See https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm).
23. Census data sources: http://www2.census.gov/programs-surveys/popest/datasets/2010-2019/national/totals/nst-
est2019-alldata.csv (http://www2.census.gov/programs-surveys/popest/datasets/2010-2019/national/totals/nst-
est2019-alldata.csv), and https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-cities-and-
towns.html#ds (https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-cities-and-
towns.html#ds).
24. Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J,
Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K,
Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software,
4(43), 1686. doi: 10.21105/joss.01686 (https://doi.org/10.21105/joss.01686).
25. Smooth lines were fitted using LOESS smoothing implemented in R through the geom_smooth function of the
ggplot2 package. LOESS smoothing is “a non-parametric form of regression that uses a weighted, sliding-window
average to calculate a line of best fit.” Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-
Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org (https://ggplot2.tidyverse.org).
1/28/2021 Home Depot, Hogwarts & Excess Deaths at the CDC
https://kschulzke.github.io/C19/CDC_C19_Excess19.nb.html# 28/28
26. The CDC’s National and State Estimates of Excess Deaths worksheet is available in csv format at
https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm
(https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm).
ResearchGate has not been able to resolve any citations for this publication.
↩ 2. See "Principles and Pitfalls: a Guide to Death Certification
  • E G See
  • The Dana Mackenzie
  • Book
  • Why
See, e.g., Judea Pearl & Dana Mackenzie, The Book of Why: The New Science of Cause and Effect (2018).↩ 2. See "Principles and Pitfalls: a Guide to Death Certification," Clin Med Res (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504663/#). 2015 Jun; 13(2): 74-82. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504663/ (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504663/).↩
In other words, it was created to locate epidemic outbreaks for possible short-term intervention, not to estimate excess deaths for long-term policy formulation. See Maëlle Salmon, Dirk Schumacher & Michael Höhle
For its national estimate, the CDC omitted North Carolina because of incomplete reporting.↩ 12. The surveillance package was built to assist public health authorities in "lessening disease burden by, e.g., timely recognizing emerging outbreaks in case of infectious diseases." In other words, it was created to locate epidemic outbreaks for possible short-term intervention, not to estimate excess deaths for long-term policy formulation. See Maëlle Salmon, Dirk Schumacher & Michael Höhle, "Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance," J. Statistical Software (2016), https://www.jstatsoft.org/article/view/v070i10 (https://www.jstatsoft.org/article/view/v070i10).↩
Putting a Dollar Value on Life? Governments Already Do
  • Generally See
  • Austin Frakt
See, generally, Austin Frakt, "Putting a Dollar Value on Life? Governments Already Do," NYT (May 11, 2020), https://www.nytimes.com/2020/05/11/upshot/virus-price-human-life.html (https://www.nytimes.com/2020/05/11/upshot/virus-price-human-life.html).↩
John Hopkins University in which Briand suggested that Covid-19 should not be credited with any excess deaths in the United States.↩ 21. CDC data sources
  • M Salmon
  • D Schumacher
  • M Hohle
Salmon M, Schumacher D, Hohle M. Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance. Journal of Statistical Software 2016;70(10):1-35, 14.↩ 20. The analysis was inspired in part by Genevieve Briand's November 2020 presentation (https://www.youtube.com/watch?v=3TKJN61aflI&feature=youtu.be) at John Hopkins University in which Briand suggested that Covid-19 should not be credited with any excess deaths in the United States.↩ 21. CDC data sources: https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-W/vsak-wrfu (https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-W/vsak-wrfu), https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr, (https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr,) and https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/muzy-jte6. (https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/muzy-jte6.)↩
Smooth lines were fitted using LOESS smoothing implemented in R through the geom_smooth function of the ggplot2 package. LOESS smoothing is "a non-parametric form of regression that uses a weighted, sliding-window average to calculate a line of best fit
  • H Wickham
  • M Averick
  • J Bryan
  • W Chang
  • L D Mcgowan
  • R François
  • G Grolemund
  • A Hayes
  • L Henry
  • J Hester
  • M Kuhn
  • T L Pedersen
  • E Miller
  • S M Bache
  • K Müller
  • J Ooms
  • D Robinson
  • D P Seidel
  • V Spinu
  • K Takahashi
  • D Vaughan
  • C Wilke
  • K Woo
  • H Yutani
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software, 4(43), 1686. doi: 10.21105/joss.01686 (https://doi.org/10.21105/joss.01686).↩ 25. Smooth lines were fitted using LOESS smoothing implemented in R through the geom_smooth function of the ggplot2 package. LOESS smoothing is "a non-parametric form of regression that uses a weighted, sliding-window average to calculate a line of best fit." Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org (https://ggplot2.tidyverse.org).↩