Model Assisted Statistics and Applications 16 (2021) 5–14
DOI 10.3233/MAS-210510
IOS Press
Using the Weibull distribution to model
COVID-19 epidemic data
Vitor Hugo Moreau
Department of Biotechnology, Institute of Health Sciences, Federal University of Bahia, Av. Reitor Miguel Calmon,
sn, Vale do Canela, Salvador, BA, Brazil
Tel.: +55 71 98175 5469; E-mail: vitorhmc@ufba.br
Abstract.
COVID-19 is a severe acute respiratory syndrome caused by the new Coronavirus. The COVID-19 outbreak is a Public
Health Emergency of International Concern, declared by the WHO, that has killed more than 2 million people worldwide.
Since there are no specific drugs available and vaccination campaigns are in the initial phase, or have not even begun
in some countries, the main way to fight the outbreak worldwide is still based on non-pharmacological strategies, such
as the use of protective equipment, social isolation and mass testing. Modeling of the epidemic has gained pivotal
importance in guiding health authorities' decisions on applying those strategies. Here, we present the use of the
Weibull distribution to model predictions of the COVID-19 outbreak based on daily new cases and deaths data, by
non-linear regression using Metropolis Markov Chain Monte Carlo simulations. It was possible to predict the evolution
of daily new cases and deaths of COVID-19 in many countries, as well as the overall number of cases and deaths in the
future. Model predictions of the COVID-19 pandemic may be important for evaluating the mitigation procedures of
governments and health authorities, since they allow one to extract parameters that may help guide those decisions
and measures, slowing down the spread of the disease.
Keywords: COVID-19, Weibull distribution, modelling, model, death toll
1. Introduction
Since the World Health Organization declared the COVID-19 outbreak a Public Health Emergency of International
Concern, on January 30th, 2020, more than 93 million cases and 2 million deaths have been registered worldwide. Many
efforts are being made to discover and develop therapeutic strategies against COVID-19. Despite global
initiatives to search for treatments and vaccines, the main tools for slowing down the spread of the disease throughout
communities are still social isolation, personal hygiene and mass testing.
Before the development of efficient vaccines, many non-pharmacological strategies were proposed to fight
COVID-19. Most of them are based on slowing the spread of the virus through self-care measures, such as the use of
personal protective equipment, mass testing and restriction of social contact through patient quarantine, social
isolation and lockdown. Despite much discussion, social isolation and mass lockdown measures have been described as
successful strategies for slowing the spread of the virus (Anderson et al., 2020; Lau et al., 2020; Mitjà et al., 2020;
Saez et al., 2020; Sjödin et al., 2020; Wilder-Smith & Freedman, 2020). Mass testing has been shown to be one of the
most effective strategies, since it allows the contact network of each infected person to be traced precisely and
isolation and quarantine measures to be applied. The success of some countries, such as South Korea, in slowing down
the spread of the virus has been attributed to mass testing and selective quarantine (Choi, 2020).
Comparison of the evolution of epidemic curves among countries may be of pivotal importance to predict the effect
of the mitigation measures taken. Based on data analysis, it is possible to model the evolution of the disease
and to predict the number of infected, recovered and deceased people over days and weeks. Such predictions may be
extremely helpful for decision making by health authorities. Many papers have presented predictions of the epidemic
evolution by different methods (Ciufolini & Paolozzi, 2020; Gupta et al., 2020; Kim et al., 2020; Li et al., 2020). In
this work, we have applied the Weibull distribution to a selected set of data, depicting the number of daily new cases
and deaths, from countries that present distinct epidemic patterns. The Weibull distribution is one of the most
commonly used parametric lifetime models (Lawless, 2003), mostly for its parsimony, its ability to satisfactorily
model data commonly encountered in survival analysis and its availability in statistical software packages (Khan,
2018; Lawless, 2003). We believe that data on daily new cases and deaths of COVID-19, as well as on other epidemic
outbreaks, may be modeled by the Weibull distribution, resulting in valuable information to support mitigation
measures taken by governments and health authorities worldwide.
ISSN 1574-1699/$35.00 © 2021 – IOS Press. All rights reserved.
2. Materials and methods
Data on the daily number of confirmed new cases and deaths, for every country, were extracted from the Our World
in Data project (Roser et al., 2020) as comma-separated values (CSV) files and processed with R (R Core Team, 2013)
using RStudio 1.2.5042 (RStudio Team, 2020) for Linux. Data were subset to select country names, record dates
and the number of daily new cases and deaths for each country, from the beginning of the pandemic
(December 31st, 2019) up to January 7th, 2021. To allow proper statistics, only countries in which more than
3,000 deaths had been registered by the day of data collection were used.
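Data on daily new cases and daily new deaths were fitted to a 4-parameter Weibull distribution using Markov
Chain Monte Carlo simulation (MCMC). A modified 4-parameter Weibull equation was used to fit the data to the
prediction model, described as:

\[
f(t) =
\begin{cases}
0, & t \le \gamma \\
\alpha \, \dfrac{\beta}{\eta} \left( \dfrac{t-\gamma}{\eta} \right)^{\beta-1} e^{-\left( \frac{t-\gamma}{\eta} \right)^{\beta}}, & \text{otherwise}
\end{cases}
\tag{1}
\]
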
where t is the time; f(t) is the number of new cases or new deaths as a function of t; α is the area under the curve
(sum of total cases or deaths); γ is the location parameter; and β and η are the Weibull shape and scale parameters,
respectively.
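
As an illustration, Eq. (1) can be written as a small function. The sketch below is in Python rather than the R used in this work; the function name `weibull4_curve` and the parameter values are hypothetical, chosen only to show that the daily values add up to approximately α.

```python
import math

def weibull4_curve(t, alpha, beta, eta, gamma):
    """Eq. (1): expected number of daily new cases or deaths at time t."""
    if t <= gamma:
        return 0.0  # nothing is registered before the location parameter
    z = (t - gamma) / eta
    return alpha * (beta / eta) * z ** (beta - 1) * math.exp(-(z ** beta))

# Hypothetical parameter values, for illustration only (not fitted to any country).
alpha, beta, eta, gamma = 30000.0, 2.5, 40.0, 50.0
daily = [weibull4_curve(t, alpha, beta, eta, gamma) for t in range(400)]
total = sum(daily)  # close to alpha, the area under the curve
```

Because the density part of Eq. (1) integrates to one, the sum of the daily values over a window that covers the whole peak approaches α, which is why α can be read as the overall number of cases or deaths.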
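Some cases in which the calculations required a bimodal Weibull distribution are presented below. In
such cases, a bimodal Weibull distribution, adapted from Eq. (1), was used:

\[
f(t) =
\begin{cases}
0, & t \le \gamma \\
\alpha \left[ \dfrac{\beta}{\eta} \left( \dfrac{t-\gamma}{\eta} \right)^{\beta-1} e^{-\left( \frac{t-\gamma}{\eta} \right)^{\beta}} + \dfrac{\beta'}{\eta'} \left( \dfrac{t-\gamma'}{\eta'} \right)^{\beta'-1} e^{-\left( \frac{t-\gamma'}{\eta'} \right)^{\beta'}} \right], & \text{otherwise}
\end{cases}
\tag{2}
\]

where t, f(t), α, β, γ and η have the same meanings as in Eq. (1); β′, γ′ and η′ are the shape, location and scale
parameters of the second mode of the Weibull distribution, respectively.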
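
A minimal Python sketch of the bimodal curve of Eq. (2) follows. The function names and parameter values are hypothetical. One assumption is made explicit in the code: each mode is taken to contribute zero before its own location parameter, which avoids raising a negative base to a non-integer power when t lies between the two locations.

```python
import math

def weibull_term(t, beta, eta, gamma):
    """One Weibull mode of Eq. (2), without the area factor alpha."""
    if t <= gamma:
        return 0.0  # assumption: a mode contributes nothing before its own location
    z = (t - gamma) / eta
    return (beta / eta) * z ** (beta - 1) * math.exp(-(z ** beta))

def weibull4_bimodal(t, alpha, beta, eta, gamma, beta2, eta2, gamma2):
    """Eq. (2): sum of two Weibull modes scaled by alpha."""
    if t <= gamma:
        return 0.0
    return alpha * (weibull_term(t, beta, eta, gamma)
                    + weibull_term(t, beta2, eta2, gamma2))

# Hypothetical parameters: a first wave starting at day 50, a second at day 200.
params = (30000.0, 2.5, 40.0, 50.0, 3.0, 60.0, 200.0)
curve = [weibull4_bimodal(t, *params) for t in range(500)]
```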
2.1. Markov Chain Monte Carlo Simulations
Markov Chain Monte Carlo (MCMC) simulations were performed using the random-walk Metropolis algorithm
(Metropolis et al., 1953) within a five-dimensional space to accommodate the β, η, γ and α parameters, as well as
the standard deviation (SD). When bimodal distributions were used, the calculations were performed in an
eight-dimensional space (α, β, η, γ, the shape, scale and location parameters of the second mode, β′, η′ and γ′,
and SD). Prior distributions used in MCMC were normal for η and uniform for β, γ, α and SD. The log-likelihoods were
determined from 10,000 iterations, with a burn-in period of 5,000 iterations. Candidate values for the subsequent
iteration were sampled from a normal proposal distribution centered on the current value of the parameter.
The standard deviations of the proposal distributions were set to 1.5% of the given
parameters' values, since this value was described to give an acceptance ratio of around 0.23 among MCMC iterations.
An acceptance ratio of 0.23 has previously been shown to maximize the efficiency of the Metropolis-MCMC
algorithm (Roberts et al., 1997).
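
The procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of the weibull4 package: a Gaussian likelihood with standard deviation SD is assumed, the hyperparameters of the normal prior on η are hypothetical, the proposal SDs are fixed at 1.5% of the starting values so that the proposal stays symmetric, and the data are synthetic.

```python
import math
import random

def weibull4_curve(t, alpha, beta, eta, gamma):
    """Eq. (1): expected daily count at time t."""
    if t <= gamma:
        return 0.0
    z = (t - gamma) / eta
    return alpha * (beta / eta) * z ** (beta - 1) * math.exp(-(z ** beta))

def log_posterior(params, ts, ys, eta_mean, eta_sd):
    """Gaussian log-likelihood plus a normal prior on eta; other priors are flat."""
    alpha, beta, eta, gamma, sd = params
    if min(alpha, beta, eta, sd) <= 0:
        return -math.inf  # outside the support assumed for the uniform priors
    total = -0.5 * ((eta - eta_mean) / eta_sd) ** 2
    for t, y in zip(ts, ys):
        resid = y - weibull4_curve(t, alpha, beta, eta, gamma)
        total += -math.log(sd) - 0.5 * (resid / sd) ** 2
    return total

def metropolis(ts, ys, start, n_iter=10000, burn_in=5000, step=0.015, seed=1):
    """Random-walk Metropolis over (alpha, beta, eta, gamma, SD)."""
    rng = random.Random(seed)
    # Assumption: proposal SDs are 1.5% of the starting values and stay fixed.
    widths = [step * abs(v) for v in start]
    eta_mean, eta_sd = start[2], start[2]  # hypothetical prior hyperparameters
    current = list(start)
    current_lp = log_posterior(current, ts, ys, eta_mean, eta_sd)
    chain, accepted = [], 0
    for i in range(n_iter):
        proposal = [rng.gauss(v, w) for v, w in zip(current, widths)]
        lp = log_posterior(proposal, ts, ys, eta_mean, eta_sd)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp - current_lp)):
            current, current_lp = proposal, lp
            accepted += 1
        if i >= burn_in:
            chain.append(tuple(current))  # keep only post-burn-in samples
    return chain, accepted / n_iter

# Synthetic data from hypothetical "true" parameters plus Gaussian noise.
noise = random.Random(0)
ts = list(range(150))
ys = [weibull4_curve(t, 30000.0, 2.5, 40.0, 50.0) + noise.gauss(0, 20) for t in ts]
chain, rate = metropolis(ts, ys, start=(29000.0, 2.4, 41.0, 49.0, 25.0),
                         n_iter=4000, burn_in=2000)
alpha_mean = sum(sample[0] for sample in chain) / len(chain)
```

The returned acceptance rate can be compared with the target of about 0.23; if it is far from that value, the `step` fraction is the quantity to tune.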
2.2. Starting parameters for Metropolis-MCMC
Selection of the starting parameters for the Metropolis-MCMC procedure is key to the efficiency of the
simulation. These initial values should not be too far from a typical set of parameters (where the posterior density
is high), because the Metropolis-MCMC algorithm would need too many iterations to reach convergence if the
initial values lie far in the tail of the posterior distribution (Korner-Nievergelt et al., 2015). Since the
4-parameter Weibull distribution was specifically used to model COVID-19 epidemic curves, we used simple rules, based
on the analysis of the graphical role of the Weibull parameters (Eq. (1)), as described below:
i. Location parameter (γ): shifts the beginning of the distribution to higher t values. In our data, it represents the
time lapse before the first cases/deaths arise, or before the onset of the exponential growth of daily new
cases and deaths. The data sets used in this work register COVID-19 daily new cases and deaths since December 31st,
2019. However, most countries registered their first cases and deaths only at later dates. Thus, there is
a lag of zeroes (or near zeroes) registered before the first case/death appears. The starting γ value for the
Metropolis-MCMC procedure (γ0) was empirically calculated as the first day on which more than 5% of the
actual maximum number of daily new cases or deaths was registered, that is, the magnitude of the vector t_i from
t = 0 to the time value at which 5% of the maximum number of cases/deaths was registered, given by:

\[
\gamma_0 = \left\lVert \vec{t_i} \right\rVert_{0}^{f(t_i) = 0.05 \max(f(t))}
\tag{3}
\]

where γ0 is the location parameter at iteration 0 and f(t_i) is the number of daily new cases or deaths as a
function of t.
ii. Scale parameter (η0): the starting value for the Weibull scale parameter was set as the mean of the vector t from
its i-th element, equal to γ0, up to t_n, described as:

\[
\eta_0 = \frac{1}{(n - \gamma_0)} \int_{t_i = \gamma_0}^{i = n} \vec{t_i}
\tag{4}
\]

iii. The starting value for the area parameter (α0) was set to the sum of the number of cases or deaths, and the
starting value for the shape parameter (β0) was set to 2.5 for all countries, since the typical Weibull shape
parameter that fits most of the COVID-19 epidemic data ranges from 1 to 5 (data not shown).
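
The three rules above can be sketched as a single Python function. The function name and the example series are hypothetical, day indices start at 0 on the first day of the series, and the plain arithmetic mean of the day indices from γ0 to the last day is used for η0, which is the reading of rule ii assumed here.

```python
def starting_values(daily):
    """Starting (alpha0, beta0, eta0, gamma0) for a list of daily counts."""
    peak = max(daily)
    # Rule i: first day with more than 5% of the maximum daily count.
    gamma0 = next(i for i, y in enumerate(daily) if y > 0.05 * peak)
    # Rule ii: mean of the time vector from gamma0 to the last day.
    days = range(gamma0, len(daily))
    eta0 = sum(days) / len(days)
    # Rule iii: area is the total count; shape is fixed at 2.5.
    alpha0 = float(sum(daily))
    beta0 = 2.5
    return alpha0, beta0, eta0, gamma0

# Hypothetical series with a lag of zeroes before the first wave.
daily = [0, 0, 0, 1, 2, 10, 40, 80, 100, 60, 30, 10]
alpha0, beta0, eta0, gamma0 = starting_values(daily)
```

For this series the 5% threshold is 5 cases, so the first qualifying day is day 5, and the mean of days 5 to 11 gives the starting scale. A series of all zeroes has no qualifying day and would need to be handled separately.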
2.3. Weibull4 R package
The fitting procedures described in this paper were implemented in an R language package named “weibull4”, designed
to fit epidemiological data, in particular for COVID-19, using the 4-parameter Weibull equation (Eq. (1)) by the
Metropolis-MCMC algorithm. The weibull4 package is available for download and installation from the Comprehensive R
Archive Network (CRAN) repository (R Core Team, 2013).
2.4. Supplementary material
Figure S1, described in the text, is available as Supplementary Material. The R script called “Moreau_weibull_2021”,
with the code for every calculation and plot in this paper, is available on the Code Ocean server (codeocean.com).
3. Results
Data analysis can be of outstanding importance during infectious disease outbreaks, mainly when fast decision making
is crucial to slow down the spread of the disease. Modeling the course of the COVID-19 pandemic in highly
affected countries is a life-saving demand (Eberhardt et al., 2020; Verma et al., 2020), since it can be used to support
and guide decision makers to act quickly and block the spread of the disease. Data on daily new cases and daily new
deaths were extracted from the Our World in Data project on Coronavirus (Roser et al., 2020). Data from countries that
faced the COVID-19 pandemic earlier, such as Italy, France and Spain, initially formed a well-defined single peak.
This pattern allowed us to evaluate statistical models that properly fit the data. The Weibull distribution was
chosen for this goal because of its potential in modeling lifetime events (Lawless, 2003). Such analysis allows us to
forecast epidemiological outcomes, such as the death toll and the future number of daily new cases and deaths in the
studied countries.
Figure 1 shows plots for the initial single peaks of daily new cases and deaths of COVID-19 registered in Italy,
as well as the curve fits calculated for them. As seen, the Weibull distribution can properly fit the evolution of the
epidemic peak and can be used to model and forecast the number of daily new cases and deaths. Panel A of
Fig. 1 also illustrates the positions of the location parameter (γ) and the mode of the distribution (Mo f(t)), which
may also be called t_max and stands for the time (in days) at which the maximum number of cases or deaths was (or will
be) registered, i.e., the maximum turning point, given by:
Fig. 1. Panel A: Profile of the first wave of daily new cases (open circles) and deaths (closed circles) of COVID-19 in Italy. Data were fit with a
unimodal 4-parameter Weibull distribution (lines). Arrows show the γ parameter, corresponding to the beginning of the exponential growth of new
cases or deaths, and the mode of the distribution, corresponding to the average day on which the largest number of cases or deaths is observed.
Calculated Weibull shape, scale, location and area parameters are shown in Panels B, C (upper lines), C (lower lines) and D, respectively, as
functions of the Metropolis-MCMC iterations. Lines converging to the same average values in each panel correspond to the same parameter
calculation, with distinct starting values.
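
\[
\mathrm{Mo}\,f(t) =
\begin{cases}
\gamma, & \beta \le 1 \\
\gamma + \eta \left( \dfrac{\beta-1}{\beta} \right)^{1/\beta}, & \beta > 1
\end{cases}
\tag{5}
\]
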
where Mo f(t) is the mode of the distribution of the number of daily new cases or deaths (f(t)). Additionally, Fig. 1
shows the posterior values for the Weibull parameters along the Metropolis-MCMC iterations (Panels B, C and
D). Convergence is reached before the end of the burn-in period, and the posterior distributions converge to
the same average values even when distinct starting values are chosen. This could be observed for every parameter
(converging lines in Panels B, C and D), suggesting that the Weibull-directed Metropolis-MCMC performed here is a
suitable procedure to properly fit the daily new cases and deaths data of COVID-19.
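
As a worked example of Eq. (5), the short Python sketch below computes the day of the peak from the shape, scale and location parameters. The function name and the parameter values are hypothetical and do not correspond to any fitted country.

```python
def weibull4_mode(beta, eta, gamma):
    """Eq. (5): time at which the 4-parameter Weibull curve peaks."""
    if beta <= 1:
        return gamma  # the curve decreases from its start, so the peak is at gamma
    return gamma + eta * ((beta - 1) / beta) ** (1 / beta)

# Hypothetical parameters: shape 2.5, scale 40 days, location at day 50.
t_max = weibull4_mode(2.5, 40.0, 50.0)  # about day 82.6
```

Note that the area parameter α does not enter Eq. (5): it scales the height of the peak but not its position in time.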
Recently, most countries have entered a second wave of COVID-19 infections. Due to this second wave, such
countries began to present a multimodal pattern of daily new cases and deaths. Additionally, some countries have
shown more complex patterns, with more than two mixed waves. Such puzzling patterns make it harder, or even
impracticable, to perform proper statistical analysis and predictions. Even so, such complex data on COVID-19 daily
new cases and deaths can be analyzed if multimodal distributions are used. This is possible by splitting the data
by date and performing the analysis with two Weibull distributions, one before and one after the split date.
Optional arguments were included in the weibull4 R package (see Materials and methods) to allow users to
choose the date for splitting the data in two parts, as well as to set the unimodal or bimodal Weibull distribution to
be used before and after the split date. The functionality of these arguments is described in the
package documentation files (not shown).
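
The splitting step itself is simple and can be sketched in Python as below. This is not the interface of the weibull4 package, whose argument names are not reproduced here; the function name and the data are hypothetical, and the split date is assumed to belong to the first part.

```python
from datetime import date, timedelta

def split_by_date(dates, values, split_date):
    """Split a daily series into the parts up to and after split_date."""
    before = [(d, v) for d, v in zip(dates, values) if d <= split_date]
    after = [(d, v) for d, v in zip(dates, values) if d > split_date]
    return before, after

# Hypothetical daily counts starting on 2020-08-30.
first_day = date(2020, 8, 30)
dates = [first_day + timedelta(days=i) for i in range(5)]
values = [12, 9, 7, 10, 15]
before, after = split_by_date(dates, values, date(2020, 9, 1))
```

Each part can then be fitted separately, with a unimodal or a bimodal curve as appropriate.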
Fig. 2. Data on daily new cases and deaths of COVID-19 from nine selected countries used as examples of customized analysis. Upper panels
show countries in which two unimodal Weibull distributions were used, with split date on August 1st (Belgium, Canada and Germany), i.e.,
Mo1 = 1 and Mo2 = 1. Middle panels display countries in which a single bimodal Weibull distribution was used for data fitting (Bolivia, Brazil
and Russia), i.e., without splitting the data and Mo1 = 2. Lower panels show countries in which two Weibull distributions were used, one
unimodal and one bimodal, with split date on September 1st: Serbia and United States, with a bimodal distribution up to the split date and a
unimodal distribution from the split date forward (Mo1 = 2, Mo2 = 1); and United Kingdom, with a unimodal distribution up to the split date
and a bimodal distribution from the split date forward (Mo1 = 1, Mo2 = 2).
Examples of the data analysis performed with the Weibull distribution on the countries' COVID-19 daily new
cases and deaths data are shown in Fig. 2, in three distinct ways: i. Upper panels display countries in which data were
analyzed with two separate unimodal distributions (Belgium, Canada and Germany), with data splitting on August 1st;
ii. Middle panels display the analysis using a single bimodal distribution for the whole data set (Bolivia, Brazil and
Russia), without a split date; iii. Lower panels show the data analysis using one unimodal distribution and one bimodal
distribution, with data splitting on September 1st (Serbia, United Kingdom and United States). Non-linear regressions
performed with one or two Weibull distributions, either unimodal or bimodal, appear to fit the COVID-19 daily
new cases and deaths data properly. The split date, as well as the number of modes of the Weibull distribution
to be used, can be selected for each country's data. Such parameters may be chosen in order to reach better fit quality
for the daily new cases and deaths data. The split dates used in the calculations for Fig. 2 were manually set
to the deepest valley between two COVID-19 infection waves, in both daily new cases and deaths data.
The suitability of the model can be further evaluated by the residuals and by the Determination Coefficient
(R²) of the regressions. Figure 3 shows the distribution of the fit residuals of the daily new cases (Panel A) and
deaths (Panel B) of COVID-19 in all studied countries. Lines in Panels A and B represent normal distributions with
the means and standard deviations (SD) of each respective panel's data. For both cases and deaths, the residuals are
more narrowly distributed around zero than a normal distribution with the same mean and SD as the residuals
distribution (lines). Panels C and D display the correlation plots of the actual number of daily new
cases and deaths versus the Weibull distribution estimates for all studied countries. As shown, the points
are well distributed around the straight line with slope = 1 and intercept = 0, although there seems to be a large
number of outliers (Fig. 3, Panels C and D). Additionally, Table 1 shows the R² value of every country's data fit. As
displayed, with nine exceptions among the 53 countries, all fitted data resulted in R² values greater than 0.6.
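
For reference, the residuals and R² of a fit can be computed as in the Python sketch below. The common definition R² = 1 − SS_res/SS_tot is assumed here, since the exact formula used for Table 1 is not stated in the text; the function names and the numbers are hypothetical.

```python
def residuals(observed, fitted):
    """Observed minus fitted value for each day."""
    return [o - f for o, f in zip(observed, fitted)]

def r_squared(observed, fitted):
    """Determination coefficient, assuming R2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum(r ** 2 for r in residuals(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed counts and model estimates.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, fitted)  # 1 - 0.10 / 5.0 = 0.98
```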
Fig. 3. Distributions of the residuals of the fitted data for daily new cases (Panel A) and deaths (Panel B) of COVID-19 calculated with customized
analysis for each country. Split dates, as well as the number of modes in each used Weibull distribution were as described in Table 1. Normal
distributions, calculated with the same means and SD of each respective residuals distributions, are shown in lines. Panels C and D show the
residuals plots of daily new cases and deaths for countries versus the estimated fits in custom mode, as described in Table 1. Straight lines were
drawn with slope = 1 and intercept = 0, merely to guide the eye.
3.1. Parameter extraction from Weibull Metropolis-MCMC
Modeling natural processes is of prime importance for forecasting trends of similar phenomena
in the future, as well as for extracting information from the model that allows one to better understand them. The
estimated death toll is one of the parameters that can be extracted from the calculations used to model the COVID-19
data here. Modeling the COVID-19 curves of daily new cases and deaths allows us to predict both the number of
daily new cases and deaths in the future and the overall death toll for COVID-19 in a given country. Table 1
shows the total expected death tolls for COVID-19 for all the studied countries. Data fitting was performed, for each
country, using the split dates and the number of modes for the first (Mo1) and the second (Mo2) distributions, as
displayed in Table 1 (Mo1 = 1 representing a unimodal Weibull distribution and Mo1 = 2 a bimodal distribution).
Modeling of the COVID-19 data can be customized for each country's data, in order to set the split date and
the number of modes of the distributions used, and can be reevaluated day by day as new data emerge. Such an
analytical model may be a worthwhile tool to evaluate and guide the response of health authorities and governments to
COVID-19 and to other epidemics in the future.
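
A minimal Python sketch of how a death toll estimate and its error can be summarized from the post-burn-in samples of α is given below. The sample values are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption.

```python
import statistics

def summarize_alpha(alpha_samples):
    """Mean and standard deviation of the post-burn-in alpha samples."""
    return statistics.mean(alpha_samples), statistics.stdev(alpha_samples)

# Hypothetical post-burn-in samples of alpha (a real chain would hold 5,000 values).
samples = [29800.0, 30100.0, 30050.0, 29950.0, 30100.0]
toll, error = summarize_alpha(samples)
```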
4. Discussion
COVID-19 is a global health emergency that is going to change the way in which people, institutions and
governments manage and execute their lives and duties. The fact that specific drugs or vaccines for COVID-19
have only been developed recently raises the importance of behavioral strategies, such as social isolation, lockdown
(Anderson et al., 2020; Lau et al., 2020; Mitjà et al., 2020; Saez et al., 2020; Sjödin et al., 2020; Wilder-Smith &
Table 1
Estimated overall death tolls calculated from customized analysis, updated to January 7th, 2021, for every country. Overall death tolls represent the
integer area under the regression curves shown in Fig. S1 (α parameter of Eqs (1) or (2)). Errors were calculated from the standard deviations of the
last 5,000 iterations of the Metropolis-MCMC simulation (see Materials and methods). Determination Coefficients (R²) for the data fitting, as well as
the split date and the number of modes of the Weibull distribution used to fit the data before (Mo1) and after (Mo2) the split date, are also shown
for each country (1 corresponds to the unimodal, Eq. (1), and 2 to the bimodal, Eq. (2), Weibull distribution).
Country Death toll R² Split date Mo1 Mo2 Country Death toll R² Split date Mo1 Mo2
Argentina 68,512 ±1,505 0.8452 – 2 – Italy 118,980 ±8,500 0.9409 Jul/01 1 2
Austria 8,952 ±630 0.8576 Jun/01 1 1 Japan 8,304 ±1,132 0.8586 Jun/01 1 2
Bangladesh 11,902 ±476 0.7970 – 2 – Jordan 8,311 ±283 0.9510 – 1 –
Belgium 24,180 ±1,228 0.8536 Jul/01 1 2 Mexico 202,567 ±7,053 0.6970 – 2 –
Bolivia 15,752 ±298 0.8185 – 2 – Moldova 5,584 ±785 0.7228 Jul/01 2 2
Bosnia and Herzegovina 4,785 ±296 0.7477 Jun/01 2 2 Morocco 13,735 ±956 0.9134 Jun/05 1 1
Brazil 358,586 ±15,536 0.6423 – 2 – Netherlands 17,559 ±1,134 0.8089 Jul/01 1 2
Bulgaria 10,291 ±558 0.8125 Oct/01 1 1 Pakistan 20,151 ±2,266 0.6464 Sep/01 1 1
Canada 34,380 ±3,541 0.8252 Jul/01 1 1 Panama 13,142 ±1,444 0.8265 May/15 1 2
Chile 20,155 ±1,202 0.4935 Aug/01 1 2 Peru 51,300 ±3,463 0.5195 – 2 –
China 4,058 ±478 0.7175 May/01 1 2 Philippines 12,175 ±1,055 0.4125 May/20 2 1
Colombia 73,053 ±2,281 0.8549 – 2 – Poland 34,976 ±5,243 0.7739 Jul/01 2 2
Croatia 7,812 ±357 0.9710 Aug/05 2 2 Portugal 10,622 ±593 0.9513 Aug/10 2 2
Czechia 24,690 ±1,693 0.9267 Jul/15 2 2 Romania 18,194 ±1,023 0.9192 Jun/01 1 2
Ecuador 20,980 ±764 0.0168 – 2 – Russia 83,294 ±2,799 0.9478 – 2 –
Egypt 23,975 ±2,644 0.8569 Oct/05 2 1 Saudi Arabia 11,323 ±515 0.8250 – 1 –
France 86,646 ±5,763 0.6716 Jul/01 1 2 Serbia 5,205 ±372 0.9686 Sep/15 2 1
Germany 213,074 ±26,463 0.8010 Jul/01 1 1 South Africa 118,902 ±26,752 0.7654 Oct/01 1 1
Greece 6,906 ±207 0.9597 Jun/01 1 2 Spain 108,474 ±13,022 0.4896 Jun/15 1 2
Guatemala 6,178 ±180 0.5920 – 2 – Sweden 15,763 ±1,193 0.4741 Sep/01 2 1
Honduras 5,547 ±208 0.3671 – 2 – Switzerland 13,565 ±1,271 0.7349 Jul/01 1 1
Hungary 15,525 ±855 0.9617 Jul/01 1 1 Tunisia 6,234 ±626 0.5676 Jun/01 1 1
India 266,570 ±8,443 0.8933 – 1 – Turkey 137,267 ±31,191 0.9289 Aug/01 2 1
Indonesia 53,125 ±2,335 0.8647 – 2 – Ukraine 26,986 ±3,135 0.8644 Jun/01 1 2
Iran 58,670 ±1,270 0.9638 May/01 1 2 United Kingdom 123,536 ±4,698 0.8347 Jul/01 1 2
Iraq 30,346 ±473 0.9101 – 1 – United States 1,209,893 ±193,330 0.7741 Sep/01 2 1
Israel 4,968 ±353 0.8405 Aug/15 2 2
Freedman, 2020) and mass testing (Choi et al., 2020; Peto, 2020; Salath et al., 2020) to keep fighting the pandemic.
In this scenario, modeling and forecasting the course of the pandemic play an important role by providing
information for evaluating the measures taken by governments and health authorities. Parameters extracted from
modeling and forecasts may be used to determine better strategies to mitigate the impact of infectious
diseases in the population (Verma et al., 2020). With this in mind, we proposed the use of the Weibull distribution to
model data on daily new cases and deaths of the COVID-19 pandemic from selected countries. In our previous work,
the Weibull distribution was used to model forecasts of COVID-19 data in Brazil (Moreau, 2020).
To our knowledge, that was the first time such an approach was used to this end, and the present work is
the first report of the use of the Weibull distribution to model COVID-19 data in a systematic worldwide analysis.
The Weibull distribution has been shown to fit well a single peak of COVID-19 daily new cases and deaths. Figure 1
displays the daily new cases and deaths data from the first wave of COVID-19 infections in Italy. Italy was chosen
because it was one of the countries that displayed a well-defined single peak of new cases and deaths, probably
because of strict lockdowns and wide mass testing measures taken in response to the first wave of infections. This
pattern allows us to use the Italian data to evaluate the application of the 4-parameter Weibull distribution to fit
the COVID-19 epidemic data. Similar results could be obtained when data from the first peak of daily new cases and
deaths were analyzed in other countries that displayed a clear initial single wave of infections, such as Belgium,
Canada, China, France, Germany, Netherlands, Portugal, Spain, Switzerland and United Kingdom (data not shown).
Figure 2 shows examples of five distinct customized ways to model the daily new cases and deaths for COVID-19,
depending on the pattern of the country's epidemic curve. It was possible to model the data by performing non-linear
curve fitting with a single unimodal Weibull distribution (Eq. (1)), with a single bimodal distribution (Eq. (2)) or
with two Weibull distributions. Two distributions can be applied to the modeling calculation by splitting the data at
a given date. The split date may be set to a day in the deepest valley between the end of one epidemic wave and
the beginning of another wave. Figure 2 presents examples of the customized analysis in which the fitting parameters
were set to reach better fit results. Data from Belgium, Canada and Germany (upper panels) were modeled with two
unimodal Weibull distributions and a split point at August 1st; data from Bolivia, Brazil and Russia (middle panels)
were analyzed with one single bimodal Weibull distribution (Eq. (2)); data from Serbia and United States (lower
panels) were analyzed with two Weibull distributions, one bimodal up to September 1st (split date) and one unimodal
from September 2nd onward; and, finally, data from United Kingdom (lower panel) were analyzed with one unimodal
distribution up to September 1st (split date) and one bimodal distribution from September 2nd forward. The split
date, as well as the number of modes of the Weibull distribution to be used before (Mo1) and after (Mo2) the split
date, may be chosen in order to reach better quality of the data fitting. Table 1 shows the split date and the number
of modes of the distributions used (Mo1 and Mo2), as well as the Determination Coefficient (R²) for each country's
data fit. As also seen, most fitting procedures shown here reached a good fit quality, with R² values over 0.6,
supporting the 4-parameter Weibull distribution as a suitable model for COVID-19 data fitting.
The analysis of the results presented did not allow us to determine correlations between the goodness of fit and
any key parameter related to the governments' response measures to the COVID-19 pandemic,
such as the number of deaths per million or the Oxford COVID-19 Government Response Tracker (Hale et al., 2020)
(data not shown), though the goodness of fit might be associated with inadequate data collection and
processing. Well-defined peaks of daily new cases and deaths, with clear ascending and descending
phases, tend to conform better to the unimodal Weibull distribution, as shown in Fig. 1A. We speculate that fuzzy
patterns of daily new cases and deaths, presented by some countries, might be associated with misguided strategies
taken by those countries to fight the COVID-19 pandemic, which would make the number of daily new transmissions
vary strongly, due to undesired spreading of the virus throughout the community. Such mismanagement might lead to
what is called “multiple waves” of the disease. For cases in which multiple waves are present, alternative ways to
perform the Weibull analysis were presented here (Fig. 2 and Table 1).
Figure S1 (Supplementary Material) shows the customized analysis, performed with a single (no split date) or two
Weibull distributions, for every country that registered more than 3,000 deaths up to January 7th, 2021. A
multiple-wave pattern can be observed for most, if not all, of the countries (Fig. S1). The split dates, Mo1 and Mo2
used to fit the data in Fig. S1 were as described in Table 1. Although the data analysis presented here was able to
deliver feasible models of the COVID-19 pandemic data, it should be taken with caution, due to the possible existence
of corrupted data or to the unreliability of the epidemic data collected by some countries' authorities. Moreover,
although the overall death tolls, extracted from the area under the curve (α in Eq. (2)) and displayed in Table 1,
reflect good estimates of the real number of deaths at the end of the pandemic peaks, they are probably biased by the
oscillations present in the pattern of epidemic data from some countries and can reach much greater values if new
waves of infection emerge.
Overall, the 4-parameter Weibull distribution proved to be a suitable modeling distribution for the
COVID-19 pandemic when applied to daily new cases and deaths data. Figure 3 summarizes the residuals analysis of
the non-linear regression of the daily new cases and deaths data from every country used in this work. Residuals from
both daily new cases and deaths form narrow distributions around zero. Lines in Panels A and B represent normal
distributions with the mean and SD of each respective panel's data. It is worth noting that the residuals
distributions are narrower than the normal distributions with the same mean and SD values. This observation denotes
the presence of highly dispersed outliers in the residuals. Panels C and D display those outliers more evidently.
Despite the presence of the outliers, data from both new cases and deaths display sharp residuals distributions
with the Weibull distribution fit, which reinforces the use of this method for modeling, forecasting
and parameter extraction from daily new cases and deaths data of COVID-19.
The non-linear regressions used here were performed by a Metropolis-MCMC algorithm written as an R language script
and packaged in an R package called “weibull4”. This package may be useful not just for COVID-19, but for
any epidemic data that displays a similar spreading pattern. The weibull4 R package can be used for non-linear
regression of daily new cases and deaths data using both unimodal and bimodal 4-parameter Weibull distributions
(Eqs (1) and (2)), with the location parameter (γ), which accommodates the time lapse before the first cases or
deaths arise, and the area parameter (α), which represents the overall number of registered cases or deaths. The
weibull4 package is available at the R CRAN repository (R Core Team, 2013) at https://cran.r-project.org/.
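The idea of fitting such a curve by random-walk Metropolis sampling can be sketched in Python as follows. This is not the weibull4 package's code or interface. The curve form (a standard Weibull density shifted by the location γ and multiplied by the area α) is an assumption that may differ in parameterization from Eq. (1), and the names `weibull4_curve`, `sse` and `metropolis_fit`, the Gaussian error model with a fixed variance, and the step sizes are all hypothetical choices made for the example.

```python
import math
import random

def weibull4_curve(x, shape, scale, loc, area):
    """Assumed form: area * Weibull density of (x - loc); zero for x <= loc."""
    if x <= loc:
        return 0.0
    log_z = math.log((x - loc) / scale)
    power = shape * log_z  # log of z**shape
    if power > 700.0:  # exp(-z**shape) underflows to zero here
        return 0.0
    log_f = math.log(area * shape / scale) + (shape - 1.0) * log_z - math.exp(power)
    return math.exp(min(log_f, 700.0))  # clamp to avoid overflow in exp

def sse(params, days, counts):
    """Sum of squared residuals between the data and the curve."""
    total = 0.0
    for t, y in zip(days, counts):
        r = y - weibull4_curve(t, *params)
        total += r * r
    return total

def metropolis_fit(days, counts, start, steps, n_iter=5000, seed=1):
    """Random-walk Metropolis over (shape, scale, loc, area).

    The target is proportional to exp(-SSE / (2 * sigma2)), i.e. Gaussian
    errors with flat priors on the valid region. sigma2 is a rough, fixed
    value taken from the starting point. Returns the best sample visited.
    """
    rng = random.Random(seed)
    current = list(start)
    current_sse = sse(current, days, counts)
    sigma2 = max(current_sse / len(days), 1e-9)
    best, best_sse = list(current), current_sse
    for _ in range(n_iter):
        proposal = [p + rng.gauss(0.0, s) for p, s in zip(current, steps)]
        shape, scale, _loc, area = proposal
        if shape <= 0.0 or scale <= 0.0 or area <= 0.0:
            continue  # zero prior density: proposal rejected, chain stays put
        proposal_sse = sse(proposal, days, counts)
        log_ratio = (current_sse - proposal_sse) / (2.0 * sigma2)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current, current_sse = proposal, proposal_sse
            if current_sse < best_sse:
                best, best_sse = list(current), current_sse
    return best, best_sse

# Synthetic, noise-free "daily counts" generated from known parameters.
true_params = (2.0, 30.0, 10.0, 5000.0)  # shape, scale, loc (gamma), area (alpha)
days = list(range(120))
counts = [weibull4_curve(t, *true_params) for t in days]

start = (1.5, 25.0, 5.0, 4000.0)   # deliberately off the true values
steps = (0.02, 0.3, 0.3, 50.0)     # proposal SD for each parameter
start_sse = sse(start, days, counts)
fit, fit_sse = metropolis_fit(days, counts, start, steps)
print("fitted (shape, scale, loc, area):", [round(p, 2) for p in fit])
print("SSE: start", round(start_sse, 1), "-> best", round(fit_sse, 1))
```

Because the assumed curve integrates to the area parameter, the fitted value of `area` plays the role the text assigns to α, an estimate of the overall number of cases or deaths, while `loc` absorbs the delay before the first records.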
Predictions of COVID-19 epidemic evolution based on daily new cases and deaths are especially efficient because
they can be revised day by day, giving governments and health authorities the opportunity to readjust their
measures as new data arise. Additionally, the 4-parameter Weibull distribution, as well as the weibull4 R package, may
be suitable for analyzing epidemic data from other diseases or, eventually, from future pandemics, since there
seems to be a consensus in the scientific community that we are at imminent risk of them (Osterholm, 2005). We
believe that such predictions would be useful for decision makers in defining strategies to fight epidemic and
pandemic outbreaks, now and in the future.
Acknowledgments
The author would like to thank Dr. Gilson Carvalho for valuable discussions and Dr. Juliana Cortines for the
critical reading of the manuscript.
References
Anderson, R. M., Heesterbeek, H., Klinkenberg, D., & Hollingsworth, T. D. (2020). How will country-based mitigation measures influence the
course of the COVID-19 epidemic? Lancet, 395, 931-934.
Choi, J. Y. (2020). COVID-19 in South Korea. Postgrad Med J, 96, 399-402.
Choi, S., Han, C., Lee, J., Kim, S. I., & Kim, I. B. (2020). Innovative screening tests for COVID-19 in South Korea. Clin Exp Emerg Med, 1-5.
Ciufolini, I., & Paolozzi, A. (2020). Mathematical prediction of the time evolution of the COVID-19 pandemic in Italy by a Gauss error function
and Monte Carlo simulations. Eur Phys J Plus, 135, 355.
Eberhardt, J. N., Breuckmann, N. P., & Eberhardt, C. S. (2020). Multi-Stage Group Testing Improves Efficiency of Large-Scale COVID-19
Screening. J Clin Virol, S1386-6532(20)30124-4.
Gupta, S., Raghuwanshi, G. S., & Chanda, A. (2020). Effect of weather on COVID-19 spread in the US: A prediction model for India in 2020. Sci
Total Environ, 728, 138860.
Hale, T., Angrist, N., Cameron-Blake, E., Hallas, L., Kira, B., Majumdar, S., Petherick, A., Phillips, T., Tatlow, H., & Webster, S. (2020). Variation
in government responses to COVID-19. BSG Work. Pap. Ser. Blavatnik Sch. Gov. Univ. Oxford: Version 8.0.
Khan, S. A. (2018). Exponentiated Weibull regression for time-to-event data. Lifetime Data Anal, 24, 328-354.
Kim, S., Seo, Y. B., & Jung, E. (2020). Prediction of COVID-19 transmission dynamics using a mathematical model considering behavior changes.
Epidemiol Health, 42, e2020026.
Korner-Nievergelt, F., Roth, T., von Felten, S., Guélat, J., Almasi, B., & Korner-Nievergelt, P. (2015). Markov Chain Monte Carlo Simulation. In:
Bayesian Data Anal. Ecol. Using Linear Model. with R, BUGS, STAN. Elsevier, 197-212.
Lau, H., Khosrawipour, V., Kocbach, P., Mikolajczyk, A., Schubert, J., Bania, J., & Khosrawipour, T. (2020). The positive impact of lockdown in
Wuhan on containing the COVID-19 outbreak in China. J Travel Med, 27.
Lawless, J. F. (2003). Basic Concepts and Models 1.1. In: Stat Model Methods Lifetime Data, Second Ed, 1-47.
Li, L., Yang, Z., Dang, Z., Meng, C., Huang, J., Meng, H., Wang, D., Chen, G., Zhang, J., Peng, H., & Shao, Y. (2020). Propagation analysis and
prediction of the COVID-19. Infect Dis Model, 5, 282-292.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of State Calculations by Fast Computing
Machines. J Chem Phys, 21, 1087-1092.
Mitjà, O., Arenas, À., Rodó, X., Tobias, A., Brew, J., & Benlloch, J. M. (2020). Experts’ request to the Spanish Government: Move Spain towards
complete lockdown. Lancet, 395, 1193-1194.
Moreau, V. H. (2020). Forecast predictions for the COVID-19 pandemic in Brazil by statistical modeling using the Weibull distribution for daily
new cases and deaths. Brazilian J Microbiol, 51, 1109-1115.
Osterholm, M. T. (2005). Preparing for the next pandemic. N Engl J Med, 352, 1839-1842.
Peto, J. (2020). Covid-19 mass testing facilities could end the epidemic rapidly. BMJ, m1163.
R Core Team. (2013). R: A language and environment for statistical computing. Vienna, Austria.
Roberts, G. O., Gelman, A., & Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann Appl
Probab, 7, 110-120.
Roser, M., Ritchie, H., Ortiz-Ospina, E., & Hasel, J. (2020). Coronavirus Pandemic (COVID-19).
RStudio Team. (2020). RStudio: Integrated Development Environment for R. Boston, MA.
Saez, M., Tobias, A., Varga, D., & Barceló, M. A. (2020). Effectiveness of the measures to flatten the epidemic curve of COVID-19. The case of
Spain. Sci Total Environ, 727, 138761.
Salathé, M., Althaus, C. L., Neher, R., Stringhini, S., Hodcroft, E., Fellay, J., Zwahlen, M., Senti, G., Battegay, M., Wilder-Smith, A., Eckerle, I.,
Egger, M., & Low, N. (2020). COVID-19 epidemic in Switzerland: on the importance of testing, contact tracing and isolation. Swiss Med
Wkly.
Sjödin, H., Wilder-Smith, A., Osman, S., Farooq, Z., & Rocklöv, J. (2020). Only strict quarantine measures can curb the coronavirus disease
(COVID-19) outbreak in Italy, 2020. Eurosurveillance, 25, 1-6.
Verma, V., Vishwakarma, R. K., Verma, A., Nath, D. C., & Khan, H. T. A. (2020). Time-to-Death approach in revealing Chronicity and Severity of
COVID-19 across the World. Ed. Kannan Navaneetham. PLoS One, 15, e0233074.
Wilder-Smith, A., & Freedman, D. O. (2020). Isolation, quarantine, social distancing and community containment: pivotal role for old-style public
health measures in the novel coronavirus (2019-nCoV) outbreak. J Travel Med, 27, 1-4.
Supplementary data
The supplementary files are available to download from http://dx.doi.org/10.3233/MAS-210510.
Supplementary Fig. 1. Plots of daily new cases (open circles) and deaths (closed circles) for COVID-19 in every country used in this work, calculated
with customized mode. Split dates and the number of modes in the Weibull distribution used to fit the data before and after the split date are shown
in Table 1. Data were fit (lines) using the weibull4 R package (see Material and Methods). Charts are ordered from highest to lowest coefficient of
determination (R²) of the data fit.