Estimation of the final size of the coronavirus epidemic by the SIR model

21.03.2020 21:57
Milan Batista
University of Ljubljana, Slovenia
milan.batista@fpp.uni-lj.si
(Feb 2020)
Abstract.
In this note, the SIR model is used to estimate the final size of the coronavirus
epidemic. The current prediction is that the final size of the epidemic will be about
85 000 cases. The note complements the author’s note [1].
1. Introduction
In this note, we estimate the final epidemic size using the SIR model [2, 3]. The
program that implements the model is available at
https://www.mathworks.com/matlabcentral/fileexchange/74658-fitviruscovid19
2. The SIR model
The model equations are
$$\frac{dS}{dt} = -\frac{\beta I S}{N}, \tag{1}$$
$$\frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I, \tag{2}$$
$$\frac{dR}{dt} = \gamma I, \tag{3}$$
where $t$ is time, $S = S(t)$ is the number of susceptible persons at time $t$, $I = I(t)$ is the
number of infected persons at time $t$, and $R = R(t)$ is the number of recovered persons at time
$t$. The parameter $\beta$ is the contact rate, and $1/\gamma$ is the average infectious period.
From (1), (2), and (3) we obtain the total population size $N$:
$$S + I + R = N. \tag{4}$$
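As a brief check of (4), a sketch using only equations (1) to (3): adding the three right-hand sides shows that the total $S + I + R$ does not change in time,

$$\frac{d}{dt}\left(S + I + R\right) = -\frac{\beta I S}{N} + \left(\frac{\beta I S}{N} - \gamma I\right) + \gamma I = 0,$$

so the sum keeps the value it has at the initial time, and that constant is the population size $N$.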
The initial conditions are $S(0) = S_0$, $I(0) = I_0$, $R(0) = R_0$.
Eliminating $I$ from (1) and (3) yields
$$S = S_0 \exp\left(-\frac{\beta}{\gamma N}\left(R - R_0\right)\right). \tag{5}$$
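The elimination step can be sketched as follows (the primed integration variables are introduced here only for illustration). Dividing (1) by (3) removes $I$:

$$\frac{dS}{dR} = -\frac{\beta}{\gamma N}\, S .$$

Separating variables and integrating from the initial values $(S_0, R_0)$ to $(S, R)$ gives

$$\int_{S_0}^{S} \frac{dS'}{S'} = -\frac{\beta}{\gamma N} \int_{R_0}^{R} dR'
\quad\Longrightarrow\quad
\ln\frac{S}{S_0} = -\frac{\beta}{\gamma N}\left(R - R_0\right),$$

and taking the exponential of both sides gives (5).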
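In the limit $t \to \infty$ the number of susceptible people left, $S_\infty$, is
$$S_\infty = S_0 \exp\left(-\frac{\beta}{\gamma N}\left(R_\infty - R_0\right)\right), \tag{6}$$
where $R_\infty$ is the final number of recovered persons. Because the final number of
infected people is zero, we have, using (4),
$$N = S_\infty + R_\infty. \tag{7}$$
From this and (6), the equation for $R_\infty$ is
$$R_\infty = N - S_0 \exp\left(-\frac{\beta}{\gamma N}\left(R_\infty - R_0\right)\right). \tag{8}$$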
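Equation (8) has no closed-form solution for the final number of recovered persons, so it is solved numerically. The following is a minimal sketch, not the author's script: it uses the rounded day-40 values from Table 1 and the first data point from the appendix as the initial number of infected, so the root only approximates the tabulated value. The names `g`, `Rinf`, and `Sinf` are illustrative.

```matlab
% Sketch: solve the final-size equation (8) for R_inf with fzero.
% Assumption: parameters are the rounded day-40 entries of Table 1,
% so the result only approximates the tabulated R_inf.
beta  = 2.506;          % contact rate
gamma = 2.291;          % recovery rate (1/gamma = infectious period)
N     = 506492;         % total population size
I0    = 45;             % initial infected = first data point C(1)
R0    = 0;              % initial recovered
S0    = N - I0 - R0;    % initial susceptible, from (4)

% Residual of (8): zero when x = N - S0*exp(-beta/(gamma*N)*(x - R0))
g = @(x) x - (N - S0*exp(-beta/(gamma*N)*(x - R0)));

% g(0) = -I0 < 0 and g(S0) > 0, so [0, S0] brackets the root
Rinf = fzero(g, [0, S0]);
Sinf = N - Rinf;        % remaining susceptibles, from (7)
fprintf('R_inf = %.0f, S_inf = %.0f\n', Rinf, Sinf);
```

A bracketing interval is passed to `fzero` rather than a single starting point so that the search stays between 0 and the initial number of susceptibles.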
In order to use the model, we must estimate the model parameters $\beta$, $\gamma$ and the
initial values $S_0$ and $I_0$ from available data (we set $R_0 = 0$ and $I_0 = C_1$).
Now the available data is a time series of the total number of cases $C$, i.e.,
$$C = I + R. \tag{9}$$
We can estimate the parameters and initial values by minimizing the difference between
the actual and predicted number of cases, i.e., by minimizing
$$\left\| C_t - \hat{C}_t\left(\beta, \gamma, S_0\right) \right\|^2 = \min, \tag{10}$$
where $C_t = (C_1, C_2, \ldots, C_n)$ are the given numbers of cases at times
$t_1, t_2, \ldots, t_n$ and $\hat{C}_t = (\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_n)$ are the
corresponding estimates calculated by the model. For the practical calculation, we use
MATLAB’s function fminsearch, and for the integration of the model equations MATLAB's
function ode45.
3. Results
The results of the calculation are shown in Table 1 and in Figure 1. From the data in
Table 1, we see that all data sets have a high $R^2$ (> 0.98). We can also see that the final
number of recovered persons converges and that the predicted values do not differ
substantially; however, the predicted total population involved differs substantially.
Here we note that from day 28, the collection of data changed. Until day 28, the
estimated epidemic size was about 52 000 infections; after that, the prediction changed to
about 85 700 infections.
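For reference, the $R^2$ values are computed in the appendix script as the usual coefficient of determination between the observed and the modeled case counts. Written in the notation of (10), with $\bar{C}$ denoting the mean of the observed counts (a symbol introduced here for illustration), this reads

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(C_i - \hat{C}_i\right)^2}{\sum_{i=1}^{n} \left(C_i - \bar{C}\right)^2}.$$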
Table 1. Convergence study. After day 28, the method of data collection changed.
Columns: date; day number; population size N; final susceptible S_inf; final recovered
R_inf; ratio of successive R_inf estimates; beta; gamma; beta/gamma; R^2.

Date         Day        N    S_inf   R_inf  ratio   beta  gamma  beta/gamma    R^2
12 Feb 2020   28   551513   473888   77625  1.429  2.897  2.689       1.077  0.988
13 Feb 2020   29  1300538  1189616  110922  1.048  4.026  3.854       1.045  0.988
14 Feb 2020   30  1434310  1318035  116275  0.925  4.157  3.988       1.042  0.990
15 Feb 2020   31  1203132  1095609  107523  0.932  3.905  3.730       1.047  0.991
16 Feb 2020   32  1002252   901990  100262  0.953  3.624  3.441       1.053  0.992
17 Feb 2020   33   864976   769448   95528  0.969  3.387  3.198       1.059  0.993
18 Feb 2020   34   774076   681530   92546  0.969  3.205  3.010       1.065  0.994
19 Feb 2020   35   683791   594070   89722  0.978  2.998  2.798       1.071  0.994
20 Feb 2020   36   619431   531645   87787  0.985  2.835  2.630       1.078  0.994
21 Feb 2020   37   574963   488460   86503  0.990  2.712  2.504       1.083  0.994
22 Feb 2020   38   544793   459126   85667  0.993  2.624  2.413       1.087  0.995
23 Feb 2020   39   522591   437513   85078  0.993  2.557  2.343       1.091  0.995
24 Feb 2020   40   506492   421823   84669  0.995  2.506  2.291       1.094  0.995
Figure 1. Actual and predicted number of cases by the SIR model and logistic model
(data up to 25 Feb 2020)
Having a series of final predictions, we can estimate the limit of the series by the Shanks
transformation [4, 5]
$$R(R_{\infty,n}) = \frac{R_{\infty,n+1} R_{\infty,n-1} - R_{\infty,n}^2}{R_{\infty,n+1} + R_{\infty,n-1} - 2 R_{\infty,n}}. \tag{11}$$
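As a worked instance of the transformation (a sketch using the rounded values from Table 1), take the final-size estimates for days 32, 33, and 34: 100262, 95528, and 92546. The transformed value for day 33 is the product of the two neighbors minus the square of the middle value, divided by the sum of the neighbors minus twice the middle value:

$$\frac{92546 \cdot 100262 - 95528^2}{92546 + 100262 - 2 \cdot 95528} = \frac{153\,248\,268}{1752} \approx 87\,470,$$

which agrees with the day-33 entry in the second column of Table 2.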
For data from Table 1, the current prediction is 84085 cases (Table 2).
Table 2. Iterated Shanks transformation.

Day   R_inf   R(R_inf)  R(R(R_inf))  R(R(R(R_inf)))  R(R(R(R(R_inf))))
32   100262
33    95528      87470
34    92546      39247        62344
35    89722      83575        83974           84181
36    87787      83971        84179           84084              84085
37    86503      84107        84003           84499
38    85667      83673        83731
39    85078      83740
40    84669
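The iteration behind Table 2 can be sketched in a few lines of MATLAB. This is an illustrative sketch, not part of the author's program: the names `Rinf` and `shanks` are assumptions, and because the inputs are the rounded values from Table 1, the higher iterates may differ slightly from the tabulated ones.

```matlab
% Final-size estimates R_inf for days 32..40, copied from Table 1.
Rinf = [100262 95528 92546 89722 87787 86503 85667 85078 84669];

% One Shanks step, applied elementwise: a sequence of length m is
% mapped to a sequence of length m-2 (the two end points are lost).
shanks = @(a) (a(3:end).*a(1:end-2) - a(2:end-1).^2) ./ ...
              (a(3:end) + a(1:end-2) - 2*a(2:end-1));

% Apply the step repeatedly until fewer than three values remain;
% each printed row corresponds to one further column of Table 2.
s = Rinf;
while numel(s) >= 3
    s = shanks(s);
    fprintf('%10.0f', s);
    fprintf('\n');
end
```

With nine starting values the loop runs four times, leaving a single value that corresponds to the last column of Table 2.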
4. Conclusion
If the method of data collection does not change again, and if the situation remains
stable, then the size of the epidemic predicted by the SIR model is about 84 100 cases.
This prediction is comparable with the current prediction of 84 000 infections by the
empirical logistic model [6] (see Fig. 2).
Figure 2. Comparison of model predictions (data up to 25 Feb 2020)
Appendix. MATLAB script
%SIR model - Dynamic of epidemic
%
% Reference:
% https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology
%
% Variables
% S - # of susceptible
% I - # of infected
% R - # recovered
%
% Parameters:
% beta -- 1/beta = Tc - typical time between contacts
% gamma -- 1/gamma = Tr - typical time until recovery
% N -- = S + I + R = const
%
% Derived parameters:
% RN = beta/gamma -- basic reproduction number (ratio)
%
% History:
% 23.Feb.2020 MB Created
global beta gamma N % parameters
global S0 I0 R0 % initial values (normalized S0=1)
global C % cumulative number of cases (C = I + R)
global init % initial guess
close all
% get data
[CC,date0] = getData();
fprintf('%12s %3s %10s %10s %10s %7s %7s %7s %7s\n',...
'Date','Day','N','Sinf','Rinf','beta','gamma','R0','R2')
init = false;
for ii = 28:length(CC)
    % get data
    C = CC(1:ii);
    % estimate model parameters
    b = parest();
    % final number
    Rinf = Rmax();
    Sinf = S0*exp(-beta/gamma/N*(Rinf - R0));
    % calculate R2
    tspan = 0:length(C)-1; % sample times (days since the first data point)
    ic = [S0 I0 R0]'; % initial conditions
    opts = []; % no options set
    [~,z] = ode45( @SIR, tspan, ic, opts);
    z = z(:,2)+z(:,3); % predicted total cases C = I + R
    zbar = sum(C)/length(C);
    SStot = sum((C - zbar).^2);
    SSres = sum((C - z').^2);
    R2 = 1 - SSres/SStot;
    % print results
    fprintf('%12s %3d %10d %10d %10d %7.3f %7.3f %7.3f %7.3f\n',...
        datestr(date0+ceil(length(C)-1)),ceil(length(C)),round(N,0),...
        round(Sinf,0),round(Rinf,0),beta,gamma,beta/gamma,R2)
    % fprintf('Estimated parameters\n')
    % fprintf(' End date %s\n',datestr(date0+ceil(length(C)-1)));
    % fprintf(' Day number %d\n',ceil(length(C)));
    % fprintf(' Population size %d\n',round(N,0));
    % fprintf(' Initial infected %d\n',round(z(1,2),0));
    % fprintf(' Remain susceptible %d\n',round(Sinf,0));
    % fprintf(' Total recovered %d\n',round(Rinf,0));
    % fprintf(' Contact rate %g\n',beta);
    % fprintf(' recovery rate %g\n',gamma);
    % fprintf(' Recovery time %g\n',1/gamma);
    % fprintf(' Reproduction number %g\n',beta/gamma);
    % fprintf(' R2 %g\n',R2);
end
% set parameters
tspan = 0:2*length(C); % final time
ic = [S0 I0 R0]'; % initial conditions
opts = []; % no options set
% simulate
[t,z] = ode45( @SIR, tspan, ic, opts);
% plot results
figure
hold on
plot(t,z,'LineWidth',2)
legend('Susceptible','Infected','Recovered',...
'Location','best','FontSize',12)
xlabel('Day (after 16 jan 2020)')
ylabel('Cases')
grid on
hold off
shg
% plot comparison
figure
hold on
plot(t,(z(:,2)+z(:,3))','k','LineWidth',2)
scatter(0:length(C)-1,C,50,'filled') % C(1) is fitted at t = 0, so data start at day 0
legend('Predicted','Actual',...
'Location','best','FontSize',12)
xlabel('Day (after 16 jan 2020)')
ylabel('Cases')
grid on
hold off
shg
% save to global
ta = t;
Ca = z(:,2)+z(:,3);
function dzdt = SIR(~,z)
%SIR model
% z(1) = S, z(2) = I, z(3) = R
    global beta gamma N
    S = z(1);
    I = z(2);
    R = z(3);
    N = S + I + R; % total population (conserved); note that this overwrites the global N
    dzdt = [ -beta*I*S/N; beta*I*S/N - gamma*I; gamma*I];
end
function b = iniguess()
%INIGUESS Obtain initial guess
    global beta gamma
    global S0 I0 R0 % initial values
    global C
    global init
    if ~init
        beta = 1/0.00267103;
        gamma = 1/0.00267232;
        S0 = 1e8;
        I0 = C(1);
        R0 = 0;
        init = true;
    end
    b(1) = beta;
    b(2) = gamma;
    b(3) = S0;
end
function b = parest
%PAREST Parameter estimation
    global beta gamma N
    global S0 I0 R0 % initial values
    maxiter = 20000;
    maxfun = 20000;
    b0 = iniguess();
    options = optimset('Display','off','MaxIter',maxiter,...
        'MaxFunEvals',maxfun);
    [b, fmin,flag] = fminsearch(@fun,b0,options);
    % disp('Exit condition:')
    % disp(flag)
    % disp('Smallest value of the error:');
    % disp(fmin)
    fun(b); % re-evaluate at the optimum so that the global Ca holds the fitted cases
    beta = b(1);
    gamma = b(2);
    S0 = b(3);
    N = S0 + I0 + R0;
end
function f = fun( b)
%FUN Optimization function
    global beta gamma S0 I0 R0
    global C Ca
    % set parameters
    beta = b(1);
    gamma = b(2);
    S0 = b(3);
    tend = length(C);
    tspan = 0:tend-1; % time interval
    ic = [S0 I0 R0]'; % initial conditions
    % solve ODE
    try
        [tsol,zsol] = ode45(@SIR,tspan,ic);
    catch
        f = NaN;
        return
    end
    % check if calculation time equals sample time
    if length(tsol) ~= length(tspan)
        f = NaN;
        return
    end
    Ca = (zsol(:,2)+zsol(:,3))'; % calculated number of cases
    f = norm(C - Ca);
end
function r = Rmax()
%RMAX Calculate the number of recovered individuals as t -> inf
    global N S0 beta gamma R0
    RN = beta/gamma;
    r = fzero(@f,[0,S0]);
    %-----------------------
    function z = f(x)
        z = x - (N - S0*exp(-RN*(x - R0)/N));
    end
end
function [C,date0] = getData()
%GETDATA Coronavirus data
% data from 16 Jan to 21 Jan https://i.redd.it/f4nukz4ou9d41.png
% data from 22 Jan 2020 to 13 Feb 2020 are from
% https://www.worldometers.info/coronavirus/
date0=datenum('2020/01/16'); % start date
C = [
45
62
121
198
291
440
580
845
1317
2015
2800
4581
6058
7813
9821
11948
14551
17389
20628
24553
28276
31439
34876
37552
40553
43099
44919
60326
64437
67100
69197
71329
73332
75198
75700
76676
77673
78651
79400
80088
%----------- add new data here
]';
end
References
[1] M. Batista, Estimation of the final size of the coronavirus epidemic by the logistic
model, medRxiv (2020) 2020.02.16.20023606.
[2] H.W. Hethcote, The Mathematics of Infectious Diseases, SIAM Review 42(4)
(2000) 599-653.
[3] I. Nesteruk, Statistics based predictions of coronavirus 2019-nCoV spreading in
mainland China, medRxiv (2020) 2020.02.12.20021931.
[4] D. Shanks, Non-linear Transformations of Divergent and Slowly Convergent
Sequences, Journal of Mathematics and Physics 34(1-4) (1955) 1-42.
[5] C.M. Bender, S.A. Orszag, Advanced Mathematical Methods for Scientists and
Engineers I: Asymptotic Methods and Perturbation Theory, Springer, New York, 1999.
[6] M. Batista, Estimation of the final size of the coronavirus epidemic by the logistic
model, 2020 DOI: 10.13140/RG.2.2.36053.37603.