
Applying ARIMA-GARCH models for time series analysis on Seasonal and Nonseasonal datasets

Reeva Andipara
Department of Mathematical Sciences, Stevens Institute of Technology, Hoboken,
NJ
Supervisor: Dr. Hadi Safari Katesari
Abstract
Time series analysis is a method for analyzing data in order to identify trends and forecast future values. We carry out time series analysis on two types of data: seasonal and non-seasonal. This project provides a procedure for analyzing and modeling time series in R. The first part covers the analysis and forecasting of monthly sales of Perrin Freres champagne; the data consist of monthly champagne sales over nearly nine years. The time series approaches used are the autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), autoregressive moving average (ARMA), moving average (MA), and autoregressive (AR) models. The second part deals with a time series of natural gas prices, analyzing and forecasting the monthly price of natural gas. The core of the project is to provide a guide to ARIMA and ARCH-GARCH models and to examine the combined model's output and effectiveness in time series modeling and forecasting.
Part A:
Seasonal Dataset: Perrin Freres Monthly Champagne Sales
Introduction
This project predicts the monthly sales of champagne using time series methods, including forecasts of future monthly sales. The problem statement is to estimate the number of champagne sales per month for the Perrin Freres brand. The dataset covers nearly nine years, from January 1964 to September 1972, and records the number of champagne sales each month.
Dataset:
The dataset I used is taken from Kaggle.
https://www.kaggle.com/datasets/galibce003/perrin-freres-monthly-champagne-sales
data = read.csv("perrin-freres-monthly-champagne-sales.csv")  # illustrative file name; the data come from the Kaggle link above
summary(data)
Month Perrin.Freres.monthly.champagne.sales.millions..64..72
Length:107 Min. : 1413
Class :character 1st Qu.: 3113
Mode :character Median : 4217
Mean : 4761
3rd Qu.: 5221
Max. :13916
NA's :2
The sales column contains two missing values (shown as NA's above), which we remove before analysis.
data = na.omit(data)
df = data$Perrin.Freres.monthly.champagne.sales.millions..64..72
Data Analysis:
For a preliminary analysis, let's plot the time series.
plot(ts(df, frequency = 12, start = c(1964, 1)), ylab = 'Monthly Champagne Sales')
From the above plot, we can observe a peak in sales towards the end of every year. This is a good example of seasonality, as the pattern repeats annually.
The line plot shows an increasing sales trend over the historical period, and the growing height of the seasonal cycles suggests that the seasonality is multiplicative. An increasing trend is visible between 1964 and 1970. The strong trend and seasonality indicate that the series is non-stationary, and a non-stationary series must be transformed to a stationary one before modeling. To further verify this, I divided the data into three segments and calculated the mean and variance of each. For a stationary series the mean and variance should be constant across segments, which is not the case here, so the series is non-stationary.
segment1 = df[1:35]
segment2 = df[36:70]
segment3 = df[71:105]
print(mean(segment1))
print(mean(segment2))
print(mean(segment3))
print(var(segment1))
print(var(segment2))
print(var(segment3))
Output:
[1] 3740.171
[1] 5078.143
[1] 5465.143
[1] 2667015
[1] 5359573
[1] 10231412
The next step in our time series analysis is to review the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. Both functions summarize the strength of the relationship between a series and its own lagged values over time.
acf(df,lag.max=50)
pacf(df,lag.max=50)
The sample ACF and PACF for the series are shown in the above plots. We can see strong seasonal autocorrelation in the ACF plot (spikes recurring at multiples of lag 12), a relationship that must be modeled. So, we take one seasonal difference.
differenced_data = diff(df, lag = 12)  # seasonal difference: X_t - X_{t-12}
seasonal_data = ts(differenced_data, frequency = 12, start = c(1965, 1))  # the differenced series starts one year later
plot(seasonal_data, ylab = 'Monthly Champagne Sales')
This is the time series plot after taking one seasonal difference of the sales data. The increasing trend between 1964 and 1970 is no longer present.
adf.test(seasonal_data)  # adf.test() is provided by the tseries package
Output:
Warning message in adf.test(seasonal_data):
“p-value smaller than printed p-value”
Augmented Dickey-Fuller Test
data: seasonal_data
Dickey-Fuller = -4.0804, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
The ADF test gives a p-value less than 0.05, so we reject the null hypothesis of a unit root: the data are now stationary. Since stationarity has been achieved with one seasonal difference, a SARIMA model with seasonal differencing order D = 1 will be used, where 'p' is the order of the autoregressive term and 'q' is the order of the moving average term.
Let’s have a look at ACF & PACF again.
acf(as.vector(seasonal_data), lag.max=100)
The ACF plot after one seasonal difference shows very little autocorrelation. There are no clearly significant lags outside the confidence bounds (or at most one), so for the non-seasonal part of the SARIMA model we may consider MA(0) or MA(1). For the seasonal part, MA(1) can be considered, since no seasonal pattern or repetition remains.
pacf(as.vector(seasonal_data),lag.max=100)
The PACF plot after one seasonal difference likewise shows very little autocorrelation. There are no clearly significant lags outside the confidence bounds (or at most one), so for the non-seasonal part we may consider AR(0) or AR(1), and for the seasonal part AR(1).
eacf(seasonal_data)  # eacf() is provided by the TSA package
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o o o o o o o o x o o
1 o o o o o o o o o o o o o o
2 x x o o o o o o o o o o o o
3 o x o o o o o o o o o o o o
4 o x o o o o o o o o o o o o
5 o x o o o o o o o o o o o o
6 o x o o o o o o o o o o o o
7 x x o o o o o o o o o o o o
Looking at the EACF table, we get the following candidate models for the non-seasonal part:
ARMA(0,1)
ARMA(1,1)
ARMA(1,0)
Models of Time Series
Because the series is seasonal, SARIMA (seasonal ARIMA) will be used instead of ARIMA.
From the ACF, PACF and EACF results I chose the following models to fit to the data. The models are fit to the seasonally differenced series, so the seasonal differencing order D = 1 is implicit:
Model 1: SARIMA(1,0,0)(1,1,0)[12]
Model 2: SARIMA(1,0,0)(1,1,1)[12]
Model 3: SARIMA(1,0,1)(1,1,0)[12]
Since the data are monthly, the seasonal period is 12.
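As a reading aid (standard SARIMA backshift notation; this equation is not part of the original write-up): writing B for the backshift operator and W_t = (1 - B^12) X_t for the seasonally differenced sales series, Model 1 is
(1 - phi B)(1 - PHI B^12)(W_t - mu) = e_t
where phi is the non-seasonal AR(1) coefficient (reported as ar1 below), PHI is the seasonal AR(1) coefficient (sar1), mu is the intercept of the differenced series, and e_t is white noise.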
model1 = arima(seasonal_data, order=c(1,0,0), seasonal=c(1,0,0),method = 'ML')
print(model1)
Output:
Call:
arima(x = seasonal_data, order = c(1, 0, 0), seasonal = c(1, 0, 0), method =
"ML")
Coefficients:
ar1 sar1 intercept
0.2881 -0.2920 286.7916
s.e. 0.1017 0.0988 81.3043
sigma^2 estimated as 494084: log likelihood = -742.18, aic = 1490.35
bic = AIC(model1, k = log(length(seasonal_data)))  # BIC computed via AIC() with penalty k = log(n)
1502.48172313175
model2 = arima(seasonal_data, order=c(1,0,0), seasonal=c(1,0,1))
print(model2)
Call:
arima(x = seasonal_data, order = c(1, 0, 0), seasonal = c(1, 0, 1))
Coefficients:
ar1 sar1 sma1 intercept
0.2898 -0.5320 0.2731 287.2478
s.e. 0.1015 0.3968 0.4679 86.2407
sigma^2 estimated as 492002: log likelihood = -742.02, aic = 1492.04
bic = AIC(model2, k = log(length(seasonal_data)))
1506.70785176533
model3 = arima(seasonal_data, order=c(1,0,1), seasonal=c(1,0,0))
print(model3)
Call:
arima(x = seasonal_data, order = c(1, 0, 1), seasonal = c(1, 0, 0))
Coefficients:
ar1 ma1 sar1 intercept
0.3774 -0.0987 -0.2885 286.0528
s.e. 0.3822 0.4192 0.1001 83.9712
sigma^2 estimated as 493945: log likelihood = -742.15, aic = 1492.3
bic = AIC(model3, k = log(length(seasonal_data)))
1506.96237063021
Comparing the above models based on their AIC and BIC values, model 1 has the lowest values of both criteria.
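To make the comparison easier to read, the criteria can be collected into a single table; a minimal sketch in R, assuming model1, model2 and model3 from above:
n = length(seasonal_data)
comparison = data.frame(
  model = c("SARIMA(1,0,0)(1,1,0)[12]",
            "SARIMA(1,0,0)(1,1,1)[12]",
            "SARIMA(1,0,1)(1,1,0)[12]"),
  AIC = c(AIC(model1), AIC(model2), AIC(model3)),
  BIC = c(AIC(model1, k = log(n)),  # BIC via AIC() with penalty log(n)
          AIC(model2, k = log(n)),
          AIC(model3, k = log(n))))
print(comparison[order(comparison$AIC), ])  # model 1 ranks first on both criteria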
Residual Analysis:
The "residuals" in a time series model are what is left over after fitting the model. Residuals are useful for determining whether a model has captured all of the information in the data. A good forecasting method will yield residuals with the following properties:
1) The residuals are uncorrelated.
2) The residuals have zero mean.
We plot the sample ACF and PACF to check whether there is any correlation in the residuals, and the normality of the residuals is checked using histograms and QQ plots.
residuals_model1 = model1$residuals
residuals_model2 = model2$residuals
residuals_model3 = model3$residuals
Plotting the residuals:
For model 1:
plot(residuals_model1, type='o')
Examining a plot of the residuals over time is our first diagnostic check. If the model is adequate, we expect the plot to show a roughly rectangular scatter around a horizontal level of zero, with no visible trends.
acf(as.vector(residuals_model1), lag.max = 50)
From the ACF of the residuals, we can see that there is almost no autocorrelation among the residuals.
qqnorm(residuals_model1)
qqline(residuals_model1)
Here, the extreme values look suspect: the tails deviate from the reference line.
hist(residuals_model1)
The QQ plot and histogram suggest approximate normality of the residuals, although the Shapiro-Wilk test below rejects strict normality at the 5% level (p = 0.01797).
print(shapiro.test(residuals_model1))
Shapiro-Wilk normality test
data: residuals_model1
W = 0.96675, p-value = 0.01797
tsdiag(model1)
tsdiag() provides a convenient set of diagnostics, including Ljung-Box p-values, which aggregate the magnitudes of the residual autocorrelations across lags rather than inspecting each lag in isolation.
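The same aggregate check can be run directly with Box.test() from base R; a minimal sketch (lag = 12 is an illustrative choice, and fitdf = 2 accounts for the two estimated coefficients ar1 and sar1):
Box.test(residuals_model1, lag = 12, type = "Ljung-Box", fitdf = 2)
A large p-value here is consistent with uncorrelated residuals.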
For model 2:
Let’s follow the same procedure for analysis of residuals of model 2.
plot(residuals_model2, type='o')
acf(as.vector(residuals_model2))
qqnorm(residuals_model2)
qqline(residuals_model2)
hist(residuals_model2)
From the histogram we can see that the graph is not exactly bell-shaped, and the QQ plot shows some outliers. So, we use the Shapiro-Wilk test to test the normality of the residuals.
print(shapiro.test(residuals_model2))
Shapiro-Wilk normality test
data: residuals_model2
W = 0.97024, p-value = 0.03194
We also plot the sample ACF of the residuals. There is no significant correlation except at lag 13, which is just one lag out of 50 and therefore acceptable. The lack of correlation suggests both models are adequate; since model 1 has the lower AIC and BIC, we consider model 1 for forecasting.
Forecasting
After the fitting process is completed and the best model has been selected, we refit that model on the original (undifferenced) data to forecast.
forecast_model1 <- arima(ts(df, frequency = 12, start = c(1964, 1)),
                         order = c(1, 1, 0), seasonal = c(1, 1, 0))
forecast_model1$x <- ts(df, frequency = 12, start = c(1964, 1))  # attach the original series so plots show the history
forecast_model1$x
I used the forecast() function (from the forecast package) to produce forecasts from the model. forecast() can handle a wide range of inputs; it usually takes a time series or a time series model as its main argument and generates appropriate forecasts. Here I used the fitted model to forecast the next 24 values of the series with 95% prediction intervals.
forecast_values = forecast(forecast_model1,level=c(95),h=24)
forecast_values
Point Forecast Lo 95 Hi 95
Oct 1972 6807.862 5208.1371 8407.587
Nov 1972 9897.162 8001.5337 11792.791
Dec 1972 12820.546 10561.2306 15079.861
Jan 1973 4260.002 1723.9659 6796.039
Feb 1973 3476.941 679.7747 6274.108
Mar 1973 4524.189 1492.2431 7556.134
Apr 1973 4788.501 1537.3473 8039.655
May 1973 4769.733 1313.7064 8225.759
Jun 1973 5214.846 1565.2712 8864.420
Jul 1973 4432.614 599.3059 8265.922
Aug 1973 1520.925 -2487.7232 5529.573
Sep 1973 5933.360 1756.7334 10109.987
Oct 1973 6893.944 2129.4162 11658.471
Nov 1973 9917.496 4793.7517 15041.241
Dec 1973 12809.585 7297.1328 18322.037
Jan 1974 4320.559 -1536.4248 10177.542
Feb 1974 3537.216 -2651.4701 9725.902
Mar 1974 4574.196 -1927.0927 11075.484
Apr 1974 4822.525 -1977.7762 11622.826
May 1974 4758.416 -2328.0245 11844.856
Jun 1974 5278.147 -2083.4128 12639.707
Jul 1974 4426.429 -3200.2991 12053.158
Aug 1974 1522.742 -6360.2521 9405.736
Sep 1974 5950.637 -2180.5460 14081.820
The above table shows the next 24 forecasted values of the time series.
plot(forecast_values)
In the above figure, we see the 24-month forecast results from October 1972 to September 1974 as the blue line. Looking at the forecast values and the visualization, the result can be called satisfactory: the forecasts reproduce the seasonal peaks at the end of each year. So, the model SARIMA(1,0,0)(1,1,0)[12] is a good fit to the data, and we obtain adequate forecast results.
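One way to quantify "satisfactory" (not done in the original analysis) is a holdout check: refit Model 1 on all but the last 12 months and compare the forecasts against the held-out observations. A minimal sketch, assuming the forecast package:
library(forecast)
full_ts = ts(df, frequency = 12, start = c(1964, 1))
train = window(full_ts, end = c(1971, 9))    # hold out the final 12 months (Oct 1971 - Sep 1972)
fit = arima(train, order = c(1, 0, 0), seasonal = c(1, 1, 0))  # Model 1's orders
test = window(full_ts, start = c(1971, 10))
accuracy(forecast(fit, h = 12), test)        # RMSE, MAE, etc. on the holdout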
Part B:
Nonseasonal Dataset: Natural gas prices
Introduction
This part uses a time series of natural gas prices (US Henry Hub spot prices). The problem statement is to forecast the monthly price of natural gas using time series analysis techniques. The methodology is to first propose candidate models describing the change in the gas price and then pick the best model by comparing their performance. The dataset covers nearly 24 years, from January 1997 to August 2020. I used such a wide time range because the price fluctuates considerably and I wanted to see the long-run behavior of the price change.
Dataset: Data comes from U.S. Energy Information Administration
https://datahub.io/core/natural-gas
df = read.csv("/content/monthly.csv")
Data Analysis
summary(df)
Month Price
Length:284 Min. : 1.630
Class :character 1st Qu.: 2.660
Mode :character Median : 3.560
Mean : 4.208
3rd Qu.: 5.327
Max. :13.420
nonseasonal_data = ts(df$Price, frequency = 12, start = c(1997, 1))  # Price column from the CSV
plot(nonseasonal_data)
From the above plot, we can see that there is no consistent trend in the data. There is a pronounced spike around 2005, but no seasonality.
acf(nonseasonal_data)
The above ACF is “decaying”, or decreasing, very slowly, and remains well above the significance
range (dotted blue lines). This is indicative of a non-stationary series.
pacf(nonseasonal_data)
adf.test(nonseasonal_data)
Augmented Dickey-Fuller Test
data: nonseasonal_data
Dickey-Fuller = -2.9978, Lag order = 6, p-value = 0.1557
alternative hypothesis: stationary
When we looked at the gas price graph and ACF, the series did not look stationary, so we ran the ADF test; the p-value of 0.1557 indicates a non-stationary process. I therefore took the first difference of the gas prices. Running the ADF test on the differenced prices gives a p-value below 0.05, indicating a stationary process with a possible linear trend; however, the plot makes clear that there is no linear trend.
nonseasonal_diff = diff(nonseasonal_data)
plot(nonseasonal_diff)
acf(as.vector(nonseasonal_diff))
The ACF plot after one difference shows very little autocorrelation. One lag is significantly outside the confidence bounds, so we may consider MA(1).
pacf(as.vector(nonseasonal_diff))
The PACF plot after one difference likewise shows very little autocorrelation. One lag is significantly outside the confidence bounds, so we may consider AR(1) (or possibly AR(2); there is also a lag near lag 5 just outside the bounds).
adf.test(nonseasonal_diff)
“p-value smaller than printed p-value”
Augmented Dickey-Fuller Test
data: nonseasonal_diff
Dickey-Fuller = -6.9687, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
The ADF test gives a p-value less than 0.05, so we reject the null hypothesis: the differenced data are now stationary. Since stationarity has been achieved at the first difference, an ARIMA model of order (p,1,q) will be used, where 'p' is the order of the autoregressive term and 'q' is the order of the moving average term. (An automatic cross-check on the orders is sketched after the EACF table below.)
eacf(nonseasonal_diff)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 o o o o x o o o x o o o o o
1 x o o o x o o o x o o o o o
2 x o o o o o o o x o o o o o
3 x x o o o o o o x o o o o o
4 x o x o o o o o x o o o x o
5 x x x x x o o o x o o o x o
6 x x o x o x o o x o o o x o
7 x x o x o x x o o o o o o o
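As a cross-check on the orders read off from the ACF, PACF and EACF (not part of the original analysis), the forecast package's auto.arima() can search over (p,d,q) automatically; a minimal sketch, whose output depends on the search settings:
library(forecast)
auto.arima(nonseasonal_data, seasonal = FALSE)  # selects p, d, q by AICc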
Models of Time Series
Because the series is non-seasonal, an ARIMA model will be used.
From the ACF and PACF plots I chose the following models to fit to the differenced data (equivalently, ARIMA(1,1,1) and ARIMA(2,1,1) on the original price series):
Model 1: ARIMA(1,0,1)
Model 2: ARIMA(2,0,1)
ns_model1 = arima(nonseasonal_diff, order=c(1,0,1))
print(ns_model1)
Call:
arima(x = nonseasonal_diff, order = c(1, 0, 1))
Coefficients:
ar1 ma1 intercept
-0.9769 0.9437 -0.0057
s.e. 0.0268 0.0429 0.0440
sigma^2 estimated as 0.5478: log likelihood = -305.31, aic = 616.63
bic = AIC(ns_model1, k = log(length(nonseasonal_diff)))
633.066517367315
ns_model2 = arima(nonseasonal_diff, order=c(2,0,1))
print(ns_model2)
Call:
arima(x = nonseasonal_diff, order = c(2, 0, 1))
Coefficients:
ar1 ar2 ma1 intercept
-0.9463 0.0296 0.9407 -0.0041
s.e. 0.0756 0.0638 0.0463 0.0453
sigma^2 estimated as 0.5473: log likelihood = -305.2, aic = 618.4
bic = AIC(ns_model2, k = log(length(nonseasonal_diff)))
638.451797725991
Comparing the above models based on their AIC and BIC values, model 1 has the lowest values of both criteria.
Residual Analysis:
We plot the sample ACF and PACF to check whether there is any correlation in the residuals, and the normality of the residuals is checked using histograms and QQ plots.
nonseasonal_residuals_model1 = ns_model1$residuals
nonseasonal_residuals_model2 = ns_model2$residuals
For model 1:
plot(nonseasonal_residuals_model1, type='o')
There is no visible trend in the plot of residuals.
acf(nonseasonal_residuals_model1)
pacf(as.vector(nonseasonal_residuals_model1))
The above ACF and PACF plots of the residuals suggest that there is no significant correlation among them. Now, to check the normality of the residuals, let's plot QQ plots and a histogram.
qqnorm(nonseasonal_residuals_model1)
qqline(nonseasonal_residuals_model1)
Here, the extreme values look suspect, as they do not appear to follow a normal distribution.
hist(nonseasonal_residuals_model1)
print(shapiro.test(nonseasonal_residuals_model1))
Shapiro-Wilk normality test
data: nonseasonal_residuals_model1
W = 0.86943, p-value = 1.783e-14
For model 2:
Let’s follow the same procedure to analyze the residuals of model 2.
plot(nonseasonal_residuals_model2, type='o')
acf(as.vector(nonseasonal_residuals_model2), lag.max=50)
pacf(as.vector(nonseasonal_residuals_model2))
qqnorm(nonseasonal_residuals_model2)
qqline(nonseasonal_residuals_model2)
hist(nonseasonal_residuals_model2)
print(shapiro.test(nonseasonal_residuals_model2))
Shapiro-Wilk normality test
data: nonseasonal_residuals_model2
W = 0.86694, p-value = 1.267e-14
Looking at the residual analysis of both models: the ACF and PACF show little linear correlation, but the residual plots show volatility clustering, and the Shapiro-Wilk tests strongly reject normality for both models (p-values near 1e-14), so the histograms are only roughly bell-shaped. Since model 1 has the lower AIC and BIC with comparable residual behavior, we will consider model 1 to further forecast the series.
Forecasting
model1 <- arima(ts(nonseasonal_data, frequency = 12, start = c(1997, 1)),
                order = c(1, 1, 1))
model1$x <- ts(nonseasonal_data, frequency = 12, start = c(1997, 1))
I used this model to forecast the next 24 values of the series with 95% prediction intervals.
f=forecast(model1,level=c(95),h=24)
Point Forecast Lo 95 Hi 95
Nov 2019 2.386313 0.9357244 3.836901
Dec 2019 2.331194 0.3121041 4.350283
Jan 2020 2.385144 -0.1004326 4.870721
Feb 2020 2.332337 -0.5230609 5.187735
Mar 2020 2.384025 -0.8178421 5.585892
Apr 2020 2.333433 -1.1636714 5.830536
May 2020 2.382953 -1.4019901 6.167896
Jun 2020 2.334482 -1.7035925 6.372557
Jul 2020 2.381926 -1.9075517 6.671403
Aug 2020 2.335488 -2.1791815 6.850157
Sep 2020 2.380941 -2.3596730 7.121556
Oct 2020 2.336451 -2.6090840 7.281986
Nov 2020 2.379998 -2.7724012 7.532398
Dec 2020 2.337374 -3.0043759 7.679124
Jan 2021 2.379095 -3.1545306 7.912720
Feb 2021 2.338258 -3.3722731 8.048790
Mar 2021 2.378229 -3.5119994 8.268458
Apr 2021 2.339105 -3.7177876 8.395998
May 2021 2.377400 -3.8490416 8.603842
Jun 2021 2.339917 -4.0445682 8.724402
Jul 2021 2.376606 -4.1688016 8.922013
Aug 2021 2.340695 -4.3553691 9.036758
Sep 2021 2.375845 -4.4736910 9.225380
Oct 2021 2.341440 -4.6523299 9.335209
plot(f)
In the above figure, we see the 24-month forecast results from November 2019 to October 2021 as the blue line. Looking at the forecast values and the visualization, the result is not satisfactory because of the conditional variance present in the data: the point forecasts flatten out while the prediction intervals keep widening. So, we prefer fitting a suitable GARCH model to the data to achieve a satisfactory forecast.
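The presence of such conditional heteroskedasticity can be checked before fitting with a Ljung-Box test on the squared ARIMA residuals (a common ARCH-effects diagnostic, not part of the original analysis; lag = 12 is an illustrative choice):
Box.test(nonseasonal_residuals_model1^2, lag = 12, type = "Ljung-Box")
A small p-value indicates autocorrelation in the squared residuals, i.e. time-varying volatility, which supports a GARCH specification.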
GARCH
GARCH stands for generalized autoregressive conditional heteroskedasticity. ARCH-GARCH is a method for modeling the volatility of a series or, more specifically, the noise term of the ARIMA model. ARCH-GARCH incorporates new observations and analyzes the series through conditional variances, allowing users to estimate future values using current data. We use the function ugarchspec() (from the rugarch package) for the model specification and ugarchfit() for the model fitting. I considered the GARCH(1,1) model.
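Concretely (standard GARCH(1,1) notation, matching the parameter names in the fit output below): the noise term is e_t = sigma_t z_t with z_t ~ N(0,1), and the conditional variance follows
sigma_t^2 = omega + alpha1 * e_{t-1}^2 + beta1 * sigma_{t-1}^2
so omega, alpha1 and beta1 in the output are the variance constant, the ARCH term and the GARCH term, respectively, while the ARMA(1,1) mean model describes the conditional mean.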
library(rugarch)
spec <- ugarchspec(variance.model = list(model = "sGARCH",
                                         garchOrder = c(1, 1),
                                         submodel = NULL,
                                         external.regressors = NULL,
                                         variance.targeting = FALSE),
                   mean.model = list(armaOrder = c(1, 1),
                                     external.regressors = NULL),
                   distribution.model = "norm")  # a top-level argument of ugarchspec(), not part of mean.model
garch2 <- ugarchfit(spec = spec, data = nonseasonal_data,  # assuming the monthly price series; the original used an undefined temp_data_garch
                    solver.control = list(trace = 0))
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(1,0,1)
Distribution : norm
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 3.185522 0.681580 4.6737 0.000003
ar1 0.940684 0.027055 34.7688 0.000000
ma1 0.104821 0.087345 1.2001 0.230105
omega 0.064248 0.033063 1.9432 0.051990
alpha1 0.526372 0.155477 3.3855 0.000710
beta1 0.433065 0.160756 2.6939 0.007062
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 3.185522 1.630688 1.95348 0.050762
ar1 0.940684 0.056062 16.77929 0.000000
ma1 0.104821 0.118274 0.88626 0.375480
omega 0.064248 0.089558 0.71739 0.473134
alpha1 0.526372 0.425682 1.23654 0.216259
beta1 0.433065 0.453003 0.95599 0.339079
LogLikelihood : -241.9342
Information Criteria
------------------------------------
Akaike 1.7460
Bayes 1.8231
Shibata 1.7451
Hannan-Quinn 1.7769
Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
statistic p-value
Lag[1] 0.05491 0.8147
Lag[2*(p+q)+(p+q)-1][5] 2.10143 0.9360
Lag[4*(p+q)+(p+q)-1][9] 4.88106 0.4809
d.o.f=2
H0 : No serial correlation
Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
statistic p-value
Lag[1] 0.1689 0.6811
Lag[2*(p+q)+(p+q)-1][5] 0.8137 0.9002
Lag[4*(p+q)+(p+q)-1][9] 1.6380 0.9435
d.o.f=2
Weighted ARCH LM Tests
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 0.2015 0.500 2.000 0.6535
ARCH Lag[5] 1.2899 1.440 1.667 0.6493
ARCH Lag[7] 1.6973 2.315 1.543 0.7810
Nyblom stability test
------------------------------------
Joint Statistic: 0.8687
Individual Statistics:
mu 0.1152
ar1 0.1061
ma1 0.2102
omega 0.4438
alpha1 0.2238
beta1 0.4395
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.49 1.68 2.12
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
------------------------------------
t-value prob sig
Sign Bias 0.04582 0.9635
Negative Sign Bias 0.92013 0.3583
Positive Sign Bias 0.58065 0.5619
Joint Effect 1.91979 0.5892
Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)
1 20 58.39 6.925e-06
2 30 71.56 1.847e-05
3 40 85.86 2.250e-05
4 50 94.87 9.304e-05
The output above displays the significance of the estimated parameters, along with the Akaike (AIC), Bayes (BIC), Shibata and Hannan-Quinn information criteria for the estimated model. The lower these values, the better the model is in terms of fit.
plot(garch2, which='all')
Above are additional diagnostic graphs demonstrating the model's behavior. We can observe that the QQ plot now depicts a distribution that follows the reference line more closely.
forecast1 = ugarchforecast(fitORspec = garch2, n.ahead = 20)
fitted(forecast1)
plot(fitted(forecast1),type='l')
The plot above displays the conditional mean values forecasted by the GARCH fit.
plot(sigma(forecast1),type='l')
The above plot displays the conditional standard deviation (sigma) of the forecasted values.
series<- c(nonseasonal_data,rep(NA,length(fitted(forecast1))))
forecastseries<- c(rep(NA,length(nonseasonal_data)),fitted(forecast1))
plot(series, type = "l")
lines(forecastseries, col = "green")
series<- c(tail(nonseasonal_data,100),rep(NA,length(fitted(forecast1))))
forecastseries<- c(rep(NA,100),fitted(forecast1))
plot(series, type = "l")
lines(forecastseries, col = "green")
In the above figure, we see the forecast results as the green line. Looking at the forecast values and the visualization, the result can be called satisfactory. So, the fitted ARMA(1,1)-GARCH(1,1) model (reported as mean model ARFIMA(1,0,1) in the rugarch output) is a good fit to the data, and we obtain adequate forecast results.
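Since ugarchforecast() returns both the conditional mean (via fitted()) and the conditional standard deviation (via sigma()), an approximate 95% band can also be drawn around the forecast under the fitted normal distribution; a minimal sketch assuming forecast1 from above:
fmean = as.numeric(fitted(forecast1))
fsigma = as.numeric(sigma(forecast1))
upper = fmean + 1.96 * fsigma  # approximate 95% band under normality
lower = fmean - 1.96 * fsigma
plot(fmean, type = 'l', ylim = range(lower, upper), ylab = 'Forecasted price')
lines(upper, lty = 2)  # dashed upper band
lines(lower, lty = 2)  # dashed lower band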
Conclusion
In this project, the seasonal Perrin Freres monthly champagne sales data were forecast using time series methods, with the forecasting model built using a SARIMA model. Forecasting is essential in many sectors, especially business, because knowing in advance about likely profit or loss in coming years makes it easier to meet the problem efficiently. The models used in this project gave accurate results and projected champagne sales beyond the roughly ten-year observation window. The project also used time series models to analyze the pattern of the Henry Hub natural gas spot price. We examined the GARCH model and discovered that natural gas prices fluctuate dramatically; the GARCH model gave better performance than the ARIMA model alone.
Links
1. https://www.kaggle.com/datasets/galibce003/perrin-freres-monthly-champagne-sales
2. https://rpubs.com/kkim22/668273
3. https://www.idrisstsafack.com/post/garch-models-with-r-programming-a-practical-example-with-tesla-stock
4. https://quant.stackexchange.com/questions/4948/how-to-fit-armagarch-model-in-r
5. https://talksonmarkets.files.wordpress.com/2012/09/time-series-analysis-with-arima-e28093-arch013.pdf