Applying ARIMA-GARCH models for time series analysis on
Seasonal and Nonseasonal datasets
Reeva Andipara
Department of Mathematical Sciences, Stevens Institute of Technology, Hoboken,
NJ
Supervisor: Dr. Hadi Safari Katesari
Abstract
Time series analysis is a method for analyzing data in order to spot trends and predict future values. We carry out time series analysis on two types of data: seasonal and non-seasonal. This project provides a procedure for analyzing and modeling time series in R. The first part covers the analysis and forecasting of the monthly sales of Perrin Freres champagne; the data consist of monthly champagne sales over nearly nine years. The approaches used are the autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), autoregressive moving average (ARMA), moving average (MA), and autoregressive (AR) models. The second part deals with a time series of natural gas prices, analyzing and forecasting the monthly price of natural gas. The core of the project is to provide a guide to ARIMA and ARCH-GARCH models and to examine the combined model's output and effectiveness in time series modeling and forecasting.
Part A:
Seasonal Dataset: Perrin Freres Monthly Champagne Sales
Introduction
This part of the project predicts the monthly sales of champagne for the Perrin Freres brand using time series methods. The problem statement is to estimate the number of champagne sales per month and to forecast future monthly sales. The dataset used covers a period of nearly nine years, from January 1964 to September 1972, and contains the number of champagne sales for each month.
Dataset:
The dataset used is taken from Kaggle:
https://www.kaggle.com/datasets/galibce003/perrin-freres-monthly-champagne-sales
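The loading step is not shown in the report; here is a minimal sketch, assuming the Kaggle CSV has been downloaded locally (the filename is an assumption):

# Hypothetical local path; adjust to wherever the Kaggle CSV was saved
data = read.csv("perrin-freres-monthly-champagne-sales.csv")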
summary(data)
Month Perrin.Freres.monthly.champagne.sales.millions..64..72
Length:107 Min. : 1413
Class :character 1st Qu.: 3113
Mode :character Median : 4217
Mean : 4761
3rd Qu.: 5221
Max. :13916
NA's :2
The summary shows two missing values (NA's) in the sales column, which we remove before the analysis.
data = na.omit(data)
df = data$Perrin.Freres.monthly.champagne.sales.millions..64..72
Data Analysis:
For a preliminary analysis, let's plot the time series.
plot(ts(df, frequency=12, start=1964), ylab='Monthly Champagne Sales')
From the above plot, we can observe a peak in sales toward the end of every year. This is a good example of seasonality, as the pattern repeats annually. The line plot also suggests an increasing sales trend over the historical period, and the height of the seasonal cycles grows with the level of the series, implying that the seasonality is multiplicative. We can observe an increasing trend between 1964 and 1970 on this time series graph. The presence of trend and seasonality indicates that the series is non-stationary; if the series is not stationary, it must be transformed to make it stationary. To further verify the non-stationarity of the data, I divided the data into three segments and calculated the mean and variance of each. For the series to be stationary, the mean and variance should be roughly constant across segments, which is not the case here, so we conclude the series is non-stationary.
segment1 = df[1:35]
segment2 = df[36:70]
segment3 = df[71:105]
print(mean(segment1))
print(mean(segment2))
print(mean(segment3))
print(var(segment1))
print(var(segment2))
print(var(segment3))
Output:
[1] 3740.171
[1] 5078.143
[1] 5465.143
[1] 2667015
[1] 5359573
[1] 10231412
The next step in our time series analysis is to review the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. Both summarize the strength of the linear relationship between observations of the series at different lags.
acf(df,lag.max=50)
pacf(df,lag.max=50)
The sample ACF and PACF for the series are shown in the above plots. We can see strong seasonal autocorrelation in the ACF plot, which must be modeled, so we take one seasonal difference.
differenced_data = diff(df, lag=12)
# the first seasonally differenced value corresponds to January 1965
seasonal_data = ts(differenced_data, frequency=12, start=c(1965,1))
plot(seasonal_data, ylab='Monthly Champagne Sales')
This is the time series plot after taking one seasonal difference of the sales data. The increasing trend between 1964 and 1970 is no longer present.
library(tseries)  # adf.test() comes from the tseries package
adf.test(seasonal_data)
Output:
Warning message in adf.test(seasonal_data):
“p-value smaller than printed p-value”
Augmented Dickey-Fuller Test
data: seasonal_data
Dickey-Fuller = -4.0804, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
The ADF test gives a p-value less than 0.05, so we reject the null hypothesis of a unit root: the data are now stationary. Since stationarity has been achieved after one seasonal difference, a SARIMA model with seasonal difference order D = 1 will be used, where 'p' and 'q' denote the orders of the autoregressive and moving average terms.
Let’s have a look at ACF & PACF again.
acf(as.vector(seasonal_data), lag.max=100)
The ACF plot after one seasonal difference shows very little autocorrelation. There are no significant lags (or at most one) outside the confidence bounds, so for the non-seasonal part of the SARIMA model we may consider two models, MA(0) or MA(1). For the seasonal part, MA(1) can be considered, since no seasonal pattern or repetition remains.
pacf(as.vector(seasonal_data),lag.max=100)
The PACF plot after one seasonal difference likewise shows very little autocorrelation. There are no significant lags (or at most one) outside the confidence bounds, so for the non-seasonal part we may consider two models, AR(0) or AR(1). For the seasonal part, AR(1) can be considered, since no seasonal pattern or repetition remains.
library(TSA)  # eacf() comes from the TSA package
eacf(seasonal_data)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o o o o o o o o x o o
1 o o o o o o o o o o o o o o
2 x x o o o o o o o o o o o o
3 o x o o o o o o o o o o o o
4 o x o o o o o o o o o o o o
5 o x o o o o o o o o o o o o
6 o x o o o o o o o o o o o o
7 x x o o o o o o o o o o o o
Looking at the EACF plot, we obtain the following candidate models for the non-seasonal part:
ARMA(0,1)
ARMA(1,1)
ARMA(1,0)
Models of Time Series
Because the series is seasonal, SARIMA (seasonal ARIMA) will be used instead of ARIMA.
From the ACF, PACF, and EACF plots, I chose the following models to fit to the data:
Model 1: SARIMA(1,0,0)(1,1,0)[12]
Model 2: SARIMA(1,0,0)(1,1,1)[12]
Model 3: SARIMA(1,0,1)(1,1,0)[12]
Since the data are monthly, the seasonal period is 12. The models are fitted to the seasonally differenced series, so the seasonal difference (D = 1) is implicit in the arima() calls below.
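For reference, Model 1 can be written in backshift notation, with $B$ the backshift operator, $e_t$ white noise, and the $(1-B^{12})$ factor coming from the single seasonal difference:

$(1 - \phi_1 B)(1 - \Phi_1 B^{12})(1 - B^{12})\,Y_t = \theta_0 + e_t$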
model1 = arima(seasonal_data, order=c(1,0,0), seasonal=c(1,0,0),method = 'ML')
print(model1)
Output:
Call:
arima(x = seasonal_data, order = c(1, 0, 0), seasonal = c(1, 0, 0), method =
"ML")
Coefficients:
ar1 sar1 intercept
0.2881 -0.2920 286.7916
s.e. 0.1017 0.0988 81.3043
sigma^2 estimated as 494084: log likelihood = -742.18, aic = 1490.35
bic = AIC(model1, k = log(length(seasonal_data)))
1502.48172313175
model2 = arima(seasonal_data, order=c(1,0,0), seasonal=c(1,0,1))
print(model2)
Call:
arima(x = seasonal_data, order = c(1, 0, 0), seasonal = c(1, 0, 1))
Coefficients:
ar1 sar1 sma1 intercept
0.2898 -0.5320 0.2731 287.2478
s.e. 0.1015 0.3968 0.4679 86.2407
sigma^2 estimated as 492002: log likelihood = -742.02, aic = 1492.04
bic = AIC(model2, k = log(length(seasonal_data)))
1506.70785176533
model3 = arima(seasonal_data, order=c(1,0,1), seasonal=c(1,0,0))
print(model3)
Call:
arima(x = seasonal_data, order = c(1, 0, 1), seasonal = c(1, 0, 0))
Coefficients:
ar1 ma1 sar1 intercept
0.3774 -0.0987 -0.2885 286.0528
s.e. 0.3822 0.4192 0.1001 83.9712
sigma^2 estimated as 493945: log likelihood = -742.15, aic = 1492.3
bic = AIC(model3, k = log(length(seasonal_data)))
1506.96237063021
Comparing the above models based on their AIC and BIC values, Model 1 has the lowest AIC and BIC, so we select Model 1.
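For convenience, the comparison can be tabulated directly in R; a minimal sketch using the three model objects fitted above (stats::BIC works here because arima fits carry a log-likelihood and sample size):

aic = sapply(list(model1, model2, model3), AIC)  # Akaike information criterion
bic = sapply(list(model1, model2, model3), BIC)  # Bayesian information criterion
data.frame(model = paste("Model", 1:3), AIC = aic, BIC = bic)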
Residual Analysis:
The residuals in a time series model are what is left over after fitting the model. Residuals are useful for determining whether a model has captured all of the information in the data. A good forecasting method yields residuals with the following properties:
1) The residuals are uncorrelated.
2) The residuals have zero mean.
We plot the sample ACF and PACF to check for any correlation in the residuals, and the normality of the residuals is checked using histograms and QQ plots.
residuals_model1 = model1$residuals
residuals_model2 = model2$residuals
residuals_model3 = model3$residuals
Plotting the residuals:
For model 1:
plot(residuals_model1, type='o')
Examining a plot of the residuals over time is our first diagnostic check. If the model is adequate, we expect the plot to show a rectangular scatter around a horizontal level of zero with no visible trends.
acf(as.vector(residuals_model1), lag.max = 50)
From the ACF of residuals, we can see that there is almost no autocorrelation between residuals.
qqnorm(residuals_model1)
qqline(residuals_model1)
Here, extreme values look suspect.
hist(residuals_model1)
The QQ plot and histogram suggest approximate normality of the residuals, apart from the extreme values noted above; we verify with a Shapiro-Wilk test.
print(shapiro.test(residuals_model1))
Shapiro-Wilk normality test
data: residuals_model1
W = 0.96675, p-value = 0.01797
tsdiag(model1)
With W = 0.967 and p = 0.018, the Shapiro-Wilk test rejects strict normality at the 5% level, which is consistent with the suspect extreme values in the QQ plot. The tsdiag() output complements the lag-by-lag ACF: it checks the aggregate magnitude of the residual autocorrelations via Ljung-Box statistics rather than examining them only at individual lags.
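The same check can be run at a single lag with Box.test(); a minimal sketch, where lag 12 (one seasonal cycle) is an arbitrary but common choice and fitdf = 2 accounts for Model 1's two estimated ARMA coefficients:

Box.test(residuals_model1, lag = 12, type = "Ljung-Box", fitdf = 2)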
For model 2:
Let’s follow the same procedure for analysis of residuals of model 2.
plot(residuals_model2, type='o')
acf(as.vector(residuals_model2))
qqnorm(residuals_model2)
qqline(residuals_model2)
hist(residuals_model2)
From the histogram we can see that the distribution is not exactly bell-shaped, and the QQ plot shows some outliers, so we use the Shapiro-Wilk test to assess the normality of the residuals.
print(shapiro.test(residuals_model2))
Shapiro-Wilk normality test
data: residuals_model2
W = 0.97024, p-value = 0.03194
In the sample ACF of the residuals, we see no significant correlation except at lag 13, which is just one lag out of 50 and therefore acceptable. The lack of correlation suggests both models are adequate; since Model 1 has the lower AIC and BIC, we consider Model 1 for forecasting.
Forecasting
After the fitting process is complete and the best model has been selected, we apply that model to the original (undifferenced) data to forecast.
forecast_model1<-arima(ts(df,frequency=12,start=(1964)),order=c(1,1,0),
seasonal=c(1,1,0))
forecast_model1$x<-ts(df, frequency=12, start=(1964))
forecast_model1$x
I used the forecast() function from the forecast package to produce forecasts from the model. forecast() can handle a wide range of inputs: it usually takes a time series or a time series model as its main argument and generates the appropriate forecasts. Here I used the fitted model to forecast the next 24 values of the series with a 95% confidence interval.
library(forecast)  # forecast() comes from the forecast package
forecast_values = forecast(forecast_model1, level=c(95), h=24)
forecast_values
Point Forecast Lo 95 Hi 95
Oct 1972 6807.862 5208.1371 8407.587
Nov 1972 9897.162 8001.5337 11792.791
Dec 1972 12820.546 10561.2306 15079.861
Jan 1973 4260.002 1723.9659 6796.039
Feb 1973 3476.941 679.7747 6274.108
Mar 1973 4524.189 1492.2431 7556.134
Apr 1973 4788.501 1537.3473 8039.655
May 1973 4769.733 1313.7064 8225.759
Jun 1973 5214.846 1565.2712 8864.420
Jul 1973 4432.614 599.3059 8265.922
Aug 1973 1520.925 -2487.7232 5529.573
Sep 1973 5933.360 1756.7334 10109.987
Oct 1973 6893.944 2129.4162 11658.471
Nov 1973 9917.496 4793.7517 15041.241
Dec 1973 12809.585 7297.1328 18322.037
Jan 1974 4320.559 -1536.4248 10177.542
Feb 1974 3537.216 -2651.4701 9725.902
Mar 1974 4574.196 -1927.0927 11075.484
Apr 1974 4822.525 -1977.7762 11622.826
May 1974 4758.416 -2328.0245 11844.856
Jun 1974 5278.147 -2083.4128 12639.707
Jul 1974 4426.429 -3200.2991 12053.158
Aug 1974 1522.742 -6360.2521 9405.736
Sep 1974 5950.637 -2180.5460 14081.820
The above table shows the next 24 forecasted values of the time series.
plot(forecast_values)
In the above figure, we see the 24-month forecast, from October 1972 to September 1974, as the blue line. Looking at the forecast values and the visualization, the result can be called satisfactory. So, the model SARIMA(1,0,0)(1,1,0)[12] is a good fit to the data and yields adequate forecasts.
Part B:
Nonseasonal Dataset: Natural gas prices
Introduction
This part of the project uses a time series of natural gas prices, namely the US Henry Hub spot price. The problem statement is to forecast the monthly price of natural gas using time series analysis techniques. The methodology is to first propose candidate models describing the change in the gas price and then pick the best model by comparing their performance. The dataset used covers a period of nearly 24 years, from January 1997 to August 2020. Such a wide time range was used because the price fluctuates considerably, and the aim was to capture the broad trend of the price changes.
Dataset: The data come from the U.S. Energy Information Administration.
https://datahub.io/core/natural-gas
df = read.csv("/content/monthly.csv")
Data Analysis
summary(df)
Month Price
Length:284 Min. : 1.630
Class :character 1st Qu.: 2.660
Mode :character Median : 3.560
Mean : 4.208
3rd Qu.: 5.327
Max. :13.420
# build the monthly price series from the data frame's Price column
nonseasonal_data = ts(df$Price, frequency=12, start=c(1997,1))
plot(nonseasonal_data)
From the above plot, we can see that there is no consistent trend in the data; there is a pronounced spike around 2005, but no seasonality.
acf(nonseasonal_data)
The ACF above decays very slowly and remains well above the significance bounds (dotted blue lines). This is indicative of a non-stationary series.
pacf(nonseasonal_data)
adf.test(nonseasonal_data)
Augmented Dickey-Fuller Test
data: nonseasonal_data
Dickey-Fuller = -2.9978, Lag order = 6, p-value = 0.1557
alternative hypothesis: stationary
When we looked at the gas price plot and its ACF, the series did not look stationary, so we ran the ADF test; the p-value of 0.1557 indicates a non-stationary process. So, I took the first difference of the gas prices. When we ran the ADF test on the differenced prices, the p-value was less than 0.05, indicating a stationary process with a possible linear trend; however, the plot makes clear there is no linear trend.
nonseasonal_diff = diff(nonseasonal_data)
plot(nonseasonal_diff)
acf(as.vector(nonseasonal_diff))
The ACF plot after one difference shows very little autocorrelation; only one lag is significantly outside the confidence bounds, so we may consider MA(1).
pacf(as.vector(nonseasonal_diff))
The PACF plot after one difference likewise shows very little autocorrelation; one lag is significantly outside the confidence bounds, so we may consider AR(1) (or AR(2), since there is also a lag outside the bounds at lag 5).
adf.test(nonseasonal_diff)
“p-value smaller than printed p-value”
Augmented Dickey-Fuller Test
data: nonseasonal_diff
Dickey-Fuller = -6.9687, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
The ADF test gives a p-value less than 0.05, so we reject the null hypothesis of a unit root: the differenced data are stationary. Since stationarity has been achieved at the first difference, an ARIMA model of order (p,1,q) will be used, where 'p' is the order of the autoregressive term and 'q' is the order of the moving average term.
eacf(nonseasonal_diff)
AR/MA
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 o o o o x o o o x o o o o o
1 x o o o x o o o x o o o o o
2 x o o o o o o o x o o o o o
3 x x o o o o o o x o o o o o
4 x o x o o o o o x o o o x o
5 x x x x x o o o x o o o x o
6 x x o x o x o o x o o o x o
7 x x o x o x x o o o o o o o
Models of Time Series
Because the series is non-seasonal, an ARIMA model will be used.
From the ACF, PACF, and EACF plots, the following models were chosen to fit to the differenced data (Model 1 is written out in backshift form after this list):
Model 1: ARIMA(1,0,1) on the differenced series, i.e. ARIMA(1,1,1) on the original prices
Model 2: ARIMA(2,0,1) on the differenced series, i.e. ARIMA(2,1,1) on the original prices
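For reference, Model 1 in terms of the original price series has the backshift form below, using R's sign convention for the MA coefficient, with $e_t$ white noise:

$(1 - \phi_1 B)(1 - B)\,Y_t = \theta_0 + (1 + \theta_1 B)\,e_t$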
ns_model1 = arima(nonseasonal_diff, order=c(1,0,1))
print(ns_model1)
Call:
arima(x = nonseasonal_diff, order = c(1, 0, 1))
Coefficients:
ar1 ma1 intercept
-0.9769 0.9437 -0.0057
s.e. 0.0268 0.0429 0.0440
sigma^2 estimated as 0.5478: log likelihood = -305.31, aic = 616.63
bic = AIC(ns_model1, k = log(length(nonseasonal_diff)))
633.066517367315
ns_model2 = arima(nonseasonal_diff, order=c(2,0,1))
print(ns_model2)
Call:
arima(x = nonseasonal_diff, order = c(2, 0, 1))
Coefficients:
ar1 ar2 ma1 intercept
-0.9463 0.0296 0.9407 -0.0041
s.e. 0.0756 0.0638 0.0463 0.0453
sigma^2 estimated as 0.5473: log likelihood = -305.2, aic = 618.4
bic = AIC(ns_model2, k = log(length(nonseasonal_diff)))
638.451797725991
Comparing the above models based on their AIC and BIC values, Model 1 has the lowest AIC and BIC.
Residual Analysis:
We plot the sample ACF and PACF to check for any correlation in the residuals, and we check the normality of the residuals using histograms and QQ plots.
nonseasonal_residuals_model1 = ns_model1$residuals
nonseasonal_residuals_model2 = ns_model2$residuals
For model 1:
plot(nonseasonal_residuals_model1, type='o')
There is no visible trend in the plot of residuals.
acf(nonseasonal_residuals_model1)
pacf(as.vector(nonseasonal_residuals_model1))
The ACF and PACF plots of the residuals above suggest that there is no significant correlation among the residuals. To check the normality of the residuals, we plot a QQ plot and a histogram.
qqnorm(nonseasonal_residuals_model1)
qqline(nonseasonal_residuals_model1)
Here, the extreme values look suspect, as they do not appear to follow a normal distribution.
hist(nonseasonal_residuals_model1)
print(shapiro.test(nonseasonal_residuals_model1))
Shapiro-Wilk normality test
data: nonseasonal_residuals_model1
W = 0.86943, p-value = 1.783e-14
For model 2:
Let’s follow the same procedure to analyze the residuals of model 2.
plot(nonseasonal_residuals_model2, type='o')
acf(as.vector(nonseasonal_residuals_model2), lag.max=50)
pacf(as.vector(nonseasonal_residuals_model2))
qqnorm(nonseasonal_residuals_model2)
qqline(nonseasonal_residuals_model2)
hist(nonseasonal_residuals_model2)
print(shapiro.test(nonseasonal_residuals_model2))
Shapiro-Wilk normality test
data: nonseasonal_residuals_model2
W = 0.86694, p-value = 1.267e-14
The sample ACF of the residuals shows little remaining linear correlation, but the residual plots display volatility clustering, and the Shapiro-Wilk tests strongly reject normality for both models (p-values near 1e-14), pointing to heavy-tailed residuals. From the residual analysis of both models, Model 1 appears the better fit, so we consider Model 1 for forecasting the series.
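A quick check for ARCH effects, which motivates the GARCH modeling later in this report, is to apply the Ljung-Box test to the squared residuals; a small p-value indicates conditional heteroskedasticity (a minimal sketch; the lag of 12 is an arbitrary choice):

Box.test(nonseasonal_residuals_model1^2, lag = 12, type = "Ljung-Box")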
Forecasting
model1 <- arima(ts(nonseasonal_data, frequency=12, start=c(1997,1)), order=c(1,1,1))
model1$x <- ts(nonseasonal_data, frequency=12, start=c(1997,1))
I have used a time series model to forecast the next 24 values of the series with 95% confidence interval.
f=forecast(model1,level=c(95),h=24)
Point Forecast Lo 95 Hi 95
Nov 2019 2.386313 0.9357244 3.836901
Dec 2019 2.331194 0.3121041 4.350283
Jan 2020 2.385144 -0.1004326 4.870721
Feb 2020 2.332337 -0.5230609 5.187735
Mar 2020 2.384025 -0.8178421 5.585892
Apr 2020 2.333433 -1.1636714 5.830536
May 2020 2.382953 -1.4019901 6.167896
Jun 2020 2.334482 -1.7035925 6.372557
Jul 2020 2.381926 -1.9075517 6.671403
Aug 2020 2.335488 -2.1791815 6.850157
Sep 2020 2.380941 -2.3596730 7.121556
Oct 2020 2.336451 -2.6090840 7.281986
Nov 2020 2.379998 -2.7724012 7.532398
Dec 2020 2.337374 -3.0043759 7.679124
Jan 2021 2.379095 -3.1545306 7.912720
Feb 2021 2.338258 -3.3722731 8.048790
Mar 2021 2.378229 -3.5119994 8.268458
Apr 2021 2.339105 -3.7177876 8.395998
May 2021 2.377400 -3.8490416 8.603842
Jun 2021 2.339917 -4.0445682 8.724402
Jul 2021 2.376606 -4.1688016 8.922013
Aug 2021 2.340695 -4.3553691 9.036758
Sep 2021 2.375845 -4.4736910 9.225380
Oct 2021 2.341440 -4.6523299 9.335209
plot(f)
In the above figure, we see the 24-month forecast, from November 2019 to October 2021, as the blue line. Looking at the forecast values and the visualization, the result is not satisfactory because of the conditional variance present in the data. So, we prefer to fit a suitable GARCH model to the data to achieve a satisfactory forecast.
GARCH
GARCH stands for Generalized Autoregressive Conditional Heteroskedasticity. ARCH-GARCH is a method for modeling the volatility of a series, or more specifically, the noise term of the ARIMA model. ARCH-GARCH incorporates new data and analyzes the series in terms of conditional variances, allowing users to estimate future values in light of current volatility. We use the function ugarchspec() from the rugarch package for the model specification and ugarchfit() for the model fitting. I have considered the GARCH(1,1) model.
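Under GARCH(1,1), the noise term is modeled as $e_t = \sigma_t z_t$ with $z_t$ i.i.d. standard normal, and the conditional variance evolves as

$\sigma_t^2 = \omega + \alpha_1 e_{t-1}^2 + \beta_1 \sigma_{t-1}^2$

which corresponds to the omega, alpha1, and beta1 parameters in the fit output below.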
library(rugarch)  # ugarchspec() and ugarchfit() come from the rugarch package
spec <- ugarchspec(variance.model = list(model = "sGARCH",
                                         garchOrder = c(1,1),
                                         submodel = NULL,
                                         external.regressors = NULL,
                                         variance.targeting = FALSE),
                   mean.model = list(armaOrder = c(1,1),
                                     external.regressors = NULL),
                   # distribution.model is a top-level argument of ugarchspec(),
                   # not a component of mean.model
                   distribution.model = "norm")
# fit to the monthly price series
garch2 <- ugarchfit(spec = spec, data = nonseasonal_data, solver.control = list(trace=0))
garch2
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(1,0,1)
Distribution : norm
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 3.185522 0.681580 4.6737 0.000003
ar1 0.940684 0.027055 34.7688 0.000000
ma1 0.104821 0.087345 1.2001 0.230105
omega 0.064248 0.033063 1.9432 0.051990
alpha1 0.526372 0.155477 3.3855 0.000710
beta1 0.433065 0.160756 2.6939 0.007062
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 3.185522 1.630688 1.95348 0.050762
ar1 0.940684 0.056062 16.77929 0.000000
ma1 0.104821 0.118274 0.88626 0.375480
omega 0.064248 0.089558 0.71739 0.473134
alpha1 0.526372 0.425682 1.23654 0.216259
beta1 0.433065 0.453003 0.95599 0.339079
LogLikelihood : -241.9342
Information Criteria
------------------------------------
Akaike 1.7460
Bayes 1.8231
Shibata 1.7451
Hannan-Quinn 1.7769
Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
statistic p-value
Lag[1] 0.05491 0.8147
Lag[2*(p+q)+(p+q)-1][5] 2.10143 0.9360
Lag[4*(p+q)+(p+q)-1][9] 4.88106 0.4809
d.o.f=2
H0 : No serial correlation
Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
statistic p-value
Lag[1] 0.1689 0.6811
Lag[2*(p+q)+(p+q)-1][5] 0.8137 0.9002
Lag[4*(p+q)+(p+q)-1][9] 1.6380 0.9435
d.o.f=2
Weighted ARCH LM Tests
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 0.2015 0.500 2.000 0.6535
ARCH Lag[5] 1.2899 1.440 1.667 0.6493
ARCH Lag[7] 1.6973 2.315 1.543 0.7810
Nyblom stability test
------------------------------------
Joint Statistic: 0.8687
Individual Statistics:
mu 0.1152
ar1 0.1061
ma1 0.2102
omega 0.4438
alpha1 0.2238
beta1 0.4395
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.49 1.68 2.12
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
------------------------------------
t-value prob sig
Sign Bias 0.04582 0.9635
Negative Sign Bias 0.92013 0.3583
Positive Sign Bias 0.58065 0.5619
Joint Effect 1.91979 0.5892
Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)
1 20 58.39 6.925e-06
2 30 71.56 1.847e-05
3 40 85.86 2.250e-05
4 50 94.87 9.304e-05
The output above displays the estimated parameters and their significance, along with the Akaike (AIC), Bayes (BIC), Shibata, and Hannan-Quinn information criteria for the fitted model. The lower these values, the better the model in terms of fit.
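The same criteria can be extracted programmatically from the fit object with rugarch's infocriteria() accessor:

infocriteria(garch2)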
plot(garch2, which='all')
The panel above shows additional diagnostic plots for the fitted model. We can observe that the QQ plot now depicts a distribution more in line with the straight line.
forecast1 = ugarchforecast(fitORspec = garch2, n.ahead = 20)
fitted(forecast1)
plot(fitted(forecast1),type='l')
The plot above displays the conditional mean values forecasted by the GARCH fit.
plot(sigma(forecast1),type='l')
The above plot displays the conditional standard deviation (sigma) of the forecasted values.
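Combining the two forecast series gives a rough 95% band for the forecasted price, assuming conditionally normal errors; note that this sketch ignores the accumulation of mean-forecast error over the horizon, so the band is only indicative:

mean_fc = as.numeric(fitted(forecast1))   # conditional mean forecasts
sd_fc   = as.numeric(sigma(forecast1))    # conditional standard deviations
plot(mean_fc, type = "l", ylim = range(mean_fc - 1.96*sd_fc, mean_fc + 1.96*sd_fc),
     ylab = "Forecasted price")
lines(mean_fc + 1.96*sd_fc, lty = 2)      # approximate upper 95% band
lines(mean_fc - 1.96*sd_fc, lty = 2)      # approximate lower 95% band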
series<- c(nonseasonal_data,rep(NA,length(fitted(forecast1))))
forecastseries<- c(rep(NA,length(nonseasonal_data)),fitted(forecast1))
plot(series, type = "l")
lines(forecastseries, col = "green")
series<- c(tail(nonseasonal_data,100),rep(NA,length(fitted(forecast1))))
forecastseries<- c(rep(NA,100),fitted(forecast1))
plot(series, type = "l")
lines(forecastseries, col = "green")
In the above figure, we see the forecast results as the green line. Looking at the forecast values and the visualization, the result can be called satisfactory. So, the fitted ARMA(1,1)-GARCH(1,1) model (reported by rugarch as an ARFIMA(1,0,1) mean model with sGARCH(1,1) variance) is a good fit to the data, and we obtain adequate forecasts.
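As a possible further check, not performed in this report, rugarch's ugarchroll() can backtest the same specification with rolling re-estimation; a minimal sketch (the forecast.length and refit.every choices here are arbitrary):

roll <- ugarchroll(spec, data = nonseasonal_data, n.ahead = 1,
                   forecast.length = 24, refit.every = 6, refit.window = "moving")
report(roll, type = "fpm")  # forecast performance measures (MSE, MAE, directional accuracy)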
Conclusion
In this project, the seasonal data of Perrin Freres monthly champagne sales were forecast using time series methods, with the forecasting model built using a SARIMA model. Forecasting is essential in many sectors, especially business, because knowing in advance whether profit or loss is likely in the coming years makes the problem much easier to handle. The models used in this project gave accurate results and projected champagne sales beyond the observed sample period. The project also used time series models to analyze the pattern of the Henry Hub natural gas spot price. We examined the GARCH model and found that natural gas prices fluctuate dramatically; the GARCH model gave better performance compared with the ARIMA model.
Links
1. https://www.kaggle.com/datasets/galibce003/perrin-freres-monthly-champagne-sales
2. https://rpubs.com/kkim22/668273
3. https://www.idrisstsafack.com/post/garch-models-with-r-programming-a-practical-example-with-tesla-stock
4. https://quant.stackexchange.com/questions/4948/how-to-fit-armagarch-model-in-r
5. https://talksonmarkets.files.wordpress.com/2012/09/time-series-analysis-with-arima-e28093-arch013.pdf