
Applying ARIMA-GARCH models for time series analysis on

Seasonal and Nonseasonal datasets

Reeva Andipara

Department of Mathematical Sciences, Stevens Institute of Technology, Hoboken,

NJ

Supervisor: Dr. Hadi Safari Katesari

Abstract

Time series analysis is a method for analyzing data in order to identify trends and forecast future values. We carry out time series analysis on two types of data, seasonal and non-seasonal, and this project provides a procedure for analyzing and modeling time series in R. The first part covers the analysis and forecasting of monthly Perrin Freres champagne sales; the data record the number of champagne sales each month for nearly ten years. The modeling approaches used are the autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and seasonal ARIMA (SARIMA) models. The second part deals with a time series of natural gas prices, analyzing and forecasting the monthly price of natural gas. The core of the project is to provide a guide to ARIMA and ARCH-GARCH models and to examine the combined model's output and effectiveness in time series modeling and forecasting.

Part A:

Seasonal Dataset: Perrin Freres Monthly Champagne Sales

Introduction

This part of the project predicts the monthly sales of Perrin Freres champagne using time series methods. The problem statement is to estimate the number of champagne sales per month for the Perrin Freres brand and to forecast future monthly sales. The dataset covers a period of nearly ten years, from January 1964 to September 1972, and records the number of champagne sales every month.


Dataset:

The dataset I've used is taken from Kaggle:

https://www.kaggle.com/datasets/galibce003/perrin-freres-monthly-champagne-sales

summary(data)

Month Perrin.Freres.monthly.champagne.sales.millions..64..72

Length:107 Min. : 1413

Class :character 1st Qu.: 3113

Mode :character Median : 4217

Mean : 4761

3rd Qu.: 5221

Max. :13916

NA's :2

The sales column contains two missing values (shown as NA's in the summary above), which we remove before analysis.

data = na.omit(data)

df = data$Perrin.Freres.monthly.champagne.sales.millions..64..72

Data Analysis:

For a preliminary analysis, let's plot the time series.

plot(ts(df, frequency=12, start=(1964)), ylab='Monthly Champagne Sales')


From the above plot, we can observe a peak in sales towards the end of every year. This is a good example of seasonality, as the pattern repeats annually.

The line plot also shows an increasing trend in sales over the historical period, and the height of the seasonal cycles grows with the level of the series, suggesting that the seasonality is multiplicative. In particular, there is an increasing trend between 1964 and 1970. Both the trend and the seasonality indicate that the series is non-stationary, and a non-stationary series must be made stationary before ARMA-type modeling. To further verify this, I divided the data into three segments and calculated the mean and variance of each. For a stationary series, the mean and variance should be roughly constant across segments, which is not the case here, so we conclude that the series is non-stationary.

segment1 = df[1:35]

segment2 = df[36:70]

segment3 = df[71:105]

print(mean(segment1))

print(mean(segment2))

print(mean(segment3))

print(var(segment1))

print(var(segment2))

print(var(segment3))

Output:

[1] 3740.171

[1] 5078.143

[1] 5465.143

[1] 2667015

[1] 5359573

[1] 10231412
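Since the height of the seasonal cycles appears to grow with the level of the series, a classical decomposition is another quick diagnostic. This is an optional sketch using decompose() from base R's stats package, not part of the original analysis:

```r
# Optional diagnostic: classical multiplicative decomposition of the sales
# series (df is the cleaned sales vector defined above).
ts_df <- ts(df, frequency = 12, start = 1964)
components <- decompose(ts_df, type = "multiplicative")
plot(components)  # panels: observed, trend, seasonal, random
```

The seasonal panel makes the annual repetition explicit, and the trend panel isolates the 1964-1970 rise discussed above.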

The next step in our time series analysis is to review the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. Both summarize the strength of the relationship between observations of the series at different lags.

acf(df,lag.max=50)


pacf(df,lag.max=50)

The sample ACF and PACF for the series are shown in the above plots. We can see strong seasonal autocorrelation in the ACF plot, which must be removed before the remaining structure can be modeled. So, we take one seasonal difference.

differenced_data = diff(df, lag=12)

seasonal_data = ts(differenced_data)

seasonal_data = ts(seasonal_data, frequency=12, start=(1964))

plot(seasonal_data, ylab='Monthly Champagne Sales')


This is the time series plot after taking one seasonal difference of the sales data. The increasing trend between 1964 and 1970 is no longer present.

adf.test(seasonal_data)

Output:

Warning message in adf.test(seasonal_data):

“p-value smaller than printed p-value”

Augmented Dickey-Fuller Test

data: seasonal_data

Dickey-Fuller = -4.0804, Lag order = 4, p-value = 0.01

alternative hypothesis: stationary

The ADF test gives a p-value less than 0.05, so we reject the null hypothesis of a unit root: the differenced data are stationary. Since stationarity was achieved after one seasonal difference, we will fit SARIMA models with seasonal differencing order D = 1, where 'p' is the order of the autoregressive term and 'q' is the order of the moving average term.

Let’s have a look at ACF & PACF again.

acf(as.vector(seasonal_data), lag.max=100)


The ACF after one seasonal difference shows very little autocorrelation. There are no significant lags (or at most one) outside the confidence bounds, i.e. we may consider MA(0) or MA(1) for the non-seasonal part of the SARIMA model; for the seasonal part, MA(1) can be considered because no seasonal pattern repeats in the plot.

pacf(as.vector(seasonal_data),lag.max=100)


The PACF after one seasonal difference likewise shows very little autocorrelation. There are no significant lags (or at most one) outside the confidence bounds, i.e. we may consider AR(0) or AR(1) for the non-seasonal part of the SARIMA model, and AR(1) for the seasonal part.

eacf(seasonal_data)

AR/MA

0 1 2 3 4 5 6 7 8 9 10 11 12 13

0 x o o o o o o o o o o x o o

1 o o o o o o o o o o o o o o

2 x x o o o o o o o o o o o o

3 o x o o o o o o o o o o o o

4 o x o o o o o o o o o o o o

5 o x o o o o o o o o o o o o

6 o x o o o o o o o o o o o o

7 x x o o o o o o o o o o o o

Looking at the EACF plot, we obtain the following candidate ARMA models:

ARMA(0,1)

ARMA(1,1)

ARMA(1,0)

Models of Time Series

Because the series is seasonal, SARIMA (seasonal ARIMA) will be used instead of ARIMA.

From the ACF, PACF, and EACF plots I chose the following models to fit to the data:

Model 1: SARIMA(1,0,0)(1,1,0)[12]

Model 2: SARIMA(1,0,0)(1,1,1)[12]

Model 3: SARIMA(1,0,1)(1,1,0)[12]

Since the data is monthly, the seasonal period is 12. Note that the seasonal difference has already been taken, so the models below are fitted to seasonal_data and the seasonal differencing order does not appear in the arima() call.

model1 = arima(seasonal_data, order=c(1,0,0), seasonal=c(1,0,0),method = 'ML')

print(model1)

Output:


Call:

arima(x = seasonal_data, order = c(1, 0, 0), seasonal = c(1, 0, 0), method =

"ML")

Coefficients:

ar1 sar1 intercept

0.2881 -0.2920 286.7916

s.e. 0.1017 0.0988 81.3043

sigma^2 estimated as 494084: log likelihood = -742.18, aic = 1490.35

bic = AIC(model1, k = log(length(seasonal_data)))

1502.48172313175

model2 = arima(seasonal_data, order=c(1,0,0), seasonal=c(1,0,1))

print(model2)

Call:

arima(x = seasonal_data, order = c(1, 0, 0), seasonal = c(1, 0, 1))

Coefficients:

ar1 sar1 sma1 intercept

0.2898 -0.5320 0.2731 287.2478

s.e. 0.1015 0.3968 0.4679 86.2407

sigma^2 estimated as 492002: log likelihood = -742.02, aic = 1492.04

bic = AIC(model2, k = log(length(seasonal_data)))

1506.70785176533

model3 = arima(seasonal_data, order=c(1,0,1), seasonal=c(1,0,0))

print(model3)


Call:

arima(x = seasonal_data, order = c(1, 0, 1), seasonal = c(1, 0, 0))

Coefficients:

ar1 ma1 sar1 intercept

0.3774 -0.0987 -0.2885 286.0528

s.e. 0.3822 0.4192 0.1001 83.9712

sigma^2 estimated as 493945: log likelihood = -742.15, aic = 1492.3

bic = AIC(model3, k = log(length(seasonal_data)))

1506.96237063021

Comparing the above models based on AIC and BIC values:

Model 1 has the least AIC and BIC values.
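The comparison can be made explicit by tabulating both criteria. This is a small sketch assuming model1, model2, model3, and seasonal_data exist as fitted above; note that BIC is obtained from AIC() by setting the penalty to k = log(n), as in the code above:

```r
# Tabulate AIC and BIC for the three candidate SARIMA models
models <- list(model1 = model1, model2 = model2, model3 = model3)
n <- length(seasonal_data)
comparison <- data.frame(
  AIC = sapply(models, AIC),
  BIC = sapply(models, AIC, k = log(n))  # BIC = AIC with penalty log(n)
)
print(comparison[order(comparison$AIC), ])  # smallest (best) first
```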

Residual Analysis:

The "residuals" in a time series model are what is left over after fitting the model. Residuals are useful for determining whether a model has captured all of the information in the data. A good forecasting method will yield residuals with the following properties:

1) The residuals are uncorrelated.

2) The residuals have zero mean.

We plot the sample ACF and PACF to check whether any correlation remains in the residuals, and the normality of the residuals is checked using histograms and QQ plots.
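These two properties can also be checked numerically. A minimal sketch using base R (the residual mean, and a Ljung-Box test via Box.test()), assuming model1 has been fitted as above; fitdf = 2 accounts for the two estimated ARMA coefficients:

```r
# Numeric checks of the two residual properties listed above
res <- residuals(model1)
print(mean(res))  # property 2: should be close to zero
# Property 1: Ljung-Box test, H0 = residuals are uncorrelated
print(Box.test(res, lag = 12, type = "Ljung-Box", fitdf = 2))
```

A large Ljung-Box p-value means there is no evidence of leftover autocorrelation.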

residuals_model1 = model1$residuals

residuals_model2 = model2$residuals

residuals_model3 = model3$residuals

Plotting the residuals:

For model 1:

plot(residuals_model1, type='o')


Examining a plot of the residuals over time serves as our first diagnostic check. If the model is adequate, we expect the plot to show a rectangular scatter around a horizontal level of zero, with no visible trends.

acf(as.vector(residuals_model1), lag.max = 50)

From the ACF of residuals, we can see that there is almost no autocorrelation between residuals.

qqnorm(residuals_model1)

qqline(residuals_model1)


Here, extreme values look suspect.

hist(residuals_model1)

The QQ plot and histogram suggest approximate normality of the residuals; to check this formally, we run the Shapiro-Wilk test.

print(shapiro.test(residuals_model1))

Shapiro-Wilk normality test

data: residuals_model1

W = 0.96675, p-value = 0.01797

The p-value is below 0.05, so the Shapiro-Wilk test rejects exact normality of the residuals at the 5% level, consistent with the suspect extreme values in the QQ plot.

tsdiag(model1)


To complement the checks at individual lags, tsdiag() provides diagnostics that consider the aggregate magnitude of the residual autocorrelations (Ljung-Box statistics) rather than each lag separately.

For model 2:

Let’s follow the same procedure for analysis of residuals of model 2.

plot(residuals_model2, type='o')

acf(as.vector(residuals_model2))


qqnorm(residuals_model2)

qqline(residuals_model2)

hist(residuals_model2)


From the histogram we can see that the distribution is not exactly bell-shaped, and the QQ plot shows some outliers. So, we use the Shapiro-Wilk test to test the normality of the residuals.

print(shapiro.test(residuals_model2))

Shapiro-Wilk normality test

data: residuals_model2

W = 0.97024, p-value = 0.03194

As with model 1, normality of the residuals is rejected at the 5% level.

We plot the sample ACF of the residuals. We cannot see any significant correlation except at lag 13, which is just one lag out of 50, so it is acceptable. The lack of residual correlation suggests both models are adequate; since model 1 has the lower AIC and BIC, we use model 1 for forecasting.

Forecasting

After the fitting process is completed and the best model is selected, we refit that model on the original (undifferenced) data to forecast.

forecast_model1<-arima(ts(df,frequency=12,start=(1964)),order=c(1,1,0),

seasonal=c(1,1,0))

forecast_model1$x<-ts(df, frequency=12, start=(1964))


forecast_model1$x

I have used the forecast() function to produce forecasts from the model. forecast() can handle a wide range of inputs; it usually takes a time series or a fitted time series model as its main argument and generates appropriate forecasts. Here I use the fitted model to forecast the next 24 values of the series with a 95% confidence interval.

forecast_values = forecast(forecast_model1,level=c(95),h=24)

forecast_values

Point Forecast Lo 95 Hi 95

Oct 1972 6807.862 5208.1371 8407.587

Nov 1972 9897.162 8001.5337 11792.791

Dec 1972 12820.546 10561.2306 15079.861

Jan 1973 4260.002 1723.9659 6796.039

Feb 1973 3476.941 679.7747 6274.108

Mar 1973 4524.189 1492.2431 7556.134

Apr 1973 4788.501 1537.3473 8039.655

May 1973 4769.733 1313.7064 8225.759

Jun 1973 5214.846 1565.2712 8864.420

Jul 1973 4432.614 599.3059 8265.922

Aug 1973 1520.925 -2487.7232 5529.573

Sep 1973 5933.360 1756.7334 10109.987


Oct 1973 6893.944 2129.4162 11658.471

Nov 1973 9917.496 4793.7517 15041.241

Dec 1973 12809.585 7297.1328 18322.037

Jan 1974 4320.559 -1536.4248 10177.542

Feb 1974 3537.216 -2651.4701 9725.902

Mar 1974 4574.196 -1927.0927 11075.484

Apr 1974 4822.525 -1977.7762 11622.826

May 1974 4758.416 -2328.0245 11844.856

Jun 1974 5278.147 -2083.4128 12639.707

Jul 1974 4426.429 -3200.2991 12053.158

Aug 1974 1522.742 -6360.2521 9405.736

Sep 1974 5950.637 -2180.5460 14081.820

The above table shows the next 24 forecasted values of the time series.

plot(forecast_values)

In the above figure, we see the 24-month forecast results from October 1972 to September 1974 on the

line graph in blue. If we look at the forecast values and visualization, it can be called a satisfactory


result. So, the model SARIMA(1,0,0)(1,1,0)[12] is a good fit to the data and we are getting adequate

forecast results.

Part B:

Nonseasonal Dataset: Natural gas prices

Introduction

This part of the project uses a time series of natural gas prices (US Henry Hub). The problem statement is to forecast the monthly price of natural gas using time series analysis techniques. The methodology is to first propose candidate models describing the change in the gas price and then pick the best model by comparing the performance of each. The dataset covers a period of nearly 24 years, from January 1997 to August 2020. I used such a wide time range because the price fluctuates considerably and I wanted to capture the long-run pattern of the price change.

Dataset: Data comes from U.S. Energy Information Administration

https://datahub.io/core/natural-gas

df = read.csv("/content/monthly.csv")

Data Analysis

summary(df)

Month Price

Length:284 Min. : 1.630

Class :character 1st Qu.: 2.660

Mode :character Median : 3.560

Mean : 4.208

3rd Qu.: 5.327

Max. :13.420

nonseasonal_data = ts(df$Price, frequency=12, start=c(1997,1))

plot(nonseasonal_data)


From the above plot, we can see that there is no consistent trend in the data; there is a pronounced spike around 2005, but no seasonality.

acf(nonseasonal_data)


The above ACF is “decaying”, or decreasing, very slowly, and remains well above the significance

range (dotted blue lines). This is indicative of a non-stationary series.

pacf(nonseasonal_data)

adf.test(nonseasonal_data)

Augmented Dickey-Fuller Test

data: nonseasonal_data

Dickey-Fuller = -2.9978, Lag order = 6, p-value = 0.1557

alternative hypothesis: stationary

When we looked at the gas price plot and its ACF, the series did not look stationary, so we ran the ADF test; the p-value was 0.1557, indicating a non-stationary process. So, I took the first difference of the gas prices. When we ran the ADF test on the differenced prices, the p-value was less than 0.05, indicating a stationary process with a possible linear trend; however, the plot makes clear that there is no linear trend.

nonseasonal_diff = diff(nonseasonal_data)

plot(nonseasonal_diff)


acf(as.vector(nonseasonal_diff))

The ACF after one difference shows very little autocorrelation. One lag is significantly outside the confidence bounds, i.e. we can consider MA(1).

pacf(as.vector(nonseasonal_diff))


The PACF after one difference likewise shows very little autocorrelation. One lag is significantly outside the confidence bounds, i.e. we can consider AR(1) (or AR(2), since lag 5 is also outside the bounds).

adf.test(nonseasonal_diff)

“p-value smaller than printed p-value”

Augmented Dickey-Fuller Test

data: nonseasonal_diff

Dickey-Fuller = -6.9687, Lag order = 6, p-value = 0.01

alternative hypothesis: stationary

The ADF test gives a p-value less than 0.05, so we reject the null hypothesis of a unit root: the differenced data are stationary. Since stationarity has been achieved at the first difference, an ARIMA model of order (p,1,q) will be used, where 'p' is the order of the autoregressive term and 'q' is the order of the moving average term.
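As a side note, fitting an ARIMA(p,1,q) model to the level series with arima() is equivalent to fitting an ARMA(p,q) model without an intercept to the first-differenced series, since arima() drops the mean when d > 0. A small illustrative sketch on simulated data (not the gas prices):

```r
# ARIMA(1,1,1) on levels vs. ARMA(1,1) (no intercept) on first differences
set.seed(3)
x <- cumsum(arima.sim(list(ar = 0.5, ma = 0.3), n = 400))  # integrated series
fit_levels <- arima(x, order = c(1, 1, 1))
fit_diffs  <- arima(diff(x), order = c(1, 0, 1), include.mean = FALSE)
print(coef(fit_levels))
print(coef(fit_diffs))  # essentially the same estimates
```

This is why the models below are specified on nonseasonal_diff with d = 0, while the forecasting step later refits on the original series with d = 1.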

eacf(nonseasonal_diff)

AR/MA


0 1 2 3 4 5 6 7 8 9 10 11 12 13

0 o o o o x o o o x o o o o o

1 x o o o x o o o x o o o o o

2 x o o o o o o o x o o o o o

3 x x o o o o o o x o o o o o

4 x o x o o o o o x o o o x o

5 x x x x x o o o x o o o x o

6 x x o x o x o o x o o o x o

7 x x o x o x x o o o o o o o

Models of Time Series

Because the series is non-seasonal, ARIMA models will be used.

From the ACF, PACF, and EACF plots I chose the following models to fit to the differenced data:

Model 1: ARIMA(1,0,1)

Model 2: ARIMA(2,0,1)

ns_model1 = arima(nonseasonal_diff, order=c(1,0,1))

print(ns_model1)

Call:

arima(x = nonseasonal_diff, order = c(1, 0, 1))

Coefficients:

ar1 ma1 intercept

-0.9769 0.9437 -0.0057

s.e. 0.0268 0.0429 0.0440

sigma^2 estimated as 0.5478: log likelihood = -305.31, aic = 616.63

bic = AIC(ns_model1, k = log(length(nonseasonal_diff)))

633.066517367315

ns_model2 = arima(nonseasonal_diff, order=c(2,0,1))

print(ns_model2)


Call:

arima(x = nonseasonal_diff, order = c(2, 0, 1))

Coefficients:

ar1 ar2 ma1 intercept

-0.9463 0.0296 0.9407 -0.0041

s.e. 0.0756 0.0638 0.0463 0.0453

sigma^2 estimated as 0.5473: log likelihood = -305.2, aic = 618.4

bic = AIC(ns_model2, k = log(length(nonseasonal_diff)))

638.451797725991

Comparing the above models based on AIC and BIC values:

Model 1 has the least AIC and BIC values.

Residual Analysis:

We plot the sample ACF and PACF to check whether any correlation remains in the residuals, and the normality of the residuals is checked using histograms and QQ plots.

nonseasonal_residuals_model1 = ns_model1$residuals

nonseasonal_residuals_model2 = ns_model2$residuals

For model 1:

plot(nonseasonal_residuals_model1, type='o')


There is no visible trend in the plot of residuals.

acf(nonseasonal_residuals_model1)

pacf(as.vector(nonseasonal_residuals_model1))


The above ACF and PACF plots of residuals suggest that there is no significant correlation among the

residuals. Now, for checking the normality of the residuals let’s plot qq plots and histogram.

qqnorm(nonseasonal_residuals_model1)

qqline(nonseasonal_residuals_model1)

Here, the extreme values look suspect, as they do not appear to follow a normal distribution.

hist(nonseasonal_residuals_model1)


print(shapiro.test(nonseasonal_residuals_model1))

Shapiro-Wilk normality test

data: nonseasonal_residuals_model1

W = 0.86943, p-value = 1.783e-14

The extremely small p-value strongly rejects normality of the residuals, which is common for price series exhibiting volatility clustering.

For model 2:

Let’s follow the same procedure to analyze the residuals of model 2.

plot(nonseasonal_residuals_model2, type='o')


acf(as.vector(nonseasonal_residuals_model2), lag.max=50)

pacf(as.vector(nonseasonal_residuals_model2))


qqnorm(nonseasonal_residuals_model2)

qqline(nonseasonal_residuals_model2)

hist(nonseasonal_residuals_model2)


print(shapiro.test(nonseasonal_residuals_model2))

Shapiro-Wilk normality test

data: nonseasonal_residuals_model2

W = 0.86694, p-value = 1.267e-14

From the sample ACF of the residuals we can still see some dependency. The histograms show an approximately normal shape, although the Shapiro-Wilk tests reject normality for both models. Comparing the residual analyses of the two models, model 1 appears the better fit, so we will consider model 1 to forecast the series.

Forecasting

model1<-arima(ts(nonseasonal_data,frequency=12,start=c(1997,1)),

order=c(1,1,1))

model1$x<-ts(nonseasonal_data, frequency=12,start=c(1997))

I have used a time series model to forecast the next 24 values of the series with 95% confidence interval.

f=forecast(model1,level=c(95),h=24)

Point Forecast Lo 95 Hi 95

Nov 2019 2.386313 0.9357244 3.836901

Dec 2019 2.331194 0.3121041 4.350283

Jan 2020 2.385144 -0.1004326 4.870721


Feb 2020 2.332337 -0.5230609 5.187735

Mar 2020 2.384025 -0.8178421 5.585892

Apr 2020 2.333433 -1.1636714 5.830536

May 2020 2.382953 -1.4019901 6.167896

Jun 2020 2.334482 -1.7035925 6.372557

Jul 2020 2.381926 -1.9075517 6.671403

Aug 2020 2.335488 -2.1791815 6.850157

Sep 2020 2.380941 -2.3596730 7.121556

Oct 2020 2.336451 -2.6090840 7.281986

Nov 2020 2.379998 -2.7724012 7.532398

Dec 2020 2.337374 -3.0043759 7.679124

Jan 2021 2.379095 -3.1545306 7.912720

Feb 2021 2.338258 -3.3722731 8.048790

Mar 2021 2.378229 -3.5119994 8.268458

Apr 2021 2.339105 -3.7177876 8.395998

May 2021 2.377400 -3.8490416 8.603842

Jun 2021 2.339917 -4.0445682 8.724402

Jul 2021 2.376606 -4.1688016 8.922013

Aug 2021 2.340695 -4.3553691 9.036758

Sep 2021 2.375845 -4.4736910 9.225380

Oct 2021 2.341440 -4.6523299 9.335209

plot(f)


In the above figure, we see the 24-month forecast results from November 2019 to October 2021 on the line graph in blue. Looking at the forecast values and the visualization, the result is not satisfactory because of the conditional variance present in the data. So, we prefer fitting a suitable GARCH model to the data to achieve a satisfactory forecast.

GARCH

GARCH stands for Generalized Autoregressive Conditional Heteroskedasticity. ARCH-GARCH is a method for modeling the volatility of a series, or more specifically, the noise term of the ARIMA model. ARCH-GARCH incorporates new observations and analyzes the series using conditional variances, allowing users to estimate future values using current data. We use the function ugarchspec() for the model specification and ugarchfit() for the model fitting, and consider a GARCH(1,1) model.
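Before fitting, it may help to write out what GARCH(1,1) means: the conditional variance follows sigma2[t] = omega + alpha1 * eps[t-1]^2 + beta1 * sigma2[t-1] (parameter names as in the rugarch output). A base-R simulation sketch with illustrative parameter values, not the fitted ones:

```r
# Simulate a GARCH(1,1) process to illustrate the variance recursion
set.seed(42)
n <- 500
omega <- 0.06; alpha1 <- 0.5; beta1 <- 0.4   # illustrative values only
eps    <- numeric(n)   # innovations
sigma2 <- numeric(n)   # conditional variances
sigma2[1] <- omega / (1 - alpha1 - beta1)    # unconditional variance
eps[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  sigma2[t] <- omega + alpha1 * eps[t - 1]^2 + beta1 * sigma2[t - 1]
  eps[t]    <- sqrt(sigma2[t]) * rnorm(1)
}
plot(eps, type = "l")  # bursts of large |eps| show volatility clustering
```

Large shocks raise the next period's variance, which is exactly the behavior seen in the gas price series.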

spec <- ugarchspec(variance.model = list(model = "sGARCH",
                                         garchOrder = c(1, 1),
                                         submodel = NULL,
                                         external.regressors = NULL,
                                         variance.targeting = FALSE),
                   mean.model = list(armaOrder = c(1, 1),
                                     external.regressors = NULL),
                   distribution.model = "norm")

garch2 <- ugarchfit(spec = spec, data = temp_data_garch,
                    solver.control = list(trace = 0))

*---------------------------------*

* GARCH Model Fit *

*---------------------------------*

Conditional Variance Dynamics

-----------------------------------

GARCH Model : sGARCH(1,1)

Mean Model : ARFIMA(1,0,1)

Distribution : norm

Optimal Parameters

------------------------------------

Estimate Std. Error t value Pr(>|t|)

mu 3.185522 0.681580 4.6737 0.000003

ar1 0.940684 0.027055 34.7688 0.000000

ma1 0.104821 0.087345 1.2001 0.230105

omega 0.064248 0.033063 1.9432 0.051990

alpha1 0.526372 0.155477 3.3855 0.000710

beta1 0.433065 0.160756 2.6939 0.007062

Robust Standard Errors:

Estimate Std. Error t value Pr(>|t|)

mu 3.185522 1.630688 1.95348 0.050762

ar1 0.940684 0.056062 16.77929 0.000000

ma1 0.104821 0.118274 0.88626 0.375480

omega 0.064248 0.089558 0.71739 0.473134

alpha1 0.526372 0.425682 1.23654 0.216259

beta1 0.433065 0.453003 0.95599 0.339079

LogLikelihood : -241.9342


Information Criteria

------------------------------------

Akaike 1.7460

Bayes 1.8231

Shibata 1.7451

Hannan-Quinn 1.7769

Weighted Ljung-Box Test on Standardized Residuals

------------------------------------

statistic p-value

Lag[1] 0.05491 0.8147

Lag[2*(p+q)+(p+q)-1][5] 2.10143 0.9360

Lag[4*(p+q)+(p+q)-1][9] 4.88106 0.4809

d.o.f=2

H0 : No serial correlation

Weighted Ljung-Box Test on Standardized Squared Residuals

------------------------------------

statistic p-value

Lag[1] 0.1689 0.6811

Lag[2*(p+q)+(p+q)-1][5] 0.8137 0.9002

Lag[4*(p+q)+(p+q)-1][9] 1.6380 0.9435

d.o.f=2

Weighted ARCH LM Tests

------------------------------------

Statistic Shape Scale P-Value

ARCH Lag[3] 0.2015 0.500 2.000 0.6535

ARCH Lag[5] 1.2899 1.440 1.667 0.6493

ARCH Lag[7] 1.6973 2.315 1.543 0.7810

Nyblom stability test

------------------------------------

Joint Statistic: 0.8687

Individual Statistics:

mu 0.1152

ar1 0.1061


ma1 0.2102

omega 0.4438

alpha1 0.2238

beta1 0.4395

Asymptotic Critical Values (10% 5% 1%)

Joint Statistic: 1.49 1.68 2.12

Individual Statistic: 0.35 0.47 0.75

Sign Bias Test

------------------------------------

t-value prob sig

Sign Bias 0.04582 0.9635

Negative Sign Bias 0.92013 0.3583

Positive Sign Bias 0.58065 0.5619

Joint Effect 1.91979 0.5892

Adjusted Pearson Goodness-of-Fit Test:

------------------------------------

group statistic p-value(g-1)

1 20 58.39 6.925e-06

2 30 71.56 1.847e-05

3 40 85.86 2.250e-05

4 50 94.87 9.304e-05

The output above displays the significance of the estimated parameters. It also displays the Akaike (AIC), Bayes (BIC), Hannan-Quinn, and Shibata information criteria for the estimated model; the lower these values, the better the model fits.

plot(garch2, which='all')


Here are the additional diagnostic graphs illustrating the model's behavior. We can observe that the QQ plot now depicts a distribution more in line with the straight line.

forecast1 = ugarchforecast(fitORspec = garch2, n.ahead = 20)

fitted(forecast1)


plot(fitted(forecast1),type='l')


The plot above displays the values forecasted by the GARCH fit.

plot(sigma(forecast1),type='l')

The above plot displays the standard deviation of the forecasted values.

series<- c(nonseasonal_data,rep(NA,length(fitted(forecast1))))

forecastseries<- c(rep(NA,length(nonseasonal_data)),fitted(forecast1))

plot(series, type = "l")

lines(forecastseries, col = "green")


series<- c(tail(nonseasonal_data,100),rep(NA,length(fitted(forecast1))))

forecastseries<- c(rep(NA,100),fitted(forecast1))

plot(series, type = "l")

lines(forecastseries, col = "green")

In the above figure, we see the forecast results on the line graph in green. Looking at the forecast values and the visualization, the result can be called satisfactory. So, the ARIMA(1,1,1)-GARCH(1,1) model is a good fit to the data and we obtain adequate forecast results.

Conclusion

In this project, there was the forecasting of seasonal data of Perrin Freres monthly champagne sales

using time series forecasting. In this, the forecasting model was built using the SARIMA model.

Forecasting is essential in many sectors, especially in business sectors, because if a person knows about

facing profit or loss in the coming years, it can be easily handled to meet the problem efficiently. The

models used in this project were giving accurate results and projected the sales of champagne after ten

years. Also, the project used time series models to analyze the pattern of Henry Hub natural spot price.

We examined the GARCH model. We discovered that natural gas prices fluctuate dramatically. The

GARCH model gave us better performance as compared to ARIMA mode.


Links

1. https://www.kaggle.com/datasets/galibce003/perrin-freres-monthly-champagne-sales

2. https://rpubs.com/kkim22/668273

3. https://www.idrisstsafack.com/post/garch-models-with-r-programming-a-practical-example-with-tesla-stock

4. https://quant.stackexchange.com/questions/4948/how-to-fit-armagarch-model-in-r

5. https://talksonmarkets.files.wordpress.com/2012/09/time-series-analysis-with-arima-e28093-arch013.pdf
